Creating Datasets
How to upload or curate datasets in HoneyHive
A dataset in HoneyHive is a collection of arbitrary inputs and outputs (along with metadata). Our extensible dataset model allows you to store any kind of data in a structured way. This allows you to create custom datasets for each part of your pipeline, whether for fine-tuning or experimentation.
Upload a dataset
We currently support JSON file uploads in HoneyHive.
The file must contain one JSON object per line (the JSONL format) rather than a single JSON array, so if your data is wrapped in a list, you’ll have to drop the square brackets at the start and end before uploading.
Here’s an example file that you can upload:
{ "user_query": "What's the history of AI?", "response": "The history of AI is a long one." }
{ "user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines." }
{ "user_query": "What is the future of AI?", "response": "The future of AI is bright." }
{ "user_query": "How can I build AI?", "response": "You can build AI by learning the basics of programming." }
{ "user_query": "How does AI work?", "response": "AI works by learning from data." }
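If your data currently lives in a single JSON array, a short script can rewrite it in the one-object-per-line layout shown above. The following is a minimal sketch using only the Python standard library; the function names and the input file name in the usage comment are hypothetical and not part of HoneyHive.

```python
import json
from pathlib import Path


def write_json_lines(records, dst_path):
    """Write each record as one compact JSON object per line."""
    lines = [json.dumps(record, ensure_ascii=False) for record in records]
    Path(dst_path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


def convert_json_array_file(src_path, dst_path):
    """Read a file holding a JSON array of objects and rewrite it one object per line."""
    records = json.loads(Path(src_path).read_text(encoding="utf-8"))
    # Reject anything that is not a list of JSON objects before writing.
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    return write_json_lines(records, dst_path)


# Example usage (the input file name is an assumption for illustration):
# convert_json_array_file("queries_array.json", "AI_bot_queries.json")
```

The function returns the number of records written, which is a quick way to confirm that nothing was dropped during conversion.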
Expected time: a few minutes
Steps:
Create a file with your JSON data
We will use a file called AI_bot_queries.json with the content shown above. You can name it something else if you want, but make sure it’s a .json file.
Upload & view your dataset
Follow the steps below to upload and view your dataset:
Curate a dataset from traces
You can curate datasets for your overall session, completions, or any particular step of your pipeline.
In the following example, we will do so for the overall session. You can simply add a filter for event_name or go to the Completions tab to curate model requests.
Expected time: a few minutes
Steps: