A dataset in HoneyHive is a collection of arbitrary inputs & outputs (along with metadata). Our extensible dataset model allows you to store any kind of data in a structured way. This allows you to create custom datasets for each part of your pipeline, whether for fine-tuning or experimentation.

Upload a dataset

We currently support JSON file uploads in HoneyHive.

The file must contain one JSON object per line.

We don't support standard JSON arrays directly, so you'll have to drop the square brackets at the start & end (and the commas between objects) before uploading.

Here’s an example file that you can upload:

{ "user_query": "What's the history of AI?", "response": "The history of AI is a long one." }
{ "user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines." }
{ "user_query": "What is the future of AI?", "response": "The future of AI is bright." }
{ "user_query": "How can I build AI?", "response": "You can build AI by learning the basics of programming." }
{ "user_query": "How does AI work?", "response": "AI works by learning from data." }
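If your data starts out as a standard JSON array, a few lines of Python can convert it into the one-object-per-line layout shown above. This is a minimal sketch using only the standard library; the records are the first two examples from the file above.

```python
import json

# Example input: the same records as a standard JSON array string
array_text = """[
  {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
  {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."}
]"""

records = json.loads(array_text)

# Re-serialize as one JSON object per line, with no surrounding
# square brackets and no commas between objects
lines = "\n".join(json.dumps(r) for r in records)
print(lines)
```

Each resulting line is a self-contained JSON object, which is the layout the upload expects.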

Expected time: a few minutes



Create a file with your JSON data

We will use a file called AI_bot_queries.json with the content shown above.

You can name it something else if you like, but make sure it's a .json file.
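You can also generate the file programmatically. The sketch below writes AI_bot_queries.json with a subset of the example records from above; the filename matches the example, but any .json name works.

```python
import json

# A few of the example records from above
records = [
    {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
    {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."},
    {"user_query": "What is the future of AI?", "response": "The future of AI is bright."},
]

# Write one JSON object per line — note the .json extension,
# even though the content is line-delimited
with open("AI_bot_queries.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```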


Upload & view your dataset

Follow the steps below to upload & view your dataset:

Curate a dataset from traces

You can curate datasets for your overall sessions, completions, or any particular step of your pipeline.

In the following example, we will do so for the overall session. You can simply add a filter for event_name or go to the Completions tab to curate model requests.

Expected time: a few minutes