You can upload datasets to HoneyHive through either the UI or the SDK.

Upload a dataset through the UI

We currently support JSON, JSONL and CSV file uploads in HoneyHive.

Here’s an example JSONL file that you can upload:

{ "user_query": "What's the history of AI?", "response": "The history of AI is a long one." }
{ "user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines." }
{ "user_query": "What is the future of AI?", "response": "The future of AI is bright." }
{ "user_query": "How can I build AI?", "response": "You can build AI by learning the basics of programming." }
{ "user_query": "How does AI work?", "response": "AI works by learning from data." }

Here’s an example CSV file that you can upload:

user_query,response
What's the history of AI?,The history of AI is a long one.
What is AI?,AI is the simulation of human intelligence in machines.
What is the future of AI?,The future of AI is bright.
How can I build AI?,You can build AI by learning the basics of programming.
How does AI work?,AI works by learning from data.
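
The two example files carry the same rows in different layouts. As a minimal sketch using only the Python standard library, a CSV export with a header row can be converted into the JSONL shape shown earlier. The function name `csv_to_jsonl` is illustrative and not part of HoneyHive.

```python
import csv
import io
import json

# Two rows from the CSV example, inlined so the sketch is self-contained.
CSV_TEXT = """user_query,response
What is AI?,AI is the simulation of human intelligence in machines.
How does AI work?,AI works by learning from data.
"""


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text with a header row into JSONL, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object keyed by the header names.
    return "\n".join(json.dumps(row) for row in reader)


jsonl_text = csv_to_jsonl(CSV_TEXT)
print(jsonl_text)
```

Using the `csv` module rather than splitting on commas keeps quoted values that contain commas intact.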

In the tutorial below, we will use the JSON file format.

Expected time: a few minutes

Steps:

1. Create a file with your JSON data

We will use a file called AI_bot_queries.json that contains the same records as the examples above.
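
One way to produce this file is sketched below. It assumes the JSON upload accepts a top-level array of objects, which the text above does not spell out, so treat the layout as an assumption and adjust it if the upload dialog expects something else.

```python
import json
from pathlib import Path

# Two of the example records; add the remaining rows the same way.
records = [
    {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
    {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."},
]

# Assumption: the file holds a single JSON array with one object per row.
path = Path("AI_bot_queries.json")
path.write_text(json.dumps(records, indent=2), encoding="utf-8")

# Read the file back to confirm that it parses as valid JSON.
loaded = json.loads(path.read_text(encoding="utf-8"))
print(f"Wrote {len(loaded)} records to {path}")
```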

2. Upload & view your dataset

Follow the steps below to upload and view your dataset:

Upload a dataset through the SDK

Both our TypeScript and Python SDKs are designed to ingest fully custom JSON lists.

All you need to do is define which fields in each row map to inputs, ground truth, and conversation history. All other fields are placed in metadata.

Prerequisites

  • You have already created a project in HoneyHive, as explained here.
  • You have an API key for your project, as explained here.

Expected time: a few minutes

1. Installation

To install our SDKs, run the following commands in the shell.

2. Authentication & Imports

To authenticate your SDK, you need to pass your API key.
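
A common way to avoid hard-coding the key is to read it from an environment variable before handing it to the SDK. The sketch below uses only the standard library. The helper `load_api_key` and the variable name `HH_API_KEY` are assumptions made for illustration; use whatever name your deployment defines.

```python
import os


def load_api_key(env_var: str = "HH_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your HoneyHive API key."
        )
    return key
```

Call `load_api_key()` once at startup and pass the returned string wherever the SDK client asks for your API key.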

3. Create the dataset object

Give your new dataset a name and pass the name of the project you want to associate it with.

Keep the generated dataset_id handy for future reference.

4. Pass your data and provide a mapping

Now, using the dataset_id, you can pass your data list and provide a mapping to the fields.

We create a unique datapoint for each entry in the JSON list. The datapoint_id of each entry is later used to join traces in experiment runs.

Any field not defined in the mapping is set on the metadata of the datapoint.
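
To make the mapping rule concrete, here is a plain-Python sketch of how a row could be split according to a mapping. It illustrates the behavior described above and is not the SDK's implementation. The function `split_row`, the key names `inputs`, `ground_truth` and `history`, the output shape, and the extra `source` field are all assumptions for illustration.

```python
from typing import Any


def split_row(row: dict[str, Any], mapping: dict[str, list[str]]) -> dict[str, dict[str, Any]]:
    """Split one row into mapped buckets; unmapped fields go to metadata."""
    datapoint: dict[str, dict[str, Any]] = {
        "inputs": {},
        "ground_truth": {},
        "history": {},
        "metadata": {},
    }
    # Invert the mapping so each field name points at its bucket.
    field_to_bucket = {
        field: bucket for bucket, fields in mapping.items() for field in fields
    }
    for field, value in row.items():
        # Any field missing from the mapping falls through to metadata.
        bucket = field_to_bucket.get(field, "metadata")
        datapoint[bucket][field] = value
    return datapoint


# "source" is a hypothetical extra field that the mapping does not mention.
row = {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines.", "source": "faq"}
mapping = {"inputs": ["user_query"], "ground_truth": ["response"], "history": []}

datapoint = split_row(row, mapping)
print(datapoint)
```

In this sketch `user_query` lands in inputs, `response` in ground truth, and `source` in metadata because the mapping does not mention it.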

You have successfully uploaded your dataset to HoneyHive using the SDK.

You can now view your dataset in the HoneyHive UI.

Next steps