Since HoneyHive’s datasets don’t follow a fixed schema format, it is straightforward to import datasets from HuggingFace (or any other dataset management tool) into HoneyHive.

Upload a dataset through the SDK

At a high level, all we need to do is

  • define a mapping of the source fields to inputs and outputs, and
  • choose an import batch size to set up the integration.
We recommend importing the data in batches of 100 rows at a time.

Prerequisites

  • You have already created a project in HoneyHive, as explained here.
  • You have an API key for your project, as explained here.

Expected time: a few minutes

1

Installation

To install our SDK, run the following commands in the shell.
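For example, assuming you are using the Python SDK together with the HuggingFace datasets library:

```shell
pip install honeyhive
pip install datasets
```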

2

Authentication & Imports

To authenticate your SDK, you need to pass your API key.
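A minimal sketch in Python; the client constructor argument (bearer_auth) and the example HuggingFace dataset are assumptions, so check the SDK reference and substitute your own data:

```python
from datasets import load_dataset
from honeyhive import HoneyHive

# Authenticate the HoneyHive client with your project API key.
hh = HoneyHive(bearer_auth="YOUR_HONEYHIVE_API_KEY")

# Load the HuggingFace dataset you want to import (dataset name is illustrative).
hf_dataset = load_dataset("squad", split="validation")
```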

3

Create the HoneyHive dataset

Give your new dataset a name and pass the name of the project you want to associate the dataset with.

Keep the generated dataset_id handy for future reference.
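A sketch of this step; the datasets.create_dataset method, the CreateDatasetRequest model, and the attribute used to read the ID off the response are assumptions based on the SDK's generated client, so verify them against the SDK reference for your version:

```python
from honeyhive.models import components

# Create an empty dataset in your HoneyHive project.
res = hh.datasets.create_dataset(
    request=components.CreateDatasetRequest(
        project="YOUR_PROJECT_NAME",       # project to associate the dataset with
        name="huggingface-squad-import",   # any descriptive name
    )
)

# Keep the generated dataset_id handy for the next step. The exact attribute
# path on the response depends on your SDK version, so inspect `res` if this differs.
dataset_id = res.object.result.inserted_id
```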

4

Pass your data in batches with a mapping

Now, using the dataset_id, you can pass your data as a list and provide a mapping for its fields.

We’ll create a unique datapoint for each entry in the list. The datapoint_id on those entries will later be used to join traces in experiment runs.

Any field not defined in the mapping is set on the metadata of the datapoint.
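A sketch of the import loop under the same assumptions as above; datapoints.create_datapoint and CreateDatapointRequest are assumed method and model names, and the SQuAD column names in the mapping are purely illustrative:

```python
BATCH_SIZE = 100  # recommended batch size

# Map source fields to HoneyHive inputs and ground-truth outputs.
# Any field not listed here ends up in the datapoint's metadata.
mapping = {
    "inputs": ["question", "context"],   # illustrative SQuAD columns
    "ground_truth": ["answers"],
}

rows = [dict(row) for row in hf_dataset]

for start in range(0, len(rows), BATCH_SIZE):
    for row in rows[start:start + BATCH_SIZE]:
        inputs = {k: row[k] for k in mapping["inputs"]}
        ground_truth = {k: row[k] for k in mapping["ground_truth"]}
        metadata = {
            k: v for k, v in row.items()
            if k not in mapping["inputs"] + mapping["ground_truth"]
        }

        # Create a datapoint and link it to the dataset created in step 3.
        hh.datapoints.create_datapoint(
            request=components.CreateDatapointRequest(
                project="YOUR_PROJECT_NAME",
                inputs=inputs,
                ground_truth=ground_truth,
                metadata=metadata,
                linked_datasets=[dataset_id],
            )
        )
```

If your SDK version exposes a batch endpoint for datapoints, you can swap the inner per-row call for a single call per batch.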

You have successfully uploaded your HuggingFace dataset to HoneyHive using the SDK.

You can now view your dataset in the HoneyHive UI.

Next steps