Because HoneyHive datasets don't enforce a fixed schema, HoneyHive offers an automatic integration with HuggingFace datasets (or any other dataset management tool) for importing datasets into HoneyHive.

Upload a dataset through the SDK

At a high level, setting up the integration takes two things:

  • a mapping of the input and output fields
  • an import batch size

We recommend importing the data in batches of 100 rows at a time.
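The batching recommendation can be sketched in plain Python, independent of the SDK. The helper name `iter_batches` and the sample rows are hypothetical and used only for illustration; the default of 100 mirrors the recommendation above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(rows: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# Made-up rows standing in for a real dataset.
rows = [{"review": f"example review {i}"} for i in range(250)]
batch_sizes = [len(batch) for batch in iter_batches(rows)]
print(batch_sizes)  # [100, 100, 50]
```

The last batch is simply shorter when the row count is not a multiple of the batch size, so no padding or special handling is needed.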

Prerequisites

  • You have already created a project in HoneyHive, as explained here.
  • You have an API key for your project, as explained here.

Expected time: a few minutes

1

Installation

To install our SDK and the HuggingFace datasets library, run the following command in your shell.

pip install honeyhive datasets
2

Authentication & Imports

To authenticate the SDK, pass your API key when creating the client.

import honeyhive
from honeyhive.models import components, operations
from datasets import load_dataset

hhai = honeyhive.HoneyHive(bearer_auth='YOUR_API_KEY')
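If you would rather not hardcode the key in your script, one option is to read it from an environment variable. The sketch below is an assumption, not part of the SDK: the helper `load_api_key` and the variable name `HONEYHIVE_API_KEY` are hypothetical, so use whatever name fits your setup.

```python
import os


def load_api_key(env_var: str = "HONEYHIVE_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing loudly if it is unset."""
    # The variable name is a hypothetical choice for this sketch.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your HoneyHive API key"
        )
    return api_key


# Usage with the client created above:
#   hhai = honeyhive.HoneyHive(bearer_auth=load_api_key())
```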
3

Create the HoneyHive dataset

Give your new dataset a name and pass the name of the project you want to associate it with.

Keep the generated dataset_id handy for future reference.

eval_dataset = hhai.datasets.create_dataset(request=components.CreateDatasetRequest(
  project='YOUR_PROJECT_NAME',
  name='DATASET_NAME',
))

dataset_id = eval_dataset.object.result.inserted_id
4

Pass your data in batches with a mapping

Now, using the dataset_id, pass your data as a list of dictionaries and provide a mapping for its fields.

We’ll create a unique datapoint for each entry in the list. The datapoint_id of each entry is later used to join traces in experiment runs.

Any field not defined in the mapping is set on the metadata of the datapoint.

dataset = load_dataset("lhoestq/demo1")
dataset = list(dataset['train'])  # turn the dataset into a list of dictionaries
datapoint_ids = []

for i in range(0, len(dataset), 100):
    dataset_request = operations.AddDatapointsRequestBody(
        project = 'YOUR_PROJECT_NAME',
        data = dataset[i:i+100], # list of dictionaries
        mapping = operations.Mapping(
            inputs=[
              'review', # input fields
            ],
            ground_truth=[],
            history=[]
        ),
    )

    datapoints = hhai.datasets.add_datapoints(
        dataset_id = dataset_id, # dataset_id from the previous step
        request_body = dataset_request
    )

    # extend (rather than append) keeps datapoint_ids a flat list of IDs
    datapoint_ids.extend(datapoints.object.datapoint_ids)
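
To make the mapping rule concrete, here is a small local illustration of how a row could be divided according to a mapping, with every unmapped field ending up in metadata. This is not HoneyHive's implementation: the function `split_row`, the shape of the returned dictionary, and the sample row are all assumptions made for this sketch.

```python
from typing import Any, Dict, Sequence


def split_row(
    row: Dict[str, Any],
    inputs: Sequence[str],
    ground_truth: Sequence[str] = (),
    history: Sequence[str] = (),
) -> Dict[str, Dict[str, Any]]:
    """Group a row's fields by mapping; unmapped fields go to metadata."""
    mapped = set(inputs) | set(ground_truth) | set(history)
    return {
        "inputs": {k: row[k] for k in inputs if k in row},
        "ground_truth": {k: row[k] for k in ground_truth if k in row},
        "history": {k: row[k] for k in history if k in row},
        # Anything the mapping does not mention is kept as metadata.
        "metadata": {k: v for k, v in row.items() if k not in mapped},
    }


# Made-up row; the field names are illustrative only.
row = {"review": "Great app", "star": 5, "date": "2024-01-01"}
datapoint = split_row(row, inputs=["review"])
print(datapoint["inputs"])    # {'review': 'Great app'}
print(datapoint["metadata"])  # {'star': 5, 'date': '2024-01-01'}
```

With the mapping used in the upload loop (only `review` listed under inputs), every other column of a row would be carried along as metadata rather than dropped.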

You have successfully uploaded your HuggingFace dataset to HoneyHive using the SDK.

You can now view your dataset in the HoneyHive UI.

Next steps