HoneyHive Docs

Because HoneyHive datasets don't follow a fixed schema, you can import datasets from Hugging Face (or any other dataset management tool) into HoneyHive through the SDK.

Upload a dataset through the SDK

At a high level, setting up the integration takes two steps:

Define a mapping from your dataset's fields to inputs and outputs (ground truth).
Choose a batch size for importing the data.

We recommend importing the data in batches of 100 rows.
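Splitting rows into batches is plain list slicing. The following is a minimal sketch, assuming your rows are already a list of dictionaries. The helper name `iter_batches` is hypothetical and is not part of the HoneyHive SDK.

```python
from typing import Any, Dict, Iterator, List

BATCH_SIZE = 100  # recommended number of rows per upload request


def iter_batches(
    rows: List[Dict[str, Any]], size: int = BATCH_SIZE
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` rows (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# 250 rows are split into batches of 100, 100 and 50
rows = [{"review": f"review {n}"} for n in range(250)]
print([len(batch) for batch in iter_batches(rows)])  # [100, 100, 50]
```

The last batch simply holds whatever rows remain, so no padding or special casing is needed.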

Prerequisites

You have already created a project in HoneyHive, as explained here.
You have an API key for your project, as explained here.

Expected time: a few minutes

Installation

To install the HoneyHive SDK and the Hugging Face datasets library, run the following command in your shell.

pip install honeyhive datasets

Authentication & Imports

To authenticate the SDK, pass your API key when you create the client.

import honeyhive
from honeyhive.models import components, operations
from datasets import load_dataset

hhai = honeyhive.HoneyHive(
  bearer_auth='YOUR_API_KEY',
  server_url='HONEYHIVE_SERVER_URL',  # optional; required only for self-hosted or dedicated deployments
)
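To keep the API key out of your source code, you can read it from the environment. This is a sketch under assumptions: the variable names `HH_API_KEY` and `HH_SERVER_URL` and the helper `honeyhive_client_kwargs` are hypothetical choices for illustration. The returned dictionary uses the same `bearer_auth` and `server_url` arguments shown above.

```python
import os
from typing import Dict


def honeyhive_client_kwargs() -> Dict[str, str]:
    """Build keyword arguments for honeyhive.HoneyHive from environment variables.

    HH_API_KEY and HH_SERVER_URL are hypothetical names chosen for this sketch.
    """
    api_key = os.environ.get("HH_API_KEY")
    if not api_key:
        raise RuntimeError("Set HH_API_KEY before creating the HoneyHive client")
    kwargs = {"bearer_auth": api_key}
    server_url = os.environ.get("HH_SERVER_URL")
    if server_url:  # only needed for self-hosted or dedicated deployments
        kwargs["server_url"] = server_url
    return kwargs


# Usage (requires the honeyhive package):
# hhai = honeyhive.HoneyHive(**honeyhive_client_kwargs())
```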

Create the HoneyHive dataset

Give your new dataset a name and pass the name of the project you want to associate it with. Keep the returned dataset_id handy; you need it to add datapoints in the next step.

eval_dataset = hhai.datasets.create_dataset(request=components.CreateDatasetRequest(
  project='YOUR_PROJECT_NAME',
  name='DATASET_NAME',
))

dataset_id = eval_dataset.object.result.inserted_id
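If you upload from several scripts, one way to keep the dataset_id handy is to store it in a small local file. This is a minimal sketch. The file name `honeyhive_dataset.json` and the helpers `save_dataset_id` and `load_dataset_id` are hypothetical and not part of the HoneyHive SDK.

```python
import json
from pathlib import Path

# Hypothetical local file used to remember the ID between scripts.
ID_FILE = Path("honeyhive_dataset.json")


def save_dataset_id(dataset_id: str, path: Path = ID_FILE) -> None:
    """Write the dataset_id to a small JSON file."""
    path.write_text(json.dumps({"dataset_id": dataset_id}))


def load_dataset_id(path: Path = ID_FILE) -> str:
    """Read back a dataset_id previously written by save_dataset_id."""
    return json.loads(path.read_text())["dataset_id"]


# Usage right after create_dataset:
# save_dataset_id(dataset_id)
# ...and later, in another script:
# dataset_id = load_dataset_id()
```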

Pass your data in batches with a mapping

Now, using the dataset_id, pass your data as a list of dictionaries and provide a mapping for its fields. HoneyHive creates a unique datapoint for each entry in the list. The IDs of these datapoints are later used to join traces to their datapoints in experiment runs.

Any field not listed in the mapping is stored in the datapoint's metadata.
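To make the mapping rule concrete, the sketch below groups one row's fields the way the rule describes. It is a local illustration only. The actual grouping happens inside HoneyHive, and the exact stored shape may differ. The function `split_by_mapping` and the example field values are hypothetical.

```python
from typing import Any, Dict, List


def split_by_mapping(
    row: Dict[str, Any],
    inputs: List[str],
    ground_truth: List[str],
    history: List[str],
) -> Dict[str, Dict[str, Any]]:
    """Group a row's fields by mapping (local illustration, not HoneyHive code).

    Fields named in the mapping go to their group; every other field goes to
    metadata, mirroring the documented rule.
    """
    groups = {"inputs": inputs, "ground_truth": ground_truth, "history": history}
    mapped = {name for names in groups.values() for name in names}
    result = {
        group: {k: row[k] for k in names if k in row}
        for group, names in groups.items()
    }
    result["metadata"] = {k: v for k, v in row.items() if k not in mapped}
    return result


row = {"review": "Great app!", "star": 5, "package_name": "com.example"}
print(split_by_mapping(row, inputs=["review"], ground_truth=[], history=[]))
# {'inputs': {'review': 'Great app!'}, 'ground_truth': {}, 'history': {}, 'metadata': {'star': 5, 'package_name': 'com.example'}}
```

With only `review` mapped as an input, the remaining fields end up in metadata, which matches the mapping used in the upload loop.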

BATCH_SIZE = 100  # recommended number of rows per request

dataset = load_dataset("lhoestq/demo1")
dataset = list(dataset['train'])  # turn the train split into a list of dictionaries
datapoint_ids = []

for i in range(0, len(dataset), BATCH_SIZE):
    dataset_request = operations.AddDatapointsRequestBody(
        project='YOUR_PROJECT_NAME',
        data=dataset[i:i + BATCH_SIZE],  # list of dictionaries for this batch
        mapping=operations.Mapping(
            inputs=['review'],  # fields to use as inputs
            ground_truth=[],    # fields to use as ground truth (none here)
            history=[],         # fields to use as history (none here)
        ),
    )

    datapoints = hhai.datasets.add_datapoints(
        dataset_id=dataset_id,  # dataset_id from the previous step
        request_body=dataset_request,
    )

    # extend (not append) so datapoint_ids stays a flat list of IDs
    datapoint_ids.extend(datapoints.object.datapoint_ids)
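Hugging Face is only one possible source: any tool that can produce a list of dictionaries works with the same loop. As a sketch, the hypothetical helper below reads CSV text with Python's standard `csv` module. You would pass its result in place of `dataset` in the upload loop.

```python
import csv
import io
from typing import Any, Dict, List


def rows_from_csv(text: str) -> List[Dict[str, Any]]:
    """Parse CSV text into the list-of-dictionaries shape the upload loop expects."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = "review,star\nGreat app!,5\nKeeps crashing,1\n"
rows = rows_from_csv(csv_text)
print(rows)
# [{'review': 'Great app!', 'star': '5'}, {'review': 'Keeps crashing', 'star': '1'}]
```

Note that `csv.DictReader` returns every value as a string, so convert numeric columns yourself if their type matters.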

You have successfully uploaded your Hugging Face dataset to HoneyHive using the SDK. You can now view your dataset in the HoneyHive UI.

Next steps

Running experiments

Learn how to run experiments on your dataset.
