Upload datasets to HoneyHive through the web UI or programmatically via the SDK. If your dataset is managed outside HoneyHive (S3, Google Sheets, internal tools) and you want to keep it synced over time, see Sync datasets from external sources.

Upload via UI

HoneyHive supports JSON, JSONL, and CSV file uploads.

Supported Formats

[
    {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
    {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."}
]
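
The records above are shown in JSON form. As a minimal sketch, the same two records can be written as JSONL (one JSON object per line) and CSV (a header row followed by one row per record) with the Python standard library. The helper names `to_jsonl` and `to_csv` are hypothetical, and using the JSON keys as CSV column names is an assumption, not a documented HoneyHive requirement.

```python
import csv
import io
import json

records = [
    {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
    {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."},
]

def to_jsonl(rows):
    """Serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"

def to_csv(rows):
    """Serialize records as CSV, using the first record's keys as the header (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

jsonl_text = to_jsonl(records)
csv_text = to_csv(records)
```

Writing `jsonl_text` or `csv_text` to a file with the matching extension gives you a file to try in the upload flow described under Steps.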

Steps

1. Navigate to Datasets

Go to your project in HoneyHive and click Datasets in the sidebar.

2. Create new dataset

Click New Dataset and give it a name.

3. Upload your file

Click Upload File and select your file. Map your fields to input, ground truth, or metadata categories.

Upload via SDK

Use the SDK to programmatically create datasets with custom field mappings.

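Prerequisites

The example below assumes the honeyhive Python package is installed and that your API key is available in the HH_API_KEY environment variable.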

Create Dataset and Add Datapoints

import os
from honeyhive import HoneyHive
from honeyhive.models import CreateDatasetRequest, CreateDatapointRequest

client = HoneyHive(api_key=os.environ["HH_API_KEY"])

# Step 1: Create datapoints
datapoints_data = [
    {"inputs": {"question": "How do I make tables?"}, "ground_truth": {"answer": "Use the Table component"}},
    {"inputs": {"question": "How do I make modals?"}, "ground_truth": {"answer": "Use the Modal component"}},
    {"inputs": {"question": "How do I make forms?"}, "ground_truth": {"answer": "Use the Form component"}},
]

datapoint_ids = []
for dp in datapoints_data:
    response = client.datapoints.create(CreateDatapointRequest(
        inputs=dp["inputs"],
        ground_truth=dp.get("ground_truth"),
    ))
    datapoint_ids.append(response.result["insertedId"])

# Step 2: Create dataset with those datapoints
dataset = client.datasets.create(CreateDatasetRequest(
    name="My Q&A Dataset",
    description="Questions and answers for evaluation",
    datapoints=datapoint_ids,
))

print(f"Created dataset: {dataset.result['insertedId']}")

Field Structure

Each datapoint can have these fields:

| Field | Type | Description |
| --- | --- | --- |
| inputs | Dict | Data fed into your function during evaluation |
| ground_truth | Dict | Expected output for evaluators to compare against |
| history | List[Dict] | Chat history for conversational use cases |
| metadata | Dict | Additional context (not used in evaluation) |

Fields in inputs and ground_truth are available to your function and evaluators. Everything else is stored as metadata.
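
To illustrate the split described above, here is a minimal sketch that turns a flat record into the `inputs` / `ground_truth` / `metadata` shape. The helper `split_record` and the sample field names (`question`, `answer`, `source`) are hypothetical and are not part of the HoneyHive SDK. The sketch only shows the mapping idea: fields you name go to `inputs` or `ground_truth`, and everything else falls through to `metadata`.

```python
def split_record(record, input_fields, ground_truth_fields):
    """Split a flat record into inputs, ground_truth, and metadata dicts.

    Hypothetical helper: any field not listed as an input or ground-truth
    field is placed under metadata.
    """
    inputs = {k: v for k, v in record.items() if k in input_fields}
    ground_truth = {k: v for k, v in record.items() if k in ground_truth_fields}
    metadata = {
        k: v
        for k, v in record.items()
        if k not in input_fields and k not in ground_truth_fields
    }
    return {"inputs": inputs, "ground_truth": ground_truth, "metadata": metadata}

# Illustrative flat row, e.g. one line of a CSV export
row = {
    "question": "How do I make tables?",
    "answer": "Use the Table component",
    "source": "faq",
}

datapoint = split_record(row, input_fields={"question"}, ground_truth_fields={"answer"})
```

The resulting dictionaries have the same shape as the entries in `datapoints_data` in the SDK example, with the extra `source` field kept under `metadata`.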

Next Steps