HoneyHive Docs

Upload datasets to HoneyHive through the web UI or programmatically via the SDK. If your dataset is managed outside HoneyHive (S3, Google Sheets, internal tools) and you want to keep it synced over time, see Sync datasets from external sources.

Upload via UI

HoneyHive supports JSON, JSONL, and CSV file uploads.

Supported Formats

[
    {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
    {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."}
]
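
The example above shows the JSON form. Because JSONL and CSV uploads are also supported, the same records can be written in those formats with Python's standard library. This is a minimal sketch that assumes the usual layouts (one JSON object per line for JSONL, a header row of field names for CSV). This page does not specify the exact CSV dialect HoneyHive expects, and the helper names `to_jsonl` and `to_csv` are illustrative only.

```python
import csv
import io
import json

records = [
    {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
    {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."},
]

def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"

def to_csv(rows):
    """Serialize rows as CSV, taking the header row from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

jsonl_text = to_jsonl(records)
csv_text = to_csv(records)
```

Write either string to a file (for example `dataset.jsonl` or `dataset.csv`) and select that file in the upload dialog.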

Steps

Navigate to Datasets

Go to your project in HoneyHive and click Datasets in the sidebar.

Create new dataset

Click New Dataset and give it a name.

Upload your file

Click Upload File and select your file. Map your fields to input, ground truth, or metadata categories.

Upload via SDK

Use the SDK to programmatically create datasets and add datapoints with field mappings.

Prerequisites

HoneyHive API key
An existing project

Create Dataset and Add Datapoints

import os
from honeyhive import HoneyHive
from honeyhive.models import (
    CreateDatasetRequest,
    AddDatapointsToDatasetRequest,
    DatapointMapping,
)

client = HoneyHive(api_key=os.environ["HH_API_KEY"])

# Step 1: Create an empty dataset
dataset = client.datasets.create(CreateDatasetRequest(
    name="My Q&A Dataset",
    description="Questions and answers for evaluation",
))
dataset_id = dataset.result.insertedId

# Step 2: Add datapoints with field mapping
response = client.datasets.add_datapoints(
    dataset_id,
    AddDatapointsToDatasetRequest(
        data=[
            {"question": "How do I make tables?", "answer": "Use the Table component"},
            {"question": "How do I make modals?", "answer": "Use the Modal component"},
            {"question": "How do I make forms?", "answer": "Use the Form component"},
        ],
        mapping=DatapointMapping(
            inputs=["question"],
            ground_truth=["answer"],
        )
    )
)

print(f"Created dataset {dataset_id} with {len(response.datapoint_ids)} datapoints")
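
For larger files, it can help to load the rows from disk and send them in several `add_datapoints` calls rather than one. The sketch below covers only the local part: reading a CSV into row dicts and splitting them into batches. The helper names `load_rows` and `chunked` and the batch size of 2 are illustrative assumptions. This page does not state a per-request limit.

```python
import csv
import io

def load_rows(csv_file):
    """Read a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))

def chunked(rows, size):
    """Yield successive slices of `rows` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Demo with an in-memory CSV; in practice, pass open("data.csv", newline="").
sample = io.StringIO("question,answer\nQ1,A1\nQ2,A2\nQ3,A3\n")
rows = load_rows(sample)
batches = list(chunked(rows, size=2))

# Each batch would then be sent with the call shown above, for example:
# client.datasets.add_datapoints(
#     dataset_id,
#     AddDatapointsToDatasetRequest(data=batch, mapping=mapping),
# )
```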

Field Mapping

DatapointMapping controls how your raw data fields are categorized:

Mapping field	Type	Description
`inputs`	List[str]	Field names mapped as inputs to your function during evaluation
`ground_truth`	List[str]	Field names mapped as expected outputs for evaluators
`history`	List[str]	Field names mapped as chat history for conversational use cases

All mapping fields are optional and default to None. Any data fields not listed in the mapping are automatically stored as metadata.

# Example: multiple input fields
mapping = DatapointMapping(
    inputs=["context", "question"],   # Both become part of inputs
    ground_truth=["answer"],          # Mapped to ground_truth
)

# Row:
# {"context": "...", "question": "...", "answer": "...", "source": "wiki"}
#
# Becomes datapoint:
# {
#   "inputs":       {"context": "...", "question": "..."},
#   "ground_truth": {"answer": "..."},
#   "metadata":     {"source": "wiki"},
# }

Mapping keys must match keys in your data rows exactly (case-sensitive).
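
Because matching is case-sensitive, a quick local check before uploading can catch rows whose keys differ from the mapping. The helper below is a hypothetical pre-flight check, not part of the HoneyHive SDK. It takes the mapping as a plain dict of category name to field names.

```python
def missing_mapping_keys(rows, mapping):
    """Return {row_index: [missing keys]} for rows that lack a mapped field."""
    # Flatten all mapped field names, skipping categories left as None.
    wanted = [key for keys in mapping.values() if keys for key in keys]
    problems = {}
    for index, row in enumerate(rows):
        missing = [key for key in wanted if key not in row]
        if missing:
            problems[index] = missing
    return problems

rows = [
    {"question": "How do I make tables?", "answer": "Use the Table component"},
    {"Question": "How do I make modals?", "answer": "Use the Modal component"},
]
mapping = {"inputs": ["question"], "ground_truth": ["answer"], "history": None}

# The second row uses "Question" (capital Q), so "question" is reported missing.
problems = missing_mapping_keys(rows, mapping)
```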

Fields in inputs and ground_truth are available to your function and evaluators. Everything else is stored as metadata.
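
To make the categorization concrete, here is a local illustration of the row-to-datapoint transformation described above. It is a sketch of the documented behavior, not HoneyHive's implementation, and `apply_mapping` is a hypothetical name. The `history` category is left out because this page does not show its resulting shape.

```python
def apply_mapping(row, inputs=None, ground_truth=None):
    """Split a flat row into inputs, ground_truth, and leftover metadata."""
    mapped = {
        "inputs": inputs or [],
        "ground_truth": ground_truth or [],
    }
    # Every field named in the mapping is "claimed"; the rest become metadata.
    claimed = {key for keys in mapped.values() for key in keys}
    datapoint = {
        category: {key: row[key] for key in keys if key in row}
        for category, keys in mapped.items()
    }
    datapoint["metadata"] = {k: v for k, v in row.items() if k not in claimed}
    return datapoint

row = {"context": "...", "question": "...", "answer": "...", "source": "wiki"}
datapoint = apply_mapping(
    row,
    inputs=["context", "question"],
    ground_truth=["answer"],
)
```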

Manage Datasets via SDK

After creating a dataset, use the SDK to find, extend, and prune it programmatically.

Find a dataset by name

import os
from honeyhive import HoneyHive

client = HoneyHive(api_key=os.environ["HH_API_KEY"])

# Filter by exact name
datasets = client.datasets.list(name="My Q&A Dataset")
dataset_id = datasets.datasets[0].id  # assumes at least one dataset matched
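
Indexing the first result fails when no dataset has that name, so a small wrapper with an explicit error can be useful. The sketch below relies only on the `client.datasets.list(name=...)` call and the `.datasets[0].id` access shown above. `find_dataset_id` is a hypothetical helper, and the stand-in client and the id `ds_123` are placeholders so the example runs without credentials.

```python
from types import SimpleNamespace

def find_dataset_id(client, name):
    """Return the id of the first dataset matching `name`, or raise LookupError."""
    result = client.datasets.list(name=name)
    if not result.datasets:
        raise LookupError(f"No dataset named {name!r}")
    return result.datasets[0].id

# Stand-in client for demonstration only; it is not part of the HoneyHive SDK.
def _fake_list(name):
    known = {"My Q&A Dataset": "ds_123"}
    matches = [SimpleNamespace(id=known[name])] if name in known else []
    return SimpleNamespace(datasets=matches)

fake_client = SimpleNamespace(datasets=SimpleNamespace(list=_fake_list))
found_id = find_dataset_id(fake_client, "My Q&A Dataset")
```

With a real client, pass the `HoneyHive` instance created above in place of `fake_client`.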

Add datapoints to an existing dataset

from honeyhive.models import AddDatapointsToDatasetRequest, DatapointMapping

client.datasets.add_datapoints(
    dataset_id=dataset_id,
    request=AddDatapointsToDatasetRequest(
        data=[
            # Rows are flat; the mapping below assigns each field to a category
            {"question": "How do I add charts?", "answer": "Use the Chart component"},
        ],
        mapping=DatapointMapping(
            inputs=["question"],
            ground_truth=["answer"],
        ),
    ),
)

Remove a datapoint

client.datasets.remove_datapoint(
    dataset_id=dataset_id,
    datapoint_id="<datapoint-id>",
)

All three methods have async variants: list_async(), add_datapoints_async(), and remove_datapoint_async().
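
As an illustration of where the async variants help, the sketch below looks up several datasets concurrently with `asyncio.gather`. It assumes that `list_async()` accepts the same `name` argument as `list()` and returns the same result shape, which this page does not confirm. `find_dataset_ids` is a hypothetical helper, and the stand-in client and the id `ds_123` are placeholders.

```python
import asyncio
from types import SimpleNamespace

async def find_dataset_ids(client, names):
    """Look up several datasets by name concurrently; None when no match."""
    results = await asyncio.gather(
        *(client.datasets.list_async(name=name) for name in names)
    )
    return {
        name: (result.datasets[0].id if result.datasets else None)
        for name, result in zip(names, results)
    }

# Stand-in client for demonstration only; it is not part of the HoneyHive SDK.
async def _fake_list_async(name):
    await asyncio.sleep(0)  # yield control, as a real network call would
    known = {"My Q&A Dataset": "ds_123"}
    matches = [SimpleNamespace(id=known[name])] if name in known else []
    return SimpleNamespace(datasets=matches)

fake_client = SimpleNamespace(datasets=SimpleNamespace(list_async=_fake_list_async))
ids = asyncio.run(find_dataset_ids(fake_client, ["My Q&A Dataset", "Unknown"]))
```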

Next Steps

Run Experiments

Use your dataset to evaluate your AI application

Curate from Traces

Build datasets from production logs

Sync from External Sources

Keep a dataset synced from S3 or Google Sheets
