
# Upload Datasets

> Upload datasets to HoneyHive through the web UI or Python SDK. Import JSON, JSONL, and CSV files with inputs, ground truth, and metadata for evals.

Upload datasets to HoneyHive through the web UI or programmatically via the SDK.

If your dataset is managed outside HoneyHive (S3, Google Sheets, internal tools) and you want to keep it synced over time, see [Sync datasets from external sources](/v2/datasets/sync).

## Upload via UI

HoneyHive supports `JSON`, `JSONL`, and `CSV` file uploads.

### Supported Formats

<CodeGroup>
  ```json JSON theme={null}
  [
      {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."},
      {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."}
  ]
  ```

  ```json JSONL theme={null}
  {"user_query": "What's the history of AI?", "response": "The history of AI is a long one."}
  {"user_query": "What is AI?", "response": "AI is the simulation of human intelligence in machines."}
  ```

  ```csv CSV theme={null}
  user_query,response
  What's the history of AI?,The history of AI is a long one.
  What is AI?,AI is the simulation of human intelligence in machines.
  ```
</CodeGroup>
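Before uploading, you may want to check that your file parses into a list of flat rows. The sketch below uses only the Python standard library to load any of the three formats into a list of dictionaries. `load_rows` is a hypothetical helper for local validation and is not part of the HoneyHive SDK.

```python theme={null}
import csv
import io
import json


def load_rows(text: str, fmt: str) -> list[dict]:
    """Parse JSON, JSONL, or CSV text into a list of row dictionaries.

    Hypothetical local helper; HoneyHive does its own parsing on upload.
    """
    fmt = fmt.lower()
    if fmt == "json":
        # A single top-level array of objects
        rows = json.loads(text)
    elif fmt == "jsonl":
        # One object per line; blank lines are skipped
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    elif fmt == "csv":
        # The header row supplies the keys; every value is read as a string
        rows = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"Unsupported format: {fmt!r}")
    if not isinstance(rows, list) or not all(isinstance(row, dict) for row in rows):
        raise ValueError("Expected a list of objects")
    return rows


csv_text = "user_query,response\nWhat is AI?,AI is the simulation of human intelligence in machines.\n"
print(load_rows(csv_text, "csv"))
```

To check a file on disk, pass its contents, for example `load_rows(open("data.jsonl").read(), "jsonl")`. In CSV files, wrap any value that contains a comma in double quotes so it stays in one column.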

### Steps

<Steps>
  <Step title="Navigate to Datasets">
    Go to your project in HoneyHive and click **Datasets** in the sidebar.
  </Step>

  <Step title="Create new dataset">
    Click **New Dataset** and give it a name.
  </Step>

  <Step title="Upload your file">
    Click **Upload File** and select your file. Map your fields to input, ground truth, or metadata categories.
  </Step>
</Steps>

***

## Upload via SDK

Use the SDK to programmatically create datasets and add datapoints with field mappings.

### Prerequisites

* [HoneyHive API key](/v2/introduction/tracing-quickstart)
* An existing [project](/v2/workspace/projects)

### Create Dataset and Add Datapoints

```python theme={null}
import os
from honeyhive import HoneyHive
from honeyhive.models import (
    CreateDatasetRequest,
    AddDatapointsToDatasetRequest,
    DatapointMapping,
)

client = HoneyHive(api_key=os.environ["HH_API_KEY"])

# Step 1: Create an empty dataset
dataset = client.datasets.create(CreateDatasetRequest(
    name="My Q&A Dataset",
    description="Questions and answers for evaluation",
))
dataset_id = dataset.result.insertedId

# Step 2: Add datapoints with field mapping
response = client.datasets.add_datapoints(
    dataset_id,
    AddDatapointsToDatasetRequest(
        data=[
            {"question": "How do I make tables?", "answer": "Use the Table component"},
            {"question": "How do I make modals?", "answer": "Use the Modal component"},
            {"question": "How do I make forms?", "answer": "Use the Form component"},
        ],
        mapping=DatapointMapping(
            inputs=["question"],
            ground_truth=["answer"],
        )
    )
)

print(f"Created dataset {dataset_id} with {len(response.datapoint_ids)} datapoints")
```

### Field Mapping

`DatapointMapping` controls how your raw data fields are categorized:

| Mapping field  | Type       | Description                                                     |
| -------------- | ---------- | --------------------------------------------------------------- |
| `inputs`       | List\[str] | Field names mapped as inputs to your function during evaluation |
| `ground_truth` | List\[str] | Field names mapped as expected outputs for evaluators           |
| `history`      | List\[str] | Field names mapped as chat history for conversational use cases |

All mapping fields are optional and default to `None`. Any data fields not listed in the mapping are automatically stored as `metadata`.

```python theme={null}
# Example: multiple input fields
mapping = DatapointMapping(
    inputs=["context", "question"],   # Both become part of inputs
    ground_truth=["answer"],          # Mapped to ground_truth
)

# Row:
# {"context": "...", "question": "...", "answer": "...", "source": "wiki"}
#
# Becomes datapoint:
# {
#   "inputs":       {"context": "...", "question": "..."},
#   "ground_truth": {"answer": "..."},
#   "metadata":     {"source": "wiki"},
# }
```
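The categorization in the example above can be mimicked locally to preview what a row will become. This is a sketch of the documented behavior only, not the SDK's implementation: the real split happens in HoneyHive when you call `add_datapoints`. `apply_mapping` is a hypothetical name, and `history` is left out because this page does not show how history fields are stored.

```python theme={null}
def apply_mapping(row: dict, inputs: list[str], ground_truth: list[str]) -> dict:
    """Preview how one raw row splits into inputs, ground_truth, and metadata.

    Hypothetical local illustration of the documented rule; it raises
    KeyError if a mapped field name is missing from the row.
    """
    datapoint = {
        "inputs": {key: row[key] for key in inputs},
        "ground_truth": {key: row[key] for key in ground_truth},
    }
    mapped = set(inputs) | set(ground_truth)
    # Every field that no mapping claims falls through to metadata
    datapoint["metadata"] = {k: v for k, v in row.items() if k not in mapped}
    return datapoint


row = {"context": "Paris is in France.", "question": "Where is Paris?", "answer": "France", "source": "wiki"}
print(apply_mapping(row, inputs=["context", "question"], ground_truth=["answer"]))
```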

Mapping keys must match keys in your `data` rows exactly (case-sensitive).
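Because matching is exact, a single capitalization difference can leave a field unmapped. A minimal pre-upload check, assuming you keep the mapping as a plain dict of category to field names (the same lists you would pass to `DatapointMapping`), could report which mapped keys are absent from which rows. `find_unmatched_mapping_keys` is a hypothetical helper, not an SDK function.

```python theme={null}
def find_unmatched_mapping_keys(rows: list[dict], mapping: dict) -> dict[str, list[int]]:
    """Return {field_name: [row indices missing it]} for mapped fields absent from rows.

    Matching is exact and case-sensitive. Categories set to None are skipped.
    Hypothetical local helper.
    """
    problems: dict[str, list[int]] = {}
    for field_names in mapping.values():
        for key in field_names or []:
            missing = [i for i, row in enumerate(rows) if key not in row]
            if missing:
                problems[key] = missing
    return problems


rows = [{"Question": "How do I make tables?", "answer": "Use the Table component"}]
print(find_unmatched_mapping_keys(rows, {"inputs": ["question"], "ground_truth": ["answer"]}))
# {'question': [0]}
```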

<Note>
  Fields mapped to `inputs` and `ground_truth` are available to your function and evaluators. Any field not listed in the mapping is stored as metadata.
</Note>

***

## Manage Datasets via SDK

After creating a dataset, use the SDK to find, extend, and prune it programmatically.

### Find a dataset by name

```python theme={null}
import os
from honeyhive import HoneyHive

client = HoneyHive(api_key=os.environ["HH_API_KEY"])

# Filter by exact name
datasets = client.datasets.list(name="My Q&A Dataset")
if not datasets.datasets:
    raise ValueError("No dataset named 'My Q&A Dataset' found")
dataset_id = datasets.datasets[0].id
```

### Add datapoints to an existing dataset

```python theme={null}
from honeyhive.models import AddDatapointsToDatasetRequest, DatapointMapping

client.datasets.add_datapoints(
    dataset_id=dataset_id,
    request=AddDatapointsToDatasetRequest(
        # Rows are flat; the mapping assigns each key to a category
        data=[
            {"question": "How do I add charts?", "answer": "Use the Chart component"},
        ],
        mapping=DatapointMapping(
            inputs=["question"],
            ground_truth=["answer"],
        ),
    ),
)
```
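Mapping keys refer to top-level keys in each row, so rows that are already nested as `{"inputs": {...}, "ground_truth": {...}}` (for example, exported from another tool) need to be flattened first. The sketch below is a hypothetical helper, not part of the SDK. It returns the flat rows plus the field names for each category, which you could then pass as `DatapointMapping(**mapping)`.

```python theme={null}
NESTED_CATEGORIES = ("inputs", "ground_truth")


def flatten_nested_rows(rows: list[dict]) -> tuple[list[dict], dict[str, list[str]]]:
    """Flatten pre-nested rows so their keys can be named in a mapping.

    Hypothetical helper. Field names are collected per category in the
    order first seen; duplicate field names within a row raise ValueError.
    """
    flat_rows: list[dict] = []
    mapping: dict[str, list[str]] = {category: [] for category in NESTED_CATEGORIES}
    for row in rows:
        flat: dict = {}
        for category in NESTED_CATEGORIES:
            for key, value in row.get(category, {}).items():
                if key in flat:
                    raise ValueError(f"Field {key!r} appears more than once in a row")
                flat[key] = value
                if key not in mapping[category]:
                    mapping[category].append(key)
        # Other top-level fields are copied unchanged; left unmapped, they become metadata
        for key, value in row.items():
            if key in NESTED_CATEGORIES:
                continue
            if key in flat:
                raise ValueError(f"Field {key!r} appears more than once in a row")
            flat[key] = value
        flat_rows.append(flat)
    return flat_rows, mapping


nested = [{"inputs": {"question": "How do I add charts?"}, "ground_truth": {"answer": "Use the Chart component"}}]
flat_rows, mapping = flatten_nested_rows(nested)
print(flat_rows, mapping)
```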

### Remove a datapoint

```python theme={null}
client.datasets.remove_datapoint(
    dataset_id=dataset_id,
    datapoint_id="<datapoint-id>",
)
```

<Note>
  All three methods have async variants: `list_async()`, `add_datapoints_async()`, and `remove_datapoint_async()`.
</Note>
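The async variants are useful for batch operations such as pruning many datapoints at once. A minimal sketch, assuming `remove_datapoint_async` accepts the same keyword arguments as `remove_datapoint` (verify this against your SDK version); `remove_datapoints_concurrently` is a hypothetical helper name:

```python theme={null}
import asyncio


async def remove_datapoints_concurrently(client, dataset_id: str, datapoint_ids: list[str]) -> list:
    """Issue one remove_datapoint_async call per ID and await them together.

    Assumes remove_datapoint_async mirrors remove_datapoint's keyword
    arguments. Results are returned in the same order as datapoint_ids.
    """
    return await asyncio.gather(
        *(
            client.datasets.remove_datapoint_async(dataset_id=dataset_id, datapoint_id=dp_id)
            for dp_id in datapoint_ids
        )
    )


# With a real client:
# asyncio.run(remove_datapoints_concurrently(client, dataset_id, ["<datapoint-id>"]))
```

`asyncio.gather` runs the calls concurrently and raises the first error it encounters; for very large batches you may want to limit concurrency.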

***

## Next Steps

<CardGroup cols={3}>
  <Card title="Run with HoneyHive Datasets" icon="database" href="/v2/datasets/run-experiments">
    Pass `dataset_id` to `evaluate()` to run against HoneyHive datasets
  </Card>

  <Card title="Curate from Traces" icon="filter" href="/v2/datasets/dataset-curation">
    Build datasets from production logs
  </Card>

  <Card title="Sync from External Sources" icon="arrows-rotate" href="/v2/datasets/sync">
    Keep a dataset synced from S3 or Google Sheets
  </Card>
</CardGroup>