
# Import from Hugging Face

> Import datasets from Hugging Face Datasets into HoneyHive for experiments and evaluation. Load HF splits via the SDK and map columns to datapoints.

HoneyHive supports flexible dataset schemas, making it easy to import datasets from Hugging Face or any other data source.

## Prerequisites

* [HoneyHive API key](/v2/introduction/tracing-quickstart)
* An existing [project](/v2/workspace/projects)
* `datasets` library installed: `pip install datasets`

***

## Import a Dataset

<Steps>
  <Step title="Install dependencies">
    ```bash theme={null}
    pip install honeyhive datasets
    ```
  </Step>

  <Step title="Load and import">
    ```python Python theme={null}
    import os
    from datasets import load_dataset
    from honeyhive import HoneyHive
    from honeyhive.models import (
        CreateDatasetRequest,
        AddDatapointsToDatasetRequest,
        DatapointMapping,
    )

    client = HoneyHive(api_key=os.environ["HH_API_KEY"])

    # Load the first 100 rows of a Hugging Face dataset
    hf_dataset = load_dataset("lhoestq/demo1", split="train[:100]")

    # Step 1: Create an empty dataset
    dataset = client.datasets.create(CreateDatasetRequest(
        name="HuggingFace Demo Dataset",
        description="Imported from lhoestq/demo1",
    ))
    dataset_id = dataset.result.insertedId

    # Step 2: Add datapoints in batches with field mapping
    batch_size = 100
    total = len(hf_dataset)

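    for i in range(0, total, batch_size):
        # Slicing a Hugging Face dataset returns a dict of column lists,
        # so zip the columns back into one dict per row
        batch = hf_dataset[i:i + batch_size]
        rows = [
            {"review": review, "star": star}
            for review, star in zip(batch["review"], batch["star"])
        ]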

        client.datasets.add_datapoints(
            dataset_id,
            AddDatapointsToDatasetRequest(
                data=rows,
                mapping=DatapointMapping(
                    inputs=["review"],
                    ground_truth=["star"],
                )
            )
        )
        print(f"Imported {min(i + batch_size, total)}/{total} datapoints")

    print(f"Created dataset with {total} datapoints")
    ```
  </Step>
</Steps>

***

## Field Mapping

Use `DatapointMapping` to map Hugging Face dataset columns to HoneyHive datapoint fields:

| Hugging Face         | DatapointMapping field | Use For                         |
| -------------------- | ---------------------- | ------------------------------- |
| Input columns        | `inputs`               | Data fed to your function       |
| Label/answer columns | `ground_truth`         | Expected outputs for evaluation |
| Chat history columns | `history`              | Conversational context          |

Any columns not listed in the mapping are automatically stored as `metadata`.
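
To make the mapping rule concrete, the sketch below groups a flat row the way the table describes: listed columns go to `inputs` or `ground_truth`, and everything else falls through to `metadata`. It is a local illustration only, not HoneyHive's implementation; `split_row` is a hypothetical helper, and it leaves `history` out for brevity.

```python
from typing import Any, Dict, Sequence


def split_row(
    row: Dict[str, Any],
    inputs: Sequence[str],
    ground_truth: Sequence[str],
) -> Dict[str, Dict[str, Any]]:
    """Group a flat row by mapping (hypothetical helper, for illustration)."""
    mapped = set(inputs) | set(ground_truth)
    return {
        "inputs": {name: row[name] for name in inputs},
        "ground_truth": {name: row[name] for name in ground_truth},
        # Columns that are not named in the mapping end up as metadata
        "metadata": {k: v for k, v in row.items() if k not in mapped},
    }


row = {"question": "Q", "context": "C", "answer": "A", "source": "squad"}
preview = split_row(row, inputs=["question", "context"], ground_truth=["answer"])
print(preview["metadata"])
```

Previewing a few rows this way before calling `add_datapoints` can help confirm that no column lands in `metadata` by accident.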

### Example: Q\&A Dataset

```python Python theme={null}
# For a Q&A dataset with "question", "context", and "answers" columns.
# Assumes the imports and `client` from the main example above.
hf_dataset = load_dataset("squad", split="train[:100]")

# Step 1: Create an empty dataset
dataset = client.datasets.create(CreateDatasetRequest(
    name="SQuAD Q&A Dataset",
    description="Imported from squad",
))

# Step 2: Flatten answers (keep the first reference answer) and add datapoints
rows = [
    {
        "question": row["question"],
        "context": row["context"],
        "answer": row["answers"]["text"][0],
        "source": "squad",
    }
    for row in hf_dataset
]

client.datasets.add_datapoints(
    dataset.result.insertedId,
    AddDatapointsToDatasetRequest(
        data=rows,
        mapping=DatapointMapping(
            inputs=["question", "context"],
            ground_truth=["answer"],
        )
    )
)
# "source" is automatically stored as metadata
```
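
Indexing `row["answers"]["text"][0]` assumes every row has at least one answer. That holds for `squad`, but some Q&A datasets (SQuAD v2, for example) include unanswerable questions with an empty answer list. The sketch below is one way to guard against that; `first_answer` and `flatten_qa_rows` are hypothetical helpers, and skipping unanswerable rows is an assumption you may want to change.

```python
from typing import Any, Dict, Iterable, List, Optional


def first_answer(answers: Dict[str, Any]) -> Optional[str]:
    """Return the first answer text, or None when the list is empty."""
    texts = answers.get("text") or []
    return texts[0] if texts else None


def flatten_qa_rows(examples: Iterable[Dict[str, Any]], source: str) -> List[Dict[str, Any]]:
    """Flatten SQuAD-style examples, skipping rows without an answer."""
    rows = []
    for example in examples:
        answer = first_answer(example["answers"])
        if answer is None:
            continue  # Assumption: unanswerable questions are dropped
        rows.append({
            "question": example["question"],
            "context": example["context"],
            "answer": answer,
            "source": source,
        })
    return rows


# Hand-written sample rows standing in for a Hugging Face dataset
sample = [
    {"question": "Q1", "context": "C1", "answers": {"text": ["A1", "A1b"], "answer_start": [0, 3]}},
    {"question": "Q2", "context": "C2", "answers": {"text": [], "answer_start": []}},
]
qa_rows = flatten_qa_rows(sample, source="squad")
print(len(qa_rows))
```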

### Example: Classification Dataset

```python Python theme={null}
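# For a classification dataset with "text" and "label" columns.
# Assumes the imports and `client` from the main example above.
# The IMDB splits are grouped by label, so a plain "test[:100]" slice would
# likely contain a single class. Shuffle before sampling to get both labels.
hf_dataset = load_dataset("imdb", split="test").shuffle(seed=42).select(range(100))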

# Step 1: Create an empty dataset
dataset = client.datasets.create(CreateDatasetRequest(
    name="IMDB Classification Dataset",
    description="Imported from imdb",
))

# Step 2: Add datapoints
rows = [
    {
        "text": row["text"],
        "label": "positive" if row["label"] == 1 else "negative",
    }
    for row in hf_dataset
]

client.datasets.add_datapoints(
    dataset.result.insertedId,
    AddDatapointsToDatasetRequest(
        data=rows,
        mapping=DatapointMapping(
            inputs=["text"],
            ground_truth=["label"],
        )
    )
)
```
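
The conditional expression above only covers two classes. For datasets with more labels, a lookup list generalizes the same idea. This is a minimal sketch with a hypothetical helper, `to_labeled_rows`, and hand-written sample data. With a real dataset, the names can usually be read from `hf_dataset.features["label"].names` when the column is a `ClassLabel` feature.

```python
from typing import Any, Dict, Iterable, List, Sequence


def to_labeled_rows(
    examples: Iterable[Dict[str, Any]],
    label_names: Sequence[str],
    text_key: str = "text",
    label_key: str = "label",
) -> List[Dict[str, str]]:
    """Replace integer class ids with readable label names."""
    rows = []
    for example in examples:
        label_id = example[label_key]
        if not 0 <= label_id < len(label_names):
            # Some datasets mark unlabeled rows with -1; fail loudly here
            raise ValueError(f"Unexpected label id: {label_id}")
        rows.append({text_key: example[text_key], label_key: label_names[label_id]})
    return rows


# Hypothetical three-class sample standing in for a Hugging Face dataset
sample = [
    {"text": "Terrible.", "label": 0},
    {"text": "It was fine.", "label": 1},
    {"text": "Loved it.", "label": 2},
]
labeled_rows = to_labeled_rows(sample, ["negative", "neutral", "positive"])
print(labeled_rows[2]["label"])
```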

***

## Best Practices

<Tip>
  **Batch imports**: For large datasets (1000+ rows), use the batching pattern from the main example above: split the rows into chunks of 100 per `add_datapoints` call to avoid request timeouts.
</Tip>
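
When rows are already flattened into a list of dicts (as in the Q&A and classification examples), the batching pattern can be factored into a small generator. This is a sketch; `chunked` is a hypothetical helper, not part of the HoneyHive SDK.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 250 placeholder rows split into batches of 100
rows = [{"text": f"row {n}"} for n in range(250)]
batch_sizes = [len(batch) for batch in chunked(rows, 100)]
print(batch_sizes)
```

Each yielded batch can then be passed as `data=` in its own `add_datapoints` call, as in the main example.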

| Recommendation      | Reason                                                            |
| ------------------- | ----------------------------------------------------------------- |
| Start with a subset | Test your mapping with 100 rows before importing the full dataset |
| Add metadata        | Include source information for traceability                       |
| Validate fields     | Check that your field mapping produces valid datapoints           |
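
One way to act on the "Validate fields" recommendation is to check locally that every row carries the columns named in the mapping before uploading. The helper below is a hypothetical sketch; treating `None` values as missing is an assumption that may not suit every dataset.

```python
from typing import Any, Dict, List, Sequence, Tuple


def find_invalid_rows(
    rows: Sequence[Dict[str, Any]],
    required_fields: Sequence[str],
) -> List[Tuple[int, List[str]]]:
    """Return (row index, missing field names) for each incomplete row."""
    problems = []
    for index, row in enumerate(rows):
        # A field counts as missing if the key is absent or its value is None
        missing = [name for name in required_fields if row.get(name) is None]
        if missing:
            problems.append((index, missing))
    return problems


rows = [
    {"text": "Great film", "label": "positive"},
    {"text": "No label here"},
    {"text": None, "label": "negative"},
]
problems = find_invalid_rows(rows, required_fields=["text", "label"])
print(problems)
```

Pass the union of the `inputs` and `ground_truth` names from your mapping as `required_fields`, and fix or drop the reported rows before importing.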

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Run Experiments" icon="flask" href="/v2/introduction/experiments-quickstart">
    Evaluate your application using the imported dataset
  </Card>

  <Card title="Export Datasets" icon="download" href="/v2/datasets/export">
    Export datasets for external use
  </Card>
</CardGroup>