
# Import from Hugging Face

> Import datasets from Hugging Face Datasets to HoneyHive

HoneyHive supports flexible dataset schemas, making it easy to import datasets from Hugging Face or any other data source.

## Prerequisites

* [HoneyHive API key](/v2/introduction/tracing-quickstart)
* An existing [project](/v2/workspace/projects)
* `datasets` library installed: `pip install datasets`

***

## Import a Dataset

<Steps>
  <Step title="Install dependencies">
    ```bash
    pip install honeyhive datasets
    ```
  </Step>

  <Step title="Load and import">
    ```python Python
    import os
    from datasets import load_dataset
    from honeyhive import HoneyHive
    from honeyhive.models import (
        CreateDatasetRequest,
        AddDatapointsToDatasetRequest,
        DatapointMapping,
    )

    client = HoneyHive(api_key=os.environ["HH_API_KEY"])

    # Load the first 100 rows of the Hugging Face dataset
    hf_dataset = load_dataset("lhoestq/demo1", split="train[:100]")

    # Step 1: Create an empty dataset
    dataset = client.datasets.create(CreateDatasetRequest(
        name="HuggingFace Demo Dataset",
        description="Imported from lhoestq/demo1",
    ))
    dataset_id = dataset.result.insertedId

    # Step 2: Add datapoints in batches with field mapping
    batch_size = 100
    total = len(hf_dataset)

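    for i in range(0, total, batch_size):
        # Slicing a Dataset returns a dict of column lists, not a list of rows
        batch = hf_dataset[i:i + batch_size]
        # Rebuild row dicts, keeping only the columns to import
        rows = [
            {"review": review, "star": star}
            for review, star in zip(batch["review"], batch["star"])
        ]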

        client.datasets.add_datapoints(
            dataset_id,
            AddDatapointsToDatasetRequest(
                data=rows,
                mapping=DatapointMapping(
                    inputs=["review"],
                    ground_truth=["star"],
                )
            )
        )
        print(f"Imported {min(i + batch_size, total)}/{total} datapoints")

    print(f"Created dataset with {total} datapoints")
    ```
  </Step>
</Steps>
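
Slicing a Hugging Face `Dataset` with a range, as the loop above does, returns a dict that maps each column name to a list of values rather than a list of rows. The sketch below shows that column-to-row conversion on plain Python data, so it runs without network access. `columns_to_rows` is a hypothetical helper name, not part of `datasets` or the HoneyHive SDK, and the sample values are made up.

```python
def columns_to_rows(batch, columns):
    """Turn a column-oriented batch (dict of lists) into a list of row dicts."""
    selected = [batch[name] for name in columns]
    # zip(*selected) walks the chosen columns in lockstep, one tuple per row.
    return [dict(zip(columns, values)) for values in zip(*selected)]


# Stand-in for hf_dataset[0:2]; the values are illustrative only.
batch = {
    "review": ["Great app", "Crashes on launch"],
    "star": [5, 1],
    "package_name": ["com.example.a", "com.example.b"],
}

rows = columns_to_rows(batch, ["review", "star"])
print(rows)
```

Columns left out of the `columns` list (here `package_name`) are dropped before upload, which keeps each request limited to the fields you intend to import.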

***

## Field Mapping

Use `DatapointMapping` to map Hugging Face dataset columns to HoneyHive datapoint fields:

| Hugging Face columns | DatapointMapping field | Use for                         |
| -------------------- | ---------------------- | ------------------------------- |
| Input columns        | `inputs`               | Data fed to your function       |
| Label/answer columns | `ground_truth`         | Expected outputs for evaluation |
| Chat history columns | `history`              | Conversational context          |

Any columns not listed in the mapping are automatically stored as `metadata`.
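
To make the mapping rule concrete, here is a minimal local sketch of how a flat row could be grouped into datapoint fields, with unmapped columns falling through to metadata. This only illustrates the behavior described above and is not HoneyHive's implementation: `split_row` is a hypothetical helper, and the exact shape of the stored datapoint (for example, how `history` is represented) is an assumption.

```python
def split_row(row, inputs, ground_truth, history=()):
    """Group a flat row into datapoint fields; unmapped keys become metadata."""
    mapped = set(inputs) | set(ground_truth) | set(history)
    return {
        "inputs": {key: row[key] for key in inputs},
        "ground_truth": {key: row[key] for key in ground_truth},
        "history": {key: row[key] for key in history},
        # Anything not named in the mapping is kept as metadata.
        "metadata": {key: value for key, value in row.items() if key not in mapped},
    }


# Illustrative row; the values are made up.
row = {"question": "Who wrote Hamlet?", "answer": "Shakespeare", "source": "demo"}
datapoint = split_row(row, inputs=["question"], ground_truth=["answer"])
print(datapoint["metadata"])
```

Running a helper like this on a few rows before uploading is a quick way to confirm that each column ends up where you expect.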

### Example: Q\&A Dataset

```python Python
# Reuses `client` and the imports from the main example above
# For a Q&A dataset with "question", "context", and "answers" columns
hf_dataset = load_dataset("squad", split="train[:100]")

# Step 1: Create an empty dataset
dataset = client.datasets.create(CreateDatasetRequest(
    name="SQuAD Q&A Dataset",
    description="Imported from squad",
))

# Step 2: Flatten answers (keep the first answer text) and add datapoints
rows = [
    {
        "question": row["question"],
        "context": row["context"],
        "answer": row["answers"]["text"][0],
        "source": "squad",
    }
    for row in hf_dataset
]

client.datasets.add_datapoints(
    dataset.result.insertedId,
    AddDatapointsToDatasetRequest(
        data=rows,
        mapping=DatapointMapping(
            inputs=["question", "context"],
            ground_truth=["answer"],
        )
    )
)
# "source" is not listed in the mapping, so it is stored as metadata
```
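
The flattening step above indexes the first answer directly, which assumes every row has at least one answer. A hedged variant that tolerates empty answer lists (as occur for the unanswerable questions in SQuAD 2.0) might look like the following. `first_answer` and its `default` parameter are hypothetical names, and the sample records are made up to mirror the `answers` column layout.

```python
def first_answer(answers, default=None):
    """Return the first answer text, or `default` when there is none."""
    texts = answers.get("text", [])
    return texts[0] if texts else default


# Illustrative records shaped like the "answers" column.
answered = {"text": ["Denver Broncos"], "answer_start": [177]}
unanswerable = {"text": [], "answer_start": []}

print(first_answer(answered))
print(first_answer(unanswerable, default=""))
```

Whether to keep, skip, or default such rows depends on how your evaluation treats missing ground truth.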

### Example: Classification Dataset

```python Python
# Reuses `client` and the imports from the main example above
# For a classification dataset with "text" and "label" columns
hf_dataset = load_dataset("imdb", split="test[:100]")

# Step 1: Create an empty dataset
dataset = client.datasets.create(CreateDatasetRequest(
    name="IMDB Classification Dataset",
    description="Imported from imdb",
))

# Step 2: Add datapoints
rows = [
    {
        "text": row["text"],
        # IMDB encodes labels as integers: 1 = positive, 0 = negative
        "label": "positive" if row["label"] == 1 else "negative",
    }
    for row in hf_dataset
]

client.datasets.add_datapoints(
    dataset.result.insertedId,
    AddDatapointsToDatasetRequest(
        data=rows,
        mapping=DatapointMapping(
            inputs=["text"],
            ground_truth=["label"],
        )
    )
)
```

***

## Best Practices

<Tip>
  **Batch imports**: For large datasets (1,000+ rows), use the batching pattern from the main example above. Split your data into chunks of 100 rows per `add_datapoints` call to avoid timeouts.
</Tip>
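
When rows are already built as a Python list, the chunking itself can be factored into a small generator. This is a sketch rather than an SDK feature: `chunked` is a hypothetical helper, and the chunk size of 100 simply follows the tip above.

```python
def chunked(rows, size):
    """Yield consecutive slices of `rows`, each at most `size` items long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Illustrative data: 250 rows split into chunks of 100.
rows = [{"text": f"example {n}"} for n in range(250)]
sizes = [len(chunk) for chunk in chunked(rows, 100)]
print(sizes)  # [100, 100, 50]
```

Each yielded chunk would then be passed as `data` to a single `add_datapoints` call, as in the main example.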

| Recommendation      | Reason                                                            |
| ------------------- | ----------------------------------------------------------------- |
| Start with a subset | Test your mapping with 100 rows before importing the full dataset |
| Add metadata        | Include source information for traceability                       |
| Validate fields     | Check that your field mapping produces valid datapoints           |
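
One way to validate fields before uploading is a local check that every row carries a non-empty value for each mapped column. The sketch below assumes that "valid" means each required key is present and not `None`. HoneyHive may apply different or additional rules, and `find_invalid_rows` is a hypothetical helper.

```python
def find_invalid_rows(rows, required):
    """Return (index, missing_keys) pairs for rows lacking a required value."""
    problems = []
    for index, row in enumerate(rows):
        # A key counts as missing if it is absent or explicitly None.
        missing = [key for key in required if row.get(key) is None]
        if missing:
            problems.append((index, missing))
    return problems


# Illustrative rows: the second lacks "label", the third has a None "text".
rows = [
    {"text": "Loved it", "label": "positive"},
    {"text": "Too long"},
    {"text": None, "label": "negative"},
]
problems = find_invalid_rows(rows, required=["text", "label"])
print(problems)  # [(1, ['label']), (2, ['text'])]
```

Running this on the 100-row subset first surfaces mapping mistakes before a full import.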

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Run Experiments" icon="flask" href="/v2/introduction/experiments-quickstart">
    Evaluate your application using the imported dataset
  </Card>

  <Card title="Export Datasets" icon="download" href="/v2/datasets/export">
    Export datasets for external use
  </Card>
</CardGroup>
