# Experiments Quickstart

> Run your first experiment with HoneyHive in 5 minutes

## What You'll Learn

By the end of this tutorial, you will know how to:

* Run an experiment with `evaluate()` on a test dataset
* Score outputs automatically with a custom evaluator
* View results in the HoneyHive dashboard

**Time:** \~5 minutes

***

## Step 1: Setup

Install dependencies and configure your environment:

```bash
pip install honeyhive openai
```

Go to [**Settings > Project > API Keys**](https://app.us.honeyhive.ai/settings/project/keys) and click **Create API Key**. Copy the key from the modal right away; it is shown only once.

Set your environment variables:

```bash
export HH_API_KEY="your-honeyhive-api-key"
export HH_PROJECT="my-project"
export OPENAI_API_KEY="your-openai-api-key"
```
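
If a variable is missing, the failure can surface later as a less obvious authentication error. The following is a minimal sketch of a pre-flight check, assuming only the three variable names exported above; `missing_env_vars` is a hypothetical helper, not part of the HoneyHive SDK.

```python
import os

# The three variable names come from the export commands above.
REQUIRED_VARS = ("HH_API_KEY", "HH_PROJECT", "OPENAI_API_KEY")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it in the same shell where you exported the variables; a new terminal session will not inherit them.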

```python
from openai import OpenAI
from honeyhive import evaluate

client = OpenAI()
```

<Note>
  If your existing code calls `HoneyHiveTracer.init()`, you don't need it here: `evaluate()` handles tracing automatically.
</Note>

***

## Step 2: Define Your Function

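Write the function you want to evaluate. It receives one datapoint and returns a dictionary of outputs. Here we'll build an intent classifier that reads the message from `datapoint["inputs"]["text"]`: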

```python
def classify_intent(datapoint):
    text = datapoint["inputs"]["text"]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"""Classify this customer support message into ONE category:
- billing: payment issues, invoices, charges, refunds
- technical: bugs, errors, how to use features
- account: login, password, profile, settings
- general: other questions, feedback

Reply with ONLY the category name.

Message: {text}
Category:"""}],
        temperature=0
    )
    return {"intent": response.choices[0].message.content.strip().lower()}
```
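
Even with "Reply with ONLY the category name", a model can occasionally answer with extra punctuation or a short phrase such as `Category: billing`, which an exact string comparison would score as a miss. One possible guard is sketched below. The category names come from the prompt above; `normalize_intent` and the `"unknown"` fallback are hypothetical additions for illustration, not part of the tutorial's code or of any SDK.

```python
import re

# Category names are taken from the prompt above.
VALID_INTENTS = ("billing", "technical", "account", "general")


def normalize_intent(raw_reply):
    """Return the first known category word in the reply, or "unknown"."""
    # Treat a missing reply (None) as an empty string.
    words = re.findall(r"[a-z]+", (raw_reply or "").lower())
    for word in words:
        if word in VALID_INTENTS:
            return word
    return "unknown"


print(normalize_intent("Billing."))
print(normalize_intent("Category: account"))
print(normalize_intent("I'm not sure"))
```

To use it, you could wrap the model reply in `classify_intent` with `normalize_intent(...)` before returning it. Because `"unknown"` never equals a ground-truth label, off-format replies still score 0.0 rather than being silently accepted.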

***

## Step 3: Create Your Dataset

Define test cases as a list of dictionaries, each with `inputs` for your function and the expected output under `ground_truth`:

```python
dataset = [
    {
        "inputs": {"text": "I was charged twice for my subscription this month."},
        "ground_truth": {"intent": "billing"}
    },
    {
        "inputs": {"text": "The export button isn't working. Getting error code 500."},
        "ground_truth": {"intent": "technical"}
    },
    {
        "inputs": {"text": "I forgot my password and the reset email never arrived."},
        "ground_truth": {"intent": "account"}
    },
    {
        "inputs": {"text": "Just wanted to say your support team was amazing. Thanks!"},
        "ground_truth": {"intent": "general"}
    },
]
```
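
A typo in a key name (for example `ground_thruth`) is easy to miss in a hand-written dataset. The sketch below checks the shape used in this tutorial, where every datapoint is a dict with dict-valued `inputs` and `ground_truth`. That shape requirement is an assumption of this example; `find_dataset_problems` is a hypothetical helper, and HoneyHive itself may accept other layouts.

```python
def find_dataset_problems(dataset):
    """Return human-readable problems; an empty list means the shape looks right."""
    problems = []
    for index, datapoint in enumerate(dataset):
        if not isinstance(datapoint, dict):
            problems.append(
                f"datapoint {index}: expected a dict, got {type(datapoint).__name__}"
            )
            continue
        for key in ("inputs", "ground_truth"):
            if not isinstance(datapoint.get(key), dict):
                problems.append(f"datapoint {index}: '{key}' is missing or not a dict")
    return problems


sample = [
    {"inputs": {"text": "I was charged twice."}, "ground_truth": {"intent": "billing"}},
    {"inputs": {"text": "Where is my invoice?"}},  # ground_truth left out on purpose
]
for problem in find_dataset_problems(sample):
    print(problem)
```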

***

## Step 4: Create an Evaluator

Evaluators score your function's outputs against ground truth:

```python
def intent_match(outputs, inputs, ground_truth):
    """Check if the classified intent matches expected."""
    actual = outputs.get("intent", "").lower()
    expected = ground_truth.get("intent", "").lower()
    return 1.0 if actual == expected else 0.0
```

**Evaluator signature:** `(outputs, inputs, ground_truth)`. The evaluator returns a score, typically between 0.0 and 1.0.
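
Because an evaluator is a plain Python function, you can check it by hand before running a full experiment. The snippet below repeats `intent_match` so that it runs on its own and calls it with hand-written values; no HoneyHive or OpenAI calls are involved.

```python
def intent_match(outputs, inputs, ground_truth):
    """Same evaluator as above, repeated so this snippet runs on its own."""
    actual = outputs.get("intent", "").lower()
    expected = ground_truth.get("intent", "").lower()
    return 1.0 if actual == expected else 0.0


inputs = {"text": "I was charged twice for my subscription this month."}
ground_truth = {"intent": "billing"}

# Matching intent, with different casing.
assert intent_match({"intent": "Billing"}, inputs, ground_truth) == 1.0
# Wrong intent.
assert intent_match({"intent": "technical"}, inputs, ground_truth) == 0.0
# Missing "intent" key in the outputs.
assert intent_match({}, inputs, ground_truth) == 0.0
print("intent_match behaves as expected")
```

One edge case to be aware of: if both `outputs` and `ground_truth` lack an `intent` key, the two empty-string defaults compare equal and the evaluator returns 1.0.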

***

## Step 5: Run Your Experiment

Run the experiment with `evaluate()`:

```python
result = evaluate(
    function=classify_intent,
    dataset=dataset,
    evaluators=[intent_match],
    name="intent-classifier-v1"
)
```

You'll see a results table printed to the console with scores for each datapoint.
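
To debug the function and evaluator without spending API calls, you can wire the same pieces together in a plain loop. This is only an offline sketch and not how `evaluate()` is implemented: it assumes the calling conventions shown in this tutorial (the function receives a datapoint, the evaluator receives `outputs, inputs, ground_truth`). `keyword_classifier` and `dry_run` are hypothetical names, and nothing is traced or sent to HoneyHive.

```python
def keyword_classifier(datapoint):
    """Hypothetical offline stand-in for classify_intent; makes no API calls."""
    text = datapoint["inputs"]["text"].lower()
    if "charged" in text or "refund" in text:
        return {"intent": "billing"}
    if "error" in text or "bug" in text:
        return {"intent": "technical"}
    if "password" in text or "login" in text:
        return {"intent": "account"}
    return {"intent": "general"}


def intent_match(outputs, inputs, ground_truth):
    """Same evaluator as in Step 4, repeated so this snippet runs on its own."""
    actual = outputs.get("intent", "").lower()
    expected = ground_truth.get("intent", "").lower()
    return 1.0 if actual == expected else 0.0


def dry_run(function, dataset, evaluator):
    """Call function on each datapoint and return the per-datapoint scores."""
    scores = []
    for datapoint in dataset:
        outputs = function(datapoint)
        scores.append(
            evaluator(outputs, datapoint["inputs"], datapoint["ground_truth"])
        )
    return scores


dataset = [
    {"inputs": {"text": "I was charged twice for my subscription this month."},
     "ground_truth": {"intent": "billing"}},
    {"inputs": {"text": "I forgot my password and the reset email never arrived."},
     "ground_truth": {"intent": "account"}},
]

scores = dry_run(keyword_classifier, dataset, intent_match)
print(f"mean score: {sum(scores) / len(scores):.2f}")
```

Once the loop behaves as expected with the stand-in, swap `classify_intent` back in and use `evaluate()` so the run is recorded in HoneyHive.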

***

## Step 6: View Results in Dashboard

Go to [app.us.honeyhive.ai](https://app.us.honeyhive.ai) and open **Experiments** to see your run, scores, and individual traces.

<Frame>
  <img src="https://mintcdn.com/honeyhiveai/8CSzfyX-NUZzkr98/images/eval-structured-prompt.png?fit=max&auto=format&n=8CSzfyX-NUZzkr98&q=85&s=ecfeb94b4fec524ec93670315e5765d9" alt="Experiment results showing intent classification accuracy" width="1736" height="644" data-path="images/eval-structured-prompt.png" />
</Frame>

***

<Accordion title="Complete Code">
  First, set your environment variables:

  ```bash
  export HH_API_KEY="your-honeyhive-api-key"
  export HH_PROJECT="my-project"
  export OPENAI_API_KEY="your-openai-api-key"
  ```

  ```python
  from openai import OpenAI
  from honeyhive import evaluate

  client = OpenAI()

  def classify_intent(datapoint):
      text = datapoint["inputs"]["text"]
      response = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=[{"role": "user", "content": f"""Classify this customer support message into ONE category:
  - billing: payment issues, invoices, charges, refunds
  - technical: bugs, errors, how to use features
  - account: login, password, profile, settings
  - general: other questions, feedback

  Reply with ONLY the category name.

  Message: {text}
  Category:"""}],
          temperature=0
      )
      return {"intent": response.choices[0].message.content.strip().lower()}

  dataset = [
      {"inputs": {"text": "I was charged twice for my subscription this month."}, "ground_truth": {"intent": "billing"}},
      {"inputs": {"text": "The export button isn't working. Getting error code 500."}, "ground_truth": {"intent": "technical"}},
      {"inputs": {"text": "I forgot my password and the reset email never arrived."}, "ground_truth": {"intent": "account"}},
      {"inputs": {"text": "Just wanted to say your support team was amazing. Thanks!"}, "ground_truth": {"intent": "general"}},
  ]

  def intent_match(outputs, inputs, ground_truth):
      return 1.0 if outputs.get("intent", "").lower() == ground_truth.get("intent", "").lower() else 0.0

  result = evaluate(
      function=classify_intent,
      dataset=dataset,
      evaluators=[intent_match],
      name="intent-classifier-v1"
  )
  ```
</Accordion>

***

## What You Learned

* **Define a function** that receives a datapoint and returns outputs
* **Create a dataset** with inputs and ground truths
* **Write an evaluator** that scores outputs automatically
* **Run an experiment** with `evaluate()` and view results in the dashboard

***

## What's Next?

<CardGroup cols={2}>
  <Card title="Compare Experiments" icon="code-compare" href="/v2/evaluation/comparing_evals">
    Run a second experiment with a different prompt and compare results side by side
  </Card>

  <Card title="Evaluator Types" icon="robot" href="/v2/evaluators/introduction">
    Code evaluators, LLM-as-judge, and human review
  </Card>

  <Card title="Managed Datasets" icon="database" href="/v2/datasets/introduction">
    Version and manage datasets in HoneyHive
  </Card>

  <Card title="Server-Side Evaluators" icon="cloud" href="/v2/evaluators/python">
    Run evaluators on HoneyHive infrastructure
  </Card>
</CardGroup>
