# Online Evaluations

> Run evaluators automatically on ingested traces to continuously monitor quality.

Online evaluations run your [evaluators](/v2/evaluators/introduction) automatically on ingested traces. This gives you continuous quality scores alongside your cost and latency metrics, without adding latency to your application.

## How It Works

When you enable an evaluator, HoneyHive runs it asynchronously on incoming traces:

1. Your application sends traces to HoneyHive
2. HoneyHive matches traces against your evaluator's [event filters](#event-filters)
3. Matching events are evaluated (subject to your [sampling rate](#sampling))
4. Results appear as metrics in your [dashboard](/v2/monitoring/charts) and on individual traces
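Steps 2 and 3 can be pictured as a two-stage gate: an event first has to match the filters, and then has to survive a random sampling draw. The snippet below is a minimal sketch of that idea, not HoneyHive's implementation. The function name `should_evaluate`, the flat `filters` dictionary, and the `event_type` / `event_name` keys are assumptions made for illustration.

```python
import random


def should_evaluate(event, filters, sampling_pct, rng=random):
    """Decide whether a (hypothetical) event would be evaluated.

    event        -- dict describing one ingested event
    filters      -- dict of field -> required value; every entry must match
    sampling_pct -- percentage (0-100) of matching events to evaluate
    rng          -- any object with a random() method, injectable for testing
    """
    # Step 2: the event must satisfy every filter condition.
    if not all(event.get(field) == value for field, value in filters.items()):
        return False
    # Step 3: keep roughly sampling_pct percent of the matching events.
    return rng.random() * 100 < sampling_pct


event = {"event_type": "model", "event_name": "generate_response"}
filters = {"event_type": "model", "event_name": "generate_response"}

# At 100% sampling every matching event passes; at 0% none do.
always = should_evaluate(event, filters, sampling_pct=100)
never = should_evaluate(event, filters, sampling_pct=0)
```

Because the sampling draw is random per event, a low sampling percentage means any single matching event may or may not receive a score.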

<Note>Online evaluations run on all ingested traces that match your evaluator's event filters, including both production and experiment traces.</Note>

## Enabling Online Evaluation

You can enable online evaluation on any server-side evaluator (Python or LLM):

<Steps>
  <Step title="Go to the Evaluators page">
    Navigate to the [**Evaluators**](https://app.us.honeyhive.ai/metrics) tab in HoneyHive.
  </Step>

  <Step title="Create or select an evaluator">
    Create a new [Python](/v2/evaluators/python) or [LLM](/v2/evaluators/llm) evaluator, or select an existing one. Configure event filters, return type, and your evaluation logic.

    <Frame caption="LLM evaluator configuration with event filters, sampling percentage, and prompt editor">
      <img src="https://mintcdn.com/honeyhiveai/EWG3R5yYrwNnHjQ7/images/product-llm.png?fit=max&auto=format&n=EWG3R5yYrwNnHjQ7&q=85&s=01cc471bc2e45ffa063e5ab7a8fe26b9" alt="HoneyHive LLM evaluator editor showing event filters set to model type, OpenAI gpt-4o provider, evaluation prompt with template syntax, sampling percentage, and return type configuration" width="3024" height="1568" data-path="images/product-llm.png" />
    </Frame>
  </Step>

  <Step title="Enable the evaluator">
    Toggle the **Enabled** switch in the evaluators table. HoneyHive then runs this evaluator on matching events, subject to the sampling percentage you set.
  </Step>

  <Step title="Set a sampling percentage">
    Set the **Sampling percentage** to control what fraction of matching events are evaluated (e.g., 25%). Lower percentages reduce cost for LLM-based evaluators at high volumes.
  </Step>
</Steps>

## Event Filters

Each evaluator has event filters that determine which traces it runs on. You can filter by event type, event name, and any event property from your schema. For example, you might run a hallucination evaluator only on `model` events named `generate_response`, or add a filter like `metadata.environment is production` to limit evaluation to specific contexts.

See [Event Filters](/v2/evaluators/llm#event-filters) for the full list of supported filter options and operators (which vary by field type).

## Sampling

LLM-based evaluators incur model costs for every evaluation. At production scale, use sampling to control spend:

| Volume              | Suggested Sampling | Rationale                                      |
| ------------------- | ------------------ | ---------------------------------------------- |
| \< 1K events/day    | 100%               | Full coverage is affordable                    |
| 1K - 10K events/day | 25 - 50%           | Good signal with moderate cost                 |
| 10K+ events/day     | 5 - 25%            | Enough samples for stable trends at lower cost |

<Tip>Python evaluators are much cheaper to run than LLM evaluators. You can often run Python evaluators at 100% sampling even at high volumes.</Tip>
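To compare sampling rates, it helps to estimate the spend directly: evaluated events per day times the cost of one evaluation. The sketch below does that arithmetic. The per-evaluation cost of $0.002 is a made-up placeholder, not a HoneyHive or model-provider price, so substitute your own figure.

```python
def estimated_daily_cost(events_per_day: int, sampling_pct: float,
                         cost_per_eval_usd: float) -> float:
    """Estimate daily LLM-evaluator spend for one evaluator.

    events_per_day    -- matching events ingested per day
    sampling_pct      -- sampling percentage (0-100)
    cost_per_eval_usd -- assumed model cost of a single evaluation
    """
    evaluated_events = events_per_day * sampling_pct / 100
    return evaluated_events * cost_per_eval_usd


# Hypothetical workload: 50,000 matching events/day at an assumed $0.002 each.
for pct in (100, 25, 10):
    cost = estimated_daily_cost(50_000, pct, cost_per_eval_usd=0.002)
    print(f"{pct:>3}% sampling -> about ${cost:.2f}/day")
```

With these assumed numbers, dropping from 100% to 10% sampling cuts the estimate by a factor of ten while still scoring about 5,000 events per day.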

## Viewing Results

Online evaluation results are available in two places:

* **Dashboard charts**: Select your evaluator as a metric in [Custom Charts](/v2/monitoring/charts) to track quality over time, group by properties, and set up [alerts](/v2/monitoring/alerts/alerts_overview)
* **Individual traces**: Each evaluated trace shows its evaluator scores alongside inputs, outputs, and other metadata

<Frame caption="Dashboard with evaluator metrics like Search Relevance and Agent Execution Quality charted over time">
  <img src="https://mintcdn.com/honeyhiveai/EWG3R5yYrwNnHjQ7/images/product-dashboard.png?fit=max&auto=format&n=EWG3R5yYrwNnHjQ7&q=85&s=a96b23c318597e7650d96ca6151e1892" alt="HoneyHive monitoring dashboard showing charts for session duration, LLM call duration, token usage, and custom evaluator metrics like Search Relevance and Agent Execution Quality" width="3024" height="1566" data-path="images/product-dashboard.png" />
</Frame>

You can also use the Discover view to build custom queries on evaluator scores, filter by source, and drill into individual events.

<Frame caption="Charting a Search Relevance evaluator score in the Discover view, grouped by source">
  <img src="https://mintcdn.com/honeyhiveai/9BxiwYxg7j6yRoey/images/monitoringquery.png?fit=max&auto=format&n=9BxiwYxg7j6yRoey&q=85&s=98598eab2ff48ede9fc4c502a43c99d8" alt="HoneyHive Discover view showing a Search Relevance evaluator metric charted over time for a tool_search_web event, grouped by source" width="3024" height="1556" data-path="images/monitoringquery.png" />
</Frame>

## Choosing Between Client-Side and Server-Side

|                    | Client-Side                              | Server-Side (Online)                  |
| ------------------ | ---------------------------------------- | ------------------------------------- |
| **Runs**           | In your application                      | On HoneyHive after ingestion          |
| **Latency impact** | Adds to request time                     | None                                  |
| **Best for**       | Guardrails, format checks, PII detection | LLM-as-judge, complex quality scoring |
| **Managed in**     | Your code                                | HoneyHive UI                          |

Use [client-side evaluators](/v2/evaluators/client_side) for checks that need to happen during execution (guardrails, blocking unsafe responses). Use online evaluations for quality scoring that can happen asynchronously.
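As an illustration of a check that has to run during execution, here is a minimal sketch of an inline guardrail that withholds a response if it appears to contain an email address. It is plain Python and does not use the HoneyHive SDK. The function name `guard_response`, the regular expression, and the placeholder message are all assumptions, and a real PII check would need to cover far more than emails.

```python
import re

# Deliberately simple pattern; it is illustrative, not a complete PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

WITHHELD = "[response withheld: possible PII detected]"


def guard_response(text: str) -> str:
    """Return the text unchanged, or a placeholder if it looks like it has PII.

    This runs synchronously in the request path, so its runtime adds to
    request latency. That is the trade-off for being able to block output.
    """
    if EMAIL_RE.search(text):
        return WITHHELD
    return text


safe = guard_response("Your order has shipped.")
blocked = guard_response("Reach me at jane@example.com for details.")
```

An online evaluation could not do this, because it runs after the response has already been returned and ingested.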

## Troubleshooting

### Evaluator not running on expected events

* **Check event filters**: Verify the evaluator's event type and event name filters match your traces. Filters are AND-ed, so all conditions must match.
* **Check enabled status**: The evaluator must be toggled **Enabled** in the evaluators table.
* **Check sampling**: At low sampling percentages, some matching events are intentionally skipped. Temporarily raise sampling to 100% to confirm the evaluator runs, then lower it again.
* **Check event properties**: Property-based filters use dot-path matching (e.g. `metadata.environment`). Verify the property exists on your events and the value matches.

### Evaluator was auto-disabled

If an evaluator fails 100+ times within 1 hour, HoneyHive automatically disables it and creates a version snapshot. This prevents a broken evaluator from consuming resources across all your traces.

To recover:

1. Go to the **Evaluators** table and find the disabled evaluator
2. Check the error by running the evaluator manually against a sample event
3. Fix the evaluation logic
4. Re-enable the evaluator
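The auto-disable rule (100 or more failures within 1 hour) can be pictured as a failure counter over a time window. The sketch below uses a sliding window, which is an assumption: this page does not say whether HoneyHive uses a sliding or a fixed window, and the class name `FailureWindow` is hypothetical.

```python
from collections import deque


class FailureWindow:
    """Track failure timestamps and flag when a threshold is reached."""

    def __init__(self, threshold: int = 100, window_seconds: float = 3600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = deque()  # failure timestamps, oldest first

    def record_failure(self, timestamp: float) -> bool:
        """Record one failure; return True if the evaluator should be disabled."""
        self._failures.append(timestamp)
        # Drop failures that fell out of the window ending at this timestamp.
        cutoff = timestamp - self.window_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        return len(self._failures) >= self.threshold


# 100 failures one second apart trip the threshold on the 100th failure.
burst = FailureWindow()
burst_results = [burst.record_failure(float(t)) for t in range(100)]

# 100 failures one minute apart never have 100 inside any one-hour window.
slow = FailureWindow()
slow_results = [slow.record_failure(t * 60.0) for t in range(100)]
```

Under this model, occasional failures spread over a day would not disable an evaluator, while a consistently broken one would be disabled quickly at production volume.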

### Results not appearing on traces

* Evaluations run asynchronously after ingestion. There is a short delay before scores appear.
* Check that the evaluator's configured return type (boolean, numeric, or string) matches the value your evaluation logic actually returns.
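When testing a Python evaluator locally, a small helper can confirm that a returned value fits the declared return type. This is a sketch under assumptions: the helper name `matches_return_type` is hypothetical, and the page does not describe how HoneyHive itself validates return values.

```python
def matches_return_type(value, return_type: str) -> bool:
    """Check a value against a declared type: 'boolean', 'numeric', or 'string'."""
    if return_type == "boolean":
        return isinstance(value, bool)
    if return_type == "numeric":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if return_type == "string":
        return isinstance(value, str)
    raise ValueError(f"unknown return type: {return_type!r}")


print(matches_return_type(0.87, "numeric"))   # True
print(matches_return_type(True, "numeric"))   # False: a bool is not treated as numeric here
print(matches_return_type("pass", "boolean")) # False: a string, not a bool
```

A common mismatch is returning a label such as `"pass"` from an evaluator whose return type is set to boolean.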

## Next Steps

<CardGroup cols={2}>
  <Card title="Python Evaluators" icon="python" href="/v2/evaluators/python">
    Create code-based evaluators for programmatic checks
  </Card>

  <Card title="LLM Evaluators" icon="robot" href="/v2/evaluators/llm">
    Use LLMs to score quality, relevance, and tone
  </Card>

  <Card title="Custom Charts" icon="chart-simple" href="/v2/monitoring/charts">
    Visualize evaluator scores in dashboards
  </Card>

  <Card title="Alerts" icon="bell" href="/v2/monitoring/alerts/alerts_overview">
    Get notified when quality metrics drop
  </Card>
</CardGroup>
