# LLM Evaluators

> Create LLM-powered evaluators that assess AI outputs using custom prompts

LLM evaluators use large language models to evaluate the quality of AI-generated responses based on custom criteria. They're ideal for qualitative evaluations like coherence, relevance, faithfulness, and tone.

## Creating an LLM Evaluator

1. Navigate to the [**Evaluators**](https://app.us.honeyhive.ai/metrics) tab in the HoneyHive console.
2. Click `Add Evaluator` and select `LLM Evaluator`.

<Frame>
  <img src="https://mintcdn.com/honeyhiveai/EWG3R5yYrwNnHjQ7/images/product-llm.png?fit=max&auto=format&n=EWG3R5yYrwNnHjQ7&q=85&s=01cc471bc2e45ffa063e5ab7a8fe26b9" alt="LLM evaluator editor showing event filters, AI provider selection (OpenAI/gpt-4o), prompt editor with template syntax, and configuration options" width="3024" height="1568" data-path="images/product-llm.png" />
</Frame>

<Note>LLM evaluators use your configured AI provider. Set up provider keys in [Provider Keys](/v2/workspace/provider-keys) to use models from OpenAI, Anthropic, or other providers.</Note>

## Event Schema

LLM evaluators operate on event objects from your traces. Use `{{ }}` syntax to reference event properties in your prompt.

| Property     | Description                                           | Example                       |
| ------------ | ----------------------------------------------------- | ----------------------------- |
| `event_type` | Type of event: `model`, `tool`, `chain`, or `session` | `{{ event_type }}`            |
| `event_name` | Name of the event or session                          | `{{ event_name }}`            |
| `inputs`     | Input data (prompt, query, context, etc.)             | `{{ inputs.question }}`       |
| `outputs`    | Output data (completion, response, etc.)              | `{{ outputs.content }}`       |
| `feedback`   | User feedback and ground truth                        | `{{ feedback.ground_truth }}` |

<Tip>Click `Show Schema` in the evaluator console to explore all available event properties for your project.</Tip>

<Note>For detailed event schema documentation and tracing setup, see [Configuring Tracing for Server-Side Evaluators](/v2/evaluators/evaluator-templates#configuring-tracing-for-server-side-evaluators).</Note>
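
To make the `{{ }}` references concrete, here is a minimal Python sketch of how a dotted placeholder could resolve against an event object. It is an illustration only, not HoneyHive's templating engine: the sample event, the `resolve` and `render` helpers, and the choice to render missing properties as empty text are all assumptions.

```python
import re
from typing import Any

# Hypothetical event, shaped like the properties in the table above.
event = {
    "event_type": "model",
    "event_name": "answer_question",
    "inputs": {"question": "What is the capital of France?"},
    "outputs": {"content": "Paris"},
    "feedback": {"ground_truth": "Paris"},
}

# Matches {{ some.dotted.path }} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(path: str, data: dict[str, Any]) -> Any:
    """Walk a dotted path such as 'inputs.question' through nested dicts."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return ""  # assumption: missing properties render as empty text
        current = current[key]
    return current


def render(template: str, data: dict[str, Any]) -> str:
    """Replace every {{ path }} placeholder with the value found in data."""
    return PLACEHOLDER.sub(lambda m: str(resolve(m.group(1), data)), template)


print(render("[Question]\n{{ inputs.question }}\n\n[Answer]\n{{ outputs.content }}", event))
```

If a placeholder does not resolve the way you expect, compare the path against the event schema first, since a mistyped key is the most likely cause.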

## Evaluation Prompt

Define your evaluation prompt using the `{{ }}` syntax to inject event data:

```markdown
[Instruction]
Evaluate the AI assistant's answer based on:
1. Relevance to the question
2. Accuracy of information
3. Clarity and coherence
4. Completeness of the answer

Provide a brief explanation and rate the response on a scale of 1 to 5.

[Question]
{{ inputs.question }}

[Context]
{{ inputs.context }}

[AI Assistant's Answer]
{{ outputs.content }}

[Evaluation]
Explanation:
Rating: [[X]]
```

<Tip>Use the `[[X]]` pattern for ratings. The evaluator automatically extracts the value inside the brackets.</Tip>
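
As a rough picture of what extracting a `[[X]]` rating involves, the sketch below pulls the bracketed value out of a model completion with a regular expression. The `extract_rating` helper and the decision to take the first match are assumptions for illustration; the platform's actual extraction logic is not documented here.

```python
import re
from typing import Optional

# Captures whatever sits between [[ and ]], ignoring surrounding whitespace.
RATING = re.compile(r"\[\[\s*(.+?)\s*\]\]")


def extract_rating(completion: str) -> Optional[str]:
    """Return the first [[...]] value in the completion, or None if absent."""
    match = RATING.search(completion)
    return match.group(1) if match else None


completion = "Explanation: The answer is accurate and concise.\nRating: [[4]]"
print(extract_rating(completion))  # prints 4
```

Ending the prompt with a literal `Rating: [[X]]` line, as in the example above, nudges the model to produce output this kind of pattern can match.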

<Note>Looking for ready-made examples? Check out our [LLM Evaluator Templates](/v2/evaluators/evaluator-templates#llm-evaluator-templates).</Note>

## Configuration

### Return Type

* **Boolean**: For true/false evaluations
* **Numeric**: For scores or ratings (e.g., 1-5)
* **String**: For categorical labels or text responses
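
The return type determines how the raw value from the model is interpreted. The following sketch shows one way such a conversion might look; the `coerce` function, the lowercase type names, and the accepted boolean spellings are hypothetical and are not taken from HoneyHive.

```python
from typing import Union


def coerce(raw: str, return_type: str) -> Union[bool, float, str]:
    """Convert a raw extracted value to the configured return type."""
    text = raw.strip()
    if return_type == "boolean":
        lowered = text.lower()
        # Assumption: these spellings count as true/false.
        if lowered in {"true", "yes", "1"}:
            return True
        if lowered in {"false", "no", "0"}:
            return False
        raise ValueError(f"cannot read {raw!r} as a boolean")
    if return_type == "numeric":
        return float(text)  # raises ValueError on non-numeric text
    if return_type == "string":
        return text
    raise ValueError(f"unknown return type: {return_type}")


print(coerce("4", "numeric"))     # prints 4.0
print(coerce("True", "boolean"))  # prints True
```

Whatever the real conversion rules are, it helps to have the prompt ask for exactly the kind of value the return type expects, such as a bare number for **Numeric**.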

### Passing Range

Define the range of scores that counts as a passing evaluation. This is useful for CI/CD pipelines and for identifying failed test cases.
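
For intuition, a passing range is a bounds check on the evaluator's score. The sketch below applies a range of 4 to 5 to a few made-up scores and lists the failures, the way a CI step might. The `passes` helper, the inclusive bounds, and the test case names are assumptions.

```python
def passes(score: float, low: float, high: float) -> bool:
    """True when the score lies inside the passing range.

    Inclusive bounds are an assumption made for this sketch.
    """
    return low <= score <= high


# Hypothetical scores from a 1-5 evaluator, keyed by test case.
scores = {"case-1": 5, "case-2": 2, "case-3": 4}

failed = [name for name, score in scores.items() if not passes(score, 4, 5)]
print(failed)  # prints ['case-2']
```

A CI job could then fail the build whenever a list like `failed` is non-empty.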

### Enabled

Toggle to run this evaluator on all traces that match your event filters.

### Sampling Percentage

Run your evaluator on a percentage of matching events to manage costs. New evaluators default to **10%** sampling. Adjust the rate based on event volume and cost budget. For example, set 25% to evaluate about one in four matching events.

<Note>Sampling applies to all traces that match your event filters. To evaluate only a subset of events, combine sampling with specific event filters.</Note>
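
The effect of a sampling percentage can be simulated with an independent random draw per matching event. This is a sketch under that assumption: the platform may select events differently, and `should_evaluate` is a hypothetical helper.

```python
import random


def should_evaluate(sampling_percentage: float, rng: random.Random) -> bool:
    """Decide whether one matching event gets evaluated.

    Assumes each event is sampled independently at the given percentage.
    """
    return rng.random() * 100 < sampling_percentage


rng = random.Random(0)  # seeded so the simulation is repeatable
evaluated = sum(should_evaluate(25, rng) for _ in range(10_000))
print(f"{evaluated} of 10000 matching events were evaluated")
```

With 25% sampling the count lands near 2,500 rather than exactly on it, which is why low-volume projects may see noticeable run-to-run variation in how many events are scored.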

## Event Filters

Use **Set Up Filters** to specify which events trigger this evaluator. Filters are ANDed together: an event must match every filter to be evaluated.

### Preset Filters

Every evaluator includes two preset filters by default:

* **Event Type**: Filter by `model`, `tool`, `chain`, or `session`
* **Event Name**: Target a specific event name, or use "All" (e.g., "All Models") to match any event of that type

### Additional Filters

Click the **+** button to add filters on any event property. You can filter on any field available in your event schema, including nested properties using dot notation (e.g., `inputs.question`, `metadata.model`, `outputs.content`).

Each filter consists of:

* **Field**: Any property from the event schema
* **Operator**: Depends on the field type (see below)
* **Value**: The value to compare against

**Operators by field type:**

| Field Type   | Operators                                                           |
| ------------ | ------------------------------------------------------------------- |
| **String**   | `is`, `is not`, `contains`, `not contains`, `exists`, `not exists`  |
| **Number**   | `is`, `is not`, `greater than`, `less than`, `exists`, `not exists` |
| **Boolean**  | `is`, `exists`, `not exists`                                        |
| **Datetime** | `is`, `is not`, `after`, `before`, `exists`, `not exists`           |

<Tip>Click **Show Schema** in the evaluator editor to browse all available event properties you can filter on.</Tip>
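
The AND semantics and the operators in the table can be sketched in a few lines of Python. This only models the behavior described above and is not HoneyHive's implementation: the `(field, operator, value)` tuple format, the `lookup` and `matches` helpers, and the rule that a comparison against a missing field fails are assumptions.

```python
from typing import Any, Callable

_MISSING = object()  # sentinel for "field not present on the event"


def lookup(event: dict[str, Any], field: str) -> Any:
    """Resolve a dot-notation field such as 'metadata.model'."""
    current: Any = event
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "is": lambda actual, expected: actual == expected,
    "is not": lambda actual, expected: actual != expected,
    "contains": lambda actual, expected: expected in actual,
    "not contains": lambda actual, expected: expected not in actual,
    "greater than": lambda actual, expected: actual > expected,
    "less than": lambda actual, expected: actual < expected,
    "after": lambda actual, expected: actual > expected,
    "before": lambda actual, expected: actual < expected,
}


def matches(event: dict[str, Any], filters: list[tuple[str, str, Any]]) -> bool:
    """Return True only if the event satisfies every filter (logical AND)."""
    for field, operator, value in filters:
        actual = lookup(event, field)
        if operator == "exists":
            ok = actual is not _MISSING
        elif operator == "not exists":
            ok = actual is _MISSING
        elif actual is _MISSING:
            ok = False  # assumption: comparisons on a missing field fail
        else:
            ok = OPERATORS[operator](actual, value)
        if not ok:
            return False
    return True


# Hypothetical event and filter set.
event = {
    "event_type": "model",
    "inputs": {"question": "What is RAG?"},
    "metadata": {"model": "gpt-4o"},
}
filters = [("event_type", "is", "model"), ("metadata.model", "contains", "gpt")]
print(matches(event, filters))  # prints True
```

Because every filter must pass, each filter you add can only narrow the set of events that reach the evaluator.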

## Next Steps

<CardGroup cols={2}>
  <Card title="Python Evaluators" icon="python" href="/v2/evaluators/python">
    Create code-based evaluators for programmatic checks
  </Card>

  <Card title="Evaluator Templates" icon="copy" href="/v2/evaluators/evaluator-templates">
    Ready-to-use LLM and Python evaluator templates
  </Card>

  <Card title="Run Experiments" icon="flask" href="/v2/introduction/experiments-quickstart">
    Use evaluators in offline experiments
  </Card>

  <Card title="Human Annotation" icon="user" href="/v2/evaluators/human">
    Set up human review workflows
  </Card>
</CardGroup>
