# Composite Evaluators

> Combine multiple evaluators into a single aggregated score

Composite evaluators aggregate results from multiple Python, LLM, and Human evaluators into a single score. Use them to create holistic quality metrics that combine different evaluation criteria.

**When to use composite evaluators:**

* Combining multiple quality dimensions into one score
* Creating weighted quality indexes (e.g., accuracy + helpfulness + safety)
* Building hierarchical pass/fail criteria (must pass A before B matters)
* Tracking worst-case or best-case performance across evaluators

## Creating a Composite Evaluator

1. Navigate to the [**Evaluators**](https://app.us.honeyhive.ai/metrics) tab
2. Click **Add Evaluator** and select **Composite Evaluator**
3. Configure the aggregate function and select child evaluators

<Note>
  **Child evaluators**: Only evaluators with numeric, boolean, or categorical return types can be added. String evaluators and other composites are excluded.

  **Composite return type**: Composites can only return **Numeric** or **Boolean**. When set to Boolean, Weighted Average and Weighted Sum are disabled.
</Note>

## Configuration

### Event Filters

Filter which events this composite evaluates using event type, event name, and additional property filters. The composite only aggregates child evaluator results from matching events. See [Event Filters](/v2/evaluators/llm#event-filters) for the full list of supported filter options and operators.

### Aggregate Function

| Function                      | Use Case                      | Ignores Weights                      |
| ----------------------------- | ----------------------------- | ------------------------------------ |
| **Weighted Average**          | Balanced overall score        | No                                   |
| **Weighted Sum**              | Cumulative importance         | No                                   |
| **Hierarchical Highest True** | Sequential pass/fail criteria | No (weight sets the priority order)  |
| **Minimum**                   | Worst-case performance        | Yes                                  |
| **Maximum**                   | Best-case performance         | Yes                                  |

### Child Evaluators

Select evaluators to include and set their weights. Browse by type: Python, LLM, or Human.

## Aggregate Functions

### Weighted Average

Calculates `Σ(score × weight) / Σ(weight)`: each child's score is multiplied by its weight, and the total is divided by the sum of the weights.

| Evaluator  | Weight | Score | Contribution                 |
| ---------- | ------ | ----- | ---------------------------- |
| Accuracy   | 2      | 4     | 8                            |
| Clarity    | 1      | 3     | 3                            |
| **Result** |        |       | **(8 + 3) / (2 + 1) = 3.67** |
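
The calculation in the table above can be sketched in a few lines of Python. This is an illustration of the formula only, not HoneyHive's implementation; the function name `weighted_average` and the `(score, weight)` pair layout are assumptions made for the example.

```python
def weighted_average(results):
    """Return sum(score * weight) / sum(weight).

    `results` is a list of (score, weight) pairs. The name and the
    input shape are hypothetical, chosen only for this sketch.
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight == 0:
        # The formula is undefined when the weights sum to zero.
        raise ValueError("weights must not sum to zero")
    return sum(score * weight for score, weight in results) / total_weight


# Accuracy: score 4, weight 2. Clarity: score 3, weight 1.
children = [(4, 2), (3, 1)]
print(round(weighted_average(children), 2))  # 3.67
```

Because the result is divided by the total weight, it stays on the same scale as the child scores, which makes it easier to compare across composites with different weightings.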

### Weighted Sum

Calculates `Σ(score × weight)`.

| Evaluator  | Weight | Score | Contribution   |
| ---------- | ------ | ----- | -------------- |
| Accuracy   | 2      | 4     | 8              |
| Clarity    | 1      | 3     | 3              |
| **Result** |        |       | **8 + 3 = 11** |
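
As a rough sketch of the same arithmetic in Python (the function name `weighted_sum` and the `(score, weight)` pair layout are assumptions for illustration, not part of the HoneyHive API):

```python
def weighted_sum(results):
    """Return sum(score * weight) over (score, weight) pairs.

    Hypothetical helper used only to illustrate the formula.
    """
    return sum(score * weight for score, weight in results)


# Accuracy: score 4, weight 2. Clarity: score 3, weight 1.
children = [(4, 2), (3, 1)]
print(weighted_sum(children))  # 11
```

Unlike a weighted average, the sum is not normalized, so the result grows as children are added or weights are increased.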

### Hierarchical Highest True

Applies to Boolean child evaluators only. Each child's weight acts as its priority. The function returns the highest priority level reached by an unbroken run of true results, counting up from priority 1. Useful for tiered pass/fail criteria where earlier checks must pass before later ones matter.

| Evaluator          | Priority (Weight) | Result  |
| ------------------ | ----------------- | ------- |
| No PII             | 1                 | ✓ True  |
| Factually Correct  | 2                 | ✓ True  |
| Follows Guidelines | 3                 | ✗ False |
| Has Citations      | 4                 | ✓ True  |

**Result: 2.** Priorities 1 and 2 passed consecutively and priority 3 failed, so the chain stops at 2. The true result at priority 4 does not count because it comes after the break.

<Tip>Use for tiered quality gates: basic safety checks at priority 1, correctness at 2, style at 3. The score tells you how far the response got before failing.</Tip>
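
The chain logic described in this section can be sketched as follows. This is a minimal illustration, not HoneyHive's implementation: the function name `hierarchical_highest_true` and the priority-to-result mapping are hypothetical, and returning `0` when priority 1 fails is an assumption, since that case is not specified here.

```python
def hierarchical_highest_true(results):
    """Return the highest priority reached by consecutive True results.

    `results` maps priority (1, 2, 3, ...) to a boolean outcome.
    Assumption: if priority 1 is False or missing, the result is 0.
    """
    level = 0
    # Walk up from priority 1 and stop at the first failed or missing level.
    while results.get(level + 1) is True:
        level += 1
    return level


checks = {
    1: True,   # No PII
    2: True,   # Factually Correct
    3: False,  # Follows Guidelines
    4: True,   # Has Citations (ignored: it comes after the break)
}
print(hierarchical_highest_true(checks))  # 2
```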

### Minimum / Maximum

**Minimum** returns the lowest score among the child evaluators, and **Maximum** returns the highest. Both ignore weights.
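
A short Python sketch of both functions, under the assumption that each child result carries a score and a weight (the helper name `min_max` and the dictionary keys are hypothetical):

```python
def min_max(children):
    """Return (lowest, highest) score across child results.

    `children` is a list of dicts with "score" and "weight" keys.
    The weight is deliberately never read.
    """
    scores = [child["score"] for child in children]
    if not scores:
        raise ValueError("at least one child score is required")
    return min(scores), max(scores)


children = [
    {"name": "Accuracy", "score": 4, "weight": 2},
    {"name": "Clarity", "score": 3, "weight": 1},
]
lowest, highest = min_max(children)
print(lowest, highest)  # 3 4
```

Changing the weights in this example leaves the output unchanged, which is the behavior the Aggregate Function table marks as "Ignores Weights: Yes".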

## Related

<CardGroup cols={2}>
  <Card title="Python Evaluators" icon="python" href="/v2/evaluators/python">
    Create code-based evaluators
  </Card>

  <Card title="LLM Evaluators" icon="sparkles" href="/v2/evaluators/llm">
    Use AI for qualitative assessment
  </Card>

  <Card title="Human Evaluators" icon="user" href="/v2/evaluators/human">
    Enable expert review workflows
  </Card>

  <Card title="Evaluators Introduction" icon="flask" href="/v2/evaluators/introduction">
    Overview of all evaluator types
  </Card>
</CardGroup>
