
# Composite Evaluators

> Combine multiple HoneyHive evaluators into one composite score with weighted aggregation. Roll up accuracy, safety, and style checks into one metric.

Composite evaluators aggregate results from multiple Python, LLM, and Human evaluators into a single score. Use them to create holistic quality metrics that combine different evaluation criteria.

**When to use composite evaluators:**

* Combining multiple quality dimensions into one score
* Creating weighted quality indexes (e.g., accuracy + helpfulness + safety)
* Building hierarchical pass/fail criteria (must pass A before B matters)
* Tracking worst-case or best-case performance across evaluators

## Creating a Composite Evaluator

1. Navigate to the [**Evaluators**](https://app.us.honeyhive.ai/metrics) tab
2. Click **Add Evaluator** and select **Composite Evaluator**
3. Configure the aggregate function and select child evaluators

<Note>
  **Child evaluators**: Only evaluators with numeric, boolean, or categorical return types can be added. String evaluators and other composites are excluded.

  **Composite return type**: Composites can only return **Numeric** or **Boolean**. When set to Boolean, Weighted Average and Weighted Sum are disabled.
</Note>

## Configuration

### Event Filters

Filter which events this composite evaluates using event type, event name, and additional property filters. The composite only aggregates child evaluator results from matching events. See [Event Filters](/v2/evaluators/llm#event-filters) for the full list of supported filter options and operators.

### Aggregate Function

| Function                      | Use Case                      | Ignores Weights                |
| ----------------------------- | ----------------------------- | ------------------------------ |
| **Weighted Average**          | Balanced overall score        | No                             |
| **Weighted Sum**              | Cumulative importance         | No                             |
| **Hierarchical Highest True** | Sequential pass/fail criteria | No (weights act as priorities) |
| **Minimum**                   | Worst-case performance        | Yes                            |
| **Maximum**                   | Best-case performance         | Yes                            |

### Child Evaluators

Select evaluators to include and set their weights. Browse by type: Python, LLM, or Human.

## Aggregate Functions

### Weighted Average

Calculates `Σ(score × weight) / Σ(weights)`.

| Evaluator  | Weight | Score | Contribution                 |
| ---------- | ------ | ----- | ---------------------------- |
| Accuracy   | 2      | 4     | 8                            |
| Clarity    | 1      | 3     | 3                            |
| **Result** |        |       | **(8 + 3) / (2 + 1) = 3.67** |
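
The arithmetic in the table can be reproduced with a few lines of plain Python. This is a minimal sketch of the formula, not HoneyHive's implementation; the function name `weighted_average` and the dictionary inputs are assumptions made for illustration.

```python
def weighted_average(scores, weights):
    """Return sum(score * weight) / sum(weights) over the child evaluators.

    `scores` and `weights` are hypothetical dicts keyed by evaluator name.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Assumption: a zero total weight is treated as a configuration error.
        raise ValueError("weights must not sum to zero")
    weighted_total = sum(scores[name] * weights[name] for name in weights)
    return weighted_total / total_weight


scores = {"Accuracy": 4, "Clarity": 3}
weights = {"Accuracy": 2, "Clarity": 1}

average = weighted_average(scores, weights)
print(round(average, 2))  # 3.67
```

Because the result is divided by the total weight, it stays on the same scale as the child scores regardless of how large the weights are.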

### Weighted Sum

Calculates `Σ(score × weight)`.

| Evaluator  | Weight | Score | Contribution   |
| ---------- | ------ | ----- | -------------- |
| Accuracy   | 2      | 4     | 8              |
| Clarity    | 1      | 3     | 3              |
| **Result** |        |       | **8 + 3 = 11** |
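
The same example as a short Python sketch. It only illustrates the formula; the function name `weighted_sum` and the dictionary inputs are assumptions, not part of the HoneyHive API.

```python
def weighted_sum(scores, weights):
    """Return sum(score * weight) over the child evaluators.

    `scores` and `weights` are hypothetical dicts keyed by evaluator name.
    """
    return sum(scores[name] * weights[name] for name in weights)


scores = {"Accuracy": 4, "Clarity": 3}
weights = {"Accuracy": 2, "Clarity": 1}

total = weighted_sum(scores, weights)
print(total)  # 11
```

Unlike Weighted Average, the result is not normalized, so it grows as weights or the number of child evaluators increase.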

### Hierarchical Highest True

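Applies to boolean child evaluators only. Each child's weight is treated as its priority. Starting at priority 1, the function walks up the priorities in order and returns the highest priority reached before the first false result. Use it for tiered pass/fail criteria where earlier checks must pass before later ones matter.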

| Evaluator          | Priority (Weight) | Result  |
| ------------------ | ----------------- | ------- |
| No PII             | 1                 | ✓ True  |
| Factually Correct  | 2                 | ✓ True  |
| Follows Guidelines | 3                 | ✗ False |
| Has Citations      | 4                 | ✓ True  |

**Result: 2** (priorities 1 and 2 pass consecutively; priority 3 fails, so the chain stops at 2 and the pass at priority 4 does not count)

<Tip>Use for tiered quality gates: basic safety checks at priority 1, correctness at 2, style at 3. The score tells you how far the response got before failing.</Tip>
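
The walk through the priorities can be sketched in Python as follows. This is an illustration under stated assumptions, not HoneyHive's implementation: the function name is hypothetical, priorities are assumed to be consecutive integers starting at 1, and returning `0` when priority 1 fails is an assumption the page does not confirm.

```python
def hierarchical_highest_true(results):
    """Return the highest priority reached before the first non-true result.

    `results` is a hypothetical dict mapping priority (1, 2, 3, ...) to a bool.
    """
    level = 0  # Assumption: 0 means even the priority-1 check did not pass.
    priority = 1
    # Stop at the first priority that is False or missing.
    while results.get(priority) is True:
        level = priority
        priority += 1
    return level


results = {
    1: True,   # No PII
    2: True,   # Factually Correct
    3: False,  # Follows Guidelines
    4: True,   # Has Citations (not counted: the chain already broke at 3)
}

print(hierarchical_highest_true(results))  # 2
```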

### Minimum / Maximum

Minimum returns the lowest score among all child evaluators; Maximum returns the highest. Weights are ignored in both cases.
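
As a rough sketch, both functions reduce to Python's built-in `min` and `max` over the child scores. The function names and the `scores` dictionary are illustrative assumptions, reusing the scores from the weighted examples on this page.

```python
def minimum_score(scores):
    """Return the lowest child score; weights play no role."""
    return min(scores.values())


def maximum_score(scores):
    """Return the highest child score; weights play no role."""
    return max(scores.values())


scores = {"Accuracy": 4, "Clarity": 3}

print(minimum_score(scores))  # 3
print(maximum_score(scores))  # 4
```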

## Related

<CardGroup cols={2}>
  <Card title="Python Evaluators" icon="python" href="/v2/evaluators/python">
    Create code-based evaluators
  </Card>

  <Card title="LLM Evaluators" icon="sparkles" href="/v2/evaluators/llm">
    Use AI for qualitative assessment
  </Card>

  <Card title="Human Evaluators" icon="user" href="/v2/evaluators/human">
    Enable expert review workflows
  </Card>

  <Card title="Evaluators Introduction" icon="flask" href="/v2/evaluators/introduction">
    Overview of all evaluator types
  </Card>
</CardGroup>