- Combining multiple quality dimensions into one score
- Creating weighted quality indexes (e.g., accuracy + helpfulness + safety)
- Building hierarchical pass/fail criteria (must pass A before B matters)
- Tracking worst-case or best-case performance across evaluators
Creating a Composite Evaluator
- Navigate to the Evaluators tab
- Click Add Evaluator and select Composite Evaluator
- Configure the aggregate function and select child evaluators
Child evaluators: Only evaluators with numeric, boolean, or categorical return types can be added. String evaluators and other composites are excluded.

Composite return type: Composites can only return Numeric or Boolean. When set to Boolean, Weighted Average and Weighted Sum are disabled.
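The two constraints above can be expressed as a small pre-flight check. This is only an illustrative sketch: the function name, the lowercase type labels, and the aggregate identifiers are hypothetical and are not part of the product's API.

```python
# Hypothetical identifiers for illustration only; not the product's API.
ALLOWED_CHILD_TYPES = {"numeric", "boolean", "categorical"}
ALLOWED_RETURN_TYPES = {"numeric", "boolean"}
DISABLED_FOR_BOOLEAN = {"weighted_average", "weighted_sum"}


def validate_composite(return_type, aggregate, child_types):
    """Return a list of problems; an empty list means these checks pass.

    child_types maps a child evaluator name to its return type label.
    """
    problems = []
    if return_type not in ALLOWED_RETURN_TYPES:
        problems.append(f"unsupported composite return type: {return_type}")
    if return_type == "boolean" and aggregate in DISABLED_FOR_BOOLEAN:
        problems.append(f"{aggregate} is disabled for Boolean composites")
    for name, child_type in child_types.items():
        if child_type not in ALLOWED_CHILD_TYPES:
            problems.append(f"{name}: {child_type} evaluators cannot be children")
    return problems


print(validate_composite("numeric", "weighted_average", {"Accuracy": "numeric"}))
print(validate_composite("boolean", "weighted_sum", {"Notes": "string"}))
```

The first call reports no problems; the second reports both the disabled aggregate and the excluded string child.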
Configuration
Event Filters
Filter which events this composite evaluates using event type, event name, and additional property filters. The composite only aggregates child evaluator results from matching events. See Event Filters for the full list of supported filter options and operators.

Aggregate Function
| Function | Use Case | Ignores Weights |
|---|---|---|
| Weighted Average | Balanced overall score | No |
| Weighted Sum | Cumulative importance | No |
| Hierarchical Highest True | Sequential pass/fail criteria | No (weights act as priorities) |
| Minimum | Worst-case performance | Yes |
| Maximum | Best-case performance | Yes |
Child Evaluators
Select evaluators to include and set their weights. Browse by type: Python, LLM, or Human.

Aggregate Functions
Weighted Average
Calculates Σ(score × weight) / Σ(weights).
| Evaluator | Weight | Score | Contribution |
|---|---|---|---|
| Accuracy | 2 | 4 | 8 |
| Clarity | 1 | 3 | 3 |
| Result | | | (8 + 3) / (2 + 1) ≈ 3.67 |
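The arithmetic in the table can be reproduced with a few lines of Python. This is a minimal sketch of the formula only; the function name and the `(score, weight)` pair layout are assumptions made for illustration, and the zero-weight guard is an added safety check rather than documented behavior.

```python
def weighted_average(scores_and_weights):
    """Compute Σ(score × weight) / Σ(weights) over (score, weight) pairs."""
    total_weight = sum(weight for _, weight in scores_and_weights)
    if total_weight == 0:
        # Assumption: treat an all-zero weighting as a configuration error.
        raise ValueError("weights must not sum to zero")
    weighted_total = sum(score * weight for score, weight in scores_and_weights)
    return weighted_total / total_weight


# Accuracy: score 4, weight 2. Clarity: score 3, weight 1.
result = weighted_average([(4, 2), (3, 1)])
print(round(result, 2))  # 3.67
```

Because the total is divided by the sum of the weights, the result stays on the same scale as the child scores.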
Weighted Sum
Calculates Σ(score × weight).
| Evaluator | Weight | Score | Contribution |
|---|---|---|---|
| Accuracy | 2 | 4 | 8 |
| Clarity | 1 | 3 | 3 |
| Result | | | 8 + 3 = 11 |
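As a quick check of the table, the same sum can be computed directly. The function name and the `(score, weight)` pair layout below are illustrative assumptions, not the product's API.

```python
def weighted_sum(scores_and_weights):
    """Compute Σ(score × weight) over (score, weight) pairs."""
    return sum(score * weight for score, weight in scores_and_weights)


# Accuracy: score 4, weight 2. Clarity: score 3, weight 1.
print(weighted_sum([(4, 2), (3, 1)]))  # 11
```

Because nothing is divided out, the result grows with the weights: doubling both weights in this example would give 22.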
Hierarchical Highest True
For boolean evaluators only. Returns the priority level of the highest consecutive true result, starting from priority 1. Useful for tiered pass/fail criteria where earlier checks must pass before later ones matter.

| Evaluator | Priority (Weight) | Result |
|---|---|---|
| No PII | 1 | ✓ True |
| Factually Correct | 2 | ✓ True |
| Follows Guidelines | 3 | ✗ False |
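| Has Citations | 4 | ✓ True |

In this example the composite returns 2: priorities 1 and 2 pass consecutively, and the failure at priority 3 stops the count, so the passing result at priority 4 does not matter.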
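The rule can be traced on the table's values with a short sketch. The function name and the `(priority, passed)` pair layout are hypothetical. Returning 0 when the priority 1 check fails is an assumption made here, since the text does not specify that case.

```python
def hierarchical_highest_true(results):
    """Return the highest priority reached by consecutive passes.

    results is an iterable of (priority, passed) pairs. Checks are walked
    in ascending priority order and the walk stops at the first failure.
    """
    highest = 0  # Assumption: 0 means not even the first check passed.
    for priority, passed in sorted(results, key=lambda pair: pair[0]):
        if not passed:
            break
        highest = priority
    return highest


checks = [
    (1, True),   # No PII
    (2, True),   # Factually Correct
    (3, False),  # Follows Guidelines
    (4, True),   # Has Citations
]
print(hierarchical_highest_true(checks))  # 2
```

Sorting by priority first means the order in which child results arrive does not affect the outcome.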
Minimum / Maximum
Returns the lowest or highest score among all child evaluators. Weights are ignored.

Related
- Python Evaluators: Create code-based evaluators
- LLM Evaluators: Use AI for qualitative assessment
- Human Evaluators: Enable expert review workflows
- Evaluators Introduction: Overview of all evaluator types

