Organization Templates are only available on the Enterprise plan.
How templates work
Templates are configured via a YAML manifest in Settings > Organization > Templates. The manifest has two parts:
- Template definitions - Reusable blueprints for evaluators and charts
- Project templates - Which definitions to apply when a new project is created
Manifest structure
Evaluator definitions
Define evaluators under template_definitions.metric. HoneyHive supports four evaluator types:
| Type | YAML type value | Key fields | Use case |
|---|---|---|---|
| Human | HUMAN | criteria, scale | Domain expert annotation |
| Python | CUSTOM | code_snippet | Programmatic checks, format validation |
| LLM | MODEL | prompt | Qualitative assessments via AI |
| Composite | COMPOSITE | aggregation_function, details | Aggregate multiple evaluator scores |
Common fields
Every evaluator definition supports these fields:
| Field | Description |
|---|---|
| type | Evaluator type: HUMAN, CUSTOM, MODEL, or COMPOSITE |
| description | Short description of what the evaluator measures |
| enabled_in_prod | Whether the evaluator runs on production traces |
| needs_ground_truth | Whether ground truth data is required |
| return_type | Data type: string, float, or boolean |
| threshold | Passing range with min and max, or null for no threshold |
| filters.filterArray | Event filters that control which events trigger this evaluator |
| sampling_percentage | Percentage of production events to evaluate (1-100) |
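To make the threshold field concrete, here is a minimal Python sketch of how a passing range could be checked. It is an illustration only: the function name passes_threshold is hypothetical, and the inclusive bounds and the treatment of a missing bound as unbounded are assumptions, not documented HoneyHive behavior.

```python
from typing import Optional


def passes_threshold(value: float, threshold: Optional[dict]) -> bool:
    """Return True when value falls inside the threshold range.

    Assumptions (not confirmed by the documentation): bounds are
    inclusive, and a missing or None bound leaves that side open.
    """
    if threshold is None:  # a null threshold means nothing to enforce
        return True
    lower = threshold.get("min")
    upper = threshold.get("max")
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

For example, passes_threshold(4, {"min": 3, "max": 5}) passes under these assumptions, while a score of 2 against the same range does not.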
Type-specific fields
Human evaluators
| Field | Description |
|---|---|
| criteria | The evaluation question shown to reviewers |
| scale | Numeric scale upper bound (e.g., 5 for a 1-5 rating). null for non-numeric types. |
Python evaluators
| Field | Description |
|---|---|
| code_snippet | Python function that receives an event dict and returns a value |
Libraries available to the snippet include pandas, scikit-learn, jsonschema, sqlglot, and requests.
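As an illustration of a code_snippet, the sketch below shows a format-validation check that returns a boolean. The function name evaluate and the event layout (an outputs dict holding a content string) are hypothetical choices for this example; check the shape of your own events before adapting it.

```python
import json


def evaluate(event: dict) -> bool:
    """Return True when the model output parses as a JSON object.

    The event layout used here (event["outputs"]["content"]) is a
    hypothetical shape chosen for illustration.
    """
    outputs = event.get("outputs") or {}
    content = outputs.get("content", "")
    try:
        parsed = json.loads(content)
    except (TypeError, ValueError):  # non-string input or invalid JSON
        return False
    return isinstance(parsed, dict)
```

A check like this would pair with return_type set to boolean, since it yields a pass or fail result rather than a score.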
LLM evaluators
| Field | Description |
|---|---|
prompt | Evaluation prompt using {{ }} syntax to reference event properties |
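The sketch below shows one way {{ }} placeholders could be filled from an event, to clarify what "reference event properties" means in practice. The dotted-path lookup, the helper names resolve and render_prompt, and the choice to render missing properties as empty strings are all assumptions; the actual templating rules may differ.

```python
import re

# Matches {{ some.dotted.path }} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(event: dict, path: str):
    """Walk a dotted path through nested dicts; return None if absent."""
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def render_prompt(template: str, event: dict) -> str:
    """Replace each placeholder with the matching event property."""
    def substitute(match):
        value = resolve(event, match.group(1))
        # Assumption: missing properties render as an empty string.
        return "" if value is None else str(value)

    return PLACEHOLDER.sub(substitute, template)
```

With an event holding inputs.question and outputs.content, a template such as "Question: {{ inputs.question }}" would render with the question text in place of the placeholder.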
Composite evaluators
| Field | Description |
|---|---|
| aggregation_function | How to combine scores: weighted_average, weighted_sum, min, max, hierarchical_highest_true |
| details | List of child evaluators with metric_name and weight |
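The following sketch illustrates how four of the aggregation functions could combine child scores using the metric_name and weight entries from details. The function aggregate and the scores mapping are hypothetical, and hierarchical_highest_true is left out because its semantics are not described here.

```python
def aggregate(scores: dict, details: list, aggregation_function: str) -> float:
    """Combine child evaluator scores according to aggregation_function.

    scores maps each child metric_name to its numeric score (a
    hypothetical input shape); details mirrors the manifest field.
    """
    pairs = [(scores[d["metric_name"]], d["weight"]) for d in details]
    values = [score for score, _ in pairs]

    if aggregation_function == "weighted_sum":
        return sum(score * weight for score, weight in pairs)
    if aggregation_function == "weighted_average":
        total_weight = sum(weight for _, weight in pairs)
        return sum(score * weight for score, weight in pairs) / total_weight
    if aggregation_function == "min":
        return min(values)
    if aggregation_function == "max":
        return max(values)
    raise ValueError(f"unsupported aggregation_function: {aggregation_function}")
```

With scores of 4.0 and 2.0 weighted 3 and 1, weighted_sum gives 14.0 and weighted_average gives 3.5, which shows how weights shift the result toward the heavier child.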
Chart definitions
Define charts under template_definitions.chart. Each chart specifies what to measure, how to aggregate, and how to filter.
| Field | Description |
|---|---|
| metric | What to measure: count, duration, or a dotted path like metadata.total_tokens |
| func | Aggregation: sum, avg, cumsum, min, max, p50, p95, p99 |
| bucketing | Time bucket: minute, hour, day, week, month |
| dateRange.relative | Default time range: 1d, 7d, 30d |
| groupBy | Optional. Group results by a field (e.g., event_name) |
| query | Filters with field, value, type, and operator (is, is not, contains, exists) |
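To show how the query operators narrow the events a chart measures, here is a minimal sketch of filter matching. It assumes flat field lookup on the event, that all conditions must hold together, and that the type field can be ignored for the comparison; none of these are confirmed, and the names matches and apply_query are hypothetical.

```python
def matches(event: dict, condition: dict) -> bool:
    """Evaluate one query condition against a single event."""
    field = condition["field"]
    operator = condition["operator"]
    expected = condition.get("value")
    present = field in event
    actual = event.get(field)

    if operator == "exists":
        return present
    if operator == "is":
        return present and actual == expected
    if operator == "is not":
        return actual != expected
    if operator == "contains":
        return isinstance(actual, str) and str(expected) in actual
    raise ValueError(f"unsupported operator: {operator}")


def apply_query(events: list, query: list) -> list:
    """Keep events that satisfy every condition (assumed AND semantics)."""
    return [e for e in events if all(matches(e, c) for c in query)]
```

A query with a single condition on event_name using the is operator would, under these assumptions, keep only the events carrying that exact name.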
Example chart definitions
Project templates
The project_templates section lists which definitions are applied when a project is created. Reference each definition by its name.
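The sketch below illustrates the idea of name-based references by resolving them against the definitions in a manifest that has already been parsed into a Python dict. The manifest layout used here (definitions keyed by name under metric and chart, and project_templates holding lists of names per kind) is an assumption made for illustration, as is the function name resolve_templates.

```python
def resolve_templates(manifest: dict) -> dict:
    """Look up each referenced name in template_definitions.

    Assumed layout (hypothetical): template_definitions maps a kind
    ("metric" or "chart") to {name: definition}, and project_templates
    maps the same kinds to lists of names.
    """
    definitions = manifest["template_definitions"]
    resolved = {}
    for kind, names in manifest["project_templates"].items():
        available = definitions.get(kind, {})
        missing = [name for name in names if name not in available]
        if missing:
            # A reference to an undefined name is treated as an error.
            raise KeyError(f"undefined {kind} definitions: {missing}")
        resolved[kind] = {name: available[name] for name in names}
    return resolved
```

The point of the sketch is that a project template carries only names, so a typo in a name leaves nothing to apply; validating references before saving the manifest catches that early.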

