Organization Templates let platform teams define a standard set of evaluators and monitoring charts that automatically populate across new projects. Instead of each team configuring observability from scratch, every new project starts with the resources your organization has standardized on.
Organization Templates are only available on the Enterprise plan.
How templates work
Templates are configured via a YAML manifest in Settings > Organization > Templates. The manifest has two parts:

- Template definitions - Reusable blueprints for evaluators and charts
- Project templates - Which definitions to apply when a new project is created
Manifest structure
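For orientation, here is a minimal sketch of the manifest layout. The top-level keys follow the sections described below; the definition names and the metrics/charts keys inside the project template are illustrative assumptions, not a fixed schema.

```yaml
template_definitions:
  metric:
    # Evaluator blueprints, keyed by a name you choose (name is illustrative)
    answer-relevance:
      type: MODEL
      description: Rates how well the answer addresses the question
  chart:
    # Chart blueprints, keyed by a name you choose (name is illustrative)
    request-volume:
      metric: count
      func: sum
      bucketing: hour

project_templates:
  # Assumed shape: a template listing the definitions to apply to new projects
  default:
    metrics:
      - answer-relevance
    charts:
      - request-volume
```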
Evaluator definitions
Define evaluators under template_definitions.metric. HoneyHive supports four evaluator types:
| Type | YAML type value | Key fields | Use case |
|---|---|---|---|
| Human | HUMAN | criteria, scale | Domain expert annotation |
| Python | CUSTOM | code_snippet | Programmatic checks, format validation |
| LLM | MODEL | prompt | Qualitative assessments via AI |
| Composite | COMPOSITE | aggregation_function, details | Aggregate multiple evaluator scores |
Common fields
Every evaluator definition supports these fields:

| Field | Description |
|---|---|
| type | Evaluator type: HUMAN, CUSTOM, MODEL, or COMPOSITE |
| description | Short description of what the evaluator measures |
| enabled_in_prod | Whether the evaluator runs on production traces |
| needs_ground_truth | Whether ground truth data is required |
| return_type | Data type: string, float, or boolean |
| threshold | Passing range with min and max, or null for no threshold |
| filters.filterArray | Event filters that control which events trigger this evaluator |
| sampling_percentage | Percentage of production events to evaluate (1-100) |
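As an illustration, a hypothetical evaluator definition using these common fields might look like the following; the definition name and the empty filterArray are placeholders, and the exact filter shape may differ in your manifest.

```yaml
template_definitions:
  metric:
    response-completeness:        # hypothetical name
      type: MODEL
      description: Checks whether the response fully addresses the user question
      enabled_in_prod: true
      needs_ground_truth: false
      return_type: boolean
      threshold: null             # or a passing range with min and max
      filters:
        filterArray: []           # empty list: no event filters applied
      sampling_percentage: 10     # evaluate 10% of production events
      # ...plus type-specific fields (prompt, code_snippet, etc.) covered below
```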
Type-specific fields
Human evaluators
| Field | Description |
|---|---|
| criteria | The evaluation question shown to reviewers |
| scale | Numeric scale upper bound (e.g., 5 for 1-5 rating). null for non-numeric types. |
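A sketch of a human evaluator definition, nested under template_definitions.metric as described above; the name and wording are illustrative.

```yaml
helpfulness:                      # hypothetical name
  type: HUMAN
  description: Domain expert rating of response helpfulness
  return_type: float
  enabled_in_prod: false
  criteria: How helpful is this response for the end user?
  scale: 5                        # 1-5 rating; use null for non-numeric criteria
```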
Python evaluators
| Field | Description |
|---|---|
| code_snippet | Python function that receives an event dict and returns a value |
Python evaluators can use pandas, scikit-learn, jsonschema, sqlglot, and requests.
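A sketch of a Python evaluator; the definition name, the function name inside code_snippet, and the event dict fields it reads are assumptions for illustration.

```yaml
valid-json:                       # hypothetical name
  type: CUSTOM
  description: Checks that the model output parses as JSON
  return_type: boolean
  enabled_in_prod: true
  code_snippet: |
    import json

    def evaluate(event):
        # event field names here are illustrative, not a guaranteed schema
        try:
            json.loads(event["outputs"]["content"])
            return True
        except (KeyError, TypeError, ValueError):
            return False
```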
LLM evaluators
| Field | Description |
|---|---|
| prompt | Evaluation prompt using {{ }} syntax to reference event properties |
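A sketch of an LLM evaluator; the event properties referenced inside the {{ }} placeholders are assumptions for illustration.

```yaml
answer-relevance:                 # hypothetical name
  type: MODEL
  description: Rates how well the answer addresses the question
  return_type: float
  threshold:
    min: 3
    max: 5
  prompt: |
    Rate from 1 to 5 how well the response answers the question.
    Question: {{ inputs.question }}
    Response: {{ outputs.content }}
    Reply with a single number.
```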
Composite evaluators
| Field | Description |
|---|---|
| aggregation_function | How to combine scores: weighted_average, weighted_sum, min, max, hierarchical_highest_true |
| details | List of child evaluators with metric_name and weight |
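A sketch of a composite evaluator that combines the hypothetical evaluators from the earlier examples; the names and weights are illustrative.

```yaml
overall-quality:                  # hypothetical name
  type: COMPOSITE
  description: Weighted blend of relevance and format checks
  return_type: float
  aggregation_function: weighted_average
  details:
    - metric_name: answer-relevance
      weight: 0.7
    - metric_name: valid-json
      weight: 0.3
```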
Chart definitions
Define charts under template_definitions.chart. Each chart specifies what to measure, how to aggregate, and how to filter.
| Field | Description |
|---|---|
| metric | What to measure: count, duration, or a dotted path like metadata.total_tokens |
| func | Aggregation: sum, avg, cumsum, min, max, p50, p95, p99 |
| bucketing | Time bucket: minute, hour, day, week, month |
| dateRange.relative | Default time range: 1d, 7d, 30d |
| groupBy | Optional. Group results by a field (e.g., event_name) |
| query | Filters with field, value, type, and operator (is, is not, contains, exists) |
Example chart definitions
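For illustration, two hypothetical chart definitions; the chart names, filter values, and the list shape of query are assumptions based on the fields above.

```yaml
template_definitions:
  chart:
    token-usage:                  # hypothetical name
      metric: metadata.total_tokens
      func: sum
      bucketing: day
      dateRange:
        relative: 7d
      groupBy: event_name
    p95-latency:                  # hypothetical name
      metric: duration
      func: p95
      bucketing: hour
      dateRange:
        relative: 1d
      query:
        - field: event_type       # illustrative filter
          value: model
          type: string
          operator: is
```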
Project templates
The project_templates section lists which definitions are applied when a project is created. Reference definitions by name:
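For example, a project template referencing the hypothetical definitions above might look like this; the template name and the metrics/charts keys are assumptions for illustration.

```yaml
project_templates:
  default:                        # hypothetical template name
    metrics:
      - answer-relevance
      - overall-quality
    charts:
      - token-usage
      - p95-latency
```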

