> ## Documentation Index
> Fetch the complete documentation index at: https://docs.honeyhive.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Templates

> Configure standard evaluators and monitoring charts as org-wide templates so every HoneyHive project inherits baseline evaluators, charts, and dashboards.

Organization Templates let platform teams define a standard set of evaluators and monitoring charts that automatically populate across new projects. Instead of each team configuring observability from scratch, every new project starts with the resources your organization has standardized on.

<Info>
  Organization Templates are only available on the **Enterprise** plan. Managing templates requires `workspace.templates.get` and `workspace.templates.set`; Workspace Admins receive these by default on newly created organizations. On existing organizations, an org admin may need to add `workspace.templates.*` to the Workspace Admin role under **Settings > Organization > Roles**.
</Info>

## How templates work

Templates are configured via a YAML manifest in **Settings > Organization > Templates**.

The manifest has two parts:

1. **Template definitions** - Reusable blueprints for evaluators and charts
2. **Project templates** - Which definitions to apply when a new project is created

When someone creates a new project, HoneyHive reads the manifest and creates the listed resources automatically.

## Manifest structure

```yaml theme={null}
template_definitions:
  metric:
    # ... evaluator definitions
  chart:
    # ... chart definitions

config:
  merge_strategy: "replace" # or "merge"

project_templates:
  metric:
    - # evaluator names from definitions above
  chart:
    - # chart names from definitions above
```

## Evaluator definitions

Define evaluators under `template_definitions.metric`. HoneyHive supports four evaluator types:

| Type          | YAML `type` value | Key fields                        | Use case                               |
| ------------- | ----------------- | --------------------------------- | -------------------------------------- |
| **Human**     | `HUMAN`           | `criteria`, `scale`               | Domain expert annotation               |
| **Python**    | `CUSTOM`          | `code_snippet`                    | Programmatic checks, format validation |
| **LLM**       | `MODEL`           | `prompt`                          | Qualitative assessments via AI         |
| **Composite** | `COMPOSITE`       | `aggregation_function`, `details` | Aggregate multiple evaluator scores    |

### Common fields

Every evaluator definition supports these fields:

| Field                 | Description                                                    |
| --------------------- | -------------------------------------------------------------- |
| `type`                | Evaluator type: `HUMAN`, `CUSTOM`, `MODEL`, or `COMPOSITE`     |
| `description`         | Short description of what the evaluator measures               |
| `enabled_in_prod`     | Whether the evaluator runs on production traces                |
| `needs_ground_truth`  | Whether ground truth data is required                          |
| `return_type`         | Data type: `string`, `float`, or `boolean`                     |
| `threshold`           | Passing range with `min` and `max`, or `null` for no threshold |
| `filters.filterArray` | Event filters that control which events trigger this evaluator |
| `sampling_percentage` | Percentage of production events to evaluate (1-100)            |

### Type-specific fields

<Accordion title="Human evaluators">
  | Field      | Description                                                                         |
  | ---------- | ----------------------------------------------------------------------------------- |
  | `criteria` | The evaluation question shown to reviewers                                          |
  | `scale`    | Numeric scale upper bound (e.g., `5` for 1-5 rating). `null` for non-numeric types. |

  ```yaml theme={null}
  Rating:
    type: "HUMAN"
    description: "How would you rate this overall?"
    enabled_in_prod: true
    needs_ground_truth: false
    return_type: "float"
    threshold:
      min: 3
      max: 5
    filters:
      filterArray: []
    sampling_percentage: 100
    criteria: "How would you rate this overall?"
    scale: 5
  ```
</Accordion>

<Accordion title="Python evaluators">
  | Field          | Description                                                       |
  | -------------- | ----------------------------------------------------------------- |
  | `code_snippet` | Python function that receives an `event` dict and returns a value |

  The function has access to Python's standard library and packages including `pandas`, `scikit-learn`, `jsonschema`, `sqlglot`, and `requests`.

  ```yaml theme={null}
  Format Check:
    type: "CUSTOM"
    description: "Checks output is valid JSON"
    enabled_in_prod: true
    needs_ground_truth: false
    return_type: "boolean"
    threshold: null
    filters:
      filterArray: []
    sampling_percentage: 100
    code_snippet: |
      import json
      def evaluate(event):
          try:
              json.loads(event["outputs"]["content"])
              return True
          except (json.JSONDecodeError, KeyError):
              return False
      result = evaluate(event)
  ```
</Accordion>

<Accordion title="LLM evaluators">
  | Field    | Description                                                          |
  | -------- | -------------------------------------------------------------------- |
  | `prompt` | Evaluation prompt using `{{ }}` syntax to reference event properties |

  ```yaml theme={null}
  Answer Relevance:
    type: "MODEL"
    description: "Rates how relevant the answer is to the question"
    enabled_in_prod: true
    needs_ground_truth: false
    return_type: "float"
    threshold:
      min: 3
      max: 5
    filters:
      filterArray: []
    sampling_percentage: 25
    prompt: |
      Evaluate the AI assistant's answer for relevance to the question.
      Rate on a scale of 1 to 5.

      [Question]
      {{ inputs.question }}

      [Answer]
      {{ outputs.content }}

      Rating: [[X]]
  ```
</Accordion>

<Accordion title="Composite evaluators">
  | Field                  | Description                                                                                          |
  | ---------------------- | ---------------------------------------------------------------------------------------------------- |
  | `aggregation_function` | How to combine scores: `weighted_average`, `weighted_sum`, `min`, `max`, `hierarchical_highest_true` |
  | `details`              | List of child evaluators with `metric_name` and `weight`                                             |

  ```yaml theme={null}
  Overall Quality:
    type: "COMPOSITE"
    description: "Weighted average of relevance and format checks"
    enabled_in_prod: true
    needs_ground_truth: false
    return_type: "float"
    threshold:
      min: 3
      max: 5
    filters:
      filterArray: []
    sampling_percentage: 100
    aggregation_function: "weighted_average"
    details:
      - metric_name: "Answer Relevance"
        weight: 2
      - metric_name: "Format Check"
        weight: 1
  ```
</Accordion>

## Chart definitions

Define charts under `template_definitions.chart`. Each chart specifies what to measure, how to aggregate, and how to filter.

| Field                | Description                                                                                  |
| -------------------- | -------------------------------------------------------------------------------------------- |
| `metric`             | What to measure: `count`, `duration`, or a dotted path like `metadata.total_tokens`          |
| `func`               | Aggregation: `sum`, `avg`, `cumsum`, `min`, `max`, `p50`, `p95`, `p99`                       |
| `bucketing`          | Time bucket: `minute`, `hour`, `day`, `week`, `month`                                        |
| `dateRange.relative` | Default time range: `1d`, `7d`, `30d`                                                        |
| `groupBy`            | Optional. Group results by a field (e.g., `event_name`)                                      |
| `query`              | Filters with `field`, `value`, `type`, and `operator` (`is`, `is not`, `contains`, `exists`) |

<Accordion title="Example chart definitions">
  ```yaml theme={null}
  "Daily Session Count":
    metric: "count"
    func: "sum"
    bucketing: "day"
    dateRange:
      relative: "7d"
    query:
      - field: "event_type"
        value: "session"
        type: "string"
        operator: "is"

  "Average LLM Call Duration":
    metric: "duration"
    func: "avg"
    bucketing: "hour"
    dateRange:
      relative: "7d"
    query:
      - field: "event_type"
        value: "model"
        type: "string"
        operator: "is"

  "Cumulative Total Tokens Usage":
    metric: "metadata.total_tokens"
    func: "cumsum"
    bucketing: "day"
    dateRange:
      relative: "7d"
    query:
      - field: "event_type"
        value: "model"
        type: "string"
        operator: "is"

  "Average Grouped Event Duration":
    metric: "duration"
    func: "avg"
    groupBy: "event_name"
    bucketing: "hour"
    dateRange:
      relative: "7d"
    query:
      - field: "event_type"
        value: "session"
        type: "string"
        operator: "is not"
  ```
</Accordion>

## Project templates

The `project_templates` section lists which definitions are applied when a project is created. Reference definitions by name:

```yaml theme={null}
project_templates:
  metric:
    - Rating
    - Format Check
    - Answer Relevance
  chart:
    - "Daily Session Count"
    - "Average LLM Call Duration"
    - "Cumulative Total Tokens Usage"
```