# Config as Code

> Learn to define HoneyHive evaluators, datasets, and platform resources as version-controlled files, then apply and sync them to your workspace with the CLI.

Keep your HoneyHive resources (evaluators, datasets, datapoints, experiment runs) checked into your repo as YAML or JSON, and apply them with the [HoneyHive CLI](/v2/cli-reference/getting-started). The CLI publishes a JSON Schema for every command, so the file format stays in lockstep with the public API.

This guide shows how to lay out a `.honeyhive/` directory, discover each resource's schema, and roll out changes from the terminal or CI.

## Why config as code

* **Reviewable**: changes to an evaluator's Python code or an LLM prompt go through the same PR review as the rest of your app.
* **Reproducible**: the resource definitions live next to the code they evaluate, pinned to a commit.
* **Portable**: the same files apply to staging and production projects by swapping `HH_DATA_PLANE_URL` and `HH_API_KEY`.
* **Agent-friendly**: every command exposes its JSON Schema, so coding agents like Cursor and Claude Code can author and validate files without guessing.

## Directory layout

A common convention is to keep one file per resource under `.honeyhive/`:

```
.honeyhive/
├── evaluators/
│   ├── keyword-check.yaml
│   └── relevance-llm.yaml
├── datasets/
│   └── qa-eval-set.yaml
└── datapoints/
    ├── q1.yaml
    └── q2.yaml
```

Each folder maps to exactly one CLI namespace (`experiments/` is omitted from the tree above):

| Folder         | CLI namespace           | API resource                                     |
| -------------- | ----------------------- | ------------------------------------------------ |
| `evaluators/`  | `honeyhive metrics`     | [Metrics](/v2/cli-reference/ref/metrics)         |
| `datasets/`    | `honeyhive datasets`    | [Datasets](/v2/cli-reference/ref/datasets)       |
| `datapoints/`  | `honeyhive datapoints`  | [Datapoints](/v2/cli-reference/ref/datapoints)   |
| `experiments/` | `honeyhive experiments` | [Experiments](/v2/cli-reference/ref/experiments) |

<Tip>
  "Evaluator" and "metric" are the same resource. The product UI and docs call them evaluators; the API and CLI call them metrics.
</Tip>

## Discover the schema

Every CLI command that takes arguments supports two read-only flags:

* `--show-file-schema`, which prints the JSON Schema for the full request object (the exact shape `--filename` accepts).
* `--show-argument-schema <flag-name>`, which prints the JSON Schema for a single argument. Pass the kebab-case flag name without the leading `--`.

Both write pure JSON to stdout and never call the API.

```bash theme={null}
# Schema for a new evaluator file
honeyhive metrics create --show-file-schema

# Schema for just the `criteria` field
honeyhive metrics create --show-argument-schema criteria
```

The output is plain JSON Schema. Pipe it to `jq`, save it to disk, or hand it to a coding agent so it can scaffold a valid file:

```bash theme={null}
mkdir -p .honeyhive/schemas  # the redirect fails if the directory does not exist
honeyhive metrics create --show-file-schema > .honeyhive/schemas/metric.schema.json
```
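To keep a schema snapshot for each command you apply files with, the export can be looped. The following is a minimal sketch, not part of the CLI: `snapshot_schemas` is a hypothetical helper name, the `<namespace>-<action>.schema.json` file naming is an assumption, and it presumes the `create` and `update` commands of the three namespaces used in this guide all support `--show-file-schema` as described above.

```bash
# snapshot_schemas [OUT_DIR]
# Hypothetical helper: saves the file schema of each create/update command
# into OUT_DIR (default: .honeyhive/schemas), one JSON file per command.
snapshot_schemas() {
  local out_dir="${1:-.honeyhive/schemas}"
  local namespace action
  mkdir -p "$out_dir" || return 1
  for namespace in metrics datasets datapoints; do
    for action in create update; do
      # --show-file-schema is read-only and never calls the API.
      honeyhive "$namespace" "$action" --show-file-schema \
        > "$out_dir/$namespace-$action.schema.json" || return 1
    done
  done
}

# Usage (requires the HoneyHive CLI on PATH):
#   snapshot_schemas .honeyhive/schemas
```

Committing the snapshots makes schema changes between CLI releases show up as ordinary diffs in review.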

<Note>
  File schemas use the API field names (`snake_case` or `camelCase`, matching the [public OpenAPI spec](/v2/sdk-reference/openapi-sdks)) rather than the CLI's `--kebab-case` flag names. The CLI's `--filename` flag passes the parsed file straight through to the API with no field translation.
</Note>

## Define an evaluator

[Python evaluators](/v2/evaluators/python) are functions that score events on the server. To define one as a file, capture the same fields you would set in the [Evaluators UI](https://app.us.honeyhive.ai/metrics):

```yaml .honeyhive/evaluators/keyword-check.yaml theme={null}
name: keyword-check
type: PYTHON
return_type: boolean
needs_ground_truth: false
description: Checks whether the response mentions the word "honey".
criteria: |
  def keyword_check():
      return "honey" in outputs["content"].lower()
```

[LLM evaluators](/v2/evaluators/llm) follow the same pattern. The prompt goes in `criteria`, and the model is selected with `model_provider` and `model_name`. `return_type` is `float` for numeric scores, `boolean` for pass/fail, `string` for free-form, or `categorical` when paired with a `categories` list:

```yaml .honeyhive/evaluators/relevance-llm.yaml theme={null}
name: relevance-llm
type: LLM
return_type: float
scale: 5
model_provider: openai
model_name: gpt-4o
sampling_percentage: 25
description: Rates how well the answer addresses the question, 1-5.
criteria: |
  [Instruction]
  Rate the assistant's answer for relevance to the question on a scale of 1 to 5.

  [Question]
  {{ inputs.question }}

  [Answer]
  {{ outputs.content }}

  [Evaluation]
  Rating: [[X]]
```

<Note>
  `scale` is required for any `return_type: float` evaluator (LLM, Python, or composite) because the API needs the upper bound of the rating range. Set it to match the maximum value your prompt asks the model to produce (`5` here, since the prompt rates `1 to 5`). The JSON Schema marks `scale` as nullable, but the API rejects float evaluators that omit it with a bare `400`.
</Note>

To see every field the API accepts (categories, thresholds, child metrics for composites, event filters, etc.), inspect the schema:

```bash theme={null}
honeyhive metrics create --show-file-schema | jq '.properties | keys'
```

## Apply a file

Pass `--filename` (or `-f`) to send the entire file as the request body. The CLI picks the parser from the file extension and accepts `.yaml`, `.yml`, `.json`, and `.jsonc` (comments and trailing commas are allowed in both `.json` and `.jsonc`).

```bash theme={null}
# Create the evaluator
honeyhive metrics create --filename .honeyhive/evaluators/keyword-check.yaml
# {
#   "inserted": true,
#   "metric_id": "01KRJB6SX9YA4J51NRFT6M27RC"
# }
```

The response shape varies per namespace, and the field that carries the assigned ID is named differently in `create` responses, `list` responses, and the `update` file body. To extract IDs reliably, match the namespace to its `jq` expression:

| Namespace    | `create` returns                                                                     | Extract ID with                  | `list` field       | `update` file requires |
| ------------ | ------------------------------------------------------------------------------------ | -------------------------------- | ------------------ | ---------------------- |
| `metrics`    | `{inserted, metric_id}`                                                              | `jq -r '.metric_id'`             | `.metrics[].id`    | `metric_id`            |
| `datasets`   | `{inserted, result: {insertedId}}`                                                   | `jq -r '.result.insertedId'`     | `.datasets[].id`   | `dataset_id`           |
| `datapoints` | `{inserted, result: {insertedIds: [...]}}` (always an array, even for one datapoint) | `jq -r '.result.insertedIds[0]'` | `.datapoints[].id` | `datapoint_id`         |

For the full schema of any namespace, consult the [CLI reference](/v2/cli-reference/namespaces).
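In scripts, the table above can be captured in one small lookup so that call sites don't hard-code the differing `jq` paths. This is a sketch only: `id_filter` is a hypothetical helper name, and the filters are copied from the table rather than derived from the CLI.

```bash
# id_filter NAMESPACE
# Hypothetical helper: prints the jq filter that extracts the new ID from a
# `create` response for NAMESPACE, per the table above.
id_filter() {
  case "$1" in
    metrics)    printf '%s\n' '.metric_id' ;;
    datasets)   printf '%s\n' '.result.insertedId' ;;
    datapoints) printf '%s\n' '.result.insertedIds[0]' ;;
    *)
      printf 'no ID filter known for namespace: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Usage (requires the HoneyHive CLI and jq on PATH):
#   DATASET_ID=$(honeyhive datasets create \
#     --filename .honeyhive/datasets/qa-eval-set.yaml \
#     | jq -r "$(id_filter datasets)")
```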

For evaluators, the `create` response includes the assigned `metric_id`. To make subsequent applies idempotent, pick **one** of these two patterns and stick with it across your repo:

* **Embedded ID**: write the assigned `metric_id` back into the YAML as a top-level field, since `metrics update` reads it directly from the file body. Simple, but files with `metric_id` only validate against the `metrics update` schema (see [Validate before applying](#validate-before-applying)).
* **Lockfile**: keep IDs in a checked-in `.honeyhive/state.json` and look them up at apply time (see [A simple sync script](#a-simple-sync-script)). Pairs well with YAML generated by other tools that shouldn't be mutated, and the YAMLs themselves still validate against the `metrics create` schema.

The two patterns aren't meant to compose: embedding `metric_id` *and* using a lockfile produces YAML that fails the create-schema validator the moment the ID is added.

To update an existing evaluator, include `metric_id` in the file and call `metrics update`:

```yaml .honeyhive/evaluators/keyword-check.yaml theme={null}
metric_id: 01KRJB6SX9YA4J51NRFT6M27RC
name: keyword-check
type: PYTHON
return_type: boolean
description: Checks whether the response mentions the word "honey" or "hive".
criteria: |
  def keyword_check():
      content = outputs["content"].lower()
      return "honey" in content or "hive" in content
```

```bash theme={null}
honeyhive metrics update --filename .honeyhive/evaluators/keyword-check.yaml
```
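If you use the embedded-ID pattern, writing the ID back into the file after the first `metrics create` can be scripted. The sketch below is one possible approach, not a CLI feature: `embed_metric_id` is a hypothetical helper, and it assumes each evaluator file is a single YAML document whose top level is a plain mapping with no leading `---` marker, so that prepending a line yields a valid top-level field.

```bash
# embed_metric_id FILE METRIC_ID
# Hypothetical helper: prepends `metric_id: "<id>"` to FILE. Does nothing if
# the file already has a top-level metric_id, so it is safe to re-run.
embed_metric_id() {
  local file="$1"
  local id="$2"
  local tmp
  if grep -q '^metric_id:' "$file"; then
    return 0
  fi
  tmp="$(mktemp)"
  { printf 'metric_id: "%s"\n' "$id"; cat "$file"; } > "$tmp"
  # Copy the contents back instead of mv so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (requires the HoneyHive CLI and jq on PATH):
#   f=.honeyhive/evaluators/keyword-check.yaml
#   new_id=$(honeyhive metrics create --filename "$f" | jq -r '.metric_id')
#   embed_metric_id "$f" "$new_id"
```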

The same `--filename` flow works for `datasets create`, `datasets update`, `datapoints create`, `datapoints update`, and every other namespace. Run `--show-file-schema` on the command you want to use to see the exact shape.

## Define a dataset

A dataset definition consists of a name, an optional description, and the IDs of the datapoints it should include:

```yaml .honeyhive/datasets/qa-eval-set.yaml theme={null}
name: qa-eval-set
description: Question/answer pairs used for the relevance evaluator.
datapoints: []
```

Apply it the same way:

```bash theme={null}
DATASET_ID=$(honeyhive datasets create \
  --filename .honeyhive/datasets/qa-eval-set.yaml \
  | jq -r '.result.insertedId')
echo "$DATASET_ID"  # print the ID so you can reference it from datapoint files

For each datapoint, define inputs and ground truth and link it to the dataset on create:

```yaml .honeyhive/datapoints/q1.yaml theme={null}
inputs:
  question: What is the capital of France?
ground_truth:
  answer: Paris
metadata:
  external_id: q1
linked_datasets:
  - "01KRJB7WD8E2H4M9X3K2Y7Q1A5"  # paste the dataset_id printed by the create above
```

```bash theme={null}
honeyhive datapoints create --filename .honeyhive/datapoints/q1.yaml
```
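Pasting the dataset ID by hand does not scale past a few datapoints. One way to script it, sketched here under stated assumptions, is to keep a placeholder in the datapoint file and substitute the ID at apply time. The `__DATASET_ID__` placeholder and the `render_datapoint` helper are hypothetical conventions, not something the CLI defines, and the substitution assumes IDs are plain alphanumeric strings like the ones shown in this guide.

```bash
# render_datapoint TEMPLATE DATASET_ID
# Hypothetical helper: writes a temporary .yaml copy of TEMPLATE with every
# __DATASET_ID__ placeholder replaced, and prints the copy's path.
render_datapoint() {
  local template="$1"
  local dataset_id="$2"
  local base out
  base="$(mktemp)"
  # The CLI picks its parser from the extension, so the copy must end in .yaml.
  out="$base.yaml"
  # Safe only for IDs without "/" or "&", which sed would treat specially.
  sed "s/__DATASET_ID__/$dataset_id/g" "$template" > "$out"
  rm -f "$base"
  printf '%s\n' "$out"
}

# Usage (requires the HoneyHive CLI on PATH and DATASET_ID set as above):
#   rendered="$(render_datapoint .honeyhive/datapoints/q1.yaml "$DATASET_ID")"
#   honeyhive datapoints create --filename "$rendered"
#   rm -f "$rendered"
```

The checked-in datapoint files then stay identical across staging and production, where dataset IDs differ.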

<Note>
  Both sides of the dataset/datapoint relationship are writable: `datasets create` accepts an initial `datapoints: [<id>...]` array, and `datapoints create` accepts `linked_datasets: [<id>...]`. For the config-as-code flow, link from the datapoint side as shown above. This keeps the dataset YAML stable across runs and matches the order the API expects when you create resources from scratch.
</Note>

<Note>
  This page covers the **shape** of your dataset as code. For keeping the **contents** of a dataset in sync with an external source (S3, a database, an internal tool), see [Sync from External Sources](/v2/datasets/sync). The two patterns combine well: define the dataset metadata as code, then populate datapoints from an external system.
</Note>

## A simple sync script

The CLI does not ship with a single `honeyhive apply` command yet, so most teams wrap their `.honeyhive/` directory in a small script. The script below implements the lockfile pattern from [Apply a file](#apply-a-file): it upserts every evaluator under `.honeyhive/evaluators/`, tracking IDs in a checked-in `.honeyhive/state.json`. The YAMLs themselves stay free of `metric_id`; the script writes it into a temporary copy before calling `metrics update`. Commit the lockfile alongside your YAML so collaborators and CI share the same IDs.

```bash sync-evaluators.sh theme={null}
#!/usr/bin/env bash
# Usage: HH_API_KEY=... bash sync-evaluators.sh
set -euo pipefail

STATE_FILE=".honeyhive/state.json"
[ -f "$STATE_FILE" ] || echo '{"evaluators": {}}' > "$STATE_FILE"

shopt -s nullglob  # skip the loop entirely when no evaluator files exist
for file in .honeyhive/evaluators/*.yaml; do
  name=$(yq '.name' "$file")
  existing_id=$(jq -r --arg n "$name" '.evaluators[$n] // ""' "$STATE_FILE")

  if [ -n "$existing_id" ]; then
    # Update in place; CLI reads metric_id from the file body.
    # mktemp + .yaml suffix appended manually so this works on macOS (BSD
    # mktemp does not support --suffix).
    base="$(mktemp)"
    tmp="$base.yaml"
    yq ". + {\"metric_id\": \"$existing_id\"}" "$file" > "$tmp"
    honeyhive metrics update --filename "$tmp" > /dev/null
    # Remove both the suffixed copy and the empty file mktemp created.
    rm -f "$base" "$tmp"
    echo "Updated $name ($existing_id)"
  else
    new_id=$(honeyhive metrics create --filename "$file" | jq -r '.metric_id')
    tmp=$(mktemp)
    jq --arg n "$name" --arg id "$new_id" \
      '.evaluators[$n] = $id' "$STATE_FILE" > "$tmp"
    mv "$tmp" "$STATE_FILE"
    echo "Created $name ($new_id)"
  fi
done
```

The CLI requires the file extension to match the parser (`.yaml`, `.yml`, `.json`, or `.jsonc`), which is why the update branch appends `.yaml` to `mktemp`'s output before writing.

<Tip>
  Use `--verbose` (or `HH_VERBOSE=true`) when debugging to log the resolved API URL and masked key for each invocation. This makes it obvious whether a CI job is hitting staging or production.
</Tip>

## Run from CI

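Once `.honeyhive/` is in version control, applying changes is a single CLI step. Note that the job below does not commit `.honeyhive/state.json` back to the repository. With the lockfile pattern, IDs for evaluators first created in CI are lost when the runner exits, and the next run would call `metrics create` for them again, so create new evaluators locally and commit the updated lockfile before merging. A minimal GitHub Actions job: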

```yaml .github/workflows/honeyhive-sync.yml theme={null}
name: Sync HoneyHive resources

on:
  push:
    branches: [main]
    paths: [".honeyhive/**"]

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install HoneyHive CLI
        # Pin to a specific release so CI runs are reproducible.
        run: curl -fsSL https://github.com/honeyhiveai/honeyhive-cli/releases/download/v1.0.0/install.sh | sh
      - name: Apply evaluators
        env:
          HH_API_KEY: ${{ secrets.HH_API_KEY }}
        run: bash sync-evaluators.sh
```

For staging/production parity, pass different `HH_API_KEY` and `HH_DATA_PLANE_URL` values per environment without changing any file under `.honeyhive/`. `HH_API_URL` is deprecated but still works for older setups; new configuration should use `HH_DATA_PLANE_URL`.
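For local runs, the per-environment switch can live in a small shell function. This is a sketch under assumptions: `select_env` and the `HH_STAGING_*` / `HH_PROD_*` variable names are hypothetical and only illustrate the idea; the CLI itself reads just `HH_API_KEY` and `HH_DATA_PLANE_URL`.

```bash
# select_env ENVIRONMENT
# Hypothetical helper: exports HH_API_KEY and HH_DATA_PLANE_URL from
# per-environment variables (names are assumptions). Aborts with a message
# if a required variable is unset, and rejects unknown environments.
select_env() {
  case "$1" in
    staging)
      export HH_API_KEY="${HH_STAGING_API_KEY:?set HH_STAGING_API_KEY}"
      export HH_DATA_PLANE_URL="${HH_STAGING_DATA_PLANE_URL:?set HH_STAGING_DATA_PLANE_URL}"
      ;;
    production)
      export HH_API_KEY="${HH_PROD_API_KEY:?set HH_PROD_API_KEY}"
      export HH_DATA_PLANE_URL="${HH_PROD_DATA_PLANE_URL:?set HH_PROD_DATA_PLANE_URL}"
      ;;
    *)
      printf 'unknown environment: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Usage:
#   select_env staging && bash sync-evaluators.sh
```

Failing on an unknown environment name, rather than falling back to a default, avoids applying files to production by accident.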

## Validate before applying

Before pushing changes, validate each file against the API's schema: export the schema with `--show-file-schema`, then run a JSON Schema validator (`ajv`, `check-jsonschema`, etc.) against your files. Each command publishes its own schema: `metrics create` requires `name`, `type`, and `criteria` and rejects unknown fields, while `metrics update` requires `metric_id` and accepts the same body keys.

Pick the schema that matches the pattern you chose in [Apply a file](#apply-a-file):

<CodeGroup>
  ```bash Lockfile pattern theme={null}
  # Files have no metric_id; validate against the create schema.
  honeyhive metrics create --show-file-schema > /tmp/metric-create.schema.json
  check-jsonschema --schemafile /tmp/metric-create.schema.json .honeyhive/evaluators/*.yaml
  ```

  ```bash Embedded-ID pattern theme={null}
  # Files have metric_id embedded; validate against the update schema.
  honeyhive metrics update --show-file-schema > /tmp/metric-update.schema.json
  check-jsonschema --schemafile /tmp/metric-update.schema.json .honeyhive/evaluators/*.yaml
  ```
</CodeGroup>

This catches typos, missing required fields, and invalid enum values locally, before the request reaches the API.

<Note>
  The create schema sets `additionalProperties: false`, so a file written for the embedded-ID update flow will fail validation against it. Validate against whichever schema matches the pattern you picked in [Apply a file](#apply-a-file).
</Note>
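In a repo that uses the lockfile pattern, a cheap pre-check can flag files where a `metric_id` was embedded by mistake, before the schema validator reports a less specific "unknown field" error. The following is a minimal sketch: `find_embedded_ids` is a hypothetical helper, and it only detects `metric_id` written as an unindented top-level key.

```bash
# find_embedded_ids DIR
# Hypothetical helper: prints every .yaml/.yml file in DIR that has a
# top-level metric_id key. Returns 0 if none are found, 1 otherwise.
find_embedded_ids() {
  local dir="$1"
  local found=0
  local file
  for file in "$dir"/*.yaml "$dir"/*.yml; do
    # An unmatched glob stays literal; skip it.
    [ -e "$file" ] || continue
    if grep -q '^metric_id:' "$file"; then
      printf '%s\n' "$file"
      found=1
    fi
  done
  return "$found"
}

# Usage:
#   if ! find_embedded_ids .honeyhive/evaluators; then
#     echo "metric_id found in lockfile-pattern files (listed above)" >&2
#     exit 1
#   fi
```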

## Use with coding agents

The schema introspection flags are designed for AI coding agents. An agent that wants to add a new evaluator can:

1. Run `honeyhive metrics create --show-file-schema` to get the JSON Schema.
2. Generate a YAML file that conforms to it, placed under `.honeyhive/evaluators/`.
3. Apply it with `honeyhive metrics create --filename ...`.

See [AI Coding Agents](/v2/introduction/ai-coding-agents#honeyhive-cli) for the pre-built HoneyHive Skills that bundle this workflow into agent-friendly slash commands.

## Related references

<CardGroup cols={2}>
  <Card title="HoneyHive CLI" icon="terminal" href="/v2/cli-reference/getting-started">
    Install the CLI and run your first commands.
  </Card>

  <Card title="CLI Command Reference" icon="book" href="/v2/cli-reference/namespaces">
    Every namespace, command, and flag in one place.
  </Card>

  <Card title="Sync Datasets from External Sources" icon="rotate" href="/v2/datasets/sync">
    Keep dataset contents in sync with S3, databases, or internal tools.
  </Card>

  <Card title="Python and LLM Evaluators" icon="flask" href="/v2/evaluators/introduction">
    Background on the evaluator model the YAML files describe.
  </Card>
</CardGroup>