.honeyhive/ directory, discover each resource’s schema, and roll out changes from the terminal or CI.
The CLI is @honeyhive/cli. See Install & Quickstart to install it, and the full command reference for every namespace.
Why config as code
- Reviewable: changes to an evaluator’s Python code or an LLM prompt go through the same PR review as the rest of your app.
- Reproducible: the resource definitions live next to the code they evaluate, pinned to a commit.
- Portable: the same files apply to staging and production projects by swapping HH_DATA_PLANE_URL and HH_API_KEY.
- Agent-friendly: every command exposes its JSON Schema, so coding agents like Cursor and Claude Code can author and validate files without guessing.
Directory layout
A common convention is to keep one file per resource under .honeyhive/:
| Folder | CLI namespace | API resource |
|---|---|---|
| evaluators/ | honeyhive metrics | Metrics |
| datasets/ | honeyhive datasets | Datasets |
| datapoints/ | honeyhive datapoints | Datapoints |
| experiments/ | honeyhive experiments | Experiments |
Discover the schema
Every CLI command that takes arguments supports two read-only flags:
- --show-file-schema, which prints the JSON Schema for the full request object (the exact shape --filename accepts).
- --show-argument-schema <flag-name>, which prints the JSON Schema for a single argument. Pass the kebab-case flag name without the leading --.
You can pipe the schema into jq, save it to disk, or hand it to a coding agent so it can scaffold a valid file.
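As a rough illustration of working with these schemas from a script, the sketch below builds the two introspection commands and summarizes a schema's required and declared fields. The helper names and the sample schema are hypothetical; only the flag names come from this page, and real schema output comes from the CLI itself.

```python
import json


def file_schema_command(namespace: str, action: str) -> list[str]:
    """Build the argv that prints the JSON Schema for a command's file body."""
    return ["honeyhive", namespace, action, "--show-file-schema"]


def argument_schema_command(namespace: str, action: str, flag: str) -> list[str]:
    """Build the argv for a single argument's schema.

    The flag name is passed in kebab-case without the leading dashes,
    so any dashes the caller included are stripped here.
    """
    return ["honeyhive", namespace, action, "--show-argument-schema", flag.lstrip("-")]


def summarize_schema(schema: dict) -> dict:
    """Return the sorted required keys and declared property names of a schema."""
    return {
        "required": sorted(schema.get("required", [])),
        "properties": sorted(schema.get("properties", {})),
    }


# Hypothetical schema fragment, for illustration only (not real CLI output).
SAMPLE_SCHEMA = json.loads(
    """
    {
      "type": "object",
      "required": ["name", "type", "criteria"],
      "properties": {"name": {}, "type": {}, "criteria": {}, "return_type": {}},
      "additionalProperties": false
    }
    """
)
```

In practice you would run the built command with subprocess, parse its stdout with json.loads, and pass the result to summarize_schema.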
File schemas use the API field names (snake_case or camelCase, matching the public OpenAPI spec) rather than the CLI’s --kebab-case flag names. The CLI’s --filename flag passes the parsed file straight through to the API with no field translation.
Define an evaluator
Python evaluators are functions that score events on the server. To define one as a file, capture the same fields you would set in the Evaluators UI:
.honeyhive/evaluators/keyword-check.yaml
For LLM evaluators, the prompt goes in criteria, and the model is selected with model_provider and model_name. return_type is float for numeric scores, boolean for pass/fail, string for free-form text, or categorical when paired with a categories list:
.honeyhive/evaluators/relevance-llm.yaml
scale is required for any return_type: float evaluator (LLM, Python, or composite) because the API needs the upper bound of the rating range. Set it to match the maximum value your prompt asks the model to produce (5 here, since the prompt rates 1 to 5). The JSON Schema marks scale as nullable, but the API rejects float evaluators that omit it with a bare 400.
Apply a file
Pass --filename (or -f) to send the entire file as the request body. The CLI picks the parser from the file extension and accepts .yaml, .yml, .json, and .jsonc (comments and trailing commas are allowed in both .json and .jsonc).
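Because the parser is chosen by extension, a wrapper script can reject unsupported files before it ever calls the CLI. The sketch below is one way to do that; the helper name is hypothetical, and treating extensions as case-sensitive is an assumption rather than documented CLI behavior.

```python
from pathlib import Path

# Extensions this page lists as accepted by --filename.
SUPPORTED_EXTENSIONS = {".yaml", ".yml", ".json", ".jsonc"}


def apply_command(namespace: str, action: str, file_path: str) -> list[str]:
    """Build the argv that sends file_path as the request body.

    Raises ValueError for an extension the CLI is not documented to parse.
    Exact (case-sensitive) matching is an assumption made for this sketch.
    """
    suffix = Path(file_path).suffix
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported config extension: {suffix or '(none)'}")
    return ["honeyhive", namespace, action, "--filename", file_path]
```

For example, apply_command("metrics", "create", ".honeyhive/evaluators/keyword-check.yaml") returns an argv list that can be handed to subprocess.run.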
The name of the ID field differs between create responses, list responses, and the update file body. To extract IDs reliably, match the namespace to its jq expression:
| Namespace | create returns | Extract ID with | list field | update file requires |
|---|---|---|---|---|
| metrics | {inserted, metric_id} | jq -r '.metric_id' | .metrics[].id | metric_id |
| datasets | {inserted, result: {insertedId}} | jq -r '.result.insertedId' | .datasets[].id | dataset_id |
| datapoints | {inserted, result: {insertedIds: [...]}} (always an array, even for one datapoint) | jq -r '.result.insertedIds[0]' | .datapoints[].id | datapoint_id |
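If you post-process responses in a script rather than with jq, the same table can be encoded as data. The sketch below mirrors the jq expressions above; the function and constant names are hypothetical, and the response shapes are assumed to be exactly those listed in the table.

```python
# Path to the new ID inside each namespace's create response (from the table above).
CREATE_ID_PATHS = {
    "metrics": ("metric_id",),
    "datasets": ("result", "insertedId"),
    "datapoints": ("result", "insertedIds", 0),  # always an array; take the first ID
}

# Field the update file body must carry for each namespace.
UPDATE_ID_FIELD = {
    "metrics": "metric_id",
    "datasets": "dataset_id",
    "datapoints": "datapoint_id",
}


def extract_created_id(namespace: str, response: dict) -> str:
    """Walk the namespace's path through a parsed create response and return the ID."""
    value = response
    for key in CREATE_ID_PATHS[namespace]:
        value = value[key]
    return value
```

Keeping the paths in one mapping means a new namespace only needs a new entry, not another branch in the script.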
The first metrics create call returns the assigned metric_id. To make subsequent applies idempotent, pick one of these two patterns and stick with it across your repo:
- Embedded ID: write the assigned metric_id back into the YAML as a top-level field, since metrics update reads it directly from the file body. Simple, but files with metric_id only validate against the metrics update schema (see Validate before applying).
- Lockfile: keep IDs in a checked-in .honeyhive/state.json and look them up at apply time (see A simple sync script). Pairs well with YAML generated by other tools that shouldn’t be mutated, and the YAMLs themselves still validate against the metrics create schema.
Avoid mixing the two: switching between embedding metric_id and using a lockfile produces YAML that fails the create-schema validator the moment the ID is added.
To update an existing evaluator, include metric_id in the file and call metrics update:
.honeyhive/evaluators/keyword-check.yaml
The same --filename flow works for datasets create, datasets update, datapoints create, datapoints update, and every other namespace. Run --show-file-schema on the command you want to use to see the exact shape.
Define a dataset
A dataset definition only needs a name, an optional description, and the datapoint IDs it should include:
.honeyhive/datasets/qa-eval-set.yaml
.honeyhive/datapoints/q1.yaml
Both sides of the dataset/datapoint relationship are writable: datasets create accepts an initial datapoints: [<id>...] array, and datapoints create accepts linked_datasets: [<id>...]. For the config-as-code flow, link from the datapoint side as shown above. It keeps the dataset YAML stable across runs and matches the order the API expects when you create resources starting from an empty project.
This page covers the shape of your dataset as code. For keeping the contents of a dataset in sync with an external source (S3, a database, an internal tool), see Sync from External Sources. The two patterns combine well: define the dataset metadata as code, then populate datapoints from an external system.
A simple sync script
The CLI does not ship with a single honeyhive apply command yet, so most teams wrap their .honeyhive/ directory in a small script. The script below implements the lockfile pattern from Apply a file: it upserts every evaluator under .honeyhive/evaluators/, tracking IDs in a checked-in .honeyhive/state.json. The YAMLs themselves stay free of metric_id; the script writes it into a temporary copy before calling metrics update. Commit the lockfile alongside your YAML so collaborators and CI share the same IDs.
sync-evaluators.sh
The CLI picks its parser from the file extension (.yaml, .yml, .json, or .jsonc), which is why the update branch appends .yaml to mktemp’s output before writing.
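For teams that prefer Python over shell, the same lockfile upsert can be sketched as follows. This is not the sync-evaluators.sh script itself but an approximation of the behavior described above. It assumes a state.json layout of {file stem: metric_id} (hypothetical), that each evaluator file is a block-style YAML mapping so a top-level key can be appended as text, and that metrics create prints its JSON response on stdout.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def with_metric_id(yaml_text: str, metric_id: str) -> str:
    """Append metric_id as a top-level key (assumes a block-style YAML mapping)."""
    if not yaml_text.endswith("\n"):
        yaml_text += "\n"
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar.
    return f"{yaml_text}metric_id: {json.dumps(metric_id)}\n"


def sync_evaluators(evaluator_dir: Path, state_path: Path, run=subprocess.run) -> dict:
    """Create evaluators missing from the lockfile, update the rest, return the state."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    for path in sorted(evaluator_dir.glob("*.yaml")):
        key = path.stem  # hypothetical lockfile key: the file name without extension
        if key in state:
            # The temp copy needs a .yaml suffix so the CLI selects the YAML parser.
            with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as tmp:
                tmp.write(with_metric_id(path.read_text(), state[key]))
            try:
                run(["honeyhive", "metrics", "update", "--filename", tmp.name], check=True)
            finally:
                Path(tmp.name).unlink()
        else:
            result = run(
                ["honeyhive", "metrics", "create", "--filename", str(path)],
                check=True,
                capture_output=True,
                text=True,
            )
            state[key] = json.loads(result.stdout)["metric_id"]
    # Stable key order keeps lockfile diffs small in review.
    state_path.write_text(json.dumps(state, indent=2, sort_keys=True) + "\n")
    return state
```

A typical call would be sync_evaluators(Path(".honeyhive/evaluators"), Path(".honeyhive/state.json")). The run parameter exists so the CLI call can be replaced with a stub in tests.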
Run from CI
Once .honeyhive/ is in version control, applying changes is a single CLI step. A minimal GitHub Actions job:
.github/workflows/honeyhive-sync.yml
Each environment can supply its own HH_API_KEY and HH_DATA_PLANE_URL values without changing any file under .honeyhive/. HH_API_URL is deprecated and still works for older setups, but new configuration should use HH_DATA_PLANE_URL.
Validate before applying
Validate that a file matches the API’s schema before pushing changes by checking it against the output of --show-file-schema with a JSON Schema validator (ajv, check-jsonschema, etc.). Each command publishes its own schema: metrics create requires name, type, criteria and rejects unknown fields, while metrics update requires metric_id and accepts the same body keys.
Pick the schema that matches the pattern you chose in Apply a file:
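The choice of schema, and the one rule the schema does not capture (scale on float evaluators), can be sketched as a small pre-apply check. This covers only required and additionalProperties and is not a substitute for a real JSON Schema validator. The function names and the sample schema are hypothetical, with the sample modeled on the create rules described above.

```python
def schema_action_for(body: dict) -> str:
    """Pick which metrics schema a parsed file should be validated against."""
    return "update" if "metric_id" in body else "create"


def quick_check(body: dict, schema: dict) -> list[str]:
    """Report missing required fields and, if the schema forbids them, unknown fields."""
    problems = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in body
    ]
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        problems += [f"unknown field: {key}" for key in sorted(body) if key not in allowed]
    return problems


def check_float_scale(body: dict) -> list[str]:
    """Catch float evaluators without scale, which the schema allows but the API rejects."""
    if body.get("return_type") == "float" and body.get("scale") is None:
        return ["return_type: float requires scale"]
    return []


# Hypothetical stand-in for the metrics create schema; fetch the real one from the CLI.
SAMPLE_CREATE_SCHEMA = {
    "required": ["name", "type", "criteria"],
    "properties": {"name": {}, "type": {}, "criteria": {}, "return_type": {}, "scale": {}},
    "additionalProperties": False,
}
```

Running quick_check on an embedded-ID file against the create schema reports metric_id as an unknown field, which is the failure described in this section.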
The create schema sets additionalProperties: false, so a file written for the embedded-ID update flow will fail validation against it. Validate against whichever schema matches the pattern you picked in Apply a file.
Use with coding agents
The schema introspection flags are designed for AI coding agents. An agent that wants to add a new evaluator can:
- Run honeyhive metrics create --show-file-schema to get the JSON Schema.
- Generate a YAML file that conforms to it, placed under .honeyhive/evaluators/.
- Apply it with honeyhive metrics create --filename ....
Related references
HoneyHive CLI
Install the CLI and run your first commands.
CLI Command Reference
Every namespace, command, and flag in one place.
Sync Datasets from External Sources
Keep dataset contents in sync with S3, databases, or internal tools.
Python and LLM Evaluators
Background on the evaluator model the YAML files describe.

