Keep your HoneyHive resources (evaluators, datasets, datapoints, experiment runs) checked into your repo as YAML or JSON, and apply them with the HoneyHive CLI. The CLI publishes a JSON Schema for every command, so the file format stays in lockstep with the public API. This guide shows how to lay out a `.honeyhive/` directory, discover each resource's schema, and roll out changes from the terminal or CI.
The CLI is `@honeyhive/cli`. See Install & Quickstart to install it, and the full command reference for every namespace.

Why config as code
- Reviewable: changes to an evaluator’s Python code or an LLM prompt go through the same PR review as the rest of your app.
- Reproducible: the resource definitions live next to the code they evaluate, pinned to a commit.
- Portable: the same files apply to staging and production projects by swapping `HH_API_URL` and `HH_API_KEY`.
- Agent-friendly: every command exposes its JSON Schema, so coding agents like Cursor and Claude Code can author and validate files without guessing.
Directory layout
A common convention is to keep one file per resource under `.honeyhive/`:
| Folder | CLI namespace | API resource |
|---|---|---|
| `evaluators/` | `honeyhive metrics` | Metrics |
| `datasets/` | `honeyhive datasets` | Datasets |
| `datapoints/` | `honeyhive datapoints` | Datapoints |
| `experiments/` | `honeyhive experiments` | Experiments |
Discover the schema
Every CLI command that takes arguments supports two read-only flags:
- `--show-file-schema` prints the JSON Schema for the full request object (the exact shape `--filename` accepts).
- `--show-argument-schema <flag-name>` prints the JSON Schema for a single argument. Pass the kebab-case flag name without the leading `--`.
You can pipe the schema output through `jq`, save it to disk, or hand it to a coding agent so it can scaffold a valid file:
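For example (hypothetical invocations; the commands and flags come from this guide, while the `jq` filter and the `return-type` flag name are illustrative):

```shell
# Print the request schema for metrics create and skim its top-level properties.
honeyhive metrics create --show-file-schema | jq '.properties | keys'

# Save the schema to disk so editors or agents can consume it.
honeyhive metrics create --show-file-schema > metric.schema.json

# Inspect the schema for a single flag (kebab-case, no leading --).
honeyhive metrics create --show-argument-schema return-type
```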
File schemas use the API field names (`snake_case` or `camelCase`, matching the public OpenAPI spec) rather than the CLI's `--kebab-case` flag names. The CLI's `--filename` flag passes the parsed file straight through to the API with no field translation.

Define an evaluator
Python evaluators are functions that score events on the server. To define one as a file, capture the same fields you would set in the Evaluators UI:

.honeyhive/evaluators/keyword-check.yaml
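A sketch of what such a file might look like. The field names here (`name`, `type`, `code`) and the event shape are illustrative assumptions; run `honeyhive metrics create --show-file-schema` for the authoritative shape:

```yaml
# Illustrative field names -- verify against --show-file-schema.
name: keyword-check
type: python
return_type: boolean
code: |
  def evaluate(event):
      # Pass when the completion mentions the expected keyword.
      return "refund" in event["outputs"]["content"].lower()
```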
For LLM evaluators, the prompt lives in `criteria`, and the model is selected with `model_provider` and `model_name`. `return_type` is `float` for numeric scores, `boolean` for pass/fail, `string` for free-form, or `categorical` when paired with a `categories` list:
.honeyhive/evaluators/relevance-llm.yaml
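A sketch using the fields described above; the `name`, `type`, provider, and model values are illustrative, so check `--show-file-schema` before relying on them:

```yaml
# Illustrative values -- verify against --show-file-schema.
name: relevance-llm
type: llm
model_provider: openai
model_name: gpt-4o
return_type: categorical
categories: [relevant, partially_relevant, irrelevant]
criteria: |
  Rate how relevant the answer is to the question.
  Respond with exactly one of: relevant, partially_relevant, irrelevant.
```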
Apply a file
Pass `--filename` (or `-f`) to send the entire file as the request body. The CLI picks the parser from the file extension and accepts `.yaml`, `.yml`, `.json`, and `.jsonc` (comments and trailing commas are allowed in both `.json` and `.jsonc`).
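For example, using the evaluator file from the previous section:

```shell
# The parsed file body becomes the request body verbatim.
honeyhive metrics create --filename .honeyhive/evaluators/keyword-check.yaml
```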
Response shapes vary by namespace (`metrics create` returns a flat `{inserted, metric_id}`, `datasets create` returns `{result: {insertedId}}`). Run the command once and pipe through `jq` to see the exact shape, or consult the CLI reference for that namespace.
The response includes the assigned metric_id. The cleanest way to make subsequent applies idempotent is to write the ID back into the YAML itself as a top-level metric_id field, since metrics update reads it directly from the file body. A .honeyhive/state.json lockfile is a reasonable alternative when the YAML is generated by another tool and shouldn’t be mutated.
To update an existing evaluator, include metric_id in the file and call metrics update:
.honeyhive/evaluators/keyword-check.yaml
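A sketch of the file after writing the ID back in. The ID value is illustrative, and the other field names carry over the same assumptions as before:

```yaml
# metric_id comes from the create response; the value here is illustrative.
metric_id: 6650f0c2e4b0a1d2c3b4a5f6
name: keyword-check
type: python
return_type: boolean
code: |
  def evaluate(event):
      return "refund" in event["outputs"]["content"].lower()
```

Then `honeyhive metrics update --filename .honeyhive/evaluators/keyword-check.yaml` applies the change idempotently.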
The same `--filename` flow works for `datasets create`, `datasets update`, `datapoints create`, `datapoints update`, and every other namespace. Run `--show-file-schema` on the command you want to use to see the exact shape.
Define a dataset
A dataset definition only needs a name, an optional description, and the datapoint IDs it should include:

.honeyhive/datasets/qa-eval-set.yaml
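A sketch of the dataset file; the `name` and `description` fields come from the description above, and the values are illustrative:

```yaml
name: qa-eval-set
description: Golden Q&A pairs for regression evals
```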
.honeyhive/datapoints/q1.yaml
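A sketch of the datapoint file. `linked_datasets` is documented below; the `inputs` and `ground_truth` fields and the dataset ID are illustrative assumptions, so confirm with `honeyhive datapoints create --show-file-schema`:

```yaml
# inputs/ground_truth field names are illustrative.
inputs:
  question: What is your refund policy?
ground_truth:
  answer: Refunds are available within 30 days of purchase.
linked_datasets:
  - 6650f1a2e4b0a1d2c3b4a5f7   # ID returned by datasets create (illustrative)
```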
Both sides of the dataset/datapoint relationship are writable:
`datasets create` accepts an initial `datapoints: [<id>...]` array, and `datapoints create` accepts `linked_datasets: [<id>...]`. For the config-as-code flow, link from the datapoint side as shown above. It keeps the dataset YAML stable across runs and matches the order the API expects when you create resources from empty.

This page covers the shape of your dataset as code. For keeping the contents of a dataset in sync with an external source (S3, a database, an internal tool), see Sync from External Sources. The two patterns combine well: define the dataset metadata as code, then populate datapoints from an external system.
A simple sync script
The CLI does not ship with a single `honeyhive apply` command yet, so most teams wrap their `.honeyhive/` directory in a small script. The script below upserts every evaluator under `.honeyhive/evaluators/`, tracking IDs in a checked-in `.honeyhive/state.json` lockfile. Commit the lockfile alongside your YAML so collaborators and CI share the same IDs; the alternative is to inline `metric_id` into each YAML on create and skip the lockfile entirely.
sync-evaluators.sh
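A hypothetical sketch of the script; it assumes `jq` is installed, that the lockfile maps filenames to IDs, and that the create response is the flat `{inserted, metric_id}` shape documented above:

```shell
#!/usr/bin/env bash
# Upsert every evaluator under .honeyhive/evaluators/, tracking
# assigned IDs in the .honeyhive/state.json lockfile.
set -euo pipefail

STATE=.honeyhive/state.json
[ -f "$STATE" ] || echo '{}' > "$STATE"

for file in .honeyhive/evaluators/*.yaml; do
  key=$(basename "$file")
  id=$(jq -r --arg k "$key" '.[$k] // empty' "$STATE")

  if [ -z "$id" ]; then
    # First run for this file: create it and record the returned metric_id.
    id=$(honeyhive metrics create --filename "$file" | jq -r '.metric_id')
    jq --arg k "$key" --arg v "$id" '.[$k] = $v' "$STATE" > "$STATE.tmp" \
      && mv "$STATE.tmp" "$STATE"
  else
    # Known ID: write a temp copy with metric_id appended, then update.
    # mktemp's output has no extension, so append .yaml for the CLI's parser.
    tmp="$(mktemp).yaml"
    cat "$file" > "$tmp"
    printf 'metric_id: %s\n' "$id" >> "$tmp"   # assumes no metric_id key in the file
    honeyhive metrics update --filename "$tmp"
    rm -f "$tmp"
  fi
done
```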
`--filename` infers the parser from the file extension (`.yaml`, `.yml`, `.json`, or `.jsonc`), which is why the update branch appends `.yaml` to `mktemp`'s output before writing.
Run from CI
Once `.honeyhive/` is in version control, applying changes is a single CLI step. A minimal GitHub Actions job:
.github/workflows/honeyhive-sync.yml
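A sketch of such a workflow; the trigger, script path, and secret names are illustrative assumptions, while the `@honeyhive/cli` package name comes from the quickstart:

```yaml
name: honeyhive-sync
on:
  push:
    branches: [main]
    paths: ['.honeyhive/**']
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install -g @honeyhive/cli
      - run: ./sync-evaluators.sh
        env:
          HH_API_KEY: ${{ secrets.HH_API_KEY }}
          HH_API_URL: ${{ secrets.HH_API_URL }}
```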
Configure `HH_API_KEY` and `HH_API_URL` values per environment (for example, as GitHub Actions secrets) so you can retarget staging or production without changing any file under `.honeyhive/`.
Validate before applying
Validate that a file matches the API's schema before pushing changes by combining `--show-file-schema` with a JSON Schema validator (`ajv`, `check-jsonschema`, etc.). For example, with `check-jsonschema`:
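A sketch, assuming `check-jsonschema` is installed (it accepts YAML instance files directly):

```shell
# Dump the request schema once, then validate the file against it.
honeyhive metrics create --show-file-schema > /tmp/metric.schema.json
check-jsonschema --schemafile /tmp/metric.schema.json \
  .honeyhive/evaluators/keyword-check.yaml
```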
Use with coding agents
The schema introspection flags are designed for AI coding agents. An agent that wants to add a new evaluator can:
- Run `honeyhive metrics create --show-file-schema` to get the JSON Schema.
- Generate a YAML file that conforms to it, placed under `.honeyhive/evaluators/`.
- Apply it with `honeyhive metrics create --filename ...`.
Related references
- HoneyHive CLI: Install the CLI and run your first commands.
- CLI Command Reference: Every namespace, command, and flag in one place.
- Sync Datasets from External Sources: Keep dataset contents in sync with S3, databases, or internal tools.
- Python and LLM Evaluators: Background on the evaluator model the YAML files describe.

