Keep your HoneyHive resources (evaluators, datasets, datapoints, experiment runs) checked into your repo as YAML or JSON, and apply them with the HoneyHive CLI. The CLI publishes a JSON Schema for every command, so the file format stays in lockstep with the public API. This guide shows how to lay out a .honeyhive/ directory, discover each resource’s schema, and roll out changes from the terminal or CI.
The CLI is @honeyhive/cli. See Install & Quickstart to install it, and the full command reference for every namespace.

Why config as code

  • Reviewable: changes to an evaluator’s Python code or an LLM prompt go through the same PR review as the rest of your app.
  • Reproducible: the resource definitions live next to the code they evaluate, pinned to a commit.
  • Portable: the same files apply to staging and production projects by swapping HH_API_URL and HH_API_KEY.
  • Agent-friendly: every command exposes its JSON Schema, so coding agents like Cursor and Claude Code can author and validate files without guessing.

Directory layout

A common convention is to keep one file per resource under .honeyhive/:
.honeyhive/
├── evaluators/
│   ├── keyword-check.yaml
│   └── relevance-llm.yaml
├── datasets/
│   └── qa-eval-set.yaml
└── datapoints/
    ├── q1.yaml
    └── q2.yaml
The directory names map one-to-one to CLI namespaces:
Folder         CLI namespace           API resource
evaluators/    honeyhive metrics       Metrics
datasets/      honeyhive datasets      Datasets
datapoints/    honeyhive datapoints    Datapoints
experiments/   honeyhive experiments   Experiments
“Evaluator” and “metric” are the same resource. The product UI and docs call them evaluators; the API and CLI call them metrics.

Discover the schema

Every CLI command that takes arguments supports two read-only flags:
  • --show-file-schema, which prints the JSON Schema for the full request object (the exact shape --filename accepts).
  • --show-argument-schema <flag-name>, which prints the JSON Schema for a single argument. Pass the kebab-case flag name without the leading --.
Both write pure JSON to stdout and never call the API.
# Schema for a new evaluator file
honeyhive metrics create --show-file-schema

# Schema for just the `criteria` field
honeyhive metrics create --show-argument-schema criteria
The output is plain JSON Schema. Pipe it to jq, save it to disk, or hand it to a coding agent so it can scaffold a valid file:
honeyhive metrics create --show-file-schema > .honeyhive/schemas/metric.schema.json
File schemas use the API field names (snake_case or camelCase, matching the public OpenAPI spec) rather than the CLI’s --kebab-case flag names. The CLI’s --filename flag passes the parsed file straight through to the API with no field translation.
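For example, a single field is addressed by its kebab-case flag name on the CLI even though it appears in snake_case inside the file. A quick sketch, assuming metrics create exposes a flag for the needs_ground_truth field shown later in this guide:
# kebab-case on the CLI...
honeyhive metrics create --show-argument-schema needs-ground-truth
# ...snake_case (needs_ground_truth) in the file you pass via --filename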

Define an evaluator

Python evaluators are functions that score events on the server. To define one as a file, capture the same fields you would set in the Evaluators UI:
.honeyhive/evaluators/keyword-check.yaml
name: keyword-check
type: PYTHON
return_type: boolean
needs_ground_truth: false
description: Checks whether the response mentions the word "honey".
criteria: |
  def keyword_check():
      return "honey" in outputs["content"].lower()
LLM evaluators follow the same pattern. The prompt goes in criteria, and the model is selected with model_provider and model_name. return_type is float for numeric scores, boolean for pass/fail, string for free-form, or categorical when paired with a categories list:
.honeyhive/evaluators/relevance-llm.yaml
name: relevance-llm
type: LLM
return_type: float
model_provider: openai
model_name: gpt-4o
sampling_percentage: 25
description: Rates how well the answer addresses the question, 1-5.
criteria: |
  [Instruction]
  Rate the assistant's answer for relevance to the question on a scale of 1 to 5.

  [Question]
  {{ inputs.question }}

  [Answer]
  {{ outputs.content }}

  [Evaluation]
  Rating: [[X]]
To see every field the API accepts (categories, thresholds, child metrics for composites, event filters, etc.), inspect the schema:
honeyhive metrics create --show-file-schema | jq '.properties | keys'

Apply a file

Pass --filename (or -f) to send the entire file as the request body. The CLI picks the parser from the file extension and accepts .yaml, .yml, .json, and .jsonc (comments and trailing commas are allowed in both .json and .jsonc).
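Because comments and trailing commas are accepted, an evaluator file can also be written as JSONC; a sketch equivalent to the YAML example above:
.honeyhive/evaluators/keyword-check.jsonc
{
  // Same fields as the YAML version; this comment and the trailing comma are allowed.
  "name": "keyword-check",
  "type": "PYTHON",
  "return_type": "boolean",
  "needs_ground_truth": false,
  "description": "Checks whether the response mentions the word \"honey\".",
  "criteria": "def keyword_check():\n    return \"honey\" in outputs[\"content\"].lower()\n",
}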
# Create the evaluator
honeyhive metrics create --filename .honeyhive/evaluators/keyword-check.yaml
# {
#   "inserted": true,
#   "metric_id": "01KRJB6SX9YA4J51NRFT6M27RC"
# }
The response shape varies per namespace (e.g. metrics create returns a flat {inserted, metric_id}, while datasets create returns {result: {insertedId}}). Run the command once and pipe through jq to see the exact shape, or consult the CLI reference for that namespace.

The response includes the assigned metric_id. The cleanest way to make subsequent applies idempotent is to write the ID back into the YAML itself as a top-level metric_id field, since metrics update reads it directly from the file body. A .honeyhive/state.json lockfile is a reasonable alternative when the YAML is generated by another tool and shouldn’t be mutated.

To update an existing evaluator, include metric_id in the file and call metrics update:
.honeyhive/evaluators/keyword-check.yaml
metric_id: 01KRJB6SX9YA4J51NRFT6M27RC
name: keyword-check
type: PYTHON
return_type: boolean
description: Checks whether the response mentions the word "honey" or "hive".
criteria: |
  def keyword_check():
      content = outputs["content"].lower()
      return "honey" in content or "hive" in content
honeyhive metrics update --filename .honeyhive/evaluators/keyword-check.yaml
The same --filename flow works for datasets create, datasets update, datapoints create, datapoints update, and every other namespace. Run --show-file-schema on the command you want to use to see the exact shape.
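For example, to list the top-level fields a dataset file accepts:
honeyhive datasets create --show-file-schema | jq '.properties | keys'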

Define a dataset

A dataset definition only needs a name, an optional description, and the datapoint IDs it should include:
.honeyhive/datasets/qa-eval-set.yaml
name: qa-eval-set
description: Question/answer pairs used for the relevance evaluator.
datapoints: []
Apply it the same way:
DATASET_ID=$(honeyhive datasets create \
  --filename .honeyhive/datasets/qa-eval-set.yaml \
  | jq -r '.result.insertedId')
For each datapoint, define inputs and ground truth and link it to the dataset on create:
.honeyhive/datapoints/q1.yaml
inputs:
  question: What is the capital of France?
ground_truth:
  answer: Paris
metadata:
  external_id: q1
linked_datasets:
  - "01KRJB7WD8E2H4M9X3K2Y7Q1A5"  # paste the dataset_id printed by the create above
honeyhive datapoints create --filename .honeyhive/datapoints/q1.yaml
Both sides of the dataset/datapoint relationship are writable: datasets create accepts an initial datapoints: [<id>...] array, and datapoints create accepts linked_datasets: [<id>...]. For the config-as-code flow, link from the datapoint side as shown above: it keeps the dataset YAML stable across runs and follows the natural creation order when starting from scratch (create the empty dataset first, then the datapoints that link to it).
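Putting the two steps together, a small loop can stamp the captured dataset ID into each datapoint file before creating it. A sketch, assuming yq v4 and the DATASET_ID variable captured above:
# Link every datapoint file to the dataset created earlier
for file in .honeyhive/datapoints/*.yaml; do
  base=$(mktemp)
  tmp="$base.yaml"   # extension must match the parser
  yq ".linked_datasets = [\"$DATASET_ID\"]" "$file" > "$tmp"
  honeyhive datapoints create --filename "$tmp"
  rm -f "$base" "$tmp"
done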
This page covers the shape of your dataset as code. For keeping the contents of a dataset in sync with an external source (S3, a database, an internal tool), see Sync from External Sources. The two patterns combine well: define the dataset metadata as code, then populate datapoints from an external system.

A simple sync script

The CLI does not ship with a single honeyhive apply command yet, so most teams wrap their .honeyhive/ directory in a small script. The script below upserts every evaluator under .honeyhive/evaluators/, tracking IDs in a checked-in .honeyhive/state.json lockfile. Commit the lockfile alongside your YAML so collaborators and CI share the same IDs; the alternative is to inline metric_id into each YAML on create and skip the lockfile entirely.
sync-evaluators.sh
#!/usr/bin/env bash
# Usage: HH_API_KEY=... bash sync-evaluators.sh
set -euo pipefail

STATE_FILE=".honeyhive/state.json"
[ -f "$STATE_FILE" ] || echo '{"evaluators": {}}' > "$STATE_FILE"

for file in .honeyhive/evaluators/*.yaml; do
  name=$(yq '.name' "$file")
  existing_id=$(jq -r --arg n "$name" '.evaluators[$n] // ""' "$STATE_FILE")

  if [ -n "$existing_id" ]; then
    # Update in place; the CLI reads metric_id from the file body.
    # The .yaml suffix is appended manually so this works on macOS
    # (BSD mktemp has no --suffix); clean up both temp files.
    base=$(mktemp)
    tmp="$base.yaml"
    yq ". + {\"metric_id\": \"$existing_id\"}" "$file" > "$tmp"
    honeyhive metrics update --filename "$tmp" > /dev/null
    rm -f "$base" "$tmp"
    echo "Updated $name ($existing_id)"
  else
    new_id=$(honeyhive metrics create --filename "$file" | jq -r '.metric_id')
    tmp=$(mktemp)
    jq --arg n "$name" --arg id "$new_id" \
      '.evaluators[$n] = $id' "$STATE_FILE" > "$tmp"
    mv "$tmp" "$STATE_FILE"
    echo "Created $name ($new_id)"
  fi
done
The CLI requires the file extension to match the parser (.yaml, .yml, .json, or .jsonc), which is why the update branch appends .yaml to mktemp’s output before writing.
Use --verbose (or HH_VERBOSE=true) when debugging to log the resolved API URL and masked key for each invocation. This makes it obvious whether a CI job is hitting staging or production.
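For example, to check which environment a sync run resolves to before trusting its output:
# Logs the resolved API URL and a masked key for each CLI call
HH_VERBOSE=true bash sync-evaluators.sh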

Run from CI

Once .honeyhive/ is in version control, applying changes is a single CLI step. A minimal GitHub Actions job:
.github/workflows/honeyhive-sync.yml
name: Sync HoneyHive resources

on:
  push:
    branches: [main]
    paths: [".honeyhive/**"]

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install HoneyHive CLI
        # Pin to a specific release so CI runs are reproducible.
        run: curl -fsSL https://github.com/honeyhiveai/honeyhive-cli/releases/download/v1.0.0/install.sh | sh
      - name: Apply evaluators
        env:
          HH_API_KEY: ${{ secrets.HH_API_KEY }}
        run: bash sync-evaluators.sh
For staging/production parity, pass different HH_API_KEY and HH_API_URL values per environment without changing any file under .honeyhive/.
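For example, a staging job can reuse the exact same step with different credentials; the secret and variable names below are illustrative:
      - name: Apply evaluators (staging)
        env:
          HH_API_KEY: ${{ secrets.HH_STAGING_API_KEY }}
          HH_API_URL: ${{ vars.HH_STAGING_API_URL }}
        run: bash sync-evaluators.sh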

Validate before applying

To validate that a file matches the API’s schema before pushing changes, dump the schema with --show-file-schema and run the file through a JSON Schema validator (ajv, check-jsonschema, etc.). For example, with check-jsonschema:
honeyhive metrics create --show-file-schema > /tmp/metric.schema.json
check-jsonschema --schemafile /tmp/metric.schema.json .honeyhive/evaluators/*.yaml
This catches typos, missing required fields, and invalid enum values locally, before the request reaches the API.
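The same pattern works for any namespace. For datasets, for example:
honeyhive datasets create --show-file-schema > /tmp/dataset.schema.json
check-jsonschema --schemafile /tmp/dataset.schema.json .honeyhive/datasets/*.yaml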

Use with coding agents

The schema introspection flags are designed for AI coding agents. An agent that wants to add a new evaluator can:
  1. Run honeyhive metrics create --show-file-schema to get the JSON Schema.
  2. Generate a YAML file that conforms to it, placed under .honeyhive/evaluators/.
  3. Apply it with honeyhive metrics create --filename ....
See AI Coding Agents for the pre-built HoneyHive Skills that bundle this workflow into agent-friendly slash commands.

HoneyHive CLI

Install the CLI and run your first commands.

CLI Command Reference

Every namespace, command, and flag in one place.

Sync Datasets from External Sources

Keep dataset contents in sync with S3, databases, or internal tools.

Python and LLM Evaluators

Background on the evaluator model the YAML files describe.