Quick start
You need:- A HoneyHive API key in
HH_API_KEY - Model provider credentials for the ASSERT pipeline, such as
OPENAI_API_KEYor Azure OpenAI credentials
.env file:
Wrap your agent as an ASSERT target
ASSERT’s recommended integration for agents is a callable target: a Python function with the signaturechat(message, history=None) that ASSERT drives through generated conversations. HoneyHive captures the runs with the standard OpenInference instrumentor: initialize the tracer and instrument once at module top, and every OpenAI call your agent makes is traced.
history argument follows the OpenAI chat-messages format and already contains the current user turn at history[-1], so pass it straight through and don’t re-append message.
This example uses the OpenAI SDK directly. For another provider or framework, swap in the matching OpenInference instrumentor and keep the chat_sync(message, history=None) boundary. If your app already initializes HoneyHive and instrumentors at startup, import that setup here instead of duplicating it.
Configure ASSERT
Save this aseval_config.yaml. Point pipeline.inference.target.callable at the wrapper and enable target.trace:
artifacts/results/support-agent-v1/baseline-1/) when the pipeline completes.
What you see in HoneyHive
Eachassert-ai run produces one HoneyHive session (named by session_name) on the project tied to HH_API_KEY. Open the Traces page to inspect it. The session contains the full inference and judging loop:
- Your agent under named
support_agentchain spans, each grouping the model calls for one turn. - ASSERT’s tester (the simulated user) and judge calls, captured automatically because they run through the OpenAI SDK. These appear as ungrouped
ChatCompletionevents alongside your chains.
ASSERT runs the tester and judge through LiteLLM. They show up here because LiteLLM routes OpenAI models through the OpenAI SDK that
OpenAIInstrumentor patches. If you point tester.model or judge.model at a non-OpenAI provider, add the LiteLLM instrumentor so those calls are still captured:
What ASSERT adds
| Stage | ASSERT artifact | How it helps with HoneyHive |
|---|---|---|
systematize | taxonomy.json | Converts your behavior spec into explicit failure categories. |
test_set | test_set.jsonl | Generates single-turn prompts and multi-turn scenarios you can review before broadening coverage. |
inference | inference_set.jsonl | Runs generated cases against your HoneyHive-traced callable. |
judge | scores.jsonl, metrics.json | Scores policy violations and overrefusals with trace evidence you can debug in HoneyHive. |
Operational notes
- Keep tools safe. ASSERT can generate adversarial and multi-turn probes. Use sandboxed tools, scoped credentials, and synthetic data for evaluation runs.
- Review generated artifacts. Read
taxonomy.jsonandtest_set.jsonlbefore trusting the final score. - Use new run IDs or force stages. ASSERT does not overwrite every artifact automatically. Change
runor run from the earliest changed stage with--force-stage. - Start small. Lower
sample_sizeandmax_turnswhile validating the wrapper, then expand coverage once traces and scores look correct.
Related
OpenAI integration
The OpenInference instrumentor pattern used here
Experiments quickstart
Run HoneyHive evaluations with
evaluate()Framework attribute mapping
See how OpenInference spans map into HoneyHive
Tracer initialization
Choose where to initialize HoneyHive tracing

