Get the code
Clone the cookbook repo to run it yourself. You'll need:
- Python 3.11+
- A HoneyHive account (grab your API key from the dashboard)
- A Google AI API key
- An OpenAI API key (for the evaluation step only)
Setup
Create a .env file with your keys:
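A sketch of the .env file, using variable names that match the prerequisites above (the repo's actual names may differ):

```
HH_API_KEY=your-honeyhive-api-key
HH_PROJECT=your-honeyhive-project
GOOGLE_API_KEY=your-google-ai-api-key
OPENAI_API_KEY=your-openai-api-key
```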
Walkthrough
Define agents and add tracing
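A minimal sketch of what this step builds: three ADK agents plus the HoneyHive tracing lines. It assumes ADK's `Agent` class and the OpenInference Google ADK instrumentor; the agent names, models, and tools are illustrative, not necessarily the repo's exact code.

```python
# Sketch: three ADK agents plus HoneyHive tracing (names, models, and tools
# are illustrative assumptions, not necessarily the cookbook's exact code).
from honeyhive import HoneyHiveTracer
from openinference.instrumentation.google_adk import GoogleADKInstrumentor
from google.adk.agents import Agent

# Tools are plain Python functions; ADK uses the docstring as the tool spec.
def check_order_status(order_id: str) -> dict:
    """Look up the shipping status of an order."""
    return {"order_id": order_id, "status": "shipped"}  # stubbed for the sketch

def lookup_refund_policy(product: str) -> dict:
    """Return the refund policy for a product category."""
    return {"product": product, "policy": "30-day returns"}  # stubbed

orders_agent = Agent(
    name="orders_agent",
    model="gemini-2.0-flash",
    description="Handles order status and shipping questions.",
    tools=[check_order_status],
)

refunds_agent = Agent(
    name="refunds_agent",
    model="gemini-2.0-flash",
    description="Handles refund and return policy questions.",
    tools=[lookup_refund_policy],
)

# The coordinator routes via LLM delegation: the model reads each
# sub-agent's description and decides where to hand off.
coordinator = Agent(
    name="coordinator",
    model="gemini-2.0-flash",
    description="Routes customer queries to the right specialist.",
    sub_agents=[orders_agent, refunds_agent],
)

# The 4 tracing lines: two imports, then initialize and instrument.
HoneyHiveTracer.init(project="adk-cookbook")  # API key read from the env
GoogleADKInstrumentor().instrument()
```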
The app has three agents: a coordinator that routes queries, and two specialists with tools. The coordinator uses ADK's native LLM delegation: the model reads each sub-agent's description and decides where to route.
Adding HoneyHive tracing takes 4 lines: initialize the tracer and instrumentor before running your agents. Run the app, and in HoneyHive you'll see the full trace hierarchy: coordinator routing decisions, sub-agent delegation, LLM calls, and tool executions, all captured automatically with no changes to your agent code.
Enrich traces with business context
Auto-tracing captures the agent mechanics, but you also need business context to make traces useful in production: which user made this request, what environment is this, what version of your app. Enriching the session attaches that context once, so every trace in the session carries it. In HoneyHive you can now:
- Filter traces by user, plan, or environment
- Search for all traces from a specific customer
- Compare behavior across app versions
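The enrichment step might look like this, assuming the honeyhive SDK's `enrich_session` helper; the metadata keys are illustrative:

```python
# Sketch: attach business context to the current HoneyHive session.
# Assumes honeyhive's enrich_session helper; metadata keys are illustrative.
from honeyhive import enrich_session

enrich_session(
    metadata={
        "user_id": "user_1234",        # filter traces by user
        "plan": "enterprise",          # filter by plan
        "environment": "production",   # separate prod from staging
        "app_version": "1.3.0",        # compare behavior across versions
    }
)
```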
Add custom spans for business logic
The instrumentor captures ADK internals automatically. For your own code that wraps around agent calls, such as input validation, formatting, or orchestration, use the @trace() decorator. Decorating a function like preprocess_query creates a span for it in the trace, so you can see exactly how long input processing takes alongside the agent spans.
When to add custom spans vs. rely on auto-instrumentation:
| Use auto-instrumentation for | Use @trace() for |
|---|---|
| LLM calls, tool calls, agent runs | Input validation, output formatting |
| Anything inside the ADK framework | Database lookups, API calls to your services |
| Token usage, latency per call | Business logic that wraps agent calls |
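A sketch of such a custom span, assuming honeyhive's `trace` decorator is importable from the package root; the body of preprocess_query is illustrative:

```python
# Sketch: a custom span around your own preprocessing code, using
# honeyhive's @trace() decorator (import path assumed).
from honeyhive import trace

@trace()
def preprocess_query(raw_query: str) -> str:
    """Validate and normalize the user's query before the agent sees it."""
    cleaned = " ".join(raw_query.split())  # collapse runs of whitespace
    if not cleaned:
        raise ValueError("empty query")
    return cleaned
```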
Evaluate agent quality
Once your agent is traced, you want to know: is it actually good? The evaluate.py script runs the agent against a test dataset and measures quality. Define a dataset of customer queries with expected categories, define evaluators (functions that score each response), then run the experiment. HoneyHive's evaluate() runs your agent against every datapoint, applies each evaluator, and uploads the results. You can view them in the Experiments UI to see scores per query, aggregate metrics, and compare across runs.
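A sketch of the dataset and evaluator shapes. The queries, category names, and the three-argument evaluator signature (outputs, inputs, ground truths) are assumptions modeled on HoneyHive's experiments convention; adapt them to the cookbook's actual evaluate.py.

```python
# Sketch: a test dataset and two evaluators. Field names and the evaluator
# signature are assumptions; adapt to the cookbook's actual evaluate.py.
dataset = [
    {
        "inputs": {"query": "Where is my order #8812?"},
        "ground_truths": {"category": "orders"},
    },
    {
        "inputs": {"query": "Can I return these headphones?"},
        "ground_truths": {"category": "refunds"},
    },
]

def category_match(outputs, inputs, ground_truths):
    """1.0 if the agent routed to the expected specialist, else 0.0."""
    return 1.0 if outputs.get("category") == ground_truths.get("category") else 0.0

def answer_nonempty(outputs, inputs, ground_truths):
    """Cheap sanity check: did the agent actually produce an answer?"""
    return 1.0 if outputs.get("answer", "").strip() else 0.0
```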
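Running the experiment might then look like the following, assuming honeyhive's `evaluate` entry point; the exact parameter names and the agent wrapper are assumptions:

```python
# Sketch: run the experiment with honeyhive's evaluate() (signature assumed).
from honeyhive import evaluate

def run_agent(inputs, ground_truths=None):
    """Wrapper called once per datapoint; invoke your ADK app here."""
    answer = ...  # call the coordinator agent with inputs["query"]
    return {"answer": answer, "category": "orders"}  # stubbed output shape

evaluate(
    function=run_agent,
    hh_project="adk-cookbook",   # assumed project name
    name="adk-routing-experiment",
    dataset=dataset,             # the dataset defined in evaluate.py
    evaluators=[category_match, answer_nonempty],  # defined in evaluate.py
)
```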
