Get the code
Clone the cookbook repo to run it yourself. You'll need:
- Python 3.11+
- A HoneyHive account (grab your API key from the dashboard)
- A Google AI API key
- An OpenAI API key (for the evaluation step only)
Setup
Create a .env file with your keys:
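A sketch of the .env file, using variable names that match the prerequisites above (the repo's actual names may differ):

```
HH_API_KEY=your-honeyhive-api-key
HH_PROJECT=your-honeyhive-project
GOOGLE_API_KEY=your-google-ai-api-key
OPENAI_API_KEY=your-openai-api-key
```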
Walkthrough
Define agents and add tracing
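A minimal sketch of what this step builds: three ADK agents plus the HoneyHive tracing lines. It assumes ADK's `Agent` class and the OpenInference Google ADK instrumentor; the agent names, models, and tools are illustrative, not necessarily the repo's exact code.

```python
# Sketch: three ADK agents plus HoneyHive tracing (names, models, and tools
# are illustrative assumptions, not necessarily the cookbook's exact code).
from honeyhive import HoneyHiveTracer
from openinference.instrumentation.google_adk import GoogleADKInstrumentor
from google.adk.agents import Agent

# Tools are plain Python functions; ADK uses the docstring as the tool spec.
def check_order_status(order_id: str) -> dict:
    """Look up the shipping status of an order."""
    return {"order_id": order_id, "status": "shipped"}  # stubbed for the sketch

def lookup_refund_policy(product: str) -> dict:
    """Return the refund policy for a product category."""
    return {"product": product, "policy": "30-day returns"}  # stubbed

orders_agent = Agent(
    name="orders_agent",
    model="gemini-2.0-flash",
    description="Handles order status and shipping questions.",
    tools=[check_order_status],
)

refunds_agent = Agent(
    name="refunds_agent",
    model="gemini-2.0-flash",
    description="Handles refund and return policy questions.",
    tools=[lookup_refund_policy],
)

# The coordinator routes via LLM delegation: the model reads each
# sub-agent's description and decides where to hand off.
coordinator = Agent(
    name="coordinator",
    model="gemini-2.0-flash",
    description="Routes customer queries to the right specialist.",
    sub_agents=[orders_agent, refunds_agent],
)

# The 4 tracing lines: two imports, then initialize and instrument.
HoneyHiveTracer.init(project="adk-cookbook")  # API key read from the env
GoogleADKInstrumentor().instrument()
```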
The app has three agents: a coordinator that routes queries, and two specialists with tools. The coordinator uses ADK's native LLM delegation: the model reads each sub-agent's description and decides where to route.
Adding HoneyHive tracing takes 4 lines: initialize the tracer and instrumentor before running your agents. Run the app, and in HoneyHive you'll see the full trace hierarchy: coordinator routing decisions, sub-agent delegation, LLM calls, and tool executions, all captured automatically with no changes to your agent code.
Enrich traces with business context
Auto-tracing captures the agent mechanics, but you also need business context to make traces useful in production: which user made this request, what environment is this, what version of your app. Enriching the session attaches that context once, so every trace in the session carries it. In HoneyHive you can now:
- Filter traces by user, plan, or environment
- Search for all traces from a specific customer
- Compare behavior across app versions
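The enrichment step might look like this, assuming the honeyhive SDK's `enrich_session` helper; the metadata keys are illustrative:

```python
# Sketch: attach business context to the current HoneyHive session.
# Assumes honeyhive's enrich_session helper; metadata keys are illustrative.
from honeyhive import enrich_session

enrich_session(
    metadata={
        "user_id": "user_1234",        # filter traces by user
        "plan": "enterprise",          # filter by plan
        "environment": "production",   # separate prod from staging
        "app_version": "1.3.0",        # compare behavior across versions
    }
)
```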
Add custom spans for business logic
The instrumentor captures ADK internals automatically. For your own code that wraps around agent calls, such as input validation, formatting, or orchestration, use the @trace() decorator. Decorating a function like preprocess_query creates a span for it in the trace, so you can see exactly how long input processing takes alongside the agent spans.
When to add custom spans vs. rely on auto-instrumentation:
| Use auto-instrumentation for | Use @trace() for |
|---|---|
| LLM calls, tool calls, agent runs | Input validation, output formatting |
| Anything inside the ADK framework | Database lookups, API calls to your services |
| Token usage, latency per call | Business logic that wraps agent calls |
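A sketch of such a custom span, assuming honeyhive's `trace` decorator is importable from the package root; the body of preprocess_query is illustrative:

```python
# Sketch: a custom span around your own preprocessing code, using
# honeyhive's @trace() decorator (import path assumed).
from honeyhive import trace

@trace()
def preprocess_query(raw_query: str) -> str:
    """Validate and normalize the user's query before the agent sees it."""
    cleaned = " ".join(raw_query.split())  # collapse runs of whitespace
    if not cleaned:
        raise ValueError("empty query")
    return cleaned
```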
Evaluate agent quality
Once your agent is traced, you want to know: is it actually good? The evaluate.py script runs the agent against a test dataset and measures quality. Define a dataset of customer queries with expected categories, define evaluators (functions that score each response), then run the experiment. HoneyHive's evaluate() runs your agent against every datapoint, applies each evaluator, and uploads the results. You can view them in the Experiments UI to see scores per query, aggregate metrics, and compare across runs.
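A sketch of the dataset and evaluator shapes. The queries, category names, and the three-argument evaluator signature (outputs, inputs, ground truths) are assumptions modeled on HoneyHive's experiments convention; adapt them to the cookbook's actual evaluate.py.

```python
# Sketch: a test dataset and two evaluators. Field names and the evaluator
# signature are assumptions; adapt to the cookbook's actual evaluate.py.
dataset = [
    {
        "inputs": {"query": "Where is my order #8812?"},
        "ground_truths": {"category": "orders"},
    },
    {
        "inputs": {"query": "Can I return these headphones?"},
        "ground_truths": {"category": "refunds"},
    },
]

def category_match(outputs, inputs, ground_truths):
    """1.0 if the agent routed to the expected specialist, else 0.0."""
    return 1.0 if outputs.get("category") == ground_truths.get("category") else 0.0

def answer_nonempty(outputs, inputs, ground_truths):
    """Cheap sanity check: did the agent actually produce an answer?"""
    return 1.0 if outputs.get("answer", "").strip() else 0.0
```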
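Running the experiment might then look like the following, assuming honeyhive's `evaluate` entry point; the exact parameter names and the agent wrapper are assumptions:

```python
# Sketch: run the experiment with honeyhive's evaluate() (signature assumed).
from honeyhive import evaluate

def run_agent(inputs, ground_truths=None):
    """Wrapper called once per datapoint; invoke your ADK app here."""
    answer = ...  # call the coordinator agent with inputs["query"]
    return {"answer": answer, "category": "orders"}  # stubbed output shape

evaluate(
    function=run_agent,
    hh_project="adk-cookbook",   # assumed project name
    name="adk-routing-experiment",
    dataset=dataset,             # the dataset defined in evaluate.py
    evaluators=[category_match, answer_nonempty],  # defined in evaluate.py
)
```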
