Build a multi-agent customer support system using Google ADK and progressively add HoneyHive observability, from auto-tracing through enrichment, custom spans, and evaluation.

What you'll build: a coordinator agent that routes customer queries to specialist sub-agents (billing and technical support), with full tracing and an evaluation pipeline.

Get the code

Clone the cookbook repo to run it yourself.

Prerequisites: a HoneyHive account and API key, a Google AI API key, and (only for evaluate.py) an OpenAI API key.

Setup

git clone https://github.com/honeyhiveai/cookbook.git
cd cookbook/google-adk-cookbook
pip install -r requirements.txt
Create a .env file:
HH_API_KEY=your-honeyhive-api-key
HH_PROJECT=your-project-name
GOOGLE_API_KEY=your-google-ai-api-key
OPENAI_API_KEY=your-openai-key  # only needed for evaluate.py
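Before running anything, it helps to fail fast when a key is missing. A minimal sketch using only the standard library (the repo itself may load .env via python-dotenv; `check_env` and `REQUIRED_VARS` are names invented here for illustration):

```python
import os

# Keys the main scripts depend on; OPENAI_API_KEY is only checked by evaluate.py.
REQUIRED_VARS = ["HH_API_KEY", "HH_PROJECT", "GOOGLE_API_KEY"]

def check_env() -> dict:
    """Collect required settings, raising a clear error if any are unset."""
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

Calling `check_env()` at the top of main.py turns a confusing mid-run auth failure into an immediate, actionable message.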

Walkthrough

1. Define agents and add tracing

The app has three agents: a coordinator that routes queries, and two specialists with tools.
from google.adk.agents import Agent

billing_agent = Agent(
    name="billing_agent",
    model="gemini-2.0-flash",
    description="Handles billing inquiries including account balances, "
    "recent charges, invoice questions, and refund status.",
    instruction="You are a billing support specialist...",
    tools=[lookup_billing],
)

technical_agent = Agent(
    name="technical_agent",
    model="gemini-2.0-flash",
    description="Handles technical support including product bugs, "
    "feature questions, error messages, and troubleshooting.",
    instruction="You are a technical support specialist...",
    tools=[search_knowledge_base],
)

coordinator = Agent(
    name="customer_support",
    model="gemini-2.0-flash",
    description="Customer support coordinator that routes queries to specialists.",
    instruction="Route billing questions to billing_agent, "
    "technical issues to technical_agent. Always delegate.",
    sub_agents=[billing_agent, technical_agent],
)
The coordinator uses ADK's native LLM delegation: the model reads each sub-agent's description and decides where to route. Adding HoneyHive tracing takes four lines. Initialize the tracer and instrumentor before running your agents:
import os

from honeyhive import HoneyHiveTracer
from openinference.instrumentation.google_adk import GoogleADKInstrumentor

tracer = HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT"),
    session_name="customer-support",
)

GoogleADKInstrumentor().instrument(tracer_provider=tracer.provider)
Run it:
python main.py
In HoneyHive, you’ll see the full trace hierarchy: coordinator routing decisions, sub-agent delegation, LLM calls, and tool executions — all captured automatically with no changes to your agent code.
2. Enrich traces with business context

Auto-tracing captures the agent mechanics, but traces are most useful in production when they also carry business context: which user made the request, which environment it ran in, and which version of your app served it.
tracer.enrich_session(
    user_properties={
        "user_id": "customer_42",
        "plan": "enterprise",
    },
    metadata={
        "environment": "production",
        "app_version": "2.1.0",
    },
)
This attaches to the session, so every trace in this session carries the context. In HoneyHive you can now:
  • Filter traces by user, plan, or environment
  • Search for all traces from a specific customer
  • Compare behavior across app versions
3. Add custom spans for business logic

The instrumentor captures ADK internals automatically. For your own code that wraps around agent calls — input validation, formatting, orchestration — use the @trace() decorator:
from honeyhive import trace

@trace()
def preprocess_query(raw_query: str) -> str:
    """Normalize and validate customer input before sending to the agent."""
    cleaned = raw_query.strip()
    if len(cleaned) < 3:
        return "Could you please provide more details about your question?"
    return cleaned
This creates a span in the trace for preprocess_query, so you can see exactly how long input processing takes alongside the agent spans.

When to add custom spans versus relying on auto-instrumentation:

| Use auto-instrumentation for | Use @trace() for |
| --- | --- |
| LLM calls, tool calls, agent runs | Input validation, output formatting |
| Anything inside the ADK framework | Database lookups, API calls to your services |
| Token usage, latency per call | Business logic that wraps agent calls |
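As a concrete example of the right-hand column, here is a hypothetical customer lookup that wraps the agent call. A stand-in no-op decorator replaces honeyhive.trace so the sketch runs standalone; in the real app you would use `from honeyhive import trace`, and `FAKE_DB` stands in for an actual database:

```python
# Stand-in for honeyhive.trace so this sketch runs without the SDK;
# in the app, replace with: from honeyhive import trace
def trace():
    def decorator(fn):
        return fn
    return decorator

# Hypothetical in-memory store standing in for a real customer database.
FAKE_DB = {"customer_42": {"plan": "enterprise", "open_tickets": 2}}

@trace()
def lookup_customer_record(user_id: str) -> dict:
    """Fetch customer context before handing the query to the coordinator."""
    return FAKE_DB.get(user_id, {"plan": "unknown", "open_tickets": 0})
```

With the real decorator, this lookup appears as its own span next to the ADK spans, so slow database calls are distinguishable from slow LLM calls.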
4. Evaluate agent quality

Once your agent is traced, you want to know: is it actually good? The evaluate.py script runs the agent against a test dataset and measures quality. Define a dataset of customer queries with expected categories:
dataset = [
    {
        "inputs": {"query": "I was charged $24.50 but I thought that was refunded?"},
        "ground_truth": {"category": "billing"},
    },
    {
        "inputs": {"query": "The export button gives me error 500."},
        "ground_truth": {"category": "technical"},
    },
    # ... 8 queries total
]
Define evaluators — functions that score each response:
def response_quality(outputs, inputs, ground_truth):
    """Use an LLM to judge whether the agent response is helpful."""
    # Calls GPT-4o-mini to score the response 0.0 - 1.0
    # based on accuracy, helpfulness, and tone
    ...

def correct_routing(outputs, inputs, ground_truth):
    """Check if the query was routed to the right specialist."""
    # LLM judge that verifies the right specialist handled the query
    ...
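The repo's evaluators call an LLM judge. As a simpler, deterministic stand-in, routing correctness can be checked by string match, assuming the agent's output dict records which sub-agent answered (the `agent` field here is an assumption for illustration, not necessarily the repo's actual output schema):

```python
def correct_routing_exact(outputs: dict, inputs: dict, ground_truth: dict) -> float:
    """Score 1.0 if the responding agent's name contains the expected
    category, else 0.0. Assumes `outputs` includes an `agent` field
    naming the sub-agent that handled the query."""
    expected = ground_truth["category"]      # e.g. "billing"
    handled_by = outputs.get("agent", "")    # e.g. "billing_agent"
    return 1.0 if expected in handled_by else 0.0
```

Deterministic checks like this are cheap and reproducible; the LLM-judge versions in the repo trade cost for the ability to grade open-ended response quality.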
Run the experiment:
from honeyhive import evaluate

result = evaluate(
    function=run_support_agent,
    dataset=dataset,
    evaluators=[response_quality, correct_routing],
    name="customer-support-eval",
)
python evaluate.py
HoneyHive’s evaluate() runs your agent against every datapoint, applies each evaluator, and uploads the results. You can view them in the Experiments UI to see scores per query, aggregate metrics, and compare across runs.
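Each evaluator returns a score per datapoint, and the aggregate metrics shown in the UI are, at their simplest, per-evaluator averages. A local sanity-check sketch of that aggregation (`aggregate_scores` is a name invented here, and the score rows are hypothetical):

```python
from collections import defaultdict
from statistics import mean

def aggregate_scores(rows: list[dict]) -> dict:
    """Average each evaluator's scores across all datapoints."""
    by_evaluator = defaultdict(list)
    for row in rows:
        for name, score in row.items():
            by_evaluator[name].append(score)
    return {name: mean(scores) for name, scores in by_evaluator.items()}
```

Running this over your evaluator outputs before uploading is a quick way to catch an evaluator that returns constant or out-of-range scores.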

Next steps