> ## Documentation Index
> Fetch the complete documentation index at: https://docs.honeyhive.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# End-to-End: Multi-Agent Tracing and Evaluation

> Build a multi-agent customer support bot with Google ADK, add HoneyHive observability, and evaluate agent quality

Build a multi-agent customer support system using [Google ADK](https://google.github.io/adk-docs/) and progressively add HoneyHive observability -- from auto-tracing through enrichment, custom spans, and evaluation.

**What you'll build**: A coordinator agent that routes customer queries to specialist sub-agents (billing and technical support), with full tracing and an evaluation pipeline.

<Card title="Get the code" icon="github" href="https://github.com/honeyhiveai/cookbook/tree/main/google-adk-cookbook">
  Clone the cookbook repo to run it yourself
</Card>

**Prerequisites**:

* Python 3.11+
* A [HoneyHive account](https://app.us.honeyhive.ai) (grab your API key from the dashboard)
* A [Google AI API key](https://aistudio.google.com/apikey)
* An [OpenAI API key](https://platform.openai.com/api-keys) (for the evaluation step only)

## Setup

```bash theme={null}
git clone https://github.com/honeyhiveai/cookbook.git
cd cookbook/google-adk-cookbook
pip install -r requirements.txt
```

Create a `.env` file:

```bash theme={null}
HH_API_KEY=your-honeyhive-api-key
HH_PROJECT=your-project-name
GOOGLE_API_KEY=your-google-ai-api-key
OPENAI_API_KEY=your-openai-key  # only needed for evaluate.py
```

## Walkthrough

<Steps>
  <Step title="Define agents and add tracing">
    The app has three agents: a coordinator that routes queries, and two specialists with tools.

    ```python theme={null}
    from google.adk.agents import Agent

    billing_agent = Agent(
        name="billing_agent",
        model="gemini-2.0-flash",
        description="Handles billing inquiries including account balances, "
        "recent charges, invoice questions, and refund status.",
        instruction="You are a billing support specialist...",
        tools=[lookup_billing],
    )

    technical_agent = Agent(
        name="technical_agent",
        model="gemini-2.0-flash",
        description="Handles technical support including product bugs, "
        "feature questions, error messages, and troubleshooting.",
        instruction="You are a technical support specialist...",
        tools=[search_knowledge_base],
    )

    coordinator = Agent(
        name="customer_support",
        model="gemini-2.0-flash",
        description="Customer support coordinator that routes queries to specialists.",
        instruction="Route billing questions to billing_agent, "
        "technical issues to technical_agent. Always delegate.",
        sub_agents=[billing_agent, technical_agent],
    )
    ```

    The coordinator uses ADK's native [LLM delegation](https://google.github.io/adk-docs/agents/multi-agents/) -- the model reads each sub-agent's `description` and decides where to route.

    **Adding HoneyHive tracing takes 4 lines.** Initialize the tracer and instrumentor before running your agents:

    ```python highlight={1-2,4-8,10} theme={null}
    from honeyhive import HoneyHiveTracer
    from openinference.instrumentation.google_adk import GoogleADKInstrumentor

    tracer = HoneyHiveTracer.init(
        api_key=os.getenv("HH_API_KEY"),
        project=os.getenv("HH_PROJECT"),
        session_name="customer-support",
    )

    GoogleADKInstrumentor().instrument(tracer_provider=tracer.provider)
    ```

    Run it:

    ```bash theme={null}
    python main.py
    ```

    In HoneyHive, you'll see the full trace hierarchy: coordinator routing decisions, sub-agent delegation, LLM calls, and tool executions -- all captured automatically with no changes to your agent code.
  </Step>

  <Step title="Enrich traces with business context">
    Auto-tracing captures the agent mechanics, but you also need business context to make traces useful in production: which user made this request, what environment is this, what version of your app.

    ```python highlight={1-8} theme={null}
    tracer.enrich_session(
        user_properties={
            "user_id": "customer_42",
            "plan": "enterprise",
        },
        metadata={
            "environment": "production",
            "app_version": "2.1.0",
        },
    )
    ```

    This attaches to the session, so every trace in this session carries the context. In HoneyHive you can now:

    * **Filter** traces by user, plan, or environment
    * **Search** for all traces from a specific customer
    * **Compare** behavior across app versions
  </Step>

  <Step title="Add custom spans for business logic">
    The instrumentor captures ADK internals automatically. For your own code that wraps around agent calls -- input validation, formatting, orchestration -- use the `@trace()` decorator:

    ```python highlight={1,3} theme={null}
    from honeyhive import trace

    @trace()
    def preprocess_query(raw_query: str) -> str:
        """Normalize and validate customer input before sending to the agent."""
        cleaned = raw_query.strip()
        if len(cleaned) < 3:
            return "Could you please provide more details about your question?"
        return cleaned
    ```

    This creates a span in the trace for `preprocess_query`, so you can see exactly how long input processing takes alongside the agent spans.

    **When to add custom spans vs rely on auto-instrumentation:**

    | Use auto-instrumentation for      | Use `@trace()` for                           |
    | --------------------------------- | -------------------------------------------- |
    | LLM calls, tool calls, agent runs | Input validation, output formatting          |
    | Anything inside the ADK framework | Database lookups, API calls to your services |
    | Token usage, latency per call     | Business logic that wraps agent calls        |
  </Step>

  <Step title="Evaluate agent quality">
    Once your agent is traced, you want to know: is it actually good? The `evaluate.py` script runs the agent against a test dataset and measures quality.

    **Define a dataset** of customer queries with expected categories:

    ```python theme={null}
    dataset = [
        {
            "inputs": {"query": "I was charged $24.50 but I thought that was refunded?"},
            "ground_truth": {"category": "billing"},
        },
        {
            "inputs": {"query": "The export button gives me error 500."},
            "ground_truth": {"category": "technical"},
        },
        # ... 8 queries total
    ]
    ```

    **Define evaluators** -- functions that score each response:

    ```python theme={null}
    def response_quality(outputs, inputs, ground_truth):
        """Use an LLM to judge whether the agent response is helpful."""
        # Calls GPT-4o-mini to score the response 0.0 - 1.0
        # based on accuracy, helpfulness, and tone
        ...

    def correct_routing(outputs, inputs, ground_truth):
        """Check if the query was routed to the right specialist."""
        # LLM judge that verifies the right specialist handled the query
        ...
    ```

    **Run the experiment:**

    ```python theme={null}
    from honeyhive import evaluate

    result = evaluate(
        function=run_support_agent,
        dataset=dataset,
        evaluators=[response_quality, correct_routing],
        name="customer-support-eval",
    )
    ```

    ```bash theme={null}
    python evaluate.py
    ```

    HoneyHive's `evaluate()` runs your agent against every datapoint, applies each evaluator, and uploads the results. You can view them in the Experiments UI to see scores per query, aggregate metrics, and compare across runs.
  </Step>
</Steps>

## Next steps

<CardGroup cols={2}>
  <Card title="Production Deployment" icon="rocket" href="/v2/tutorials/production-deployment">
    Configure tracing for serverless, web servers, and Kubernetes
  </Card>

  <Card title="Distributed Tracing" icon="share-nodes" href="/v2/tutorials/distributed-tracing">
    Trace agents across service boundaries
  </Card>

  <Card title="Experiments Quickstart" icon="flask" href="/v2/introduction/experiments-quickstart">
    Deep dive into HoneyHive's evaluation framework
  </Card>

  <Card title="Other Frameworks" icon="plug" href="/v2/integrations/langchain">
    LangChain, LangGraph, Strands, CrewAI, and more
  </Card>
</CardGroup>
