This guide shows you how to log metrics (evaluation scores, guardrail results) computed in your application code.
When to use client-side metrics: Guardrails (format validation, safety checks, PII detection) are best computed client-side at execution time, rather than server-side after ingestion.

Quick Start

Use enrich_session() to add metrics to the entire trace, or enrich_span() to add metrics to a specific operation.

On the Session

Add metrics that apply to the entire trace:
from honeyhive import HoneyHiveTracer
import os

tracer = HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT"),
)

# ... your application logic ...

tracer.enrich_session(metrics={
    "json_valid": True,
    "response_length": 150,
    "safety_score": 0.98,
})
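The values in the example above are hard-coded for brevity. As a minimal sketch of how such session-level metrics might be computed with only the standard library, the snippet below derives `json_valid` and `response_length` from a response string. The helper name `compute_session_metrics` is hypothetical and not part of the HoneyHive SDK.

```python
import json


def compute_session_metrics(response_text: str) -> dict:
    """Hypothetical helper: derive simple guardrail metrics from a model response."""
    try:
        json.loads(response_text)
        json_valid = True
    except json.JSONDecodeError:
        json_valid = False

    return {
        "json_valid": json_valid,
        "response_length": len(response_text),  # length in characters
    }


metrics = compute_session_metrics('{"answer": "42"}')
# The resulting dict could then be passed as tracer.enrich_session(metrics=metrics).
```

Keeping the computation in a plain function like this makes the guardrail easy to unit test separately from tracing.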

On a Span

Add metrics to a specific function or operation:
from honeyhive import HoneyHiveTracer, trace
import os

tracer = HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT"),
)

@trace
def generate_response(query: str):
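    # call_llm, check_pii, and compute_relevance are placeholders for your own functions.
    response = call_llm(query)

    # Compute metrics and attach them to the current span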
    tracer.enrich_span(metrics={
        "contains_pii": check_pii(response),
        "relevance_score": compute_relevance(query, response),
        "word_count": len(response.split()),
    })
    
    return response
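
The example above calls `check_pii` and `compute_relevance` without defining them. A rough sketch of what such helpers could look like follows. Both are hypothetical stand-ins, not part of the HoneyHive SDK: the PII check only looks for email-like and US-style phone patterns, and the relevance score is a naive word-overlap ratio, so a real application would likely need something stronger.

```python
import re

# Hypothetical stand-ins for the helpers used in the example above;
# they are not part of the HoneyHive SDK.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def check_pii(text: str) -> bool:
    """Return True if the text contains an email-like or US-style phone pattern."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))


def compute_relevance(query: str, response: str) -> float:
    """Fraction of distinct query words that also appear in the response (0.0 to 1.0)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    response_words = set(re.findall(r"\w+", response.lower()))
    if not query_words:
        return 0.0
    return len(query_words & response_words) / len(query_words)
```

Both functions return plain Python values (a bool and a float), which map onto the Boolean and Number metric types.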

Concepts

Client-Side vs Server-Side Evaluations

| Aspect | Client-Side | Server-Side |
| --- | --- | --- |
| When | During execution | After ingestion |
| Latency | Adds to request time | No impact on request |
| Best for | Guardrails, format checks | LLM-as-judge, complex evals |
| Setup | Code in your app | Configure in HoneyHive |

Client-side metrics are not overwritten by server-side evaluators with the same name.

Metrics Schema

The metrics object accepts any JSON structure, subject to the nesting limits listed under Nested Data:
{
  "json_valid": true,
  "relevance_score": 0.85,
  "latency_ms": 250,
  "step_evals": [
    { "step": 1, "passed": true },
    { "step": 2, "passed": false }
  ]
}

Data Types

| Type | Available Measurements | Use Case |
| --- | --- | --- |
| Boolean | True/False percentage | Pass/fail checks |
| Number | Sum, Avg, Median, Min, Max, P95, P98, P99 | Scores, latencies |
| String | Filters and group by | Classifications |
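
Because the available measurements depend on the value type, it can help to check locally which type each metric has before logging it. The sketch below is one way to do that. The function `metric_kind` is hypothetical, and how HoneyHive infers types on its side is not described here, so treat this as a local sanity check only. Note that `bool` is a subclass of `int` in Python, so the Boolean check has to come first.

```python
def metric_kind(value) -> str:
    """Classify a metric value as Boolean, Number, String, or Other."""
    # bool is a subclass of int in Python, so it must be tested first.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "String"
    return "Other"


metrics = {"json_valid": True, "relevance_score": 0.85, "intent": "billing"}
kinds = {name: metric_kind(value) for name, value in metrics.items()}
```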

Nested Data

Access nested fields with dot notation when charting, using a numeric index for array elements: metrics.step_evals.0.passed
Nesting limits: Max 5 levels of nested objects, max 2 levels of nested arrays.
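
To illustrate the dotted-path convention and the nesting limits, here is a small sketch that resolves a path such as `step_evals.0.passed` (relative to the metrics object) and measures nesting depth before logging. The helper names are hypothetical, and the way levels are counted is an assumption: the top-level metrics object is counted as object level 1, and each list along a path adds one array level. The platform may count differently.

```python
MAX_OBJECT_LEVELS = 5  # limit taken from the text above
MAX_ARRAY_LEVELS = 2   # limit taken from the text above


def resolve_path(metrics: dict, path: str):
    """Follow a dotted path such as 'step_evals.0.passed' through dicts and lists."""
    current = metrics
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segment indexes into a list
        else:
            current = current[part]
    return current


def max_depths(value, objects: int = 0, arrays: int = 0) -> tuple:
    """Return (deepest object nesting, deepest array nesting) found under value."""
    if isinstance(value, dict):
        objects += 1
        children = list(value.values())
    elif isinstance(value, list):
        arrays += 1
        children = value
    else:
        return objects, arrays
    deepest_obj, deepest_arr = objects, arrays
    for child in children:
        child_obj, child_arr = max_depths(child, objects, arrays)
        deepest_obj = max(deepest_obj, child_obj)
        deepest_arr = max(deepest_arr, child_arr)
    return deepest_obj, deepest_arr


def within_nesting_limits(metrics: dict) -> bool:
    """Check the metrics object against the assumed counting of the nesting limits."""
    obj_depth, arr_depth = max_depths(metrics)
    return obj_depth <= MAX_OBJECT_LEVELS and arr_depth <= MAX_ARRAY_LEVELS


example = {
    "json_valid": True,
    "step_evals": [
        {"step": 1, "passed": True},
        {"step": 2, "passed": False},
    ],
}
first_passed = resolve_path(example, "step_evals.0.passed")
```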

Learn More

SDK Reference