

This guide shows you how to log metrics (evaluation scores, guardrail results) computed in your application code.
When to use client-side metrics: Guardrails (format validation, safety checks, PII detection) are best computed client-side at execution time rather than server-side after ingestion, so the result is available before the response is returned.
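For instance, a format-validation guardrail can run inline before the response is returned. The helper below is a hypothetical sketch (the name `json_guardrail` is not part of the SDK); its result dict can be passed straight to the enrichment calls shown in the Quick Start.

```python
import json

def json_guardrail(output: str) -> dict:
    """Client-side guardrail: check whether the model output is valid JSON.

    Runs at execution time, so the result can be logged as a metric
    before the response is returned to the caller.
    """
    try:
        json.loads(output)
        return {"json_valid": True}
    except json.JSONDecodeError:
        return {"json_valid": False}
```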

Quick Start

Use enrich_session() to add metrics to the entire trace, or enrich_span() to add metrics to a specific operation.

On the Session

Add metrics that apply to the entire trace:
from honeyhive import HoneyHiveTracer
import os

tracer = HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT"),
)

# ... your application logic ...

tracer.enrich_session(metrics={
    "json_valid": True,
    "response_length": 150,
    "safety_score": 0.98,
})

On a Span

Add metrics to a specific function or operation:
from honeyhive import HoneyHiveTracer, trace
import os

tracer = HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT"),
)

@trace
def generate_response(query: str):
    response = call_llm(query)
    
    # Compute metrics
    tracer.enrich_span(metrics={
        "contains_pii": check_pii(response),
        "relevance_score": compute_relevance(query, response),
        "word_count": len(response.split()),
    })
    
    return response
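The `check_pii` and `compute_relevance` helpers above are application-defined, not part of the HoneyHive SDK. A minimal sketch of what they might look like, assuming a naive regex-based email check and a word-overlap relevance proxy (real guardrails would use dedicated libraries or models):

```python
import re

# Hypothetical implementations of the helpers used in the span example.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def check_pii(text: str) -> bool:
    """Naive PII check: flag anything that looks like an email address."""
    return bool(EMAIL_RE.search(text))

def compute_relevance(query: str, response: str) -> float:
    """Crude relevance proxy: fraction of query words echoed in the response."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    response_words = set(response.lower().split())
    return len(query_words & response_words) / len(query_words)
```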

Concepts

Client-Side vs Server-Side Evaluations

| Aspect | Client-Side | Server-Side |
| --- | --- | --- |
| When | During execution | After ingestion |
| Latency | Adds to request time | No impact on request |
| Best for | Guardrails, format checks | LLM-as-judge, complex evals |
| Setup | Code in your app | Configure in HoneyHive |
Client-side metrics are not overwritten by server-side evaluators with the same name.

Metrics Schema

The metrics object accepts any structure:
{
  "json_valid": true,
  "relevance_score": 0.85,
  "latency_ms": 250,
  "step_evals": [
    { "step": 1, "passed": true },
    { "step": 2, "passed": false }
  ]
}
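In Python, this is an ordinary dict. A minimal sketch showing that the payload only needs to be JSON-serializable (the `enrich_session` call is commented out so the snippet stands alone):

```python
import json

# Nested objects and arrays are allowed, subject to the nesting limits
# documented below.
metrics = {
    "json_valid": True,
    "relevance_score": 0.85,
    "latency_ms": 250,
    "step_evals": [
        {"step": 1, "passed": True},
        {"step": 2, "passed": False},
    ],
}

# Sanity check: the payload serializes cleanly before logging, e.g. via
# tracer.enrich_session(metrics=metrics)
payload = json.dumps(metrics)
```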

Data Types

| Type | Available Measurements | Use Case |
| --- | --- | --- |
| Boolean | True/False percentage | Pass/fail checks |
| Number | Sum, Avg, Median, Min, Max, P95, P98, P99 | Scores, latencies |
| String | Filters and group by | Classifications |

Nested Data

Access nested fields when charting: metrics.step_evals.0.passed
Nesting limits: Max 5 levels of nested objects, max 2 levels of nested arrays.
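If you build deeply nested payloads programmatically, a small pre-flight check can catch limit violations before logging. `within_nesting_limits` below is a hypothetical helper, not part of the SDK; it enforces the documented limits of 5 object levels and 2 array levels.

```python
def within_nesting_limits(value, obj_depth=0, arr_depth=0):
    """Check a metrics payload against the documented nesting limits:
    at most 5 levels of nested objects and 2 levels of nested arrays."""
    if isinstance(value, dict):
        if obj_depth + 1 > 5:
            return False
        return all(within_nesting_limits(v, obj_depth + 1, arr_depth)
                   for v in value.values())
    if isinstance(value, list):
        if arr_depth + 1 > 2:
            return False
        return all(within_nesting_limits(v, obj_depth, arr_depth + 1)
                   for v in value)
    return True  # scalars (bool, number, string, None) are always fine
```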

Learn More

Chart metrics

Visualize metrics in dashboards

Server-side evaluators

Run evaluations post-ingestion

LLM evaluators

Use LLMs to evaluate outputs

Human annotations

Set up expert review queues

SDK Reference