| Workflow | When to Use | How |
|---|---|---|
| Adding Metrics to Traces | Production monitoring, guardrails | `enrich_span()`, `enrich_session()` |
| Evaluator Functions for Experiments | Testing against datasets, CI/CD | Define functions, pass to `evaluate()` |
## Adding Metrics to Traces
Compute scores in your application code and attach them to traces for monitoring and analysis. Use cases: format validation, safety checks, PII detection, latency tracking, relevance scores.

For complete documentation on adding metrics to traces, see Custom Metrics.
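A minimal sketch of the pattern: compute a metric as plain Python in your application code, then attach it to the current span. The `check_json_format` helper is hypothetical, and the exact `enrich_span()` signature should be confirmed against your SDK's reference before use.

```python
import json

def check_json_format(output: str) -> dict:
    """Compute simple format-validation metrics for a model output."""
    try:
        json.loads(output)
        is_valid = True
    except json.JSONDecodeError:
        is_valid = False
    return {"is_valid_json": is_valid, "output_length": len(output)}

# Inside your traced application code (hypothetical usage -- check the
# enrich_span() parameters in your SDK reference before relying on them):
#   metrics = check_json_format(llm_output)
#   enrich_span(metrics=metrics)

print(check_json_format('{"answer": 42}'))
# {'is_valid_json': True, 'output_length': 14}
```

The same shape works for safety checks or latency tracking: compute the value locally, then pass it as a metric on the span or session.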
## Evaluator Functions for Experiments
Define scoring functions that run locally during `evaluate()` to score outputs against expected results.
### Writing an Evaluator
Evaluators receive three arguments and return a score.

### Running Evaluators
Pass evaluator functions to `evaluate()`:
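A sketch of both steps together. The three-argument `(inputs, outputs, expected)` signature and the `evaluate()` parameter names used here are assumptions; check your SDK's reference for the exact argument order.

```python
def exact_match(inputs, outputs, expected):
    """Score 1.0 when the output matches the expected answer exactly.

    The (inputs, outputs, expected) signature is an assumption --
    confirm parameter names and order in your SDK's documentation.
    """
    return 1.0 if outputs == expected else 0.0

def answer_length(inputs, outputs, expected):
    """A secondary metric: length of the generated answer."""
    return len(outputs)

# Hypothetical usage -- evaluate() and its parameters come from your SDK:
#   evaluate(
#       function=my_pipeline,
#       dataset=my_dataset,
#       evaluators=[exact_match, answer_length],
#   )

print(exact_match("q", "Paris", "Paris"))  # 1.0
```

Because evaluators are plain functions, you can unit-test them locally before wiring them into a CI/CD run.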
For a complete tutorial with real examples, see Run Your First Experiment.
## Evaluating Multi-Step Pipelines
For pipelines with multiple steps, combine both approaches:

- Session-level: pass evaluators to `evaluate()` for overall scoring
- Span-level: use `enrich_span()` within traced functions for step-specific metrics

For example: `answer_quality` scores at the session level; `retrieval_score`, `num_docs`, and `answer_length` at the span level.
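The combination might look like the sketch below: a pure scoring helper for the retrieval step, a span-level attachment inside the traced function, and a session-level evaluator. The helper names, the substring-match scoring, and the commented API calls are all illustrative assumptions.

```python
def score_retrieval(docs: list[str], query: str) -> dict:
    """Step-specific metrics for a retrieval stage (plain Python)."""
    hits = sum(1 for d in docs if query.lower() in d.lower())
    return {
        "retrieval_score": hits / len(docs) if docs else 0.0,
        "num_docs": len(docs),
    }

def retrieve(query: str) -> list[str]:
    docs = ["Paris is the capital of France.", "Berlin is in Germany."]
    # Span-level: attach step metrics inside the traced function
    # (hypothetical call -- check enrich_span() in your SDK reference):
    #   enrich_span(metrics=score_retrieval(docs, query))
    return docs

# Session-level: an evaluator passed to evaluate() for overall scoring
# (the three-argument signature is an assumption, as above):
def answer_quality(inputs, outputs, expected):
    return 1.0 if expected.lower() in outputs.lower() else 0.0
```

Span-level metrics let you localize a regression to one step (e.g. retrieval) even when the session-level score alone would only tell you that overall quality dropped.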

