Monitoring
Online Evaluations
How to configure online evaluations to monitor your application.
Online evaluations let you define domain-specific metrics that are computed asynchronously on your logs.
We encourage using sampling to limit the costs associated with model-graded evaluations at production scale.
LLM Evaluators
- What: LLM functions scoring semantic qualities.
- Why: Measure tone, creativity, persuasiveness—things usage metrics miss.
- How: Create LLM Evaluators
Python Evaluators
- What: Code-defined metrics for precise or complex measurements.
- Why: Compute linguistic metrics, domain-specific scores, etc.
- How: Create Python Evaluators
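As a rough illustration of a code-defined metric combined with sampling, the sketch below scores a log by output length and decides deterministically whether a given log should be evaluated at all. The log shape (a dict with an `output` key), the function names `response_length_score` and `should_evaluate`, and the `max_words` and `sample_rate` parameters are all assumptions made for this example, not the platform's actual evaluator interface.

```python
import hashlib


def response_length_score(log: dict, max_words: int = 50) -> float:
    """Hypothetical Python evaluator: 1.0 when the output fits within
    max_words, scaled down proportionally when it runs longer."""
    words = log.get("output", "").split()
    if not words:
        return 0.0  # an empty output receives the lowest score
    return min(1.0, max_words / len(words))


def should_evaluate(log_id: str, sample_rate: float) -> bool:
    """Hypothetical sampling gate: hash the log ID into [0, 1) and
    evaluate only when the value falls below sample_rate."""
    digest = hashlib.sha256(log_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


if __name__ == "__main__":
    log = {"id": "log_123", "output": "Thanks for reaching out. Your order has shipped."}
    if should_evaluate(log["id"], sample_rate=0.1):
        print(response_length_score(log))
```

Hashing the log ID rather than drawing a random number keeps the sampling decision reproducible: the same log is always either included or skipped for a given rate, which makes results easier to audit.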