Online evaluations allow you to define domain-specific metrics that can be computed to evaluate your logs asynchronously.
We encourage using sampling to control the costs associated with model-graded evaluations at production scale.
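As a rough illustration of sampling, you can gate the expensive grader call on a random draw so only a fraction of logs are scored. The `maybe_evaluate` helper and the 5% rate below are hypothetical, not part of the HoneyHive SDK:

```python
import random
from typing import Callable, Optional

SAMPLE_RATE = 0.05  # grade roughly 5% of production logs

def maybe_evaluate(log: dict, grader: Callable[[dict], float],
                   rate: float = SAMPLE_RATE) -> Optional[float]:
    """Run an expensive model-graded evaluator on a random sample of logs."""
    if random.random() < rate:
        return grader(log)
    return None  # log skipped, so no grading cost is incurred

# Usage with a stand-in grader that always returns 1.0
score = maybe_evaluate({"output": "hello"}, grader=lambda log: 1.0)
```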
LLM Evaluators
- What: LLM functions scoring semantic qualities (see the sketch after this list).
- Why: Measure tone, creativity, persuasiveness—things usage metrics miss.
- How: Create LLM Evaluators
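To give a sense of what an LLM evaluator computes, the sketch below prompts a judge model to rate a logged output on a fixed scale. It assumes an OpenAI-style client and an illustrative tone rubric; in practice you configure the evaluator prompt through the Create LLM Evaluators guide rather than writing this yourself:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = (
    "Rate the tone of the following response on a 1-5 scale, where 5 is "
    "perfectly professional and empathetic. Reply with a single integer.\n\n"
    "Response:\n{output}"
)

def llm_tone_score(output: str) -> int:
    """Ask a judge model for a 1-5 tone rating of a logged response."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any judge model works here
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(output=output)}],
        temperature=0,  # keep scoring deterministic
    )
    return int(resp.choices[0].message.content.strip())
```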
Python Evaluators
- What: Code-defined metrics for precise or complex measurements (see the sketch after this list).
- Why: Compute linguistic metrics, domain-specific scores, etc.
- How: Create Python Evaluators
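To illustrate, a Python evaluator can be any deterministic function over a logged event. The event schema and the `outputs.text` field below are assumptions; adapt them to your own logging format:

```python
def response_completeness(event: dict) -> float:
    """Score what fraction of required domain terms appear in the output.

    Assumes the logged event stores the model output under
    event["outputs"]["text"]; adjust to your own schema.
    """
    required = ("order_id", "refund", "timeline")  # domain-specific terms
    text = event.get("outputs", {}).get("text", "").lower()
    hits = sum(term in text for term in required)
    return hits / len(required)  # 0.0 (none present) to 1.0 (all present)

# Usage on a sample event
event = {"outputs": {"text": "Your refund for order_id 123 ships on a 5-day timeline."}}
print(response_completeness(event))  # -> 1.0
```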

