HoneyHive Docs

Online evaluations let you define domain-specific metrics that are computed asynchronously over your logs.

We encourage using Sampling to limit the costs associated with model-graded evaluations at production scale.
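As a rough illustration of what sampling means here, the sketch below decides deterministically whether a given log should be evaluated, so that only a fixed fraction of traffic incurs model-graded cost. It is not HoneyHive's implementation: the function name `should_evaluate`, the `event_id` argument, and the hash-based approach are all assumptions made for the example.

```python
import hashlib


def should_evaluate(event_id: str, sample_rate: float) -> bool:
    """Return True for roughly `sample_rate` of event IDs (hypothetical helper).

    The decision is a pure function of the ID, so the same event is always
    either sampled or skipped, no matter how often it is checked.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Sample the event when its bucket falls in the lowest `sample_rate` share.
    return bucket < sample_rate * 2**64
```

With a `sample_rate` of 0.1, about one event in ten would be passed on to an evaluator, and the rest would be skipped.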

LLM Evaluators

What: LLM-based functions that score semantic qualities.
Why: Measure tone, creativity, and persuasiveness, qualities that usage metrics miss.
How: Create LLM Evaluators
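To make the idea of an LLM scorer concrete, here is a minimal sketch of the two pieces such an evaluator generally needs: a grading prompt and a parser that turns the model's reply into a numeric score. The prompt wording, the 1 to 5 scale, and the names `PROMPT_TEMPLATE`, `parse_score`, and `grade_tone` are assumptions for illustration and do not reflect HoneyHive's evaluator format. The model call is passed in as a plain callable so the example does not depend on any specific provider.

```python
import re
from typing import Callable, Optional

# Hypothetical grading prompt; the criteria and scale are illustrative only.
PROMPT_TEMPLATE = (
    "Rate the tone of the following response on a scale of 1 to 5, "
    "where 1 is hostile and 5 is warm and professional.\n"
    "Reply with the number only.\n\n"
    "Response:\n{output}"
)


def parse_score(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Extract the first integer from the reply, or None if it is out of range."""
    match = re.search(r"-?\d+", reply)
    if match is None:
        return None
    score = int(match.group())
    return score if low <= score <= high else None


def grade_tone(output: str, call_model: Callable[[str], str]) -> Optional[int]:
    """Build the prompt, call the supplied model function, and parse the score."""
    return parse_score(call_model(PROMPT_TEMPLATE.format(output=output)))
```

Returning `None` for unparseable or out-of-range replies keeps malformed model output from being recorded as a valid score.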

Python Evaluators

What: Code-defined metrics for precise or complex measurements.
Why: Compute linguistic metrics, domain-specific scores, etc.
How: Create Python Evaluators
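As an example of a code-defined linguistic metric, the sketch below computes lexical diversity (the ratio of unique words to total words) for a piece of text. The function name `lexical_diversity` and its plain-string signature are assumptions for illustration; the signature HoneyHive expects from a Python evaluator is not shown on this page.

```python
import re


def lexical_diversity(text: str) -> float:
    """Return unique words divided by total words, or 0.0 for empty text."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

For instance, `lexical_diversity("the cat the dog")` returns 0.75, because three of the four words are distinct.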

