
# Introduction

> An overview of HoneyHive evaluators

Evaluators are tests that measure the quality of inputs and outputs for your AI application or for specific steps within it.
They validate whether your models meet performance criteria and align with domain expertise.
Whether you're fine-tuning prompts, comparing generative models, or monitoring production systems, evaluators let you test and measure quality systematically.

## Key characteristics of HoneyHive evaluators

HoneyHive evaluators vary along four dimensions: the development stage at which they run, how they are implemented, where they execute, and the scope they evaluate.

### Development Stages

* **Offline Evaluation**: Used during development and testing phases, including CI/CD pipelines and debugging sessions. You build test suites of curated scenarios (with or without ground truths) and run them via [`evaluate()`](/v2/evaluation/concepts). Evaluators can be client-side (run in your code) or server-side (run on HoneyHive after trace ingestion).
* **Online Evaluation**: Evaluators that run automatically on ingested traces to continuously monitor quality. When you enable a server-side evaluator and configure [event filters](/v2/evaluators/llm#event-filters), it runs on all matching traces — both production and experiment — without any code changes.

<Note> For an example of an offline evaluation with client-side evaluators, see [how to run an experiment](/v2/introduction/experiments-quickstart). </Note>

### Implementation Methods

Evaluators can be implemented using three primary methods:

* **Python Code Evaluators**: Custom functions that programmatically assess outputs based on specific criteria, such as format validation, content checks, or metric calculations.
* **LLM-Assisted Evaluators**: Leverage language models to perform qualitative assessments, such as checking for coherence, relevance, or alignment with requirements.
* **Domain Expert (Human) Evaluators**: Enable subject matter experts to provide direct feedback and assessments through the HoneyHive platform.
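
As a rough illustration of the first method, a Python code evaluator can be as small as a function that takes an output and returns a score or a boolean. The sketch below is plain Python and does not use the HoneyHive SDK; the function names `has_required_keys` and `within_length_limit` are hypothetical and chosen only for this example.

```python
import json


def has_required_keys(output: str, required_keys: set[str]) -> bool:
    """Format validation: is `output` a JSON object with every required key?"""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        # Output that is not valid JSON fails the check outright.
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())


def within_length_limit(output: str, max_words: int = 50) -> bool:
    """Content check: does `output` stay within a word budget?"""
    return len(output.split()) <= max_words


# Hypothetical model output used only to exercise the checks.
sample_output = '{"answer": "42", "sources": []}'
print(has_required_keys(sample_output, {"answer", "sources"}))
print(within_length_limit(sample_output))
```

How such a function is registered with HoneyHive depends on whether it runs client-side or server-side; the pages linked in the Execution Environment section cover the actual setup.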

### Execution Environment

Evaluators can be run either locally (client-side) or remotely (server-side), each with its own set of advantages and use cases.

<Frame caption="Comparison of Client-side and Server-side Evaluators">
  <img src="https://mintcdn.com/honeyhiveai/qmpHooEVX6j-ieIE/images/client-server-eval.png?fit=max&auto=format&n=qmpHooEVX6j-ieIE&q=85&s=bc8e3abbd3534abb59d0b9111c407e4e" alt="Diagram comparing client-side evaluators (run in your application, send traces with metrics) versus server-side evaluators (run on HoneyHive infrastructure after trace ingestion)" width="1274" height="1225" data-path="images/client-server-eval.png" />
</Frame>

* **Client-Side Execution**: Evaluators run locally within your application environment, providing immediate feedback and integration with your existing infrastructure.
  * Pros:
    * Quick validations and guardrails
    * Offline experiments and CI/CD pipelines
    * Real-time format checks and PII detection
  * Cons:
    * Limited by local resources, with no centralized management.

<Note>
  See [Client-side Evaluators](/v2/evaluators/client_side) for how to use client-side evaluators in both tracing and experiment scenarios.
</Note>
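
To make the guardrail use case concrete, here is a minimal sketch of a client-side check that runs in the application process before a response is returned. It is plain Python with no HoneyHive calls; `contains_email`, `guarded_response`, and the fallback string are hypothetical names, and the regular expression is a deliberately simple email pattern rather than a complete PII detector.

```python
import re

# Simplified email pattern, assumed here for illustration only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def contains_email(text: str) -> bool:
    """Return True if the text appears to contain an email address."""
    return EMAIL_PATTERN.search(text) is not None


def guarded_response(text: str, fallback: str = "[response withheld]") -> tuple[str, dict]:
    """Run the check inline and return the (possibly replaced) text plus metrics."""
    flagged = contains_email(text)
    metrics = {"contains_email": flagged}
    return (fallback if flagged else text), metrics


response, metrics = guarded_response("You can reach Bob at bob@example.org")
print(response, metrics)
```

Because the check runs synchronously in your own code, its result is available immediately and can change what the user sees. The returned metrics dictionary is the kind of value you would then attach to a trace.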

* **Server-Side Execution**: Evaluators operate remotely on HoneyHive's infrastructure.
  * Pros:
    * Asynchronous processing for resource-intensive tasks
    * Centralized management and versioning
    * Better scalability for large datasets
    * Support for human evaluations and post-ingestion analysis
  * Cons:
    * Higher latency since results aren't immediately available.

<Note>To set up server-side Python, LLM, or human evaluators, see the [Python evaluator](/v2/evaluators/python), [LLM Evaluator](/v2/evaluators/llm), and [Human Annotation](/v2/evaluators/human) pages.</Note>

### Evaluation Scope

HoneyHive provides flexible granularity in evaluation, allowing you to:

* Assess entire end-to-end pipelines
* Evaluate individual steps within your application flow
* Monitor specific components such as model calls, tool usage, or chain execution
* Track and evaluate sessions that group multiple operations together

For example, in a RAG pipeline you can evaluate the overall response quality at the session level, while also measuring retrieval relevance and generation accuracy at individual step levels.
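
The RAG example can be sketched in plain Python to show how step-level and session-level metrics relate. This is an illustration under stated assumptions, not HoneyHive's implementation: `retrieval_relevance` uses naive term overlap as a stand-in for a real relevance metric, `session_summary` is a hypothetical aggregation, and the generation score is a placeholder value.

```python
def retrieval_relevance(query: str, documents: list[str]) -> float:
    """Step-level metric: fraction of retrieved documents sharing a term with the query."""
    if not documents:
        return 0.0
    query_terms = set(query.lower().split())
    hits = sum(1 for doc in documents if query_terms & set(doc.lower().split()))
    return hits / len(documents)


def session_summary(step_scores: dict[str, float]) -> dict[str, float]:
    """Session-level view: summarize the scores of the individual steps."""
    values = list(step_scores.values())
    return {
        "min_step_score": min(values),
        "mean_step_score": sum(values) / len(values),
    }


# Hypothetical retrieval step for one session.
docs = ["Our refund policy lasts 30 days", "Shipping takes a week"]
step_scores = {
    "retrieval_relevance": retrieval_relevance("refund policy", docs),
    "generation_score": 1.0,  # placeholder for a generation-step metric
}
print(step_scores)
print(session_summary(step_scores))
```

Keeping the step scores separate from the session summary makes it possible to tell whether a poor overall result comes from retrieval or from generation.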

<Note>For code examples showing how to add metrics at different scopes, see [Client-Side Evaluators](/v2/evaluators/client_side) and [Custom Metrics](/v2/tracing/client-side-evals).</Note>
