Python evaluators let you write custom evaluation logic that runs on HoneyHive’s infrastructure. Use them for format validation, metric calculations, or any programmatic assessment of your AI outputs.

Creating a Python Evaluator

  1. Navigate to the Evaluators tab in the HoneyHive console.
  2. Click Add Evaluator and select Python Evaluator.
HoneyHive’s server-side Python evaluators have access to Python’s standard library and packages including pandas, scikit-learn, jsonschema, sqlglot, and requests.
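For example, a format-validation evaluator can check that the model returned well-formed JSON with the fields your application expects. The sketch below uses only the standard library; the required keys and the sample event are illustrative, not part of HoneyHive's API:

```python
import json

# Hypothetical required keys -- adjust to your application's output format
REQUIRED_KEYS = {"answer", "confidence"}

def validate_output_format(event):
    """Return True if the model output is valid JSON containing the required keys."""
    try:
        payload = json.loads(event["outputs"]["content"])
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(payload, dict) and REQUIRED_KEYS <= payload.keys()

# Sample event for illustration
event = {"outputs": {"content": '{"answer": "42", "confidence": 0.9}'}}
result = validate_output_format(event)
```

For stricter validation (nested structures, types, ranges), the same idea can be written against a schema with the bundled jsonschema package.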

Event Schema

Python evaluators operate on event objects representing spans in your traces.
| Property | Description | Example |
| --- | --- | --- |
| `inputs` | Input data for the event | `event["inputs"]["query"]` |
| `outputs` | Output data from the event | `event["outputs"]["content"]` |
| `feedback` | User feedback and ground truth | `event["feedback"]["ground_truth"]` |
| `metadata` | Additional event metadata | `event["metadata"]["model"]` |
| `event_type` | Type: `model`, `tool`, `chain`, or `session` | `event["event_type"]` |
| `event_name` | Name of the specific event | `event["event_name"]` |
Click Show Schema in the evaluator console to see all available properties for your events.
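To make the schema concrete, here is a rough sketch of what a model event might look like as a Python dict. The exact fields depend on your instrumentation, so treat the values below as placeholders:

```python
# Illustrative shape of a model event (fields vary with your instrumentation)
event = {
    "event_type": "model",
    "event_name": "generate_response",
    "inputs": {"query": "What is the capital of France?"},
    "outputs": {"content": "The capital of France is Paris."},
    "feedback": {"ground_truth": "Paris"},
    "metadata": {"model": "gpt-4o"},
}

# Properties are accessed with plain dict indexing
query = event["inputs"]["query"]
completion = event["outputs"]["content"]
```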

Evaluator Function

Define your evaluation logic in a Python function:
def check_unwanted_phrases(event):
    """Boolean: fail if the completion contains a canned refusal phrase."""
    unwanted_phrases = ["As an AI language model", "I'm sorry, but I can't", "I don't have personal opinions"]
    model_completion = event["outputs"]["content"]
    return not any(phrase.lower() in model_completion.lower() for phrase in unwanted_phrases)

result = check_unwanted_phrases(event)
Looking for ready-made examples? Check out our Python Evaluator Templates.
Resource limits: Python evaluators have a 1GB memory limit and 30-second timeout. Optimize your code to stay within these constraints.

Configuration

Event Filters

Filter which events this evaluator runs on by Event Type and Event Name. Use this to target specific spans in your pipeline (e.g., only model events named generate_response).

Return Type

  • Boolean: For true/false evaluations
  • Numeric: For scores or ratings (configure the scale, e.g., 1-5)
  • String: For categorical outputs
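As a sketch, the same response-length check could be expressed under each return type. The thresholds, labels, and sample event below are illustrative only:

```python
def is_nonempty(event):
    """Boolean: true/false check."""
    return bool(event["outputs"]["content"].strip())

def word_count_score(event):
    """Numeric: rate response length on a 1-5 scale (illustrative buckets)."""
    n = len(event["outputs"]["content"].split())
    if n < 10:
        return 1
    if n < 50:
        return 3
    return 5

def length_category(event):
    """String: categorical label for response length."""
    n = len(event["outputs"]["content"].split())
    return "short" if n < 10 else "medium" if n < 50 else "long"

# Each evaluator is configured with a single return type and yields one result
event = {"outputs": {"content": "Paris is the capital of France."}}
result = word_count_score(event)
```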

Passing Range

Define pass/fail criteria for your evaluator. Useful for CI builds and detecting failed test cases.

Advanced Settings

Expand to configure:
  • Requires Ground Truth: Enable if your evaluator needs feedback.ground_truth
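For instance, an evaluator with Requires Ground Truth enabled might do a normalized exact match against `feedback.ground_truth`. This is a sketch; your matching logic (token overlap, fuzzy match, etc.) may differ, and the sample event is illustrative:

```python
def exact_match(event):
    """Boolean: does the output match the labeled answer, ignoring case and whitespace?"""
    prediction = event["outputs"]["content"].strip().lower()
    ground_truth = event["feedback"]["ground_truth"].strip().lower()
    return prediction == ground_truth

# Sample event for illustration
event = {
    "outputs": {"content": "  Paris "},
    "feedback": {"ground_truth": "paris"},
}
result = exact_match(event)
```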
Click Create to save your evaluator.

Production Settings

After creating an evaluator, you can enable it for production traces from the Evaluators table:
  • Enabled: Toggle to run this evaluator on production traces (where source != evaluation)
  • Sampling %: When enabled, set a sampling percentage to control costs (e.g., 25% evaluates one in four events)

Using with Experiments

Server-side evaluators automatically run on all experiment traces that match your event filters. When you run evaluate(), your server-side evaluators score the results without any additional code.
from honeyhive import evaluate

# Server-side evaluators run automatically on matching events
result = evaluate(
    function=my_function,
    dataset=my_dataset,
    name="my-experiment"
)
# No need to pass an evaluators param; server-side evaluators are applied automatically