Python evaluators
Technical documentation for creating custom Python evaluators in HoneyHive
Python evaluators allow you to create custom evaluations for any step in your pipeline using Python code.
Creating a Python Evaluator
- Navigate to the Evaluators tab in the HoneyHive console.
- Click Add Evaluator and select Python Evaluator.
Event Schema
Python evaluators operate on event objects. Key properties include:
- event_type: Type of event (e.g., “model”, “tool”, “chain”, “session”)
- event_name: Name of the specific event
- inputs: Input data for the event
- outputs: Output data from the event
- feedback: User feedback and ground truth data
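To make the properties above concrete, here is a hypothetical event written as a Python dict. The top-level keys come from the list above; the nested keys (`question`, `content`, `ground_truth`), the event name, and the `describe` helper are assumptions for illustration, and real events may be shaped differently.

```python
# Hypothetical event for illustration only. Real events may carry more
# fields, and the shape of inputs/outputs/feedback depends on your pipeline.
sample_event = {
    "event_type": "model",
    "event_name": "generate_answer",  # assumed name
    "inputs": {"question": "What is the capital of France?"},
    "outputs": {"content": "Paris"},
    "feedback": {"ground_truth": "Paris"},
}


def describe(event):
    """Return a short label built from the event's type and name."""
    return f'{event["event_type"]}:{event["event_name"]}'


label = describe(sample_event)
```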
Click Show Schema in the evaluator console to explore available event properties.
Evaluator Function
Define your evaluation logic in a Python function that operates on the event and returns the evaluation result.
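A minimal sketch of such a function, assuming the event arrives as a dict-like object with the properties described in Event Schema. The function name `exact_match` and the nested keys `content` and `ground_truth` are hypothetical; check Show Schema for the fields your events actually contain.

```python
def exact_match(event):
    """Return True when the output equals the ground truth, ignoring case
    and surrounding whitespace."""
    # .get() with a fallback guards against events that lack a field.
    output = (event.get("outputs") or {}).get("content", "")
    expected = (event.get("feedback") or {}).get("ground_truth", "")
    if not expected:
        # Without a ground truth there is nothing to compare against.
        return False
    return str(output).strip().lower() == str(expected).strip().lower()
```

Returning False when no ground truth is present is a design choice made for this sketch; you might prefer to skip such events instead.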
Configuration
Return Type
- Boolean: For true/false evaluations
- Numeric: For numeric scores or ratings
- String: For categorical evals or other objects
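As a rough illustration of the three return types, the sketch below defines one evaluator per type. The function names, the `content` key, and the 50-word threshold are all assumptions made for the example.

```python
def is_non_empty(event):
    """Boolean: True when the output contains non-whitespace text."""
    return bool(str(event["outputs"].get("content", "")).strip())


def word_count(event):
    """Numeric: number of whitespace-separated words in the output."""
    return len(str(event["outputs"].get("content", "")).split())


def length_bucket(event):
    """String: a categorical label derived from the word count."""
    n = word_count(event)
    if n == 0:
        return "empty"
    return "short" if n < 50 else "long"
```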
Passing Range
A passing range lets you detect which test cases failed in your evaluation. This is particularly useful for defining pass/fail criteria at the datapoint level in your CI builds.
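Conceptually, a passing range turns a numeric score into a pass/fail decision per datapoint. The sketch below illustrates that idea only and is not HoneyHive's implementation; the inclusive bounds, the helper name `in_passing_range`, and the sample scores are assumptions.

```python
def in_passing_range(score, low, high):
    """Return True when score lies within [low, high] (inclusive bounds assumed)."""
    return low <= score <= high


# Hypothetical per-datapoint scores from one evaluation run.
scores = {"case_1": 0.92, "case_2": 0.41, "case_3": 0.75}

# Collect the datapoints that fall outside the passing range 0.7 to 1.0.
failed = sorted(
    name for name, score in scores.items()
    if not in_passing_range(score, 0.7, 1.0)
)
```

In a CI build, a non-empty `failed` list could be used to fail the job.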
Online Evaluation
Toggle to enable real-time evaluation in production. Production means any trace whose source, set when initializing the tracer, is not evaluation (source != evaluation).
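The production rule above can be written as a one-line predicate. This is a sketch of the stated definition, not code from the platform; the helper name `is_production` and the example source values are hypothetical.

```python
def is_production(source):
    """Return True for any source other than "evaluation", per the rule above."""
    return source != "evaluation"


# Example source values (hypothetical); only "evaluation" is excluded.
results = {s: is_production(s) for s in ["production", "staging", "evaluation"]}
```

Note that under this definition any source value other than evaluation, including one like staging, counts as production.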
Event Filters
You can choose to compute your evaluator over a specific event_type and event_name in your pipeline, including the root span (session).
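The effect of an event filter can be pictured as selecting only the events whose type and name both match. The following is an illustrative sketch, assuming exact string matching; the helper `matches_filter` and the event names are hypothetical.

```python
def matches_filter(event, event_type, event_name):
    """Return True when the event has exactly the given type and name."""
    return (
        event.get("event_type") == event_type
        and event.get("event_name") == event_name
    )


# Hypothetical events from one trace, including the root span (session).
events = [
    {"event_type": "session", "event_name": "chat_session"},
    {"event_type": "tool", "event_name": "search_docs"},
    {"event_type": "model", "event_name": "generate_answer"},
]

selected = [e for e in events if matches_filter(e, "model", "generate_answer")]
root = [e for e in events if matches_filter(e, "session", "chat_session")]
```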
Testing
You can quickly test your evaluator in the built-in IDE, either by defining a datapoint in the JSON editor or by retrieving recent events from your project to test against.
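You can mirror the same workflow locally before committing: parse a JSON datapoint and pass it to your function. The datapoint below is a hypothetical example of what you might enter in the JSON editor, and `matches_ground_truth` is an assumed evaluator; the nested keys are illustrative.

```python
import json

# Hypothetical datapoint, similar to what you might paste into the JSON editor.
datapoint_json = """
{
  "event_type": "model",
  "event_name": "generate_answer",
  "inputs": {"question": "What is 2 + 2?"},
  "outputs": {"content": "4"},
  "feedback": {"ground_truth": "4"}
}
"""


def matches_ground_truth(event):
    """Return True when the output is identical to the ground truth."""
    return event["outputs"]["content"] == event["feedback"]["ground_truth"]


event = json.loads(datapoint_json)
result = matches_ground_truth(event)
```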
Commit and deploy your evaluator by clicking Commit in the top right corner.