Run experiments using server-side HoneyHive evaluators
Sample eval script
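Putting it all together, a minimal sketch of the eval script might look like the following; each part is broken down in the sections below. The OpenAI call, the project and experiment names, and the exact evaluate() parameter names are illustrative assumptions, not values prescribed by this guide.

```python
# Minimal sketch of the full eval script (assumptions noted inline).
import os

from honeyhive import evaluate
from openai import OpenAI

client = OpenAI()

# Each datapoint exposes `inputs` and an optional `ground_truths` dict.
dataset = [
    {
        "inputs": {"question": "What is the capital of France?"},
        "ground_truths": {"answer": "Paris"},
    },
]

def flow_to_evaluate(inputs, ground_truths):
    # The returned dict is stored as the run's `outputs`; the `content` key
    # is what the server-side Response Length evaluator reads.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["question"]}],
    )
    return {"content": completion.choices[0].message.content}

if __name__ == "__main__":
    evaluate(
        function=flow_to_evaluate,
        hh_api_key=os.environ["HH_API_KEY"],  # assumed env var / parameter name
        hh_project="my-project",              # assumed project name
        name="server-side-evaluator-demo",    # assumed experiment name
        dataset=dataset,
        # No local evaluators passed: Response Length runs server-side.
    )
```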
Setup input data
The inputs and ground_truths fields will be accessible in both the function we want to evaluate and the evaluator function, as we will see below.
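For example, a dataset might look like this (the values are purely illustrative):

```python
# Illustrative dataset: a list of datapoints, each with `inputs` and
# an optional `ground_truths` dictionary.
dataset = [
    {
        "inputs": {"question": "What is the capital of France?"},
        "ground_truths": {"answer": "Paris"},
    },
    {
        "inputs": {"question": "Who wrote Hamlet?"},
        "ground_truths": {"answer": "William Shakespeare"},
    },
]
```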
Create the flow you want to evaluate
The function you evaluate receives the dataset fields as parameters. inputs is a dictionary with the parameters used in your function, as defined in our dataset. The function's return value is stored in the outputs field of each run in the experiment and will be accessible to your evaluator function, as we will see below. ground_truths is an optional field and, as the name suggests, contains the ground truth for each set of inputs.
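A minimal sketch of such a function, assuming an OpenAI chat call and a return value shaped so that the server-side evaluator below can read event["outputs"]["content"]:

```python
from openai import OpenAI

client = OpenAI()

def flow_to_evaluate(inputs, ground_truths):
    # `inputs` and `ground_truths` come from the datapoint defined above.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": inputs["question"]}],
    )
    # The returned dict becomes the run's `outputs`; the `content` key is
    # what the server-side Response Length evaluator will read.
    return {"content": completion.choices[0].message.content}
```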
Setup Server-side Evaluators
In the HoneyHive console, click Add Evaluator and select Python Evaluator. Server-side evaluators can access any field of the event dictionary parameter, such as inputs, outputs, or metadata. For our Response Length evaluator, we are interested in the model's response, which we'll access using the event["outputs"]["content"] path:
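A sketch of what the evaluator's Python body could look like; the function name is arbitrary and the exact wrapper the console expects may differ, but the key point is reading event["outputs"]["content"]:

```python
def response_length(event):
    # `event` is the run's event dictionary; its `outputs` field holds the
    # value returned by the evaluated function.
    content = event["outputs"].get("content", "")
    # Return the metric value: number of characters in the model's response.
    return len(content)
```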
Run experiment
Finally, run the experiment with evaluate:
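A minimal sketch of the call, assuming the SDK's evaluate entry point; the parameter names shown (function, hh_api_key, hh_project, name, dataset) should be checked against the SDK reference:

```python
import os

from honeyhive import evaluate

evaluate(
    function=flow_to_evaluate,            # the flow defined above
    hh_api_key=os.environ["HH_API_KEY"],  # assumed env var / parameter name
    hh_project="my-project",              # assumed project name
    name="server-side-evaluator-demo",    # assumed experiment name
    dataset=dataset,
    # Note: no `evaluators` argument -- the Response Length evaluator
    # runs server-side in HoneyHive.
)
```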
You should now see the Response Length metric in your dashboard. Note that even though we didn't pass any local evaluators when running evaluate, our server-side evaluator was properly configured and executed.