A list of HoneyHive’s server-side evaluator templates for Python and LLM metrics.
Server-side evaluators operate on `event` objects, so when instrumenting your application to send traces to HoneyHive, you need to ensure the correct event properties are being captured and traced.
For example, suppose you want to set up a Python evaluator that requires both the model’s response and a provided ground truth, as well as an LLM evaluator that requires the model’s response and a provided context.
In this case, you can wrap your model call within a function and enrich the event object with the necessary properties:
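The sketch below shows one way this could look with the HoneyHive Python SDK. The import paths, the `HoneyHiveTracer.init` arguments, and the `enrich_span` signature reflect my understanding of the SDK and should be verified against the version you have installed; the OpenAI client and model name are placeholders.

```python
from honeyhive import HoneyHiveTracer, trace, enrich_span
from openai import OpenAI

# Initialize HoneyHive tracing once at application startup.
HoneyHiveTracer.init(
    api_key="<HONEYHIVE_API_KEY>",
    project="<YOUR_PROJECT>",
)

client = OpenAI()

@trace  # the traced function is recorded as a chain event named "generate_response"
def generate_response(query: str, context: str, ground_truth: str) -> str:
    # The LLM call below is captured as a model event nested inside the chain event.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    answer = completion.choices[0].message.content

    # Attach the ground truth as feedback so evaluators can read
    # event["feedback"]["ground_truth"].
    enrich_span(feedback={"ground_truth": ground_truth})

    # The return value is recorded on the chain event, matching the
    # outputs.result field referenced in the setup steps below.
    return answer
```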
Wrapping the call this way produces a `chain` event, since it groups together a `model` event within it. The `chain` event will be named after the traced function (here, `generate_response`).
When setting up an evaluator in HoneyHive for the example above, follow these steps:
1. Filter for events with event type `chain` and event name `generate_response`.
2. In a Python evaluator, read the fields off the `event` object:
   - Model response: `event["outputs"]["result"]`
   - Ground truth: `event["feedback"]["ground_truth"]`
   - Context: `event["inputs"]["context"]`
3. In an LLM evaluator, reference the same fields with template variables:
   - Model response: `{{ outputs.result }}`
   - Ground truth: `{{ feedback.ground_truth }}`
   - Context: `{{ inputs.context }}`
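As a concrete illustration, a server-side Python evaluator for the ground-truth comparison in this example could look like the sketch below. The `evaluator(event)` entry point is an assumption about how HoneyHive invokes Python evaluators; the field paths mirror the ones listed above.

```python
def evaluator(event):
    # Model response returned by the traced generate_response function.
    response = event["outputs"]["result"]
    # Ground truth attached in the trace via enrich_span(feedback={...}).
    ground_truth = event["feedback"]["ground_truth"]

    # Toy metric: exact match after trimming whitespace; substitute any
    # comparison (Levenshtein, ROUGE, embedding similarity, ...) you need.
    return 1 if response.strip() == ground_truth.strip() else 0
```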
HoneyHive's server-side evaluator templates are listed below, grouped into Python and LLM metrics.

**Python evaluators**
- Response Length Evaluator
- Semantic Similarity Evaluator
- Levenshtein Distance Evaluator
- ROUGE-L Evaluator
- BLEU Evaluator
- JSON Schema Validation Evaluator
- SQL Parse Check Evaluator
- Flesch Reading Ease Evaluator
- JSON Key Coverage Evaluator
- Tokens per Second Evaluator
- Keywords Assertion Evaluator
- OpenAI Moderation Filter Evaluator
- External API Example Evaluator
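To give a sense of the shape these Python templates take, a keywords-assertion style check might look like the hedged sketch below; the keyword list is hypothetical and the shipped template may differ.

```python
REQUIRED_KEYWORDS = ["refund", "return policy"]  # hypothetical keywords for illustration

def evaluator(event):
    response = event["outputs"]["result"].lower()
    # Fraction of required keywords that appear in the model response.
    return sum(kw in response for kw in REQUIRED_KEYWORDS) / len(REQUIRED_KEYWORDS)
```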
**LLM evaluators**
- Answer Faithfulness Evaluator
- Answer Relevance Evaluator
- Context Relevance Evaluator
- Format Adherence Evaluator
- Tool Usage Evaluator
- Intent Identification Evaluator
- Toxicity Evaluator
- Coherence Evaluator
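LLM evaluator templates are prompts rather than Python functions; an answer-faithfulness style prompt built on the template variables shown earlier might read roughly as follows (illustrative wording, not the shipped template):

```
You are grading whether a model's answer is faithful to the provided context.

Context:
{{ inputs.context }}

Answer:
{{ outputs.result }}

Return a score from 1 (contradicts or ignores the context) to 5 (fully supported by the context).
```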