Client-side Evaluations
Learn how to log external evaluation results (metrics) with your trace.
Introduction
Client-side evaluations allow you to log external evaluation results (metrics) with your trace.
Prerequisites
You have already set up tracing for your code as described in our quickstart guide.
Setting metrics
You can set metrics at either the trace level or the span level. If the metric applies to the entire trace, set it at the trace level; if it applies to a specific step, set it on the corresponding span (a span-level sketch follows below). For more details, refer to the enrich traces documentation.
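For the span-level case, here is a minimal sketch. It assumes the `honeyhive` package exposes a `trace` decorator and an `enrich_span` helper as described in the enrich traces documentation; the function name and metric key are illustrative.

```python
from honeyhive import trace, enrich_span

@trace
def retrieve_documents(query: str) -> list[str]:
    # Placeholder retrieval step; your application logic goes here.
    docs = ["doc-1", "doc-2"]
    # Attach a metric to this specific span rather than the whole trace.
    enrich_span(metrics={"num_documents": len(docs)})
    return docs
```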
In Python, you can use the `enrich_session` function to set metrics at the trace level. To pass metrics to HoneyHive, pass them to the `metrics` param of `enrich_session`; the function enriches the session with additional information. Remember that `enrich_session` will update, not overwrite, the existing metrics object on the trace.
Here’s an example of how to set metrics at the trace level in Python:
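The snippet below is a minimal sketch; the initialization arguments and the metric names and values are illustrative.

```python
from honeyhive import HoneyHiveTracer, enrich_session

# Initialize tracing once at application startup (see the quickstart guide).
HoneyHiveTracer.init(
    api_key="YOUR_HONEYHIVE_API_KEY",
    project="YOUR_PROJECT_NAME",
)

# ... run your application logic here ...

# Log evaluation results (metrics) on the current trace.
enrich_session(metrics={
    "num_retrieved_docs": 5,
    "answer_faithfulness": 0.92,
    "contains_pii": False,
})
```

Because `enrich_session` updates rather than overwrites, you can call it again as additional evaluators finish and their metrics will be merged into the same object on the trace.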
Concepts
If you are defining any evaluators in your application execution (i.e., client-side), you can add the resulting metrics to any trace or span in HoneyHive for monitoring and analysis.
What are evaluators?
Evaluators are tests (code- or LLM-based) that compute a score/metric to measure the quality of inputs and/or outputs for your application or specific steps within it.
They can be computed and instrumented within your application runtime (client-side) or computed in our platform (server-side) post-ingestion.
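As a concrete illustration, here is a minimal sketch of a code-based evaluator computed client-side and logged with `enrich_session`. The function name, metric key, and word limit are illustrative, and the snippet assumes tracing has already been initialized as in the quickstart guide.

```python
from honeyhive import enrich_session

def answer_length_ok(answer: str, max_words: int = 200) -> bool:
    # Code-based evaluator: passes if the answer stays under a word limit.
    return len(answer.split()) <= max_words

# In a real application, `answer` would come from an LLM call inside your
# traced code; it is hardcoded here for illustration.
answer = "Client-side evaluators let you score outputs in your own runtime."

# Attach the computed metric to the current trace.
enrich_session(metrics={"answer_length_ok": answer_length_ok(answer)})
```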
Return Types
We accept all primitive data types as evaluator metrics. This includes:
| Return Type | Available Measurements | Notes | Uses |
|---|---|---|---|
| Boolean | Percentage True/False | - | Evaluations |
| Number | Sum, Avg, Median, Min, Max, P95, P98, P99 | - | Evaluations |
| String | - | Used for filters and group by | Classification, feature extraction, etc. |
For complex data types, we allow you to drill down to nested fields or specific positions in an array. So, for example, if you pass metrics like the following (the exact keys and values below are illustrative):
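```python
enrich_session(metrics={
    "trajectory_eval": {"overall": 0.87},
    "step_evals": [
        {"user_intervened": False},
        {"user_intervened": True},
    ],
})
```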
You can chart `metrics.step_evals.0.user_intervened` as a boolean field or `trajectory_eval.overall` as a numeric field.
Learn more
Charting metrics
Learn how to chart metrics from your traces
Set up server-side code evaluators
Learn how to compute metrics server-side over your logs via code
Set up server-side LLM evaluators
Learn how to compute metrics server-side over your logs via an LLM
Set up human annotations
Learn how to set up annotations for your logs via domain experts
SDK Reference
Read more about the `enrich_session` function in the Python SDK reference.