Introduction

Client-side evaluations allow you to log external evaluation results (metrics) with your trace.

Prerequisites

You have already set up tracing for your code as described in our quickstart guide.

Setting metrics

You can set metrics on either the trace level or the span level. If the metric applies to the entire trace, set it on the trace level. If the metric applies to a specific span, set it on the span level. For more details, refer to the enrich traces documentation.

In Python, you can use the enrich_session function to set metrics on the trace level.

To pass metrics to HoneyHive, pass them to the metrics param in the enrich_session function. This function enriches the session with additional information. Remember that enrich_session will update, not overwrite, the existing metrics object on the trace.

Read more about the enrich_session function in the Python SDK reference.

Here’s an example of how to set metrics on the trace level in Python:

Python
from honeyhive import HoneyHiveTracer, enrich_session

HoneyHiveTracer.init(
  api_key="my-api-key",
  project="my-project",
)

# ...

enrich_session(metrics={
  "json_validated": True,
  "num_actions": 10,
  # any other custom fields and values as you need
  "step_evals": [
    {
      "invalid_grammar": False,
      "unable_to_locate_UI": True
    }
  ],
})
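
If a metric applies to a specific step rather than the whole session, you can attach it at the span level instead. The sketch below assumes the SDK's @trace decorator and an enrich_span helper that accepts a metrics parameter analogous to enrich_session; the generate_plan function is illustrative. Refer to the enrich traces documentation for the exact span-level API.

Python
from honeyhive import HoneyHiveTracer, enrich_span, trace

HoneyHiveTracer.init(
  api_key="my-api-key",
  project="my-project",
)

# Illustrative application step; @trace records it as a span
@trace
def generate_plan(user_query: str) -> str:
  plan = f"1. Interpret '{user_query}'\n2. Draft a response"  # stand-in for an LLM call
  # Attach metrics to this span only (assumes enrich_span mirrors enrich_session)
  enrich_span(metrics={
    "invalid_grammar": False,
    "num_actions": 2,
  })
  return plan

generate_plan("How do I reset my password?")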

Concepts

If you define any evaluators within your application's execution (i.e., client-side), you can add the resulting metrics to any trace or span in HoneyHive for monitoring and analysis.

Most guardrails (e.g., format, safety, PII checks) are best computed client-side at execution time rather than server-side in HoneyHive post-ingestion.

What are evaluators?

Evaluators are tests (code- or LLM-based) that compute a score or metric measuring the quality of inputs and/or outputs for your application, or for specific steps within it.

They can be computed and instrumented within your application runtime (client-side) or computed in our platform (server-side) post-ingestion.
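
For example, a simple code-based evaluator might validate at runtime that the model's output is well-formed JSON and log the result as a metric. This is an illustrative sketch that reuses the enrich_session call shown above; the is_valid_json helper and model_output value are made up for the example.

Python
import json

from honeyhive import enrich_session

# Illustrative client-side evaluator: a format guardrail computed at runtime
def is_valid_json(text: str) -> bool:
  try:
    json.loads(text)
    return True
  except json.JSONDecodeError:
    return False

model_output = '{"action": "open_settings"}'  # stand-in for your LLM's response

# Log the evaluator's result as a metric on the current trace
enrich_session(metrics={"json_validated": is_valid_json(model_output)})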

Evaluators computed client-side are not overwritten by server-side evaluators of the same name.

Return Types

We accept all primitive data types as evaluator metrics. These include:

Return Type | Available Measurements | Notes | Uses
Boolean | Percentage True/False | - | Evaluations
Number | Sum, Avg, Median, Min, Max, P95, P98, P99 | - | Evaluations
String | - | Used for filters and group by | Classification, feature extraction, etc.
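
For instance, the following sketch logs one metric of each primitive type via the enrich_session call shown earlier; the field names and values are illustrative:

Python
from honeyhive import enrich_session

# Each value's type determines how it can be charted in HoneyHive
enrich_session(metrics={
  "json_validated": True,       # Boolean: charted as percentage true/false
  "num_actions": 10,            # Number: sum, avg, median, min, max, percentiles
  "query_category": "billing",  # String: used for filters and group by
})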

For complex data types, we allow you to drill down to the nested fields or specific positions in the array.

So, for example, if you pass metrics like:

{
  "step_evals": [
    {
      "invalid_grammar": true,
      "user_interevened": true
    },
    {
      "invalid_grammar": false,
      "unable_to_locate_UI": true
    }
  ],
  "trajectory_eval": {
    "overall": 5,
    "clarified_user_intent": "yes"
  }
}

You can chart metrics.step_evals.0.user_intervened as a boolean field or metrics.trajectory_eval.overall as a numeric field.

Nesting Limitations: For complex data types like objects and arrays, we allow a maximum of 5 levels of nested objects and 2 levels of nested arrays.

Learn more

SDK Reference

Read more about the enrich_session function in the Python SDK reference.