Running evaluations is a natural extension of HoneyHive's tracing capabilities. We recommend going through the tracing quickstart before proceeding with this guide.

What is an Evaluation Run?

An evaluation run in HoneyHive is a group of related sessions that share the same metadata.run_id value.
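
To make the grouping concrete, here is a minimal sketch in plain Python. It does not call the HoneyHive SDK; the session records and their field layout (a session_id plus a metadata dictionary) are assumptions made for illustration.

```python
from collections import defaultdict


def group_sessions_by_run(sessions):
    """Group session IDs by the run_id found in each session's metadata."""
    runs = defaultdict(list)
    for session in sessions:
        run_id = session.get("metadata", {}).get("run_id")
        if run_id is not None:  # sessions without a run_id belong to no run
            runs[run_id].append(session["session_id"])
    return dict(runs)


# Hypothetical session records, for illustration only.
sessions = [
    {"session_id": "s1", "metadata": {"run_id": "run-a"}},
    {"session_id": "s2", "metadata": {"run_id": "run-a"}},
    {"session_id": "s3", "metadata": {"run_id": "run-b"}},
    {"session_id": "s4", "metadata": {}},
]
runs = group_sessions_by_run(sessions)
```

Here sessions s1 and s2 form one evaluation run, s3 forms another, and s4 is an ordinary traced session that is not part of any run.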

When an evaluation run is linked to a dataset, it can be compared against other runs on the same dataset.

This flexibility lets you compare the performance of your application across any configuration dimension you want to test, such as models, chunking strategies, vector databases, or prompts.
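
As a rough illustration of what comparing two runs on the same dataset means, the sketch below lines up results by datapoint. The score values and the dictionary layout are hypothetical; HoneyHive performs this comparison for you, so this only shows the idea.

```python
def compare_runs(run_a, run_b):
    """Return score differences (run_b minus run_a) for datapoints both runs cover."""
    shared = sorted(run_a.keys() & run_b.keys())
    return {datapoint_id: run_b[datapoint_id] - run_a[datapoint_id] for datapoint_id in shared}


# Hypothetical per-datapoint scores for two runs on the same dataset.
baseline = {"dp-1": 0.5, "dp-2": 0.75}
candidate = {"dp-1": 0.75, "dp-2": 0.5, "dp-3": 1.0}

deltas = compare_runs(baseline, candidate)
```

Only datapoints present in both runs are compared, which is why linking each session to a datapoint in a shared dataset matters.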

Running an evaluation

Prerequisites

  • You have already created a project in HoneyHive, as explained here.
  • You have an API key for your project, as explained here.
  • You have instrumented your application with the HoneyHive SDK, as explained here.

Expected Time: 5 minutes

Steps

1

Installation

To install our SDKs, run the following commands in the shell.

2

Initialize the SDK

To initialize the API client, pass your API key to the SDK constructor.
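
One common pattern is to read the key from an environment variable rather than hard-coding it. The helper below is a sketch of that pattern only; the variable name HH_API_KEY and the function load_api_key are assumptions, not part of the SDK. The returned key is what you would then pass to the SDK constructor.

```python
import os


def load_api_key(env_var="HH_API_KEY", environ=None):
    """Read the API key from the environment and fail fast if it is missing.

    env_var is an assumed variable name; use whatever your deployment defines.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before initializing the SDK client")
    return key
```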

3

Initialize the evaluation run

To initialize the evaluation run, we need to provide a name for it.

To make evaluation runs comparable, we need to pass the dataset_id to the evaluation run. The docs for creating datasets can be found here.
Keep the returned run_id for future reference.
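
The sketch below models the information this step needs: a required name and an optional dataset_id. The EvaluationRunRequest class and its to_payload method are hypothetical stand-ins, not the SDK's request type; consult the SDK reference for the actual call.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EvaluationRunRequest:
    """Hypothetical container for the fields described in this step."""

    name: str
    dataset_id: Optional[str] = None  # set this to make runs comparable

    def to_payload(self):
        if not self.name:
            raise ValueError("An evaluation run needs a name")
        # Drop unset optional fields so the payload only carries real values.
        return {key: value for key, value in asdict(self).items() if value is not None}


request = EvaluationRunRequest(name="prompt-v2-eval", dataset_id="dataset-123")
payload = request.to_payload()
```

Whatever the real call returns, store the run_id alongside this payload, since later steps need it.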

4

Set up your application tracing

Create a function that traces your application code.

At the end of the function, return the session_id for future reference.

If the relevant execution is not being auto-traced, refer to our custom spans docs to track that as well.

To initialize the tracer, we require 3 key details:

  1. Your HoneyHive API Key
  2. Name of the project to log the trace to
  3. A name for this session, such as “Chatbot Session” or “Customer RAG Session”.

To make the datapoints comparable, we need to set the datapoint_id on the tracer metadata.

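
The shape of such a function can be sketched as follows. StandInTracer is a placeholder written for this example and is not the HoneyHive tracer; it only mirrors the three details listed above plus the metadata that links a session to its run and datapoint. The SESSIONS dictionary is likewise a hypothetical stand-in for the tracing backend.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StandInTracer:
    """Placeholder for the SDK tracer; NOT the HoneyHive API."""

    api_key: str
    project: str
    session_name: str
    metadata: dict = field(default_factory=dict)
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# Hypothetical in-memory record of traced sessions, keyed by session_id.
SESSIONS = {}


def run_traced_app(question, datapoint_id, run_id):
    """Trace one application call and return its session_id."""
    tracer = StandInTracer(
        api_key="YOUR_API_KEY",
        project="my-project",
        session_name="Chatbot Session",
    )
    # Link the session to its evaluation run and to the datapoint it ran on.
    tracer.metadata.update({"run_id": run_id, "datapoint_id": datapoint_id})

    _answer = f"echo: {question}"  # replace with your application code

    SESSIONS[tracer.session_id] = tracer.metadata
    return tracer.session_id
```
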
5

Set up your harness

Now you can loop over any dataset you have and run your application code.
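
A harness can be as small as a loop that calls the traced function once per datapoint and collects the returned session IDs. In this sketch, run_app is a stand-in for your own traced function, and the dataset layout (datapoint_id plus inputs) is an assumption for illustration.

```python
import uuid


def run_app(inputs, datapoint_id):
    """Stand-in for the traced application function; returns a session_id."""
    _answer = f"answer to {inputs['question']}"  # your application code goes here
    return str(uuid.uuid4())


# Hypothetical dataset: one record per datapoint.
dataset = [
    {"datapoint_id": "dp-1", "inputs": {"question": "What is tracing?"}},
    {"datapoint_id": "dp-2", "inputs": {"question": "What is a run?"}},
]

session_ids = []
for datapoint in dataset:
    session_id = run_app(datapoint["inputs"], datapoint["datapoint_id"])
    session_ids.append(session_id)
```

Collecting the session IDs in a list keeps them available for the final step, where the sessions are attached to the evaluation run.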

6

Push your evaluation to HoneyHive

Finally, push the evaluation to HoneyHive.
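
Conceptually, this step attaches the collected sessions to the run and marks it as finished. The helper below only assembles that information; the field names session_ids and status and the value "completed" are assumptions for illustration, so check the SDK reference for the real update call and its parameters.

```python
def build_run_update(run_id, session_ids, status="completed"):
    """Assemble a hypothetical update payload for an evaluation run."""
    if not session_ids:
        raise ValueError("No sessions to attach to the run")
    # Remove duplicate session IDs while keeping their original order.
    unique_ids = list(dict.fromkeys(session_ids))
    return {"run_id": run_id, "session_ids": unique_ids, "status": status}


update = build_run_update("run-a", ["s1", "s2", "s1"])
```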

Next Steps

Congratulations! You have successfully run an evaluation in HoneyHive.

To view the evaluation results, head over to HoneyHive Evaluations.