In the following example, we walk through how to log your pipeline runs to HoneyHive for benchmarking and sharing. For a complete overview of evaluations in HoneyHive, refer to our Evaluations Overview page.

For this quickstart tutorial, we will run a simple evaluation that compares two gpt-3.5-turbo variants.


This example is temporarily incompatible with Google Colab. Please run it as a .py script from the command line.
import os

import honeyhive

# import any other vector databases, APIs, and model providers you need
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"  # replace with your actual OpenAI API key

Define Evaluation Pipeline

To start, we need to define our pipeline. This function describes how a single datapoint is run through a given config, and it returns the tracer along with any metrics recorded for that run.

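def hh_pipeline(config, datapoint, tracer, metrics):

    # Enter your preprocessing here (e.g. fill your prompt template with the datapoint's fields)

    with tracer.model(...):
        # Enter your API call here
        pass  # placeholder so the block is valid Python; replace with your model call

    return tracer, metrics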
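
To make the pipeline's contract concrete, here is a minimal, self-contained sketch that runs offline. It assumes `config` is a dict holding a prompt `template` and a `max_tokens` value, and that `datapoint` is a dict of template fields. `StubTracer`, `call_model`, and the `"word_count"` metric are hypothetical stand-ins, not HoneyHive APIs. In practice you would use the real `tracer.model(...)` block and your provider's client.

```python
from contextlib import contextmanager


class StubTracer:
    """Hypothetical stand-in for HoneyHive's tracer (NOT the real API)."""

    def __init__(self):
        self.calls = []

    @contextmanager
    def model(self, **kwargs):
        # Record the parameters of the model call, then let the caller run it.
        self.calls.append(kwargs)
        yield


def call_model(prompt, max_tokens):
    # Hypothetical placeholder for a gpt-3.5-turbo call: keeps the first
    # `max_tokens` words to mimic a length limit (not real tokenization).
    return " ".join(prompt.split()[:max_tokens])


def hh_pipeline(config, datapoint, tracer, metrics):
    # Preprocessing: fill the prompt template with this datapoint's fields.
    prompt = config["template"].format(**datapoint)

    with tracer.model(prompt=prompt, max_tokens=config["max_tokens"]):
        completion = call_model(prompt, config["max_tokens"])

    # Record a simple per-datapoint metric (hypothetical metric name).
    metrics["word_count"] = len(completion.split())
    return tracer, metrics


tracer, metrics = hh_pipeline(
    {"template": "Write a short email to {name} about {topic}.", "max_tokens": 5},
    {"name": "Ada", "topic": "the quarterly review"},
    StubTracer(),
    {},
)
print(metrics)  # {'word_count': 5}
```

Keeping the model call inside the `with` block mirrors the structure of the template above. Whatever the tracer records is then tied to that specific call.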

Prepare A Dataset & Configurations

Now that the pipeline is defined, we can set up our offline evaluation. Begin by fetching a dataset to evaluate over, along with the prompt configuration to test.

Try to pick data that is as close to your production distribution as possible.
# fetch a dataset and a prompt config already saved in HoneyHive
from honeyhive.sdk.datasets import get_dataset, get_prompt
dataset = get_dataset("Email Writer Samples")
config = get_prompt("Email Writer:latest")
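
The eval in this tutorial is named "Max Tokens Comparison", so the two gpt-3.5-turbo variants presumably differ in their token limit. The sketch below derives both variants from one base config by overriding a single field. The dict shape, the hypothetical `make_variants` helper, and the `max_tokens` values are illustrative assumptions. Check the actual structure that `get_prompt` returns before adapting it.

```python
import copy


def make_variants(base_config, field, values):
    """Hypothetical helper: one deep copy of base_config per value of `field`."""
    variants = []
    for value in values:
        variant = copy.deepcopy(base_config)  # never mutate the shared base
        variant[field] = value
        variant["name"] = f"{base_config.get('name', 'config')} ({field}={value})"
        variants.append(variant)
    return variants


# Hypothetical base config; in the tutorial this would come from get_prompt(...).
base = {
    "name": "Email Writer",
    "model": "gpt-3.5-turbo",
    "template": "Write a short email to {name} about {topic}.",
    "max_tokens": 256,
}

configs = make_variants(base, "max_tokens", [100, 400])
for c in configs:
    print(c["name"], c["max_tokens"])
```

Changing only one field per variant keeps the comparison clean. Any difference in the results can then be attributed to that setting.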

Running the Eval

Once you have instantiated honeyhive.eval(), run the evaluation by calling its .using() method:

honeyhive.eval(
    name="Max Tokens Comparison",
    project="Email Writer App",
    # ... remaining arguments are elided in this example
).using(...)
Parallelize your evaluation runs whenever possible! Passing parallelize=True can cut total run time by as much as 10x.
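
Parallelizing helps because each pipeline run spends most of its time waiting on model API responses. Independent (config, datapoint) pairs can therefore overlap instead of running one after another. The sketch below illustrates the idea with Python's standard `concurrent.futures`. It is a generic illustration, not HoneyHive's internal implementation of `parallelize=True`. `run_eval` and `slow_pipeline` are hypothetical, and `slow_pipeline` sleeps to simulate API latency.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def slow_pipeline(config, datapoint):
    # Hypothetical stand-in for an I/O-bound model call (simulated latency).
    time.sleep(0.1)
    return {"config": config["name"], "datapoint": datapoint["id"]}


def run_eval(pipeline, configs, dataset, parallelize=False, max_workers=8):
    """Hypothetical runner: apply `pipeline` to every (config, datapoint) pair."""
    pairs = list(itertools.product(configs, dataset))
    if not parallelize:
        return [pipeline(c, d) for c, d in pairs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `pairs`.
        return list(pool.map(lambda pair: pipeline(*pair), pairs))


configs = [{"name": "A"}, {"name": "B"}]
dataset = [{"id": i} for i in range(4)]

for flag in (False, True):
    start = time.perf_counter()
    results = run_eval(slow_pipeline, configs, dataset, parallelize=flag)
    elapsed = time.perf_counter() - start
    print(f"parallelize={flag}: {len(results)} runs in {elapsed:.2f}s")
```

Threads suit this workload because the time is spent waiting on the network rather than computing. The actual speedup depends on your provider's rate limits and the number of workers.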

Share & Collaborate

After running the evaluation, you will receive a URL that takes you to the evaluation interface in the HoneyHive platform.

From there, you can share it with other members of your team via email or by sharing the link directly.


Sharing Evaluations: how to collaborate over an evaluation in HoneyHive

Discussing results with your team can surface further insights. Run more evaluations iteratively until you are ready to go to production!
