Running experiments is a natural extension of HoneyHive's tracing capabilities. We recommend going through the tracing quickstart before proceeding with this guide.

Full code

Here’s a minimal example to get you started with experiments in HoneyHive:
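The following is a minimal end-to-end sketch. The evaluate entry point is the one used throughout this guide, but the exact parameter names (function, dataset, evaluators, hh_api_key, hh_project, name) and callback signatures shown here are assumptions to confirm against the SDK reference, and the pipeline logic is a placeholder.

```python
from honeyhive import evaluate

# The flow under test: receives one datapoint's inputs (and optional ground truths)
# and returns the outputs dict logged for that run.
def answer_question(inputs, ground_truths=None):
    question = inputs["question"]
    # Placeholder for your real retrieval / LLM pipeline
    return {"answer": f"Stub answer for: {question}"}

# Code-managed dataset: a list of JSON objects
dataset = [
    {"inputs": {"question": "What does HoneyHive trace?"}},
    {"inputs": {"question": "How are experiment runs scored?"}},
]

# Client-side evaluator: runs right after each iteration, with access to outputs and inputs
def answer_length(outputs, inputs, ground_truths=None):
    return len(outputs["answer"])

if __name__ == "__main__":
    evaluate(
        function=answer_question,              # the flow to evaluate
        hh_api_key="<YOUR_HONEYHIVE_API_KEY>",
        hh_project="<YOUR_PROJECT_NAME>",
        name="experiments-quickstart",
        dataset=dataset,
        evaluators=[answer_length],
    )
```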

Running an experiment

Prerequisites

  • You have already created a project in HoneyHive, as explained here.
  • You have an API key for your project, as explained here.
  • You have instrumented your application with the HoneyHive SDK, as explained here.

Expected Time: 5 minutes

Steps

1. Create the flow you want to evaluate

Assuming you have gone through the tracing quickstart, you would have a function that looks like this:
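A sketch of such a function is shown below. The @trace decorator is the one introduced in the tracing quickstart; the function name, its (inputs, ground_truths) signature, and the stubbed pipeline call are illustrative assumptions.

```python
from honeyhive import trace  # decorator from the tracing quickstart

@trace
def generate_answer(inputs, ground_truths=None):
    """Flow under evaluation: one datapoint's inputs in, one outputs dict back."""
    question = inputs["question"]
    # Placeholder for your real retrieval / LLM calls; each traced step becomes a span
    answer = f"Stub answer for: {question}"
    return {"answer": answer}
```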

The value returned by this function maps to the outputs field of each run in the experiment.

2. Set up input data

Input datasets for experiments can be managed in two ways:

  1. HoneyHive Cloud: Upload and version your datasets directly in HoneyHive for team collaboration and versioning. After uploading the dataset, reference its dataset_id in your experiment configuration.
  2. Code-managed: Define your input data directly in your code as a list of JSON objects. This is useful for dynamic datasets or when you want to keep everything in your codebase.

The input fields in each dataset item should map to the inputs your evaluated function expects, as shown in the sketch below.
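A sketch of both options, assuming a code-managed dataset is passed to evaluate as dataset and a HoneyHive-managed one is referenced by dataset_id (field names are illustrative; check the SDK reference):

```python
# Option 1: HoneyHive Cloud dataset, referenced by its ID after upload
# (the value below is a placeholder).
dataset_id = "<YOUR_DATASET_ID>"

# Option 2: code-managed dataset, defined inline as a list of JSON objects.
# Keys under "inputs" should match what the evaluated function reads.
dataset = [
    {
        "inputs": {"question": "What is tracing?"},
        "ground_truths": {"answer": "Capturing spans for each step of a run."},
    },
    {
        "inputs": {"question": "What is an experiment?"},
        "ground_truths": {"answer": "A scored set of runs over a dataset."},
    },
]
```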
3. (Optional) Set up evaluators

Evaluators can be configured in two ways:

  1. Client-side Execution: Define evaluators in your code that run immediately after each experiment iteration. These evaluators have direct access to inputs and outputs and run synchronously with your experiment; a minimal sketch follows this list.
     For more complex multi-step pipelines, you can also compute and log client-side evaluators on specific traces and spans directly in your experiment harness.
  2. Server-side Execution: Configure evaluators in the HoneyHive UI that run asynchronously after your traces are logged. This is useful for computation-heavy evaluators or when you want to add or modify metrics after runs are complete.
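As a sketch of the client-side path, an evaluator is just a function that receives a run's outputs (and, typically, the corresponding inputs and ground truths) and returns a metric value; the exact callback signature is an assumption to verify against the SDK reference.

```python
# Client-side evaluator: called once per datapoint, immediately after the flow runs.
def exact_match(outputs, inputs, ground_truths=None):
    """Return 1.0 if the generated answer matches the reference answer, else 0.0."""
    if not ground_truths:
        return 0.0
    return float(outputs["answer"].strip() == ground_truths["answer"].strip())
```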
4. Run the experiment
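With the flow, dataset, and evaluators from the previous steps, running the experiment is a single call. This sketch reuses the names defined above and assumes the evaluate parameters shown in the full example; confirm them against the SDK reference.

```python
from honeyhive import evaluate

evaluate(
    function=generate_answer,            # Step 1: the flow under test
    dataset=dataset,                     # Step 2: or dataset_id="<YOUR_DATASET_ID>"
    evaluators=[exact_match],            # Step 3: optional client-side evaluators
    hh_api_key="<YOUR_HONEYHIVE_API_KEY>",
    hh_project="<YOUR_PROJECT_NAME>",
    name="experiments-quickstart",
)
```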

Dashboard View

Review the results in your HoneyHive dashboard to understand how your model performs across different inputs. The dashboard provides a comprehensive view of the experiment results and lets you compare performance across multiple runs.

Conclusion

By following these steps, you can set up and run experiments using HoneyHive. This allows you to systematically test your LLM-based systems across various scenarios and collect performance data for analysis.