Running experiments is a natural extension of HoneyHive's tracing capabilities. We recommend going through the tracing quickstart before proceeding with this guide.

Full code

Here’s a minimal example to get you started with experiments in HoneyHive:
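
The sketch below stitches together the snippets from the steps that follow into a single file. Treat it as an illustration rather than canonical SDK usage: the placeholder application logic is invented for this example, and the import of evaluator alongside evaluate is an assumption.

import random

from honeyhive import evaluate, evaluator  # assumes `evaluator` is exported alongside `evaluate`

# Dataset managed in code (same shape as Step 1, trimmed to one datapoint)
dataset = [
    {
        "inputs": {
            "product_type": "electric vehicles",
            "region": "western europe"
        },
        "ground_truths": {
            "response": "As of 2023, the electric vehicle (EV) ... "
        }
    },
]

# The function under evaluation (Step 2); replace the body with your own pipeline
def function_to_evaluate(inputs, ground_truths):
    return f"Market report for {inputs['product_type']} in {inputs['region']}"

# A client-side evaluator (Step 3); returns a placeholder score between 1 and 5
@evaluator()
def sample_evaluator(outputs, inputs, ground_truths):
    return random.randint(1, 5)

if __name__ == "__main__":
    evaluate(
        function=function_to_evaluate,
        hh_api_key='<HONEYHIVE_API_KEY>',
        hh_project='<HONEYHIVE_PROJECT>',
        name='Sample Experiment',
        dataset=dataset,
        evaluators=[sample_evaluator],
    )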

Running an experiment

Prerequisites

  • You have already created a project in HoneyHive, as explained here.
  • You have an API key for your project, as explained here.

Expected Time: 5 minutes

Steps

1. Set up input data

Let's create our dataset by defining it directly in code as a list of JSON objects:

dataset = [
    {
        "inputs": {
            "product_type": "electric vehicles",
            "region": "western europe"   
        },
        "ground_truths": {
            "response": "As of 2023, the electric vehicle (EV) ... ",
        }
    },
    {
        "inputs": {
            "product_type": "gaming consoles",
            "region": "north america"
        },
        "ground_truths": {
            "response": "As of 2023, the gaming console market ... ",
        }
    },
    {
        "inputs": {
            "product_type": "smart home devices",
            "region": "australia and new zealand" 
        },
        "ground_truths": {
            "response": "As of 2023, the market for smart home devices in Australia and New Zealand ... ",
        }
    },
]
The inputs and ground_truths fields will be accessible in both the function we want to evaluate and the evaluator function, as we will see below.
2. Define the function you want to evaluate

This function can be arbitrarily complex, anywhere from a single prompt or a simple retrieval pipeline to an end-to-end multi-agent system:

# inputs -> dictionary populated from the datapoint's "inputs" field
# (optional) ground_truths -> dictionary populated from the datapoint's "ground_truths" field
def function_to_evaluate(inputs, ground_truths):

    # Code here: call your prompt, pipeline, or agent using the datapoint's inputs
    result = "..."  # placeholder; return your application's actual output

    return result
  • inputs is a dictionary of parameters available from your dataset (see above) to be used in your function.
  • The value returned by the function maps to the outputs field of each trace in the experiment and is accessible to your evaluator function, as we will see below.
  • ground_truths is an optional field and, as the name suggests, contains the ground truth dictionary for each set of inputs in your dataset.
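
For instance, a hypothetical implementation that asks an LLM to draft the market report might look like the sketch below (the OpenAI client, model name, and prompt are illustrative assumptions, not part of HoneyHive):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def function_to_evaluate(inputs, ground_truths):
    # Build a prompt from the datapoint's inputs
    prompt = (
        f"Write a short market report on {inputs['product_type']} "
        f"in {inputs['region']}."
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    # The returned string becomes the trace's outputs
    return completion.choices[0].message.content
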
3. (Optional) Set up evaluators

Define client-side evaluators in your code that run immediately after each experiment iteration. These evaluators have direct access to inputs, outputs, and ground truths, and run synchronously with your experiment.

import random

@evaluator()
def sample_evaluator(outputs, inputs, ground_truths):
    # Code here: compute a metric from outputs, inputs, and ground_truths
    # (this placeholder just returns a random score between 1 and 5)
    return random.randint(1, 5)
For more complex multi-step pipelines, you can compute and log client-side evaluators on specific traces and spans directly in your experiment harness.
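
As a more concrete sketch, an evaluator could compare the generated output against the reference response in ground_truths; the word-overlap metric below is just an illustrative choice:

@evaluator()
def keyword_overlap_evaluator(outputs, inputs, ground_truths):
    # Fraction of reference words that also appear in the generated output
    reference_words = set(ground_truths["response"].lower().split())
    output_words = set(str(outputs).lower().split())
    if not reference_words:
        return 0.0
    return len(reference_words & output_words) / len(reference_words)
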
4. Run the experiment

Finally, you can run your experiment with evaluate:

from honeyhive import evaluate
from your_module import function_to_evaluate

if __name__ == "__main__":
    evaluate(
        function=function_to_evaluate,
        hh_api_key='<HONEYHIVE_API_KEY>',
        hh_project='<HONEYHIVE_PROJECT>',
        name='Sample Experiment',
        # To be passed for datasets managed in code
        dataset=dataset,
        # Add evaluators to your trace at the end of each execution
        evaluators=[sample_evaluator, ...]
    )
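
In practice, you will likely want to avoid hardcoding credentials; one common pattern (a general Python convention, not a HoneyHive requirement) is to read them from environment variables:

import os

evaluate(
    function=function_to_evaluate,
    hh_api_key=os.environ["HONEYHIVE_API_KEY"],    # exported in your shell or CI
    hh_project=os.environ["HONEYHIVE_PROJECT"],
    name='Sample Experiment',
    dataset=dataset,
    evaluators=[sample_evaluator],
)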

Dashboard View

Remember to review the results in your HoneyHive dashboard to gain insights into your model’s performance across different inputs. The dashboard provides a comprehensive view of the experiment results and performance across multiple runs.

Conclusion

By following these steps, you can set up and run experiments using HoneyHive. This allows you to systematically test your LLM-based systems across various scenarios and collect performance data for analysis.

Next Steps

If you are interested in a specific workflow, we recommend reading the walkthrough for the relevant product area.