Quickstart
Get started running experiments with HoneyHive
Running experiments is a natural extension of HoneyHive's tracing capabilities. We recommend going through the tracing quickstart before proceeding with this guide.
Full code
Here’s a minimal example to get you started with experiments in HoneyHive:
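The sketch below pulls the whole flow together. It assumes the Python SDK's evaluate entry point and parameter names such as hh_api_key, hh_project, name, dataset, and evaluators; if your SDK version differs, follow the parameter names in the SDK reference. The example dataset, the answer_question function, and the exact_match evaluator are illustrative placeholders.

```python
# Minimal experiment sketch. Parameter names below (hh_api_key, hh_project,
# name, dataset, evaluators) are assumptions -- check the SDK reference.
from honeyhive import evaluate

# 1. Input data: each item carries `inputs` and optional `ground_truths`
dataset = [
    {
        "inputs": {"question": "What is the capital of France?"},
        "ground_truths": {"answer": "Paris"},
    },
    {
        "inputs": {"question": "What is the capital of Japan?"},
        "ground_truths": {"answer": "Tokyo"},
    },
]

# 2. The function to evaluate -- replace the stubbed body with your own pipeline
def answer_question(inputs, ground_truths=None):
    question = inputs["question"]
    # ... call your LLM, retrieval pipeline, or agent here ...
    return f"Stub answer to: {question}"

# 3. A client-side evaluator: receives outputs, inputs, and ground truths
def exact_match(outputs, inputs, ground_truths):
    return ground_truths is not None and ground_truths["answer"] in outputs

# 4. Run the experiment
evaluate(
    function=answer_question,
    hh_api_key="<YOUR_HONEYHIVE_API_KEY>",
    hh_project="<YOUR_PROJECT_NAME>",
    name="quickstart-experiment",
    dataset=dataset,
    evaluators=[exact_match],
)
```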
Running an experiment
Prerequisites
- You have already created a project in HoneyHive, as explained here.
- You have an API key for your project, as explained here.
Expected Time: 5 minutes
Steps
Set up input data
Let's create our dataset by defining it directly in code as a list of JSON objects.
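For example, a minimal sketch with placeholder questions (the inputs and ground_truths keys are the ones described in this guide):

```python
# Each item supplies `inputs` for the function under test and optional `ground_truths`
dataset = [
    {
        "inputs": {"question": "What is the capital of France?"},
        "ground_truths": {"answer": "Paris"},
    },
    {
        "inputs": {"question": "What is the capital of Japan?"},
        "ground_truths": {"answer": "Tokyo"},
    },
]
```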
The inputs and ground_truths fields will be accessible in both the function we want to evaluate and the evaluator function, as we will see below.
Define the function you want to evaluate
Define the function you want to evaluate. This can be arbitrarily complex, anything from a single prompt or a simple retrieval pipeline to an end-to-end multi-agent system.
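Here's a minimal sketch; answer_question is an illustrative name, and the stubbed body stands in for your real model, retrieval, or agent call:

```python
def answer_question(inputs, ground_truths=None):
    # `inputs` is the dictionary from the dataset item being evaluated
    question = inputs["question"]
    # ... call your LLM, retrieval pipeline, or agent here ...
    return f"Stub answer to: {question}"
```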
- inputs is a dictionary of parameters available from your dataset (see above) to be used in your function.
- The value returned by the function maps to the outputs field of each trace in the experiment and will be accessible to your evaluator function, as we will see below.
- ground_truths is an optional field and, as the name suggests, contains the ground truth dictionary for each set of inputs in your dataset.
(Optional) Set up evaluators
Define client-side evaluators in your code that run immediately after each experiment iteration. These evaluators have direct access to inputs, outputs, and ground truths, and run synchronously with your experiment.
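For example, a simple client-side evaluator might check the output against the ground truth. The exact_match name is illustrative, and the (outputs, inputs, ground_truths) signature follows the convention described above; confirm the expected signature in the SDK reference:

```python
def exact_match(outputs, inputs, ground_truths):
    # `outputs` is the return value of the evaluated function for this dataset item
    if ground_truths is None:
        return False
    return ground_truths["answer"] in outputs
```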
Run experiment
Finally, you can run your experiment with evaluate.
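Here's a sketch of the call, assuming parameter names such as function, hh_api_key, hh_project, name, dataset, and evaluators (check the SDK reference for the exact signature in your version):

```python
from honeyhive import evaluate

evaluate(
    function=answer_question,              # the function defined above
    hh_api_key="<YOUR_HONEYHIVE_API_KEY>",
    hh_project="<YOUR_PROJECT_NAME>",
    name="quickstart-experiment",
    dataset=dataset,                       # the inputs/ground_truths list defined above
    evaluators=[exact_match],              # optional client-side evaluators
)
```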
Dashboard View
Remember to review the results in your HoneyHive dashboard to gain insight into your model's performance across different inputs. The dashboard provides a comprehensive view of the results across multiple experiment runs.
Conclusion
By following these steps, you can set up and run experiments using HoneyHive. This allows you to systematically test your LLM-based systems across various scenarios and collect performance data for analysis.
Next Steps
If you are interested in a specific workflow, we recommend reading the walkthrough for the relevant product area.
Introduction to Evaluators
Learn how to evaluate and monitor your AI applications with HoneyHive’s flexible evaluation framework.
Comparing Experiments
Compare experiments side-by-side in HoneyHive to identify improvements, regressions, and optimize your workflows.
Running Experiments with HoneyHive's managed datasets
Run experiments using HoneyHive’s managed datasets, enabling centralized dataset management and version control.
Running Experiments with HoneyHive's server-side evaluators
Server-side evaluators are centralized, scalable, and versioned, making them ideal for resource-intensive or asynchronous tasks.