Managing Datasets in UI
Run experiments using datasets stored and managed in the HoneyHive UI.
In the Experiments Quickstart, you learned how to run an experiment using local datasets defined directly in your code. This guide focuses on utilizing datasets managed through the HoneyHive platform. Managed datasets offer several advantages, particularly for team collaboration, as they are centralized and versioned. Though this approach requires some additional initial setup compared to local datasets, it provides a more robust foundation for collaborative work.
Full code
Below is a minimal example demonstrating how to run an experiment using managed datasets. This assumes you have already created a project and an API key. You will also need to provide a Dataset ID, which will be detailed in the following section.
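The snippet below is a minimal sketch rather than a drop-in script: it assumes an OpenAI-backed pipeline, an illustrative dataset with product and target_market inputs, and placeholder values for the API key, project name, and Dataset ID. The helper names (function_to_evaluate, response_quality_evaluator) and the exact evaluate keyword arguments are illustrative; check them against the HoneyHive SDK reference.

```python
from honeyhive import evaluate
from openai import OpenAI

openai_client = OpenAI()

def function_to_evaluate(inputs, ground_truths):
    # Generates a short market positioning summary for the dataset row's product.
    # The prompt and field names follow the illustrative dataset used in this guide.
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": (
                    f"Write a one-sentence market positioning summary for "
                    f"{inputs['product']} targeting {inputs['target_market']}."
                ),
            }
        ],
    )
    # The returned value becomes the run's `outputs` field.
    return completion.choices[0].message.content

def response_quality_evaluator(outputs, inputs, ground_truths):
    # Placeholder metric: fraction of reference-summary words found in the response.
    reference_words = set(ground_truths["summary"].lower().split())
    output_words = set(outputs.lower().split())
    return len(reference_words & output_words) / len(reference_words) if reference_words else 0.0

if __name__ == "__main__":
    evaluate(
        function=function_to_evaluate,            # the flow being evaluated
        hh_api_key="<your_honeyhive_api_key>",
        hh_project="<your_project_name>",
        name="Market Dataset Experiment",
        dataset_id="<your_dataset_id>",           # Dataset ID of the managed dataset
        evaluators=[response_quality_evaluator],  # optional client-side evaluators
    )
```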
Create your dataset in jsonl format
Let’s first create our dataset in JSONL format. Create a file named market_dataset.jsonl and paste in the following content. The rows below are illustrative (a hypothetical product and target_market under inputs, and a reference summary under ground_truths); what matters is that each line is a JSON object with inputs and ground_truths fields:
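```jsonl
{"inputs": {"product": "Smart thermostat", "target_market": "homeowners"}, "ground_truths": {"summary": "An energy-saving smart thermostat positioned for cost-conscious homeowners."}}
{"inputs": {"product": "Noise-cancelling headphones", "target_market": "commuters"}, "ground_truths": {"summary": "Premium headphones aimed at commuters who want a quieter ride."}}
{"inputs": {"product": "Standing desk", "target_market": "remote workers"}, "ground_truths": {"summary": "An adjustable standing desk marketed to health-conscious remote workers."}}
```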
Upload your dataset to HoneyHive
Now that we have our dataset in the proper format, let’s upload it to HoneyHive. HoneyHive supports two ways to do this: through the UI or through the SDK. In this guide, we’ll upload it through the UI:
If you want to know more about uploading datasets to HoneyHive, check our Datasets Documentation Page.
Be sure to save your Dataset ID - we will use it in the last step of this tutorial.
Create the flow you want to evaluate
The remaining steps are the same as those in the Experiments Quickstart. Define the function you want to evaluate. The sketch below assumes an OpenAI chat-completion pipeline and the hypothetical product and target_market fields from the sample dataset above; substitute your own flow here:
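```python
from openai import OpenAI

openai_client = OpenAI()

def function_to_evaluate(inputs, ground_truths):
    # `inputs` and `ground_truths` come from the dataset row currently being run.
    # The prompt and field names below follow the illustrative dataset above.
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": (
                    f"Write a one-sentence market positioning summary for "
                    f"{inputs['product']} targeting {inputs['target_market']}."
                ),
            }
        ],
    )
    # Whatever is returned here is stored as the run's `outputs` field.
    return completion.choices[0].message.content
```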
The inputs and ground_truths fields as defined in your dataset will be passed to this function. For example, in one execution of this function, inputs might contain a dictionary like the following (values taken from the illustrative dataset above):
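```python
# one illustrative row from market_dataset.jsonl
{"product": "Smart thermostat", "target_market": "homeowners"}
```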
and ground_truths might contain a dictionary like:
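```python
# the corresponding reference from the same illustrative row
{"summary": "An energy-saving smart thermostat positioned for cost-conscious homeowners."}
```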
The value returned by the function maps to the outputs field of each run in the experiment and will be accessible to your evaluator function, as we will see below.
(Optional) Set up Evaluators
Define client-side evaluators in your code that run immediately after each experiment iteration. These evaluators have direct access to inputs, outputs, and ground truths, and run synchronously with your experiment.
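As a minimal sketch, the evaluator below assumes it receives the outputs returned by function_to_evaluate together with the inputs and ground_truths from the dataset row, and scores the response with a simple word-overlap check against the reference summary from the illustrative dataset; replace the metric with whatever fits your use case.

```python
def response_quality_evaluator(outputs, inputs, ground_truths):
    # `outputs` is the string returned by function_to_evaluate for this run.
    # The overlap metric below is a placeholder; swap in your own scoring logic.
    reference_words = set(ground_truths["summary"].lower().split())
    output_words = set(outputs.lower().split())
    if not reference_words:
        return 0.0
    # Fraction of reference words that also appear in the model response.
    return len(reference_words & output_words) / len(reference_words)
```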
In addition to inputs and ground_truths, the evaluator function has access to the return value from function_to_evaluate, which is mapped to outputs. In this example, outputs would contain a string with the model response, such as the following (continuing the smart-thermostat example; the exact text will vary from run to run):
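```python
# illustrative model response; actual output will differ
"An energy-saving smart thermostat that helps cost-conscious homeowners cut their heating bills."
```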
Run experiment
Finally, you can run your experiment with evaluate. The call below uses placeholder values for your API key, project name, and the Dataset ID you saved earlier; the keyword arguments are a sketch, so check the HoneyHive SDK reference if they differ:
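```python
from honeyhive import evaluate

if __name__ == "__main__":
    evaluate(
        function=function_to_evaluate,            # the flow defined above
        hh_api_key="<your_honeyhive_api_key>",
        hh_project="<your_project_name>",
        name="Market Dataset Experiment",
        dataset_id="<your_dataset_id>",           # the Dataset ID from the upload step
        evaluators=[response_quality_evaluator],  # optional client-side evaluators
    )
```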
Dashboard View
Remember to review the results in your HoneyHive dashboard to gain insights into your model’s performance across different inputs. The dashboard provides a comprehensive view of the experiment results and performance across multiple runs.
Conclusion
By following these steps, you’ve learned how to run experiments using datasets managed in the HoneyHive platform. This approach offers centralized dataset management, versioning, and scalability, making it easier to keep evaluation data consistent across complex experiments while enabling seamless collaboration across your team.