Comparison view lets you run multiple experiments against the same dataset (linked by dataset_id) and compare their results side by side. The dataset provides the input data for each experiment and can come from a HoneyHive Dataset or from inputs you pass in directly. This is particularly useful when you want to benchmark different models, prompts, or configurations against each other.
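
For example, here is a minimal sketch of two runs that differ only in the model, assuming the HoneyHive Python SDK's `evaluate` entry point; the parameter names (`function`, `hh_api_key`, `hh_project`, `name`, `dataset_id`) and the `make_pipeline` helper are illustrative and may differ from your setup:

```python
from honeyhive import evaluate

# Hypothetical factory for the application function under test;
# HoneyHive calls the returned function once per datapoint in the dataset.
def make_pipeline(model: str):
    def pipeline(inputs, ground_truths=None):
        # ...call your LLM with `model` and the datapoint's inputs...
        return {"answer": f"response from {model}"}
    return pipeline

DATASET_ID = "your-dataset-id"  # the same dataset links both experiments

# Run one experiment per configuration; only the model changes.
for model in ["gpt-4o", "gpt-4o-mini"]:
    evaluate(
        function=make_pipeline(model),
        hh_api_key="your-api-key",
        hh_project="your-project",
        name=f"qa-benchmark-{model}",
        dataset_id=DATASET_ID,
    )
```

Because both runs share the same dataset_id, they show up together in the comparison view with the model as the only variable.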

The comparison view lets you benchmark experiments at the step level, review aggregated metrics, filter improved or regressed events, diff corresponding outputs, and analyze metric distributions.

Let’s walk through the key features of the comparison view to help you effectively compare your experiments.

Advanced Comparison Features

1. Step Level Comparisons

HoneyHive allows you to compare experiments at each individual step, giving you granular insight into how different configurations perform at specific stages of your workflow.
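
Step-level comparisons rely on each step of your workflow being traced as its own event. Here is a minimal sketch, assuming the SDK exposes a `trace` decorator; the decorator import and the step functions below are illustrative:

```python
from honeyhive import trace

@trace()
def retrieve_context(query: str) -> list[str]:
    # Retrieval step: logged as its own event in every experiment run.
    return ["...relevant documents..."]

@trace()
def generate_answer(query: str, context: list[str]) -> str:
    # Generation step: logged as a separate, comparable event.
    return "...model response..."

def pipeline(inputs, ground_truths=None):
    context = retrieve_context(inputs["query"])
    return {"answer": generate_answer(inputs["query"], context)}
```

Because each decorated step produces an event with a stable name, the comparison view can line up the `retrieve_context` and `generate_answer` events across experiments.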

2. Aggregated Metrics

HoneyHive automatically calculates and compares aggregates from:

  • Server-side metrics
  • Client-side metrics (see the evaluator sketch after this list)
  • Composite metrics at the session level
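
Client-side metrics typically come from evaluator functions you run alongside the experiment. A minimal sketch, assuming `evaluate` accepts an `evaluators` list and calls each evaluator with the pipeline outputs plus the datapoint's inputs and ground truths (the argument names and order here are assumptions):

```python
from honeyhive import evaluate

def answer_length(outputs, inputs, ground_truths):
    # Client-side metric: computed per event, then aggregated across the run.
    return len(outputs.get("answer", ""))

def exact_match(outputs, inputs, ground_truths):
    # 1.0 if the answer matches the reference exactly, else 0.0.
    return float(outputs.get("answer") == (ground_truths or {}).get("answer"))

evaluate(
    function=pipeline,                 # the traced pipeline from the sketch above
    hh_project="your-project",
    name="qa-benchmark-gpt-4o",
    dataset_id="your-dataset-id",
    evaluators=[answer_length, exact_match],
)
```

Aggregates of these metrics, alongside any server-side and composite metrics you have configured, then appear per experiment in the comparison view.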

3. Improved/Regressed Events

Filter for events that have improved or regressed in specific metrics:

  • Select the metric and operation you want.
  • View the corresponding events in the events table.

4. Output Diff Viewer

Compare the outputs and metrics of corresponding events that share the same event name across experiments.

5. Metric Distribution

Analyze the distribution of each metric across a run to spot variance and outliers that a single aggregate value would hide.

Best Practices

  1. Use a consistent dataset for all compared experiments.
  2. Isolate one change at a time (e.g., model, prompt, temperature) to understand its specific impact.
  3. Ensure a sufficient sample size for statistically significant conclusions.
  4. Document the configuration used in each experiment for future reference (see the sketch after this list).
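
One lightweight way to combine practices 2 and 4, assuming nothing beyond the hedged `evaluate` call shown earlier: keep each run's settings in a single config dict, change exactly one field per experiment, and embed the key fields in the run name so the configuration is documented with the results.

```python
from honeyhive import evaluate

# Hypothetical config for one run; vary exactly one field per experiment
# and keep the dict under version control alongside your results.
config = {
    "model": "gpt-4o-mini",
    "temperature": 0.2,
    "prompt_version": "v3",
}

run_name = f"qa-{config['model']}-temp{config['temperature']}-{config['prompt_version']}"

evaluate(
    function=make_pipeline(config["model"]),  # pipeline factory from the earlier sketch
    hh_project="your-project",
    name=run_name,                            # the name documents the configuration
    dataset_id="your-dataset-id",
)
```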

Conclusion

The Comparison View for Experiments in HoneyHive provides a powerful tool for benchmarking different LLM configurations. Leverage this feature to make data-driven decisions about the optimal models, prompts, or parameters for your specific use case.