HoneyHive’s abstractions have been designed for maximal extensibility & reusability. All concepts are minimally opinionated.

Project

Everything in HoneyHive is organized by projects. A project is a workspace to develop, test, and monitor a specific AI application.

Sessions & Events

Event: An event tracks the execution of a single part of your application, along with related metadata, user feedback, and so on. It is synonymous with a single span in a trace.

Session: A session is a collection of events related to a single user interaction with your application. It is synonymous with the root span in a trace.

Sessions and events are useful for logging every interaction in your application, understanding what’s happening, troubleshooting issues, and monitoring performance. Full details on how they work can be found in the Tracing Data Model.
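As a rough sketch, an event and a session can be pictured as follows. The field names below are illustrative assumptions, not the exact schema; the Tracing Data Model is the authoritative reference.

```python
# Illustrative sketch of an event (one span) and a session (the set of events
# sharing a session_id). Field names are assumptions for illustration only.
import uuid
from datetime import datetime, timezone

session_id = str(uuid.uuid4())

event = {
    "event_id": str(uuid.uuid4()),       # unique id for this span
    "session_id": session_id,            # groups this event into a session
    "event_type": "model",               # e.g. model call, tool call, chain step
    "event_name": "generate_answer",
    "start_time": datetime.now(timezone.utc).isoformat(),
    "inputs": {"question": "What is HoneyHive?"},
    "outputs": {"answer": "An observability platform for AI apps."},
    "metadata": {"model": "gpt-4o"},
    "feedback": {"rating": "thumbs_up"},  # user feedback attached to the event
}

# A session is simply the collection of events that share the same session_id;
# the root event of that collection corresponds to the root span of the trace.
session = [event]
```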

Evaluation Run

An evaluation run is a collection of sessions that track the execution of your application (or a part of it), linked by a common run_id in their metadata.

In our interface, we summarize the metrics present on the session & all its children.

In this interface, you can apply different aggregation functions over the metrics, filter for particular sessions, and step into the trace view for each run.

Two evaluation runs can be compared against each other, session by session and event by event, when their sessions share a common datapoint_id in their metadata.
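The sketch below illustrates how a shared run_id groups sessions into runs and how a shared datapoint_id pairs sessions across two runs. The session dicts and grouping code are hypothetical, shown only to illustrate the role of these two fields.

```python
# Minimal sketch: group sessions into evaluation runs by run_id, then pair
# sessions across two runs by datapoint_id. Data below is hypothetical.
from collections import defaultdict

sessions = [
    {"session_id": "s1", "metadata": {"run_id": "run-A", "datapoint_id": "dp-1"}},
    {"session_id": "s2", "metadata": {"run_id": "run-A", "datapoint_id": "dp-2"}},
    {"session_id": "s3", "metadata": {"run_id": "run-B", "datapoint_id": "dp-1"}},
]

# Group sessions into evaluation runs by their shared run_id.
runs = defaultdict(list)
for s in sessions:
    runs[s["metadata"]["run_id"]].append(s)

# Two runs can be compared by pairing sessions that share a datapoint_id:
# here s1 (run-A) pairs with s3 (run-B) because both ran against dp-1.
pairs = [
    (a["session_id"], b["session_id"])
    for a in runs["run-A"]
    for b in runs["run-B"]
    if a["metadata"]["datapoint_id"] == b["metadata"]["datapoint_id"]
]
print(pairs)  # [('s1', 's3')]
```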

Configuration

A configuration is a generic set of parameters that define the behavior of any component in your application - be that the model, a sub-component, or the application itself.
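For example, a configuration for a hypothetical retriever component might look like the following sketch (the name and field layout are illustrative, not an exact schema):

```python
# A configuration is just a named bag of parameters for some component.
# This retriever configuration is hypothetical, shown only to illustrate the shape.
retriever_config = {
    "name": "docs-retriever",
    "type": "retriever",          # which component this configures
    "parameters": {
        "index": "product-docs",
        "top_k": 5,
        "similarity_threshold": 0.75,
    },
}
```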

Prompt

A prompt is a specific configuration for your model. It includes the model name, provider, prompt template, and any other hyperparameters (including functions/tools associated with your template).
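A sketch of what such a prompt configuration might contain follows; the structure and field names are illustrative assumptions, not an exact schema.

```python
# Illustrative prompt configuration: model, provider, template, hyperparameters,
# and any tools associated with the template. Field names are assumptions.
prompt = {
    "name": "qa-prompt",
    "provider": "openai",
    "model": "gpt-4o",
    "hyperparameters": {"temperature": 0.2, "max_tokens": 512},
    "template": [
        {"role": "system", "content": "You answer questions about {{product}}."},
        {"role": "user", "content": "{{question}}"},
    ],
    "tools": [],  # functions/tools associated with the template, if any
}
```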

Datapoint

A datapoint is a set of input-output pairs (along with any metadata) that can be used by your models, retrievers and other components.

Each datapoint has a unique datapoint_id that can be used to track it across different sessions, evaluation runs, and comparisons.

They are also linked to the events that generated them, so you can always trace back to the original data.
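An illustrative datapoint might look like the sketch below; the field names are assumptions for illustration, not the exact schema.

```python
# Illustrative datapoint: inputs and expected outputs plus metadata, keyed by a
# datapoint_id that links it to sessions, evaluation runs, and the event that
# generated it. Field names are assumptions.
datapoint = {
    "datapoint_id": "dp-1",
    "inputs": {"question": "What is a session?"},
    "ground_truth": {"answer": "A collection of related events."},
    "metadata": {"source": "support-tickets"},
    "linked_event": "evt-123",  # the event this datapoint was created from
}
```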

Dataset

A dataset is a collection of datapoints that can be used for evaluation, fine-tuning, or however else you see fit.

These can be exported and used in your testing or fine-tuning pipelines.
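For instance, a minimal sketch of exporting a dataset to JSONL for a downstream testing or fine-tuning pipeline (the datapoints and file path here are hypothetical):

```python
# Write a dataset (a list of datapoints) to a JSONL file, one datapoint per line.
import json

dataset = [
    {"datapoint_id": "dp-1", "inputs": {"question": "What is a session?"},
     "ground_truth": {"answer": "A collection of related events."}},
    {"datapoint_id": "dp-2", "inputs": {"question": "What is an event?"},
     "ground_truth": {"answer": "A single span in a trace."}},
]

with open("dataset.jsonl", "w") as f:
    for dp in dataset:
        f.write(json.dumps(dp) + "\n")
```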

Evaluator

An evaluator is a function (Python or LLM-based) that’s run over your logs to evaluate the performance of your application.

We support both client-side and server-side evaluators, so you can decide to run the evaluation on your own infrastructure or use our managed metrics.
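A minimal sketch of a client-side Python evaluator is shown below; the function signature is an assumption for illustration, not the SDK’s exact interface.

```python
from typing import Optional

# Sketch of a client-side evaluator: a function run over a logged event's
# outputs that returns one or more metrics. Signature is illustrative only.
def answer_length_ok(outputs: dict, ground_truth: Optional[dict] = None) -> dict:
    """Flag answers that are suspiciously short."""
    answer = outputs.get("answer", "")
    return {"answer_length": len(answer), "too_short": len(answer) < 20}

# Run the evaluator over an example logged output.
print(answer_length_ok({"answer": "A session groups related events."}))
```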