Data Model Overview
An overview of our data model for logging traces and events
HoneyHive combines logs, metrics, and traces into a unified, event-based data model. Each event can carry high-cardinality attributes (fields with many distinct values, such as user or session IDs), which gives a comprehensive view of your AI system’s performance and behavior. Consolidating these traditionally separate observability pillars into a single, flexible structure lets developers gain deeper insights and run more sophisticated analyses. This approach offers several key benefits:
- Unified Context: Each event captures not just raw data, but also the surrounding context, allowing for more meaningful correlations and insights.
- Flexible Querying: High cardinality enables precise filtering and aggregation across multiple dimensions, facilitating complex analyses and troubleshooting.
- Scalability: The event-based model scales efficiently with the growing complexity of AI systems and the increasing volume of observability data.
- Faster Debugging: The ability to trace a request through various components while simultaneously accessing logs and metrics streamlines the debugging process.
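To make the querying benefit concrete, here is a minimal sketch in plain Python rather than a HoneyHive API. The events, the attribute names (user_tier, model, latency_ms), and the helper mean_latency_by are all hypothetical; the point is only that when every event carries its own context, filtering and grouping across several dimensions takes a few lines.

```python
from collections import defaultdict

# Hypothetical events: each one carries its own context as key-value attributes.
events = [
    {"event_type": "model", "user_tier": "free", "model": "model-a", "latency_ms": 820},
    {"event_type": "model", "user_tier": "pro", "model": "model-a", "latency_ms": 640},
    {"event_type": "model", "user_tier": "pro", "model": "model-b", "latency_ms": 410},
    {"event_type": "tool", "user_tier": "pro", "tool": "vector_search", "latency_ms": 95},
]


def mean_latency_by(all_events, dimension, **filters):
    """Average latency_ms grouped by one attribute, after exact-match filtering."""
    groups = defaultdict(list)
    for event in all_events:
        matches = all(event.get(key) == value for key, value in filters.items())
        if matches and dimension in event:
            groups[event[dimension]].append(event["latency_ms"])
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Filter on two dimensions (event type, user tier) and group by a third (model).
by_model = mean_latency_by(events, "model", event_type="model", user_tier="pro")
print(by_model)
```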
Introducing Events
The base unit of data in HoneyHive is called an event, which represents a span in a trace. The root event in a trace is of type session, while all non-root events in a trace can be one of 3 core types: model, tool, and chain.
The session event, being a root event, does not have any parents.
- session: A root event used to group together multiple model, tool, and chain events into a single trace. This is achieved by having a common session_id across all children.
- model events: Used to track the execution of any LLM requests.
- tool events: Used to track the execution of any deterministic functions, like requests to vector DBs, requests to an external API, regex parsing, document reranking, and more.
- chain events: Used to group together multiple model and tool events into composable units that can be evaluated and monitored independently. Typical examples of chains include retrieval pipelines, post-processing pipelines, and more.
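The rules above can be sketched as a small data structure. This is an illustrative model, not the HoneyHive SDK: the Event class and the fields event_id, event_name, and parent_id are assumptions, as is the rule that every non-root event must name a parent. Only the four event types and the shared session_id come from the text.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

NON_ROOT_TYPES = {"model", "tool", "chain"}


@dataclass
class Event:
    """Hypothetical event record; field names are assumptions for illustration."""
    event_type: str
    event_name: str
    session_id: str
    parent_id: Optional[str] = None
    event_id: str = ""

    def __post_init__(self):
        if not self.event_id:
            self.event_id = str(uuid.uuid4())
        if self.event_type == "session":
            # The root event of a trace has no parent.
            if self.parent_id is not None:
                raise ValueError("a session event is a root and cannot have a parent")
        elif self.event_type in NON_ROOT_TYPES:
            # Assumption: every non-root event points at a parent event.
            if self.parent_id is None:
                raise ValueError("non-root events need a parent")
        else:
            raise ValueError(f"unknown event type: {self.event_type}")


# All events in one trace share the same session_id.
session_id = str(uuid.uuid4())
root = Event("session", "chat session", session_id)
chain = Event("chain", "retrieval", session_id, parent_id=root.event_id)
tool = Event("tool", "vector search", session_id, parent_id=chain.event_id)
model = Event("model", "answer generation", session_id, parent_id=root.event_id)
trace = [root, chain, tool, model]
print([(e.event_type, e.event_name) for e in trace])
```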
Here’s a visual representation of the event hierarchy:
Session Events
Session events are used to track the execution of your application. These can be used to capture:
- Session configuration like the application version, environment, etc.
- Session metrics like session latency, session throughput, etc.
- Session properties like user id, country, tier, etc.
- Session feedback like overall session feedback, etc.
Here’s an example session event:
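The sketch below is illustrative only. All values are made up, and the field names (config, metrics, user_properties, feedback) are assumptions chosen to mirror the four categories listed above, not a confirmed schema.

```python
import json

# Hypothetical session event; every field name and value is a placeholder.
session_event = {
    "event_type": "session",
    "event_name": "support-chat",
    "session_id": "sess-001",
    "parent_id": None,  # a session is a root event, so it has no parent
    "config": {"app_version": "1.4.2", "environment": "staging"},
    "metrics": {"latency_ms": 5400},
    "user_properties": {"user_id": "user-42", "country": "DE", "tier": "pro"},
    "feedback": {"rating": 4},
}

print(json.dumps(session_event, indent=2))
```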
Model Events
Model events are used to track the execution of your AI model. These can be used to capture:
- Model configuration like model name, model hyperparameters, prompt template, etc.
- Model metrics like completion token count, cost, tokens per second, etc.
- API-level metrics like request latency, rate limit errors, etc.
Here’s an example model event:
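As with any sketch here, the field names and values below are assumptions rather than a confirmed schema. The example groups model configuration under config and derives a tokens-per-second metric from the completion token count and the request duration.

```python
# Hypothetical model event; field names, IDs, and values are placeholders.
model_event = {
    "event_type": "model",
    "event_name": "answer-generation",
    "session_id": "sess-001",
    "parent_id": "sess-001-root",
    "config": {
        "model": "example-llm",
        "temperature": 0.2,
        "max_tokens": 512,
        "template": "Answer using the context: {context}\nQuestion: {question}",
    },
    "duration_ms": 4000,
    "error": None,  # would hold, for example, a rate limit error message
}

completion_tokens = 180
model_event["metrics"] = {
    "completion_tokens": completion_tokens,
    "cost_usd": 0.0021,
    # Derived metric: tokens generated per second of request time.
    "tokens_per_second": completion_tokens / (model_event["duration_ms"] / 1000),
}

print(model_event["metrics"])
```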
Tool Events
Tool events are used to track the execution of anything other than the model. These can be used to capture:
- Tool configuration like vector index name, vector index hyperparameters, any internal tool configuration, etc.
- Tool metrics like retrieved chunk similarity, internal tool response validation, etc.
- API-level metrics like request latency, index errors, internal tool errors, etc.
Here’s an example tool event:
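The following sketch assumes a vector search tool. The field names, the index name, and the scores are hypothetical; the metrics are computed from the tool output to show how chunk similarity and response validation might be recorded.

```python
# Hypothetical tool event for a vector search; all names and values are placeholders.
tool_event = {
    "event_type": "tool",
    "event_name": "vector-search",
    "session_id": "sess-001",
    "parent_id": "chain-retrieval",
    "config": {"index_name": "docs-index", "top_k": 3, "metric": "cosine"},
    "inputs": {"query": "How do I reset my password?"},
    "outputs": {"chunk_scores": [0.91, 0.84, 0.62]},
    "error": None,  # would hold, for example, an index error message
}

scores = tool_event["outputs"]["chunk_scores"]
tool_event["metrics"] = {
    # Average similarity of the retrieved chunks.
    "mean_chunk_similarity": round(sum(scores) / len(scores), 2),
    # Simple response validation: did the index return as many chunks as requested?
    "response_valid": len(scores) == tool_event["config"]["top_k"],
}

print(tool_event["metrics"])
```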
Chain Events
Chain events group events into the different stages of your pipeline. These stages can be synchronous or asynchronous.
How Chain Events Work
Any event that has its “parent” set to a chain event becomes a step within that chain. This simple mechanism allows you to consolidate various events into a single unit, making it easier to monitor the progress of your pipeline.
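A minimal sketch of this mechanism, assuming each event stores its parent in a field called parent_id (the field name, the IDs, and the helper steps_of are hypothetical):

```python
# Hypothetical flat list of events, as they might be logged.
events = [
    {"event_id": "chain-1", "event_type": "chain", "event_name": "retrieval", "parent_id": "session-1"},
    {"event_id": "tool-1", "event_type": "tool", "event_name": "embed_query", "parent_id": "chain-1"},
    {"event_id": "tool-2", "event_type": "tool", "event_name": "vector_search", "parent_id": "chain-1"},
    {"event_id": "model-1", "event_type": "model", "event_name": "generate_answer", "parent_id": "session-1"},
]


def steps_of(chain_id, all_events):
    """Return the events whose parent is the given chain, in logged order."""
    return [event for event in all_events if event["parent_id"] == chain_id]


step_names = [step["event_name"] for step in steps_of("chain-1", events)]
print(step_names)
```

The model event is not part of the chain because its parent is the session, not chain-1.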
Nesting for Hierarchy
You can also nest chains within each other. The resulting hierarchy lets you track the execution of your pipeline stage by stage, which is particularly useful for complex workflows.
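Nesting can be illustrated by rebuilding the tree from parent links. This is a sketch under the assumption that each event carries event_id and parent_id fields; the event names and the helper render_tree are made up for the example.

```python
from collections import defaultdict

# Hypothetical trace: a retrieval chain nested inside a larger pipeline chain.
events = [
    {"event_id": "s1", "parent_id": None, "event_type": "session", "event_name": "session"},
    {"event_id": "c1", "parent_id": "s1", "event_type": "chain", "event_name": "rag_pipeline"},
    {"event_id": "c2", "parent_id": "c1", "event_type": "chain", "event_name": "retrieval"},
    {"event_id": "t1", "parent_id": "c2", "event_type": "tool", "event_name": "vector_search"},
    {"event_id": "t2", "parent_id": "c2", "event_type": "tool", "event_name": "rerank"},
    {"event_id": "m1", "parent_id": "c1", "event_type": "model", "event_name": "generate_answer"},
]


def render_tree(all_events):
    """Return one indented line per event, following parent links depth-first."""
    children = defaultdict(list)
    for event in all_events:
        children[event["parent_id"]].append(event)

    lines = []

    def walk(parent_id, depth):
        for event in children[parent_id]:
            lines.append("  " * depth + f'{event["event_type"]}: {event["event_name"]}')
            walk(event["event_id"], depth + 1)

    walk(None, 0)  # the session root is the only event without a parent
    return lines


tree_lines = render_tree(events)
print("\n".join(tree_lines))
```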
By separating events into chains, you can track properties like:
- Chain configuration like chain name, chain settings, etc.
- Chain metrics like chain latency, chain throughput, etc.
Here’s an example chain event:
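The sketch below uses assumed field names and placeholder values, not a confirmed schema. It records the chain name and settings under config and derives the chain latency from hypothetical start and end timestamps.

```python
# Hypothetical chain event; field names, IDs, and values are placeholders.
chain_event = {
    "event_type": "chain",
    "event_name": "retrieval-pipeline",
    "session_id": "sess-001",
    "parent_id": "sess-001-root",
    "config": {"name": "retrieval-pipeline", "settings": {"rerank": True, "top_k": 3}},
    "start_time_ms": 1000,
    "end_time_ms": 1350,
}

# Chain latency covers every step that ran inside the chain.
chain_event["metrics"] = {
    "latency_ms": chain_event["end_time_ms"] - chain_event["start_time_ms"],
}

print(chain_event["metrics"])
```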
Next Steps
Refer to our detailed documentation for a more specific mapping of the data model to your use case.