Logging is an integral part of developing and managing AI applications. It helps you track the performance of your application, identify issues, and understand the application’s behavior during runtime. HoneyHive provides powerful logging capabilities that allow you to trace the execution of your AI models and pipelines.
This guide will introduce you to the concept of logging in HoneyHive, including how it works and how logs can be used to monitor, debug, and optimize the performance of your AI application.
Why log your AI pipelines?
In-depth Analysis: Logging provides detailed information about each event in your AI model’s life cycle, allowing you to understand how your model or pipeline is operating, where it’s spending most of its time, and where potential issues or bottlenecks might be.
Debugging and Optimization: With the detailed view provided by logging, you can easily identify any errors or performance issues. This makes debugging easier and allows you to optimize your model or pipeline for better performance.
Monitoring: Regularly checking the logs can help you spot any unusual behavior or performance issues early on, allowing you to proactively address potential problems.
Dataset Curation: Based on user feedback and metric performance, you can use logging to curate your evaluation and fine-tuning datasets to improve the quality of your AI model.
How does logging work in HoneyHive?
There are two types of logs in HoneyHive: Events and Evaluations.
Events are used to track the execution of your AI model or pipeline. As your pipeline executes, each step’s execution is tracked as an event.
An event can be of 3 types -
Model events are used to track the execution of your AI model. These can be used to capture
- Model configuration like model name, model hyperparameters, prompt template, etc.
- Model metrics like completion token count, cost, tokens per second, etc.
- API-level metrics like request latency, rate limit errors, etc.
Tool events are used to track the execution of anything other than the model. These can be used to capture
- Tool configuration like vector index name, vector index hyperparameters, any internal tool configuration, etc.
- Tool metrics like retrieved chunk similarity, internal tool response validation, etc.
- API-level metrics like request latency, index errors, internal tool errors, etc.
Chain events help with categorizing the events into different stages of the pipeline. These can be synchronous or asynchronous stages.
How Chain Events Work: Here’s the core concept: any event that has its “parent” set to a chain event becomes a step within that chain. This simple mechanism allows you to consolidate various events into a single unit, making it easier to monitor the progress of your pipeline.
Nesting for Hierarchy: You can also nest chains within each other. This hierarchical approach lets you track the execution of your pipeline in a structured and organized manner. This nesting feature can be particularly useful for complex workflows.
By separating events into chains, you can track properties like:
- Chain configuration like chain name, chain settings, etc.
- Chain metrics like chain latency, chain throughput, etc.
An evaluation in HoneyHive is a collection of pipeline runs with a summary computed over it.
results- a 2D array of results from each pipeline run tracking the session id for each test case id and configuration id pair.
summary- a summary of the metrics computed over the results, including pass/fail, averages, and other statistics.
For more information on how we manage evaluations data and how you can use it in HoneyHive, refer to the Evaluations Overview section.
How to use event data in HoneyHive
HoneyHive provides a number of features that allow you to use event data to enhance the debugging, monitoring, and optimization of your AI applications.
Pipeline traces with error data can be used to debug which steps in the pipeline break.
- View the full trace in the platform on the Session Sideview by clicking the session trace in our Datasets or Monitoring pages.
- Analyze your data and create charts to visualize performance metrics, user feedback or error rates from your pipeline over time.
Pipeline Variant Selection
You can log pipeline traces into our evaluations SDK by passing the
session_id as a metric to the SDK. This allows you to compare the performance of different pipeline variants step by step on the same dataset.
Success Metric Analysis
HoneyHive enables computing custom metrics over your session data such as:
- Number of user exchanges in a session
- Number of times a user asked for help in a session
- Action taken by an agent in a session
- Relevance of retrieved vector database chunks
These are just a small list of examples of the metrics that can be computed over your session data. You can find more information on how to compute these metrics in the Metrics section.
HoneyHive enables you to curate your evaluation and fine-tuning datasets based on user feedback and metric performance. You can find more information on how to curate your datasets in the Fine-Tuning and Evaluation sections.
To start using logging in HoneyHive with your AI applications, refer to the following resources:
Tracing a Custom Python Pipeline
Learn how to use HoneyHive’s Python Tracer to monitor your Python pipelines.
Tracing a Custom Pipeline via API
Learn how to use HoneyHive’s APIs to trace your pipelines in any language.
Tracing with LlamaIndex
Learn how to use HoneyHive’s LlamaIndex Tracer to monitor and improve your LlamaIndex pipelines.
Tracing with LangChain
Learn how to use HoneyHive’s LangChain Tracer to monitor and improve your LangChain pipelines.
Tracing in Typescript
Learn how to use HoneyHive’s SessionTracer in Typescript to monitor and improve your LLM pipelines.