Time-Series Chart
Visualizing Performance Metrics in HoneyHive
HoneyHive provides a powerful platform for monitoring your LLM application’s performance in real time. Here’s how you can effectively use segmentation charts to gain insights:
- Real-time Observation: Log in to HoneyHive to observe your LLM application’s performance metrics in real time. The dashboard provides an intuitive interface for visualizing various metrics and their trends.
- Metric Definition: Define the specific metric you want to visualize. HoneyHive supports standard out-of-the-box metrics, custom metrics, and user feedback. Standard metrics include `Request Volume`, `Cost`, and `Latency`. Any custom metrics that you previously defined and enabled in production can also be visualized here. User feedback is aggregated based on its return type. For example, you can select `Accepted` to track the percentage of requests that were accepted by end-users.
- Aggregation Functions: Choose the aggregation function that best suits your analysis. Common functions include Average, Sum, Percentage True, and 99th Percentile. Selecting the right aggregation function helps you distill complex data into meaningful insights. HoneyHive automatically provides different aggregation functions for `boolean` and `float` return-type metrics.
- Data Filtering and Comparison: Utilize the power of segmentation with `filter` and `group by`. These allow you to focus on specific data slices based on user properties, custom metadata, or other relevant criteria. For example, you can filter by `user_country` or `subscription_tier` to perform cohort-level analysis. Any user properties or custom metadata can be found here.
Metric Types
HoneyHive supports various metric types for monitoring your LLM application’s performance:
- Standard Metrics: These include Request Volume, which measures the number of requests your application receives; Cost, which evaluates the expenses associated with running the application; and Latency, which assesses the time taken to respond to requests.
- Python Evaluators: You can define and enable Python evaluators tailored to your application’s specific needs. These evaluators could relate to content quality or any other relevant aspect.
- LLM Evaluators: You can define and enable LLM evaluators, computed at ingestion time, tailored to your application’s specific needs.
- User Feedback: You can visualize any user feedback fields (as long as the return type is `Float` or `Boolean`) to analyze performance in production.
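As a rough sketch of the two chartable return types, here are two hypothetical Python evaluators, one `boolean`-valued and one `float`-valued. The function names and signatures are illustrative assumptions, not HoneyHive’s evaluator interface.

```python
# Illustrative evaluator shapes; the signatures below are assumptions
# for this sketch, not HoneyHive's actual evaluator interface.

def response_length_ok(completion: str) -> bool:
    """A boolean-return evaluator: flags overly long responses."""
    return len(completion) <= 2000

def keyword_coverage(completion: str, keywords: list) -> float:
    """A float-return evaluator: fraction of expected keywords present."""
    if not keywords:
        return 1.0
    hits = sum(1 for kw in keywords if kw.lower() in completion.lower())
    return hits / len(keywords)

completion = "HoneyHive charts latency and cost per request."
print(response_length_ok(completion))                     # boolean metric
print(keyword_coverage(completion, ["latency", "cost"]))  # float metric
```

A `boolean` evaluator like the first pairs naturally with the Percentage True aggregation, while a `float` evaluator like the second pairs with Average or 99th Percentile.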
You can also visualize metrics computed in your `Development` or `Staging` environments. A metric can be charted if its return type is set as `float` or `boolean`; any evaluators with a `string` return type can only be used to group or filter charts.
User Properties
User properties provide valuable insights into user behavior and preferences. Common examples include:
- `user_ID`: A unique identifier for each user, helping you track individual user interactions.
- `user_country`: Allows you to analyze how different regions interact with your application.
- `subscription_tier`: Helps you understand the behavior of different user segments based on their subscription level.
Utilize these properties to perform cohort-level analysis, identifying trends and patterns among specific user groups.
Metadata
Metadata offers flexibility in capturing additional information about user interactions. This arbitrary data can be passed with logged requests. Common examples include:
- Custom Tags: Tag requests with identifiers that hold significance within your application.
- Session Duration: Track how long users engage with your LLM application.
- Content Type: Categorize requests based on the type of content users are interacting with.
Leverage metadata to gain deeper insights into user interactions and tailor your LLM application accordingly.
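For example, a metadata payload along these lines could be passed with a logged request. The keys (`tag`, `session_duration_s`, `content_type`) are hypothetical examples, not a schema HoneyHive requires; any structure meaningful to your application works.

```python
# Illustrative metadata payload attached to a logged request;
# keys are hypothetical examples, not a required schema.
metadata = {
    "tag": "onboarding-flow",     # custom tag meaningful to your app
    "session_duration_s": 312.5,  # float: chartable as a metric
    "content_type": "summary",    # string: usable for group by / filter
}
print(metadata)
```

Numeric fields such as `session_duration_s` can be charted directly, while string fields such as `content_type` are useful for `group by` and `filter`.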