# Instrumentation Guide

How to instrument your application for maximum insights.

HoneyHive provides deeply customizable tooling for observing your LLM application. This guide will help you understand how to instrument your application to get the most insights out of your data.
## Instrumenting LLM Requests
It’s essential to log requests made to your models to evaluate performance, track prompts, debug issues, and gain insights into your application’s performance. Use the following schema to capture relevant information about LLM requests on the `model` event.
### Event: `model`

The `model` event represents a request made to an LLM. It is used to send information about the configuration, inputs, outputs, metadata, and metrics associated with the request.
| Root Field | Field | Description |
|---|---|---|
| `config` | `model` | The name or identifier of the LLM model used for the request. |
| | `provider` | The provider or vendor of the LLM model (e.g., Anthropic, OpenAI, etc.). |
| | `temperature` | The temperature hyperparameter used for the LLM, which controls the randomness or creativity of the generated output. |
| | `max_tokens` | The maximum number of tokens the LLM is allowed to generate for the current request. |
| | `top_p` | The top-p (nucleus) sampling hyperparameter used for the LLM, which controls the diversity of the generated output. |
| | `top_k` | The top-k sampling hyperparameter used for the LLM, which also controls the diversity of the generated output. |
| | `template` | The prompt template or format used for structuring the input to the LLM. |
| | `{hyperparameter}` | Any additional hyperparameters or configuration settings specific to the LLM being used. |
| `inputs` | `chat_history` | The messages or context provided as input to the LLM, typically in a conversational or chat-like format. |
| | `{prompt-input}` | The respective input fields or values used to populate the prompt template. |
| `outputs` | `role` | The role or perspective from which the LLM generated the response (e.g., assistant, user, system). |
| | `content` | The actual response message generated by the LLM. |
| | `{custom_field}` | Any additional or parsed fields from the LLM response, such as agent thoughts, tool calls, or other structured data. |
| `metadata` | `total_tokens` | The total number of tokens used for the request, including both the prompt and the completion. |
| | `completion_tokens` | The number of tokens in the completion or output generated by the LLM. |
| | `prompt_tokens` | The number of tokens in the prompt or input provided to the LLM. |
| | `cost` | The cost or pricing information associated with the LLM request, if available. |
| | `{custom}` | Any additional metadata relevant to the LLM request, such as external links, application screenshots, or other context. |
| `metrics` | `{custom}` | Any custom metrics or evaluations associated with the LLM request, such as tokens per second, quality scores, or other performance indicators. |
| `duration` | - | The total time taken for the LLM request, measured in milliseconds, which can help identify performance bottlenecks or slow operations. |
| `error` | - | Any errors, exceptions, or error messages that occurred during the LLM request, which can aid in debugging and troubleshooting. |
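As a concrete sketch, a `model` event following the schema above can be assembled as a plain Python dictionary before being sent with your SDK or HTTP client of choice. All values here are hypothetical, and the `event_type` envelope key is an assumption for illustration, not a documented field:

```python
# Hypothetical `model` event payload illustrating the schema above.
model_event = {
    "event_type": "model",  # envelope key is an assumption for this sketch
    "config": {
        "model": "gpt-4o",
        "provider": "OpenAI",
        "temperature": 0.7,
        "max_tokens": 512,
        "top_p": 1.0,
        "template": [{"role": "system", "content": "You are a helpful assistant."}],
    },
    "inputs": {
        "chat_history": [{"role": "user", "content": "Summarize this article."}],
    },
    "outputs": {
        "role": "assistant",
        "content": "Here is a short summary of the article...",
    },
    "metadata": {
        "prompt_tokens": 120,
        "completion_tokens": 48,
        "total_tokens": 168,  # prompt + completion
        "cost": 0.0021,
    },
    "metrics": {"tokens_per_second": 35.2},
    "duration": 1360,  # milliseconds
    "error": None,
}
```

Note that `total_tokens` should equal `prompt_tokens + completion_tokens`, which is a useful sanity check when wiring up your instrumentation.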
## Adding Application Metadata to Traces
When instrumenting your LLM application, it’s essential to include relevant metadata about the application itself, the user, and the session. This metadata can provide valuable context for debugging, performance analysis, and understanding user behavior. Use the following schema to capture this information on the `session` event.
### Event: `session`

The `session` event represents the start of a new session or interaction with your LLM application. It is used to send metadata about the application, environment, session, and user.
| Root Field | Field | Description |
|---|---|---|
| `config` | `app_version` | The version of the LLM application currently running. This can help identify issues specific to a particular version or track performance improvements across versions. |
| `source` | - | The environment or deployment context of the application, such as `production`, `staging`, `dev`, or `evaluation`. This can help differentiate between deployment environments and their respective configurations. |
| `session` | `session_id` | A unique identifier for the current session or interaction with the LLM application. This can be used to correlate multiple events and trace the flow of a single session. You can either pass your own session ID or one will be generated automatically at session start. |
| `user_properties` | `user_id` | A unique identifier for the user interacting with the LLM application. This can help analyze user-specific behavior and performance. |
| | `user_tier` | The user’s subscription tier (e.g., `free` or `pro`). This can be used for segmented analysis or to identify performance differences across user tiers. |
| | `user_tenant` | If the application supports multi-tenancy, the tenant or organization the user belongs to. This can be useful for analyzing tenant-specific behavior or performance. |
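The session metadata above can likewise be sketched as a dictionary. The identifiers and tier values are hypothetical, and, as with the previous sketch, the `event_type` key is an assumed envelope rather than a documented field; if you don’t supply a `session_id`, one is generated at session start, but you can also generate your own, for example with `uuid`:

```python
import uuid

# Hypothetical `session` event payload illustrating the schema above.
session_event = {
    "event_type": "session",  # envelope key is an assumption for this sketch
    "config": {"app_version": "1.4.2"},
    "source": "production",
    "session": {"session_id": str(uuid.uuid4())},  # or pass your own session ID
    "user_properties": {
        "user_id": "user-8231",
        "user_tier": "pro",
        "user_tenant": "acme-corp",
    },
}
```

Reusing the same `session_id` across subsequent `model` and `tool` events is what lets you trace the full flow of a single session.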
## Instrumenting RAG and Tool Requests
When your LLM application interacts with external APIs, databases, or vector databases like Pinecone, you can instrument these interactions to evaluate performance, debug issues, and gain insights. Use the following schema to capture relevant information about these interactions.
### Event: `tool`

The `tool` event represents an interaction with an external resource. Send the following fields:
| Root Field | Field | Description |
|---|---|---|
| `config` | `provider` | The name of the external service provider offering vector database, API, or other relevant services (e.g., Pinecone, Weaviate, etc.). |
| | `instance` | The specific instance or deployment name of the service within the provider’s infrastructure, allowing for differentiation between multiple instances or deployments. |
| | `embedding_model` | The name or identifier of the embedding model used for calculating vector similarity, which is particularly relevant for vector databases or services that rely on vector representations of data. |
| | `chunk_size` | The size (in characters or tokens) of the chunks into which data is split before being converted into vectors, if applicable to the service being used. |
| | `chunk_overlap` | The amount of overlap (in characters or tokens) between consecutive chunks of data, if applicable to the service being used. |
| `inputs` | `top_k` | The number of top-ranked or most similar results to be retrieved from the vector database or service during a similarity search or ranking operation. |
| | `query` | The query string, vector representation, or other input data used for retrieval, search, or processing by the external service. |
| `outputs` | `chunks` | The data chunks, documents, or other output retrieved from the external service as a result of the query or operation performed. |
| | `scores` | The similarity scores, relevance scores, or other scoring metrics associated with the retrieved chunks or documents, if applicable. |
| `duration` | - | The total time taken for the request or interaction with the external service, measured in milliseconds, which can be useful for identifying performance bottlenecks or slow operations. |
| `error` | - | Any errors, exceptions, or error messages that occurred during the retrieval request or interaction with the external service, which can aid in debugging and troubleshooting. |
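For a retrieval step against a vector database, a `tool` event might look like the sketch below. The provider, index name, model name, and retrieved content are all hypothetical, and the `event_type` key is again an assumed envelope:

```python
# Hypothetical `tool` event payload for a vector-database retrieval step.
tool_event = {
    "event_type": "tool",  # envelope key is an assumption for this sketch
    "config": {
        "provider": "Pinecone",
        "instance": "docs-index-prod",
        "embedding_model": "text-embedding-3-small",
        "chunk_size": 512,   # characters per chunk before embedding
        "chunk_overlap": 64, # characters shared between consecutive chunks
    },
    "inputs": {
        "top_k": 3,
        "query": "How do I rotate my API key?",
    },
    "outputs": {
        # one score per retrieved chunk, highest similarity first
        "chunks": [
            "To rotate your API key, open Settings and click Regenerate...",
            "API keys are scoped per project and can be revoked at any time...",
        ],
        "scores": [0.91, 0.84],
    },
    "duration": 84,  # milliseconds
    "error": None,
}
```

Keeping `chunks` and `scores` as parallel lists (one score per chunk, at most `top_k` entries) makes the retrieval results easy to inspect and rank later.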