HoneyHive provides deeply customizable tooling for observing your LLM application. This guide will help you understand how to instrument your application to get the most insights out of your data.

Instrumenting LLM Requests

It’s essential to log requests made to your models so you can evaluate performance, track prompts, debug issues, and gain insights into your application’s behavior. Use the following schema to capture relevant information about LLM requests on the model event.

Event: model

The model event represents a request made to an LLM. It is used to send information about the configuration, inputs, outputs, metadata, and metrics associated with the request.

| Root Field | Field | Description |
| --- | --- | --- |
| config | model | The name or identifier of the LLM model being used for the request. |
| | provider | The provider or vendor of the LLM model (e.g., Anthropic, OpenAI, etc.). |
| | temperature | The temperature hyperparameter value used for the LLM, which controls the randomness or creativity of the generated output. |
| | max_tokens | The maximum number of tokens the LLM is allowed to generate for the current request. |
| | top_p | The top-p sampling hyperparameter value used for the LLM, which controls the diversity of the generated output. |
| | top_k | The top-k sampling hyperparameter value used for the LLM, which also controls the diversity of the generated output. |
| | template | The prompt template or format used for structuring the input to the LLM. |
| | {hyperparameter} | Any additional hyperparameters or configuration settings specific to the LLM being used. |
| inputs | chat_history | The messages or context provided as input to the LLM, typically in a conversational or chat-like format. |
| | {prompt-input} | The input fields or values used to populate the prompt template. |
| outputs | role | The role or perspective from which the LLM generated the response (e.g., assistant, user, system). |
| | content | The response message generated by the LLM. |
| | {custom_field} | Any additional or parsed fields from the LLM response, such as agent thoughts, tool calls, or other structured data. |
| metadata | total_tokens | The total number of tokens consumed by the request, i.e., prompt tokens plus completion tokens. |
| | completion_tokens | The number of tokens in the completion or output generated by the LLM. |
| | prompt_tokens | The number of tokens in the prompt or input provided to the LLM. |
| | cost | The cost or pricing information associated with the LLM request, if available. |
| | {custom} | Any additional metadata relevant to the LLM request, such as external links, application screenshots, or other context. |
| metrics | {custom} | Any custom metrics or validators associated with the LLM request, such as tokens per second, quality scores, or other performance indicators. |
| duration | - | The total time taken for the LLM request, measured in milliseconds, which can help identify performance bottlenecks or slow operations. |
| error | - | Any errors, exceptions, or error messages that occurred during the LLM request, which can aid in debugging and troubleshooting. |
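Putting the schema together, a model event payload might look like the sketch below. All values are illustrative, and the `event_type` key is a hypothetical way of labeling the event kind; how you actually send the event depends on the HoneyHive SDK or API integration you use.

```python
# Illustrative model event following the schema above.
# Values are examples only; "event_type" is a hypothetical label
# for the event kind, not a confirmed API field.
model_event = {
    "event_type": "model",
    "config": {
        "model": "gpt-4o",
        "provider": "OpenAI",
        "temperature": 0.7,
        "max_tokens": 512,
        "top_p": 1.0,
        "template": [
            {"role": "system", "content": "You are a helpful assistant."}
        ],
    },
    "inputs": {
        "chat_history": [
            {"role": "user", "content": "Summarize our refund policy."}
        ],
    },
    "outputs": {
        "role": "assistant",
        "content": "Returns are accepted within 30 days of purchase...",
    },
    "metadata": {
        "prompt_tokens": 42,
        "completion_tokens": 58,
        "total_tokens": 100,  # prompt_tokens + completion_tokens
        "cost": 0.0021,
    },
    "metrics": {"tokens_per_second": 95.3},  # custom metric
    "duration": 812,  # milliseconds
    "error": None,
}
```

Capturing token counts and duration on every model event is what makes cost and latency dashboards possible downstream, so populate them even when a request succeeds.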

Adding Application Metadata to Traces

When instrumenting your LLM application, it’s essential to include relevant metadata about the application itself, the user, and the session. This metadata can provide valuable context for debugging, performance analysis, and understanding user behavior. Use the following schema to capture this information on the session event.

Event: session

The session event represents the start of a new session or interaction with your LLM application. It is used to send metadata about the application, environment, session, and user.

| Root Field | Field | Description |
| --- | --- | --- |
| config | app_version | The version of the LLM application currently running. This can help identify issues specific to a particular version or track performance improvements across versions. |
| source | - | The environment or deployment context of the application, such as production, staging, dev, or evaluation. This can help differentiate between different deployment environments and their respective configurations. |
| session | session_id | A unique identifier for the current session or interaction with the LLM application. This can be used to correlate multiple events and trace the flow of a single session. You can either pass your own session ID or we will automatically generate one at session start. |
| user_properties | user_id | A unique identifier for the user interacting with the LLM application. This can help analyze user-specific behavior and performance. |
| | user_tier | Additional properties or metadata about the user, such as their subscription tier (e.g., free or pro). This can be used for segmented analysis or to identify performance differences across user tiers. |
| | user_tenant | If the application supports multi-tenancy, this field can represent the tenant or organization the user belongs to. This can be useful for analyzing tenant-specific behavior or performance. |
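A session event following this schema might look like the sketch below. The nesting mirrors the table's root fields, the `event_type` key is a hypothetical label, and all identifiers are examples; adapt the shape to whichever HoneyHive SDK or API call you use to start a session.

```python
import uuid

# Illustrative session event following the schema above.
# "event_type" is a hypothetical label; all values are examples.
session_event = {
    "event_type": "session",
    "config": {"app_version": "1.4.2"},
    "source": "production",  # or staging, dev, evaluation
    "session": {
        # Pass your own ID, or let one be generated at session start.
        "session_id": str(uuid.uuid4()),
    },
    "user_properties": {
        "user_id": "user_8421",
        "user_tier": "pro",
        "user_tenant": "acme-corp",  # only if the app is multi-tenant
    },
}
```

Reusing the same `session_id` across the session's model and tool events is what lets you trace a single user interaction end to end.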

Instrumenting RAG and Tool Requests

When your LLM application interacts with external APIs, databases, or vector databases like Pinecone, you can instrument these interactions to evaluate performance, debug issues, and gain insights. Use the following schema to capture relevant information about these interactions.

Event: tool

The tool event represents an interaction with an external resource. Send the following fields:

| Root Field | Field | Description |
| --- | --- | --- |
| config | provider | The name of the external service provider offering vector database, API, or other relevant services (e.g., Pinecone, Weaviate, etc.). |
| | instance | The specific instance or deployment name of the service within the provider's infrastructure, allowing for differentiation between multiple instances or deployments. |
| | embedding_model | The name or identifier of the embedding model used for calculating vector similarity, which is particularly relevant for vector databases or services that rely on vector representations of data. |
| | chunk_size | The size (in characters or tokens) of the chunks into which data is split before being converted into vectors, if applicable to the service being used. This is important for services that operate on chunked data. |
| | chunk_overlap | The amount of overlap (in characters or tokens) between consecutive chunks of data, if applicable to the service being used. This is relevant for services that operate on chunked data with overlapping segments. |
| inputs | top_k | The number of top-ranked or most similar results to be retrieved from the vector database or service during a similarity search or ranking operation. |
| | query | The query string, vector representation, or any other input data used for retrieval, search, or processing by the external service. |
| outputs | chunks | The data chunks, documents, or any other output retrieved or obtained from the external service as a result of the query or operation performed. |
| | scores | The similarity scores, relevance scores, or any other scoring metrics associated with the retrieved chunks or documents, if applicable to the service being used. |
| duration | - | The total time taken for the request or interaction with the external service, measured in milliseconds, which can be useful for identifying performance bottlenecks or slow operations. |
| error | - | Any errors, exceptions, or error messages that occurred during the retrieval request or interaction with the external service, which can aid in debugging and troubleshooting. |
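For a retrieval step against a vector database, a tool event might look like the sketch below. Provider, instance, and model names are examples, and the `event_type` key is a hypothetical label for the event kind.

```python
# Illustrative tool event for a vector-database retrieval step.
# Provider/instance/model names and "event_type" are examples only.
tool_event = {
    "event_type": "tool",
    "config": {
        "provider": "Pinecone",
        "instance": "docs-index-prod",
        "embedding_model": "text-embedding-3-small",
        "chunk_size": 512,    # characters per chunk at ingestion time
        "chunk_overlap": 64,  # overlap between consecutive chunks
    },
    "inputs": {
        "top_k": 3,
        "query": "How do I reset my password?",
    },
    "outputs": {
        # One score per retrieved chunk, in ranked order.
        "chunks": [
            "To reset your password, open Settings > Security...",
            "Password resets require a verified email address...",
            "Admins can force a reset from the user management page...",
        ],
        "scores": [0.91, 0.84, 0.77],
    },
    "duration": 143,  # milliseconds
    "error": None,
}
```

Logging `chunks` and `scores` alongside the query makes it possible to audit, after the fact, whether a poor LLM answer was caused by weak retrieval rather than the model itself.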