HoneyHive OpenAI Tracing Guide
This comprehensive guide explains how to use HoneyHive to trace and monitor OpenAI API calls. We’ll cover the setup process and explore each type of trace with practical examples from our cookbook code.
Getting Started
Installation
First, install the required packages as specified in requirements.txt:
pip install openai honeyhive pydantic
Basic Setup
To start tracing your OpenAI calls, initialize the HoneyHive tracer at the beginning of your application:
from openai import OpenAI
from honeyhive import HoneyHiveTracer, trace
# Initialize HoneyHive tracer
HoneyHiveTracer.init(
api_key='your-honeyhive-api-key',
project='OpenAI-traces',
# Optional parameters
source='dev', # Environment: 'dev', 'staging', 'prod', etc.
session_name='openai-session' # Custom session name for better organization
)
# Initialize OpenAI client
client = OpenAI(api_key='your-openai-api-key')
This initialization, found in all our example files, enables automatic instrumentation for all OpenAI API calls.
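Because initialization instruments the OpenAI SDK itself, even calls made outside a @trace-decorated function are captured. A minimal sketch, assuming the tracer and client have been initialized as above:
# No decorator needed: once HoneyHiveTracer.init() has run, this call is
# instrumented automatically and appears as a span in HoneyHive.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)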
Types of OpenAI Traces
HoneyHive provides automatic instrumentation for various OpenAI features. Let’s examine each type in detail:
1. Basic Chat Completions
The most common OpenAI interaction is the chat completion, which HoneyHive traces automatically.
From basic_chat.py:
# Simple function to call OpenAI chat completions API
@trace(name="basic_chat_completion", tags={"type": "chat_completion"})
def basic_chat_completion():
"""Make a simple chat completion call to OpenAI API."""
try:
# This call will be automatically traced by HoneyHive
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
temperature=0.7,
max_tokens=150
)
# Return the response content
return response.choices[0].message.content
except Exception as e:
# Errors will be captured in the trace
print(f"Error: {e}")
raise
What HoneyHive captures:
- Request details (model, messages, parameters)
- Response content
- Token usage (prompt, completion, total)
- Latency metrics
- Any errors or exceptions
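Calling the traced function is an ordinary function call; the @trace decorator handles span creation and error capture transparently:
if __name__ == "__main__":
    # The resulting span, including token usage and latency, appears in HoneyHive
    answer = basic_chat_completion()
    print(answer)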
Enhancing Chat Completion Traces
For richer context, add custom metadata and tags to your traces, as shown in basic_chat.py:
@trace(name="annotated_chat_completion",
tags={"type": "chat_completion", "purpose": "geography_question"},
metadata={"user_id": "test-user-123"})
def annotated_chat_completion(question):
"""Make a chat completion call with custom annotations and metadata."""
# Implementation...
This additional information makes it easier to filter, search, and analyze your traces in the HoneyHive dashboard.
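The elided body mirrors the basic example; a minimal sketch of what it might contain (the cookbook's exact implementation may differ):
    # Inside annotated_chat_completion(question)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ],
        temperature=0.7,
        max_tokens=150
    )
    return response.choices[0].message.content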
2. Function Calling
Function calling is a powerful OpenAI feature that HoneyHive captures in detail. The trace includes the initial request, function execution, and final response.
From function_calling.py:
@trace(name="basic_function_calling", tags={"type": "function_calling"})
def basic_function_calling():
# Define the tools (functions) the model can use
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a specified location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and country, e.g., 'San Francisco, CA' or 'Paris, France'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Default is celsius."
}
},
"required": ["location"]
}
}
}
]
# Make a request to the OpenAI API
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the weather like in Paris today?"}
]
# This API call will be traced by HoneyHive
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
tools=tools,
tool_choice="auto"
)
# Process response and function calls...
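The elided processing step typically checks the response for tool calls, executes the matching local function, and sends the result back so the model can compose a final answer. A hedged sketch of that continuation inside basic_function_calling (the cookbook's exact handling may differ):
    import json
    message = response.choices[0].message
    if message.tool_calls:
        # Append the assistant's tool-call message to the running history
        messages.append(message)
        for tool_call in message.tool_calls:
            if tool_call.function.name == "get_weather":
                args = json.loads(tool_call.function.arguments)
                result = get_weather(**args)  # traced function, defined below
                # Feed the function result back to the model
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })
        # Second API call: the model answers using the function results
        final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        final_answer = final.choices[0].message.content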
Additionally, tracing the actual functions being called provides a complete picture:
@trace(name="get_weather_function", tags={"type": "external_function"})
def get_weather(location, unit="celsius"):
"""
Get the current weather in a given location.
This is a mock function that would typically call a weather API.
"""
    # Mock weather data standing in for a real API response (values are illustrative)
    weather_data = {
        "location": location,
        "temperature": 18 if unit == "celsius" else 64,
        "unit": unit,
        "conditions": "Partly cloudy"
    }
    return weather_data
What HoneyHive captures for function calling:
- The initial request with tools definition
- Function call arguments from the model
- Function execution details
- Second API call with function results
- Final assistant response
3. Structured Outputs
Structured outputs ensure the model’s response adheres to a specific format, either a JSON object or a Pydantic model. HoneyHive traces these specialized responses, including the schema definition.
From structured_output.py:
# Simple JSON schema response format
@trace(name="json_response_format", tags={"type": "structured_output", "format": "json"})
def get_structured_json():
"""Get a structured JSON response using the response_format parameter."""
try:
response = client.chat.completions.create(
model="gpt-4o-2024-08-06", # Make sure to use a model that supports JSON response format
messages=[
{"role": "system", "content": "You are a helpful assistant that provides weather information."},
{"role": "user", "content": "What's the weather like in New York today?"}
],
response_format={"type": "json_object"}
)
return response.choices[0].message.content
except Exception as e:
print(f"Error: {e}")
raise
More advanced structured outputs using JSON schema:
@trace(name="json_schema_output", tags={"type": "structured_output", "format": "json_schema"})
def get_json_schema_output():
"""Get a structured response using a JSON schema."""
try:
# Define a JSON schema
json_schema = {
"type": "object",
"properties": {
"location": {"type": "string"},
"current_weather": {
"type": "object",
"properties": {
"temperature": {"type": "number"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
"conditions": {"type": "string"},
"precipitation_chance": {"type": "number"}
},
"required": ["temperature", "unit", "conditions", "precipitation_chance"]
},
"forecast": {
"type": "array",
"items": {
"type": "object",
"properties": {
"day": {"type": "string"},
"temperature": {"type": "number"},
"conditions": {"type": "string"}
},
"required": ["day", "temperature", "conditions"]
}
}
},
"required": ["location", "current_weather", "forecast"]
}
response = client.chat.completions.create(
model="gpt-4o-2024-08-06",
messages=[...],
response_format={"type": "json_schema", "schema": json_schema}
)
return response.choices[0].message.content
except Exception as e:
print(f"Error: {e}")
raise
And using Pydantic models. The snippet below assumes a Person Pydantic model; a minimal sketch (the cookbook's actual fields may differ):
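from pydantic import BaseModel

class Person(BaseModel):
    # Illustrative fields; the cookbook's Person model may define others
    name: str
    age: int
    occupation: str
Passing the model as response_format lets the SDK parse and validate the structured output: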
@trace(name="pydantic_structured_output", tags={"type": "structured_output", "format": "pydantic"})
def get_pydantic_structured_output():
"""Get a structured response using Pydantic models."""
try:
completion = client.beta.chat.completions.parse(
model="gpt-4o-2024-08-06",
messages=[...],
response_format=Person
)
# The parsed attribute contains the structured data
person = completion.choices[0].message.parsed
return person
except Exception as e:
print(f"Error: {e}")
raise
What HoneyHive captures for structured outputs:
- The schema or model definition
- Response parsing process
- Structured data output
- Any parsing errors
4. Reasoning Models
OpenAI’s reasoning models (o1, o3-mini) have unique tracing needs, particularly around reasoning tokens and effort levels.
From reasoning_models.py:
@trace(name="reasoning_model_o1", tags={"type": "reasoning_model", "model": "o1"})
def call_o1_model():
"""
Demonstrate calling the o1 reasoning model and trace the request/response.
"""
try:
# Complex math problem that benefits from reasoning capability
response = client.chat.completions.create(
model="o1",
messages=[
{"role": "system", "content": "You are a helpful math assistant."},
{"role": "user", "content": "Solve this step by step: Integrate x^3 * ln(x) with respect to x."}
],
reasoning_effort="high" # Use high reasoning effort for complex problems
)
# Extract the response and the usage information
content = response.choices[0].message.content
        reasoning_tokens = (
            response.usage.completion_tokens_details.reasoning_tokens
            if hasattr(response.usage, "completion_tokens_details")
            else None
        )
return {
"content": content,
"usage": {
"prompt_tokens": response.usage.prompt_tokens,
"completion_tokens": response.usage.completion_tokens,
"total_tokens": response.usage.total_tokens,
"reasoning_tokens": reasoning_tokens
}
}
except Exception as e:
print(f"Error: {e}")
raise
You can also compare different reasoning effort levels:
@trace(name="reasoning_model_o1_with_effort", tags={"type": "reasoning_model", "model": "o1"})
def call_o1_model_with_effort(problem, effort="medium"):
"""
Demonstrate calling the o1 model with different reasoning efforts.
Args:
problem: Math problem to solve
effort: Reasoning effort ('low', 'medium', or 'high')
"""
# Implementation...
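A minimal sketch of the elided body, reusing the request pattern from the o1 example above (the cookbook's exact implementation may differ):
    # Inside call_o1_model_with_effort(problem, effort)
    response = client.chat.completions.create(
        model="o1",
        messages=[
            {"role": "system", "content": "You are a helpful math assistant."},
            {"role": "user", "content": problem}
        ],
        reasoning_effort=effort  # 'low', 'medium', or 'high'
    )
    return response.choices[0].message.content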
What HoneyHive captures for reasoning models:
- Standard request and response details
- Reasoning token usage
- Reasoning effort level
- Model-specific parameters
5. Multi-turn Conversations
Tracing conversations across multiple turns provides a complete history and context. From multi_turn_conversation.py:
class Conversation:
"""
Class to manage a conversation with the OpenAI API.
Each turn in the conversation is traced by HoneyHive.
"""
def __init__(self, system_message="You are a helpful assistant."):
self.messages = [{"role": "system", "content": system_message}]
self.turn_count = 0
@trace(name="conversation_turn", tags={"type": "conversation"})
def add_user_message(self, content):
"""Add a user message to the conversation and get the assistant's response."""
# Increment turn count
self.turn_count += 1
# Add user message to the conversation
self.messages.append({"role": "user", "content": content})
try:
# Get assistant response
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=self.messages,
temperature=0.7,
max_tokens=150
)
# Extract the assistant's message
assistant_message = response.choices[0].message
# Add assistant message to the conversation
self.messages.append({"role": "assistant", "content": assistant_message.content})
return {
"role": assistant_message.role,
"content": assistant_message.content,
"turn": self.turn_count,
"usage": {
"prompt_tokens": response.usage.prompt_tokens,
"completion_tokens": response.usage.completion_tokens,
"total_tokens": response.usage.total_tokens
}
}
except Exception as e:
print(f"Error in turn {self.turn_count}: {e}")
raise
Using this class in a full conversation:
@trace(name="rich_conversation", tags={"type": "conversation", "topic": "varied"})
def run_rich_conversation():
"""Run a multi-turn conversation with the assistant on various topics."""
# Initialize conversation with a broad system message
conversation = Conversation(
system_message="You are a knowledgeable assistant able to discuss a wide range of topics."
)
# First turn - Ask about a historical event
turn1 = conversation.add_user_message("Can you tell me about the Apollo 11 mission?")
# Second turn - Follow up on the same topic
turn2 = conversation.add_user_message("What were the names of the astronauts on that mission?")
# Third turn - Change the topic
turn3 = conversation.add_user_message("Let's switch topics. Can you explain how photosynthesis works?")
# Fourth turn - Ask for a summary of the conversation
turn4 = conversation.add_user_message("Can you summarize what we've discussed so far?")
return conversation.get_conversation_history()
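The get_conversation_history method is not shown in the Conversation excerpt above; a minimal version simply returns the accumulated message list:
    # Method on the Conversation class (omitted from the excerpt above)
    def get_conversation_history(self):
        """Return the full message history accumulated across turns."""
        return self.messages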
What HoneyHive captures for multi-turn conversations:
- Individual turns as separate traces
- Message history accumulation
- Token usage across turns
- Context of the entire conversation
- Relationships between turns
Conclusion
HoneyHive provides comprehensive observability for your OpenAI applications, giving you insights into performance, costs, and behavior. With automatic instrumentation and custom tracing, you can easily monitor and optimize your AI system.
Get started by initializing HoneyHive in your application and watch as your OpenAI calls are automatically traced!