HoneyHive OpenAI Tracing Guide

This guide explains how to use HoneyHive to trace and monitor OpenAI API calls. We’ll cover the setup process and walk through each type of trace with practical examples from our cookbook code.

Getting Started

Installation

First, install the required packages as specified in requirements.txt:

pip install openai honeyhive pydantic

Basic Setup

To start tracing your OpenAI calls, initialize the HoneyHive tracer at the beginning of your application:

from openai import OpenAI
from honeyhive import HoneyHiveTracer, trace

# Initialize HoneyHive tracer
HoneyHiveTracer.init(
    api_key='your-honeyhive-api-key',
    project='OpenAI-traces',
    # Optional parameters
    source='dev',                  # Environment: 'dev', 'staging', 'prod', etc.
    session_name='openai-session'  # Custom session name for better organization
)

# Initialize OpenAI client
client = OpenAI(api_key='your-openai-api-key')

This initialization, found in all our example files, enables automatic instrumentation for all OpenAI API calls.
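Hardcoded keys are fine for quick experiments, but in practice you will likely load them from the environment. A minimal sketch of the same initialization, assuming `HH_API_KEY`, `OPENAI_API_KEY`, and `APP_ENV` as illustrative variable names:

```python
import os

from openai import OpenAI
from honeyhive import HoneyHiveTracer

# Environment variable names here are assumptions; adjust to your deployment
HoneyHiveTracer.init(
    api_key=os.environ["HH_API_KEY"],
    project="OpenAI-traces",
    source=os.environ.get("APP_ENV", "dev"),  # falls back to 'dev' locally
)

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```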

Types of OpenAI Traces

HoneyHive provides automatic instrumentation for various OpenAI features. Let’s examine each type in detail:

1. Basic Chat Completions

The most common OpenAI interaction is the chat completion, which HoneyHive traces automatically.

From basic_chat.py:

# Simple function to call OpenAI chat completions API
@trace(name="basic_chat_completion", tags={"type": "chat_completion"})
def basic_chat_completion():
    """Make a simple chat completion call to OpenAI API."""
    try:
        # This call will be automatically traced by HoneyHive
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of France?"}
            ],
            temperature=0.7,
            max_tokens=150
        )
        
        # Return the response content
        return response.choices[0].message.content
    except Exception as e:
        # Errors will be captured in the trace
        print(f"Error: {e}")
        raise

What HoneyHive captures:

  • Request details (model, messages, parameters)
  • Response content
  • Token usage (prompt, completion, total)
  • Latency metrics
  • Any errors or exceptions

Enhancing Chat Completion Traces

For richer context, add custom metadata and tags to your traces, as shown in basic_chat.py:

@trace(name="annotated_chat_completion", 
       tags={"type": "chat_completion", "purpose": "geography_question"}, 
       metadata={"user_id": "test-user-123"})
def annotated_chat_completion(question):
    """Make a chat completion call with custom annotations and metadata."""
    # Implementation...

This additional information makes it easier to filter, search, and analyze your traces in the HoneyHive dashboard.

2. Function Calling

Function calling is a powerful OpenAI feature that HoneyHive captures in detail. The trace includes the initial request, function execution, and final response.

From function_calling.py:

@trace(name="basic_function_calling", tags={"type": "function_calling"})
def basic_function_calling():
    # Define the tools (functions) the model can use
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a specified location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and country, e.g., 'San Francisco, CA' or 'Paris, France'"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Default is celsius."
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    ]
    
    # Make a request to the OpenAI API
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather like in Paris today?"}
    ]
    
    # This API call will be traced by HoneyHive
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    
    # Process response and function calls...

Additionally, tracing the actual functions being called provides a complete picture:

@trace(name="get_weather_function", tags={"type": "external_function"})
def get_weather(location, unit="celsius"):
    """
    Get the current weather in a given location.
    This is a mock function that would typically call a weather API.
    """
    # Implementation...
    return weather_data

What HoneyHive captures for function calling:

  • The initial request with tools definition
  • Function call arguments from the model
  • Function execution details
  • Second API call with function results
  • Final assistant response

3. Structured Outputs

Structured outputs constrain the model’s response to a specific format: a plain JSON object, a response matching a JSON schema, or a Pydantic model. HoneyHive traces these specialized responses, including the schema definition.

From structured_output.py:

# Simple JSON schema response format
@trace(name="json_response_format", tags={"type": "structured_output", "format": "json"})
def get_structured_json():
    """Get a structured JSON response using the response_format parameter."""
    try:
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # Make sure to use a model that supports JSON response format
            messages=[
                {"role": "system", "content": "You are a helpful assistant that provides weather information."},
                {"role": "user", "content": "What's the weather like in New York today?"}
            ],
            response_format={"type": "json_object"}
        )
        
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error: {e}")
        raise

More advanced structured outputs using JSON schema:

@trace(name="json_schema_output", tags={"type": "structured_output", "format": "json_schema"})
def get_json_schema_output():
    """Get a structured response using a JSON schema."""
    try:
        # Define a JSON schema
        json_schema = {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "current_weather": {
                    "type": "object",
                    "properties": {
                        "temperature": {"type": "number"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                        "conditions": {"type": "string"},
                        "precipitation_chance": {"type": "number"}
                    },
                    "required": ["temperature", "unit", "conditions", "precipitation_chance"]
                },
                "forecast": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "day": {"type": "string"},
                            "temperature": {"type": "number"},
                            "conditions": {"type": "string"}
                        },
                        "required": ["day", "temperature", "conditions"]
                    }
                }
            },
            "required": ["location", "current_weather", "forecast"]
        }
        
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",
            messages=[...],
            response_format={
                "type": "json_schema",
                # The json_schema format requires a named wrapper object
                # around the schema itself (the name is arbitrary)
                "json_schema": {"name": "weather_report", "schema": json_schema}
            }
        )
        
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error: {e}")
        raise
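Even with a schema in the request, it is worth sanity-checking the returned string before using it downstream, since the raw content is still just text. A minimal sketch (`parse_weather_json` is illustrative, not part of the cookbook; the required keys mirror the schema defined above):

```python
import json

def parse_weather_json(raw):
    """Parse the model's JSON output and verify the top-level keys
    required by the schema are present before using the data."""
    data = json.loads(raw)
    missing = [k for k in ("location", "current_weather", "forecast") if k not in data]
    if missing:
        raise ValueError(f"response missing keys: {missing}")
    return data
```

Raising on a malformed response also means the failure is captured in the trace, which makes schema drift easy to spot in the dashboard.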

And using Pydantic models:

@trace(name="pydantic_structured_output", tags={"type": "structured_output", "format": "pydantic"})
def get_pydantic_structured_output():
    """Get a structured response using Pydantic models."""
    try:
        completion = client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[...],
            response_format=Person
        )
        
        # The parsed attribute contains the structured data
        person = completion.choices[0].message.parsed
        return person
    except Exception as e:
        print(f"Error: {e}")
        raise
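The `Person` model passed as `response_format` above is not shown in this excerpt; a plausible definition looks like the following (the fields are hypothetical, for illustration only):

```python
from pydantic import BaseModel

class Person(BaseModel):
    """Hypothetical schema for the parsed response; the actual model
    is defined in structured_output.py."""
    name: str
    age: int
    occupation: str
```

With `client.beta.chat.completions.parse`, the SDK validates the response against this model and exposes it via the `parsed` attribute, so type errors surface immediately rather than deep in your application.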

What HoneyHive captures for structured outputs:

  • The schema or model definition
  • Response parsing process
  • Structured data output
  • Any parsing errors

4. Reasoning Models

OpenAI’s reasoning models (o1, o3-mini) have unique tracing needs, particularly around reasoning tokens and effort levels.

From reasoning_models.py:

@trace(name="reasoning_model_o1", tags={"type": "reasoning_model", "model": "o1"})
def call_o1_model():
    """
    Demonstrate calling the o1 reasoning model and trace the request/response.
    """
    try:
        # Complex math problem that benefits from reasoning capability
        response = client.chat.completions.create(
            model="o1",
            messages=[
                {"role": "system", "content": "You are a helpful math assistant."},
                {"role": "user", "content": "Solve this step by step: Integrate x^3 * ln(x) with respect to x."}
            ],
            reasoning_effort="high"  # Use high reasoning effort for complex problems
        )
        
        # Extract the response and the usage information
        content = response.choices[0].message.content
        reasoning_tokens = response.usage.completion_tokens_details.reasoning_tokens if hasattr(response.usage, "completion_tokens_details") else None
        
        return {
            "content": content,
            "usage": {
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens,
                "reasoning_tokens": reasoning_tokens
            }
        }
    except Exception as e:
        print(f"Error: {e}")
        raise
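The usage extraction above can be taken a step further by splitting completion tokens into reasoning versus visible output, which is useful for cost analysis. A small sketch (`summarize_reasoning_usage` is illustrative; `usage` is a plain dict mirroring the fields the API returns):

```python
def summarize_reasoning_usage(usage):
    """Split completion tokens into reasoning vs. visible output tokens
    and compute the reasoning share of the completion."""
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return {
        "reasoning_tokens": reasoning,
        "visible_tokens": completion - reasoning,
        "reasoning_share": reasoning / completion if completion else 0.0,
    }
```

Attaching this summary as trace metadata makes it easy to compare how much of your spend goes to hidden reasoning at different effort levels.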

You can also compare different reasoning effort levels:

@trace(name="reasoning_model_o1_with_effort", tags={"type": "reasoning_model", "model": "o1"})
def call_o1_model_with_effort(problem, effort="medium"):
    """
    Demonstrate calling the o1 model with different reasoning efforts.
    
    Args:
        problem: Math problem to solve
        effort: Reasoning effort ('low', 'medium', or 'high')
    """
    # Implementation...

What HoneyHive captures for reasoning models:

  • Standard request and response details
  • Reasoning token usage
  • Reasoning effort level
  • Model-specific parameters

5. Multi-turn Conversations

Tracing conversations across multiple turns provides a complete history and context. From multi_turn_conversation.py:

class Conversation:
    """
    Class to manage a conversation with the OpenAI API.
    Each turn in the conversation is traced by HoneyHive.
    """
    def __init__(self, system_message="You are a helpful assistant."):
        self.messages = [{"role": "system", "content": system_message}]
        self.turn_count = 0
    
    @trace(name="conversation_turn", tags={"type": "conversation"})
    def add_user_message(self, content):
        """Add a user message to the conversation and get the assistant's response."""
        # Increment turn count
        self.turn_count += 1
        
        # Add user message to the conversation
        self.messages.append({"role": "user", "content": content})
        
        try:
            # Get assistant response
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=self.messages,
                temperature=0.7,
                max_tokens=150
            )
            
            # Extract the assistant's message
            assistant_message = response.choices[0].message
            
            # Add assistant message to the conversation
            self.messages.append({"role": "assistant", "content": assistant_message.content})
            
            return {
                "role": assistant_message.role,
                "content": assistant_message.content,
                "turn": self.turn_count,
                "usage": {
                    "prompt_tokens": response.usage.prompt_tokens,
                    "completion_tokens": response.usage.completion_tokens,
                    "total_tokens": response.usage.total_tokens
                }
            }
        except Exception as e:
            print(f"Error in turn {self.turn_count}: {e}")
            raise

    def get_conversation_history(self):
        """Return the accumulated message list for this conversation."""
        return self.messages

Using this class in a full conversation:

@trace(name="rich_conversation", tags={"type": "conversation", "topic": "varied"})
def run_rich_conversation():
    """Run a multi-turn conversation with the assistant on various topics."""
    # Initialize conversation with a broad system message
    conversation = Conversation(
        system_message="You are a knowledgeable assistant able to discuss a wide range of topics."
    )
    
    # First turn - Ask about a historical event
    turn1 = conversation.add_user_message("Can you tell me about the Apollo 11 mission?")
    
    # Second turn - Follow up on the same topic
    turn2 = conversation.add_user_message("What were the names of the astronauts on that mission?")
    
    # Third turn - Change the topic
    turn3 = conversation.add_user_message("Let's switch topics. Can you explain how photosynthesis works?")
    
    # Fourth turn - Ask for a summary of the conversation
    turn4 = conversation.add_user_message("Can you summarize what we've discussed so far?")
    
    return conversation.get_conversation_history()

What HoneyHive captures for multi-turn conversations:

  • Individual turns as separate traces
  • Message history accumulation
  • Token usage across turns
  • Context of the entire conversation
  • Relationships between turns
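Because each turn resends the full message history, long conversations eventually grow the prompt (and its token cost) without bound. A simple trimming heuristic, sketched here as an illustration (a production version would count tokens rather than messages):

```python
def trim_history(messages, max_messages=20):
    """Keep the system message plus the most recent messages so the
    prompt sent each turn stays bounded in size."""
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Keep the most recent non-system messages that fit alongside the system prompt
    return system + rest[-(max_messages - len(system)):]
```

Calling this before each `client.chat.completions.create` keeps per-turn token usage flat, and the trimmed history still appears in each turn's trace.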

Conclusion

HoneyHive provides comprehensive observability for your OpenAI applications, giving you insights into performance, costs, and behavior. With automatic instrumentation and custom tracing, you can easily monitor and optimize your AI system.

Get started by initializing HoneyHive in your application and watch as your OpenAI calls are automatically traced!