This guide shows how to trace pipelines that handle multi-modal data: images, audio, video, or documents with embedded media.

Vision model calls are captured by auto-instrumentation. If you’re using OpenAI Vision, Gemini Pro Vision, or similar APIs, the LLM calls are traced automatically via instrumentors. This guide covers tracing your custom processing logic around those calls.
## When to Use This Guide
Use these patterns when your pipeline includes:
Image preprocessing before vision model calls
Audio transcription or synthesis
Video frame extraction or analysis
Document parsing with embedded media
Media storage/retrieval operations
## Basic Pattern
Trace multi-modal functions the same way as any other function, using the `@trace` decorator:

```python
import os

from honeyhive import HoneyHiveTracer, trace, enrich_span

HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT"),
)

@trace
def analyze_image(image_path: str, question: str) -> dict:
    """Analyze an image and answer a question about it."""
    # Your preprocessing
    image_data = load_and_resize(image_path)

    # Vision model call (auto-traced if using an instrumentor)
    response = vision_model.analyze(image_data, question)

    return {"answer": response.text, "confidence": response.confidence}
```
Add context about the media being processed using `enrich_span`:

```python
@trace
def process_video(video_path: str) -> dict:
    """Extract and analyze frames from a video."""
    # Add media metadata for debugging and analysis
    enrich_span({
        "media_type": "video",
        "format": "mp4",
        "duration_seconds": get_duration(video_path),
        "resolution": "1920x1080",
    })

    frames = extract_keyframes(video_path)
    analyses = [analyze_frame(f) for f in frames]

    enrich_span({"frames_analyzed": len(frames)})
    return {"frame_analyses": analyses}
```
**Don’t log media bytes.** Store references (paths, URLs, IDs) instead of raw binary data. This keeps traces lightweight and queryable.
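One way to follow this rule, as a minimal sketch: a small helper (hypothetical, not part of the HoneyHive SDK) that turns a media file into span-safe metadata, logging a path, size, and a short content hash rather than the bytes themselves:

```python
import hashlib
import os

def media_reference(path: str) -> dict:
    """Build span-safe metadata for a media file: path, size, and a
    short content hash for correlation -- never the raw bytes."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "source_path": path,
        "file_size_bytes": os.path.getsize(path),
        # A hash prefix lets you spot identical media across traces
        "content_sha256": digest[:16],
    }
```

Inside a traced function you could then call `enrich_span(media_reference(image_path))` instead of logging the image itself.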
## Multi-Step Pipeline Example
For pipelines with multiple processing stages, each traced function becomes a child span:

```python
@trace
def process_document(doc_path: str) -> dict:
    """Process a document with embedded images."""
    # Each @trace function creates a child span
    text = extract_text(doc_path)      # Child span
    images = extract_images(doc_path)  # Child span

    summaries = []
    for img in images:
        summary = analyze_image(img)   # Child span per image
        summaries.append(summary)

    return {
        "text": text,
        "image_summaries": summaries,
    }

@trace
def extract_text(doc_path: str) -> str:
    enrich_span({"step": "text_extraction"})
    # ... extraction logic
    return text

@trace
def extract_images(doc_path: str) -> list:
    enrich_span({"step": "image_extraction"})
    # ... extraction logic
    return image_paths

@trace
def analyze_image(image_path: str) -> str:
    enrich_span({
        "step": "image_analysis",
        "image_path": image_path,
    })
    # ... vision model call
    return summary
```
The trace tree shows the full pipeline hierarchy:

```text
process_document
├── extract_text
├── extract_images
├── analyze_image (image_1)
├── analyze_image (image_2)
└── analyze_image (image_3)
```
Useful metadata fields for media spans:

| Field | Description | Example |
| --- | --- | --- |
| `media_type` | Type of media | `"image"`, `"audio"`, `"video"` |
| `format` | File format | `"png"`, `"wav"`, `"mp4"` |
| `duration_seconds` | Length for audio/video | `120.5` |
| `resolution` | Dimensions | `"1920x1080"` |
| `file_size_bytes` | Size for performance tracking | `1048576` |
| `source_url` | Reference to original | `"s3://bucket/file.png"` |
| `processing_steps` | Operations performed | `["resize", "normalize"]` |
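As an illustration, a tiny helper (hypothetical, not a HoneyHive API) can assemble these fields with a basic sanity check before they are passed to `enrich_span`:

```python
# Media types from the table above; anything else is likely a typo.
ALLOWED_MEDIA_TYPES = {"image", "audio", "video"}

def media_metadata(media_type: str, fmt: str, **fields) -> dict:
    """Assemble span metadata, rejecting unknown media types early."""
    if media_type not in ALLOWED_MEDIA_TYPES:
        raise ValueError(f"unknown media_type: {media_type!r}")
    return {"media_type": media_type, "format": fmt, **fields}

# Example: the metadata used in the process_video span earlier
video_meta = media_metadata(
    "video", "mp4", duration_seconds=120.5, resolution="1920x1080"
)
```

Calling `enrich_span(video_meta)` inside a traced function would then attach exactly the fields listed in the table.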
- **Custom Spans**: Full guide to the `@trace` decorator
- **Enriching Traces**: Adding metadata with `enrich_span`
- **OpenAI Vision**: Auto-tracing for OpenAI vision calls
- **Gemini Vision**: Auto-tracing for Gemini vision calls