This guide shows how to trace pipelines that handle multi-modal data - images, audio, video, or documents with embedded media.
Auto-instrumentation captures vision calls automatically. If you’re using OpenAI Vision, Gemini Pro Vision, or similar APIs, the LLM calls are traced automatically via instrumentors. This guide covers tracing your custom processing logic around those calls.
## When to Use This Guide
Use these patterns when your pipeline includes:
- Image preprocessing before vision model calls
- Audio transcription or synthesis
- Video frame extraction or analysis
- Document parsing with embedded media
- Media storage/retrieval operations
## Basic Pattern
Trace multi-modal functions the same way as any other function, using the `@trace` decorator:
```python
import os
from honeyhive import HoneyHiveTracer, trace, enrich_span

HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT")
)

@trace
def analyze_image(image_path: str, question: str) -> dict:
    """Analyze an image and answer a question about it."""
    # Your preprocessing
    image_data = load_and_resize(image_path)

    # Vision model call (auto-traced if using an instrumentor)
    response = vision_model.analyze(image_data, question)

    return {"answer": response.text, "confidence": response.confidence}
```
Add context about the media being processed using `enrich_span`:
```python
@trace
def process_video(video_path: str) -> dict:
    """Extract and analyze frames from video."""
    # Add media metadata for debugging and analysis
    enrich_span({
        "media_type": "video",
        "format": "mp4",
        "duration_seconds": get_duration(video_path),
        "resolution": "1920x1080"
    })

    frames = extract_keyframes(video_path)
    analyses = [analyze_frame(f) for f in frames]

    enrich_span({"frames_analyzed": len(frames)})
    return {"frame_analyses": analyses}
```
**Don't log media bytes.** Store references (paths, URLs, IDs) instead of raw binary data. This keeps traces lightweight and queryable.
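One way to follow this rule is to log a short content hash alongside the path, so a trace can still uniquely identify the exact file without carrying its bytes. A minimal sketch using only the standard library (the `media_reference` helper is hypothetical, not part of the HoneyHive SDK):

```python
import hashlib

def media_reference(path: str) -> dict:
    """Build a lightweight reference to log via enrich_span instead of raw bytes."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    # A truncated hash is usually enough to disambiguate files in traces
    return {"media_path": path, "sha256": digest[:16]}
```

You would then pass the returned dict to `enrich_span` inside a traced function.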
## Multi-Step Pipeline Example
For pipelines with multiple processing stages, each traced function becomes a child span:
```python
@trace
def process_document(doc_path: str) -> dict:
    """Process a document with embedded images."""
    # Each @trace function creates a child span
    text = extract_text(doc_path)      # Child span
    images = extract_images(doc_path)  # Child span

    summaries = []
    for img in images:
        summary = analyze_image(img)   # Child span per image
        summaries.append(summary)

    return {
        "text": text,
        "image_summaries": summaries
    }

@trace
def extract_text(doc_path: str) -> str:
    enrich_span({"step": "text_extraction"})
    # ... extraction logic
    return text

@trace
def extract_images(doc_path: str) -> list:
    enrich_span({"step": "image_extraction"})
    # ... extraction logic
    return image_paths

@trace
def analyze_image(image_path: str) -> str:
    enrich_span({
        "step": "image_analysis",
        "image_path": image_path
    })
    # ... vision model call
    return summary
```
The trace tree shows the full pipeline hierarchy:
```
process_document
├── extract_text
├── extract_images
├── analyze_image (image_1)
├── analyze_image (image_2)
└── analyze_image (image_3)
```
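The same parent/child structure applies to audio pipelines: a parent span for the transcription job and one child span per audio chunk. A sketch of the chunking step such a pipeline might use (the `chunk_ranges` helper is hypothetical; each returned range would be processed by its own `@trace`-decorated function):

```python
def chunk_ranges(duration_seconds: float, chunk_seconds: float = 30.0) -> list:
    """Split an audio duration into (start, end) ranges, one per child span."""
    ranges = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + chunk_seconds, duration_seconds)
        ranges.append((start, end))
        start = end
    return ranges
```

Logging the chunk index and range with `enrich_span` in each child makes it easy to locate the segment where a transcription failed.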
Useful metadata fields to attach with `enrich_span`:

| Field | Description | Example |
|---|---|---|
| `media_type` | Type of media | `"image"`, `"audio"`, `"video"` |
| `format` | File format | `"png"`, `"wav"`, `"mp4"` |
| `duration_seconds` | Length for audio/video | `120.5` |
| `resolution` | Dimensions | `"1920x1080"` |
| `file_size_bytes` | Size for performance tracking | `1048576` |
| `source_url` | Reference to the original | `"s3://bucket/file.png"` |
| `processing_steps` | Operations performed | `["resize", "normalize"]` |
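A small helper can assemble these fields consistently before passing them to `enrich_span`. A sketch using only the standard library (the `build_media_metadata` helper is hypothetical, not part of the HoneyHive SDK):

```python
import os

def build_media_metadata(path: str, media_type: str, fmt: str) -> dict:
    """Assemble the metadata fields above; stores a path reference, never bytes."""
    return {
        "media_type": media_type,          # "image", "audio", "video"
        "format": fmt,                     # "png", "wav", "mp4"
        "file_size_bytes": os.path.getsize(path),
        "source_url": path,                # reference to the original
    }
```

Calling `enrich_span(build_media_metadata(img_path, "image", "png"))` at the top of a traced function keeps the metadata schema uniform across your pipeline.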