This guide shows how to trace pipelines that handle multi-modal data: images, audio, video, or documents with embedded media.
Auto-instrumentation captures vision calls automatically. If you’re using OpenAI Vision, Gemini Pro Vision, or similar APIs, the LLM calls are traced automatically via instrumentors. This guide covers tracing your custom processing logic around those calls.

When to Use This Guide

Use these patterns when your pipeline includes:
  • Image preprocessing before vision model calls
  • Audio transcription or synthesis
  • Video frame extraction or analysis
  • Document parsing with embedded media
  • Media storage/retrieval operations

Basic Pattern

Trace multi-modal functions the same way as any other function, with the @trace decorator:
import os
from honeyhive import HoneyHiveTracer, trace, enrich_span

HoneyHiveTracer.init(
    api_key=os.getenv("HH_API_KEY"),
    project=os.getenv("HH_PROJECT")
)

@trace
def analyze_image(image_path: str, question: str) -> dict:
    """Analyze an image and answer a question about it."""
    
    # Your preprocessing
    image_data = load_and_resize(image_path)
    
    # Vision model call (auto-traced if using instrumentor)
    response = vision_model.analyze(image_data, question)
    
    return {"answer": response.text, "confidence": response.confidence}
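The helpers in the example (load_and_resize, vision_model) are placeholders for your own code. For vision APIs that accept base64-encoded images inline, a minimal stand-in for the loading step might look like the sketch below. The function name comes from the example above; the base64 approach is an assumption about your API, and real resizing would need an imaging library such as Pillow:

```python
import base64


def load_and_resize(image_path: str) -> str:
    """Hypothetical loader: read an image file and base64-encode it
    for a vision API that accepts inline image data. Resizing is
    omitted; add it with an imaging library if your API needs it."""
    with open(image_path, "rb") as f:
        raw = f.read()
    # Base64 keeps binary data safe to embed in a JSON request body
    return base64.b64encode(raw).decode("ascii")
```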

Adding Media Metadata

Add context about the media being processed using enrich_span:
@trace
def process_video(video_path: str) -> dict:
    """Extract and analyze frames from video."""
    
    # Add media metadata for debugging and analysis
    enrich_span({
        "media_type": "video",
        "format": "mp4",
        "duration_seconds": get_duration(video_path),
        "resolution": "1920x1080"
    })
    
    frames = extract_keyframes(video_path)
    analyses = [analyze_frame(f) for f in frames]
    
    enrich_span({"frames_analyzed": len(frames)})
    
    return {"frame_analyses": analyses}
Don’t log media bytes. Store references (paths, URLs, IDs) instead of raw binary data. This keeps traces lightweight and queryable.
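One way to follow this advice is to log a compact, stable reference in place of the bytes themselves: the path, the size, and a content hash that stays constant even if the file moves. A sketch (media_ref is a hypothetical helper, not part of the HoneyHive SDK):

```python
import hashlib
import os


def media_ref(path: str) -> dict:
    """Build a lightweight, queryable reference to a media file:
    its path, size, and a content hash instead of raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so large video files don't load into memory
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return {
        "path": path,
        "file_size_bytes": os.path.getsize(path),
        "sha256": h.hexdigest(),
    }
```

You can then pass the resulting dict to enrich_span instead of the binary payload, keeping the trace small while still letting you locate and verify the original file.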

Multi-Step Pipeline Example

For pipelines with multiple processing stages, each traced function becomes a child span:
@trace
def process_document(doc_path: str) -> dict:
    """Process document with embedded images."""
    
    # Each @trace function creates a child span
    text = extract_text(doc_path)           # Child span
    images = extract_images(doc_path)       # Child span
    
    summaries = []
    for img in images:
        summary = analyze_image(img)        # Child span per image
        summaries.append(summary)
    
    return {
        "text": text,
        "image_summaries": summaries
    }

@trace
def extract_text(doc_path: str) -> str:
    enrich_span({"step": "text_extraction"})
    # ... extraction logic
    return text

@trace  
def extract_images(doc_path: str) -> list:
    enrich_span({"step": "image_extraction"})
    # ... extraction logic
    return image_paths

@trace
def analyze_image(image_path: str) -> str:
    enrich_span({
        "step": "image_analysis",
        "image_path": image_path
    })
    # ... vision model call
    return summary
The trace tree shows the full pipeline hierarchy:
process_document
├── extract_text
├── extract_images
├── analyze_image (image_1)
├── analyze_image (image_2)
└── analyze_image (image_3)
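The parent-child relationship falls out of ordinary call nesting: any traced call that starts while another traced call is still open becomes its child. A toy depth-tracking decorator makes the idea concrete (this is an illustration of the mechanism, not the HoneyHive implementation):

```python
import functools

_depth = 0
call_tree: list[tuple[int, str]] = []


def toy_trace(fn):
    """Record (depth, name) for each call; nesting depth mirrors
    the span hierarchy a real tracer would build."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global _depth
        call_tree.append((_depth, fn.__name__))
        _depth += 1
        try:
            return fn(*args, **kwargs)
        finally:
            _depth -= 1
    return wrapper


@toy_trace
def process_document():
    extract_text()
    extract_images()


@toy_trace
def extract_text():
    pass


@toy_trace
def extract_images():
    pass


process_document()
# call_tree: [(0, 'process_document'), (1, 'extract_text'), (1, 'extract_images')]
```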

Useful Metadata Fields

| Field | Description | Example |
| --- | --- | --- |
| media_type | Type of media | "image", "audio", "video" |
| format | File format | "png", "wav", "mp4" |
| duration_seconds | Length for audio/video | 120.5 |
| resolution | Dimensions | "1920x1080" |
| file_size_bytes | Size for performance tracking | 1048576 |
| source_url | Reference to original | "s3://bucket/file.png" |
| processing_steps | Operations performed | ["resize", "normalize"] |
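Several of these fields can be derived from the file itself. The helper below assembles the basic ones; media_metadata is a hypothetical convenience, not an SDK function, and fields like duration_seconds or resolution would still need a media library:

```python
import mimetypes
import os


def media_metadata(path: str, **extra) -> dict:
    """Derive basic metadata fields from a file on disk; callers
    pass extra fields (duration_seconds, resolution, ...) as kwargs."""
    mime, _ = mimetypes.guess_type(path)
    meta = {
        # "image/png" -> "image"; fall back when the type is unknown
        "media_type": mime.split("/")[0] if mime else "unknown",
        "format": os.path.splitext(path)[1].lstrip(".").lower(),
        "file_size_bytes": os.path.getsize(path),
    }
    meta.update(extra)
    return meta
```

The resulting dict can be passed straight to enrich_span at the start of a traced function.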