Why Sample

HoneyHive traces asynchronously and batches span exports, so overhead is minimal. At very high volumes, however, you may want to selectively trace to control cost and storage. The goal is to capture every trace that matters while reducing noise from routine traffic.

Strategies

Always Trace Errors

Failed requests are the most valuable traces. Always capture them regardless of sampling rate.
def should_trace(request):
    if request.get("error"):
        return True
    if request.get("status_code", 200) >= 400:
        return True
    return False
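A quick sanity check of the predicate (the function is repeated here so the snippet runs on its own). Note that the status-code check treats anything from 400 up as an error, so both client and server errors are captured:

```python
def should_trace(request: dict) -> bool:
    # An explicit error field or a 4xx/5xx status means: always trace.
    if request.get("error"):
        return True
    if request.get("status_code", 200) >= 400:
        return True
    return False

print(should_trace({"error": "timeout"}))      # True
print(should_trace({"status_code": 502}))      # True
print(should_trace({"status_code": 200}))      # False
```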

Always Trace High-Priority Requests

Premium users, flagged accounts, or specific endpoints should always be traced.
def should_trace(request):
    if request.get("customer_tier") == "premium":
        return True
    if request.get("endpoint") in ["/api/checkout", "/api/agent"]:
        return True
    return False

Percentage Sampling

For regular traffic, sample a fixed percentage:
import random

SAMPLE_RATE = 0.01  # 1% of traffic

def should_trace(request):
    return random.random() < SAMPLE_RATE
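Because random.random() makes a fresh decision on every call, retries of the same request may be sampled inconsistently. A deterministic alternative hashes a stable identifier into a fixed bucket; this is a sketch that assumes each request carries a "request_id" field (substitute whatever stable identifier your requests have):

```python
import hashlib

SAMPLE_RATE = 0.01  # 1% of traffic

def should_trace_deterministic(request: dict) -> bool:
    # Hash a stable identifier into one of 10,000 buckets so the same
    # request id always gets the same sampling decision.
    # "request_id" is an assumed field name, not a HoneyHive convention.
    request_id = str(request.get("request_id", ""))
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
    return bucket < SAMPLE_RATE * 10_000
```

Deterministic sampling also keeps the decision consistent across services that see the same request id.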

Combined

In practice, combine these strategies:
import random

def should_trace(request):
    # Always trace errors
    if request.get("error") or request.get("status_code", 200) >= 400:
        return True
    # Always trace premium users
    if request.get("customer_tier") == "premium":
        return True
    # Sample 1% of regular traffic
    return random.random() < 0.01
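You can exercise the combined policy directly to confirm the precedence: errors and premium users bypass sampling entirely, and only regular traffic falls through to the 1% sample (the function is repeated so the snippet runs standalone):

```python
import random

def should_trace(request: dict) -> bool:
    # Errors and premium users bypass sampling entirely.
    if request.get("error") or request.get("status_code", 200) >= 400:
        return True
    if request.get("customer_tier") == "premium":
        return True
    # Everything else: 1% random sample.
    return random.random() < 0.01

print(should_trace({"status_code": 503}))          # True
print(should_trace({"customer_tier": "premium"}))  # True
```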

Applying Sampling

Use your sampling function to conditionally initialize the tracer or skip tracing for a given request:
from honeyhive import HoneyHiveTracer, trace

tracer = HoneyHiveTracer.init(
    project="my-app",
    source="production"
)

@app.route("/api/process", methods=["POST"])
def handle_request():
    # should_trace expects a plain dict, so pass the parsed body
    # rather than Flask's request object.
    payload = request.json or {}
    if not should_trace(payload):
        return process(payload)

    @trace(tracer=tracer)
    def traced_process(data):
        return process(data)

    return traced_process(payload)
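Defining the traced wrapper inside every handler gets repetitive across endpoints. One way to factor it out is a small conditional-decoration helper; sample_traced below is a hypothetical utility, not part of the HoneyHive SDK, shown with a stand-in decorator so the sketch runs without the SDK (in practice you would pass trace(tracer=tracer)):

```python
from functools import wraps

def sample_traced(decorator, predicate):
    """Apply `decorator` only to calls where `predicate(payload)` is true."""
    def wrap(fn):
        decorated = decorator(fn)  # decorate once, at definition time
        @wraps(fn)
        def inner(payload, *args, **kwargs):
            target = decorated if predicate(payload) else fn
            return target(payload, *args, **kwargs)
        return inner
    return wrap

# Stand-in for @trace(tracer=tracer); records which calls were traced.
traced_calls = []
def fake_trace(fn):
    @wraps(fn)
    def inner(payload, *args, **kwargs):
        traced_calls.append(payload)
        return fn(payload, *args, **kwargs)
    return inner

@sample_traced(fake_trace, predicate=lambda p: p.get("error"))
def process(payload):
    return "done"

process({"error": "boom"})   # traced
process({"status": "ok"})    # not traced
print(len(traced_calls))     # 1
```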

Payload Size

Independent of sampling, avoid tracing large objects directly. Trace metadata about the data instead:
from honeyhive import trace, enrich_span

@trace()
def process(large_data: dict):
    # Log a rough size estimate (character count as a proxy for bytes)
    # and the key count instead of the payload itself.
    enrich_span({
        "data.size_mb": len(str(large_data)) / 1024 / 1024,
        "data.keys_count": len(large_data)
    })
    # Don't do this: it serializes the entire payload into the span.
    # enrich_span({"data.full_content": large_data})
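The metadata computation can be verified on its own. This sketch keeps the same character-count size estimate (JSON-serialized here for a slightly more faithful byte count) and leaves enrich_span out so it runs without the SDK:

```python
import json

def payload_metadata(large_data: dict) -> dict:
    # Character count of the serialized payload as a rough byte estimate.
    serialized = json.dumps(large_data, default=str)
    return {
        "data.size_mb": len(serialized) / 1024 / 1024,
        "data.keys_count": len(large_data),
    }

meta = payload_metadata({"doc": "x" * 2048, "source": "upload"})
print(meta["data.keys_count"])  # 2
```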