Documentation Index

Fetch the complete documentation index at: https://docs.honeyhive.ai/llms.txt

Use this file to discover all available pages before exploring further.

HoneyHive Python SDK v1.0 supports the HoneyHive v2 platform launch. Use this guide to upgrade v0.x projects to SDK v1 and adopt v2-ready tracing and evaluation patterns.
Use this HoneyHive migration guide: https://docs.honeyhive.ai/v2/sdk-reference/python/migration/v0-to-v1.md

Explore my codebase to find where the HoneyHive Python SDK is used. Ask me any questions needed to understand the project context, runtime, and migration constraints. Then present a concise migration plan for my confirmation. After I confirm the plan, implement the migration changes.

What changed

Area | v0.x | v1.0
Dataset ground truth key | ground_truths | ground_truth
Client API responses | Nested dict-style access common | Typed response models use attribute access
Evaluated function input | Commonly (inputs, ground_truths) | A single datapoint dict
Evaluator parameters | outputs, inputs, ground_truths | outputs, inputs, ground_truth
Result display | Manual inspection | print_results=True, or print_results=False plus result.print_table()
Session isolation | Instance-level session state | create_session() and with_session() use OpenTelemetry baggage
Enrichment | Global helper functions common | Tracer instance methods recommended

1. Upgrade the package

pip install --upgrade "honeyhive>=1.0.0"
Verify the installed version:
import honeyhive

print(honeyhive.__version__)

2. Update direct client API usage

If you use lower-level client APIs such as client.events, client.datapoints, client.datasets, or client.metrics, many nested request and response objects are typed Pydantic models in v1. For responses, replace dict-style access with attribute access:
# v0.x (dict-style access)
new_ids = response.result["insertedIds"]
event_id = response.events[0]["event_id"]
metric_name = metrics[0]["name"]

# v1.0 (attribute access)
new_ids = response.result.insertedIds
event_id = response.events[0].event_id
metric_name = metrics[0].name
If you still need the dict shape for logging or serialization, call model_dump() on the typed model:
payload = response.result.model_dump()
event_payload = response.events[0].model_dump()
For requests, nested dicts are still commonly coerced into the right typed models, but wrapper methods now validate payloads earlier. For example, a missing required event_type can now raise pydantic.ValidationError at request-construction time instead of surfacing only as a backend error after the request is sent.

If your environment pins OpenTelemetry or Traceloop instrumentor versions, review those pins during the upgrade. SDK v1 requires newer minimum versions than older 0.x releases.
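The earlier-validation behavior can be sketched with a stand-in model. This is illustrative only — real v1 request models are Pydantic classes whose names and fields may differ — but it shows why payload bugs now surface when the request object is built rather than after the request is sent:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a typed request model. In the real SDK,
# pydantic.ValidationError plays the role of this ValueError.
@dataclass
class EventRequest:
    project: str
    event_type: str  # required in v1; an empty/missing value fails immediately

    def __post_init__(self):
        if not self.event_type:
            raise ValueError("event_type is required")

try:
    EventRequest(project="support-bot", event_type="")
except ValueError as exc:
    print(f"rejected at construction time: {exc}")
```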

3. Use ground_truth in datasets

SDK v1 reads the singular ground_truth key from experiment datapoints. If your v0 datasets use ground_truths, rename that field in local datasets and dataset creation code. This is a field-name change, not a restriction on the value shape. If you previously stored multiple expected values inside ground_truths, keep the same inner structure under ground_truth.
dataset = [
    {
        "inputs": {"query": "What is the capital of France?"},
        "ground_truth": {"answer": "Paris"},
    },
    {
        "inputs": {"query": "What is 2+2?"},
        "ground_truth": {"answer": "4"},
    },
]
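If you have many existing v0 datapoints, a small helper can rename the key in one pass. migrate_datapoint is a hypothetical name, not part of the SDK; it only renames the field and leaves the inner value shape untouched:

```python
def migrate_datapoint(dp: dict) -> dict:
    """Rename the v0 'ground_truths' key to the v1 'ground_truth' key."""
    dp = dict(dp)  # shallow copy so the original datapoint is not mutated
    if "ground_truths" in dp and "ground_truth" not in dp:
        dp["ground_truth"] = dp.pop("ground_truths")
    return dp

v0_dataset = [
    {"inputs": {"query": "What is 2+2?"}, "ground_truths": {"answer": "4"}},
]
v1_dataset = [migrate_datapoint(dp) for dp in v0_dataset]
```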

4. Update evaluated functions

In v1, evaluate() passes the full datapoint to your function. Read inputs and ground_truth from that datapoint.
def classify_intent(datapoint):
    text = datapoint["inputs"]["text"]
    return {"intent": classify(text)}

evaluate(
    function=classify_intent,
    dataset=dataset,
    name="intent-classifier",
    project="support-bot",
)
If your function needs the active tracer created by evaluate(), add a tracer keyword parameter: def classify_intent(datapoint, tracer): ....
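A sketch of that tracer-aware signature, with a stand-in classifier so it runs without the SDK. Per the guide, evaluate() injects the tracer when the parameter is declared; the None default here is a convenience assumption so the function also works when called directly, and enrich_session is the instance method shown elsewhere in this guide:

```python
def classify_intent(datapoint, tracer=None):
    text = datapoint["inputs"]["text"]
    intent = "billing" if "charge" in text else "other"  # stand-in classifier
    if tracer is not None:
        tracer.enrich_session(metadata={"evaluated": True})
    return {"intent": intent}

print(classify_intent({"inputs": {"text": "charge issue"}}))  # {'intent': 'billing'}
```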

5. Update evaluator parameters

In v1, evaluators can be plain functions. @evaluator is still accepted for compatibility but no longer required. Rename the third parameter to ground_truth so it matches the v1 dataset shape.
from honeyhive import evaluator

@evaluator
def accuracy_check(outputs, inputs, ground_truth):
    expected = ground_truth["intent"]
    actual = outputs["intent"]
    return {
        "score": 1.0 if actual == expected else 0.0,
        "pass": actual == expected,
    }
@evaluator remains available for compatibility, but plain functions can be passed directly in the evaluators list.
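For example, the same check as an undecorated function (exact_match is an illustrative name, not an SDK helper):

```python
def exact_match(outputs, inputs, ground_truth):
    # Compare the produced intent against the expected intent.
    match = outputs.get("intent") == ground_truth.get("intent")
    return {"score": 1.0 if match else 0.0, "pass": match}

# Passed directly, e.g. evaluate(..., evaluators=[exact_match])
print(exact_match({"intent": "billing"}, {"text": "charge issue"}, {"intent": "billing"}))
```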

6. Update LLM evaluator templates

If you reference ground truth in evaluator prompts, update template variables.
prompt = """
Compare the output to the expected answer.

Expected: {{feedback.ground_truth.answer}}
Actual: {{outputs.answer}}
"""

7. Use the v1 result summary

evaluate() returns an experiment result summary and prints a formatted table by default. If you want to render the table yourself, pass print_results=False and call result.print_table(...) later.
from honeyhive import evaluate

result = evaluate(
    function=classify_intent,
    dataset=dataset,
    evaluators=[accuracy_check],
    name="intent-classifier",
    project="support-bot",
    print_results=False,
)

result.print_table(run_name="intent-classifier")
If you do not need manual control, omit print_results=False and let evaluate() print the table automatically.

8. Update enrichment calls

Free functions such as enrich_span() still work in v1, but instance methods are the primary pattern because they are explicit and work better with multiple tracers.
from honeyhive import enrich_span, trace

@trace
def retrieve_documents(query):
    docs = vector_search(query)
    enrich_span(
        metadata={"query": query},
        outputs={"documents": docs},
    )
    return docs
Use explicit namespaces for structured data:
Namespace | Use for
metadata | Custom searchable context
metrics | Scores, latency, token counts, numeric measurements
feedback | Ratings, labels, corrections, ground truth
inputs | Span or session input payloads
outputs | Span or session output payloads
config | Model and application configuration
user_properties | User attributes
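The namespaces map onto keyword arguments of enrichment calls. A shape-only sketch — all values below are illustrative, and whether you pass them to the free enrich_span() or a tracer instance method depends on your setup:

```python
# Illustrative enrichment payload grouped by namespace.
enrichment = {
    "inputs": {"query": "What is the capital of France?"},
    "outputs": {"documents": ["doc-1", "doc-2"]},
    "metadata": {"retriever": "vector-db"},
    "metrics": {"latency_ms": 42, "num_docs": 2},
    "feedback": {"ground_truth": {"answer": "Paris"}},
    "config": {"model": "gpt-4o", "temperature": 0.2},
    "user_properties": {"plan": "pro"},
}
# e.g. tracer.enrich_span(**enrichment)  # hypothetical call site
```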
See Enriching Traces and Enrichment Schema.

9. Replace manual session state in concurrent apps

If your v0 integration reused one tracer across many requests, create request-scoped sessions in v1.
from fastapi import FastAPI, Request
from honeyhive import HoneyHiveTracer, trace

app = FastAPI()
tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="support-bot",
)

@app.middleware("http")
async def honeyhive_session(request: Request, call_next):
    await tracer.acreate_session(
        session_name=f"{request.method} {request.url.path}",
        inputs={"path": str(request.url.path)},
    )
    response = await call_next(request)
    tracer.enrich_session(outputs={"status_code": response.status_code})
    return response

@app.post("/chat")
@trace(event_type="chain", tracer=tracer)
async def chat(payload: dict):
    return await answer(payload["message"])
For scoped scripts, use with_session():
with tracer.with_session("batch-job", inputs={"file": "tickets.csv"}):
    run_batch_job()
See Tracer Initialization and Multi-instance Tracing.

10. Simplify distributed tracing

v1 includes helpers for carrying OpenTelemetry trace context across services.
from opentelemetry import context
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

def handle_request(request):
    propagator = TraceContextTextMapPropagator()
    ctx = propagator.extract(request.headers)
    token = context.attach(ctx)
    try:
        return process_request()
    finally:
        context.detach(token)
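On the wire, TraceContextTextMapPropagator carries this context in the W3C traceparent header (version-traceid-spanid-flags). A stdlib sketch of the format, for reference when debugging headers between services (format_traceparent is an illustrative helper, not an SDK or OpenTelemetry function):

```python
def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    # W3C Trace Context: 2-hex version, 32-hex trace id, 16-hex span id, 2-hex flags
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"

print(format_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7))
# 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```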
See Distributed Tracing.

Troubleshooting

Ground truth does not appear in HoneyHive

Check that every datapoint uses ground_truth, not only ground_truths. SDK v1 reads ground_truth from each datapoint.
dataset = [{"inputs": {"text": "charge issue"}, "ground_truth": {"intent": "billing"}}]

Dict-style client response access breaks after upgrading

If you use direct client APIs, switch from subscripts to attributes on typed response models:
new_ids = response.result.insertedIds
event_id = response.events[0].event_id
Use model_dump() only when you need a plain dict for serialization or an external boundary.

Evaluators do not receive ground truth

Use the singular evaluator parameter:
def my_evaluator(outputs, inputs, ground_truth=None):
    if ground_truth is None:
        return {"score": 0.0}
    return {"score": outputs["answer"] == ground_truth["answer"]}

Traces from different requests share one session

Use create_session() or acreate_session() per request instead of mutating tracer.session_id or relying on session_start() in a shared server process.

Import errors after upgrading

Confirm the runtime is using the upgraded package:
import honeyhive

print(honeyhive.__version__)

Migration checklist

  • Upgrade to honeyhive>=1.0.0
  • Update direct client API code from dict-style response access to typed-model attribute access
  • Review request validation changes if you build payloads for client.events, client.datapoints, client.datasets, or client.metrics
  • Rename ground_truths to ground_truth in datasets
  • Update evaluated functions to accept a datapoint dict
  • Update evaluators to use ground_truth
  • Update LLM evaluator templates from feedback.ground_truths to feedback.ground_truth
  • Use tracer instance methods for new enrichment code
  • Use create_session() or with_session() for request-scoped sessions
  • Review pinned OpenTelemetry or Traceloop instrumentor versions, if any
  • Run an evaluation and confirm table output, scores, and ground truth render correctly

Migrate from Logger to v1

Replace honeyhive-logger calls with SDK v1 tracing.

Migrate from honeyhive-bundled to v1

Replace the bundled package with the stable SDK.

Evaluation Quickstart

Run your first experiment with evaluate().

Tracer Initialization

Choose the right tracer and session pattern for your runtime.

Enriching Traces

Add metadata, metrics, feedback, and user properties to traces.