HoneyHive Python SDK v1.0 supports the HoneyHive v2 platform launch. Use this guide to upgrade v0.x projects to SDK v1 and adopt v2-ready tracing and evaluation patterns.

What changed

| Area | v0.x | v1.0 |
| --- | --- | --- |
| Client constructor | HoneyHive(bearer_auth=...) | HoneyHive(api_key=...) (project= and HH_PROJECT/HONEYHIVE_PROJECT are deprecated no-ops) |
| Evaluation module path | from honeyhive.evaluation import evaluate, evaluator | from honeyhive.experiments import evaluate, evaluator (legacy path still works with a DeprecationWarning) |
| evaluate() arguments | Positional or keyword | Keyword-only after function |
| Dataset ground truth key | ground_truths | ground_truth |
| Client API responses | Nested dict-style access common | Typed response models use attribute access |
| Evaluated function input | Commonly (inputs, ground_truths) | A single datapoint dict |
| Evaluator parameters | outputs, inputs, ground_truths | outputs, inputs, ground_truth (or ground_truth=None to support unlabeled datapoints) |
| Async evaluated functions | Rejected (must be sync) | Supported (auto-detected) |
| Result display | Manual inspection | print_results=True, or print_results=False plus result.print_table() |
| Session isolation | Instance-level session state | create_session(), acreate_session(), and with_session() use OpenTelemetry baggage |
| Enrichment | Global helper functions common | Tracer instance methods recommended; enrich_session() free function now only accepts metadata |

1. Upgrade the package

pip install --upgrade "honeyhive>=1.0.0"
Verify the installed version:
import honeyhive

print(honeyhive.__version__)
honeyhive.__version__ is new in SDK v1; on v0.x it raises AttributeError because the attribute was not exported. To detect the installed version without importing honeyhive, use:
from importlib.metadata import version

print(version("honeyhive"))
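As a minimal sketch of how a startup check might gate on the SDK major version, the helper below wraps importlib.metadata and returns None when the package is absent. The function name sdk_major_version is hypothetical and not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def sdk_major_version(package: str = "honeyhive") -> Optional[int]:
    """Return the installed major version of `package`, or None if missing."""
    try:
        raw = version(package)
    except PackageNotFoundError:
        return None
    # Take the leading numeric component, e.g. "1.0.3" -> 1.
    head = raw.split(".", 1)[0]
    return int(head) if head.isdigit() else None


if __name__ == "__main__":
    major = sdk_major_version()
    if major is None:
        print("honeyhive is not installed")
    elif major < 1:
        print("honeyhive v0.x detected; migration needed")
    else:
        print(f"honeyhive v{major}.x detected")
```

Because it never imports honeyhive itself, this check behaves the same on v0.x and v1 installs.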

2. Update HoneyHive client initialization

The HoneyHive client constructor takes different keyword arguments in v1. A typical v0.x call looks like this and must be updated as described in the notes that follow:
from honeyhive import HoneyHive

client = HoneyHive(
    bearer_auth="your-api-key",
    server_url="https://api.honeyhive.ai",
)
  • bearer_auth is renamed to api_key. Passing bearer_auth= to v1 raises TypeError.
  • project=, HH_PROJECT, and HONEYHIVE_PROJECT are all deprecated no-ops in v1.0+. HoneyHive(project=...) and HoneyHiveTracer.init(project=...) emit a DeprecationWarning and ignore the value; evaluate(project=...) accepts the kwarg silently but it also has no functional effect. The HH_PROJECT env var is still parsed for backward compatibility but unused. The backend infers project context from the API key — use the API key for the project you want to write to. All of these will be removed in v2.0.
  • server_url= still works; base_url= is also accepted as an alias in v1.
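The renames listed above can be sketched as a small keyword-argument translation. The helper migrate_client_kwargs is hypothetical (it is not part of the SDK) and only encodes the rules stated in this section: bearer_auth becomes api_key, project is dropped, and server_url passes through unchanged.

```python
from typing import Any, Dict


def migrate_client_kwargs(v0_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate v0.x HoneyHive(...) keyword arguments to their v1 form."""
    v1_kwargs = dict(v0_kwargs)  # copy so the caller's dict is untouched

    # bearer_auth was renamed to api_key; v1 raises TypeError on the old name.
    if "bearer_auth" in v1_kwargs:
        v1_kwargs["api_key"] = v1_kwargs.pop("bearer_auth")

    # project= is a deprecated no-op in v1, so drop it rather than pass it on.
    v1_kwargs.pop("project", None)

    return v1_kwargs


v1_kwargs = migrate_client_kwargs(
    {
        "bearer_auth": "your-api-key",
        "server_url": "https://api.honeyhive.ai",
        "project": "support-bot",
    }
)
# With SDK v1 installed, the result can be passed on as HoneyHive(**v1_kwargs).
```

In most codebases a direct edit of the call site is simpler; a helper like this is mainly useful when the arguments are assembled from configuration.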
Some sub-clients were renamed or removed:
| v0.x | v1.0 |
| --- | --- |
| client.session (singular) | Removed. Use client.sessions (plural). |
| client.tools | Removed. |
| client.projects | Removed. |
| client.events, client.datapoints, client.datasets, client.metrics, client.configurations, client.experiments | Still available. |
client.sessions, client.evaluations, client.charts are new in v1.

3. Update direct client API usage

If you use lower-level client APIs such as client.events, client.datapoints, client.datasets, or client.metrics, many nested request and response objects are typed Pydantic models in v1. For responses, replace v0.x dict-style access such as the following with attribute access (for example, response.result.insertedIds instead of response.result["insertedIds"]):
new_ids = response.result["insertedIds"]
event_id = response.events[0]["event_id"]
metric_name = metrics[0]["name"]
If you still need the dict shape for logging or serialization, call model_dump() on the typed model:
payload = response.result.model_dump()
event_payload = response.events[0].model_dump()
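During an incremental migration, some call sites may still receive dicts while others already receive typed models. The sketch below shows one way to bridge both shapes. The helpers read_field and to_plain are hypothetical, and the SimpleNamespace object is only a stand-in for a v1 typed model, not an SDK class.

```python
from types import SimpleNamespace
from typing import Any


def read_field(obj: Any, name: str) -> Any:
    """Read `name` from a v0-style dict or a v1-style typed model."""
    if isinstance(obj, dict):
        return obj[name]
    return getattr(obj, name)


def to_plain(obj: Any) -> Any:
    """Return a plain dict for logging; typed models expose model_dump()."""
    dump = getattr(obj, "model_dump", None)
    return dump() if callable(dump) else obj


# Stand-ins for illustration only: a dict mimics a v0 event and a
# SimpleNamespace mimics attribute access on a v1 typed model.
v0_event = {"event_id": "evt-123"}
v1_event = SimpleNamespace(event_id="evt-123")

same_id = read_field(v0_event, "event_id") == read_field(v1_event, "event_id")
```

Once every call site receives typed models, a shim like this can be removed in favor of plain attribute access.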
For requests, nested dicts are still commonly coerced into the right typed models, but wrapper methods now validate payloads earlier. For example, a missing required event_type can now raise pydantic.ValidationError at request-construction time instead of surfacing only as a backend error after the request is sent.
If your environment pins OpenTelemetry or Traceloop instrumentor versions, review those pins during the upgrade. SDK v1 requires newer minimum versions than older 0.x releases.

4. Use ground_truth in datasets

SDK v1 reads the singular ground_truth key from experiment datapoints. If your v0 datasets use ground_truths, rename that field in local datasets and dataset creation code. This is a field-name change, not a restriction on the value shape: if you previously stored multiple expected values inside ground_truths, keep the same inner structure under ground_truth. A v0.x dataset that needs the rename looks like this:
dataset = [
    {
        "inputs": {"query": "What is the capital of France?"},
        "ground_truths": {"answer": "Paris"},
    },
    {
        "inputs": {"query": "What is 2+2?"},
        "ground_truths": {"answer": "4"},
    },
]
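For local datasets, the rename can be applied programmatically. The following is a minimal sketch; rename_ground_truths is a hypothetical helper, not an SDK function, and it leaves the inner value structure untouched as described above.

```python
from typing import Any, Dict, List


def rename_ground_truths(dataset: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of `dataset` with ground_truths renamed to ground_truth."""
    migrated = []
    for datapoint in dataset:
        point = dict(datapoint)  # shallow copy; inner values are reused as-is
        if "ground_truths" in point and "ground_truth" not in point:
            point["ground_truth"] = point.pop("ground_truths")
        migrated.append(point)
    return migrated


v0_dataset = [
    {
        "inputs": {"query": "What is the capital of France?"},
        "ground_truths": {"answer": "Paris"},
    },
    {
        "inputs": {"query": "What is 2+2?"},
        "ground_truths": {"answer": "4"},
    },
]
v1_dataset = rename_ground_truths(v0_dataset)
```

Datapoints that already carry ground_truth are left as they are, so the helper is safe to run more than once.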

5. Update evaluated functions

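In v1, evaluate() passes the full datapoint to your function as a single dict, so read inputs and ground_truth from that datapoint instead of taking them as separate parameters. The v0.x pattern below must be updated: the function should accept one datapoint argument, and project= should be dropped because it has no effect in v1.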
def classify_intent(inputs, ground_truths):
    text = inputs["text"]
    return {"intent": classify(text)}

evaluate(
    function=classify_intent,
    dataset=dataset,
    name="intent-classifier",
    project="support-bot",
)
If your function needs the active tracer created by evaluate(), add a tracer keyword parameter: def classify_intent(datapoint, tracer): ....
SDK v1 detects async def functions automatically and runs them with asyncio.run per datapoint. You can pass them to evaluate() directly; v0 raised an error and required you to call asyncio.run yourself inside a sync wrapper.
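A v1-shaped version of the evaluated function might look like the sketch below, which reads text from the datapoint dict and includes an async variant. The classify function is a hypothetical stand-in for the application's classifier, and the functions are called directly here so the example runs without the SDK; in a real project they would be passed to evaluate(function=...).

```python
import asyncio
from typing import Any, Dict


def classify(text: str) -> str:
    """Hypothetical stand-in for the application's real classifier."""
    return "billing" if "charge" in text.lower() else "other"


def classify_intent(datapoint: Dict[str, Any]) -> Dict[str, str]:
    """v1-style evaluated function: receives the whole datapoint dict."""
    text = datapoint["inputs"]["text"]
    return {"intent": classify(text)}


async def classify_intent_async(datapoint: Dict[str, Any]) -> Dict[str, str]:
    """Async variant; per this guide, v1 detects and runs these automatically."""
    await asyncio.sleep(0)  # placeholder for a real awaitable call
    return classify_intent(datapoint)


datapoint = {"inputs": {"text": "charge issue"}, "ground_truth": {"intent": "billing"}}
sync_output = classify_intent(datapoint)
# Calling asyncio.run here only simulates what evaluate() is described as doing.
async_output = asyncio.run(classify_intent_async(datapoint))
```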

6. Update evaluator parameters

In v1, evaluators can be plain functions; @evaluator is still accepted for compatibility but is no longer required. Rename the third parameter to ground_truth so it matches the v1 dataset shape. A v0.x evaluator that needs the rename looks like this:
from honeyhive import evaluator

@evaluator
def accuracy_check(outputs, inputs, ground_truths):
    expected = ground_truths["intent"]
    actual = outputs["intent"]
    return {
        "score": 1.0 if actual == expected else 0.0,
        "pass": actual == expected,
    }
@evaluator remains available for compatibility, but plain functions can be passed directly in the evaluators list.
In v1, evaluators are invoked with only (outputs, inputs) when a datapoint has no ground_truth. Declare the third parameter with a default (ground_truth=None) so your evaluator also runs on unlabeled datapoints; without a default, those calls raise TypeError.
Evaluators should return either a scalar (bool, int, float, str) or a flat dict whose score, explanation, and any extras are scalars. Nested or non-scalar values are dropped (with a warning) so they cannot silently distort run-comparison diffs.
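Putting the rules in this section together, a v1 evaluator could be written as a plain function with a defaulted ground_truth parameter and a flat dict of scalars as its return value. This is a sketch under those stated rules; how unlabeled datapoints should be scored (0.0 here) is an assumption to adapt to your own metrics.

```python
from typing import Any, Dict, Optional


def accuracy_check(
    outputs: Dict[str, Any],
    inputs: Dict[str, Any],
    ground_truth: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Plain-function evaluator returning a flat dict of scalars."""
    if ground_truth is None:
        # Unlabeled datapoint: v1 calls the evaluator with (outputs, inputs) only.
        return {"score": 0.0, "explanation": "no ground truth provided"}
    matched = outputs["intent"] == ground_truth["intent"]
    return {"score": 1.0 if matched else 0.0, "pass": matched}
```

The function can be placed directly in the evaluators list, with or without the @evaluator decorator.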

7. Update LLM evaluator templates

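If you reference ground truth in evaluator prompts, update the template variables from feedback.ground_truths to feedback.ground_truth. A v0.x template that needs the change looks like this: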
prompt = """
Compare the output to the expected answer.

Expected: {{feedback.ground_truths.answer}}
Actual: {{outputs.answer}}
"""

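For prompt templates stored as strings, the variable rename in section 7 can be applied with a simple text replacement. The sketch below assumes templates reference ground truth only through the feedback.ground_truths prefix; migrate_template is a hypothetical helper, not an SDK function.

```python
V0_PROMPT = """
Compare the output to the expected answer.

Expected: {{feedback.ground_truths.answer}}
Actual: {{outputs.answer}}
"""


def migrate_template(template: str) -> str:
    """Rename the v0 ground-truth variable prefix to the v1 singular form."""
    return template.replace("feedback.ground_truths", "feedback.ground_truth")


V1_PROMPT = migrate_template(V0_PROMPT)
```

The replacement is idempotent, so running it over templates that were already migrated leaves them unchanged.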
8. Use the v1 result summary

evaluate() returns an experiment result summary and prints a formatted table by default. If you want to render the table yourself, pass print_results=False and call result.print_table(...) later.
from honeyhive import evaluate

result = evaluate(
    function=classify_intent,
    dataset=dataset,
    evaluators=[accuracy_check],
    name="intent-classifier",
    print_results=False,
)

result.print_table(run_name="intent-classifier")
If you do not need manual control, omit print_results=False and let evaluate() print the table automatically.

9. Update enrichment calls

Free functions such as enrich_span() still work in v1, but tracer instance methods are the primary pattern because they are explicit and work better with multiple tracers. The v0.x free-function style looks like this:
from honeyhive import enrich_span, trace

@trace
def retrieve_documents(query):
    docs = vector_search(query)
    enrich_span(
        metadata={"query": query},
        outputs={"documents": docs},
    )
    return docs
The v1 enrich_session() free function has a much narrower signature than its v0 counterpart:
def enrich_session(
    session_id: str,                        # now required, no default
    metadata: Optional[Dict[str, Any]] = None,
    tracer: Optional[Any] = None,
    tracer_instance: Optional[Any] = None,
) -> None:
Two common v0 idioms break:
  • enrich_session(metadata={...}) (no session_id) raises TypeError: enrich_session() missing 1 required positional argument: 'session_id'. In v0, session_id was resolved from a singleton tracer; in v1 you must pass it explicitly.
  • enrich_session(outputs=..., metrics=..., feedback=..., inputs=..., config=..., user_properties=...) raises TypeError because those kwargs were removed.
Switch these call sites to the tracer instance method, tracer.enrich_session(...), which resolves the active session internally and supports every enrichment namespace.
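As an illustration of the call-site change, the sketch below routes session enrichment through a tracer object instead of the free function. RecordingTracer is a hypothetical stand-in that only records what it receives so the example runs without the SDK; in a real project the tracer would come from HoneyHiveTracer.init(...), and finish_request is likewise an illustrative name.

```python
from typing import Any, Dict, List


class RecordingTracer:
    """Hypothetical stand-in for a tracer; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def enrich_session(self, **namespaces: Any) -> None:
        self.calls.append(namespaces)


def finish_request(tracer: Any, answer: str, latency_ms: float) -> None:
    """Enrich the active session through the tracer instance, not the free function."""
    tracer.enrich_session(
        outputs={"answer": answer},
        metrics={"latency_ms": latency_ms},
    )


tracer = RecordingTracer()
finish_request(tracer, answer="Paris", latency_ms=12.5)
```

Passing the tracer as a parameter also makes the enrichment logic easy to unit test, since any object with an enrich_session method can be substituted.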
Use explicit namespaces for structured data:
| Namespace | Use for |
| --- | --- |
| metadata | Custom searchable context |
| metrics | Scores, latency, token counts, numeric measurements |
| feedback | Ratings, labels, corrections, ground truth |
| inputs | Span or session input payloads |
| outputs | Span or session output payloads |
| config | Model and application configuration |
| user_properties | User attributes |
See Enriching Traces and Enrichment Schema.
If you do not want to thread tracer=tracer through every call site, set a process-wide default once at startup:
from honeyhive import HoneyHiveTracer, set_default_tracer

tracer = HoneyHiveTracer.init(api_key="your-api-key")
set_default_tracer(tracer)
After that, the enrich_span() free function and the @trace decorator resolve to that tracer automatically.

10. Replace manual session state in concurrent apps

If your v0 integration reused one tracer across many requests, create request-scoped sessions in v1.
from fastapi import FastAPI, Request
from honeyhive import HoneyHiveTracer, trace

app = FastAPI()
tracer = HoneyHiveTracer.init(api_key="your-api-key")

@app.middleware("http")
async def honeyhive_session(request: Request, call_next):
    await tracer.acreate_session(
        session_name=f"{request.method} {request.url.path}",
        inputs={"path": str(request.url.path)},
    )
    response = await call_next(request)
    tracer.enrich_session(outputs={"status_code": response.status_code})
    return response

@app.post("/chat")
@trace(event_type="chain", tracer=tracer)
async def chat(payload: dict):
    return await answer(payload["message"])
For scoped scripts, use with_session():
with tracer.with_session("batch-job", inputs={"file": "tickets.csv"}):
    run_batch_job()
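To see why context-scoped sessions avoid the cross-request leakage that shared instance state can cause, consider the standalone sketch below. It uses Python's contextvars module purely as an analogy; it is not HoneyHive's implementation (which this guide describes as using OpenTelemetry baggage), and all names in it are illustrative.

```python
import asyncio
import contextvars
from typing import List, Tuple

# Context-local slot; each asyncio task sees its own value.
current_session = contextvars.ContextVar("current_session")


async def handle(request_id: str, seen: List[Tuple[str, str]]) -> None:
    """Simulate one request that sets, then later reads, its session id."""
    current_session.set(f"session-{request_id}")
    await asyncio.sleep(0)  # yield so the other requests interleave
    seen.append((request_id, current_session.get()))


async def main() -> List[Tuple[str, str]]:
    seen: List[Tuple[str, str]] = []
    # gather() wraps each coroutine in a task with its own copy of the context.
    await asyncio.gather(*(handle(str(i), seen) for i in range(3)))
    return seen


results = asyncio.run(main())
```

Each simulated request reads back its own session id even though the requests interleave, which is the property a per-request create_session() or acreate_session() call is meant to provide.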
See Tracer Initialization and Multi-instance Tracing.

11. Simplify distributed tracing

v1 includes helpers for carrying OpenTelemetry trace context across services, which can replace manual propagation code such as this v0.x pattern:
from opentelemetry import context
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

def handle_request(request):
    propagator = TraceContextTextMapPropagator()
    ctx = propagator.extract(request.headers)
    token = context.attach(ctx)
    try:
        return process_request()
    finally:
        context.detach(token)
See Distributed Tracing.

12. Update imports from honeyhive.evaluation to honeyhive.experiments

The honeyhive.evaluation module is deprecated in v1 and will be removed in v2.0. Imports from it still work but emit a DeprecationWarning. Update v0.x imports such as the following by changing the module path to honeyhive.experiments:
from honeyhive.evaluation import evaluate, evaluator, aevaluator
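For larger codebases, the module-path change can be scripted. The following is a rough sketch using a regular expression over source text; rewrite_imports is a hypothetical helper, and it only handles dotted references to honeyhive.evaluation, so forms such as "from honeyhive import evaluation" would still need manual review.

```python
import re

# Match the old dotted module path as a whole name, not as a prefix of another.
_OLD_MODULE = re.compile(r"\bhoneyhive\.evaluation\b")


def rewrite_imports(source: str) -> str:
    """Point honeyhive.evaluation references at honeyhive.experiments."""
    return _OLD_MODULE.sub("honeyhive.experiments", source)


before = "from honeyhive.evaluation import evaluate, evaluator\n"
after = rewrite_imports(before)
```

Review the resulting diff before committing, since a text-level rewrite cannot tell code apart from comments or strings.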

13. Update evaluate() call signature

In v1, every argument to evaluate() after function is keyword-only. If you previously passed any of dataset, evaluators, name, or project positionally, switch to keyword arguments.
The following v0 evaluate() parameters no longer exist in v1 and raise TypeError if passed:
  • suite
  • run_concurrently
  • disable_http_tracing
  • metadata
v1 adds these parameters:
  • instrumentors: list of zero-argument factories returning OTEL instrumentor instances (e.g. [lambda: OpenAIInstrumentor()]); a fresh instance is created per datapoint.
  • run_id: optional client-supplied run identifier.
  • aggregate_function: backend aggregation function name ("average", "sum", "min", "max").
evaluate() reads HONEYHIVE_API_KEY / HH_API_KEY for api_key. The project= keyword and any *_PROJECT environment variables are deprecated no-ops everywhere in the v1 SDK — the backend infers project context from the API key. Drop project= from your evaluate(), HoneyHive(), and HoneyHiveTracer.init() calls; use the API key for the project you want to write to. See section 2 for the full deprecation picture.
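When evaluate() arguments are built up in a dict (for example from configuration), the removed and no-op parameters described in this section can be filtered out before the call. This is a minimal sketch; clean_evaluate_kwargs and the two constant names are hypothetical and simply encode the lists given above.

```python
from typing import Any, Dict, Tuple

# Parameters this guide lists as removed from evaluate() in v1.
REMOVED_KWARGS = ("suite", "run_concurrently", "disable_http_tracing", "metadata")
# project= is still accepted but has no effect, so it is safe to drop as well.
NOOP_KWARGS = ("project",)


def clean_evaluate_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split v0 evaluate() kwargs into (kept, dropped) for a v1 call."""
    kept: Dict[str, Any] = {}
    dropped: Dict[str, Any] = {}
    for key, value in kwargs.items():
        target = dropped if key in REMOVED_KWARGS + NOOP_KWARGS else kept
        target[key] = value
    return kept, dropped


kept, dropped = clean_evaluate_kwargs(
    {
        "dataset": [],
        "name": "intent-classifier",
        "project": "support-bot",
        "run_concurrently": True,
    }
)
# With SDK v1 installed, the call would then be evaluate(function=..., **kept).
```

Returning the dropped arguments separately makes it easy to log what was removed during the migration.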

Troubleshooting

Ground truth does not appear in HoneyHive

Check that every datapoint uses ground_truth rather than ground_truths. SDK v1 reads ground_truth from each datapoint.
dataset = [{"inputs": {"text": "charge issue"}, "ground_truth": {"intent": "billing"}}]

Dict-style client response access breaks after upgrading

If you use direct client APIs, switch from subscripts to attributes on typed response models:
new_ids = response.result.insertedIds
event_id = response.events[0].event_id
Use model_dump() only when you need a plain dict for serialization or an external boundary.

TypeError: HoneyHive.__init__() got an unexpected keyword argument 'bearer_auth'

Replace HoneyHive(bearer_auth=...) with HoneyHive(api_key=...). See Update HoneyHive client initialization.

TypeError: enrich_session() got an unexpected keyword argument 'outputs'

The v1 enrich_session() free function only accepts metadata. Switch to the tracer instance method, tracer.enrich_session(outputs=..., metrics=..., ...). See Update enrichment calls.

Evaluators do not receive ground truth

Use the singular evaluator parameter:
def my_evaluator(outputs, inputs, ground_truth=None):
    if ground_truth is None:
        return {"score": 0.0}
    return {"score": outputs["answer"] == ground_truth["answer"]}

Traces from different requests share one session

Use create_session() or acreate_session() per request instead of mutating tracer.session_id or relying on session_start() in a shared server process.

Import errors after upgrading

Confirm the runtime is using the upgraded package:
from importlib.metadata import version

print(version("honeyhive"))

Migration checklist

  • Upgrade to honeyhive>=1.0.0
  • Replace HoneyHive(bearer_auth=...) with HoneyHive(api_key=...)
  • Drop project= from all SDK calls (HoneyHive, HoneyHiveTracer.init, evaluate) and stop relying on HH_PROJECT/HONEYHIVE_PROJECT env vars — the backend infers project from the API key
  • Replace removed sub-clients (client.session, client.tools, client.projects) with their v1 equivalents where applicable
  • Update direct client API code from dict-style response access to typed-model attribute access
  • Review request validation changes if you build payloads for client.events, client.datapoints, client.datasets, or client.metrics
  • Rename ground_truths to ground_truth in datasets
  • Update evaluated functions to accept a datapoint dict
  • Update evaluators to use ground_truth and add a default (ground_truth=None) for unlabeled datapoints
  • Update LLM evaluator templates from feedback.ground_truths to feedback.ground_truth
  • Use tracer instance methods for new enrichment code
  • Migrate enrich_session(outputs=..., metrics=..., ...) free-function calls to tracer.enrich_session(...)
  • Update imports from honeyhive.evaluation to honeyhive.experiments (or the top-level honeyhive package)
  • Convert positional evaluate() arguments to keyword arguments and remove unsupported kwargs (suite, run_concurrently, disable_http_tracing, metadata)
  • Use create_session() or with_session() for request-scoped sessions
  • Review pinned OpenTelemetry or Traceloop instrumentor versions, if any
  • Run an evaluation and confirm table output, scores, and ground truth render correctly

Migrate from Logger to v1

Replace honeyhive-logger calls with SDK v1 tracing.

Migrate from honeyhive-bundled to v1

Replace the bundled package with the stable SDK.

Evaluation Quickstart

Run your first experiment with evaluate().

Tracer Initialization

Choose the right tracer and session pattern for your runtime.

Enriching Traces

Add metadata, metrics, feedback, and user properties to traces.