Documentation Index

Fetch the complete documentation index at: https://docs.honeyhive.ai/llms.txt

Use this file to discover all available pages before exploring further.

HoneyHive Python SDK v1.0 supports the HoneyHive v2 platform launch. Use this guide to upgrade v0.x projects to SDK v1 and adopt v2-ready tracing and evaluation patterns.
Use this HoneyHive migration guide: https://docs.honeyhive.ai/v2/sdk-reference/python/migration/v0-to-v1.md

Explore my codebase to find where the HoneyHive Python SDK is used. Ask me any questions needed to understand the project context, runtime, and migration constraints. Then present a concise migration plan for my confirmation. After I confirm the plan, implement the migration changes.

What changed

Area | v0.x | v1.0
Dataset ground truth key | ground_truths | ground_truth
Client API responses | Nested dict-style access common | Typed response models use attribute access
Evaluated function input | Commonly (inputs, ground_truths) | A single datapoint dict
Evaluator parameters | outputs, inputs, ground_truths | outputs, inputs, ground_truth
Result display | Manual inspection | print_results=True, or print_results=False plus result.print_table()
Session isolation | Instance-level session state | create_session() and with_session() use OpenTelemetry baggage
Enrichment | Global helper functions common | Tracer instance methods recommended

1. Upgrade the package

pip install --upgrade "honeyhive>=1.0.0"
Verify the installed version:
import honeyhive

print(honeyhive.__version__)

2. Update direct client API usage

If you use lower-level client APIs such as client.events, client.datapoints, client.datasets, or client.metrics, many nested request and response objects are typed Pydantic models in v1. For responses, replace dict-style access with attribute access:
# v0.x (dict-style access)
new_ids = response.result["insertedIds"]
event_id = response.events[0]["event_id"]
metric_name = metrics[0]["name"]

# v1.0 (attribute access)
new_ids = response.result.insertedIds
event_id = response.events[0].event_id
metric_name = metrics[0].name
If you still need the dict shape for logging or serialization, call model_dump() on the typed model:
payload = response.result.model_dump()
event_payload = response.events[0].model_dump()
For requests, nested dicts are still commonly coerced into the right typed models, but wrapper methods now validate payloads earlier. For example, a missing required event_type can now raise pydantic.ValidationError at request-construction time instead of surfacing only as a backend error after the request is sent.

If your environment pins OpenTelemetry or Traceloop instrumentor versions, review those pins during the upgrade. SDK v1 requires newer minimum versions than older 0.x releases.
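The earlier-validation behavior can be sketched with a stand-in model. This is illustrative only — real v1 request models are Pydantic classes whose names and fields may differ — but it shows why payload bugs now surface when the request object is built rather than after the request is sent:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a typed request model. In the real SDK,
# pydantic.ValidationError plays the role of this ValueError.
@dataclass
class EventRequest:
    project: str
    event_type: str  # required in v1; an empty/missing value fails immediately

    def __post_init__(self):
        if not self.event_type:
            raise ValueError("event_type is required")

try:
    EventRequest(project="support-bot", event_type="")
except ValueError as exc:
    print(f"rejected at construction time: {exc}")
```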

3. Use ground_truth in datasets

SDK v1 reads the singular ground_truth key from experiment datapoints. If your v0 datasets use ground_truths, rename that field in local datasets and dataset creation code. This is a field-name change, not a restriction on the value shape. If you previously stored multiple expected values inside ground_truths, keep the same inner structure under ground_truth.
dataset = [
    {
        "inputs": {"query": "What is the capital of France?"},
        "ground_truth": {"answer": "Paris"},
    },
    {
        "inputs": {"query": "What is 2+2?"},
        "ground_truth": {"answer": "4"},
    },
]
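If you have many existing v0 datapoints, a small helper can rename the key in one pass. migrate_datapoint is a hypothetical name, not part of the SDK; it only renames the field and leaves the inner value shape untouched:

```python
def migrate_datapoint(dp: dict) -> dict:
    """Rename the v0 'ground_truths' key to the v1 'ground_truth' key."""
    dp = dict(dp)  # shallow copy so the original datapoint is not mutated
    if "ground_truths" in dp and "ground_truth" not in dp:
        dp["ground_truth"] = dp.pop("ground_truths")
    return dp

v0_dataset = [
    {"inputs": {"query": "What is 2+2?"}, "ground_truths": {"answer": "4"}},
]
v1_dataset = [migrate_datapoint(dp) for dp in v0_dataset]
```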

4. Update evaluated functions

In v1, evaluate() passes the full datapoint to your function. Read inputs and ground_truth from that datapoint.
def classify_intent(datapoint):
    text = datapoint["inputs"]["text"]
    return {"intent": classify(text)}

evaluate(
    function=classify_intent,
    dataset=dataset,
    name="intent-classifier",
    project="support-bot",
)
If your function needs the active tracer created by evaluate(), add a tracer keyword parameter: def classify_intent(datapoint, tracer): ....
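A sketch of that tracer-aware signature, with a stand-in classifier so it runs without the SDK. Per the guide, evaluate() injects the tracer when the parameter is declared; the None default here is a convenience assumption so the function also works when called directly, and enrich_session is the instance method shown elsewhere in this guide:

```python
def classify_intent(datapoint, tracer=None):
    text = datapoint["inputs"]["text"]
    intent = "billing" if "charge" in text else "other"  # stand-in classifier
    if tracer is not None:
        tracer.enrich_session(metadata={"evaluated": True})
    return {"intent": intent}

print(classify_intent({"inputs": {"text": "charge issue"}}))  # {'intent': 'billing'}
```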

5. Update evaluator parameters

In v1, evaluators can be plain functions. @evaluator is still accepted for compatibility but no longer required. Rename the third parameter to ground_truth so it matches the v1 dataset shape.
from honeyhive import evaluator

@evaluator
def accuracy_check(outputs, inputs, ground_truth):
    expected = ground_truth["intent"]
    actual = outputs["intent"]
    return {
        "score": 1.0 if actual == expected else 0.0,
        "pass": actual == expected,
    }
@evaluator remains available for compatibility, but plain functions can be passed directly in the evaluators list.
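For example, the same check as an undecorated function (exact_match is an illustrative name, not an SDK helper):

```python
def exact_match(outputs, inputs, ground_truth):
    # Compare the produced intent against the expected intent.
    match = outputs.get("intent") == ground_truth.get("intent")
    return {"score": 1.0 if match else 0.0, "pass": match}

# Passed directly, e.g. evaluate(..., evaluators=[exact_match])
print(exact_match({"intent": "billing"}, {"text": "charge issue"}, {"intent": "billing"}))
```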

6. Update LLM evaluator templates

If you reference ground truth in evaluator prompts, update template variables.
prompt = """
Compare the output to the expected answer.

Expected: {{feedback.ground_truth.answer}}
Actual: {{outputs.answer}}
"""

7. Use the v1 result summary

evaluate() returns an experiment result summary and prints a formatted table by default. If you want to render the table yourself, pass print_results=False and call result.print_table(...) later.
from honeyhive import evaluate

result = evaluate(
    function=classify_intent,
    dataset=dataset,
    evaluators=[accuracy_check],
    name="intent-classifier",
    project="support-bot",
    print_results=False,
)

result.print_table(run_name="intent-classifier")
If you do not need manual control, omit print_results=False and let evaluate() print the table automatically.

8. Update enrichment calls

Free functions such as enrich_span() still work in v1, but instance methods are the primary pattern because they are explicit and work better with multiple tracers.
from honeyhive import enrich_span, trace

@trace
def retrieve_documents(query):
    docs = vector_search(query)
    enrich_span(
        metadata={"query": query},
        outputs={"documents": docs},
    )
    return docs
Use explicit namespaces for structured data:
Namespace | Use for
metadata | Custom searchable context
metrics | Scores, latency, token counts, numeric measurements
feedback | Ratings, labels, corrections, ground truth
inputs | Span or session input payloads
outputs | Span or session output payloads
config | Model and application configuration
user_properties | User attributes
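The namespaces map onto keyword arguments of enrichment calls. A shape-only sketch — all values below are illustrative, and whether you pass them to the free enrich_span() or a tracer instance method depends on your setup:

```python
# Illustrative enrichment payload grouped by namespace.
enrichment = {
    "inputs": {"query": "What is the capital of France?"},
    "outputs": {"documents": ["doc-1", "doc-2"]},
    "metadata": {"retriever": "vector-db"},
    "metrics": {"latency_ms": 42, "num_docs": 2},
    "feedback": {"ground_truth": {"answer": "Paris"}},
    "config": {"model": "gpt-4o", "temperature": 0.2},
    "user_properties": {"plan": "pro"},
}
# e.g. tracer.enrich_span(**enrichment)  # hypothetical call site
```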
See Enriching Traces and Enrichment Schema.

9. Replace manual session state in concurrent apps

If your v0 integration reused one tracer across many requests, create request-scoped sessions in v1.
from fastapi import FastAPI, Request
from honeyhive import HoneyHiveTracer, trace

app = FastAPI()
tracer = HoneyHiveTracer.init(
    api_key="your-api-key",
    project="support-bot",
)

@app.middleware("http")
async def honeyhive_session(request: Request, call_next):
    await tracer.acreate_session(
        session_name=f"{request.method} {request.url.path}",
        inputs={"path": str(request.url.path)},
    )
    response = await call_next(request)
    tracer.enrich_session(outputs={"status_code": response.status_code})
    return response

@app.post("/chat")
@trace(event_type="chain", tracer=tracer)
async def chat(payload: dict):
    return await answer(payload["message"])
For scoped scripts, use with_session():
with tracer.with_session("batch-job", inputs={"file": "tickets.csv"}):
    run_batch_job()
See Tracer Initialization and Multi-instance Tracing.

10. Simplify distributed tracing

v1 includes helpers for carrying OpenTelemetry trace context across services.
from opentelemetry import context
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

def handle_request(request):
    propagator = TraceContextTextMapPropagator()
    ctx = propagator.extract(request.headers)
    token = context.attach(ctx)
    try:
        return process_request()
    finally:
        context.detach(token)
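On the wire, TraceContextTextMapPropagator carries this context in the W3C traceparent header (version-traceid-spanid-flags). A stdlib sketch of the format, for reference when debugging headers between services (format_traceparent is an illustrative helper, not an SDK or OpenTelemetry function):

```python
def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    # W3C Trace Context: 2-hex version, 32-hex trace id, 16-hex span id, 2-hex flags
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"

print(format_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7))
# 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```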
See Distributed Tracing.

Troubleshooting

Ground truth does not appear in HoneyHive

Check that every datapoint uses ground_truth, not only ground_truths. SDK v1 reads ground_truth from each datapoint.
dataset = [{"inputs": {"text": "charge issue"}, "ground_truth": {"intent": "billing"}}]

Dict-style client response access breaks after upgrading

If you use direct client APIs, switch from subscripts to attributes on typed response models:
new_ids = response.result.insertedIds
event_id = response.events[0].event_id
Use model_dump() only when you need a plain dict for serialization or an external boundary.

Evaluators do not receive ground truth

Use the singular evaluator parameter:
def my_evaluator(outputs, inputs, ground_truth=None):
    if ground_truth is None:
        return {"score": 0.0}
    return {"score": outputs["answer"] == ground_truth["answer"]}

Traces from different requests share one session

Use create_session() or acreate_session() per request instead of mutating tracer.session_id or relying on session_start() in a shared server process.

Import errors after upgrading

Confirm the runtime is using the upgraded package:
import honeyhive

print(honeyhive.__version__)

Migration checklist

  • Upgrade to honeyhive>=1.0.0
  • Update direct client API code from dict-style response access to typed-model attribute access
  • Review request validation changes if you build payloads for client.events, client.datapoints, client.datasets, or client.metrics
  • Rename ground_truths to ground_truth in datasets
  • Update evaluated functions to accept a datapoint dict
  • Update evaluators to use ground_truth
  • Update LLM evaluator templates from feedback.ground_truths to feedback.ground_truth
  • Use tracer instance methods for new enrichment code
  • Use create_session() or with_session() for request-scoped sessions
  • Review pinned OpenTelemetry or Traceloop instrumentor versions, if any
  • Run an evaluation and confirm table output, scores, and ground truth render correctly

Migrate from Logger to v1

Replace honeyhive-logger calls with SDK v1 tracing.

Migrate from honeyhive-bundled to v1

Replace the bundled package with the stable SDK.

Evaluation Quickstart

Run your first experiment with evaluate().

Tracer Initialization

Choose the right tracer and session pattern for your runtime.

Enriching Traces

Add metadata, metrics, feedback, and user properties to traces.