This method is designed for users who:

  • Want to evaluate their data without our Python/TS utilities.
  • Need to customize the dataset ingestion process.
  • Want more control over how evaluation sessions are tracked.

You can directly use our APIs to track your evaluation runs and sessions, enabling flexibility in how you set up and execute evaluations.

Where possible, we recommend using HoneyHive datasets to simplify ingestion, improve linking, and reduce the overhead of manual data management.

Prerequisites

Before beginning, ensure the following:

  • You have set up manual instrumentation, as explained here.

Evaluation Setup

You have two options for running evaluations: with HoneyHive-provided datasets or with external datasets. Both approaches share common steps, but the specifics differ slightly.


Evaluating with External Datasets

API flow to log an evaluation with a self-managed dataset

For evaluations using external datasets, follow these steps (a minimal end-to-end sketch follows this list):

  1. Start the Evaluation Run

    • Create a new evaluation run. No dataset ID is required in this case, as you will manually handle dataset ingestion.

  2. Fetch the Data

    • Manually retrieve data points from your external dataset.
  3. Session Initialization

    • POST /session/start: Start a new session for the evaluation run. See the full API reference here.
      Set the following fields:
      • metadata.run_id = run_id (use the ID from the evaluation run)
  4. Log Your Events

    • Post your evaluation events using your preferred method (e.g., OpenTelemetry, batch endpoints). Detailed docs are available here. Set the following:
      • session_id = session_id (use the ID from the session start)
    • Make sure to pass any client-side metrics on the relevant events so later analysis is more granular.
  5. End the Evaluation

    • PUT /runs: Mark the evaluation as completed. See the full API reference here.
      Set the following:
      • event_ids: Provide the list of session IDs from the sessions started above.
      • status = completed
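
Below is a minimal Python sketch of this flow using the requests library. The base URL, the authentication header, the POST /runs endpoint for creating the run, the payload envelopes, and the response field names (run_id, session_id) are assumptions for illustration; only the fields named in the steps above (metadata.run_id, session_id, event_ids, status) come from this guide. Check the API reference for the exact schemas.

import requests

# Assumptions for illustration, not confirmed by this guide: base URL, bearer-token
# auth, a POST /runs endpoint for run creation, payload envelopes, and response
# field names. Only metadata.run_id, session_id, event_ids, and status come from
# the steps above.
API_BASE = "https://api.honeyhive.ai"                # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumed auth scheme

# 1. Start the evaluation run. No dataset ID, since the dataset lives outside HoneyHive.
run = requests.post(
    f"{API_BASE}/runs",
    headers=HEADERS,
    json={"project": "my-project", "name": "external-dataset-eval"},  # assumed fields
).json()
run_id = run["run_id"]                               # assumed response shape

# 2. Fetch the data from your own store (illustrative stand-in).
datapoints = [{"inputs": {"query": "What does HoneyHive do?"}}]

session_ids = []
for dp in datapoints:
    # 3. Start a session and link it to the run via metadata.run_id.
    session = requests.post(
        f"{API_BASE}/session/start",
        headers=HEADERS,
        json={
            "session": {                             # assumed payload envelope
                "project": "my-project",
                "session_name": "external-dataset-eval",
                "metadata": {"run_id": run_id},
            }
        },
    ).json()
    session_id = session["session_id"]               # assumed response shape
    session_ids.append(session_id)

    # 4. Run your pipeline on dp and log events (OpenTelemetry, batch endpoints, etc.),
    #    setting session_id on every event so it attaches to this session.

# 5. Mark the run as completed, passing the session IDs as event_ids.
requests.put(
    f"{API_BASE}/runs",                              # path taken from the step above; confirm in the API reference
    headers=HEADERS,
    json={"run_id": run_id, "event_ids": session_ids, "status": "completed"},
)

Any evaluator scores you compute client-side can be attached as metrics on the relevant events before the run is marked completed, which keeps later analysis granular.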

Evaluating with HoneyHive Datasets

API flow to log an evaluation with a HoneyHive-managed dataset

For evaluations using datasets provided by HoneyHive, the process is simpler (again, a sketch follows this list):

  1. Fetch the Dataset

    • Retrieve the dataset you want to evaluate against from your HoneyHive project.
  2. Start the Evaluation Run

    • Create a new evaluation run that references the dataset you fetched.
  3. Fetch the Data Points

    • Retrieve the individual data points from the dataset; each one will be linked to a session via its datapoint_id.
  4. Session Initialization

    • POST /session/start: Start a new session for the evaluation run. See the full API reference here.
      Set the following fields:
      • metadata.run_id = run_id
      • metadata.datapoint_id = datapoint_id
  5. Log Your Events

    • Post your evaluation events using your preferred method (e.g., OpenTelemetry, batch endpoints). Detailed docs are available here. Set the following:
      • session_id = session_id
    • Make sure to pass any client-side metrics on the relevant events so later analysis is more granular.
  6. End the Evaluation

    • PUT /runs: Mark the evaluation as completed. See the full API reference here.
      Set the following:
      • event_ids: Provide the list of session IDs from the sessions started above.
      • status = completed
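
A corresponding sketch for the HoneyHive-dataset flow is shown below, under the same assumptions as the external-dataset sketch (base URL, auth header, run-creation endpoint, payload envelopes, response shapes); the dataset-retrieval endpoint, its query parameters, and its response fields are likewise assumed for illustration.

import requests

API_BASE = "https://api.honeyhive.ai"                # assumed base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # assumed auth scheme

# 1. Fetch the dataset (endpoint, query parameters, and response shape are assumed).
dataset = requests.get(
    f"{API_BASE}/datasets",
    headers=HEADERS,
    params={"project": "my-project", "name": "my-dataset"},
).json()["datasets"][0]                              # assumed response shape

# 2. Start the evaluation run, referencing the dataset.
run = requests.post(
    f"{API_BASE}/runs",
    headers=HEADERS,
    json={
        "project": "my-project",
        "name": "hh-dataset-eval",
        "dataset_id": dataset["dataset_id"],         # assumed field names
    },
).json()
run_id = run["run_id"]

# 3.-5. For each data point, start a session linked to both the run and the datapoint,
# then log events against it exactly as in the external-dataset sketch.
session_ids = []
for datapoint_id in dataset["datapoints"]:           # assumed field holding datapoint IDs
    session = requests.post(
        f"{API_BASE}/session/start",
        headers=HEADERS,
        json={
            "session": {                             # assumed payload envelope
                "project": "my-project",
                "session_name": "hh-dataset-eval",
                "metadata": {"run_id": run_id, "datapoint_id": datapoint_id},
            }
        },
    ).json()
    session_ids.append(session["session_id"])
    # ... log events with session_id set on each one ...

# 6. Mark the run as completed.
requests.put(
    f"{API_BASE}/runs",
    headers=HEADERS,
    json={"run_id": run_id, "event_ids": session_ids, "status": "completed"},
)

The only practical difference from the external flow is that the dataset and its datapoint IDs come from HoneyHive, which lets each session be traced back to its source datapoint via metadata.datapoint_id.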

Conclusion

Manual evaluation instrumentation allows for flexibility in how you handle your datasets and evaluation sessions. Whether using external datasets or those provided by HoneyHive, the key steps remain the same: initiating the run, starting sessions, logging events, and finalizing the evaluation.

If you have any questions or need help, reach out to our support team for assistance with logging your evaluation data to HoneyHive.