- Want to evaluate their data without our Python/TS utilities.
- Need to customize the dataset ingestion process.
- Want more control over how evaluation sessions are tracked.
Where possible, we recommend using HoneyHive datasets to simplify ingestion, improve linking and reduce the overhead of manual data management.
Prerequisites
Before beginning, ensure the following:
- You have set up manual instrumentation, as explained here.
Evaluation Setup
You have two options for running evaluations: with HoneyHive-provided datasets or with external datasets. Both approaches share common steps, but the specifics differ slightly.

Evaluating with External Datasets

API flow to log an evaluation with a self-managed dataset
- Start the Evaluation Run
  POST /runs: Initiate the evaluation run using external datasets. Full API reference link here.
  No dataset ID is required in this case, as you will manually handle dataset ingestion.
- Fetch the Data
  Manually retrieve data points from your external dataset.
- Session Initialization
  POST /session/start: Start a new session for the evaluation run. Full API reference link here.
  Set the following fields:
  metadata.run_id = run_id (use the ID from the evaluation run)
- Log Your Events
  Post your evaluation events using your preferred method (e.g., OpenTelemetry, batch endpoints, etc.). Detailed docs for that are available here.
  Set the following:
  session_id = session_id (use the ID from the session start)
  Make sure to pass any client-side metrics on the relevant events to make later analysis more granular.
- End the Evaluation
  PUT /runs: Mark the evaluation as completed. Full API reference link here.
  Set the following:
  event_ids: Provide a list of the session IDs created above.
  status = completed
  An end-to-end code sketch of this flow follows below.
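The sketch below walks through these steps in Python with the requests library. The base URL, the bearer-token auth header, the request-body field names (project, name, session, and so on), the response field names (run_id, session_id), and the load_my_external_dataset helper are assumptions for illustration rather than the authoritative schema; use the API reference pages linked above for the exact request and response shapes.

```python
import os

import requests

# Assumed base URL and bearer-token auth; adjust to your environment.
BASE_URL = "https://api.honeyhive.ai"
HEADERS = {"Authorization": f"Bearer {os.environ['HH_API_KEY']}"}
PROJECT = "my-project"  # placeholder project name


def load_my_external_dataset():
    # Placeholder for your own data source; replace with your loader.
    return [{"inputs": {"query": "example question"}, "ground_truth": "example answer"}]


# 1. Start the evaluation run. No dataset_id is passed, since the dataset
#    is self-managed. Body and response field names are assumptions.
run_resp = requests.post(
    f"{BASE_URL}/runs",
    headers=HEADERS,
    json={"project": PROJECT, "name": "external-dataset-eval"},
)
run_id = run_resp.json()["run_id"]

session_ids = []
for datapoint in load_my_external_dataset():
    # 2-3. Start a session for this data point and link it to the run
    #      via metadata.run_id.
    session_resp = requests.post(
        f"{BASE_URL}/session/start",
        headers=HEADERS,
        json={
            "session": {
                "project": PROJECT,
                "session_name": "external-dataset-eval",
                "metadata": {"run_id": run_id},
            }
        },
    )
    session_id = session_resp.json()["session_id"]
    session_ids.append(session_id)

    # 4. Run your application on `datapoint` and log the resulting events
    #    against session_id (OpenTelemetry, batch endpoints, etc.),
    #    including any client-side metrics on the relevant events.

# 5. Mark the evaluation run as completed, passing the session IDs.
requests.put(
    f"{BASE_URL}/runs",
    headers=HEADERS,
    json={"run_id": run_id, "event_ids": session_ids, "status": "completed"},
)
```

The key linkage points are metadata.run_id on each session and the list of session IDs passed as event_ids when the run is marked completed.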
Evaluating with HoneyHive Datasets

API flow to log an evaluation with a HoneyHive-managed dataset
- Fetch the Dataset
  GET /datasets: Fetch the dataset you want to evaluate. This will provide the dataset_id. Full API reference link here.
- Start the Evaluation Run
  POST /runs: Initiate the evaluation run. Full API reference link here.
  Set the following fields:
  dataset_id = dataset_id (use the ID from the dataset fetch)
- Fetch the Data Points
  GET /datapoint/{id}: Retrieve the specific data points to be used for evaluation. Full API reference link here.
- Session Initialization
  POST /session/start: Start a new session for the evaluation run. Full API reference link here.
  Set the following fields:
  metadata.run_id = run_id (use the ID from the evaluation run)
  metadata.datapoint_id = datapoint_id (use the ID of the data point being evaluated)
- Log Your Events
  Post your evaluation events using your preferred method (e.g., OpenTelemetry, batch endpoints, etc.). Detailed docs for that are available here.
  Set the following:
  session_id = session_id (use the ID from the session start)
  Make sure to pass any client-side metrics on the relevant events to make later analysis more granular.
- End the Evaluation
  PUT /runs: Mark the evaluation as completed. Full API reference link here.
  Set the following:
  event_ids: Provide a list of the session IDs created above.
  status = completed
  An end-to-end code sketch of this flow follows below.
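Below is the corresponding sketch for a HoneyHive-managed dataset, under the same assumptions as the previous example (the base URL, auth header, and request/response field names are illustrative, not authoritative). The differences are that the dataset and its data points are fetched from HoneyHive, dataset_id is passed when starting the run, and metadata.datapoint_id is set on each session.

```python
import os

import requests

# Assumed base URL and bearer-token auth; adjust to your environment.
BASE_URL = "https://api.honeyhive.ai"
HEADERS = {"Authorization": f"Bearer {os.environ['HH_API_KEY']}"}
PROJECT = "my-project"  # placeholder project name

# 1. Fetch the dataset to evaluate. The query parameter and the response
#    field names below are assumptions; inspect the actual response to
#    locate your dataset_id and its data point IDs.
datasets = requests.get(
    f"{BASE_URL}/datasets", headers=HEADERS, params={"project": PROJECT}
).json()
dataset = datasets["datasets"][0]       # assumed response shape
dataset_id = dataset["dataset_id"]      # assumed field name
datapoint_ids = dataset["datapoints"]   # assumed field name

# 2. Start the evaluation run, linking it to the dataset via dataset_id.
run_id = requests.post(
    f"{BASE_URL}/runs",
    headers=HEADERS,
    json={"project": PROJECT, "name": "honeyhive-dataset-eval", "dataset_id": dataset_id},
).json()["run_id"]

session_ids = []
for datapoint_id in datapoint_ids:
    # 3. Fetch the data point itself.
    datapoint = requests.get(
        f"{BASE_URL}/datapoint/{datapoint_id}", headers=HEADERS
    ).json()

    # 4. Start a session linked to both the run and the data point.
    session_id = requests.post(
        f"{BASE_URL}/session/start",
        headers=HEADERS,
        json={
            "session": {
                "project": PROJECT,
                "session_name": "honeyhive-dataset-eval",
                "metadata": {"run_id": run_id, "datapoint_id": datapoint_id},
            }
        },
    ).json()["session_id"]
    session_ids.append(session_id)

    # 5. Run your application on `datapoint` and log the resulting events
    #    against session_id, including any client-side metrics.

# 6. Mark the run as completed with the collected session IDs.
requests.put(
    f"{BASE_URL}/runs",
    headers=HEADERS,
    json={"run_id": run_id, "event_ids": session_ids, "status": "completed"},
)
```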