- Want to evaluate their data without our Python/TS utilities.
- Need to customize the dataset ingestion process.
- Want more control over how evaluation sessions are tracked.
Prerequisites
Before beginning, ensure the following:

- You have set up manual instrumentation, as explained here.
Evaluation Setup
You have two options for running evaluations: with HoneyHive-provided datasets or with external datasets. Both approaches share common steps, but the specifics differ slightly. Each flow is outlined below, followed by a request-level sketch in Python.

Evaluating with External Datasets

API flow to log an evaluation with a self-managed dataset
- Start the Evaluation Run
  POST /runs: Initiate the evaluation run using external datasets. Full API reference link here.
  No dataset ID is required in this case, as you will manually handle dataset ingestion.
- Fetch the Data
  Manually retrieve data points from your external dataset.
- Session Initialization
  POST /session/start: Start a new session for the evaluation run. Full API reference link here.
  Set the following fields:
  - metadata.run_id = run_id (use the ID from the evaluation run)
- Log Your Events
  Post your evaluation events using your preferred method (e.g., OpenTelemetry, batch endpoints, etc.). Detailed docs for that are available here.
  Set the following:
  - session_id = session_id (use the ID from the session start)
  Make sure to pass any client-side metrics on the relevant event to make later analysis more granular.
- End the Evaluation
  PUT /runs: Mark the evaluation as completed. Full API reference link here.
  Set the following:
  - event_ids: Provide a list of session IDs.
  - status = completed
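Below is a minimal request-level sketch of the external-dataset flow, using Python and the requests library. It is illustrative rather than definitive: the base URL, Authorization header, the project and name fields, the run_id/session_id response fields, the POST /events call used for logging, and the run ID in the PUT path are all assumptions not spelled out in the steps above, so confirm them against the full API reference.

```python
# Sketch of the external-dataset evaluation flow (field names and endpoints
# beyond those named in the steps above are assumptions).
import requests

BASE = "https://api.honeyhive.ai"           # assumed base URL
HEADERS = {"Authorization": "Bearer <HONEYHIVE_API_KEY>"}
PROJECT = "my-project"                      # hypothetical project name

# 1. Start the evaluation run (no dataset_id, since the dataset is self-managed).
run = requests.post(f"{BASE}/runs", headers=HEADERS, json={
    "project": PROJECT,
    "name": "external-dataset-eval",
}).json()
run_id = run["run_id"]                      # assumed response field

# 2. Fetch the data yourself, e.g. from a file or your own database.
datapoints = [{"inputs": {"query": "What is HoneyHive?"}}]  # stand-in for your dataset

session_ids = []
for dp in datapoints:
    # 3. Start a session and link it to the run via metadata.run_id.
    session = requests.post(f"{BASE}/session/start", headers=HEADERS, json={
        "project": PROJECT,
        "metadata": {"run_id": run_id},
    }).json()
    session_id = session["session_id"]      # assumed response field
    session_ids.append(session_id)

    # 4. Log events for the session (OpenTelemetry, batch endpoints, etc.);
    #    a single-event POST is shown here purely for illustration.
    requests.post(f"{BASE}/events", headers=HEADERS, json={
        "session_id": session_id,
        "event_type": "model",
        "inputs": dp["inputs"],
        "metrics": {"latency_ms": 123},     # client-side metrics for later analysis
    })

# 5. Mark the run as completed, passing the session IDs as event_ids.
requests.put(f"{BASE}/runs/{run_id}", headers=HEADERS, json={
    "event_ids": session_ids,
    "status": "completed",
})
```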
Evaluating with HoneyHive Datasets

API flow to log an evaluation with a HoneyHive-managed dataset
- Fetch the Dataset
  GET /datasets: Fetch the dataset you want to evaluate. This will provide the dataset_id. Full API reference link here.
- Start the Evaluation Run
  POST /runs: Initiate the evaluation run. Full API reference link here.
  Set the following fields:
  - dataset_id = dataset_id
- Fetch the Data Points
  GET /datapoint/{id}: Retrieve the specific data points to be used for evaluation. Full API reference link here.
- Session Initialization
  POST /session/start: Start a new session for the evaluation run. Full API reference link here.
  Set the following fields:
  - metadata.run_id = run_id
  - metadata.datapoint_id = datapoint_id
- Log Your Events
  Post your evaluation events using your preferred method (e.g., OpenTelemetry, batch endpoints, etc.). Detailed docs for that are available here.
  Set the following:
  - session_id = session_id
  Make sure to pass any client-side metrics on the relevant event to make later analysis more granular.
- End the Evaluation
  PUT /runs: Mark the evaluation as completed. Full API reference link here.
  Set the following:
  - event_ids: Provide a list of session IDs.
  - status = completed
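The same pattern applies when HoneyHive manages the dataset; the sketch below adds the dataset and datapoint lookups. As above, this is a sketch with assumed details (base URL, auth header, the testcases and datapoints response keys, the run_id/session_id fields, the POST /events call, and the run ID in the PUT path), so verify the exact schemas in the API reference.

```python
# Sketch of the HoneyHive-managed-dataset evaluation flow (response shapes and
# any endpoints beyond those named in the steps above are assumptions).
import requests

BASE = "https://api.honeyhive.ai"            # assumed base URL
HEADERS = {"Authorization": "Bearer <HONEYHIVE_API_KEY>"}
PROJECT = "my-project"                       # hypothetical project name

# 1. Fetch the dataset to evaluate and grab its dataset_id.
resp = requests.get(f"{BASE}/datasets", headers=HEADERS,
                    params={"project": PROJECT}).json()
dataset = resp["testcases"][0]               # assumed response shape
dataset_id = dataset["dataset_id"]           # assumed field name

# 2. Start the evaluation run, linking it to the dataset.
run = requests.post(f"{BASE}/runs", headers=HEADERS, json={
    "project": PROJECT,
    "name": "honeyhive-dataset-eval",
    "dataset_id": dataset_id,
}).json()
run_id = run["run_id"]                       # assumed response field

session_ids = []
for datapoint_id in dataset.get("datapoints", []):   # assumed list of datapoint IDs
    # 3. Fetch the individual data point.
    dp = requests.get(f"{BASE}/datapoint/{datapoint_id}", headers=HEADERS).json()

    # 4. Start a session linked to both the run and the data point.
    session = requests.post(f"{BASE}/session/start", headers=HEADERS, json={
        "project": PROJECT,
        "metadata": {"run_id": run_id, "datapoint_id": datapoint_id},
    }).json()
    session_id = session["session_id"]       # assumed response field
    session_ids.append(session_id)

    # 5. Log events for the session (OpenTelemetry, batch endpoints, etc.);
    #    a single-event POST is shown purely for illustration.
    requests.post(f"{BASE}/events", headers=HEADERS, json={
        "session_id": session_id,
        "event_type": "model",
        "metrics": {"latency_ms": 123},      # client-side metrics for later analysis
    })

# 6. Mark the run as completed, passing the session IDs as event_ids.
requests.put(f"{BASE}/runs/{run_id}", headers=HEADERS, json={
    "event_ids": session_ids,
    "status": "completed",
})
```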

