TypeScript
Reference documentation for the evaluate function
The evaluate function is a core utility designed to orchestrate automated evaluations through HoneyHive’s infrastructure. It provides systematic testing, tracing, and metrics collection capabilities for any TypeScript/JavaScript function, with particular emphasis on AI model evaluation, data processing pipelines, and performance analysis.
The evaluation framework manages the complete lifecycle of an evaluation run, from initialization through execution to completion, while integrating with HoneyHive’s tracing system for comprehensive telemetry capture. A detailed explanation of how tracing works in TypeScript can be found here.
Example Usage
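The original example did not survive extraction, so the snippet below is a minimal sketch of a call, assuming an options-object API built from the parameters documented in this section. The import path, the summarize function, and the lengthCheck evaluator are illustrative, not part of the SDK.

```typescript
// Import path is an assumption; adjust to wherever your SDK exposes evaluate.
import { evaluate } from "honeyhive";

// Hypothetical function under test: accepts an inputs object, returns serializable output.
async function summarize(inputs: Record<string, any>): Promise<string> {
  return `Summary: ${String(inputs.text).slice(0, 50)}`;
}

// Hypothetical evaluator: receives the inputs and the function's output, returns metrics.
function lengthCheck(inputs: Record<string, any>, output: string) {
  return { withinLimit: output.length <= 280 };
}

const result = await evaluate({
  evaluationFunction: summarize,
  hh_api_key: process.env.HH_API_KEY!,
  hh_project: "my-project",
  name: "summarizer-eval-v1",
  query_list: [
    { text: "HoneyHive orchestrates automated evaluations." },
    { text: "Traces capture inputs, outputs, and metrics." },
  ],
  evaluators: [lengthCheck],
});

console.log(result);
```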
Function Signature and Interfaces
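The signature block is likewise missing from this extract; the following reconstruction is inferred from the parameter and return-value descriptions below. The EvaluationResult field names are assumptions.

```typescript
// Reconstructed from the parameter descriptions below; not the SDK's verbatim types.
interface EvaluateConfig {
  evaluationFunction: (inputs: Record<string, any>) => any | Promise<any>;
  hh_api_key: string;
  hh_project: string;
  name: string;
  dataset_id?: string;                 // mutually exclusive with query_list in practice
  query_list?: Record<string, any>[];
  runs?: number;                       // defaults to dataset/query_list length
  evaluators?: Array<(inputs: Record<string, any>, output: any) => Record<string, any>>;
}

// Assumed shape of the resolved value; field names are illustrative (see "Return Value").
interface EvaluationResult {
  runId: string;         // ID of the evaluation run created in HoneyHive
  datasetId: string;     // provided dataset_id, or an ID generated for a query list
  sessionIds: string[];  // one trace session per iteration
  status: string;        // e.g. "completed"
}

declare function evaluate(config: EvaluateConfig): Promise<EvaluationResult>;
```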
Parameters
Required Parameters
- evaluationFunction (Function): The function to evaluate. Must accept an inputs object and return a serializable output.
- hh_api_key (string): API key for authenticating with HoneyHive services.
- hh_project (string): Project identifier in HoneyHive.
- name (string): Identifier for this evaluation run.
Optional Parameters
- dataset_id (string, optional): ID of an existing HoneyHive dataset to use for evaluation inputs.
- query_list (Record<string, any>[], optional): List of input objects to evaluate against. Alternative to using a dataset.
- runs (number, optional): Number of evaluation iterations. Defaults to the dataset/query_list length.
- evaluators (Function[], optional): List of evaluator functions that process inputs and outputs to generate metrics.
Return Value
Returns a Promise that resolves to an evaluation result object:
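The concrete shape of that object was not captured in this extract. Assuming the EvaluationResult interface sketched above, a resolved value might look like:

```typescript
// Purely illustrative values under the assumed EvaluationResult shape.
const result: EvaluationResult = {
  runId: "run_abc123",                  // assumed: ID of the run created in HoneyHive
  datasetId: "EXT-9f8e1c...",           // assumed: provided dataset_id, or generated for a query list
  sessionIds: ["sess_001", "sess_002"], // assumed: one trace session ID per iteration
  status: "completed",                  // assumed: reflects completion or failure
};
```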
Technical Notes
- Execution Flow (a condensed sketch follows below)
  - Validates configuration requirements and credentials
  - Initializes evaluation state and HoneyHive client
  - Loads dataset (HoneyHive dataset or generates ID for query list)
  - Creates evaluation run in HoneyHive
  - For each iteration:
    - Retrieves input data from dataset or query list
    - Initializes HoneyHiveTracer for the iteration
    - Executes evaluation function with inputs
    - Runs evaluators on function outputs
    - Enriches trace with metadata and metrics
    - Collects session ID
  - Updates evaluation status to completed
  - Returns evaluation metadata
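The list above can be read as a single orchestration loop. The sketch below condenses it into runnable TypeScript; every helper (createRun, completeRun, the Tracer class) is a hypothetical stub standing in for HoneyHive API calls, not the SDK's actual internals.

```typescript
type Inputs = Record<string, any>;

// Stubs standing in for HoneyHive API calls.
async function createRun(project: string, name: string): Promise<string> {
  return `run_${name}`;
}
async function completeRun(runId: string): Promise<void> {}

class Tracer {
  sessionId = `sess_${Math.random().toString(36).slice(2)}`;
  enrich(data: { metrics?: Inputs; metadata?: Inputs }): void {}
  async flush(): Promise<void> {}
}

async function runEvaluation(
  evaluationFunction: (inputs: Inputs) => Promise<any> | any,
  evaluators: Array<(inputs: Inputs, output: any) => Inputs>,
  queryList: Inputs[]
): Promise<string[]> {
  const runId = await createRun("my-project", "my-eval");     // create run in HoneyHive
  const sessionIds: string[] = [];

  for (const inputs of queryList) {                           // one iteration per input
    const tracer = new Tracer();                              // new trace session
    const output = await evaluationFunction(inputs);          // run the function under test
    for (const evaluator of evaluators) {
      tracer.enrich({ metrics: evaluator(inputs, output) });  // attach evaluator metrics
    }
    tracer.enrich({ metadata: { runId } });                   // runtime metadata
    sessionIds.push(tracer.sessionId);                        // collect session ID
    await tracer.flush();                                     // flush after each run
  }

  await completeRun(runId);                                   // mark run completed
  return sessionIds;                                          // evaluation metadata
}
```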
- Dataset Processing (hashing illustrated below)
  - Supports both HoneyHive datasets and external query lists
  - Generates MD5 hashes for external datasets
  - Handles datapoint fetching and validation
  - Manages dataset linkage in traces
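As a concrete illustration of the hashing step, the snippet below derives a deterministic ID for an external query list with Node's built-in crypto module. The notes above only say MD5 is used, so the exact serialization and any ID prefix are assumptions.

```typescript
import { createHash } from "node:crypto";

// Hedged sketch: hash the JSON serialization of the query list with MD5
// to get a stable, content-derived dataset ID for external data.
function externalDatasetId(queryList: Record<string, any>[]): string {
  const hash = createHash("md5").update(JSON.stringify(queryList)).digest("hex");
  return `EXT-${hash}`; // prefix is illustrative
}

console.log(externalDatasetId([{ text: "hello" }]));
```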
- Tracing Integration (captured fields sketched below)
  - Creates individual trace sessions per evaluation
  - Captures:
    - Input/output pairs
    - Evaluator metrics
    - Runtime metadata
    - Dataset linkage
  - Automatically flushes traces after each run
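Put together, each iteration's trace session can be thought of as capturing a record like the following; this interface is purely illustrative, not a type exported by the SDK.

```typescript
// Assumed shape of what one per-iteration trace session captures.
interface IterationTrace {
  sessionId: string;
  inputs: Record<string, any>;    // input half of the input/output pair
  output: unknown;                // output half
  metrics: Record<string, any>;   // evaluator results
  metadata: Record<string, any>;  // runtime metadata (run ID, iteration index, ...)
  datasetId: string;              // dataset linkage
  datapointId?: string;           // set when inputs come from a HoneyHive dataset
}
```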
- Error Management (evaluator isolation sketched below)
  - Validates configuration requirements
  - Handles API communication errors
  - Manages evaluator failures independently
  - Preserves partial results on failure
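One way to read "manages evaluator failures independently" is that each evaluator runs inside its own try/catch, so a throwing evaluator neither aborts the run nor blocks the remaining evaluators, and partial metrics are preserved. A hedged sketch, with illustrative names:

```typescript
type Evaluator = (inputs: Record<string, any>, output: unknown) => Record<string, any>;

function runEvaluators(
  evaluators: Evaluator[],
  inputs: Record<string, any>,
  output: unknown
): Record<string, any> {
  const metrics: Record<string, any> = {};
  for (const evaluator of evaluators) {
    try {
      Object.assign(metrics, evaluator(inputs, output));
    } catch (err) {
      // Record the failure as a metric instead of rethrowing,
      // preserving the partial results collected so far.
      metrics[`${evaluator.name || "evaluator"}_error`] = String(err);
    }
  }
  return metrics;
}
```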
Notes
- Either dataset_id or query_list must be provided (both call variants are sketched below)
- External query lists are automatically assigned a dataset ID
- Evaluator functions should handle both inputs and outputs
- All evaluation runs are automatically traced using HoneyHiveTracer
- Evaluation status is updated to reflect completion or failure
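To make the first note concrete, here are the two mutually exclusive call variants, reusing the hypothetical setup from the usage example above:

```typescript
const base = {
  evaluationFunction: summarize,        // hypothetical function from the usage example
  hh_api_key: process.env.HH_API_KEY!,
  hh_project: "my-project",
  name: "summarizer-eval-v1",
};

// Variant 1: evaluate against an existing HoneyHive dataset.
await evaluate({ ...base, dataset_id: "your-dataset-id" });

// Variant 2: evaluate against an inline query list; a dataset ID is generated automatically.
await evaluate({ ...base, query_list: [{ text: "hello" }] });
```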