# Reference documentation for the `evaluate` function
The `evaluate` function is a core utility designed to orchestrate automated evaluations through HoneyHive’s infrastructure. It provides systematic testing, tracing, and metrics collection capabilities for any TypeScript/JavaScript function, with particular emphasis on AI model evaluation, data processing pipelines, and performance analysis.
The evaluation framework manages the complete lifecycle of an evaluation run, from initialization through execution to completion, while integrating with HoneyHive’s tracing system for comprehensive telemetry capture. A detailed explanation of how tracing works in TypeScript can be found here.
## Example Usage
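The original example did not survive on this page, so here is a minimal sketch. It assumes the SDK exports `evaluate` from the `honeyhive` package and accepts a single configuration object whose field names match the Parameters section below; the actual call shape may differ, so the `evaluate` call is shown as a comment (it also requires a live HoneyHive API key).

```typescript
// Assumed import path — check the SDK's actual exports:
// import { evaluate } from "honeyhive";

// The function under evaluation. Positional parameters, in order:
// (1) an inputs object, (2) an optional ground-truth object.
// It returns a serializable output.
async function answerQuestion(
  inputs: Record<string, any>,
  groundTruth?: Record<string, any>
): Promise<string> {
  // Placeholder logic standing in for a real model call.
  return `Echo: ${inputs.question}`;
}

// An evaluator. Positional parameters, in order:
// (1) outputs, (2) inputs, (3) ground truths. Returns a metrics object.
function exactMatch(
  outputs: any,
  inputs: Record<string, any>,
  groundTruths?: Record<string, any>
): Record<string, any> {
  return { exact_match: outputs === groundTruths?.answer };
}

// Hypothetical invocation — field names taken from the parameter list below:
// const result = await evaluate({
//   function: answerQuestion,
//   apiKey: process.env.HH_API_KEY!,
//   project: "my-project",
//   name: "qa-eval-run",
//   dataset: [{ question: "What is 2 + 2?" }],
//   evaluators: [exactMatch],
// });
```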
## Function Signature and Interfaces
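The signature block itself is missing from this page. The following is a hypothetical reconstruction of the configuration shape from the parameter list below; the SDK’s actual exported types may be named and structured differently.

```typescript
// Hypothetical config shape assembled from the Parameters section below.
interface EvaluateConfig {
  // Positional args: (1) inputs object, (2) optional ground truth.
  function: (
    inputs: Record<string, any>,
    groundTruth?: Record<string, any>
  ) => any | Promise<any>;
  apiKey: string;
  project: string;
  name: string;
  suite?: string;               // defaults to the calling script's directory name
  maxWorkers?: number;          // defaults to 10
  runConcurrently?: boolean;    // defaults to false
  serverUrl?: string;
  verbose?: boolean;            // defaults to false
  disableHttpTracing?: boolean; // defaults to false
  metadata?: Record<string, any>;
  instrumentModules?: Record<string, any>;
  datasetId?: string;           // either datasetId or dataset must be provided
  dataset?: Record<string, any>[];
  // Positional args: (1) outputs, (2) inputs, (3) ground truths.
  evaluators?: ((
    outputs: any,
    inputs: Record<string, any>,
    groundTruths?: Record<string, any>
  ) => Record<string, any>)[];
}
```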
## Parameters

### Required Parameters
- `function` (`Function`): The function to evaluate. Its parameters are positional and must appear in this order: (1) an inputs object, (2) an optional ground-truth object. It must return a serializable output.
- `apiKey` (`string`): API key for authenticating with HoneyHive services.
- `project` (`string`): Project identifier in HoneyHive.
- `name` (`string`): Identifier for this evaluation run.
- `suite` (`string`, optional): Name of the evaluation suite. If not provided, uses the directory name of the calling script.
- `maxWorkers` (`number`, optional): Maximum number of concurrent workers for parallel evaluation. Defaults to 10.
- `runConcurrently` (`boolean`, optional): Whether to run evaluations concurrently. Defaults to false.
- `serverUrl` (`string`, optional): Custom server URL for the HoneyHive API.
- `verbose` (`boolean`, optional): Whether to print detailed logs during evaluation. Defaults to false.
- `disableHttpTracing` (`boolean`, optional): Whether to disable automatic HTTP request tracing. Defaults to false.
- `metadata` (`Record<string, any>`, optional): Additional metadata to attach to the evaluation run.
- `instrumentModules` (`Record<string, any>`, optional): Modules to instrument for automatic tracing.
### Optional Parameters
- `datasetId` (`string`, optional): ID of an existing HoneyHive dataset to use for evaluation inputs.
- `dataset` (`Record<string, any>[]`, optional): List of input objects to evaluate against. Alternative to using a HoneyHive dataset via `datasetId`.
- `evaluators` (`Function[]`, optional): List of evaluator functions used to generate metrics. Each evaluator’s parameters are positional and must appear in this order: (1) outputs, (2) inputs, (3) ground truths.
## Return Value

Returns a Promise that resolves to an evaluation result object.
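The concrete type definition is missing from this page. The sketch below is a plausible shape inferred from the Technical Notes (run creation, per-iteration session IDs, final status); the SDK’s actual result type and field names may differ.

```typescript
// Plausible result shape — field names are assumptions, not the SDK's type.
interface EvaluationResult {
  runId: string;        // ID of the evaluation run created in HoneyHive
  datasetId?: string;   // linked dataset (generated for external datasets)
  sessionIds: string[]; // one trace session per evaluated datapoint
  status: "completed" | "failed";
}
```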
## Technical Notes
- **Execution Flow**
  - Validates configuration requirements and credentials
  - Initializes evaluation state and HoneyHive client
  - Loads dataset (HoneyHive dataset or generates ID for in-code datasets)
  - Creates evaluation run in HoneyHive
  - For each iteration:
    - Retrieves input data from dataset
    - Initializes HoneyHiveTracer for the iteration
    - Executes evaluation function with inputs
    - Runs evaluators on function outputs
    - Enriches trace with metadata and metrics
    - Collects session ID
  - Updates evaluation status to completed
  - Returns evaluation metadata
- **Dataset Processing**
  - Supports both HoneyHive datasets and external datasets
  - Generates MD5 hashes for external datasets
  - Handles datapoint fetching and validation
  - Manages dataset linkage in traces
- **Tracing Integration**
  - Creates individual trace sessions per evaluation
  - Captures:
    - Input/output pairs
    - Evaluator metrics
    - Runtime metadata
    - Dataset linkage
  - Automatically flushes traces after each run
- **Error Management**
  - Validates configuration requirements
  - Handles API communication errors
  - Manages evaluator failures independently
  - Preserves partial results on failure
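The dataset-fingerprinting step under Dataset Processing (“Generates MD5 hashes for external datasets”) can be sketched as follows. The exact serialization the SDK hashes is an assumption; the point is that the same external dataset deterministically maps to the same generated ID.

```typescript
import { createHash } from "crypto";

// Sketch of deterministic dataset fingerprinting: hash the serialized
// datapoints so identical external datasets produce identical IDs.
// JSON.stringify is an assumed serialization; the SDK's may differ.
function externalDatasetHash(dataset: Record<string, any>[]): string {
  return createHash("md5").update(JSON.stringify(dataset)).digest("hex");
}
```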
## Notes
- Either `datasetId` or `dataset` must be provided
- External datasets are automatically assigned a dataset ID
- Evaluator functions should handle both inputs and outputs
- All evaluation runs are automatically traced using HoneyHiveTracer
- Evaluation status is updated to reflect completion or failure