Reference documentation for the evaluate function
evaluate
The `evaluate` function is a core utility that orchestrates automated evaluations through HoneyHive’s infrastructure. It provides systematic testing, tracing, and metrics collection for any TypeScript/JavaScript function, with particular emphasis on AI model evaluation, data-processing pipelines, and performance analysis.
The evaluation framework manages the complete lifecycle of an evaluation run, from initialization through execution to completion, while integrating with HoneyHive’s tracing system for comprehensive telemetry capture. A detailed explanation of how tracing works in TypeScript is available in HoneyHive’s tracing documentation.
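The sketch below shows a typical end-to-end call, assuming a minimal inline dataset and a single evaluator. The configuration keys used here (`function`, `apiKey`, `project`, `name`, `dataset`, `evaluators`) are illustrative stand-ins matched to the parameter descriptions below, not verified SDK identifiers; consult the SDK’s type definitions for the authoritative property names.

```typescript
import { evaluate } from "honeyhive";

// Minimal end-to-end run: an inline dataset, the function under test, and
// one evaluator. Property names in this config object are assumptions
// matched to the parameter list below, not verified SDK identifiers.
await evaluate({
  function: async (inputs: Record<string, any>) => {
    // Receives an inputs object and returns a serializable output.
    return { answer: `Echo: ${inputs.question}` };
  },
  apiKey: process.env.HH_API_KEY!,
  project: "my-project",
  name: "echo-baseline",
  dataset: [
    { question: "What does evaluate do?" },
    { question: "How are metrics collected?" },
  ],
  evaluators: [
    // Receives the function's outputs and returns a metrics object.
    (outputs: Record<string, any>) => ({
      non_empty: typeof outputs.answer === "string" && outputs.answer.length > 0,
    }),
  ],
});
```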
Parameters

- (Function): The function to evaluate. Its parameters are positional and must be specified in this order: (1) an inputs object, (2) an optional ground-truth object. It must return a serializable output (see the sketch after this list).
- (string): API key for authenticating with HoneyHive services.
- (string): Project identifier in HoneyHive.
- (string): Identifier for this evaluation run.
- (string, optional): Name of the evaluation suite. If not provided, the directory name of the calling script is used.
- (number, optional): Maximum number of concurrent workers for parallel evaluation. Defaults to 10.
- (boolean, optional): Whether to run evaluations concurrently. Defaults to false.
- (string, optional): Custom server URL for the HoneyHive API.
- (boolean, optional): Whether to print detailed logs during evaluation. Defaults to false.
- (boolean, optional): Whether to disable automatic HTTP request tracing. Defaults to false.
- (Record<string, any>, optional): Additional metadata to attach to the evaluation run.
- (Record<string, any>, optional): Modules to instrument for automatic tracing.
- `dataset_id` (string, optional): ID of an existing HoneyHive dataset to use for evaluation inputs.
- `dataset` (Record<string, any>[], optional): List of input objects to evaluate against; an inline alternative to referencing a stored dataset with `dataset_id`.
- (Function[], optional): List of evaluator functions. Evaluator parameters are positional and must be specified in this order: (1) outputs, (2) inputs, (3) ground truths; each evaluator returns metrics for the datapoint (see the sketch after this list).
Either `dataset_id` or `dataset` must be provided.
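To make the positional conventions concrete, the sketch below wires an evaluated function and an evaluator into a run that reads inputs from a stored dataset. The ground-truth field `expected`, the dataset ID, and the non-dataset configuration key names are hypothetical illustrations; only the argument order follows the parameter descriptions above.

```typescript
import { evaluate } from "honeyhive";

// Function under test: positional parameters are (1) an inputs object and
// (2) an optional ground-truth object; the return value must be serializable.
async function summarize(
  inputs: Record<string, any>,
  groundTruth?: Record<string, any>
): Promise<Record<string, any>> {
  return { summary: String(inputs.text).slice(0, 100) };
}

// Evaluator: positional parameters are (1) outputs, (2) inputs, and
// (3) ground truths; it returns a metrics object. The `expected` field is
// a hypothetical ground-truth shape, not a documented schema.
function exactMatch(
  outputs: Record<string, any>,
  inputs: Record<string, any>,
  groundTruth?: Record<string, any>
): Record<string, any> {
  return { exact_match: outputs.summary === groundTruth?.expected };
}

// Reads inputs from an existing HoneyHive dataset via dataset_id instead of
// an inline dataset list; per the note above, one of the two is required.
// Non-dataset config keys are illustrative assumptions.
await evaluate({
  function: summarize,
  apiKey: process.env.HH_API_KEY!,
  project: "my-project",
  name: "summarization-run",
  dataset_id: "your-dataset-id", // placeholder for a real HoneyHive dataset ID
  evaluators: [exactMatch],
});
```

When an inline `dataset` is supplied instead of `dataset_id`, each element of the array is treated as the inputs object for one datapoint.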