Reference documentation for the evaluate function
The evaluate function is a core utility designed to orchestrate automated evaluations through HoneyHive’s infrastructure. It provides systematic testing, tracing, and metrics collection for any Python function, and is particularly useful for evaluating AI model outputs, data processing pipelines, or any computational process that requires detailed performance analysis.
The evaluation framework integrates with HoneyHive’s tracing system to capture detailed telemetry about each evaluation run, including inputs, outputs, metrics, and runtime metadata. A detailed explanation of how tracing works in Python can be found here.
Example Usage
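The snippet below is a minimal sketch of a typical call, assuming evaluate is imported from the honeyhive package (the exact import path depends on your SDK version); pipeline, exact_match, and the query fields are illustrative placeholders, not part of the SDK.

```python
from honeyhive import evaluate  # import path assumed; adjust to your SDK version


def pipeline(inputs):
    """Toy evaluation target: accepts a dict of inputs, returns a serializable output."""
    return {"answer": inputs["question"].upper()}


def exact_match(inputs, output):
    """Toy evaluator: compares the output against an expected value from the inputs."""
    return {"exact_match": int(output["answer"] == inputs.get("expected"))}


results = evaluate(
    function=pipeline,
    hh_api_key="<your HoneyHive API key>",
    hh_project="<your HoneyHive project>",
    name="uppercase-pipeline-eval",
    query_list=[
        {"question": "hello", "expected": "HELLO"},
        {"question": "world", "expected": "WORLD"},
    ],
    evaluators=[exact_match],
)
print(results)  # evaluation metadata returned by the framework
```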
Function Signature
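The signature below is reconstructed from the parameters documented in the next section; it is an approximation for reference, not the exact definition in the SDK.

```python
from typing import Any, Callable, Dict, List, Optional


def evaluate(
    function: Callable[[Dict[str, Any]], Any],
    hh_api_key: Optional[str] = None,
    hh_project: Optional[str] = None,
    name: Optional[str] = None,
    dataset_id: Optional[str] = None,
    query_list: Optional[List[Dict[str, Any]]] = None,
    runs: Optional[int] = None,
    evaluators: Optional[List[Callable[[Dict[str, Any], Any], Dict[str, Any]]]] = None,
) -> Dict[str, Any]:
    ...
```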
Parameters
Required Parameters
- function (Callable[[Dict[str, Any]], Any]): The evaluation function to be tested. Must accept a dictionary of inputs and return a serializable output. This function is executed once for each datapoint in the dataset or query list.
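A minimal function satisfying this contract is sketched below; summarize and the document key are illustrative only.

```python
from typing import Any, Dict


def summarize(inputs: Dict[str, Any]) -> str:
    # Receives one datapoint's inputs as a dict; must return a serializable value.
    document = inputs["document"]
    return document[:200]  # hypothetical "summary": the first 200 characters
```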
Optional Parameters
- hh_api_key (str, optional): API key for authenticating with HoneyHive services. If not provided, falls back to the HH_API_KEY environment variable.
- hh_project (str, optional): Project identifier in HoneyHive. If not provided, falls back to the HH_PROJECT environment variable.
- name (str, optional): Identifier for this evaluation run. Used in HoneyHive’s tracing and run management.
- dataset_id (str, optional): ID of an existing HoneyHive dataset to use for evaluation inputs. Mutually exclusive with query_list.
- query_list (List[Dict[str, Any]], optional): List of input dictionaries to evaluate against. Alternative to using a HoneyHive dataset.
- runs (int, optional): Number of evaluation iterations to perform. Defaults to the length of the dataset or query list.
- evaluators (List[Callable[[Dict[str, Any], Any], Dict[str, Any]]], optional): List of evaluator functions that process inputs and outputs to generate metrics. Each evaluator should return a dictionary of metrics (see the example after this list).
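The evaluator below is a sketch of this contract: it takes the datapoint inputs and the function's output and returns a dictionary of metrics. The metric names and the topic key are hypothetical.

```python
from typing import Any, Dict


def quality_metrics(inputs: Dict[str, Any], output: Any) -> Dict[str, Any]:
    # Receives the datapoint's inputs and the evaluation function's output,
    # and returns a dictionary mapping metric names to values.
    summary = str(output)
    topic = inputs.get("topic", "")
    return {
        "summary_length": len(summary),
        "mentions_topic": bool(topic) and topic in summary,
    }
```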
Return Value
Returns a dictionary containing metadata for the evaluation run.
Technical Notes
- Execution Flow (a simplified sketch appears after these notes)
  - Validates input parameters and credentials
  - Initializes a HoneyHive tracing session
  - Processes the dataset or query list sequentially
  - Executes the evaluation function for each input
  - Runs evaluators on the function outputs
  - Collects and stores metrics
  - Returns evaluation metadata
- Dataset Processing
  - HoneyHive datasets are fetched via the API and processed automatically
  - Query lists are assigned a generated external dataset ID using MD5 hashing
  - Each datapoint or query is processed sequentially
  - Supports partial completion on failure
- Tracing Integration
  - Automatically initializes HoneyHiveTracer for each evaluation
  - Captures:
    - Input parameters
    - Function outputs
    - Evaluator metrics
    - Runtime metadata
    - Error states
  - Links all sessions to a single evaluation run
- Error Management
  - Validates all required parameters before execution
  - Handles API communication errors gracefully
  - Preserves partial results on failure
  - Maintains evaluation run status
  - Logs detailed error information
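The sketch below illustrates what the execution flow and error handling described above amount to conceptually; it omits credential validation, session management, tracing, and HoneyHive API calls, and is not the actual implementation.

```python
def run_evaluation(function, datapoints, evaluators):
    # Simplified, illustrative loop over datapoints (real behavior is described above).
    results = []
    for inputs in datapoints:                      # process each datapoint sequentially
        record = {"inputs": inputs}
        try:
            output = function(inputs)              # execute the evaluation function
            record["output"] = output
            record["metrics"] = {}
            for evaluator in evaluators or []:     # run each evaluator on (inputs, output)
                record["metrics"].update(evaluator(inputs, output))
        except Exception as exc:                   # preserve partial results on failure
            record["error"] = str(exc)
        results.append(record)
    return results
```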
Notes
- The evaluation framework requires either dataset_id or query_list to be provided
- HoneyHive credentials (API key and project) must be available either as parameters or as environment variables
- Evaluator functions must accept both inputs and outputs and return a dictionary of metrics
- All evaluation runs are automatically traced using HoneyHiveTracer
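When credentials are supplied through the environment rather than as parameters, a call can be as small as the sketch below; the environment variable names are those documented above, while the import path and the toy function are assumptions.

```python
import os

from honeyhive import evaluate  # import path assumed; adjust to your SDK version

# Credentials supplied via the environment rather than as parameters.
os.environ["HH_API_KEY"] = "<your HoneyHive API key>"   # or export these in your shell
os.environ["HH_PROJECT"] = "<your HoneyHive project>"

results = evaluate(
    function=lambda inputs: inputs["text"][::-1],  # toy function: reverse the input text
    name="env-var-config-demo",
    query_list=[{"text": "hello"}],
)
```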