Schema introspection
Every command below that takes arguments supports two read-only flags for tooling and AI agents:
- --show-file-schema: print the JSON Schema for the full request object (the format that --filename accepts).
- --show-argument-schema <flag-name>: print the JSON Schema for a single argument’s value. Pass the kebab-case flag name without the leading -- (e.g. dataset-id, not --dataset-id).
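For scripts and agents, the two flags can be wrapped in a couple of helpers. The following is a minimal Python sketch, not part of the CLI: the executable name your-cli in BASE_CMD is a hypothetical placeholder (the real command, and any command group it needs, is not shown in this section), and the helper names are illustrative. It strips the leading -- as described above and builds the argument list; only fetch_schema actually runs anything.

```python
import subprocess

# Hypothetical placeholder: replace with the real executable and any command group.
BASE_CMD = ["your-cli"]


def schema_flag_name(flag: str) -> str:
    """Turn '--dataset-id' (or 'dataset-id') into the 'dataset-id' form the flag expects."""
    name = flag[2:] if flag.startswith("--") else flag
    # Reject empty names, extra dashes, uppercase, and snake_case spellings.
    if not name or name.startswith("-") or name != name.lower() or "_" in name:
        raise ValueError(f"expected a kebab-case flag name, got {flag!r}")
    return name


def file_schema_argv(command: str) -> list[str]:
    """Argument list that prints the JSON Schema of the full request object."""
    return [*BASE_CMD, command, "--show-file-schema"]


def argument_schema_argv(command: str, flag: str) -> list[str]:
    """Argument list that prints the JSON Schema of one argument's value."""
    return [*BASE_CMD, command, "--show-argument-schema", schema_flag_name(flag)]


def fetch_schema(argv: list[str]) -> str:
    """Run the command and return stdout; requires the real CLI to be installed."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, argument_schema_argv("list-runs", "--dataset-id") yields ["your-cli", "list-runs", "--show-argument-schema", "dataset-id"].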
list-runs
Get a list of evaluation runs
List experiment runs with optional filtering by dataset, status, name, date range, and specific run IDs. Results are paginated and sortable.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --dataset-id | string | no | Filter by dataset ID |
| --date-range | json | no | Filter by date range |
| --limit | number | no | Number of results per page |
| --name | string | no | Filter by run name |
| --page | number | no | Page number for pagination |
| --run-ids | json | no | List of specific run IDs to fetch |
| --sort-by | string | no | Field to sort by. Allowed: created_at, updated_at, name, status. |
| --sort-order | string | no | Sort order. Allowed: asc, desc. |
| --status | string | no | Filter by run status. Allowed: pending, completed, failed, cancelled, running. |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
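A hedged Python sketch of assembling a list-runs call. The enum sets are copied from the table above; BASE_CMD (your-cli) is a hypothetical placeholder, and passing --run-ids as a JSON array of strings is an assumption about that json-typed flag. --date-range is left out because its JSON shape is not documented in this section; print it with --show-argument-schema date-range before relying on it.

```python
import json

BASE_CMD = ["your-cli"]  # hypothetical placeholder for the real executable

# Allowed values copied from the list-runs options table.
SORT_BY = {"created_at", "updated_at", "name", "status"}
SORT_ORDER = {"asc", "desc"}
STATUS = {"pending", "completed", "failed", "cancelled", "running"}


def list_runs_argv(dataset_id=None, name=None, status=None, run_ids=None,
                   sort_by=None, sort_order=None, page=None, limit=None):
    """Build a list-runs argument list, rejecting values outside the documented enums."""
    for flag, value, allowed in (("--status", status, STATUS),
                                 ("--sort-by", sort_by, SORT_BY),
                                 ("--sort-order", sort_order, SORT_ORDER)):
        if value is not None and value not in allowed:
            raise ValueError(f"{flag} must be one of {sorted(allowed)}, got {value!r}")

    options = [
        ("--dataset-id", dataset_id),
        ("--name", name),
        ("--status", status),
        # Assumption: --run-ids takes a JSON array of run ID strings.
        ("--run-ids", None if run_ids is None else json.dumps(list(run_ids))),
        ("--sort-by", sort_by),
        ("--sort-order", sort_order),
        ("--page", None if page is None else str(page)),
        ("--limit", None if limit is None else str(limit)),
    ]
    argv = [*BASE_CMD, "list-runs"]
    for flag, value in options:
        if value is not None:
            argv += [flag, value]
    return argv
```

For example, list_runs_argv(status="completed", sort_by="created_at", sort_order="desc", page=2, limit=50) returns ["your-cli", "list-runs", "--status", "completed", "--sort-by", "created_at", "--sort-order", "desc", "--page", "2", "--limit", "50"].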
create-run
Create a new evaluation run
Create a new experiment run to track an evaluation against a dataset.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --configuration | json | no | Run configuration |
| --datapoint-ids | json | no | List of datapoint IDs |
| --dataset-id | string | no | Dataset ID |
| --description | string | no | Run description |
| --evaluators | json | no | Evaluators |
| --event-ids | json | no | List of event IDs |
| --metadata | json | no | Run metadata |
| --name | string | no | Run name |
| --passing-ranges | json | no | Passing ranges |
| --results | json | no | Run results |
| --run-id | string | no | Run ID |
| --session-ids | json | no | List of session IDs |
| --status | string | no | Run status. Allowed: pending, completed, failed, cancelled, running. |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
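Because create-run has many json-typed options, a request file passed with --filename is often easier to manage than individual flags. Here is a minimal Python sketch that produces such a file's contents. It assumes the request object is a flat JSON object keyed by the snake_case names shown in the table (name, dataset_id, status, and so on); that layout is an assumption, so confirm it against --show-file-schema. The function name is illustrative.

```python
import json

# Field names as listed in the create-run options table (assumed to be the request keys).
FIELDS = {
    "configuration", "datapoint_ids", "dataset_id", "description", "evaluators",
    "event_ids", "metadata", "name", "passing_ranges", "results", "run_id",
    "session_ids", "status",
}
STATUS = {"pending", "completed", "failed", "cancelled", "running"}


def create_run_request_json(**fields) -> str:
    """Serialize a create-run request, dropping unset fields and checking keys and status."""
    unknown = set(fields) - FIELDS
    if unknown:
        raise ValueError(f"unknown create-run fields: {sorted(unknown)}")
    status = fields.get("status")
    if status is not None and status not in STATUS:
        raise ValueError(f"status must be one of {sorted(STATUS)}, got {status!r}")
    # Keep only the fields the caller actually set.
    request = {key: value for key, value in fields.items() if value is not None}
    return json.dumps(request, indent=2)
```

Write the returned string to a file (run.json is a hypothetical name) and pass that path with --filename.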
get-runs-schema
Get events schema across all experiment runs in a project
Retrieve the aggregated events schema (fields, datasets, mappings) across all experiment runs in the project.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --date-range | json | no | Filter by date range |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
get-run
Get details of an evaluation run
Retrieve the full details of a single experiment run by its run ID.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --run-id | string | yes | Experiment run ID |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
update-run
Update an evaluation run
Update fields on an existing experiment run such as name, status, metadata, or results.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --run-id | string | yes | ID of the experiment run to update |
| --configuration | json | no | Run configuration |
| --datapoint-ids | json | no | List of datapoint IDs |
| --description | string | no | Run description |
| --evaluators | json | no | Evaluators |
| --event-ids | json | no | List of event IDs |
| --metadata | json | no | Run metadata |
| --name | string | no | Run name |
| --passing-ranges | json | no | Passing ranges |
| --results | json | no | Run results |
| --session-ids | json | no | List of session IDs |
| --status | string | no | Run status. Allowed: pending, completed, failed, cancelled, running. |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
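A hedged Python sketch of a partial update: only the fields you pass are sent, json-typed options are serialized with json.dumps, and --run-id is always included because the table marks it required. That omitted fields stay unchanged on the server is an assumption about update-run's semantics, and BASE_CMD (your-cli) is a hypothetical placeholder.

```python
import json

BASE_CMD = ["your-cli"]  # hypothetical placeholder for the real executable

# Option types taken from the update-run table.
JSON_FIELDS = {"configuration", "datapoint_ids", "evaluators", "event_ids",
               "metadata", "passing_ranges", "results", "session_ids"}
STRING_FIELDS = {"description", "name", "status"}
ALL_FIELDS = JSON_FIELDS | STRING_FIELDS
STATUS = {"pending", "completed", "failed", "cancelled", "running"}


def update_run_argv(run_id: str, **fields) -> list[str]:
    """Build an update-run argument list that sends only the fields being changed."""
    if not run_id:
        raise ValueError("--run-id is required")
    argv = [*BASE_CMD, "update-run", "--run-id", run_id]
    for key, value in fields.items():
        if value is None:
            continue  # omitted; assumed to leave the stored value unchanged
        if key not in ALL_FIELDS:
            raise ValueError(f"unknown update-run field: {key!r}")
        if key == "status" and value not in STATUS:
            raise ValueError(f"status must be one of {sorted(STATUS)}, got {value!r}")
        flag = "--" + key.replace("_", "-")  # snake_case field -> kebab-case flag
        argv += [flag, json.dumps(value) if key in JSON_FIELDS else str(value)]
    return argv
```

For example, update_run_argv("run-123", status="completed", metadata={"note": "rerun"}) returns ["your-cli", "update-run", "--run-id", "run-123", "--status", "completed", "--metadata", '{"note": "rerun"}'].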
delete-run
Delete an evaluation run
Permanently delete an experiment run by its run ID.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --run-id | string | yes | ID of the experiment run to delete |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
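Since delete-run is permanent, scripts may want a client-side guard before invoking it. A minimal Python sketch follows; the confirmation convention (re-typing the run ID) is an illustrative choice, not a CLI feature, and BASE_CMD (your-cli) is a hypothetical placeholder.

```python
BASE_CMD = ["your-cli"]  # hypothetical placeholder for the real executable


def delete_run_argv(run_id: str, confirmation: str) -> list[str]:
    """Return the delete-run argument list only if `confirmation` repeats the run ID."""
    if not run_id:
        raise ValueError("--run-id is required")
    # Surrounding whitespace (e.g. a trailing newline from input) is ignored.
    if confirmation.strip() != run_id:
        raise PermissionError("confirmation does not match the run ID; not deleting")
    return [*BASE_CMD, "delete-run", "--run-id", run_id]
```

Interactive use might look like delete_run_argv(run_id, input(f"Type {run_id} to delete it permanently: ")).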
get-run-schema
Get events schema for a single experiment run
Retrieve the events schema (fields, datasets, mappings) for a single experiment run.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --run-id | string | yes | Experiment run ID (UUIDv4) |
| --date-range | json | no | Filter by date range |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
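Several commands describe --run-id as a UUIDv4. A small pre-flight check with Python's standard uuid module can catch typos before a request is sent. Requiring the canonical hyphenated form is a deliberately strict choice for this sketch; the CLI itself may accept other spellings.

```python
import uuid


def is_uuid4(value: str) -> bool:
    """Return True if `value` is a canonical, hyphenated RFC 4122 version-4 UUID."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return (
        parsed.variant == uuid.RFC_4122
        and parsed.version == 4
        # uuid.UUID also accepts braces, 'urn:uuid:' and unhyphenated hex; reject those.
        and str(parsed) == value.lower()
    )
```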
get-run-metrics
Get event metrics for an experiment run
Retrieve event metrics from ClickHouse for a specific experiment run.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --run-id | string | yes | Experiment run ID (UUIDv4) |
| --date-range | string | no | Date range filter, as a JSON string |
| --filters | json | no | Optional filters to apply (JSON string or array of filter objects) |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
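Note that --date-range is typed string here (a JSON string), whereas other commands type it as json; on the command line both are most likely passed as JSON text. A hedged Python sketch that serializes both values with json.dumps. The shapes of the date range and filter objects are not documented in this section, so check them with --show-argument-schema date-range and --show-argument-schema filters; BASE_CMD (your-cli) is a hypothetical placeholder.

```python
import json

BASE_CMD = ["your-cli"]  # hypothetical placeholder for the real executable


def run_metrics_argv(run_id, date_range=None, filters=None):
    """Build a get-run-metrics argument list.

    date_range and filters are plain Python objects; both are encoded as JSON text,
    since --date-range expects a JSON string and --filters is json-typed.
    """
    argv = [*BASE_CMD, "get-run-metrics", "--run-id", run_id]
    if date_range is not None:
        argv += ["--date-range", json.dumps(date_range)]
    if filters is not None:
        argv += ["--filters", json.dumps(filters)]
    return argv
```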
get-summary
Retrieve experiment summary
Compute evaluation summary for an experiment run: pass/fail results, metric aggregations, per-datapoint results, event details, and the experiment run object.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --run-id | string | yes | Experiment run ID (UUIDv4) |
| --aggregate-function | string | no | Aggregation function to apply to metrics. Allowed: average, min, max, median, p95, p99, p90, sum, count. |
| --filters | json | no | Optional filters to apply (JSON string or array of filter objects) |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
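A minimal Python sketch that checks --aggregate-function against the values listed in the table before building a get-summary call; BASE_CMD (your-cli) is a hypothetical placeholder. Leaving the flag out defers to whatever default the CLI applies, which this section does not state.

```python
BASE_CMD = ["your-cli"]  # hypothetical placeholder for the real executable

# Allowed values copied from the get-summary options table.
AGGREGATE_FUNCTIONS = {"average", "min", "max", "median", "p95", "p99", "p90", "sum", "count"}


def summary_argv(run_id: str, aggregate_function=None) -> list[str]:
    """Build a get-summary argument list with an optional, validated aggregate function."""
    if aggregate_function is not None and aggregate_function not in AGGREGATE_FUNCTIONS:
        raise ValueError(
            f"--aggregate-function must be one of {sorted(AGGREGATE_FUNCTIONS)}, "
            f"got {aggregate_function!r}"
        )
    argv = [*BASE_CMD, "get-summary", "--run-id", run_id]
    if aggregate_function is not None:
        argv += ["--aggregate-function", aggregate_function]
    return argv
```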
compare-runs
Retrieve experiment comparison
Compare metrics and results between two experiment runs.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --new-run-id | string | yes | New experiment run ID to compare (UUIDv4) |
| --old-run-id | string | yes | Old experiment run ID to compare against (UUIDv4) |
| --aggregate-function | string | no | Aggregation function to apply to metrics. Allowed: average, min, max, median, p95, p99, p90, sum, count. |
| --filters | json | no | Optional filters to apply (JSON string or array of filter objects) |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
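The order of the two IDs matters: --old-run-id is the baseline that --new-run-id is compared against. A hedged Python sketch that keeps the roles explicit and refuses to compare a run with itself; BASE_CMD (your-cli) is a hypothetical placeholder, and the self-comparison check is a client-side convention rather than documented CLI behavior.

```python
from typing import List, Optional

BASE_CMD = ["your-cli"]  # hypothetical placeholder for the real executable


def compare_runs_argv(baseline_run_id: str, candidate_run_id: str,
                      aggregate_function: Optional[str] = None) -> List[str]:
    """Build compare-runs argv: the baseline goes to --old-run-id, the candidate to --new-run-id."""
    # UUIDs are case-insensitive, so compare them in lowercase.
    if baseline_run_id.lower() == candidate_run_id.lower():
        raise ValueError("baseline and candidate run IDs must differ")
    argv = [*BASE_CMD, "compare-runs",
            "--old-run-id", baseline_run_id,
            "--new-run-id", candidate_run_id]
    if aggregate_function is not None:
        argv += ["--aggregate-function", aggregate_function]
    return argv
```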
compare-run-events
Compare events between two experiment runs
Retrieve and compare events between two experiment runs for detailed analysis.
Usage
Options
| Flag | Type | Required | Description |
|---|---|---|---|
| --new-run-id | string | yes | New experiment run ID (UUIDv4) |
| --old-run-id | string | yes | Old experiment run ID to compare against (UUIDv4) |
| --event-name | string | no | Filter by event name |
| --event-type | string | no | Filter by event type |
| --filter | json | no | Additional filter criteria (JSON string or object) |
| --limit | number | no | Maximum number of results |
| --page | number | no | Page number for pagination |
Also supports --show-file-schema, --show-argument-schema <flag-name>, and --filename. See Schema introspection for details.
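compare-run-events pages its results with --page and --limit. A hedged Python sketch of walking every page: it assumes pages are numbered from 1 and that a page shorter than --limit is the last one, neither of which this section states. fetch_page is a hypothetical callable you would implement by running the command (BASE_CMD, your-cli, is a placeholder) and decoding its output.

```python
from typing import Callable, Iterator, List, Optional

BASE_CMD = ["your-cli"]  # hypothetical placeholder for the real executable


def compare_events_argv(old_run_id: str, new_run_id: str, page: int, limit: int,
                        event_name: Optional[str] = None,
                        event_type: Optional[str] = None) -> List[str]:
    """Build one paginated compare-run-events call."""
    argv = [*BASE_CMD, "compare-run-events",
            "--old-run-id", old_run_id, "--new-run-id", new_run_id,
            "--page", str(page), "--limit", str(limit)]
    if event_name is not None:
        argv += ["--event-name", event_name]
    if event_type is not None:
        argv += ["--event-type", event_type]
    return argv


def iter_pages(fetch_page: Callable[[int, int], list], limit: int = 100,
               first_page: int = 1) -> Iterator:
    """Yield items from successive pages until a short or empty page is returned."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    page = first_page
    while True:
        items = fetch_page(page, limit)
        yield from items
        if len(items) < limit:  # assumed to signal the last page
            return
        page += 1
```

The stop condition means an exact multiple of limit costs one extra, empty request; if the command reports a total count, using that instead would avoid it.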
