# Create a new configuration
Source: https://docs.honeyhive.ai/api-reference/configurations/create-a-new-configuration

post /configurations

# Delete a configuration
Source: https://docs.honeyhive.ai/api-reference/configurations/delete-a-configuration

delete /configurations/{id}

# Retrieve a list of configurations
Source: https://docs.honeyhive.ai/api-reference/configurations/retrieve-a-list-of-configurations

get /configurations

# Update an existing configuration
Source: https://docs.honeyhive.ai/api-reference/configurations/update-an-existing-configuration

put /configurations/{id}

# Create a new datapoint
Source: https://docs.honeyhive.ai/api-reference/datapoints/create-a-new-datapoint

post /datapoints

# Delete a specific datapoint
Source: https://docs.honeyhive.ai/api-reference/datapoints/delete-a-specific-datapoint

delete /datapoints/{id}

# Retrieve a list of datapoints
Source: https://docs.honeyhive.ai/api-reference/datapoints/retrieve-a-list-of-datapoints

get /datapoints

# Retrieve a specific datapoint
Source: https://docs.honeyhive.ai/api-reference/datapoints/retrieve-a-specific-datapoint

get /datapoints/{id}

# Update a specific datapoint
Source: https://docs.honeyhive.ai/api-reference/datapoints/update-a-specific-datapoint

put /datapoints/{id}

# Add datapoints to a dataset
Source: https://docs.honeyhive.ai/api-reference/datasets/add-datapoints-to-a-dataset

post /datasets/{dataset_id}/datapoints

# Create a dataset
Source: https://docs.honeyhive.ai/api-reference/datasets/create-a-dataset

post /datasets

# Delete a dataset
Source: https://docs.honeyhive.ai/api-reference/datasets/delete-a-dataset

delete /datasets

# Get datasets
Source: https://docs.honeyhive.ai/api-reference/datasets/get-datasets

get /datasets

# Update a dataset
Source: https://docs.honeyhive.ai/api-reference/datasets/update-a-dataset

put /datasets

# Create a batch of events
Source: https://docs.honeyhive.ai/api-reference/events/create-a-batch-of-events

post /events/batch

Please refer to our instrumentation guide for detailed information.

# Create a batch of model events
Source: https://docs.honeyhive.ai/api-reference/events/create-a-batch-of-model-events

post /events/model/batch

Please refer to our instrumentation guide for detailed information.

# Create a new event
Source: https://docs.honeyhive.ai/api-reference/events/create-a-new-event

post /events

Create a new event (span) within a session trace. The request body wraps the event in a JSON-encoded string under the `event` key.

**Required properties** within the JSON-encoded event string:

* `event_type` (string) — Must be one of: `chain`, `model`, `tool`, `session`.
* `inputs` (object) — Input data for the event.

**Auto-generated properties** (provided by the server when omitted):

* `event_id` (string, UUID) — Unique identifier for the event.
* `session_id` (string, UUID) — Session/trace identifier.
* `parent_id` (string, UUID) — Parent event ID. Defaults to `session_id`.

**Optional properties with defaults:**

* `event_name` (string) — Name of the event. Defaults to `"unknown"`.
* `source` (string) — Source of the event (e.g. `sdk-python`). Defaults to `"unknown"`.

**Optional properties:**

* `config` (object) — Configuration data (e.g. model parameters, prompt templates).
* `outputs` (object) — Output data from the event.
* `error` (string or null) — Error message if the event failed.
* `children_ids` (array of strings) — IDs of child events.
* `duration` (number) — Duration of the event in milliseconds.
* `start_time` (number) — Unix timestamp in milliseconds for event start.
* `end_time` (number) — Unix timestamp in milliseconds for event end.
* `metadata` (object) — Additional metadata (e.g. token counts, cost).
* `metrics` (object) — Custom metrics.
* `feedback` (object) — Feedback data (e.g. ratings, ground truth).
* `user_properties` (object) — User properties associated with the event.
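As a sketch of the request-body shape described above: the endpoint path, property names, and allowed `event_type` values come from this reference, while the helper function and its validation logic are illustrative, not part of any HoneyHive SDK.

```python
import json


def build_event_payload(event: dict) -> dict:
    """Wrap an event dict as a JSON-encoded string under the `event` key,
    matching the request-body shape described above. (Illustrative helper,
    not an official SDK function.)"""
    # `event_type` and `inputs` are the two required properties.
    missing = {"event_type", "inputs"} - event.keys()
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    if event["event_type"] not in {"chain", "model", "tool", "session"}:
        raise ValueError("event_type must be chain, model, tool, or session")
    return {"event": json.dumps(event)}


payload = build_event_payload({
    "event_type": "model",              # required
    "inputs": {"prompt": "Hello"},      # required
    "event_name": "openai-call",        # optional, defaults to "unknown"
    "source": "sdk-python",             # optional, defaults to "unknown"
    "outputs": {"completion": "Hi there!"},
    "duration": 820,                    # milliseconds
    "metadata": {"total_tokens": 42},
})
```

The resulting `payload` dict can then be sent as the JSON body of a `POST /events` request with your usual HTTP client and Bearer authorization header.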
# Create a new model event
Source: https://docs.honeyhive.ai/api-reference/events/create-a-new-model-event

post /events/model

Please refer to our instrumentation guide for detailed information.

# Retrieve events based on filters
Source: https://docs.honeyhive.ai/api-reference/events/retrieve-events-based-on-filters

post /events/export

# Update an event
Source: https://docs.honeyhive.ai/api-reference/events/update-an-event

put /events

# Create a new evaluation run
Source: https://docs.honeyhive.ai/api-reference/experiments/create-a-new-evaluation-run

post /runs

# Delete an evaluation run
Source: https://docs.honeyhive.ai/api-reference/experiments/delete-an-evaluation-run

delete /runs/{run_id}

# Get a list of evaluation runs
Source: https://docs.honeyhive.ai/api-reference/experiments/get-a-list-of-evaluation-runs

get /runs

# Get details of an evaluation run
Source: https://docs.honeyhive.ai/api-reference/experiments/get-details-of-an-evaluation-run

get /runs/{run_id}

# Retrieve experiment comparison
Source: https://docs.honeyhive.ai/api-reference/experiments/retrieve-experiment-comparison

get /runs/{run_id_1}/compare-with/{run_id_2}

# Retrieve experiment result
Source: https://docs.honeyhive.ai/api-reference/experiments/retrieve-experiment-result

get /runs/{run_id}/result

# Update an evaluation run
Source: https://docs.honeyhive.ai/api-reference/experiments/update-an-evaluation-run

put /runs/{run_id}

# Create a new metric
Source: https://docs.honeyhive.ai/api-reference/metrics/create-a-new-metric

post /metrics

Add a new metric.

# Delete a metric
Source: https://docs.honeyhive.ai/api-reference/metrics/delete-a-metric

delete /metrics

Remove a metric.

# Get all metrics
Source: https://docs.honeyhive.ai/api-reference/metrics/get-all-metrics

get /metrics

Retrieve a list of all metrics.

# Update an existing metric
Source: https://docs.honeyhive.ai/api-reference/metrics/update-an-existing-metric

put /metrics

Edit a metric.

# Create a new project
Source: https://docs.honeyhive.ai/api-reference/projects/create-a-new-project

post /projects

# Delete a project
Source: https://docs.honeyhive.ai/api-reference/projects/delete-a-project

delete /projects

# Get a list of projects
Source: https://docs.honeyhive.ai/api-reference/projects/get-a-list-of-projects

get /projects

# Update an existing project
Source: https://docs.honeyhive.ai/api-reference/projects/update-an-existing-project

put /projects

# Retrieve a session
Source: https://docs.honeyhive.ai/api-reference/session/retrieve-a-session

get /session/{session_id}

# Start a new session
Source: https://docs.honeyhive.ai/api-reference/session/start-a-new-session

post /session/start

Start a new session. The `session` field should contain a JSON-encoded session object as a string.

# Product Updates
Source: https://docs.honeyhive.ai/changelog/changelog

New updates and improvements to our core platform and SDKs.

## Core Platform

### Experiments Dashboard

Visualize metric trends across all your experiments in a single unified view.

HoneyHive Experiments

The new Experiments dashboard provides comprehensive visibility into how changes affect your AI application's quality over time:

* View and compare metrics across 100+ experiments simultaneously. See results from experiments using different prompts, models, and retrieval parameters side-by-side.
* Identify when changes negatively impact your application's quality metrics. Metric trends make it easy to spot regressions at a glance.
* Track how sweeps across different configurations (prompts, models, retrieval parameters) impact performance over time.
* Analyze experiment results without jumping between individual experiment pages. All your experiment data in one place for faster, data-driven decision making.

[Try it today →](https://app.honeyhive.ai/)

### Annotation Queues

Automated trace collection and streamlined human evaluation workflows.
HoneyHive Annotation Queues

Configure filters to automatically add traces matching specific criteria to annotation queues. The system continuously runs in the background, identifying traces that need human review. Domain experts can evaluate traces based on predefined criteria fields, and can use the ← → arrow keys for quick navigation between events during high-volume annotation tasks.

Build high-quality datasets and maintain consistent human oversight of your AI applications with organized evaluation workflows.

## Core Platform

### Improved Evaluators UX

New Evaluators UX

Redesigned evaluator creation interface that combines evaluator configuration and editor into a single unified view. Configure evaluator parameters and edit evaluation logic in one place, eliminating the need to switch between multiple views. This streamlined workflow reduces context switching when creating and managing metrics.

## Core Platform

### New Evaluator Templates

Expanded evaluator templates library with 11 new pre-built templates for common evaluation patterns.

| Category                   | Evaluators                                                                                      |
| -------------------------- | ----------------------------------------------------------------------------------------------- |
| **Agent Evaluation**       | Chain-of-Thought Faithfulness • Plan Coverage • Trajectory Plan Faithfulness • Failure Recovery |
| **Safety**                 | Policy Compliance • Harm Avoidance                                                               |
| **RAG**                    | Context Coverage                                                                                 |
| **Text Evaluation**        | Tone Appropriateness                                                                             |
| **Translation**            | Translation Fluency                                                                              |
| **Code Generation**        | Compilation Success                                                                              |
| **Classification Metrics** | Precision/Recall/F1 Metrics                                                                      |

Quick-start your evaluations with production-ready templates that follow best practices for various AI application use cases.
## Core Platform

### Improved Review Mode

Enhanced context indicators in Review Mode that clearly show which output type you're evaluating.

Improved Review Mode

The UI now explicitly indicates which output type you're reviewing:

* Evaluate individual LLM responses with clear context about the model being reviewed.
* Review end-to-end agent interactions and complete conversation flows.
* Assess function and API call results with full execution context.
* Analyze multi-step process results and complex execution paths.

This improved clarity helps domain experts provide more accurate and consistent feedback when working with complex multi-agent systems.

## Core Platform

### Categorical Evaluators

New evaluator type that enables classification-based human evaluation with custom scoring.

Categorical Evaluators

Define custom categorical labels and assign specific scores to each category:

* Create binary classifications with associated scores for clear go/no-go decisions.
* Track when outputs shift from high-scoring to low-scoring categories over time.
* Define multiple categories representing different quality levels or response types.

Categorical evaluators provide more structured and interpretable evaluation results compared to purely numeric scores, making it easier to identify specific failure modes in your AI applications.

## Core Platform

### Thread View

New visualization mode that displays all LLM events and chat history in a unified, chronological timeline.

Thread View

View all LLM events alongside complete chat history in a single interface, and understand the full context of multi-turn conversations without navigating through nested spans. The system automatically identifies when control passes between different LLM workflows or agents, highlighting transition points in complex multi-agent systems. Domain experts can provide feedback at the session level, which is automatically applied to the root span (session event) in the trace.
## Core Platform

### Improved Graph View

Major enhancements to Graph View with automatic node deduplication and new analytical features.

Improved Graph View

* The graph now intelligently deduplicates nodes, simplifying visualization of complex agent trajectories.
* View the total number of nodes, state transitions, and structural complexity metrics for your agent workflows.
* Edge thickness represents execution frequency, making common paths immediately visible.
* Identify which nodes are causing performance issues in your agent workflows.
* Visualize the most frequent paths through your agent's decision tree to understand typical execution patterns.

## Core Platform

### [Introducing Alerts](https://docs.honeyhive.ai/monitoring/alerts)

Monitor key metrics and get notified when behavior changes in your AI applications.

HoneyHive Alerts

1. **Comprehensive Monitoring:** Track performance metrics (latency, error rate), quality scores from evaluators, cost and usage patterns, plus any custom fields from your events or sessions. Get visibility into what matters most for your AI applications.
2. **Smart Alert Types:** **Aggregate Alerts** trigger when metrics cross absolute thresholds, while **Drift Alerts** detect when current performance deviates from previous periods by a configurable percentage. Choose the right detection method for your use case.
3. **Flexible Scheduling:** Configure alerts to run hourly, daily, weekly, or monthly based on your monitoring needs. Set custom evaluation windows to balance responsiveness with noise reduction.
4. **Streamlined Workflow:** Real-time preview charts show exactly what your alert will monitor, guided configuration in the right panel walks you through setup, and a recent activity feed tracks alert history. Manage alert states (Active, Triggered, Resolved, Paused, Muted) directly from each alert's detail page.
### Evaluator Templates Gallery

Quick-start your evaluations with pre-built templates organized by use case: Agent Trajectory, Tool Selection, RAG, Summarization, Translation, Structured Output, Code Generation, Performance, Safety, and Traditional NLP.

New Evaluator Creation Flow

## Core Platform

### New Trace Visualization Modes

1. **Session Summaries and New Tree View:** Unified view of metrics, evaluations, and feedback across all spans in an agent session. Get a comprehensive overview without jumping between individual spans to understand overall session performance. Tree View
2. **Timeline View:** Flamegraph visualization that identifies latency bottlenecks and shows the relationship between sequential and parallel operations in your agent workflows. Perfect for performance optimization and understanding execution flow. Timeline View
3. **Graph View:** Visual representation of complex execution paths and decision points through multi-agent workflows. Quickly understand how your agents interact and make decisions at a glance. Graph View

### Improved Log Store Analytics

**Volume Charts:** New mini-charts display request volume patterns over time directly in the sessions table, providing instant visibility into traffic trends and activity levels without needing to drill into individual sessions.

New Log Store

## Core Platform

### Role-Based Access Control (RBAC)

RBAC

1. **Two-Tier Permission Structure:** Granular permission management with organization and project-level controls. Organization Admins have full control across the entire organization, while Project Admins maintain complete control within specific projects. This creates clear boundaries between teams and prevents data leakage between business units.
2. **Enhanced API Key Security:** Project-specific API key scoping ensures that teams can only access data within their designated projects. This provides better security isolation and compliance with industry regulations, especially critical for organizations in financial services, healthcare, and insurance.
3. **Flexible Team Management:** Easy onboarding and role transitions with a transparent permission hierarchy. Delegate administrative responsibilities without compromising security, and manage team member access as organizations evolve.
4. **Seamless Migration Process:** Existing customers can migrate to RBAC with minimal disruption. All current users are automatically assigned Organization Admin roles, and project-specific API keys are available in Settings. Legacy API keys will remain functional until August 31st, 2025.

[Learn more about RBAC implementation](https://docs.honeyhive.ai/workspace/roles)

## Core Platform

* Added a list of allowed characters for project names.

## Python SDK (Logger)

### HoneyHive Logger (`honeyhive-logger`) released

The logger SDK has:

1. No external dependencies
2. A fully stateless design

It is optimized for:

* Serverless environments
* Highly regulated environments with strict security requirements

## TypeScript SDK (Logger)

### HoneyHive Logger (`@honeyhive/logger`) released

The logger SDK has:

1. No external dependencies
2. A fully stateless design

It is optimized for:

* Serverless environments
* Highly regulated environments with strict security requirements

## Python SDK - Version \[v0.2.49]

* Added type annotations to decorators and the evaluation harness.

## Documentation

* Added documentation for the Python/TypeScript Loggers.
* Updated the Gemini integration documentation to use the latest SDK (Python and TypeScript).

## Core Platform

### Support for External Datasets in Experiments

You can now log experiments using external datasets with custom IDs for both datasets and datapoints. External dataset IDs will display with the "EXT-" prefix in the UI.
This feature provides greater flexibility for teams working with custom datasets while maintaining full integration with our experiment tracking.

```
{
  "id": "",       // Optional
  "name": "",     // Optional
  "data": [
    {
      "id": "",   // Optional
      "inputs": { ... },
      "ground_truths": { ... }
    }
    // Additional datapoints...
  ]
}
```

## Documentation

* Standardized parameter names and clarified evaluation order in the Experiments Quickstart and Python/TS SDK docs.
* Added cookbook: [Inspirational Quotes Recommender with Qdrant and OpenAI](https://github.com/honeyhiveai/cookbook/tree/main/qdrant-discovery)

## Core Platform

* Bug fixes and improvements across various areas to enhance performance and stability.

## Documentation

* Added [Evaluating External Logs](https://docs.honeyhive.ai/evaluation/external_logs) tutorial.
* Updated Python and TypeScript SDK references and overall documentation to align with recent improvements and best practices.

## Core Platform

* Bug fixes for playground & evaluator version controls.

## Documentation

* Added [Datasets Introduction Guide](https://docs.honeyhive.ai/datasets/introduction).
* Added [Server-side Evaluator Templates List](https://docs.honeyhive.ai/evaluators/evaluator-templates) documentation.
* Added [LangGraph](https://docs.honeyhive.ai/integrations/langgraph) integration documentation.

## Core Platform

### Wide Mode

We've introduced a new **Wide Mode** option that allows users to hide the sidebar, providing:

* Expanded workspace area for a more immersive viewing experience
* Distraction-free environment when focusing on complex tasks
* Better content visibility on smaller screens and split-window setups
* Toggle controls accessible via the header menu for easy switching