
Start Tracing
Instrument your first agent and capture traces in 5 minutes.
Run Your First Evaluation
Set up an experiment and evaluate your agent programmatically.
The Workflow
HoneyHive follows an Evaluation-Driven Development (EDD) workflow, similar to TDD in software engineering, where evaluation guides every stage of agent development.

Production: Observe and Evaluate
Instrument your application with distributed tracing to capture every interaction. Collect traces, user feedback, and quality metrics from production. Run online evals to surface edge cases at scale, and set up alerts to catch failures or metric drift.
- Traces
- Agent Graphs
- Threads
- Timeline View
- Dashboard
- Alerts
Inspect every LLM call, tool invocation, and chain step in a structured execution log.

Testing: Curate Datasets & Run Experiments
Turn failing production traces into curated test datasets. Run experiments to measure the impact of your changes, track regressions over time, and gate releases in CI.
- Experiments
- Datasets
- Regression Tests
- LLM Evaluators
- Code Evaluators
- Annotation Queues
Compare prompts, models, or configurations side-by-side to see which changes improve performance.
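A programmatic experiment of this shape can be sketched in plain Python. Everything below is illustrative: the dataset, the two stub "agent" configurations, and the `exact_match` code evaluator are hypothetical stand-ins for what you would run through HoneyHive's experiments tooling.

```python
# A curated dataset: inputs paired with expected ("ground truth") outputs.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 3", "expected": "9"},
]

# Two stub agent configurations standing in for prompt/model variants.
def agent_v1(query):
    return {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "6"}[query]

def agent_v2(query):
    return {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}[query]

# A code evaluator: deterministic pass/fail per example.
def exact_match(output, expected):
    return output.strip() == expected.strip()

def run_experiment(agent):
    passed = sum(exact_match(agent(row["input"]), row["expected"])
                 for row in dataset)
    return passed / len(dataset)

score_v1 = run_experiment(agent_v1)
score_v2 = run_experiment(agent_v2)
print(f"v1: {score_v1:.0%}  v2: {score_v2:.0%}")

# Gate a release in CI: fail the build if the candidate regresses.
assert score_v2 >= score_v1, "regression: candidate scores below baseline"
```

The same loop structure generalizes to LLM evaluators: swap `exact_match` for a scoring call to a judge model.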

Development: Iterate on Prompts
Use evaluation results to guide changes. Iterate on prompts, test new models, and optimize your application based on what the data shows. Validate changes against curated datasets before deploying.
- Playground
- Prompt Management
Test prompt variations and model configurations with instant feedback before committing to code.
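Prompt iteration can be approximated offline before anything reaches the platform. The sketch below compares two hypothetical prompt templates against a stub model; the template strings, the `render` helper, and `stub_model` are illustrative, not a HoneyHive API.

```python
# Two prompt variants to compare (illustrative templates).
variants = {
    "v1": "Answer the question: {question}",
    "v2": "You are a concise assistant. Answer in one word: {question}",
}

def render(template, **kwargs):
    """Fill a prompt template with variables."""
    return template.format(**kwargs)

# Stub model: rewards the more constrained prompt
# (a stand-in for a real LLM call).
def stub_model(prompt):
    if "one word" in prompt:
        return "Paris"
    return "The capital of France is Paris."

question = "What is the capital of France?"
for name, template in variants.items():
    prompt = render(template, question=question)
    answer = stub_model(prompt)
    concise = len(answer.split()) == 1  # a simple quality check
    print(f"{name}: concise={concise} -> {answer!r}")
```

In the Playground this loop becomes interactive; validating the winning variant against a curated dataset is the step that guards the deploy.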

Platform Capabilities
Core features across the development lifecycle:

Tracing
Capture and visualize every step of your AI application with distributed tracing.
Experiments & Datasets
Test changes with offline experiments and curated datasets before deploying.
Monitoring & Alerting
Track metrics with dashboards and get alerts when quality degrades.
Online Evaluations
Run automated evals on production traces to catch issues early.
Annotation Queues
Collect expert feedback and turn it into labeled datasets.
Prompt Management
Version and manage prompts across UI and code.
Open Standards, Open Ecosystem
HoneyHive is built on OpenTelemetry, so it works across models, frameworks, and runtimes with no vendor lock-in.
Model Agnostic
Works with OpenAI, Anthropic, Bedrock, open-source models, and more.
Framework Agnostic
Native support for LangChain, CrewAI, Google ADK, AWS Strands, and more.
Runtime Agnostic
Trace any runtime: Lambdas, Kubernetes, Bedrock AgentCore, and more.
Bring Your Own Instrumentor
HoneyHive supports official OTEL GenAI, OpenLLMetry, and OpenInference semantic conventions.
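To show what the OTel GenAI conventions look like in practice, here is a stdlib-only sketch of the span attributes one LLM call might carry. The attribute keys follow the incubating OpenTelemetry GenAI semantic conventions as I understand them; verify names against the current spec version, since the conventions are still evolving, and note that `genai_span_attributes` is a hypothetical helper.

```python
# Span attributes for one LLM call, keyed per the OpenTelemetry GenAI
# semantic conventions (incubating; check the current spec before relying
# on exact attribute names).
def genai_span_attributes(system, model, input_tokens, output_tokens):
    return {
        "gen_ai.system": system,            # e.g. "openai", "anthropic"
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = genai_span_attributes("openai", "gpt-4o", 128, 42)
print(attrs)
```

With a real OpenTelemetry span you would set each key/value pair as a span attribute; any backend that speaks the same conventions, HoneyHive included, can then interpret the trace.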
Hosting Options
Multi-Tenant SaaS
Fully managed. Get started in minutes.
Dedicated Cloud
Single-tenant environment managed by our team.
Self-Hosted
Deploy in your VPC for full control and compliance.
Additional Resources
API Reference
REST API documentation for custom integrations.
SDK Documentation
Python SDK guides for advanced use cases.
Invite Your Team
Add teammates and configure role-based access control.
Integrations
Connect with OpenAI, Anthropic, LangChain, and more.












