How to quickly set up logging with HoneyHive.
How to run end-to-end evaluations programmatically via the SDK.
How to trace and debug your LlamaIndex RAG pipelines.
How to trace and debug your LangChain chains and agents.
API Reference Guide
Our reference guide for using our REST APIs.
Data Model Overview
An overview of our data model and different logging methods.
What is HoneyHive?
HoneyHive is the AI Evaluation & Observability Platform. Our tools help you test and evaluate, monitor and debug, and continuously improve your Generative AI applications, enabling a Test-Driven Development (TDD) workflow for your team. A TDD workflow plays a crucial role in transforming your AI prototypes into reliable, enterprise-ready applications.
Pre-Production: Offline Evaluation and Testing
Test new app versions against your golden test dataset using a wide variety of Python and LLM evaluators that help you quantify and assess performance objectively. This helps you confidently choose the best-performing variant, debug where your app failed, safely validate quality, and check for regressions before costly errors happen.
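To make the offline-evaluation loop concrete, here is a minimal sketch in plain Python. `run_app`, `exact_match`, and the golden dataset are all hypothetical stand-ins, not HoneyHive SDK calls: the app under test is a canned lookup, and the evaluator is a simple exact-match scorer averaged over the dataset.

```python
def run_app(question: str) -> str:
    # Placeholder for the app version under test (hypothetical).
    canned = {"capital of France?": "Paris", "2 + 2?": "4"}
    return canned.get(question, "unknown")

def exact_match(output: str, expected: str) -> float:
    # A simple Python evaluator: 1.0 on an exact match, else 0.0.
    return 1.0 if output.strip() == expected.strip() else 0.0

def evaluate(dataset: list, evaluator) -> float:
    # Run every golden example through the app and average the scores.
    scores = [evaluator(run_app(ex["input"]), ex["expected"]) for ex in dataset]
    return sum(scores) / len(scores)

golden = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2?", "expected": "4"},
    {"input": "largest ocean?", "expected": "Pacific"},
]

score = evaluate(golden, exact_match)
print(f"exact-match pass rate: {score:.2f}")  # 2 of 3 examples pass
```

Comparing this pass rate between two app versions is the core of regression checking: a drop on the golden set flags a regression before it reaches production.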
In-Production: Online Evaluation, Monitoring, & Debugging
Once in production, our online evaluators and self-serve analytics help you understand user behavior and detect anomalies across your application. Get started by instrumenting your application to log completion requests, user sessions, user feedback, custom metrics, and user-specific metadata, and by creating visualizations of any custom metric across any data slice. HoneyHive also lets you trace and visualize the fine-grained execution of multi-step LLM chains, agents, and RAG pipelines, so you can precisely pinpoint subtle problems and root-cause errors in your pipeline.
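The idea behind tracing a multi-step pipeline can be sketched as follows. This is a hypothetical illustration of the concept, not the HoneyHive tracing API: each pipeline step is recorded as a span holding its name, inputs, and outputs, and the spans together form the session's trace.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    # One step of the pipeline, with its inputs and outputs captured.
    name: str
    inputs: dict
    outputs: dict = field(default_factory=dict)

@dataclass
class Trace:
    # All spans recorded for one user session.
    session_id: str
    spans: list = field(default_factory=list)

    def step(self, name, fn, **inputs):
        # Run a pipeline step and record it as a span.
        span = Span(name=name, inputs=inputs)
        span.outputs = {"result": fn(**inputs)}
        self.spans.append(span)
        return span.outputs["result"]

# Hypothetical three-step RAG pipeline, traced step by step.
trace = Trace(session_id="sess-001")
docs = trace.step("retrieve", lambda query: ["doc-a", "doc-b"], query="refund policy")
prompt = trace.step(
    "build_prompt",
    lambda query, docs: f"Answer '{query}' using {docs}",
    query="refund policy", docs=docs,
)
answer = trace.step("generate", lambda prompt: "Refunds are processed in 5 days.", prompt=prompt)

print([s.name for s in trace.spans])  # ['retrieve', 'build_prompt', 'generate']
```

Because each span keeps its own inputs and outputs, a wrong final answer can be traced back to the exact step that misbehaved, e.g. a retrieval step that returned irrelevant documents.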
Continuous Improvement: Playground & Dataset Management
Our Prompt Studio and dataset management tools allow you to test new prompts and models as you iterate, and to label and curate datasets from your production logs for fine-tuning and evaluation. This, combined with our unified suite of evaluation and observability tools, allows you to repeatably test, measure, and iteratively improve your LLM application, creating a unique data flywheel for continuous improvement.
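Curating a dataset from production logs often amounts to filtering logged completions by a quality signal such as user feedback. The sketch below is a hypothetical illustration of that pattern (the log schema and `curate` helper are invented for this example), not a HoneyHive API.

```python
# Hypothetical production logs: each entry pairs an input and output
# with a user-feedback score (1 = thumbs up, -1 = thumbs down).
logs = [
    {"input": "q1", "output": "a1", "feedback": 1},
    {"input": "q2", "output": "a2", "feedback": -1},
    {"input": "q3", "output": "a3", "feedback": 1},
]

def curate(logs: list, min_feedback: int = 1) -> list:
    # Keep only positively-rated completions as fine-tuning candidates.
    return [
        {"prompt": entry["input"], "completion": entry["output"]}
        for entry in logs
        if entry["feedback"] >= min_feedback
    ]

dataset = curate(logs)
print(len(dataset))  # 2 examples survive the feedback filter
```

Feeding these curated examples back into fine-tuning and evaluation runs is what closes the data-flywheel loop described above.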
HoneyHive enables a continuous improvement data flywheel