How HoneyHive’s Control Plane and Data Plane architecture works.
HoneyHive separates the Control Plane from the Data Plane so your application data (traces, evaluations, datasets) never touches the control plane infrastructure. This federated architecture is the foundation of HoneyHive’s security model and determines where your data lives.
- **Your data stays isolated.** Trace and evaluation data is stored in the Data Plane, which shares no database or credentials with the Control Plane.
- **You choose where it lives.** Deploy the Data Plane in any AWS region, in your own cloud account, or on-premises. See Hosting Models.
- **Nothing changes when you scale.** Move from shared to dedicated infrastructure without changing your SDK integration or workflows.
- **Control Plane** — handles authentication (SSO, SAML 2.0, email/password, MFA), role-based access control, and organization/workspace/project configuration. Stores organizational metadata in PostgreSQL. Has no access to your trace data.
- **Data Plane** — handles trace ingestion, event enrichment, evaluation jobs, and the LLM proxy. Operates on its own databases and message queues. Verifies access using short-lived, cryptographically signed tokens issued by the Control Plane.
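To make the token handoff concrete, the sketch below builds a hypothetical cluster token and checks its expiry claim on the receiving side. It is a simplified, stdlib-only stand-in: the header, claim names, and placeholder signature are illustrative assumptions, and a real Data Plane verifier would first fetch the key matching `kid` from the Control Plane's JWKS endpoint and verify the ES256 signature before trusting any claim.

```python
import base64, json, time

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding for each segment
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def decode_segment(seg: str) -> dict:
    # Restore padding before decoding a JWT segment
    pad = "=" * (-len(seg) % 4)
    return json.loads(base64.urlsafe_b64decode(seg + pad))

# Hypothetical short-lived cluster token: header + claims + (placeholder) signature.
header = {"alg": "ES256", "typ": "JWT", "kid": "cp-key-1"}
claims = {"iss": "control-plane", "aud": "data-plane", "exp": int(time.time()) + 300}
token = ".".join([
    b64url(json.dumps(header).encode()),
    b64url(json.dumps(claims).encode()),
    "sig-placeholder",  # real tokens carry an ECDSA signature here
])

# Receiving side: split, decode, and check expiry. Signature verification
# against the JWKS endpoint is deliberately omitted from this sketch.
head_seg, claims_seg, _sig = token.split(".")
payload = decode_segment(claims_seg)
assert payload["exp"] > time.time(), "token expired"
```

The short expiry window is what makes the scheme safe without shared credentials: even if a token leaks, it is useless minutes later, and the Data Plane never needs long-lived secrets from the Control Plane.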
The Control Plane manages authentication, authorization, and platform configuration. It has no access to your trace or evaluation data.
| Service | What it does |
| --- | --- |
| Backend API | REST API for authentication, RBAC, organization/workspace/project management, prompt templates, and alert configuration. Exposes a JWKS endpoint for Data Plane token verification. |
| Web UI | Next.js web application for all platform features. Communicates with both Control Plane and Data Plane APIs. |
| Controller | Orchestrates Control Plane and Data Plane coordination. Manages Data Plane lifecycle, stream routing, and identity bootstrap (ECDSA keypairs for cluster JWTs). Communicates with the Data Plane Controller via a bidirectional gRPC stream. |
| Writer Service | Consumes events from the NATS queue and writes them to ClickHouse. Handles buffering, batching, and real-time enrichment (session linking, metadata inheritance, computed fields). Includes retry logic with exponential backoff and a dead letter queue (S3) for failed writes. |
| Notification Service | Processes alert notifications and delivers them via email (SES), Slack, or webhooks. Supports scope-based routing and severity stages (critical, warning, resolution). |
The Data Plane processes and stores all application data. It verifies access using JWT tokens issued by the Control Plane via a JWKS endpoint — the two planes share no database or credentials.
| Service | What it does |
| --- | --- |
| Ingestion Service | Receives traces and spans from the HoneyHive SDK via OTLP-compatible HTTP and gRPC endpoints. Validates API keys, normalizes events, and publishes to NATS for downstream processing. Acknowledges receipt immediately to minimize client latency. |
| Backend API | REST API for Data Plane operations: datasets, datapoints, metrics, experiment runs, charts, provider secrets, and storage. Authenticates requests via JWT tokens or API keys. |
| Controller | Manages Data Plane lifecycle and communicates with the Control Plane Controller via a bidirectional gRPC stream. Reports health metrics and handles identity bootstrap. |
| Evaluation Service | Consumes events from the NATS queue and executes evaluators (Python, LLM-based, or custom). Publishes evaluation scores to the control plane event stream for persistence. Manages annotation queues and processes online evaluators configured for a project. |
| LLM Proxy | Routes LLM requests to AI providers via LiteLLM for Playground and LLM-based evaluators. Supports multiple providers (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Vertex AI). Provider credentials are encrypted and scoped per workspace (see Provider Keys). |
| Python Metric Service | Executes user-defined Python metric code in a sandboxed environment with RestrictedPython. Supports common libraries (pandas, numpy, sklearn, jsonschema) with timeout protection and code size limits. |
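The Ingestion Service's validate-normalize-publish-ack flow can be sketched roughly as follows. The field names, API-key check, and in-memory `published` list are illustrative assumptions standing in for HoneyHive's actual schema and NATS publish, not the real implementation.

```python
import json, time, uuid

def validate_api_key(key: str) -> bool:
    # Illustrative length check only; the real service verifies keys
    # against project configuration.
    return isinstance(key, str) and len(key) >= 16

def normalize_event(raw: dict) -> dict:
    # Fill defaults so downstream consumers see a uniform event shape.
    return {
        "event_id": raw.get("event_id") or str(uuid.uuid4()),
        "session_id": raw.get("session_id"),
        "received_at": time.time(),
        "attributes": raw.get("attributes", {}),
    }

published = []  # stand-in for the NATS JetStream publish

def ingest(api_key: str, raw: dict) -> dict:
    if not validate_api_key(api_key):
        raise PermissionError("invalid API key")
    event = normalize_event(raw)
    # Hand off to the queue for async processing, then acknowledge
    # immediately to keep client latency low.
    published.append(json.dumps(event))
    return {"status": "accepted", "event_id": event["event_id"]}

ack = ingest("a" * 32, {"session_id": "s-123", "attributes": {"model": "gpt-4o"}})
```

The design choice worth noting is the early acknowledgment: the SDK's request returns as soon as the event is queued, and all heavier work (enrichment, evaluation, persistence) happens asynchronously behind the queue.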
Object storage (S3) holds trace data, large payloads, and long-term archives, with server-side encryption (SSE-KMS), versioning for audit trails, and lifecycle policies for cost optimization.
The ingestion pipeline is designed for high throughput, low latency, and zero data loss:
1. **Ingestion** — the SDK sends traces to the Ingestion Service via OTLP-compatible HTTP or gRPC. The service validates API keys, normalizes incoming events, and publishes to encrypted NATS streams. Receipt is acknowledged immediately to minimize client latency.
2. **Writing and enrichment** — the Writer Service pulls events from the CP NATS stream in batches. It enriches events in real time (session linking, metadata inheritance, computed fields) and writes them to ClickHouse. Failed batches are retried with exponential backoff; persistently failing events are sent to a dead letter queue on S3.
3. **Evaluation** — the Evaluation Service consumes from the DP NATS stream and executes configured evaluators. Python metrics run in the sandboxed Python Metric Service. LLM-based evaluators route through the LLM Proxy. Scores are published to the CP NATS stream, where the Writer Service persists them to ClickHouse.
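The writing step's retry-then-dead-letter behavior is what delivers the "zero data loss" guarantee. A minimal sketch of that pattern follows; the batch shape, delays, and in-memory `dead_letter_queue` stand in for the real ClickHouse writer and S3 dead letter bucket.

```python
import time

MAX_RETRIES = 3
dead_letter_queue = []  # stand-in for the S3 dead letter bucket

def write_batch(batch, writer, base_delay=0.01):
    """Try to persist a batch, retrying with exponential backoff;
    route persistently failing batches to the dead letter queue."""
    for attempt in range(MAX_RETRIES):
        try:
            writer(batch)
            return True
        except IOError:
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s
    # Never drop data: failed batches are preserved for inspection and replay.
    dead_letter_queue.append(batch)
    return False

# Simulated writer that fails twice with transient errors, then succeeds.
attempts = {"n": 0}
def flaky_writer(batch):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise IOError("transient ClickHouse error")

ok = write_batch([{"event_id": "e-1"}], flaky_writer)
```

Exponential backoff absorbs transient failures (a brief ClickHouse restart, a network blip) without hammering the store, while the dead letter queue ensures that even a persistent failure loses nothing.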
HoneyHive uses NATS with JetStream for durable, at-least-once message delivery:
| Stream | Subjects | Purpose |
| --- | --- | --- |
| `events-stream` (CP NATS) | `events.>` | Trace and span events for the Writer Service |
| `notifications-stream` (CP NATS) | `notifications.>` | Alert notifications for the Notification Service |
| `evaluation-stream` (DP NATS) | `evaluation.>` | Evaluation tasks for the Evaluation Service |
In production, the Control Plane and Data Plane run separate NATS clusters. The CP NATS cluster uses TLS for external communication. The DP NATS cluster runs internally with no external access.
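The `>` in the subjects above is NATS's multi-token trailing wildcard (it matches one or more remaining dot-delimited tokens, while `*` matches exactly one), so `events.>` captures every subject under the `events` hierarchy. A small stdlib sketch of that matching rule, for intuition only:

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Match a dot-delimited NATS subject against a pattern where
    '*' matches exactly one token and '>' matches one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            return len(s_tokens) > i  # at least one remaining token
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)

# 'events.>' captures every event subject, however deeply nested.
print(subject_matches("events.>", "events.span.created"))   # True
print(subject_matches("events.>", "notifications.alert"))   # False
```

Subscribing each service to a whole subject hierarchy is what lets new event types flow to the right consumer without reconfiguring the streams.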
Self-Hosted deployment suits organizations requiring complete infrastructure control. Moving from Multi-Tenant SaaS to Dedicated Cloud or Self-Hosted increases physical isolation without changing how you use the platform — your SDK integration, dashboards, and workflows stay the same.
For Dedicated Cloud and Self-Hosted customers, HoneyHive supports private connectivity via AWS PrivateLink and VPC Peering so trace data never traverses the public internet.