May 2026

v1.0.0 - May 21, 2026

Initial 1.0.0 stable release. The federated hive-control-plane and hive-data-plane chart layout is now GA. This release adds ACM cert auto-discovery support, dedicated metrics ports for dp-pythonmetric and dp-llmproxy, and two values-key changes that may require overlay updates.
Upgrade steps:
  1. If you override dpIngestionService.ports.metrics, rename the key to dpIngestionService.metricsPort in your data-plane values overlay. The old nested ports.metrics key no longer exists; if you don’t rename, the dp-ingestion metrics port silently reverts to the default (9091).
  2. If you use Prometheus Operator and relied on the default serviceMonitor.enabled: true, add serviceMonitor.enabled: true to both your control-plane and data-plane values overlays. The default was changed to false to avoid CRD-missing errors for customers who do not run Prometheus Operator. Without it, ServiceMonitor resources are not created and Prometheus stops scraping HoneyHive pods.
  3. (Optional) To opt into ACM cert auto-discovery on the control-plane frontend ALB, leave frontendIngress.tls.certificateArn empty (with frontendIngress.tls.enabled: true) and set frontendIngress.tls.hosts to the hostname(s) the ALB must serve.
  4. (Optional) To opt into ACM cert auto-discovery on the data-plane ALB (primary + wildcard), leave ingress.tls.certificateArn empty (with ingress.tls.enabled: true) and set ingress.tls.hosts to your data-plane hostname(s).
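Upgrade steps 1 and 2 are mechanical overlay edits, so they can be scripted. The following is a minimal sketch, assuming each values overlay has already been parsed into a plain Python dict (for example with a YAML loader, not shown). `migrate_overlay_v1` and its `relied_on_default_monitor` flag are hypothetical names; the charts do not ship such a helper.

```python
import copy


def migrate_overlay_v1(overlay: dict, relied_on_default_monitor: bool) -> dict:
    """Return a copy of a values overlay with the v1.0.0 key changes applied.

    Hypothetical helper that mirrors upgrade steps 1 and 2; not part of the charts.
    """
    result = copy.deepcopy(overlay)

    # Step 1: dpIngestionService.ports.metrics -> dpIngestionService.metricsPort.
    ingestion = result.get("dpIngestionService")
    if isinstance(ingestion, dict):
        ports = ingestion.get("ports")
        if isinstance(ports, dict) and "metrics" in ports:
            ingestion["metricsPort"] = ports.pop("metrics")
            if not ports:
                # Drop the now-empty nested block so no stale key remains.
                del ingestion["ports"]

    # Step 2: the chart default flipped to false, so opt in explicitly.
    # setdefault keeps any value the overlay already sets (including false).
    if relied_on_default_monitor:
        monitor = result.setdefault("serviceMonitor", {})
        monitor.setdefault("enabled", True)

    return result


before = {"dpIngestionService": {"ports": {"metrics": 9100}}}
print(migrate_overlay_v1(before, relied_on_default_monitor=True))
# {'dpIngestionService': {'metricsPort': 9100}, 'serviceMonitor': {'enabled': True}}
```

Run it on both the control-plane and data-plane overlays: the rename is a no-op on the control-plane overlay, while the serviceMonitor opt-in applies to both.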
What’s changed:
  • Action Required: serviceMonitor.enabled default changed from true to false in both hive-control-plane/values.yaml and hive-data-plane/values.yaml. The default was true in v0.104.0, which caused Helm to render ServiceMonitor resources even when the Prometheus Operator CRDs were not installed, failing the deploy. If you use Prometheus Operator and relied on the default, add serviceMonitor.enabled: true to both your control-plane and data-plane values overlays. Without it, ServiceMonitor resources are not created and Prometheus stops scraping HoneyHive pods.
  • Action Required: dpIngestionService.ports.metrics renamed to dpIngestionService.metricsPort in hive-data-plane/values.yaml. The nested ports.metrics key was the only service using that structure; all other services use a flat metricsPort key. The old key no longer exists; if you don’t rename, the dp-ingestion metrics port silently reverts to the default (9091).
  • New dpPythonmetricService.metricsPort (default: 9091) and dpLlmproxyService.metricsPort (default: 9091) in hive-data-plane/values.yaml for dedicated Prometheus metrics ports on dp-pythonmetric and dp-llmproxy. When serviceMonitor.enabled: true, ServiceMonitors are now also created for these two services.
  • New frontendIngress.tls.hosts (default: []) in hive-control-plane/values.yaml and ingress.tls.hosts (default: []) in hive-data-plane/values.yaml for ACM certificate auto-discovery. frontendIngress.tls.hosts controls the control-plane frontend ALB; ingress.tls.hosts controls the data-plane API ALB (applied to both the primary and wildcard Ingresses). Rendered as spec.tls[*].hosts so the AWS Load Balancer Controller can match an ACM cert by SNI/SAN when certificateArn is empty. Leave certificateArn empty (with tls.enabled: true) and set tls.hosts to your hostname(s).
  • Fixed ACM certificate auto-discovery on the data-plane wildcard Ingress and control-plane frontend Ingress. Previously, leaving certificateArn blank emitted an empty alb.ingress.kubernetes.io/certificate-arn annotation that ArgoCD rejected with a nil-annotation error. The annotation is now omitted entirely when certificateArn is empty. The control-plane frontend Ingress now contributes a spec.tls[*].hosts cert-discovery hint when frontendIngress.tls.hosts is set, so HTTPS listener creation no longer fails on overlays that rely on auto-discovery. Overlays that set an explicit certificateArn continue to render identically.
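The annotation fix and the new tls.hosts hint can be modeled as a small rendering function. This is a hedged Python sketch of the behavior described above, not the chart's actual Helm template. `render_alb_ingress` is a hypothetical name, and the output keeps only the fields relevant here: the `alb.ingress.kubernetes.io/certificate-arn` annotation and `spec.tls[*].hosts`.

```python
def render_alb_ingress(tls: dict) -> dict:
    """Map tls.enabled / tls.certificateArn / tls.hosts onto Ingress fields.

    Hypothetical model of the v1.0.0 template behavior; keys follow the values file.
    """
    annotations: dict = {}
    spec: dict = {}

    if tls.get("enabled"):
        arn = tls.get("certificateArn") or ""
        if arn:
            # Explicit certificate: the annotation renders as before.
            annotations["alb.ingress.kubernetes.io/certificate-arn"] = arn
        # Empty ARN: the annotation is omitted entirely rather than emitted empty.

        hosts = list(tls.get("hosts") or [])
        if hosts:
            # Hint that lets the AWS Load Balancer Controller match an ACM cert by host.
            spec["tls"] = [{"hosts": hosts}]

    return {"metadata": {"annotations": annotations}, "spec": spec}


print(render_alb_ingress({"enabled": True, "certificateArn": "", "hosts": ["app.example.com"]}))
# {'metadata': {'annotations': {}}, 'spec': {'tls': [{'hosts': ['app.example.com']}]}}
```

Because tls.hosts defaults to an empty list, an overlay with an explicit certificateArn and no hosts produces only the annotation, which matches the note that such overlays render identically.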

v0.104.0 - May 13, 2026

  • Action Required: common.dataplane.dpPublicUrl default changed to "" in data-plane/services/values.yaml. Set this to your environment’s public data plane URL (e.g., "https://api.my-dp.example.com"). Without it, the Admin Center Data Planes view and /settings/project/keys page show no URL.
  • Action Required: dpIngestionService HPA defaults changed in data-plane/services/values.yaml. targetCPUUtilizationPercentage is now 60 (was 80), and a new targetMemoryUtilizationPercentage (default: 60) was added. To keep prior behavior, set CPU to 80 and memory to 0.
  • Action Required: ingress.maxBodySize removed from both control-plane/services/values.yaml and data-plane/services/values.yaml. Remove any overrides for this key.
  • Action Required: ingress.authHost removed from control-plane/services/values.yaml. Remove any overrides for this key.
  • Action Required: jobs.serviceAccount.create now defaults to true in both planes. Set to false if you manage the jobs ServiceAccount externally.
  • Action Required (ArgoCD users): Deployments omit spec.replicas when autoscaling is enabled. Add ignoreDifferences with jsonPointers: ["/spec/replicas"] for Deployment kind in your ArgoCD Application spec.
  • New <service>.affinity (default: {}) for all CP/DP services and jobs.affinity for CronJobs/batch jobs. Supports podAffinity, podAntiAffinity, and nodeAffinity.
  • New <service>.autoscaling.enabled (default: true) for all 11 CP/DP services. Set to false to disable HPA and pin replicas via <service>.replicas.
  • New frontendIngress.tls.redirect (default: false) in control-plane/services/values.yaml. When true, the ALB listens on HTTP:80 and redirects to HTTPS.
  • New dpIngestionService.resources (default: {}) in data-plane/services/values.yaml for per-service resource overrides on ingestion.
  • New dpPythonmetricService.gunicornWorkers (default: 4) and dpPythonmetricService.pythonExecutionTimeout (default: 0.1) in data-plane/services/values.yaml for tuning custom metric concurrency and per-metric runtime.
  • New cpNotificationService.ses.domain, .ses.rps (default: 14), .ses.senderRoleArn, .ses.senderRoleDurationSeconds (default: 3600) in control-plane/services/values.yaml for SES email sender configuration.
  • New serviceMonitor.enabled (default: true), serviceMonitor.namespace (default: "monitoring"), serviceMonitor.labels (default: {release: monitoring}), serviceMonitor.interval (default: "30s"), serviceMonitor.scrapeTimeout (default: "10s") in both planes for Prometheus Operator ServiceMonitor support.
  • New <service>.metricsPort (default: 9091), a dedicated Prometheus metrics port for all CP/DP services.
  • ingress.host and ingress.grpcHost now accept a string or a list of strings for multi-host ingress support.
  • Deployments no longer reset replica count on helm upgrade when autoscaling is enabled.
  • Fixed NATS anti-affinity selector to correctly distinguish nats-box from nats server pods.
  • Custom CA certs from common.tls.caCerts now apply to the DP S3 backfill Job.
  • Added app.kubernetes.io/version label and OTel service.version attribute to all CP/DP services.
  • Distributed tracing is now correctly emitted from dp-backend, dp-llmproxy, and dp-controller (the OTEL_ENABLED=true env var was previously missing on these deployments).
  • Ingress routing improvements: /v1/events/*/annotate routes to dp-backend, /v1/* ingestion paths route to ingestion-service, API and frontend traffic use separate ALBs.
  • Fixed PDB template to correctly handle minAvailable: 0 and maxUnavailable: 0.
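To see why lowering the dpIngestionService targets from 80% to 60% makes the HPA scale out sooner, here is the standard Kubernetes HPA calculation as a simplified Python sketch: per metric, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), the HPA skips changes within its default 10% tolerance, and with several metrics it acts on the largest proposal. Treating a target of 0 as "metric not configured" is an assumption made here to mirror the note about setting memory to 0. The sketch ignores stabilization windows and min/max replica bounds.

```python
import math


def desired_replicas(current: int, utilization: dict, targets: dict, tolerance: float = 0.1) -> int:
    """Simplified HPA replica count for utilization metrics.

    utilization/targets map a metric name ("cpu", "memory") to a percentage.
    A target of 0 is treated as "metric disabled" (assumption, see lead-in).
    """
    proposals = []
    for metric, target in targets.items():
        if not target:
            continue  # metric not configured
        ratio = utilization[metric] / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # within tolerance: no change for this metric
        else:
            proposals.append(math.ceil(current * ratio))
    # With several metrics, the HPA acts on the largest proposal.
    return max(proposals, default=current)


load = {"cpu": 70, "memory": 50}
print(desired_replicas(4, load, {"cpu": 60, "memory": 60}))  # 5 (new defaults)
print(desired_replicas(4, load, {"cpu": 80, "memory": 0}))   # 4 (prior behavior)
```

At 70% CPU on four pods, the prior 80% target holds at four replicas while the new 60% target asks for five.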
April 2026

v0.103.0 - April 24, 2026

  • New jobs configuration block in control-plane/services/values.yaml for Kubernetes CronJobs that process alert transitions
  • New jobs configuration block in data-plane/services/values.yaml for operator-triggered S3 time-window backfill batch workloads
  • New per-service keys: <service>.service.annotations (default: {}), <service>.service.labels (default: {}), <service>.service.type (default: "ClusterIP") now exposed on cpWriterService, cpControllerService, cpNotificationService, dpControllerService, dpIngestionService, dpEvaluationService, dpLlmproxyService, dpPythonmetricService — previously hardcoded
  • Service port protocol hints (name: http, protocol: TCP, appProtocol: http/grpc) added to all service templates for correct L7 traffic classification by service meshes (Istio, Linkerd)
  • appProtocol added to ClickHouse instance Service ports (http, interserver, metrics: appProtocol: http; native: appProtocol: tcp)
  • ClickHouse instances chi-installation chart version bumped from 1.1.6 to 1.1.7
  • No breaking changes — fully backward-compatible with v0.102.0 configurations
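The backward compatibility follows from how Helm combines values: a user overlay is merged over the chart's values.yaml, nested maps are merged key by key, and any other value (scalars, lists) in the overlay replaces the default. A simplified Python sketch of that merge shows an overlay that sets only service.annotations still inheriting the new labels and type defaults. It deliberately omits Helm's rule that a null in the overlay deletes a key, and the annotation used is hypothetical.

```python
import copy


def merge_values(defaults: dict, overlay: dict) -> dict:
    """Simplified Helm-style values merge: maps merge recursively, everything else is replaced."""
    merged = copy.deepcopy(defaults)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Chart defaults for the new v0.103.0 per-service keys, per the release note.
CHART_DEFAULTS = {
    "cpWriterService": {
        "service": {"annotations": {}, "labels": {}, "type": "ClusterIP"},
    },
}

# Hypothetical overlay that only adds an annotation; labels and type fall back to defaults.
overlay = {"cpWriterService": {"service": {"annotations": {"example.com/team": "platform"}}}}
print(merge_values(CHART_DEFAULTS, overlay)["cpWriterService"]["service"])
# {'annotations': {'example.com/team': 'platform'}, 'labels': {}, 'type': 'ClusterIP'}
```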

v0.102.0 - April 8, 2026

  • New podAnnotations (default: {}) in control-plane/infrastructure/clickhouse/clickhouse_instances/values.yaml for arbitrary annotations on ClickHouse pods (useful for Datadog autodiscovery, Prometheus scraping)
  • Fixed missing DP_DATABASE_URL env var in the dp-pythonmetric-service deployment template; it now reads from common.externalSecrets.postgres.secretName / uriKey like all other data-plane services
  • No breaking changes — fully backward-compatible with v0.101.0 configurations

v0.101.0 - April 1, 2026

  • Disabled ClickHouse replica check before attaching backup parts (CLICKHOUSE_CHECK_REPLICAS_BEFORE_ATTACH set to "false" in backup container) - prevents backup restore failures in environments where replica availability cannot be confirmed
  • ClickHouse instances chi-installation chart version bumped from 1.1.5 to 1.1.6
  • No breaking changes — fully backward-compatible with v0.100.0 configurations
March 2026

v0.100.0 - March 25, 2026

  • New ingress.albClassName (default: "alb") and frontendIngress.albClassName (default: "alb") in control-plane and data-plane services for configurable ALB ingress class
    • Useful for shared CP+DP cluster scenarios where each plane needs its own ALB IngressClass (e.g., "cp-alb", "dp-alb")
    • Existing deployments using the default "alb" class require no changes
  • New scheduling config for CP prometheus-nats-exporter in control-plane/infrastructure/nats/values.yaml: prometheusExporter.tolerations, prometheusExporter.nodeSelector, prometheusExporter.affinity, prometheusExporter.additionalLabels
  • common.extraLabels now propagated to all Service, ServiceAccount, HPA, and Ingress resources across both planes (previously only applied to Deployments)

v0.99.3 - March 20, 2026

  • Action Required: Add common.controlPlane.id in data-plane/services/values.yaml - set it to match common.controlPlane.id from your control-plane values. Without this, dp-controller-service cannot identify its parent control plane, causing data-plane-to-control-plane communication failures
  • New global.labels (default: {}) in control-plane/infrastructure/nats/values.yaml and data-plane/infrastructure/nats/values.yaml - applied to all NATS-generated Kubernetes resources (StatefulSet, Service, PVC, ConfigMap, PDB, etc.)
  • New nack.additionalLabels, nack.tolerations, nack.nodeSelector, nack.affinity in control-plane NATS values for JetStream Controller scheduling
  • prometheusExporter.additionalLabels now propagated to Prometheus NATS Exporter Deployment, Service, and ServiceMonitor labels in both planes
  • Fixed topologySpreadConstraints typo in control-plane NATS values (topolicySpreadConstraints → topologySpreadConstraints)
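A missing or mismatched common.controlPlane.id only surfaces at runtime as data-plane-to-control-plane communication failures, so it can be worth checking before deploying. A minimal Python sketch, assuming both overlays are already loaded as dicts; `check_control_plane_id` and the example ID `cp-prod` are hypothetical.

```python
def _get(values: dict, dotted: str):
    """Look up a dotted key path such as "common.controlPlane.id"; None if absent."""
    node = values
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_control_plane_id(cp_values: dict, dp_values: dict) -> list:
    """Return a list of problems; an empty list means the IDs line up."""
    key = "common.controlPlane.id"
    cp_id, dp_id = _get(cp_values, key), _get(dp_values, key)
    problems = []
    if not cp_id:
        problems.append(f"control-plane values: {key} is not set")
    if not dp_id:
        problems.append(f"data-plane values: {key} is not set")
    if cp_id and dp_id and cp_id != dp_id:
        problems.append(f"{key} mismatch: control plane {cp_id!r}, data plane {dp_id!r}")
    return problems


cp = {"common": {"controlPlane": {"id": "cp-prod"}}}
dp = {"common": {"controlPlane": {"id": "cp-prod"}}}
print(check_control_plane_id(cp, dp))  # []
```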

v0.99.2 - March 9, 2026

  • ClickHouse instances chi-installation chart version bumped from 1.1.4 to 1.1.5 in control-plane/infrastructure/clickhouse/clickhouse_instances/Chart.yaml

v0.99.1 - March 6, 2026

  • Fixed cp-notification-service healthcheck port - renamed env var from EXPRESS_PORT to PORT in deployment template
  • Control plane hive-control-plane chart version bumped from 0.2.0 to 0.2.1

v0.99.0 - March 6, 2026

  • Added readiness and liveness probes (GET /healthcheck) to all 9 service deployments across control-plane and data-plane
  • cp-backend-service and dp-backend-service include a startup probe (allows up to 310s for Prisma migrations before liveness checks begin)
  • Eliminates intermittent 502 errors during rolling updates caused by traffic routing to pods not yet ready to serve
  • No values.yaml changes required - probes use hardcoded values in deployment templates
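The 310-second figure is a startup-probe budget. In Kubernetes, liveness and readiness checks are held off until the startup probe succeeds, and the Kubernetes docs describe the startup window as failureThreshold × periodSeconds; adding initialDelaySeconds gives the total. The Python sketch below computes that window. The probe settings in the example are hypothetical, chosen only to show one combination that adds up to 310 s, and are not necessarily the values hardcoded in the templates.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a startup probe allows before the kubelet restarts the container.

    initialDelaySeconds + failureThreshold * periodSeconds; probe timeouts are ignored.
    """
    return initial_delay + failure_threshold * period


# Hypothetical settings that add up to the 310 s mentioned above.
print(startup_budget_seconds(initial_delay=10, period=10, failure_threshold=30))  # 310
```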
February 2026

v0.98.9 - February 25, 2026

  • Unified encryption configuration for cp-controller, dp-controller, and dp-llmproxy-service
  • Supports at-rest encryption for identity management and provider secrets via KMS or environment variable mode
  • New common.encryption.keyId value in both control-plane and data-plane services
  • New ExternalSecret templates for encryption in both control-plane and data-plane secret-store charts
  • Replaced dpLlmproxyService.kmsKeyId with unified HH_ENCRYPTION_KEY_ID and HH_ENCRYPTION_SECRET env vars in dp-llmproxy-service
  • Fixed a perpetual ArgoCD OutOfSync state caused by the Redis PDB enabled: true setting in control-plane Redis
  • Action Required: Remove dpLlmproxyService.kmsKeyId from data-plane values and add common.encryption.keyId in both control-plane and data-plane values
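This action item touches overlays for both planes. A hedged Python sketch of the edit, assuming dict-loaded overlays: it removes dpLlmproxyService.kmsKeyId from the data-plane overlay and sets common.encryption.keyId in both. Reusing the old kmsKeyId value as the new keyId is an assumption of this sketch (it presumes you want to keep the same KMS key); pass an explicit key ID otherwise. `migrate_encryption` is a hypothetical helper name.

```python
import copy
from typing import Optional, Tuple


def migrate_encryption(cp: dict, dp: dict, key_id: Optional[str] = None) -> Tuple[dict, dict]:
    """Apply the v0.98.9 encryption change to copies of the CP and DP overlays."""
    cp, dp = copy.deepcopy(cp), copy.deepcopy(dp)

    # Remove the retired per-service key from the data-plane overlay.
    old_key = None
    proxy = dp.get("dpLlmproxyService")
    if isinstance(proxy, dict):
        old_key = proxy.pop("kmsKeyId", None)

    # Assumption: carry the old KMS key over if no explicit key ID was given.
    new_key = key_id or old_key
    if not new_key:
        raise ValueError("no key ID given and no dpLlmproxyService.kmsKeyId to carry over")

    for values in (cp, dp):
        values.setdefault("common", {}).setdefault("encryption", {})["keyId"] = new_key
    return cp, dp


cp_new, dp_new = migrate_encryption({}, {"dpLlmproxyService": {"kmsKeyId": "alias/hh-provider-secrets"}})
print(cp_new)  # {'common': {'encryption': {'keyId': 'alias/hh-provider-secrets'}}}
print(dp_new)  # {'dpLlmproxyService': {}, 'common': {'encryption': {'keyId': 'alias/hh-provider-secrets'}}}
```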

v0.98.1 - February 11, 2026

  • Added common.extraLabels for custom governance/compliance labels on all Kubernetes resources (control-plane, data-plane, shared dependencies)
  • Added common.observability.otel.exporterProtocol (default: "grpc") for OTLP exporter protocol configuration in data-plane
  • Added dpLlmproxyService.kmsKeyId (default: "alias/hh-provider-secrets") for AWS KMS encryption of LLM provider secrets
  • Removed duplicate common.observability block in data-plane services values
  • Fixed Next.js cache permission errors in cp-frontend-service by adding a nextjs-cache emptyDir volume
  • Fixed OTEL service name in cp-frontend-service (was hardcoded to cp-controller-service)
  • Added custom CA certificate support (SSL_CERT_FILE, REQUESTS_CA_BUNDLE) for dp-llmproxy-service and dp-pythonmetric-service
  • Added DP_DATABASE_URL env var and KMS config to dp-llmproxy-service
  • Fixed ClickHouse logging configuration (moved to config.d/99-logger.xml with replace="1")
  • Action Required: Set common.extraLabels if your organization requires specific labels on all resources
January 2026

v0.90.17 - January 12, 2026

  • Added kube-prometheus-stack monitoring for both control-plane and data-plane (Prometheus, Grafana, Alertmanager with 30-day retention)
  • Added Tempo for distributed tracing in both control-plane and data-plane
  • Added Loki and Promtail for centralized log aggregation in control-plane
  • Added legacy nats-old chart for backward compatibility
  • Added Datadog integration support for OTEL collectors (disabled by default, set datadog.enabled: true)
  • Added common.tls.caCerts dictionary for custom root CA certificates in data-plane
  • Added common.controlPlane.apiPublicUrl for data-plane to call control-plane API
  • Added resource limits for dp-llmproxy-service and dp-pythonmetric-service (500m/512Mi requests, 1000m/1Gi limits)
  • Added persistent storage for ClickHouse Keeper (storage.enabled: true, storage.size: "10Gi")
  • Updated ClickHouse Keeper image to altinity/clickhouse-keeper:25.3.6.10034.altinitystable-alpine
  • Simplified ClickHouse Operator values from 903 lines to 29 lines
  • Updated ExternalSecret API version from v1beta1 to v1 (requires External Secrets Operator 0.9.0+)
  • Moved OTEL collector nodeSelector/affinity/tolerations under opentelemetry-collector key
  • Action Required: Set common.controlPlane.apiPublicUrl to your control-plane API endpoint
  • Action Required: Ensure ClickHouse Keeper persistent storage is enabled for production
  • Action Required: Update External Secrets Operator to 0.9.0+ if not already
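For the External Secrets Operator requirement, a pre-upgrade guard can compare the installed operator version against 0.9.0. Comparing numeric tuples avoids the string-comparison trap where "0.10.2" sorts before "0.9.0". Minimal Python sketch: how you obtain the installed version string (for example from the operator image tag) is left out, `eso_supports_chart` is a hypothetical name, and only plain MAJOR.MINOR.PATCH strings with an optional leading "v" are handled.

```python
def parse_version(text: str) -> tuple:
    """Parse "v0.9.13" or "0.9.13" into (0, 9, 13); pre-release suffixes are not supported."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(p) for p in parts)


def eso_supports_chart(installed: str, minimum: str = "0.9.0") -> bool:
    """True if the installed External Secrets Operator meets the chart's minimum version."""
    return parse_version(installed) >= parse_version(minimum)


print(eso_supports_chart("v0.10.2"))  # True
print(eso_supports_chart("0.8.3"))    # False
```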
December 2025

  • Removed OpenUnison authentication infrastructure (all charts, operators, CRDs)
  • Removed Nginx ingress infrastructure
  • Added NATS infrastructure for data-plane with independent cluster deployment (3 replicas, JetStream, PDB)
  • Disabled S3 DLQ and disk spool in writer service (cpWriterService.dlq.enabled: false)
  • Added frontendIngress.alb.annotations for custom ALB annotations
  • Removed PVC functionality from cp-writer-service
  • Changed cp-frontend-service NEXTJS_PORT env var to PORT
  • Added auth config env vars (AUTH_ISSUER_DOMAIN, AUTH_CLIENT_ID, AUTH_CLIENT_SECRET) to cp-frontend-service
  • Added NATS connection settings for data-plane services (dp-evaluation-service, dp-ingestion-service)
  • Enabled Redis authentication in control-plane (auth: true, existingSecret: redis-secrets)
  • Removed gRPC ingress from data-plane services
  • Switched from NLB to ALB for both control-plane and data-plane ingress
  • Added NATS HA streams configuration with configurable replicas
  • Added common.dataPlane.dpPublicUrl and common.controlPlane.frontendPublicUrl for cross-plane communication
  • Added Prometheus monitoring for NATS (exporter on port 7777) and ClickHouse (built-in on port 9363)
  • Added Redis authentication for data-plane (auth: true, existingSecret: redis-secrets)
  • Fixed Redis PDB in data-plane (removed invalid enabled field)
  • Action Required: Remove any NLB-related values overrides and switch to ALB configuration
  • Action Required: Remove any OpenUnison or Beekeeper-related overrides from values files
  • Action Required: Configure auth secrets in AWS Secrets Manager with client-secret and cp-jwt-private-key
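To find leftovers from the removed NLB, OpenUnison, and Beekeeper setups, a crude scan of an overlay can help. This Python sketch is a heuristic under an explicit assumption: because this note does not list the exact retired keys, it flags any key name or string value containing "nlb", "openunison", or "beekeeper" (case-insensitive). Expect occasional false positives and review each hit by hand rather than deleting automatically. `find_retired_keys` is a hypothetical name.

```python
RETIRED_MARKERS = ("nlb", "openunison", "beekeeper")  # heuristic, not an official key list


def find_retired_keys(values, path: str = "") -> list:
    """Return dotted paths whose key name or string value mentions a retired component."""
    hits = []
    if isinstance(values, dict):
        for key, child in values.items():
            child_path = f"{path}.{key}" if path else str(key)
            if any(marker in str(key).lower() for marker in RETIRED_MARKERS):
                hits.append(child_path)  # flag the whole subtree once; no need to descend
            else:
                hits.extend(find_retired_keys(child, child_path))
    elif isinstance(values, list):
        for index, item in enumerate(values):
            hits.extend(find_retired_keys(item, f"{path}[{index}]"))
    elif isinstance(values, str) and any(m in values.lower() for m in RETIRED_MARKERS):
        hits.append(path)
    return hits


sample = {
    "ingress": {"annotations": {"service.beta.kubernetes.io/aws-load-balancer-type": "nlb"}},
    "openunison": {"enabled": False},
}
print(find_retired_keys(sample))
# ['ingress.annotations.service.beta.kubernetes.io/aws-load-balancer-type', 'openunison']
```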