May 2026
v1.0.0 - May 21, 2026
Initial 1.0.0 stable release. The federated hive-control-plane and hive-data-plane chart layout is now GA. This release adds ACM cert auto-discovery support, dedicated metrics ports for dp-pythonmetric and dp-llmproxy, and two values key changes that may require overlay updates.
Upgrade steps:
- If you override dpIngestionService.ports.metrics, rename the key to dpIngestionService.metricsPort in your data-plane values overlay. The old nested ports.metrics key no longer exists; if you don’t rename, the dp-ingestion metrics port silently reverts to the default (9091).
- If you use Prometheus Operator and relied on the default serviceMonitor.enabled: true, add serviceMonitor.enabled: true to both your control-plane and data-plane values overlays. The default was changed to false to avoid CRD-missing errors for customers who do not run Prometheus Operator. Without it, ServiceMonitor resources are not created and Prometheus stops scraping HoneyHive pods.
- (Optional) To opt into ACM cert auto-discovery on the control-plane frontend ALB, leave frontendIngress.tls.certificateArn empty (with frontendIngress.tls.enabled: true) and set frontendIngress.tls.hosts to the hostname(s) the ALB must serve.
- (Optional) To opt into ACM cert auto-discovery on the data-plane ALB (primary + wildcard), leave ingress.tls.certificateArn empty (with ingress.tls.enabled: true) and set ingress.tls.hosts to your data-plane hostname(s).
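As a rough illustration of the first two upgrade steps, a data-plane values overlay might look like the sketch below. The key paths are taken from the steps above; the port value 9464 is a hypothetical override chosen only for illustration, and the same serviceMonitor block would also be needed in the control-plane overlay.

```yaml
# Data-plane values overlay (sketch; 9464 is a hypothetical custom port)
dpIngestionService:
  # Replaces the removed nested key dpIngestionService.ports.metrics
  metricsPort: 9464

serviceMonitor:
  # The default is now false; set it explicitly if Prometheus Operator
  # is expected to scrape HoneyHive pods
  enabled: true
```

If you never overrode the ingestion metrics port, the dpIngestionService block can be left out entirely and the default (9091) applies.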
- Action Required: serviceMonitor.enabled default changed from true to false in both hive-control-plane/values.yaml and hive-data-plane/values.yaml. The default was true in v0.104.0, which caused Helm to render ServiceMonitor resources even when the Prometheus Operator CRDs were not installed, failing the deploy. If you use Prometheus Operator and relied on the default, add serviceMonitor.enabled: true to both your control-plane and data-plane values overlays. Without it, ServiceMonitor resources are not created and Prometheus stops scraping HoneyHive pods.
- Action Required: dpIngestionService.ports.metrics renamed to dpIngestionService.metricsPort in hive-data-plane/values.yaml. The nested ports.metrics key was the only service using that structure; all other services use a flat metricsPort key. The old key no longer exists; if you don’t rename, the dp-ingestion metrics port silently reverts to the default (9091).
- New dpPythonmetricService.metricsPort (default: 9091) and dpLlmproxyService.metricsPort (default: 9091) in hive-data-plane/values.yaml for dedicated Prometheus metrics ports on dp-pythonmetric and dp-llmproxy. When serviceMonitor.enabled: true, ServiceMonitors are now also created for these two services.
- New frontendIngress.tls.hosts (default: []) in hive-control-plane/values.yaml and ingress.tls.hosts (default: []) in hive-data-plane/values.yaml for ACM certificate auto-discovery. frontendIngress.tls.hosts controls the control-plane frontend ALB; ingress.tls.hosts controls the data-plane API ALB (applied to both the primary and wildcard Ingresses). Rendered as spec.tls[*].hosts so the AWS Load Balancer Controller can match an ACM cert by SNI/SAN when certificateArn is empty. Leave certificateArn empty (with tls.enabled: true) and set tls.hosts to your hostname(s).
- Fixed ACM certificate auto-discovery on the data-plane wildcard Ingress and control-plane frontend Ingress. Previously, leaving certificateArn blank emitted an empty alb.ingress.kubernetes.io/certificate-arn annotation that ArgoCD rejected with a nil-annotation error. The annotation is now omitted entirely when certificateArn is empty. The control-plane frontend Ingress now contributes a spec.tls[*].hosts cert-discovery hint when frontendIngress.tls.hosts is set, so HTTPS listener creation no longer fails on overlays that rely on auto-discovery. Overlays that set an explicit certificateArn continue to render identically.
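The ACM auto-discovery opt-in described in this release could be expressed roughly as follows. This is a sketch under assumptions: the hostnames are placeholders, and the two YAML documents stand in for separate control-plane and data-plane overlay files.

```yaml
# Control-plane values overlay (hostname is a placeholder)
frontendIngress:
  tls:
    enabled: true
    certificateArn: ""   # left empty so the certificate-arn annotation is omitted
    hosts:
      - app.example.com
---
# Data-plane values overlay (hostname is a placeholder)
ingress:
  tls:
    enabled: true
    certificateArn: ""
    hosts:
      - api.dp.example.com
```

Overlays that already set an explicit certificateArn do not need either block.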
v0.104.0 - May 13, 2026
- Action Required: common.dataplane.dpPublicUrl default changed to "" in data-plane/services/values.yaml. Set this to your environment’s public data plane URL (e.g., "https://api.my-dp.example.com"). Without it, the Admin Center Data Planes view and /settings/project/keys page show no URL.
- Action Required: dpIngestionService HPA defaults changed in data-plane/services/values.yaml. targetCPUUtilizationPercentage is now 60 (was 80), and a new targetMemoryUtilizationPercentage (default: 60) was added. To keep prior behavior, set CPU to 80 and memory to 0.
- Action Required: ingress.maxBodySize removed from both control-plane/services/values.yaml and data-plane/services/values.yaml. Remove any overrides for this key.
- Action Required: ingress.authHost removed from control-plane/services/values.yaml. Remove any overrides for this key.
- Action Required: jobs.serviceAccount.create now defaults to true in both planes. Set to false if you manage the jobs ServiceAccount externally.
- Action Required (ArgoCD users): Deployments omit spec.replicas when autoscaling is enabled. Add ignoreDifferences with jsonPointers: ["/spec/replicas"] for Deployment kind in your ArgoCD Application spec.
- New <service>.affinity (default: {}) for all CP/DP services and jobs.affinity for CronJobs/batch jobs. Supports podAffinity, podAntiAffinity, and nodeAffinity.
- New <service>.autoscaling.enabled (default: true) for all 11 CP/DP services. Set to false to disable HPA and pin replicas via <service>.replicas.
- New frontendIngress.tls.redirect (default: false) in control-plane/services/values.yaml. When true, the ALB listens on HTTP:80 and redirects to HTTPS.
- New dpIngestionService.resources (default: {}) in data-plane/services/values.yaml for per-service resource overrides on ingestion.
- New dpPythonmetricService.gunicornWorkers (default: 4) and dpPythonmetricService.pythonExecutionTimeout (default: 0.1) in data-plane/services/values.yaml for tuning custom metric concurrency and per-metric runtime.
- New cpNotificationService.ses.domain, .ses.rps (default: 14), .ses.senderRoleArn, .ses.senderRoleDurationSeconds (default: 3600) in control-plane/services/values.yaml for SES email sender configuration.
- New serviceMonitor.enabled (default: true), serviceMonitor.namespace (default: "monitoring"), serviceMonitor.labels (default: {release: monitoring}), serviceMonitor.interval (default: "30s"), serviceMonitor.scrapeTimeout (default: "10s") in both planes for Prometheus Operator ServiceMonitor support.
- New <service>.metricsPort (default: 9091), a dedicated Prometheus metrics port for all CP/DP services.
- ingress.host and ingress.grpcHost now accept a string or a list of strings for multi-host ingress support.
- Deployments no longer reset replica count on helm upgrade when autoscaling is enabled.
- Fixed NATS anti-affinity selector to correctly distinguish nats-box from nats server pods.
- Custom CA certs from common.tls.caCerts now apply to the DP S3 backfill Job.
- Added app.kubernetes.io/version label and OTel service.version attribute to all CP/DP services.
- Distributed tracing is now correctly emitted from dp-backend, dp-llmproxy, and dp-controller (the OTEL_ENABLED=true env var was previously missing on these deployments).
- Ingress routing improvements: /v1/events/*/annotate routes to dp-backend, /v1/* ingestion paths route to ingestion-service, API and frontend traffic use separate ALBs.
- Fixed PDB template to correctly handle minAvailable: 0 and maxUnavailable: 0.
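For the ArgoCD action item in this release, the ignoreDifferences entry might look like the fragment below. The Application name is hypothetical and the rest of the spec is omitted; only the Deployment kind and the /spec/replicas pointer come from the note above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: hive-data-plane   # hypothetical Application name
spec:
  # project, source, and destination omitted for brevity
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas   # replicas are managed by the HPA, not by the chart
```

The same entry would presumably be repeated in each Application that deploys a chart with autoscaling enabled.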
April 2026
v0.103.0 - April 24, 2026
- New jobs configuration block in control-plane/services/values.yaml for Kubernetes CronJobs that process alert transitions
- New jobs configuration block in data-plane/services/values.yaml for operator-triggered S3 time-window backfill batch workloads
- New per-service keys: <service>.service.annotations (default: {}), <service>.service.labels (default: {}), <service>.service.type (default: "ClusterIP") now exposed on cpWriterService, cpControllerService, cpNotificationService, dpControllerService, dpIngestionService, dpEvaluationService, dpLlmproxyService, dpPythonmetricService (previously hardcoded)
- Service port protocol hints (name: http, protocol: TCP, appProtocol: http/grpc) added to all service templates for correct L7 traffic classification by service meshes (Istio, Linkerd)
- appProtocol added to ClickHouse instance Service ports (http, interserver, metrics: appProtocol: http; native: appProtocol: tcp)
- ClickHouse instances chi-installation chart version bumped from 1.1.6 to 1.1.7
- No breaking changes; fully backward-compatible with v0.102.0 configurations
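As an illustration of the new per-service keys in this release, a data-plane overlay could set Service metadata along these lines. The annotation and label names and values are hypothetical; only the key paths and the ClusterIP default come from the notes above.

```yaml
# Data-plane values overlay (annotation and label values are hypothetical)
dpIngestionService:
  service:
    type: ClusterIP   # chart default, shown for completeness
    annotations:
      example.com/owner: platform-team
    labels:
      team: platform
```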
v0.102.0 - April 8, 2026
- New podAnnotations (default: {}) in control-plane/infrastructure/clickhouse/clickhouse_instances/values.yaml for arbitrary annotations on ClickHouse pods (useful for Datadog autodiscovery, Prometheus scraping)
- Fixed missing DP_DATABASE_URL env var in dp-pythonmetric-service deployment template, now reads from common.externalSecrets.postgres.secretName/uriKey like all other data-plane services
- No breaking changes; fully backward-compatible with v0.101.0 configurations
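A minimal sketch of the podAnnotations key for the Prometheus scraping case follows. The prometheus.io/* annotations are a common convention rather than something the chart defines, and the port is an assumption based on the ClickHouse built-in metrics port (9363) mentioned in the December 2025 notes.

```yaml
# Overlay for control-plane/infrastructure/clickhouse/clickhouse_instances/values.yaml
podAnnotations:
  # Annotation-based scrape hints; whether they are honored depends on
  # how your Prometheus scrape configuration is set up
  prometheus.io/scrape: "true"
  prometheus.io/port: "9363"   # assumed ClickHouse metrics port
```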
v0.101.0 - April 1, 2026
- Disabled ClickHouse replica check before attaching backup parts (CLICKHOUSE_CHECK_REPLICAS_BEFORE_ATTACH set to "false" in backup container); this prevents backup restore failures in environments where replica availability cannot be confirmed
- ClickHouse instances chi-installation chart version bumped from 1.1.5 to 1.1.6
- No breaking changes; fully backward-compatible with v0.100.0 configurations
March 2026
v0.100.0 - March 25, 2026
- New ingress.albClassName (default: "alb") and frontendIngress.albClassName (default: "alb") in control-plane and data-plane services for configurable ALB ingress class
  - Useful for shared CP+DP cluster scenarios where each plane needs its own ALB IngressClass (e.g., "cp-alb", "dp-alb")
  - Existing deployments using the default "alb" class require no changes
- New scheduling config for CP prometheus-nats-exporter in control-plane/infrastructure/nats/values.yaml: prometheusExporter.tolerations, prometheusExporter.nodeSelector, prometheusExporter.affinity, prometheusExporter.additionalLabels
- common.extraLabels now propagated to all Service, ServiceAccount, HPA, and Ingress resources across both planes (previously only applied to Deployments)
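For the shared-cluster scenario mentioned in this release, the per-plane ingress classes might be set as sketched below. The class names reuse the examples from the notes. Which of the two keys each chart actually exposes is not spelled out there, so treat the split between the two documents as an assumption and check each chart's values.yaml.

```yaml
# Control-plane services values overlay
frontendIngress:
  albClassName: "cp-alb"
---
# Data-plane services values overlay
ingress:
  albClassName: "dp-alb"
```

The matching IngressClass resources ("cp-alb", "dp-alb") would need to exist in the cluster already.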
v0.99.3 - March 20, 2026
- Action Required: Add common.controlPlane.id in data-plane/services/values.yaml and set it to match common.controlPlane.id from your control-plane values. Without this, dp-controller-service cannot identify its parent control plane, causing data-plane-to-control-plane communication failures
- New global.labels (default: {}) in control-plane/infrastructure/nats/values.yaml and data-plane/infrastructure/nats/values.yaml, applied to all NATS-generated Kubernetes resources (StatefulSet, Service, PVC, ConfigMap, PDB, etc.)
- New nack.additionalLabels, nack.tolerations, nack.nodeSelector, nack.affinity in control-plane NATS values for JetStream Controller scheduling
- prometheusExporter.additionalLabels now propagated to Prometheus NATS Exporter Deployment, Service, and ServiceMonitor labels in both planes
- Fixed topologySpreadConstraints typo in control-plane NATS values (topolicySpreadConstraints → topologySpreadConstraints)
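The action item for this release amounts to a small addition to the data-plane values, sketched here with a placeholder ID.

```yaml
# Overlay for data-plane/services/values.yaml
common:
  controlPlane:
    # Must be identical to common.controlPlane.id in the control-plane values
    id: "my-control-plane"   # placeholder value
```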
v0.99.2 - March 9, 2026
- ClickHouse instances chi-installation chart version bumped from 1.1.4 to 1.1.5 in control-plane/infrastructure/clickhouse/clickhouse_instances/Chart.yaml
v0.99.1 - March 6, 2026
- Fixed cp-notification-service healthcheck port: renamed env var from EXPRESS_PORT to PORT in deployment template
- Control plane hive-control-plane chart version bumped from 0.2.0 to 0.2.1
v0.99.0 - March 6, 2026
- Added readiness and liveness probes (GET /healthcheck) to all 9 service deployments across control-plane and data-plane
- cp-backend-service and dp-backend-service include a startup probe (allows up to 310s for Prisma migrations before liveness checks begin)
- Eliminates intermittent 502 errors during rolling updates caused by traffic routing to pods not yet ready to serve
- No values.yaml changes required - probes use hardcoded values in deployment templates
February 2026
v0.98.9 - February 25, 2026
- Unified encryption configuration for cp-controller, dp-controller, and dp-llmproxy-service
- Supports at-rest encryption for identity management and provider secrets via KMS or environment variable mode
- New common.encryption.keyId value in both control-plane and data-plane services
- New ExternalSecret templates for encryption in both control-plane and data-plane secret-store charts
- Replaced dpLlmproxyService.kmsKeyId with unified HH_ENCRYPTION_KEY_ID and HH_ENCRYPTION_SECRET env vars in dp-llmproxy-service
- Fixed perpetual ArgoCD OutOfSync caused by Redis PDB enabled: true in control-plane Redis
- Action Required: Remove dpLlmproxyService.kmsKeyId from data-plane values and add common.encryption.keyId in both control-plane and data-plane values
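The migration in this release's action item could look roughly like this in a values overlay. The keyId value is a placeholder, since these notes do not state the expected format.

```yaml
# Add to both the control-plane and data-plane values
common:
  encryption:
    keyId: "<your-encryption-key-id>"   # placeholder; format not stated in these notes

# In the data-plane values, delete any remaining override such as:
# dpLlmproxyService:
#   kmsKeyId: "alias/hh-provider-secrets"
```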
v0.98.1 - February 11, 2026
- Added common.extraLabels for custom governance/compliance labels on all Kubernetes resources (control-plane, data-plane, shared dependencies)
- Added common.observability.otel.exporterProtocol (default: "grpc") for OTLP exporter protocol configuration in data-plane
- Added dpLlmproxyService.kmsKeyId (default: "alias/hh-provider-secrets") for AWS KMS encryption of LLM provider secrets
- Removed duplicate common.observability block in data-plane services values
- Fixed Next.js cache permission errors in cp-frontend-service with nextjs-cache emptyDir volume
- Fixed OTEL service name in cp-frontend-service (was hardcoded to cp-controller-service)
- Added custom CA certificate support (SSL_CERT_FILE, REQUESTS_CA_BUNDLE) for dp-llmproxy-service and dp-pythonmetric-service
- Added DP_DATABASE_URL env var and KMS config to dp-llmproxy-service
- Fixed ClickHouse logging configuration (moved to config.d/99-logger.xml with replace="1")
- Action Required: Set common.extraLabels if your organization requires specific labels on all resources
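A sketch of the two new common keys from this release in a data-plane overlay follows. The label names and values are hypothetical examples of governance labels; the exporter protocol shown is the documented default.

```yaml
# Data-plane services values overlay (label keys and values are hypothetical)
common:
  extraLabels:
    cost-center: "ml-platform"
    compliance-tier: "internal"
  observability:
    otel:
      exporterProtocol: "grpc"   # chart default, shown for completeness
```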
January 2026
v0.90.17 - January 12, 2026
- Added kube-prometheus-stack monitoring for both control-plane and data-plane (Prometheus, Grafana, Alertmanager with 30-day retention)
- Added Tempo for distributed tracing in both control-plane and data-plane
- Added Loki and Promtail for centralized log aggregation in control-plane
- Added legacy nats-old chart for backward compatibility
- Added Datadog integration support for OTEL collectors (disabled by default, set datadog.enabled: true)
- Added common.tls.caCerts dictionary for custom root CA certificates in data-plane
- Added common.controlPlane.apiPublicUrl for data-plane to call control-plane API
- Added resource limits for dp-llmproxy-service and dp-pythonmetric-service (500m/512Mi requests, 1000m/1Gi limits)
- Added persistent storage for ClickHouse Keeper (storage.enabled: true, storage.size: "10Gi")
- Updated ClickHouse Keeper image to altinity/clickhouse-keeper:25.3.6.10034.altinitystable-alpine
- Simplified ClickHouse Operator values from 903 lines to 29 lines
- Updated ExternalSecret API version from v1beta1 to v1 (requires External Secrets Operator 0.9.0+)
- Moved OTEL collector nodeSelector/affinity/tolerations under opentelemetry-collector key
- Action Required: Set common.controlPlane.apiPublicUrl to your control-plane API endpoint
- Action Required: Ensure ClickHouse Keeper persistent storage is enabled for production
- Action Required: Update External Secrets Operator to 0.9.0+ if not already
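The first two action items for this release might translate into values like the following. The URL is a placeholder, and the second document assumes the storage keys sit at the top level of the ClickHouse Keeper values file, which these notes do not confirm.

```yaml
# Data-plane services values overlay (URL is a placeholder)
common:
  controlPlane:
    apiPublicUrl: "https://cp-api.example.com"
---
# ClickHouse Keeper values (file location and nesting are assumptions)
storage:
  enabled: true
  size: "10Gi"
```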
December 2025
December 2025
- Removed OpenUnison authentication infrastructure (all charts, operators, CRDs)
- Removed Nginx ingress infrastructure
- Added NATS infrastructure for data-plane with independent cluster deployment (3 replicas, JetStream, PDB)
- Disabled S3 DLQ and disk spool in writer service (cpWriterService.dlq.enabled: false)
- Added frontendIngress.alb.annotations for custom ALB annotations
- Removed PVC functionality from cp-writer-service
- Changed cp-frontend-service NEXTJS_PORT env var to PORT
- Added auth config env vars (AUTH_ISSUER_DOMAIN, AUTH_CLIENT_ID, AUTH_CLIENT_SECRET) to cp-frontend-service
- Added NATS connection settings for data-plane services (dp-evaluation-service, dp-ingestion-service)
- Enabled Redis authentication in control-plane (auth: true, existingSecret: redis-secrets)
- Removed gRPC ingress from data-plane services
- Switched from NLB to ALB for both control-plane and data-plane ingress
- Added NATS HA streams configuration with configurable replicas
- Added common.dataPlane.dpPublicUrl and common.controlPlane.frontendPublicUrl for cross-plane communication
- Added Prometheus monitoring for NATS (exporter on port 7777) and ClickHouse (built-in on port 9363)
- Added Redis authentication for data-plane (auth: true, existingSecret: redis-secrets)
- Fixed Redis PDB in data-plane (removed invalid enabled field)
enabledfield) - Action Required: Remove any NLB-related values overrides and switch to ALB configuration
- Action Required: Remove any OpenUnison or Beekeeper-related overrides from values files
- Action Required: Configure auth secrets in AWS Secrets Manager with client-secret and cp-jwt-private-key

