HyperbytedbCluster

The HyperbytedbCluster custom resource declares a HyperbyteDB database deployment. The operator reconciles this into a StatefulSet, headless Service, client Service, PVCs, ConfigMaps, and optional monitoring resources.


Quick Start

Single Node

A minimal single-node deployment for development or testing:

apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-single
  namespace: default
spec:
  replicas: 1
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
  storage:
    backend: local
    volumeClaimTemplate:
      size: 5Gi
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 2Gi

Three-Node Cluster

A production-ready cluster with replication, monitoring, and failover:

apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-cluster
  namespace: default
spec:
  replicas: 3
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
    requestTimeoutSecs: 30
    queryTimeoutSecs: 30
  storage:
    backend: local
    volumeClaimTemplate:
      size: 10Gi
  flush:
    intervalSecs: 10
    walSizeThresholdMb: 64
    timeBucketDuration: "1h"
  compaction:
    enabled: true
    intervalSecs: 300
    minFilesToCompact: 4
    targetFileSizeMb: 256
  chdb:
    poolSize: 4
  logging:
    level: info
    format: json
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
  cluster:
    heartbeatIntervalSecs: 2
    heartbeatMissThreshold: 5
    antiEntropyIntervalSecs: 60
    replicationMaxRetries: 5
    raftHeartbeatIntervalMs: 300
    raftElectionTimeoutMs: 1000
  monitoring:
    enabled: true
    serviceMonitor: true
  failover:
    enabled: true
    maxFailoverCount: 1
    failoverTimeoutSecs: 300
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8086"

High-Availability with S3 and Autoscaling

A full-featured deployment with S3 storage, autoscaling, TLS, and zone-aware scheduling:

apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-ha
  namespace: default
spec:
  replicas: 5
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
    requestTimeoutSecs: 60
    queryTimeoutSecs: 60
  storage:
    backend: s3
    volumeClaimTemplate:
      size: 50Gi
      storageClassName: fast-ssd
    s3:
      bucket: hyperbytedb-data
      prefix: "production/"
      region: us-east-1
      credentialsSecretName: hyperbytedb-s3-credentials
  flush:
    intervalSecs: 5
    walSizeThresholdMb: 128
    timeBucketDuration: "1h"
  compaction:
    enabled: true
    intervalSecs: 120
    minFilesToCompact: 4
    targetFileSizeMb: 512
  chdb:
    poolSize: 8
  auth:
    enabled: true
    credentialsSecretName: hyperbytedb-auth
  logging:
    level: info
    format: json
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi
  cluster:
    heartbeatIntervalSecs: 1
    heartbeatMissThreshold: 3
    antiEntropyIntervalSecs: 30
    syncMaxConcurrentFiles: 8
    replicationMaxRetries: 10
    raftHeartbeatIntervalMs: 200
    raftElectionTimeoutMs: 800
    raftSnapshotThreshold: 500
  monitoring:
    enabled: true
    serviceMonitor: true
    grafanaDashboard: true
  failover:
    enabled: true
    maxFailoverCount: 2
    failoverTimeoutSecs: 180
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: hyperbytedb
          app.kubernetes.io/instance: hyperbytedb-ha
  tolerations:
    - key: dedicated
      operator: Equal
      value: hyperbytedb
      effect: NoSchedule

TLS-Enabled Cluster

Enable TLS with operator-managed self-signed certificates:

apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-tls
  namespace: default
spec:
  replicas: 3
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
    tls:
      enabled: true
      # Omit secretName to let the operator generate a self-signed certificate.
      # Provide secretName to use a pre-existing TLS Secret:
      # secretName: hyperbytedb-tls-cert
  storage:
    backend: local
    volumeClaimTemplate:
      size: 10Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
  failover:
    enabled: true
  monitoring:
    enabled: true
    serviceMonitor: true

To use cert-manager instead of self-signed certificates:

server:
  tls:
    enabled: true
    certManagerIssuerRef:
      name: letsencrypt-prod
      kind: ClusterIssuer

Spec Reference

Top-Level Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| replicas | int32 | 1 | Number of cluster members (min: 1) |
| image | string | hyperbytedb:latest | Container image for HyperbyteDB |
| version | string | | Version tag for upgrade orchestration; changing this triggers a rolling upgrade |
| imagePullPolicy | string | | Kubernetes image pull policy (Always, IfNotPresent, Never) |
| imagePullSecrets | list | | References to Secrets for pulling from private registries |
| paused | bool | false | When true, the operator skips reconciliation for manual maintenance |
| resources | ResourceRequirements | | CPU and memory requests/limits for each pod |
| podAnnotations | map | | Additional annotations applied to each pod |
| podLabels | map | | Additional labels applied to each pod |
| affinity | Affinity | | Kubernetes pod affinity/anti-affinity rules |
| topologySpreadConstraints | list | | Constraints for spreading pods across topology domains |
| tolerations | list | | Kubernetes tolerations for tainted nodes |
| additionalVolumes | list | | Extra volumes to mount in pods |
| additionalVolumeMounts | list | | Mount points for additional volumes |

server

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| port | int32 | 8086 | HTTP API port (1-65535) |
| maxBodySizeBytes | int64 | 26214400 | Maximum request body size (25 MiB) |
| requestTimeoutSecs | int32 | 30 | HTTP request timeout |
| queryTimeoutSecs | int32 | 30 | Query execution timeout |
| tls | object | | TLS configuration (see below) |
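The default maxBodySizeBytes of 26214400 is exactly 25 MiB, as a quick check shows:

```python
# 25 MiB expressed in bytes matches the documented default of 26214400.
default_max_body = 25 * 1024 * 1024
print(default_max_body)  # 26214400
```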

server.tls

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | | Enable TLS for the HTTP API |
| secretName | string | | Name of a kubernetes.io/tls Secret. Omit to let the operator generate self-signed certs |
| certManagerIssuerRef | object | | Reference to a cert-manager Issuer or ClusterIssuer (name, kind, optional group) |

storage

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| backend | string | local | Storage backend: local or s3 |
| volumeClaimTemplate.storageClassName | string | | StorageClass for dynamically provisioned PVCs |
| volumeClaimTemplate.size | Quantity | 10Gi | PVC size per replica |
| s3.bucket | string | | S3 bucket name (required when backend: s3) |
| s3.prefix | string | | Key prefix within the bucket |
| s3.region | string | | AWS region |
| s3.endpoint | string | | Custom S3-compatible endpoint URL |
| s3.credentialsSecretName | string | | Secret with access_key_id and secret_access_key keys |

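The Secret named by s3.credentialsSecretName must carry the access_key_id and secret_access_key keys. A minimal sketch, assuming static credentials (the values are placeholders to replace with your own):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hyperbytedb-s3-credentials
  namespace: default
type: Opaque
stringData:
  access_key_id: "AKIA-PLACEHOLDER"      # placeholder, not a real key
  secret_access_key: "PLACEHOLDER"       # placeholder, not a real key
```

Create it in the same namespace as the HyperbytedbCluster before applying a spec with backend: s3.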
flush

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| intervalSecs | int32 | 10 | How often the WAL is flushed to Parquet |
| walSizeThresholdMb | int32 | 64 | WAL size threshold that triggers an early flush |
| timeBucketDuration | string | 1h | Time bucket granularity for Parquet partitioning |

compaction

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable background compaction |
| intervalSecs | int32 | 300 | Compaction check interval |
| minFilesToCompact | int32 | 4 | Minimum Parquet files before compaction triggers |
| targetFileSizeMb | int32 | 256 | Target output file size |

chdb

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| poolSize | int32 | 4 | Number of embedded ClickHouse query engine instances |

auth

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable authentication for the HTTP API |
| credentialsSecretName | string | | Secret containing auth credentials |

logging

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| level | string | info | Log level: trace, debug, info, warn, error |
| format | string | text | Log format: text or json |

cluster

Tuning parameters for multi-replica cluster behavior. These only take effect when replicas > 1.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| heartbeatIntervalSecs | int32 | 2 | Interval between peer heartbeats |
| heartbeatMissThreshold | int32 | 5 | Missed heartbeats before marking a peer unhealthy |
| antiEntropyIntervalSecs | int32 | 60 | Interval for Merkle tree data verification |
| antiEntropyEnabled | bool | true | Enable periodic anti-entropy checks |
| syncMaxConcurrentFiles | int32 | 4 | Max concurrent file transfers during sync |
| replicationMaxRetries | int32 | 5 | Max retries for write replication |
| raftHeartbeatIntervalMs | int32 | 300 | Raft leader heartbeat interval |
| raftElectionTimeoutMs | int32 | 1000 | Raft election timeout |
| raftSnapshotThreshold | int32 | 1000 | Log entries before Raft snapshot |
| tls | object | | TLS for inter-node replication traffic (same schema as server.tls) |

monitoring

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Expose Prometheus metrics |
| serviceMonitor | bool | true | Create a Prometheus ServiceMonitor resource |
| grafanaDashboard | bool | false | Create a Grafana dashboard ConfigMap with the grafana_dashboard label |

autoscaling

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | | Enable HorizontalPodAutoscaler |
| minReplicas | int32 | | Minimum replica count (min: 1) |
| maxReplicas | int32 | | Maximum replica count (required, min: 1) |
| targetCPUUtilizationPercentage | int32 | 80 | Target average CPU utilization |

failover

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable automatic failover |
| maxFailoverCount | int32 | 1 | Maximum simultaneous failovers (min: 1) |
| failoverTimeoutSecs | int32 | 300 | Seconds a member must be unhealthy before failover (min: 60) |
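Combining these settings with the cluster section's heartbeat parameters gives a rough lower bound on how long a dead node goes unreplaced: the peer must first miss heartbeatMissThreshold heartbeats to be marked unhealthy, then remain unhealthy for failoverTimeoutSecs. A sketch under that assumption; the operator's exact accounting may differ:

```python
def estimated_failover_delay_secs(heartbeat_interval_secs: int,
                                  heartbeat_miss_threshold: int,
                                  failover_timeout_secs: int) -> int:
    """Estimate seconds from last heartbeat until failover can trigger.

    Detection: the peer is marked unhealthy only after
    `heartbeat_miss_threshold` consecutive missed heartbeats.
    Failover then waits a further `failover_timeout_secs`.
    """
    detection_window = heartbeat_interval_secs * heartbeat_miss_threshold
    return detection_window + failover_timeout_secs

# With the defaults (2s heartbeats, 5 misses, 300s timeout): 10 + 300 = 310s.
print(estimated_failover_delay_secs(2, 5, 300))  # 310

# With the HA example's tighter settings (1s, 3 misses, 180s): 183s.
print(estimated_failover_delay_secs(1, 3, 180))  # 183
```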

Status

The operator maintains a .status subresource with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| phase | string | Current lifecycle phase: Pending, Initializing, Running, Scaling, Upgrading, Failed |
| replicas | int32 | Desired replica count |
| readyReplicas | int32 | Number of replicas passing readiness checks |
| clusterState | string | High-level health: Healthy, Degraded, Recovering, Unknown |
| replicationState | string | Replication convergence: Healthy, Lagging, Diverged, Unknown |
| members | list | Per-member status (name, nodeID, podName, state, health, WAL sequence, parquet file count, peer count) |
| failoverCount | int32 | Number of failovers in the current generation |
| configHash | string | Hash of the current config.toml (used for rolling update detection) |
| conditions | list | Standard Kubernetes conditions |

Check cluster status:

kubectl get hyperbytedbcluster hyperbytedb-cluster -o wide
NAME                  REPLICAS   READY   PHASE     CLUSTER   AGE
hyperbytedb-cluster   3          3       Running   Healthy   5m

Inspect detailed status:

kubectl get hyperbytedbcluster hyperbytedb-cluster -o jsonpath='{.status}' | jq

Operations

Rolling Upgrade

Change the version field to trigger a rolling upgrade:

kubectl patch hyperbytedbcluster hyperbytedb-cluster \
  --type merge -p '{"spec":{"version":"1.1.0"}}'

The operator upgrades one pod at a time, waiting for readiness before proceeding. The cluster phase transitions to Upgrading during the process.

Scaling

kubectl scale hyperbytedbcluster hyperbytedb-cluster --replicas=5

Or patch the spec directly:

kubectl patch hyperbytedbcluster hyperbytedb-cluster \
  --type merge -p '{"spec":{"replicas":5}}'

Pause Reconciliation

kubectl patch hyperbytedbcluster hyperbytedb-cluster \
  --type merge -p '{"spec":{"paused":true}}'

Resume by setting paused back to false:

kubectl patch hyperbytedbcluster hyperbytedb-cluster \
  --type merge -p '{"spec":{"paused":false}}'


See Also