# HyperbytedbCluster

The HyperbytedbCluster custom resource declares a HyperbyteDB database deployment. The operator reconciles this into a StatefulSet, headless Service, client Service, PVCs, ConfigMaps, and optional monitoring resources.
## Quick Start

### Single Node

A minimal single-node deployment for development or testing:
```yaml
apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-single
  namespace: default
spec:
  replicas: 1
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
  storage:
    backend: local
    volumeClaimTemplate:
      size: 5Gi
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 2Gi
```
### Three-Node Cluster

A production-ready cluster with replication, monitoring, and failover:
```yaml
apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-cluster
  namespace: default
spec:
  replicas: 3
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
    requestTimeoutSecs: 30
    queryTimeoutSecs: 30
  storage:
    backend: local
    volumeClaimTemplate:
      size: 10Gi
  flush:
    intervalSecs: 10
    walSizeThresholdMb: 64
    timeBucketDuration: "1h"
  compaction:
    enabled: true
    intervalSecs: 300
    minFilesToCompact: 4
    targetFileSizeMb: 256
  chdb:
    poolSize: 4
  logging:
    level: info
    format: json
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
  cluster:
    heartbeatIntervalSecs: 2
    heartbeatMissThreshold: 5
    antiEntropyIntervalSecs: 60
    replicationMaxRetries: 5
    raftHeartbeatIntervalMs: 300
    raftElectionTimeoutMs: 1000
  monitoring:
    enabled: true
    serviceMonitor: true
  failover:
    enabled: true
    maxFailoverCount: 1
    failoverTimeoutSecs: 300
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8086"
```
### High-Availability with S3 and Autoscaling

A full-featured deployment with S3 storage, autoscaling, TLS, and zone-aware scheduling:
```yaml
apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-ha
  namespace: default
spec:
  replicas: 5
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
    requestTimeoutSecs: 60
    queryTimeoutSecs: 60
  storage:
    backend: s3
    volumeClaimTemplate:
      size: 50Gi
      storageClassName: fast-ssd
    s3:
      bucket: hyperbytedb-data
      prefix: "production/"
      region: us-east-1
      credentialsSecretName: hyperbytedb-s3-credentials
  flush:
    intervalSecs: 5
    walSizeThresholdMb: 128
    timeBucketDuration: "1h"
  compaction:
    enabled: true
    intervalSecs: 120
    minFilesToCompact: 4
    targetFileSizeMb: 512
  chdb:
    poolSize: 8
  auth:
    enabled: true
    credentialsSecretName: hyperbytedb-auth
  logging:
    level: info
    format: json
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "4"
      memory: 8Gi
  cluster:
    heartbeatIntervalSecs: 1
    heartbeatMissThreshold: 3
    antiEntropyIntervalSecs: 30
    syncMaxConcurrentFiles: 8
    replicationMaxRetries: 10
    raftHeartbeatIntervalMs: 200
    raftElectionTimeoutMs: 800
    raftSnapshotThreshold: 500
  monitoring:
    enabled: true
    serviceMonitor: true
    grafanaDashboard: true
  failover:
    enabled: true
    maxFailoverCount: 2
    failoverTimeoutSecs: 180
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: hyperbytedb
          app.kubernetes.io/instance: hyperbytedb-ha
  tolerations:
    - key: dedicated
      operator: Equal
      value: hyperbytedb
      effect: NoSchedule
```
### TLS-Enabled Cluster

Enable TLS with operator-managed self-signed certificates:
```yaml
apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-tls
  namespace: default
spec:
  replicas: 3
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
    tls:
      enabled: true
      # Omit secretName to let the operator generate a self-signed certificate.
      # Provide secretName to use a pre-existing TLS Secret:
      # secretName: hyperbytedb-tls-cert
  storage:
    backend: local
    volumeClaimTemplate:
      size: 10Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "2"
      memory: 4Gi
  failover:
    enabled: true
  monitoring:
    enabled: true
    serviceMonitor: true
```
To use cert-manager instead of self-signed certificates:
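A sketch of the relevant fragment, using the `certManagerIssuerRef` field from the `server.tls` reference below. The issuer name `selfsigned-issuer` is illustrative and assumes a cert-manager ClusterIssuer with that name already exists:

```yaml
spec:
  server:
    tls:
      enabled: true
      certManagerIssuerRef:
        name: selfsigned-issuer   # hypothetical issuer name
        kind: ClusterIssuer
        group: cert-manager.io
```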
## Spec Reference

### Top-Level Fields
| Field | Type | Default | Description |
|---|---|---|---|
| replicas | int32 | 1 | Number of cluster members (min: 1) |
| image | string | hyperbytedb:latest | Container image for HyperbyteDB |
| version | string | | Version tag for upgrade orchestration; changing this triggers a rolling upgrade |
| imagePullPolicy | string | | Kubernetes image pull policy (Always, IfNotPresent, Never) |
| imagePullSecrets | list | | References to Secrets for pulling from private registries |
| paused | bool | false | When true, the operator skips reconciliation for manual maintenance |
| resources | ResourceRequirements | | CPU and memory requests/limits for each pod |
| podAnnotations | map | | Additional annotations applied to each pod |
| podLabels | map | | Additional labels applied to each pod |
| affinity | Affinity | | Kubernetes pod affinity/anti-affinity rules |
| topologySpreadConstraints | list | | Constraints for spreading pods across topology domains |
| tolerations | list | | Kubernetes tolerations for tainted nodes |
| additionalVolumes | list | | Extra volumes to mount in pods |
| additionalVolumeMounts | list | | Mount points for additional volumes |
### server

| Field | Type | Default | Description |
|---|---|---|---|
| port | int32 | 8086 | HTTP API port (1-65535) |
| maxBodySizeBytes | int64 | 26214400 | Maximum request body size (25 MiB) |
| requestTimeoutSecs | int32 | 30 | HTTP request timeout |
| queryTimeoutSecs | int32 | 30 | Query execution timeout |
| tls | object | | TLS configuration (see below) |
### server.tls

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | | Enable TLS for the HTTP API |
| secretName | string | | Name of a kubernetes.io/tls Secret. Omit to let the operator generate self-signed certs |
| certManagerIssuerRef | object | | Reference to a cert-manager Issuer or ClusterIssuer (name, kind, optional group) |
### storage

| Field | Type | Default | Description |
|---|---|---|---|
| backend | string | local | Storage backend: local or s3 |
| volumeClaimTemplate.storageClassName | string | | StorageClass for dynamically provisioned PVCs |
| volumeClaimTemplate.size | Quantity | 10Gi | PVC size per replica |
| s3.bucket | string | | S3 bucket name (required when backend: s3) |
| s3.prefix | string | | Key prefix within the bucket |
| s3.region | string | | AWS region |
| s3.endpoint | string | | Custom S3-compatible endpoint URL |
| s3.credentialsSecretName | string | | Secret with access_key_id and secret_access_key keys |
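The referenced Secret must carry the two key names above. A sketch, with placeholder values, matching the `hyperbytedb-s3-credentials` name used in the HA example:

```shell
kubectl create secret generic hyperbytedb-s3-credentials \
  --namespace default \
  --from-literal=access_key_id='<your-access-key-id>' \
  --from-literal=secret_access_key='<your-secret-access-key>'
```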
### flush

| Field | Type | Default | Description |
|---|---|---|---|
| intervalSecs | int32 | 10 | How often the WAL is flushed to Parquet |
| walSizeThresholdMb | int32 | 64 | WAL size threshold that triggers an early flush |
| timeBucketDuration | string | 1h | Time bucket granularity for Parquet partitioning |
### compaction

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable background compaction |
| intervalSecs | int32 | 300 | Compaction check interval |
| minFilesToCompact | int32 | 4 | Minimum Parquet files before compaction triggers |
| targetFileSizeMb | int32 | 256 | Target output file size |
### chdb

| Field | Type | Default | Description |
|---|---|---|---|
| poolSize | int32 | 4 | Number of embedded ClickHouse query engine instances |
### auth

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable authentication for the HTTP API |
| credentialsSecretName | string | | Secret containing auth credentials |
### logging

| Field | Type | Default | Description |
|---|---|---|---|
| level | string | info | Log level: trace, debug, info, warn, error |
| format | string | text | Log format: text or json |
### cluster

Tuning parameters for multi-replica cluster behavior. These only take effect when replicas > 1.

| Field | Type | Default | Description |
|---|---|---|---|
| heartbeatIntervalSecs | int32 | 2 | Interval between peer heartbeats |
| heartbeatMissThreshold | int32 | 5 | Missed heartbeats before marking a peer unhealthy |
| antiEntropyIntervalSecs | int32 | 60 | Interval for Merkle tree data verification |
| antiEntropyEnabled | bool | true | Enable periodic anti-entropy checks |
| syncMaxConcurrentFiles | int32 | 4 | Max concurrent file transfers during sync |
| replicationMaxRetries | int32 | 5 | Max retries for write replication |
| raftHeartbeatIntervalMs | int32 | 300 | Raft leader heartbeat interval |
| raftElectionTimeoutMs | int32 | 1000 | Raft election timeout |
| raftSnapshotThreshold | int32 | 1000 | Log entries before Raft snapshot |
| tls | object | | TLS for inter-node replication traffic (same schema as server.tls) |
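To see how the heartbeat defaults interact, a rough failure-detection calculation (assuming a peer is marked unhealthy only after heartbeatMissThreshold consecutive missed heartbeats):

```
detection time ≈ heartbeatIntervalSecs × heartbeatMissThreshold
              = 2 s × 5
              = 10 s
```

The tighter HA-example settings (1 s × 3) cut this to roughly 3 s. Note that detection alone does not trigger failover: with failover enabled, a member must stay unhealthy for failoverTimeoutSecs (default 300) before a failover begins.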
### monitoring

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Expose Prometheus metrics |
| serviceMonitor | bool | true | Create a Prometheus ServiceMonitor resource |
| grafanaDashboard | bool | false | Create a Grafana dashboard ConfigMap with the grafana_dashboard label |
### autoscaling

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | | Enable HorizontalPodAutoscaler |
| minReplicas | int32 | | Minimum replica count (min: 1) |
| maxReplicas | int32 | | Maximum replica count (required; min: 1) |
| targetCPUUtilizationPercentage | int32 | 80 | Target average CPU utilization |
### failover

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable automatic failover |
| maxFailoverCount | int32 | 1 | Maximum simultaneous failovers (min: 1) |
| failoverTimeoutSecs | int32 | 300 | Seconds a member must be unhealthy before failover (min: 60) |
## Status

The operator maintains a .status subresource with the following fields:

| Field | Type | Description |
|---|---|---|
| phase | string | Current lifecycle phase: Pending, Initializing, Running, Scaling, Upgrading, Failed |
| replicas | int32 | Desired replica count |
| readyReplicas | int32 | Number of replicas passing readiness checks |
| clusterState | string | High-level health: Healthy, Degraded, Recovering, Unknown |
| replicationState | string | Replication convergence: Healthy, Lagging, Diverged, Unknown |
| members | list | Per-member status (name, nodeID, podName, state, health, WAL sequence, Parquet file count, peer count) |
| failoverCount | int32 | Number of failovers in the current generation |
| configHash | string | Hash of the current config.toml (used for rolling update detection) |
| conditions | list | Standard Kubernetes conditions |
Check cluster status:
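For example, for the three-node cluster above (the printed columns depend on how the CRD defines its printer columns):

```shell
kubectl get hyperbytedbcluster hyperbytedb-cluster
```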
Inspect detailed status:
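The full .status block, including per-member details and conditions, is visible in the resource YAML:

```shell
kubectl get hyperbytedbcluster hyperbytedb-cluster -o yaml
```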
## Operations

### Rolling Upgrade

Change the version field to trigger a rolling upgrade:

```shell
kubectl patch hyperbytedbcluster hyperbytedb-cluster \
  --type merge -p '{"spec":{"version":"1.1.0"}}'
```

The operator upgrades one pod at a time, waiting for readiness before proceeding. The cluster phase transitions to Upgrading during the process.
### Scaling
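If the CRD registers the scale subresource (which the replicas and readyReplicas status fields suggest, though this is an assumption), kubectl scale works directly:

```shell
kubectl scale hyperbytedbcluster hyperbytedb-cluster --replicas=5
```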
Or patch the spec directly:
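Mirroring the upgrade patch above, for example to run five members:

```shell
kubectl patch hyperbytedbcluster hyperbytedb-cluster \
  --type merge -p '{"spec":{"replicas":5}}'
```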
### Pause Reconciliation
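Set the top-level paused field to true to make the operator skip reconciliation during manual maintenance:

```shell
kubectl patch hyperbytedbcluster hyperbytedb-cluster \
  --type merge -p '{"spec":{"paused":true}}'
```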
Resume with `"paused": false`.
## See Also
- Installation -- Installing the operator
- Backup & Restore -- Automated backups and restores
- Administration -- HyperbyteDB operational guide (monitoring, compaction tuning, debug CLI)