Troubleshooting¶
Common problems and their solutions when running HyperbyteDB.
Startup Failures¶
libchdb.so: cannot open shared object file¶
Cause: libchdb is not installed or not in the library search path.
Fix:
Verify: ls /usr/local/lib/libchdb.so
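Beyond checking a single path, it can help to look in several candidate directories and in the dynamic linker cache. The following is a minimal sketch using standard Linux tools; `find_lib` is a hypothetical helper, and the directory list is an assumption to adjust for your system.

```shell
# Print the first directory (of those given) that contains the named library.
# Returns non-zero if none of them do.
find_lib() {
  local name="$1"; shift
  local dir
  for dir in "$@"; do
    if [ -e "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Example usage (commented out so the snippet has no side effects):
#   find_lib libchdb.so /usr/local/lib /usr/lib || echo "libchdb.so not found"
#   ldconfig -p | grep libchdb     # is the library in the linker cache?
#   sudo ldconfig                  # refresh the cache after installing
#   export LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH:-}"
```

If the file exists but the loader still cannot find it, the directory is probably missing from the linker cache or from `LD_LIBRARY_PATH`.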
std::bad_function_call crash on startup¶
Symptom: HyperbyteDB aborts immediately with:
terminate called after throwing an instance of 'std::__1::bad_function_call'
what(): std::bad_function_call
Aborted
Cause: Incompatible system-installed libchdb.so. The crash happens during dynamic library loading, before any Rust code runs.
Fix (Option A — Recommended): Use the chdb-rust bundled library:
# Temporarily move the system libchdb
sudo mv /usr/local/lib/libchdb.so /usr/local/lib/libchdb.so.bak
sudo mv /usr/local/include/chdb.h /usr/local/include/chdb.h.bak
# Rebuild so chdb-rust downloads its own
cargo clean -p chdb-rust && cargo build --release
# Run with the bundled library
LIBCHDB_DIR=$(find target -name "libchdb.so" -path "*/build/chdb-rust-*/out/*" | head -1 | xargs dirname)
LD_LIBRARY_PATH="$LIBCHDB_DIR:$LD_LIBRARY_PATH" ./target/release/hyperbytedb serve
# Optionally restore for other tools
sudo mv /usr/local/lib/libchdb.so.bak /usr/local/lib/libchdb.so
Fix (Option B): Reinstall libchdb from the latest release:
failed to open WAL¶
Cause: Corrupted WAL directory (e.g., unclean shutdown, disk error).
Fix: Restore from a backup, or delete the wal_dir to start fresh (data in the WAL that hasn't been flushed will be lost).
address already in use¶
Cause: Another process is listening on the same port.
Fix: Change the port in config, or stop the conflicting process:
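One way to identify the conflicting process is to filter the output of `ss` by local port. This is a sketch, not the project's documented procedure: `listeners_on_port` is a hypothetical helper, and port 8086 in the usage comment is only a placeholder for whatever port your config specifies.

```shell
# Read `ss -H -ltnp` output on stdin and print only the sockets whose
# local address (column 4) ends in ":PORT".
listeners_on_port() {
  local port="$1"
  awk -v p="$port" '$4 ~ (":" p "$")'
}

# Example usage (8086 is a placeholder port):
#   sudo ss -H -ltnp | listeners_on_port 8086
# The last column of each matching line names the owning process and PID.
```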
tls_cert_path ... not found¶
Cause: TLS is enabled but certificate files are missing.
Fix: Check the paths in your config, or disable TLS:
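Before editing the config, it may be worth confirming that the certificate and key files exist and are readable by the user running the server. The sketch below uses a hypothetical helper, `check_tls_files`; the file paths in the usage comment are placeholders, not paths taken from HyperbyteDB's defaults.

```shell
# Return 0 only if every given path is a readable regular file.
# Otherwise report the first offending path on stderr and return 1.
check_tls_files() {
  local f
  for f in "$@"; do
    if [ ! -f "$f" ] || [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
  return 0
}

# Example usage (placeholder paths):
#   check_tls_files /etc/hyperbytedb/cert.pem /etc/hyperbytedb/key.pem
#   openssl x509 -in /etc/hyperbytedb/cert.pem -noout -enddate   # expiry date
```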
Writes Succeed but Queries Return Empty¶
Data must be flushed from the WAL to Parquet files before it becomes queryable.
Checklist:
- Wait for flush. The default flush interval is 10 seconds. Wait at least that long after writing.
- Check logs for flush errors:
- Verify Parquet files exist:
- Verify the database exists:
- Check the measurement exists:
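For the Parquet check in the list above, a quick way to see whether any flush has produced files is to count them under the data directory. This is a sketch: `count_parquet` is a hypothetical helper, and the directory in the usage comment is an assumed location, so substitute the data directory from your own config.

```shell
# Count regular files ending in .parquet anywhere under the given directory.
count_parquet() {
  find "$1" -type f -name '*.parquet' | wc -l | tr -d ' '
}

# Example usage (placeholder path):
#   count_parquet /var/lib/hyperbytedb/data
# A count of 0 more than one flush interval after a write suggests the
# flush is failing; check the logs for the reason.
```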
Query Timeouts¶
Symptom: Queries return HTTP 408 or take a very long time to complete.
Fixes:
- Increase the timeout:
- Add a time range to your query. Queries without a WHERE time > ... clause scan all data.
- Increase chDB sessions for concurrent queries:
- Run compaction to reduce file count:
Cardinality Limit Errors¶
Symptom: Writes return HTTP 422 with cardinality limit exceeded.
Possible causes:
- High-cardinality values (UUIDs, timestamps, request IDs) used as tag values. These should be fields instead.
- Legitimate growth beyond the configured limits.
Fixes:
- Investigate the data model. Tags are indexed; use fields for high-cardinality values.
- Increase limits if the growth is expected:
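To investigate which tag is driving cardinality, one option is to count the distinct values of each tag key in a sample of your write payload. The sketch below assumes writes use InfluxDB-style line protocol (`measurement,tag=value field=value`), which is an assumption about your client rather than something stated here. `distinct_tag_values` is a hypothetical helper, and it does not handle escaped commas or spaces.

```shell
# Read line protocol on stdin and print how many distinct values
# the given tag key takes across all lines.
distinct_tag_values() {
  local key="$1"
  awk -v key="$key" '
    {
      # $1 is "measurement,tag1=v1,tag2=v2"; parts[1] is the measurement.
      n = split($1, parts, ",")
      for (i = 2; i <= n; i++) {
        eq = index(parts[i], "=")
        if (eq > 0 && substr(parts[i], 1, eq - 1) == key) {
          seen[substr(parts[i], eq + 1)] = 1
        }
      }
    }
    END { c = 0; for (v in seen) c++; print c }
  '
}

# Example usage (sample.lp is a placeholder file of captured writes):
#   distinct_tag_values request_id < sample.lp
```

A tag whose distinct count grows roughly with the number of lines is a strong candidate for conversion to a field.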
Field Type Conflict¶
Symptom: Writes return HTTP 400 with field type conflict.
Cause: A field was previously registered with one type (e.g., float) and a new write sends a different type (e.g., integer) for the same field name.
Fix: Ensure all writes use the same type for each field. If the original type was wrong, you need to drop the measurement and recreate it:
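A common source of this conflict is a client that sometimes writes `1` and sometimes `1i` for the same field. The sketch below classifies a field value literal using InfluxDB line-protocol conventions; whether HyperbyteDB follows those conventions exactly is an assumption, and `lp_field_type` is a hypothetical helper for auditing client output.

```shell
# Classify a single field value literal by InfluxDB line-protocol rules:
# quoted -> string, t/true/f/false -> boolean, trailing i -> integer,
# trailing u -> unsigned, anything else -> float.
lp_field_type() {
  case "$1" in
    \"*\")                                     echo string ;;
    t|T|true|True|TRUE|f|F|false|False|FALSE)  echo boolean ;;
    *[0-9]i)                                   echo integer ;;
    *[0-9]u)                                   echo unsigned ;;
    *)                                         echo float ;;
  esac
}

# Example usage:
#   lp_field_type 42     # a bare number is a float
#   lp_field_type 42i    # the i suffix marks an integer
```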
Cluster Replication Issues¶
Writes not appearing on peer nodes¶
- Check connectivity: Ensure all nodes can reach each other on their cluster_addr and port.
- Check logs for replication errors:
- Verify peers configuration: The peers list should NOT include the node's own cluster_addr.
- Check node states:
Nodes must be in Active state to accept replicated writes.
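The connectivity check in the list above can be sketched as a plain TCP probe from each node to every peer. `peer_reachable` is a hypothetical helper built on bash's `/dev/tcp` and coreutils `timeout`; the host names and port in the usage comment are placeholders for your peers' cluster_addr values.

```shell
# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
peer_reachable() {
  local host="$1" port="$2"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage (placeholder peers):
#   for peer in node2:8087 node3:8087; do
#     peer_reachable "${peer%:*}" "${peer#*:}" || echo "cannot reach $peer"
#   done
```

A successful probe only shows that the port accepts connections; it does not confirm that the peer is in the Active state.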
Persistent data gaps between nodes¶
Symptoms: One node shows fewer series or buckets than others for the same time range; hyperbytedb-debug or manifest inspection shows missing files for a peer’s origin_node_id; metrics may show hyperbytedb_self_repair_hash_mismatch_total or hyperbytedb_self_repair_errors_total increasing.
Checklist:
- Check peer reachability. Self-repair only contacts active members; fix connectivity or heartbeat issues first.
- Check the age gate. Repair only considers buckets older than compaction.verified_compaction_age_secs (default 1 hour). Very recent data still depends on replication/WAL.
- Check repair caps. max_repair_checks_per_cycle limits work per tick. Wide skew may take multiple compaction intervals to heal.
- Ensure matching bucket duration. [compaction].bucket_duration must be the same on every node. Clients send bucket_duration_nanos to /internal/bucket-hash; server defaults must match local compaction config.
Reference: Deep dive: self-repair.
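To tell whether the self-repair counters named in the symptoms are actually increasing, you can sample them twice and compare. The sketch below sums one counter from Prometheus text-format output. `metric_total` is a hypothetical helper; the `/metrics` URL in the usage comment is an assumption about where your deployment exposes metrics, and the parser assumes samples carry no trailing timestamp.

```shell
# Read Prometheus text exposition on stdin and print the sum of all
# samples of the named metric, across any label sets.
metric_total() {
  local name="$1"
  awk -v name="$name" '
    /^#/ { next }                       # skip HELP and TYPE lines
    {
      metric = $1
      brace = index(metric, "{")        # strip the label set, if any
      if (brace > 0) metric = substr(metric, 1, brace - 1)
      if (metric == name) sum += $NF    # the value is the last column
    }
    END { printf "%d\n", sum }
  '
}

# Example usage (placeholder URL):
#   curl -s http://localhost:8086/metrics \
#     | metric_total hyperbytedb_self_repair_hash_mismatch_total
```

Run it once, wait a few compaction intervals, and run it again; a rising total means repair is still finding or failing on mismatches.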
Split-brain detection¶
Use the debug CLI to compare membership views:
If nodes have different membership views, check network partitions and ensure all peers can communicate.
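Once each node's membership list has been captured, the comparison itself can be automated. This sketch assumes you have saved one member ID per line into a file per node by whatever means your tooling provides; the file names are placeholders and `views_match` is a hypothetical helper.

```shell
# Compare the first membership file against each of the others,
# ignoring line order and duplicates. Returns 1 on the first mismatch.
views_match() {
  local first="$1" f
  shift
  for f in "$@"; do
    if [ "$(sort -u "$first")" != "$(sort -u "$f")" ]; then
      echo "membership differs: $first vs $f" >&2
      return 1
    fi
  done
  return 0
}

# Example usage (placeholder file names):
#   views_match node1.members node2.members node3.members \
#     && echo "all nodes agree on membership"
```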
High Memory Usage¶
- Reduce flush batch size:
- Reduce compaction concurrency:
See Also¶
- Configuration — All tuning parameters
- Administration — Monitoring and operational procedures