
Troubleshooting

Common problems and their solutions when running HyperbyteDB.


Startup Failures

libchdb.so: cannot open shared object file

Cause: libchdb is not installed or not in the library search path.

Fix:

curl -sL https://lib.chdb.io | bash
sudo ldconfig

Verify: ls /usr/local/lib/libchdb.so

std::bad_function_call crash on startup

Symptom: HyperbyteDB aborts immediately with:

terminate called after throwing an instance of 'std::__1::bad_function_call'
  what():  std::bad_function_call
Aborted

Cause: An incompatible system-installed libchdb.so. The crash happens while the dynamic loader initializes the library, before any Rust code runs.

Fix (Option A — Recommended): Use the chdb-rust bundled library:

# Temporarily move the system libchdb
sudo mv /usr/local/lib/libchdb.so /usr/local/lib/libchdb.so.bak
sudo mv /usr/local/include/chdb.h /usr/local/include/chdb.h.bak

# Rebuild so chdb-rust downloads its own
cargo clean -p chdb-rust && cargo build --release

# Run with the bundled library
LIBCHDB_DIR=$(find target -name "libchdb.so" -path "*/build/chdb-rust-*/out/*" | head -1 | xargs dirname)
LD_LIBRARY_PATH="$LIBCHDB_DIR:$LD_LIBRARY_PATH" ./target/release/hyperbytedb serve

# Optionally restore both files for other tools
sudo mv /usr/local/lib/libchdb.so.bak /usr/local/lib/libchdb.so
sudo mv /usr/local/include/chdb.h.bak /usr/local/include/chdb.h

Fix (Option B): Reinstall libchdb from the latest release:

curl -sL https://lib.chdb.io | bash
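To tell which libchdb.so a build will actually pick up, inspecting the binary's dynamic dependencies is a quick check. A minimal sketch (the binary path assumes a release build; adjust to your target directory):

```shell
# Check which libchdb.so the built binary would resolve at runtime.
bin=./target/release/hyperbytedb
if [ -x "$bin" ]; then
  # The first matching path ldd prints is the one that wins at load time
  msg=$(ldd "$bin" | grep libchdb || echo "libchdb not linked dynamically")
else
  msg="binary not built yet: $bin"
fi
echo "$msg"
```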

failed to open WAL

Cause: Corrupted WAL directory (e.g., unclean shutdown, disk error).

Fix: Restore from a backup, or delete the wal_dir to start fresh. Any WAL data that has not yet been flushed to Parquet will be lost.
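If you take the start-fresh route, moving the directory aside is safer than deleting it outright, since the unflushed segments can still be inspected later. A sketch using a scratch directory in place of the real wal_dir (stop the server before doing this for real):

```shell
# Scratch directory standing in for the wal_dir from your config
WAL_DIR=$(mktemp -d)
touch "$WAL_DIR/segment-000001.wal"   # stand-in for an existing WAL segment

# Move the directory aside instead of deleting it, so the old
# segments remain available for inspection or recovery
mv "$WAL_DIR" "${WAL_DIR}.corrupt"

# Recreate an empty wal_dir for the server to start against
mkdir "$WAL_DIR"
```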

address already in use

Cause: Another process is listening on the same port.

Fix: Change the port in your config, or find and stop the conflicting process:

lsof -i :8086

tls_cert_path ... not found

Cause: TLS is enabled but certificate files are missing.

Fix: Check the paths in your config, or disable TLS:

[server]
tls_enabled = false


Writes Succeed but Queries Return Empty

Data must be flushed from the WAL to Parquet files before it becomes queryable.

Checklist:

  1. Wait for flush. The default flush interval is 10 seconds. Wait at least that long after writing.

  2. Check logs for flush errors:

    # Look for flush-related messages
    journalctl -u hyperbytedb | grep -i flush
    

  3. Verify Parquet files exist:

    ls -R /var/lib/hyperbytedb/data/mydb/
    

  4. Verify the database exists:

    curl -sS -G 'http://localhost:8086/query' \
      --data-urlencode 'q=SHOW DATABASES'
    

  5. Check the measurement exists:

    curl -sS -G 'http://localhost:8086/query' \
      --data-urlencode 'db=mydb' \
      --data-urlencode 'q=SHOW MEASUREMENTS'
    


Query Timeouts

Symptom: Queries return HTTP 408 or run for an unusually long time.

Fixes:

  1. Increase the timeout:

    [server]
    query_timeout_secs = 120
    

  2. Add a time range to your query. Queries without WHERE time > ... scan all data.

  3. Increase chDB sessions for concurrent queries:

    [chdb]
    pool_size = 4
    

  4. Run compaction to reduce file count:

    curl -sS -XPOST 'http://localhost:8086/internal/compact'
    
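For step 2, a bounded time range in practice looks like the following (measurement and field names are illustrative):

```
SELECT mean(value) FROM "cpu"
WHERE time > now() - 1h
GROUP BY time(1m)
```

Without the time predicate, the query must scan every Parquet file; with it, only files overlapping the last hour are read.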


Cardinality Limit Errors

Symptom: Writes return HTTP 422 with cardinality limit exceeded.

Possible causes:

  - High-cardinality values (UUIDs, timestamps, request IDs) used as tag values. These should be fields instead.

  - Legitimate growth beyond the configured limits.

Fixes:

  1. Investigate the data model. Tags are indexed; use fields for high-cardinality values.

  2. Increase limits if the growth is expected:

    [cardinality]
    max_tag_values_per_measurement = 500000
    max_measurements_per_database = 50000
    


Field Type Conflict

Symptom: Writes return HTTP 400 with field type conflict.

Cause: A field was previously registered with one type (e.g., float) and a new write sends a different type (e.g., integer) for the same field name.

Fix: Ensure all writes use the same type for each field. If the original type was wrong, drop the measurement and re-ingest its data with the corrected type:

DROP MEASUREMENT "problematic_measurement"
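For reference, this is the kind of write sequence that triggers the conflict (names illustrative). The first write registers the field as a float, so the later integer write is rejected:

```
# Accepted: registers "value" as a float
temperature,host=web01 value=21.5 1700000000000000000

# Rejected with HTTP 400: "value" is now an integer (note the i suffix)
temperature,host=web01 value=21i 1700000001000000000
```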


Cluster Replication Issues

Writes not appearing on peer nodes

  1. Check connectivity: Ensure all nodes can reach each other on their cluster_addr and port.

  2. Check logs for replication errors:

    journalctl -u hyperbytedb | grep -i "replication failed"
    

  3. Verify peers configuration: The peers list should NOT include the node's own cluster_addr.

  4. Check node states:

    hyperbytedb-debug --nodes "node1:8086,node2:8086" status
    
    Nodes must be in Active state to accept replicated writes.
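Following step 3 above, a correct configuration for a three-node cluster might look like this on node1 (the section and key layout here is an assumption for illustration; match it to your actual config file):

```toml
# node1's config: peers lists only the OTHER nodes, never node1 itself
[cluster]
cluster_addr = "node1:8086"
peers = ["node2:8086", "node3:8086"]
```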

Persistent data gaps between nodes

Symptoms: One node shows fewer series or buckets than others for the same time range; hyperbytedb-debug or manifest inspection shows missing files for a peer’s origin_node_id; metrics may show hyperbytedb_self_repair_hash_mismatch_total or hyperbytedb_self_repair_errors_total increasing.

Checklist:

  1. Check peer reachability. Self-repair only contacts active members; fix connectivity or heartbeat issues first.

  2. Check the age gate. Repair only considers buckets older than compaction.verified_compaction_age_secs (default 1 hour). Very recent data still depends on replication/WAL.

  3. Check repair caps. max_repair_checks_per_cycle limits work per tick. Wide skew may take multiple compaction intervals to heal.

  4. Ensure matching bucket duration. [compaction].bucket_duration must be the same on every node. Clients send bucket_duration_nanos to /internal/bucket-hash; server defaults must match local compaction config.
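The knobs from the checklist above can be sketched as a single config fragment (values are illustrative, and the exact placement of max_repair_checks_per_cycle may differ in your version):

```toml
[compaction]
bucket_duration = "1h"                # must be identical on every node
verified_compaction_age_secs = 3600   # age gate: only buckets older than this are repaired
max_repair_checks_per_cycle = 64      # caps self-repair work per compaction tick
```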

Reference: Deep dive: self-repair.

Split-brain detection

Use the debug CLI to compare membership views:

hyperbytedb-debug --nodes "node1:8086,node2:8086,node3:8086" diff

If nodes have different membership views, check network partitions and ensure all peers can communicate.


High Memory Usage

  1. Reduce flush batch size:

    [flush]
    max_points_per_batch = 50000
    

  2. Reduce compaction concurrency:

    [compaction]
    compact_all_max_inflight = 4
    target_file_size_mb = 128
    


See Also