# Glossary
Shared terminology used across HyperbyteDB documentation.
| Term | Definition |
|---|---|
| Anti-entropy | Background process that detects data divergence between cluster nodes using Merkle trees and triggers delta sync to repair differences. |
| Arrow | Apache Arrow — in-memory columnar data format used internally for building RecordBatches before writing Parquet files. |
| Batching WAL | Optional decorator over the RocksDB WAL that groups multiple append operations into a single write batch (group commit) for higher throughput. |
| Bucket | A time partition (typically 1 hour or 1 day) used to group Parquet files for compaction and hash verification. |
| Cardinality | The number of unique values for a tag key or the number of unique measurements. High cardinality can degrade performance. |
| chDB | Embedded ClickHouse — the query engine used by HyperbyteDB. Runs in-process via the `chdb-rust` crate and executes SQL queries over Parquet files using the `file()` table function. |
| Column Family (CF) | A RocksDB concept. HyperbyteDB uses separate column families for WAL entries (`wal`), WAL metadata (`wal_meta`), general metadata (`metadata`), and replication state. |
| Compaction | Background process that merges multiple small Parquet files within the same time bucket into fewer, larger files. Improves query performance and reduces file count. |
| Composition Root | `bootstrap.rs` — the single location where all adapters and services are wired together using dependency injection via `Arc<dyn Trait>`. |
| Continuous Query (CQ) | A named query that runs automatically on a schedule, typically used for downsampling raw data into summary measurements. |
| Delta Sync | Transfer of only the divergent time buckets between nodes, identified by Merkle tree comparison. More efficient than full sync. |
| Drain | Graceful node removal procedure: stop accepting writes, flush WAL, wait for replication acks, verify data, notify peers, shut down. |
| Field | A key-value pair in a data point that holds the actual measured value. Fields are not indexed. Field types (Float, Integer, UInteger, String, Boolean) are enforced after the first write. |
| Figment | Configuration loading library used by HyperbyteDB. Merges defaults, TOML config file, and environment variables. |
| Flush | Background process that reads WAL entries, converts them to Arrow RecordBatches, and writes Parquet files. Runs every `flush.interval_secs`. |
| Hexagonal Architecture | Design pattern where business logic depends only on abstract port traits, with concrete adapters plugged in at the composition root. Also called ports and adapters. |
| Hinted Handoff | Mechanism that stores writes destined for unreachable peers in a local queue and replays them when the peer recovers. |
| InfluxQL | Query language compatible with InfluxDB 1.x. Supports SELECT with aggregates/transforms, SHOW commands, DDL, DELETE, and continuous queries. |
| Line Protocol | InfluxDB's text-based wire format for writing time-series data: `measurement,tag=val field=val timestamp`. |
| Master-Master Replication | Clustering model where every node independently accepts writes and replicates them to all peers asynchronously. |
| Measurement | Analogous to a table in a relational database. Contains a set of tag keys and field keys. |
| Merkle Tree | Hash tree structure built over time-bucketed Parquet file metadata. Used for efficient divergence detection between cluster nodes. |
| Metadata | Database definitions, measurement schemas (field types, tag keys), Parquet file registry, user accounts, tombstones, and CQ definitions. Stored in RocksDB. |
| OpenRaft | Rust implementation of the Raft consensus protocol. Used by HyperbyteDB for schema mutation ordering in cluster mode. |
| Orphan File | A Parquet file on disk with no corresponding entry in the metadata registry. Cleaned up automatically during compaction. |
| Parquet | Apache Parquet — columnar storage format. HyperbyteDB stores time-series data as ZSTD-compressed Parquet files with 65K-row row groups and page-level statistics. |
| Point | A single data observation: measurement name, tag set, field set, and timestamp (nanoseconds since Unix epoch). |
| Port | An abstract trait in the hexagonal architecture that defines a boundary between business logic and infrastructure (e.g., `WalPort`, `StoragePort`). |
| Precision | Timestamp unit for line protocol writes: `ns` (nanoseconds), `us`/`u` (microseconds), `ms` (milliseconds), `s` (seconds). |
| Raft | Consensus algorithm used for schema mutations in cluster mode. Ensures all nodes apply CREATE/DROP/DELETE operations in the same order. |
| RecordBatch | Arrow's in-memory columnar data container. Points are converted to RecordBatches during flush and compaction. |
| Replication Log | RocksDB-backed store tracking WAL and mutation acknowledgements from peers. Used for safe WAL truncation in cluster mode. |
| Retention Policy (RP) | Configuration that controls how long data is kept. Each database has one or more RPs. The default RP is `autogen` with infinite duration. |
| RocksDB | Embedded key-value store. Used for the WAL (durable write log), metadata store, replication log, and Raft state. |
| Self-Repair | Compaction-driven process that verifies Parquet data consistency between cluster nodes using bucket hashes and fetches corrected data from authoring peers. |
| Series | A unique combination of measurement name and tag set. Each series has its own time-ordered sequence of field values. |
| Statement Summary | Ring buffer tracking recently executed InfluxQL statements with query digest, latency, and error status. Exposed via `GET /api/v1/statements`. |
| Tag | A key-value pair in a data point used for indexing and grouping. Tags are always strings. Stored in metadata for SHOW TAG queries. |
| Tombstone | A metadata record created by DELETE statements. Marks data for exclusion at query time without physically removing Parquet rows. |
| Verified Compaction | Cluster-mode compaction pass that hashes cold data buckets against authoring peers and replaces divergent files. |
| WAL (Write-Ahead Log) | Durable, ordered log where incoming writes are persisted before the client receives a response. Data in the WAL is flushed to Parquet by the background flush service. |
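The Line Protocol and Precision entries above can be illustrated with a small formatter. This is only a sketch: `to_line_protocol` is a hypothetical helper, not part of HyperbyteDB's API, and it omits line protocol's escaping rules for commas, spaces, and string field values.

```rust
/// Format a single point as InfluxDB line protocol:
/// `measurement,tag=val field=val timestamp`
/// Hypothetical helper for illustration only; real line protocol
/// also requires escaping commas, spaces, and quotes.
fn to_line_protocol(
    measurement: &str,
    tags: &[(&str, &str)],
    fields: &[(&str, f64)],
    timestamp_ns: i64, // nanoseconds since Unix epoch (the `ns` precision)
) -> String {
    let mut line = String::from(measurement);
    // Tags are appended to the measurement, comma-separated.
    for (k, v) in tags {
        line.push_str(&format!(",{}={}", k, v));
    }
    // Fields follow after a space, also comma-separated.
    let fields_str: Vec<String> = fields
        .iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect();
    line.push(' ');
    line.push_str(&fields_str.join(","));
    // Trailing timestamp in the chosen precision.
    line.push_str(&format!(" {}", timestamp_ns));
    line
}
```

For example, a CPU reading for host `web01` would serialize as `cpu,host=web01 usage=0.5 1625000000000000000`.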
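The anti-entropy, bucket, and delta-sync entries fit together: each node hashes its per-bucket file metadata, compares digests with a peer, and syncs only the buckets that differ. The sketch below shows that comparison in miniature; the function names and the flat digest map are illustrative assumptions (the real implementation builds a Merkle tree over the metadata, which lets matching subtrees be skipped without comparing every bucket).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

// Hash each time bucket's Parquet file list into one digest.
// Illustrative only: HyperbyteDB hashes file metadata into a Merkle tree,
// not a flat map like this.
fn bucket_hashes(buckets: &BTreeMap<u64, Vec<String>>) -> BTreeMap<u64, u64> {
    buckets
        .iter()
        .map(|(bucket, files)| {
            let mut h = DefaultHasher::new();
            files.hash(&mut h);
            (*bucket, h.finish())
        })
        .collect()
}

// Any bucket whose digest differs (or exists on only one side)
// is a candidate for delta sync.
fn divergent_buckets(local: &BTreeMap<u64, u64>, remote: &BTreeMap<u64, u64>) -> Vec<u64> {
    local
        .keys()
        .chain(remote.keys())
        .filter(|b| local.get(b) != remote.get(b))
        .copied()
        .collect::<BTreeSet<u64>>() // dedup buckets seen on both sides
        .into_iter()
        .collect()
}
```

Transferring only the buckets returned by `divergent_buckets` is what makes delta sync cheaper than a full sync.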