
Architecture

HyperbyteDB is a time-series database that combines four storage/compute engines into one binary. This document describes the high-level design, component relationships, and data flow.


System Overview

graph TD
    Client["Client (Telegraf, Grafana, curl)"]
    HTTP["HTTP Layer (Axum)<br>/write /query /ping /health /metrics"]
    AppSvc["Application Services<br>Ingestion | Query | Flush | Compaction<br>Retention | CQ | Replication"]
    Ports["Port Traits<br>WalPort | StoragePort | QueryPort<br>MetadataPort | IngestionPort | AuthPort"]
    RocksDB["RocksDB<br>(WAL + Metadata)"]
    Parquet["Parquet<br>(Arrow)"]
    ChDB["chDB<br>(ClickHouse)"]
    Storage["Local FS / S3"]

    Client --> HTTP
    HTTP --> AppSvc
    AppSvc --> Ports
    Ports --> RocksDB
    Ports --> Parquet
    Ports --> ChDB
    Ports --> Storage

RocksDB provides the WAL (durable, ordered write log) and metadata store (databases, measurements, schemas, users, tombstones, CQ definitions, Parquet file registry).

Apache Parquet is the long-term storage format. Points are batched, converted to Arrow RecordBatches, and written as ZSTD-compressed Parquet files.

chDB (embedded ClickHouse) is the query engine. InfluxQL is transpiled to ClickHouse SQL that uses the file() table function to scan Parquet files directly.

object_store abstracts local filesystem and S3-compatible storage for Parquet files.
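As an illustration, the SQL handed to chDB might take roughly this shape. This is a hedged sketch: the glob path layout, column names, and exact SQL the translator emits are assumptions, not the real output.

```rust
// Hypothetical example of the kind of ClickHouse SQL the translator
// emits. The glob path and column names are illustrative only.
fn example_transpiled_sql() -> String {
    let glob = "/data/mydb/autogen/cpu/*/*.parquet"; // assumed path layout
    format!(
        "SELECT host, avg(usage) AS mean_usage \
         FROM file('{glob}', 'Parquet') \
         WHERE time >= toDateTime64('2024-01-01 00:00:00', 9) \
         GROUP BY host"
    )
}

fn main() {
    println!("{}", example_transpiled_sql());
}
```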


Hexagonal Architecture (Ports and Adapters)

HyperbyteDB uses the hexagonal pattern. Business logic depends only on port traits, never on concrete implementations.

graph TB
    subgraph Domain["Domain Layer"]
        Point["Point, FieldValue"]
        DB["Database, RetentionPolicy"]
        QR["QueryResponse, SeriesResult"]
    end

    subgraph App["Application Services"]
        Ingest["IngestionService"]
        Query["QueryService"]
        Flush["FlushService"]
        Compact["CompactionService"]
    end

    subgraph PortLayer["Port Traits"]
        WalPort["WalPort"]
        StoragePort["StoragePort"]
        QueryPort["QueryPort"]
        MetaPort["MetadataPort"]
    end

    subgraph InboundAdapters["Inbound Adapters"]
        HTTPHandlers["HTTP Handlers"]
        PeerHandlers["Peer Handlers"]
    end

    subgraph OutboundAdapters["Outbound Adapters"]
        RocksWAL["RocksDB WAL"]
        RocksMeta["RocksDB Metadata"]
        ParquetWriter["Parquet Writer"]
        LocalFS["Local Storage"]
        S3["S3 Storage"]
        ChDBAdapter["chDB Adapter"]
    end

    InboundAdapters --> App
    App --> PortLayer
    PortLayer --> OutboundAdapters
    App --> Domain

This means:

- Swapping RocksDB for another WAL requires only implementing WalPort.
- Swapping chDB for another SQL engine requires only implementing QueryPort.
- The HTTP layer can be replaced without touching business logic.
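A minimal sketch of the idea, with invented names (the real WalPort trait has a different signature): business logic calls the trait, so an in-memory adapter can stand in for RocksDB in tests.

```rust
// Minimal sketch of ports and adapters. Names and signatures are
// illustrative, not the real WalPort trait.
trait Wal {
    fn append(&mut self, entry: Vec<u8>) -> u64; // returns sequence number
    fn read_from(&self, seq: u64) -> Vec<Vec<u8>>;
}

// An in-memory adapter: useful for tests, swappable for RocksDB.
struct InMemoryWal {
    entries: Vec<Vec<u8>>,
}

impl Wal for InMemoryWal {
    fn append(&mut self, entry: Vec<u8>) -> u64 {
        self.entries.push(entry);
        self.entries.len() as u64 // 1-based sequence number
    }
    fn read_from(&self, seq: u64) -> Vec<Vec<u8>> {
        self.entries[(seq as usize - 1)..].to_vec()
    }
}

// Business logic depends only on the trait, never on a concrete WAL.
fn ingest(wal: &mut dyn Wal, line: &str) -> u64 {
    wal.append(line.as_bytes().to_vec())
}

fn main() {
    let mut wal = InMemoryWal { entries: Vec::new() };
    let seq = ingest(&mut wal, "cpu,host=a usage=0.5");
    assert_eq!(seq, 1);
    assert_eq!(wal.read_from(1).len(), 1);
}
```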


Data Flow

Write Path

sequenceDiagram
    participant C as Client
    participant H as HTTP Handler
    participant I as IngestionService
    participant M as Metadata
    participant W as WAL (RocksDB)
    participant F as FlushService
    participant P as Parquet Writer
    participant S as Storage

    C->>H: POST /write?db=mydb
    H->>I: ingest(db, rp, precision, body)
    I->>I: Parse line protocol → Vec<Point>
    I->>M: Register field types, tag keys
    I->>M: Check cardinality limits
    I->>W: append(WalEntry)
    W-->>I: sequence number
    I-->>H: Ok
    H-->>C: 204 No Content

    Note over F: Background (every 10s)
    F->>W: read_range(cursor, 5000)
    F->>F: Group by (db, rp, measurement)
    F->>F: Partition by hour
    F->>P: points_to_parquet (spawn_blocking)
    P-->>F: Parquet bytes
    F->>S: write_parquet(path, bytes)
    F->>M: register_parquet_file(...)
    F->>W: truncate_before(seq)
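The "partition by hour" step above can be sketched as follows. This assumes nanosecond timestamps bucketed to the start of their containing hour; the real FlushService may compute partitions differently.

```rust
// Bucket a point's nanosecond timestamp into its containing hour, as
// the flush step's "partition by hour" might do. Illustrative only.
const NANOS_PER_HOUR: i64 = 3_600 * 1_000_000_000;

fn hour_bucket(timestamp_ns: i64) -> i64 {
    // div_euclid keeps pre-epoch (negative) timestamps in the right bucket
    timestamp_ns.div_euclid(NANOS_PER_HOUR) * NANOS_PER_HOUR
}

fn main() {
    // Two points 10 minutes apart within the same hour share a bucket.
    let t0 = 1_700_000_000_000_000_000_i64;
    let t1 = t0 + 10 * 60 * 1_000_000_000;
    assert_eq!(hour_bucket(t0), hour_bucket(t1));
    // A point two hours later lands in a different bucket.
    let t2 = t0 + 2 * NANOS_PER_HOUR;
    assert_ne!(hour_bucket(t0), hour_bucket(t2));
}
```

Grouping by (db, rp, measurement) and then by hour bucket yields one Parquet file per (measurement, hour) per flush cycle.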

Read Path

sequenceDiagram
    participant C as Client
    participant H as HTTP Handler
    participant Q as QueryService
    participant P as Parser
    participant T as Translator
    participant M as Metadata
    participant Ch as chDB
    participant S as Storage

    C->>H: GET /query?db=mydb&q=SELECT...
    H->>Q: execute_query(db, q, epoch)
    Q->>P: parse(influxql_string)
    P-->>Q: Vec<Statement>
    Q->>M: get_parquet_files(db, rp, meas, time_range)
    M-->>Q: file paths / glob
    Q->>M: list_tombstones(db, meas)
    Q->>T: translate(AST, glob, tombstones)
    T-->>Q: ClickHouse SQL
    Q->>Ch: execute_sql(sql) [spawn_blocking]
    Ch->>S: file() reads Parquet
    Ch-->>Q: JSONEachRow results
    Q->>Q: Parse → SeriesResult[]
    Q-->>H: QueryResponse
    H-->>C: JSON response
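The epoch parameter passed to execute_query governs how result timestamps are scaled. A plausible conversion from internal nanoseconds, following InfluxDB's unit names (a sketch; the real QueryService may handle this differently):

```rust
// Convert an internal nanosecond timestamp into the unit requested by
// the `epoch` query parameter. Unit names follow InfluxDB conventions;
// this is an assumed helper, not the actual implementation.
fn apply_epoch(timestamp_ns: i64, epoch: &str) -> i64 {
    match epoch {
        "ns" => timestamp_ns,
        "u" | "us" => timestamp_ns / 1_000,
        "ms" => timestamp_ns / 1_000_000,
        "s" => timestamp_ns / 1_000_000_000,
        "m" => timestamp_ns / 60_000_000_000,
        "h" => timestamp_ns / 3_600_000_000_000,
        _ => timestamp_ns, // unknown unit: leave as nanoseconds
    }
}

fn main() {
    let ts = 1_700_000_000_123_456_789_i64;
    assert_eq!(apply_epoch(ts, "s"), 1_700_000_000);
    assert_eq!(apply_epoch(ts, "ms"), 1_700_000_000_123);
}
```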

Component Summary

| Component | Location | Purpose |
| --- | --- | --- |
| CLI / Main | src/main.rs | Entry point, clap CLI, server lifecycle, graceful shutdown |
| Bootstrap | src/bootstrap.rs | Composition root: wires all adapters and services |
| Config | src/config.rs | Figment-based config loading (TOML + env vars) |
| Domain | src/domain/ | Pure types: Point, Database, SeriesKey, WalEntry, etc. |
| Ports | src/ports/ | Trait definitions: WalPort, StoragePort, QueryPort, etc. |
| Application | src/application/ | Business logic: ingestion, query, flush, compaction, etc. |
| HTTP Adapters | src/adapters/http/ | Axum handlers, router, middleware, auth |
| Storage Adapters | src/adapters/storage/ | Parquet writer, LocalStorage, S3Storage |
| WAL Adapter | src/adapters/wal/ | RocksDB WAL, optional batching wrapper |
| Metadata Adapter | src/adapters/metadata/ | RocksDB metadata store |
| chDB Adapter | src/adapters/chdb/ | Query adapter, session pool |
| Auth Adapter | src/adapters/auth.rs | Argon2 password verification |
| InfluxQL | src/influxql/ | Parser, AST, ClickHouse translator, digest |
| Cluster | src/cluster/ | Membership, replication, sync, Raft, drain, Merkle |
| Debug CLI | src/bin/hyperbytedb_debug.rs | Cluster inspection binary |
| Backfill | src/bin/hyperbytedb_backfill.rs | InfluxDB 1.x migration tool |

Key Design Patterns

| Pattern | Where Used |
| --- | --- |
| Ports and adapters | All business logic depends on port traits, not concrete implementations |
| Composition root | bootstrap::build_services wires everything together |
| Decorator / wrapper | BatchingWal wraps RocksDbWal for group commit |
| Strategy | IngestionServiceImpl vs PeerIngestionService; LocalStorage vs S3Storage |
| Facade | QueryServiceImpl over parser + transpiler + metadata + chDB |
| Worker pool + channel | ReplicationApplyQueue, PeerClient outbound loop |
| Cache | Ingest schema cache, metadata Parquet cache (generation-gated), Merkle cache, auth verification cache |
| Consensus | OpenRaft for schema/coordination only; HTTP transport |
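The decorator entry above describes a wrapper that buffers appends and commits them to the inner WAL as a group. A minimal sketch of the pattern, with invented names and a size-triggered flush policy (the real BatchingWal's interface and policy will differ):

```rust
// Sketch of the decorator pattern behind BatchingWal: buffer appends
// and flush them to the inner WAL in one group commit. Illustrative only.
trait WalSink {
    fn append_batch(&mut self, entries: Vec<String>);
    fn committed(&self) -> usize;
}

// Stand-in for the RocksDB-backed WAL.
struct VecWal(Vec<String>);
impl WalSink for VecWal {
    fn append_batch(&mut self, entries: Vec<String>) {
        self.0.extend(entries);
    }
    fn committed(&self) -> usize {
        self.0.len()
    }
}

// Decorator: same role, extra behavior (group commit on a size threshold).
struct BatchingWal<W: WalSink> {
    inner: W,
    buffer: Vec<String>,
    batch_size: usize,
}

impl<W: WalSink> BatchingWal<W> {
    fn append(&mut self, entry: String) {
        self.buffer.push(entry);
        if self.buffer.len() >= self.batch_size {
            self.inner.append_batch(std::mem::take(&mut self.buffer));
        }
    }
}

fn main() {
    let mut wal = BatchingWal {
        inner: VecWal(Vec::new()),
        buffer: Vec::new(),
        batch_size: 3,
    };
    wal.append("a".into());
    wal.append("b".into());
    assert_eq!(wal.inner.committed(), 0); // still buffered
    wal.append("c".into());
    assert_eq!(wal.inner.committed(), 3); // group commit happened
}
```

Because the wrapper exposes the same role as the inner WAL, callers are unaware whether batching is enabled, which matches the "optional batching wrapper" note in the component table.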

See Also