Benchmarks¶
This repository ships Criterion microbenchmarks for ingestion (in-process, no HTTP) and an optional HTTP soak example for columnar writes against a live server. Throughput is reported as logical points per second for batches of 1000 rows (Criterion’s Throughput::Elements(1000)), unless noted.
What exists¶
| Artifact | Kind | Feature / notes |
|---|---|---|
benches/ingestion_line_protocol.rs | Criterion | Always built — line protocol only |
benches/ingestion_columnar.rs | Criterion | Requires --features columnar-ingest |
examples/bench_columnar_http.rs | Binary (not Criterion) | Requires columnar-ingest; needs a running hyperbytedb |
`Cargo.toml` registers the two Criterion benches (`harness = false`). The HTTP tool is an `[[example]]`, not a `[[bench]]`.
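As a sketch, the registration in `Cargo.toml` likely looks something like the following. The target names come from the table above; the `required-features` entries are an assumption based on the feature notes, and the real manifest may differ.

```toml
# Hypothetical Cargo.toml excerpt; check the repository for the exact contents.
[[bench]]
name = "ingestion_line_protocol"
harness = false

[[bench]]
name = "ingestion_columnar"
harness = false
required-features = ["columnar-ingest"]

[[example]]
name = "bench_columnar_http"
required-features = ["columnar-ingest"]
```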
Prerequisites¶
- Release profile: Criterion uses `[profile.bench]` from `Cargo.toml` (inherits `release` with debug symbols for line numbers in reports).
- Columnar: pass `--features columnar-ingest` for `ingestion_columnar` and for `bench_columnar_http`.
- Batch size: both Criterion sources use `const BATCH: u64 = 1000`. Change it in the bench file to experiment; all group names embed the numeric suffix (e.g. `parse_1000`).
Criterion: line protocol (ingestion_line_protocol)¶
Measures three stages of the server-side ingest path on temporary RocksDB directories: parse only, parse + metadata registration, and parse + metadata + WAL append.
Run¶
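The exact command is not pinned down by this page; with the bench target registered in `Cargo.toml`, a typical invocation would be:

```shell
# Run all line-protocol ingestion benches (Criterion builds in release mode)
cargo bench --bench ingestion_line_protocol

# Filter to a single group, e.g. the parse-only stage
cargo bench --bench ingestion_line_protocol -- line_protocol_parse
```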
Benchmark groups¶
| Group | Function | What it measures |
|---|---|---|
line_protocol_parse | parse_1000 | parse_line_body_to_points on a synthetic line body (precision ms). No I/O. |
line_protocol_metadata | parse_plus_metadata_1000 | Parse + prepare_batch_metadata against a temp metadata store (benchdb created once). |
line_protocol_wal | metadata_plus_wal_append_1000 | Parse + metadata + build WalEntry + wal.append. line_protocol_wal uses sample_size(20) for steadier WAL timings. |
Synthetic lines look like `bench,host=bench v={i} {ts}\n` with millisecond timestamps.
Scope¶
These benches do not include Axum, HTTP parsing, gzip, auth, replication, flush-to-Parquet, or chDB.
Criterion: columnar MessagePack (ingestion_columnar)¶
Same overall story as line protocol, plus fast paths that avoid full Point materialization where possible. Body bytes are produced with rmp_serde::to_vec_named from a ColumnarMsgpackBatch (see Columnar MessagePack write format).
Run¶
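As with the line-protocol bench, treat this as a sketch of a typical invocation; the feature flag is required, as noted in the prerequisites:

```shell
# Columnar benches only compile with the columnar-ingest feature
cargo bench --features columnar-ingest --bench ingestion_columnar
```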
Point-expansion path (mirrors line protocol structure)¶
| Group | Function | What it measures |
|---|---|---|
columnar_decode | parse_1000 | parse_columnar_msgpack_to_points (msgpack → Vec<Point>). |
columnar_metadata | prepare_batch_metadata_1000 | Parse to points + prepare_batch_metadata on temp RocksDB. |
columnar_wal | metadata_plus_wal_append_1000 | Full path: points + metadata + WalEntry + append. sample_size(20). |
Fast path (decoded wire batch)¶
| Group | Function | What it measures |
|---|---|---|
columnar_decode_fast | decode_only_1000 | decode_columnar_batch only. |
columnar_decode_fast | decode_to_points_1000 | Decode + columnar_batch_to_points. |
columnar_decode_fast | decode_to_parquet_1000 | Decode + columnar_batch_to_parquet (Arrow/Parquet-shaped without WAL). |
columnar_metadata_fast | prepare_columnar_metadata_1000 | prepare_columnar_metadata on a structured ColumnarMsgpackBatch (no msgpack parse in the timed section). |
columnar_wal_fast | fast_metadata_plus_wal_append_1000 | Decode → prepare_columnar_metadata → columnar_batch_to_points → WalEntry + wal.append. sample_size(20). |
Columnar MessagePack write format (v1)¶
Optional ingest encoding for high-throughput workloads, enabled with the columnar-ingest Cargo feature.
HTTP¶
- Method / path: `POST /write` (same as line protocol and row-wise MessagePack).
- Query: `db` (required); `precision` optional (`ns` default; also `ms`, `us`, `s`, …) — timestamp arrays use the same units as row-wise msgpack.
- Content-Type: `application/vnd.hyperbytedb.columnar-msgpack.v1` (case-insensitive; parameters after `;` are ignored). Defined as `CONTENT_TYPE` in `columnar_msgpack.rs`.
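Putting the method, query, and content type together, a single columnar write can be sketched with `curl`. The body file `batch.msgpack` is hypothetical and must already contain a valid v1 body; the base URL and `db=server` match the soak example's defaults.

```shell
curl -sS -X POST "http://127.0.0.1:8086/write?db=server&precision=ms" \
  -H "Content-Type: application/vnd.hyperbytedb.columnar-msgpack.v1" \
  --data-binary @batch.msgpack
```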
MessagePack body¶
One map (not an array of rows) with keys:
| Key | Type | Required | Description |
|---|---|---|---|
measurement | string | yes | Shared measurement for all rows. |
tags | map string → string | no (default empty) | Constant tags on every point. |
field | string | yes | Name of the single float field. |
values | array of float64 | yes | One sample per row; length N is the batch size. |
timestamps | array of int64 | no | Length must be N if present. If omitted, the server may use wall-clock nanoseconds (same constraints as missing timestamps in line protocol; can fail with WallClockTimestampUnavailable). |
The server expands this into N internal points and then runs the normal metadata + WAL path.
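Because the body is a single plain MessagePack map, a client can build it without a msgpack library at all. The sketch below hand-encodes the v1 layout using only the Rust standard library; the function names and the fixstr/array16 size limits are this sketch's own choices, not part of the server API.

```rust
// Hand-encode a v1 columnar MessagePack body (sketch; stdlib only).
// Limits: keys/strings under 32 bytes (fixstr), batches under 65_536 rows.

fn put_str(out: &mut Vec<u8>, s: &str) {
    assert!(s.len() < 32, "this sketch only emits fixstr");
    out.push(0xA0 | s.len() as u8); // fixstr header
    out.extend_from_slice(s.as_bytes());
}

fn put_array_header(out: &mut Vec<u8>, n: usize) {
    if n < 16 {
        out.push(0x90 | n as u8); // fixarray
    } else {
        assert!(n < 65_536, "this sketch only emits array 16");
        out.push(0xDC); // array 16
        out.extend_from_slice(&(n as u16).to_be_bytes());
    }
}

fn encode_columnar_batch(
    measurement: &str,
    tags: &[(&str, &str)],
    field: &str,
    values: &[f64],
    timestamps: Option<&[i64]>,
) -> Vec<u8> {
    // measurement, field, values are always present; tags/timestamps optional.
    let mut keys = 3;
    if !tags.is_empty() { keys += 1; }
    if timestamps.is_some() { keys += 1; }
    let mut out = vec![0x80 | keys as u8]; // fixmap header
    put_str(&mut out, "measurement");
    put_str(&mut out, measurement);
    if !tags.is_empty() {
        put_str(&mut out, "tags");
        out.push(0x80 | tags.len() as u8); // fixmap of constant tags
        for &(k, v) in tags {
            put_str(&mut out, k);
            put_str(&mut out, v);
        }
    }
    put_str(&mut out, "field");
    put_str(&mut out, field);
    put_str(&mut out, "values");
    put_array_header(&mut out, values.len());
    for v in values {
        out.push(0xCB); // float 64
        out.extend_from_slice(&v.to_be_bytes());
    }
    if let Some(ts) = timestamps {
        assert_eq!(ts.len(), values.len(), "timestamps length must equal N");
        put_str(&mut out, "timestamps");
        put_array_header(&mut out, ts.len());
        for t in ts {
            out.push(0xD3); // int 64
            out.extend_from_slice(&t.to_be_bytes());
        }
    }
    out
}
```

Since the server decodes named map keys (via `rmp_serde`), key order should not matter; the sketch keeps the documented order for readability.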
Design notes¶
- One float column keeps the wire format small and matches common throughput scenarios.
- For cluster replication, the coordinator re-encodes to Influx line protocol for peers (same as row-wise msgpack).
HTTP soak: bench_columnar_http¶
Not a Criterion bench: it spawns concurrent reqwest workers that POST columnar bodies to a running hyperbytedb and prints aggregate latency (p50/p90/p95/p99) and throughput after a warmup window.
Run¶
- Start `hyperbytedb` (e.g. `cargo run --release` or Docker).
- Create the target database (the example defaults to `db=server`).
- Run the example directly, or use the helper script (same as above; it forwards extra args to Cargo).
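The run itself can be sketched as follows; all values shown are the defaults from the environment-variable table, and the target database must already exist:

```shell
HYPERBYTEDB_URL=http://127.0.0.1:8086 WORKERS=12 BATCH_SIZE=1000 \
DURATION_SECS=60 WARMUP_SECS=5 \
cargo run --release --features columnar-ingest --example bench_columnar_http
```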
Environment variables¶
| Variable | Default | Description |
|---|---|---|
HYPERBYTEDB_URL | http://127.0.0.1:8086 | Base URL of the server |
WORKERS | 12 | Concurrent POST tasks |
BATCH_SIZE | 1000 | Rows per request body |
DURATION_SECS | 60 | How long to measure after warmup |
WARMUP_SECS | 5 | Warmup period excluded from stats |
The example posts to /write?db=server&precision=ms with the columnar Content-Type and reports points/sec and latency summaries at the end.
Reference numbers (illustrative only)¶
The tables below are from a single Linux x86_64 run (circa 2026, recent stable rustc). Re-run on your own machine before comparing hardware or publishing numbers.
Line protocol (approx. median)¶
| Benchmark | Time (approx.) | Throughput (approx.) |
|---|---|---|
parse_1000 | ~544 µs | ~1.8 M points/s |
parse_plus_metadata_1000 | ~640 µs | ~1.6 M points/s |
metadata_plus_wal_append_1000 | ~950 µs | ~1.0 M points/s |
Columnar (approx. median)¶
| Benchmark | Time (approx.) | Throughput (approx.) |
|---|---|---|
metadata_plus_wal_append_1000 | ~509 µs | ~2.0 M points/s |
fast_metadata_plus_wal_append_1000 | ~378 µs | ~2.6 M points/s |
decode_only_1000 | ~5.2 µs | ~192 M elements/s |
decode_to_points_1000 | ~130 µs | ~7.7 M points/s |
decode_to_parquet_1000 | ~101 µs | ~9.9 M points/s |
prepare_columnar_metadata_1000 | ~195 ns | metadata hot path (1000-row batch) |
prepare_columnar_metadata_1000 is dominated by a very small per-iteration path; treat it as a relative number, not as a headline "gigapoints per second" figure for end-to-end ingest.
Reporting environment (for fair comparisons)¶
When publishing numbers, record:
- Git commit: `git rev-parse HEAD`
- Rust version: `rustc -V`
- CPU: `lscpu` or the model in `/proc/cpuinfo`
- RAM: `grep MemTotal /proc/meminfo`
- Disk type (NVMe vs SSD) if WAL/Parquet I/O is part of the scenario
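The checklist above can be captured in one go. A small sketch (Linux-specific, as the listed commands imply; the output filename is arbitrary):

```shell
# Record the benchmark environment next to the Criterion output.
{
  echo "commit: $(git rev-parse HEAD)"
  echo "rustc:  $(rustc -V)"
  lscpu | grep 'Model name' || grep -m1 'model name' /proc/cpuinfo
  grep MemTotal /proc/meminfo
} > bench-env.txt
```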
Expectations and limits¶
- Criterion benches isolate in-process costs (parse → metadata → WAL on temp RocksDB). They exclude HTTP, auth, cluster replication, and query engine work.
- They are not comparable to third-party “maximum ingress” figures that use different storage engines or full pipelines.
- Criterion may emit HTML reports under `target/criterion/` for detailed plots.
See also¶
- Development setup — building from source
- Deep dive: Write path — end-to-end write path
- `src/application/columnar_msgpack.rs` — implementation and `CONTENT_TYPE`