02 · ENGINE

Storage & WAL

XERJ writes one WAL per index plus a set of immutable segments. Each segment is exactly three files (a data file, a skip index, and a doc-id sidecar) rather than a directory of a dozen. All three are mmap'd, so reads are served straight from the OS page cache with no application-side buffer.

data/
├── logs/                      · an index
│   ├── schema.json            · field mapping
│   ├── wal/
│   │   ├── wal-000001         · append-only
│   │   └── wal-000002         · rolls at wal_max_size_mb
│   ├── seg-000001/
│   │   ├── segment.seg        · columnar data, mmap'd
│   │   ├── segment.sidx       · skip index for seeks
│   │   └── segment.ids        · doc-id sidecar (external id ↔ internal ordinal)
│   └── seg-000002/
│       ├── segment.seg
│       ├── segment.sidx
│       └── segment.ids
├── traces/
│   └── ...
└── cluster/                   · Raft metadata, only present in clustered mode
    ├── raft-log-*
    └── snapshots/
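
The layout above can be turned into paths mechanically. The following is a minimal sketch, assuming the zero-padded six-digit naming shown in the tree; `SegmentPaths`, `segment_paths`, and `wal_path` are hypothetical helpers for illustration and are not taken from the XERJ source.

```rust
use std::path::{Path, PathBuf};

/// The three files that make up one segment directory.
#[derive(Debug, PartialEq)]
pub struct SegmentPaths {
    pub data: PathBuf,       // segment.seg: columnar data
    pub skip_index: PathBuf, // segment.sidx: skip index
    pub ids: PathBuf,        // segment.ids: doc-id sidecar
}

/// Build the file paths for segment number `n` of `index` under `data_root`.
/// Assumption: directories are named `seg-` plus a zero-padded six-digit number.
pub fn segment_paths(data_root: &Path, index: &str, n: u32) -> SegmentPaths {
    let dir = data_root.join(index).join(format!("seg-{:06}", n));
    SegmentPaths {
        data: dir.join("segment.seg"),
        skip_index: dir.join("segment.sidx"),
        ids: dir.join("segment.ids"),
    }
}

/// Build the path of one WAL generation file for `index`.
/// Assumption: files are named `wal-` plus a zero-padded six-digit generation.
pub fn wal_path(data_root: &Path, index: &str, generation: u32) -> PathBuf {
    data_root
        .join(index)
        .join("wal")
        .join(format!("wal-{:06}", generation))
}
```

For example, `segment_paths(Path::new("data"), "logs", 2).data` resolves to `data/logs/seg-000002/segment.seg`.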

WAL

One append-only WAL per index. The WAL rotates to a new generation file when the current one reaches wal_max_size_mb (default 512 MiB). An old generation is retained until the flush checkpoint has passed it, then the file is released. Fsync policy is controlled by [storage] wal_sync:

sync — fsync after every write. Maximum durability, lowest throughput. Financial or compliance workloads.
batched — fsync every wal_batch_ms milliseconds (default 100 ms). Recommended default — the durability window is small, throughput stays high.
async — never fsync; trust the OS. Maximum throughput. A crash can lose up to about 1 second of writes. Dev/bench only.
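
The three policies and the rotation rule can be sketched as a small decision function. This is an illustrative model only: the `WalSync` type, `should_fsync`, and `should_rotate` are hypothetical names, and the sketch assumes the batched check runs on the write path and that wal_max_size_mb is counted in MiB. The real engine may drive batched fsyncs from a timer instead.

```rust
use std::time::Duration;

/// The three `wal_sync` settings described above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WalSync {
    Sync,
    Batched { batch: Duration },
    Async,
}

impl WalSync {
    /// Parse the config value; `wal_batch_ms` only matters for "batched".
    pub fn parse(value: &str, wal_batch_ms: u64) -> Option<WalSync> {
        match value {
            "sync" => Some(WalSync::Sync),
            "batched" => Some(WalSync::Batched {
                batch: Duration::from_millis(wal_batch_ms),
            }),
            "async" => Some(WalSync::Async),
            _ => None,
        }
    }

    /// Decide whether to fsync after a write, given the time since the last fsync.
    pub fn should_fsync(self, since_last_fsync: Duration) -> bool {
        match self {
            WalSync::Sync => true,                                     // every write
            WalSync::Batched { batch } => since_last_fsync >= batch,   // once per window
            WalSync::Async => false,                                   // leave it to the OS
        }
    }
}

/// Rotate to a new generation once the current file reaches the size limit.
/// Assumption: the limit is interpreted as MiB.
pub fn should_rotate(current_len_bytes: u64, wal_max_size_mb: u64) -> bool {
    current_len_bytes >= wal_max_size_mb * 1024 * 1024
}
```

With the default of 100 ms, `WalSync::parse("batched", 100)` yields a policy that skips the fsync 40 ms after the previous one and performs it once 100 ms have elapsed.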

Segments

Three files per segment:

segment.seg — columnar data. Each field is a separate column run; queries only touch the columns they actually read. Integer columns use delta-of-delta or FOR; string columns use dictionary + FST. The whole file is mmap'd read-only at open time.
segment.sidx — skip index. A sparse per-column seek table so point-lookups don't scan the whole column. Mmap'd alongside the segment.
segment.ids — doc-id sidecar mapping external document ids to internal ordinals. Kept outside the main file so id lookups don't pull columnar pages into memory. Mmap'd read-only.
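
To make the integer encoding concrete, here is a minimal sketch of delta-of-delta over a slice of `i64`. It shows the general technique only; the function names are hypothetical, and the on-disk bit packing XERJ applies to the resulting small values is not shown and not known from this page.

```rust
/// Delta-of-delta encode: the first value is stored as-is, the second slot holds
/// the first delta, and every later slot holds the change between consecutive deltas.
pub fn dod_encode(values: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(values.len());
    let mut prev = 0i64;
    let mut prev_delta = 0i64;
    for (i, &v) in values.iter().enumerate() {
        let delta = v.wrapping_sub(prev);
        match i {
            0 => out.push(v),
            1 => out.push(delta),
            _ => out.push(delta.wrapping_sub(prev_delta)),
        }
        prev_delta = delta;
        prev = v;
    }
    out
}

/// Inverse of `dod_encode`: rebuild each delta, then each value.
pub fn dod_decode(encoded: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut prev = 0i64;
    let mut prev_delta = 0i64;
    for (i, &e) in encoded.iter().enumerate() {
        // Slots 0 and 1 hold a value and a plain delta; later slots hold delta changes.
        let delta = if i < 2 { e } else { prev_delta.wrapping_add(e) };
        let v = prev.wrapping_add(delta);
        out.push(v);
        prev_delta = delta;
        prev = v;
    }
    out
}
```

Regularly spaced values such as timestamps collapse to runs of zeros: `[1000, 1010, 1020, 1030, 1041]` encodes to `[1000, 10, 0, 0, 1]`, which a later bit-packing stage can store in very few bits.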

Segments are immutable once written. Updates and deletes work by writing a new segment and a tombstone; merges rewrite surviving documents into a larger segment.
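
The tombstone-and-merge model can be sketched with an in-memory stand-in. Everything here is an assumption for illustration: `Segment` holds plain (id, body) pairs rather than columnar files, and a `Tombstone` is modelled as a (segment position, ordinal) pair. How XERJ actually stores tombstones is not described on this page.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for an immutable segment: (external id, body) rows.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub docs: Vec<(String, String)>,
}

/// Hypothetical tombstone: (position of the segment, ordinal of the doc inside it).
pub type Tombstone = (usize, usize);

/// Rewrite all surviving documents into one new segment.
/// Input segments are never modified; tombstoned rows are simply not copied.
pub fn merge(segments: &[Segment], tombstones: &HashSet<Tombstone>) -> Segment {
    let mut docs = Vec::new();
    for (seg_no, seg) in segments.iter().enumerate() {
        for (ordinal, doc) in seg.docs.iter().enumerate() {
            if !tombstones.contains(&(seg_no, ordinal)) {
                docs.push(doc.clone());
            }
        }
    }
    Segment { docs }
}
```

In this model an update to document "a" is a new segment containing the new version plus a tombstone on the old row; after a merge only the new version and the untouched documents remain.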

Merges

[merge] strategy picks between size_tiered (default) and log_structured (LSM-tree-style, levelled). min_segments sets the merge trigger (default 10). io_rate_mb_per_sec throttles the merger to leave I/O headroom for queries (default 100 MiB/s; set 0 to disable throttling). max_concurrent caps the number of parallel merge workers (default 1; raise to 2–4 on fast NVMe).
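
One way an I/O rate limit like io_rate_mb_per_sec can be applied is to compute how long the merger has to pause so that its average write rate stays under the cap. The sketch below is a generic throttle calculation with a hypothetical function name; it is not XERJ's actual limiter, and it assumes the limit is counted in MiB.

```rust
use std::time::Duration;

/// How long a merge worker should sleep so that `bytes_written` over the total
/// elapsed time does not exceed the configured rate. A rate of 0 disables throttling.
pub fn throttle_delay(
    bytes_written: u64,
    elapsed: Duration,
    io_rate_mb_per_sec: u64,
) -> Duration {
    if io_rate_mb_per_sec == 0 {
        return Duration::from_secs(0); // throttling disabled
    }
    // Assumption: the limit is interpreted as MiB per second.
    let limit_bytes_per_sec = io_rate_mb_per_sec as f64 * 1024.0 * 1024.0;
    // Minimum time this much I/O is allowed to take at the capped rate.
    let min_elapsed = Duration::from_secs_f64(bytes_written as f64 / limit_bytes_per_sec);
    // Sleep for the shortfall; zero if the worker is already under the cap.
    min_elapsed.saturating_sub(elapsed)
}
```

A worker that has written 200 MiB in one second against the default 100 MiB/s cap would pause for one more second before continuing.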

Cluster metadata

In single-node mode there is no cluster/ directory — everything the engine needs is right next to the index data. When the server is started with a cluster config, a sibling cluster/ directory holds the embedded Raft log and snapshots. Index data is never in Raft — only the metadata (index schemas, shard assignments, node roster). See Clustering.

Source · engine/crates/storage/src/segment.rs · engine/crates/storage/src/lib.rs
