Storage & WAL
XERJ keeps one WAL per index plus a list of immutable segments. Each segment is a directory of exactly three files (a data file, a skip index, and a doc-id sidecar), not a dozen. All three are mmap'd, so reads are served straight from the OS page cache with no application-side buffer.
data/
├── logs/                    · an index
│   ├── schema.json          · field mapping
│   ├── wal/
│   │   ├── wal-000001       · append-only
│   │   └── wal-000002       · rolls at wal_max_size_mb
│   ├── seg-000001/
│   │   ├── segment.seg      · columnar data, mmap'd
│   │   ├── segment.sidx     · skip index for seeks
│   │   └── segment.ids      · doc-id sidecar (external id ↔ internal ordinal)
│   └── seg-000002/
│       ├── segment.seg
│       ├── segment.sidx
│       └── segment.ids
├── traces/
│   └── ...
└── cluster/                 · Raft metadata, only present in clustered mode
    ├── raft-log-*
    └── snapshots/
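For tooling that needs to locate files in this layout, the naming scheme can be sketched as a pair of path helpers. The function and struct names are hypothetical and are not part of the engine's API; the six-digit zero padding is an assumption read off the example names in the tree.

```rust
use std::path::{Path, PathBuf};

/// The three files that make up one segment (file names taken from the layout).
#[derive(Debug, PartialEq)]
pub struct SegmentPaths {
    pub data: PathBuf,
    pub skip_index: PathBuf,
    pub doc_ids: PathBuf,
}

/// Hypothetical helper: resolve the files of segment `ordinal` in `index`.
/// Assumes six-digit, zero-padded numbering as in `seg-000001`.
pub fn segment_paths(data_dir: &Path, index: &str, ordinal: u32) -> SegmentPaths {
    let dir = data_dir.join(index).join(format!("seg-{:06}", ordinal));
    SegmentPaths {
        data: dir.join("segment.seg"),
        skip_index: dir.join("segment.sidx"),
        doc_ids: dir.join("segment.ids"),
    }
}

/// Hypothetical helper: path of WAL generation `generation` for `index`.
pub fn wal_path(data_dir: &Path, index: &str, generation: u32) -> PathBuf {
    data_dir
        .join(index)
        .join("wal")
        .join(format!("wal-{:06}", generation))
}
```

For example, segment_paths(Path::new("data"), "logs", 1).data resolves to data/logs/seg-000001/segment.seg.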
WAL
Append-only per index. Generation-rotated at wal_max_size_mb (default 512 MiB). Retained until the flush checkpoint passes the tail generation, then the old file is released. Fsync policy is controlled by [storage] wal_sync:
- sync: fsync after every write. Maximum durability, lowest throughput. Financial or compliance workloads.
- batched: fsync every wal_batch_ms milliseconds (default 100 ms). Recommended default: the durability window is small and throughput stays high.
- async: never fsync; trust the OS. Maximum throughput. Loses up to ~1 second on crash. Dev/bench only.
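The three policies and the rotation rule can be read as a few small decision functions. The sketch below is illustrative only: the type and function names are hypothetical, wal_max_size_mb is assumed to be counted in MiB (matching the stated default), and "released" is assumed to mean every generation older than the one holding the flush checkpoint. A real batched writer would more likely fsync from a timer than check elapsed time on each append.

```rust
use std::time::Duration;

/// Mirrors the three `wal_sync` values described above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WalSync {
    Sync,
    Batched { batch: Duration },
    Async,
}

/// Decide whether to fsync after appending a record.
/// `since_last_sync` is the time elapsed since the previous fsync.
pub fn should_fsync(policy: WalSync, since_last_sync: Duration) -> bool {
    match policy {
        WalSync::Sync => true,                                      // every write
        WalSync::Batched { batch } => since_last_sync >= batch,     // at most once per window
        WalSync::Async => false,                                    // leave it to the OS
    }
}

/// Decide whether to roll to a new generation before appending `record_len` bytes.
/// Assumption: `wal_max_size_mb` is interpreted in MiB.
pub fn should_rotate(current_len: u64, record_len: u64, wal_max_size_mb: u64) -> bool {
    current_len + record_len > wal_max_size_mb * 1024 * 1024
}

/// Assumption: generations strictly older than the checkpoint's generation
/// are no longer needed for recovery and can be released.
pub fn releasable(generations: &[u32], checkpoint_generation: u32) -> Vec<u32> {
    generations
        .iter()
        .copied()
        .filter(|g| *g < checkpoint_generation)
        .collect()
}
```

With the default batched policy and a 100 ms window, a write arriving 40 ms after the last fsync is appended without syncing, which is where the bounded durability window comes from.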
Segments
Three files per segment:
- segment.seg: columnar data. Each field is a separate column run; queries only touch the columns they actually read. Integer columns use delta-of-delta or frame-of-reference (FOR) encoding; string columns use a dictionary plus a finite-state transducer (FST). The whole file is mmap'd read-only at open time.
- segment.sidx: skip index. A sparse per-column seek table so point lookups don't scan the whole column. Mmap'd alongside the segment.
- segment.ids: doc-id sidecar mapping external document ids to internal ordinals. Kept outside the main file so id lookups don't pull columnar pages into memory. Mmap'd read-only.
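To make the integer encodings concrete, here is a minimal sketch of delta-of-delta and frame-of-reference encoding. It shows the general techniques only; the function names are hypothetical, and the actual on-disk format (bit packing, block sizes, how the engine chooses between the two) is not described on this page and is not modelled here.

```rust
/// Delta-of-delta: store the first value, then the first delta, then the
/// change between consecutive deltas. Regularly spaced values become zeros.
pub fn dod_encode(values: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(values.len());
    let mut prev = 0i64;
    let mut prev_delta = 0i64;
    for (i, &v) in values.iter().enumerate() {
        if i == 0 {
            out.push(v);
        } else {
            let delta = v.wrapping_sub(prev);
            out.push(delta.wrapping_sub(prev_delta));
            prev_delta = delta;
        }
        prev = v;
    }
    out
}

/// Inverse of `dod_encode`.
pub fn dod_decode(encoded: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut prev = 0i64;
    let mut prev_delta = 0i64;
    for (i, &e) in encoded.iter().enumerate() {
        let v = if i == 0 {
            e
        } else {
            prev_delta = prev_delta.wrapping_add(e);
            prev.wrapping_add(prev_delta)
        };
        out.push(v);
        prev = v;
    }
    out
}

/// Frame-of-reference: store the minimum once, then each value as an
/// unsigned offset from it. Returns `None` for an empty column.
pub fn for_encode(values: &[i64]) -> Option<(i64, Vec<u64>)> {
    let min = *values.iter().min()?;
    Some((min, values.iter().map(|v| v.wrapping_sub(min) as u64).collect()))
}
```

Timestamps spaced 10 apart, such as 1000, 1010, 1020, 1030, 1041, encode to 1000, 10, 0, 0, 1 under delta-of-delta. The small residuals are what a subsequent bit-packing step would compress.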
Segments are immutable once written. Updates and deletes work by writing a new segment and a tombstone; merges rewrite surviving documents into a larger segment.
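The interaction between tombstones and merges can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: segments are reduced to lists of (external id, body) pairs, and a tombstone is modelled as a (segment position, ordinal) pair marking one dead copy. The page does not describe the engine's real tombstone representation.

```rust
use std::collections::HashSet;

/// An immutable segment, reduced to (external id, body) pairs for illustration.
pub struct Segment {
    pub docs: Vec<(String, String)>,
}

/// Hypothetical tombstone: (position of the segment, ordinal within it).
pub type Tombstone = (usize, usize);

/// Rewrite every document copy that is not tombstoned into one new segment.
/// The inputs are left untouched, which keeps them immutable.
pub fn merge(segments: &[Segment], tombstones: &HashSet<Tombstone>) -> Segment {
    let mut docs = Vec::new();
    for (seg_no, seg) in segments.iter().enumerate() {
        for (ordinal, doc) in seg.docs.iter().enumerate() {
            if !tombstones.contains(&(seg_no, ordinal)) {
                docs.push(doc.clone());
            }
        }
    }
    Segment { docs }
}
```

In this model an update to document "a" writes the new version into a fresh segment and tombstones the old copy; a later merge drops the old copy and carries the new one forward.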
Merges
[merge] strategy picks between size_tiered (default) and log_structured (LSMT-style levelled). min_segments sets the segment count that triggers a merge (default 10). io_rate_mb_per_sec throttles the merger to leave I/O headroom for queries (default 100 MiB/s; set to 0 to disable the throttle). max_concurrent caps parallel merge workers (default 1; raise to 2–4 on fast NVMe).
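As a rough illustration of how min_segments and io_rate_mb_per_sec might act, the sketch below shows a generic size-tiered picker and a rate-limit calculation. This is not the engine's algorithm: grouping by power-of-two size class is one common way to define a tier, and both function names are hypothetical.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical size-tiered picker: sizes in [2^k, 2^(k+1)) share tier k.
/// Returns the segment indices of the smallest tier holding at least
/// `min_segments` members, or `None` if no tier qualifies.
pub fn pick_merge(segment_bytes: &[u64], min_segments: usize) -> Option<Vec<usize>> {
    let mut tiers: BTreeMap<u32, Vec<usize>> = BTreeMap::new();
    for (i, &bytes) in segment_bytes.iter().enumerate() {
        // floor(log2(size)), treating an empty segment as size 1.
        let tier = 63 - bytes.max(1).leading_zeros();
        tiers.entry(tier).or_default().push(i);
    }
    // BTreeMap iterates in key order, so smaller tiers are considered first.
    tiers.into_values().find(|members| members.len() >= min_segments)
}

/// Minimum wall-clock time a merge should take for `bytes_written` bytes to
/// stay under `io_rate_mb_per_sec` (assumed MiB/s). A rate of 0 disables
/// the throttle, so there is no floor.
pub fn throttle_floor(bytes_written: u64, io_rate_mb_per_sec: u64) -> Option<Duration> {
    if io_rate_mb_per_sec == 0 {
        return None;
    }
    let bytes_per_sec = io_rate_mb_per_sec * 1024 * 1024;
    Some(Duration::from_secs_f64(bytes_written as f64 / bytes_per_sec as f64))
}
```

A worker that finishes a chunk faster than throttle_floor allows would sleep for the difference. At the default 100 MiB/s, rewriting 50 MiB should take at least half a second.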
Cluster metadata
In single-node mode there is no cluster/ directory — everything the engine needs is right next to the index data. When the server is started with a cluster config, a sibling cluster/ directory holds the embedded Raft log and snapshots. Index data is never in Raft — only the metadata (index schemas, shard assignments, node roster). See Clustering.
Source · engine/crates/storage/src/segment.rs · engine/crates/storage/src/lib.rs