04 · DATA MODEL

Vectors & HNSW

XERJ ships HNSW as the only vector index — it's the one that works at the scales we care about. Distance metrics, graph parameters, and quantization are all tunable from config or per-query.

HNSW graph parameters

| KEY | TYPE | DEFAULT | DESCRIPTION |
|---|---|---|---|
| hnsw_m | u32 | 16 | Bidirectional edges per node, per layer. Higher = better recall, more RAM. |
| hnsw_ef_construction | u32 | 200 | Beam width at index-build time. Higher = better graph, slower writes. |
| hnsw_ef_search | u32 | 100 | Default beam width at query time. Override per-query via the KNN request. |
| default_metric | enum | "cosine" | One of "cosine", "dot_product", "euclidean". |

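For reference, the three values accepted by default_metric correspond to the standard definitions sketched below. This is a plain-Python illustration of the math only; the function names are hypothetical, and whether XERJ reports cosine as a similarity or as a distance (1 - similarity) is an assumption here, not something this page states.

```python
import math


def dot_product(a, b):
    """Inner product of two equal-length vectors; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way.

    Undefined for zero-length vectors (division by zero).
    """
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [0.0, 1.0]
print(dot_product(a, b), cosine_distance(a, b), euclidean_distance(a, b))
```

Cosine ignores vector length, while dot_product and euclidean do not, so the choice matters mainly when embeddings are not normalized.
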
Quantization

XERJ supports four quantization modes. The default, scalar8 (SQ8), stores each dimension in 8 bits instead of a 32-bit float, which gives ~4× memory reduction with almost no recall loss on typical embedding spaces.
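
To make the ~4× figure concrete, here is a minimal sketch of 8-bit scalar quantization. It assumes a simple per-vector min/max scheme and 32-bit float source vectors; the helper names sq8_encode and sq8_decode are hypothetical, and XERJ's actual encoding (for example per-dimension ranges learned at build time) may differ.

```python
from array import array


def sq8_encode(vector):
    """Map each float to one byte using the vector's own min/max range."""
    lo, hi = min(vector), max(vector)
    # Fall back to a scale of 1.0 for constant vectors to avoid dividing by zero.
    scale = (hi - lo) / 255.0 or 1.0
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)
    return codes, lo, scale


def sq8_decode(codes, lo, scale):
    """Approximate reconstruction; error per dimension is at most scale / 2."""
    return [lo + c * scale for c in codes]


vector = [0.12, 0.08, -0.31, 0.5]
codes, lo, scale = sq8_encode(vector)

float32_bytes = len(array("f", vector).tobytes())  # 4 bytes per dimension
sq8_bytes = len(codes)                             # 1 byte per dimension
print(float32_bytes, sq8_bytes)
print(sq8_decode(codes, lo, scale))
```

The two stored floats (lo and scale) are a fixed per-vector overhead, so the ratio approaches 4× as the dimension count grows.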

KNN query

{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, 0.08, -0.31, ...],
    "k": 20,
    "num_candidates": 200,
    "ef_search": 180
  }
}
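
A request body like the one above can be assembled programmatically. The sketch below uses a hypothetical helper, build_knn_query, which is not part of any XERJ client. The check that num_candidates is at least k is an assumption borrowed from how similar engines behave, not a documented XERJ rule. Leaving out ef_search lets the configured hnsw_ef_search default apply.

```python
import json


def build_knn_query(field, query_vector, k, num_candidates=None, ef_search=None):
    """Build the JSON body for a KNN request as a plain dict."""
    if k <= 0:
        raise ValueError("k must be positive")
    knn = {"field": field, "query_vector": list(query_vector), "k": k}
    if num_candidates is not None:
        # Assumption: the candidate pool must be able to hold k results.
        if num_candidates < k:
            raise ValueError("num_candidates must be >= k")
        knn["num_candidates"] = num_candidates
    if ef_search is not None:
        # Per-query override of the hnsw_ef_search config default.
        knn["ef_search"] = ef_search
    return {"knn": knn}


body = build_knn_query(
    "embedding", [0.12, 0.08, -0.31], k=20, num_candidates=200, ef_search=180
)
print(json.dumps(body, indent=2))
```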

Hybrid — BM25 + KNN in one planner pass

{
  "hybrid": {
    "fusion": "rrf",
    "queries": [
      { "match": { "message": "kernel panic on reboot" } },
      { "knn":   { "field": "embedding", "query_vector": [...], "k": 50 } }
    ]
  }
}
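
The "rrf" fusion value refers to reciprocal rank fusion, which merges ranked lists using only each document's rank position, so BM25 scores and vector distances never need to share a scale. The sketch below shows the standard formula. The function name rrf_fuse is hypothetical, and the rank constant of 60 is the conventional value from the RRF literature; XERJ's actual constant and tie-breaking are not stated on this page.

```python
def rrf_fuse(rankings, rank_constant=60):
    """Fuse ranked lists of doc ids: score(d) = sum of 1 / (rank_constant + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc3", "doc1", "doc7"]  # ranked by the match query
knn_hits = ["doc1", "doc9", "doc3"]   # ranked by the knn query
fused = rrf_fuse([bm25_hits, knn_hits])
print(fused)  # ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that appear in both lists (doc1, doc3) outrank those found by only one query, which is the behavior that makes rank-based fusion useful for hybrid search.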

Source · engine/crates/vector/src/hnsw.rs