01 · ENGINE

Ingest pipelines

XERJ runs its parsers in-process with the writer. One fsync, one network hop, no Logstash tier. Four ingest paths cover most real workloads.

turbo-ingest · NDJSON bulk

The fast path. Docs arrive as NDJSON, get tokenized in parallel across cores, and hit the memtable in batches. Three settings under [indexing] tune it: turbo_batch_size sets the batch size, turbo_parallel enables concurrent tokenization, and turbo_fast_analyzer skips stemming for log-shaped documents.

$ curl -sX POST http://localhost:8080/v1/indices/logs/turbo-ingest \
    -H 'Content-Type: application/x-ndjson' \
    --data-binary @nginx.jsonl
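
For producers that build documents in code rather than from a file, the same request can be sketched in Python. This is a minimal sketch using only the standard library; the helper names to_ndjson and turbo_ingest are hypothetical, and the URL layout is copied from the curl example above.

```python
import json
import urllib.request


def to_ndjson(docs):
    """Serialize an iterable of dicts as NDJSON: one compact JSON object per line."""
    # json.dumps escapes embedded newlines, so each document stays on one line.
    return "".join(
        json.dumps(doc, separators=(",", ":"), ensure_ascii=False) + "\n"
        for doc in docs
    ).encode("utf-8")


def turbo_ingest(base_url, index, docs):
    """POST docs to the turbo-ingest path and return the HTTP status code.

    The path mirrors the curl example; nothing else about the API is assumed.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/indices/{index}/turbo-ingest",
        data=to_ndjson(docs),
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call would look like turbo_ingest("http://localhost:8080", "logs", [{"status": 200, "path": "/"}]). The trailing newline after the last document is deliberate, since NDJSON consumers commonly expect every record to be newline-terminated.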

logs · auto-detected timestamps

Accepts raw log lines: Apache combined, Nginx access, ISO-8601 structured. The ingest path detects the format, normalizes the timestamp to microseconds (µs), and stores the structured fields. No regex config.

$ curl -sX POST http://localhost:8080/v1/indices/logs/logs \
    -H 'Content-Type: text/plain' \
    --data-binary @access.log
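
To make the timestamp step concrete, here is a small Python sketch of format detection followed by normalization to integer microseconds since the Unix epoch. It is an illustration only, not XERJ's parser: the function name timestamp_micros is hypothetical, only two formats are handled, and treating a timestamp without an offset as UTC is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Bracketed timestamp used by Apache combined and Nginx access logs,
# e.g. [10/Oct/2000:13:55:36 -0700]
CLF_TIME = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def timestamp_micros(line):
    """Return the line's timestamp as integer microseconds since the Unix epoch."""
    clf = CLF_TIME.search(line)
    if clf:
        dt = datetime.strptime(clf.group(1), "%d/%b/%Y:%H:%M:%S %z")
    else:
        # Otherwise assume the line starts with an ISO-8601 timestamp token.
        token = line.split(maxsplit=1)[0]
        dt = datetime.fromisoformat(token.replace("Z", "+00:00"))
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    # Integer division of timedeltas avoids float rounding at µs resolution.
    return (dt - EPOCH) // timedelta(microseconds=1)
```

Both an access-log line stamped 10/Oct/2000:13:55:36 -0700 and a structured line stamped 2000-10-10T20:55:36Z map to the same instant, which is the point of normalizing before indexing.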

syslog · RFC 5424

Accepts raw syslog frames. Parses the envelope, extracts structured fields, indexes the message.

$ curl -sX POST http://localhost:8080/v1/indices/syslog/syslog \
    -H 'Content-Type: text/plain' \
    --data-binary @messages
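
For readers unfamiliar with the RFC 5424 envelope, the following Python sketch splits a frame into its header fields, structured data, and message. It is a simplified illustration, not the engine's parser: parse_syslog is a hypothetical name, and the regular expression does not validate field lengths or parse individual structured-data parameters.

```python
import re

# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
SYSLOG_5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[.*?\])+)"
    r"(?: (?P<msg>.*))?$",
    re.DOTALL,
)


def parse_syslog(frame):
    """Split an RFC 5424 frame into a dict of envelope fields plus the message."""
    match = SYSLOG_5424.match(frame)
    if match is None:
        raise ValueError("not an RFC 5424 frame")
    fields = match.groupdict()
    # PRI encodes facility * 8 + severity.
    pri = int(fields.pop("pri"))
    fields["facility"], fields["severity"] = divmod(pri, 8)
    # "-" is the RFC 5424 NILVALUE; report it as None.
    return {key: (None if value == "-" else value) for key, value in fields.items()}
```

Applied to the example frame from the RFC, `<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed for lonvick on /dev/pts/8`, the priority 34 decodes to facility 4 and severity 2.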

otlp · OpenTelemetry

OTLP HTTP protobuf, no collector in front. Accepts logs, metrics, and traces. Traces land as a connected graph rather than flat docs.

$ curl -sX POST http://localhost:8080/v1/indices/traces/otlp \
    -H 'Content-Type: application/x-protobuf' \
    --data-binary @spans.pb
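
The difference between a connected graph and flat docs can be sketched in Python. This only illustrates the idea of linking spans through their parent references; it says nothing about how XERJ stores traces. The dict keys follow the OTLP span fields trace_id, span_id, and parent_span_id, while build_trace_graph and descendants are hypothetical helper names.

```python
from collections import defaultdict


def build_trace_graph(spans):
    """Link spans into per-trace parent -> children edges.

    Returns (roots, children): roots maps trace_id -> [root span_id],
    children maps (trace_id, parent span_id) -> [child span_id].
    """
    roots = defaultdict(list)
    children = defaultdict(list)
    for span in spans:
        parent = span.get("parent_span_id")
        if parent:
            children[(span["trace_id"], parent)].append(span["span_id"])
        else:
            # A span without a parent is the root of its trace.
            roots[span["trace_id"]].append(span["span_id"])
    return roots, children


def descendants(trace_id, span_id, children):
    """Collect every span reachable below span_id by walking the edges."""
    found = []
    stack = [span_id]
    while stack:
        current = stack.pop()
        for child in children.get((trace_id, current), []):
            found.append(child)
            stack.append(child)
    return found
```

With flat docs, answering "which spans ran under this request" takes one lookup per level of nesting; with the edges precomputed it becomes a single walk from the root span.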

Back-pressure

Back-pressure is end-to-end. When the memtable approaches flush_size_mb, writers block instead of running out of memory. When the WAL rolls faster than segments can be flushed, turbo-ingest returns 429 with a Retry-After header.
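
On the client side, a 429 is best handled by waiting for the advertised interval and resending the same batch. The Python sketch below shows one way to do that with the standard library. The names http_post, retry_delay, and post_with_backoff are hypothetical, the one-second fallback is an arbitrary choice, and only the delay-in-seconds form of Retry-After is interpreted.

```python
import time
import urllib.error
import urllib.request


def http_post(url, body, content_type="application/x-ndjson"):
    """POST body and return (status, headers); HTTP errors are returned, not raised."""
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.headers
    except urllib.error.HTTPError as err:
        return err.code, err.headers


def retry_delay(headers, fallback=1.0):
    """Read Retry-After as seconds; use fallback if it is missing or an HTTP-date."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return fallback


def post_with_backoff(url, body, max_attempts=5, send=http_post, sleep=time.sleep):
    """Resend the same batch while the server answers 429, up to max_attempts."""
    for attempt in range(max_attempts):
        status, headers = send(url, body)
        if status != 429:
            return status
        if attempt + 1 < max_attempts:
            sleep(retry_delay(headers))
    return 429  # still throttled after the last attempt; the caller decides what next
```

The send and sleep parameters exist so the retry loop can be exercised without a server. In production use the defaults apply and the call reduces to post_with_backoff(url, payload).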

Source · engine/crates/logs/src/parse.rs · otlp/src/lib.rs