Benchmark methodology

Every number on the front page comes from a test anyone with the source tree can run. This page documents exactly what that test does, what it measures, what it deliberately does not measure, and how to reproduce it. The test ships with greatmemory itself: crates/gm-cli/tests/ram_regression.rs.

What is measured

Peak RSS (resident set size: the physical RAM a process actually occupies) of a single process running greatmemory's full ingestion path while it ingests 51.2 MB of text. RSS is read with the memory-stats crate, which uses task_info on macOS and /proc on Linux. This is the resident-size figure that ps reports as RSS and that Activity Monitor shows in its Real Mem column.
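To make "reads RSS from /proc" concrete, here is a minimal standard-library sketch of the Linux side. It is not the memory-stats crate's code. It parses /proc/self/statm, whose second field is the resident page count. The function names and the 4096-byte page size are assumptions for illustration; the real page size should be queried from the OS.

```rust
use std::fs;

/// Parse the resident-page count from the contents of /proc/<pid>/statm.
/// The file holds space-separated counts in pages; the second field is the
/// resident set.
fn resident_pages(statm: &str) -> Option<u64> {
    statm.split_whitespace().nth(1)?.parse().ok()
}

/// Convert the resident-page count to bytes.
/// `page_size` is a parameter because the standard library does not expose it;
/// 4096 is common on x86-64 Linux but is an assumption in this sketch.
fn rss_bytes(statm: &str, page_size: u64) -> Option<u64> {
    resident_pages(statm).map(|pages| pages * page_size)
}

fn main() {
    // On Linux this reads the live value. Elsewhere the file does not exist,
    // so the sketch falls back to a canned sample line.
    let statm = fs::read_to_string("/proc/self/statm")
        .unwrap_or_else(|_| "36000 1200 800 50 0 9000 0".to_string());

    match rss_bytes(&statm, 4096) {
        Some(bytes) => println!("rss = {:.1} MB", bytes as f64 / 1e6),
        None => println!("could not parse statm"),
    }
}
```

The benchmark itself relies on the crate rather than hand-rolled parsing, which is what lets the same test run unchanged on macOS and Linux.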

The corpus: deterministic by construction

The test generates 1,000 documents of ~51 KB each (51.2 MB total) of English-like sentences (6–16 words) drawn from a fixed 115-word vocabulary. Randomness comes from a seeded xorshift64* generator with the constant seed 0x6772_6561_746D_656D ("greatmem" in ASCII), so every run on every machine ingests the byte-identical corpus. No download, no dataset versioning problem, no cherry-picking: the corpus is constructed inside the test by 30 lines of code you can read.
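A seeded generator of this kind can be sketched as follows. This is an illustration, not the code in ram_regression.rs. The xorshift64* step and the seed are as described above. The eight-word VOCAB is a hypothetical stand-in for the real 115-word list, and make_document and the 51,200-byte target are assumptions.

```rust
/// xorshift64*: three xorshifts on the state, then a multiply on the output.
struct XorShift64Star(u64);

impl XorShift64Star {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Value in [lo, hi] via modulo reduction (slightly biased, fine for test data).
    fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

/// The seed from the text: the bytes of "greatmem" read as a big-endian u64.
const SEED: u64 = 0x6772_6561_746D_656D;

// Hypothetical stand-in vocabulary; the real test uses a fixed 115-word list.
const VOCAB: [&str; 8] = [
    "memory", "stays", "flat", "while", "data", "grows", "every", "run",
];

/// Build one document of at least `min_bytes` bytes from 6-16 word sentences.
fn make_document(rng: &mut XorShift64Star, min_bytes: usize) -> String {
    let mut doc = String::with_capacity(min_bytes + 128);
    while doc.len() < min_bytes {
        let words = rng.range(6, 16);
        for i in 0..words {
            if i > 0 {
                doc.push(' ');
            }
            let idx = (rng.next_u64() % VOCAB.len() as u64) as usize;
            doc.push_str(VOCAB[idx]);
        }
        doc.push_str(". ");
    }
    doc
}

fn main() {
    // Two generators with the same seed must produce the same bytes.
    let mut a = XorShift64Star(SEED);
    let mut b = XorShift64Star(SEED);
    let doc_a = make_document(&mut a, 51_200);
    let doc_b = make_document(&mut b, 51_200);
    println!("identical: {}, bytes: {}", doc_a == doc_b, doc_a.len());
}
```

Because the generator's state is a single u64 and every operation is integer arithmetic, the output does not depend on platform, locale, or floating-point behavior. That is what makes the corpus byte-identical across machines.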

The system under test: the real pipeline

Nothing is mocked. The test wires up the same components the server uses, with the same settings:

Storage: the real SQLite store (sqlite-vec + FTS5), writing to a file in a temporary directory.
Chunking: the production chunker — ~400-token chunks with 15% overlap. The 1,000 documents become 31,000 chunks.
Embeddings: real BGE-small-en-v1.5 (384 dimensions) running natively via fastembed / ONNX Runtime — one model instance behind greatmemory's bounded batch queue (queue capacity 16, batch size 32), exactly as the server constructs it. This is the component where typical WASM-worker architectures accumulate multi-gigabyte heaps, which is why the test embeds for real instead of faking it.
Fact extraction: off. No LLM is configured, so the test isolates the embed-and-store write path. (Fact extraction is an optional feature that calls out to an external LLM process; its memory lives in that process, not this one.)
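The bounded batch queue is what keeps embedding memory independent of corpus size, so a small sketch of the idea may help. It uses only the standard library and is not greatmemory's implementation. It assumes the queue holds individual chunks (the text does not say whether the capacity of 16 counts chunks or batches), and run_batcher, ingest, and the closure standing in for the embedding model are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

const QUEUE_CAPACITY: usize = 16; // from the text
const BATCH_SIZE: usize = 32; // from the text

/// Drain the receiver into batches of at most BATCH_SIZE and hand each to `embed`.
/// Blocks for the first item of a batch, then takes whatever is already queued.
fn run_batcher<F: FnMut(&[String])>(rx: Receiver<String>, mut embed: F) {
    while let Ok(first) = rx.recv() {
        let mut batch = Vec::with_capacity(BATCH_SIZE);
        batch.push(first);
        while batch.len() < BATCH_SIZE {
            match rx.try_recv() {
                Ok(item) => batch.push(item),
                Err(_) => break, // queue empty or sender gone: flush what we have
            }
        }
        embed(&batch);
    }
}

/// Push `n` placeholder chunks through the queue and return the batch sizes seen.
fn ingest(n: usize) -> Vec<usize> {
    let (tx, rx) = sync_channel::<String>(QUEUE_CAPACITY);

    let producer = thread::spawn(move || {
        for i in 0..n {
            // send() blocks while the queue is full: this is the backpressure
            // that stops pending work from piling up in memory.
            tx.send(format!("chunk-{i}")).unwrap();
        }
        // tx is dropped here, which ends the batcher loop once the queue drains.
    });

    let mut sizes = Vec::new();
    run_batcher(rx, |batch| sizes.push(batch.len()));
    producer.join().unwrap();
    sizes
}

fn main() {
    let sizes = ingest(1_000);
    let total: usize = sizes.iter().sum();
    println!("batches={} chunks={}", sizes.len(), total);
}
```

With this shape, the text held in memory at any moment is capped at roughly one queue's worth plus one batch, however many documents are waiting upstream.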

Sampling and assertions

RSS is sampled every 50 documents (20 samples across the run) and once more after the pipeline fully drains; the peak across all samples is kept. The test then asserts three things, and fails if any is false:

all 1,000 documents were stored;
chunking really happened (> 1,000 chunks in the store);
peak RSS < 1 GB (printed as a 1024 MB limit in the test output): the bounded-memory promise itself.
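The sampling loop and the three assertions can be sketched like this. It is a simplified model, not the test's source. PeakTracker, run_ingest, and check are hypothetical names, the ingest and RSS-reading steps are passed in as closures, and the byte value of the limit assumes "1024 MB" means 1024 * 1024 * 1024 bytes.

```rust
const SAMPLE_EVERY: usize = 50; // from the text
// Assumption: the "1024 MB" limit means 1024 * 1024 * 1024 bytes.
const LIMIT_BYTES: u64 = 1024 * 1024 * 1024;

/// Keeps the highest RSS value seen and counts how many samples were taken.
#[derive(Default)]
struct PeakTracker {
    peak: u64,
    samples: usize,
}

impl PeakTracker {
    fn sample(&mut self, rss: u64) {
        self.peak = self.peak.max(rss);
        self.samples += 1;
    }
}

/// Ingest `docs` documents, sampling RSS every SAMPLE_EVERY documents and once
/// more at the end. `ingest_doc` and `read_rss` are placeholders for the real
/// pipeline call and the real RSS read.
fn run_ingest<I, R>(docs: usize, mut ingest_doc: I, mut read_rss: R) -> PeakTracker
where
    I: FnMut(usize),
    R: FnMut() -> u64,
{
    let mut tracker = PeakTracker::default();
    for n in 1..=docs {
        ingest_doc(n);
        if n % SAMPLE_EVERY == 0 {
            tracker.sample(read_rss());
        }
    }
    tracker.sample(read_rss()); // final sample after the pipeline drains
    tracker
}

/// The three conditions from the list above, returned as a Result rather than
/// asserted so the sketch can be exercised without panicking.
fn check(stored_docs: usize, stored_chunks: usize, peak: u64) -> Result<(), String> {
    if stored_docs != 1_000 {
        return Err(format!("expected 1000 stored documents, got {stored_docs}"));
    }
    if stored_chunks <= 1_000 {
        return Err(format!("chunking did not happen: {stored_chunks} chunks"));
    }
    if peak >= LIMIT_BYTES {
        return Err(format!("peak RSS {peak} bytes is not under {LIMIT_BYTES}"));
    }
    Ok(())
}

fn main() {
    // Fake RSS source that grows by 1 MB per sample, for demonstration only.
    let mut fake_rss = 500_000_000u64;
    let tracker = run_ingest(1_000, |_| {}, || {
        fake_rss += 1_000_000;
        fake_rss
    });
    println!("samples={} peak={}", tracker.samples, tracker.peak);
    println!("{:?}", check(1_000, 31_000, tracker.peak));
}
```

Sampling is periodic, so a spike that rises and falls between two samples would go unrecorded. The final post-drain sample only covers the state after the queue has emptied.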

The same test runs in CI on every push to main (Ubuntu, release build). If a future change makes memory grow with data, the build breaks. The bound is enforced, not promised.

The published run

The numbers on the front page come from a release-mode run on 2026-06-12: Apple M1 Pro, 16 GB RAM, macOS, total duration 2,620 seconds (~44 minutes; CPU embedding of 51.2 MB is slow, and that is the realistic cost, not an artifact). Peak RSS was 588.1 MB. RSS was already ~536 MB after the first 50 documents (2.6 MB ingested): that baseline is the ONNX model arena plus one batch of activations. From there to 51.2 MB ingested, samples ranged from 494.9 MB to 588.1 MB, within ±10% of the baseline, and never trended upward. That flatness, not the absolute number, is the design claim.
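The ±10% figure can be checked directly against the 20 per-sample RSS values in the raw output at the end of this page. The sketch below computes each sample's deviation from the first one. The function name deviation_range is an assumption, and the values are copied from the published log.

```rust
/// RSS in MB at docs = 50, 100, ..., 1000, copied from the published raw output.
const RSS_MB: [f64; 20] = [
    535.9, 530.5, 529.3, 535.8, 512.6, 578.1, 545.5, 547.7, 520.9, 545.8,
    550.6, 543.4, 556.5, 543.7, 530.4, 526.9, 494.9, 560.0, 588.1, 541.3,
];

/// Lowest and highest deviation from the first sample, in percent.
fn deviation_range(samples: &[f64]) -> (f64, f64) {
    let base = samples[0];
    let mut lo = 0.0_f64;
    let mut hi = 0.0_f64;
    for &s in samples {
        let pct = (s - base) / base * 100.0;
        lo = lo.min(pct);
        hi = hi.max(pct);
    }
    (lo, hi)
}

fn main() {
    let (lo, hi) = deviation_range(&RSS_MB);
    // Lowest sample is 494.9 MB, highest is 588.1 MB, baseline is 535.9 MB.
    println!("deviation from baseline: {lo:.2}% to {hi:.2}%");
}
```

For the published run this gives roughly -7.65% to +9.74% around the 535.9 MB baseline. The corpus grew twentyfold over the same samples, from 2.6 MB to 51.2 MB.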

The idle figure

The 19 MB "idle server" number is measured differently: it is rss_bytes as reported by the server's own GET /v1/stats endpoint, right after booting the published 158 MB Docker image (before the embedding model is loaded — the model loads lazily on first ingest). It demonstrates the floor, not the working set.

Honest limitations

This is a memory benchmark, not a throughput benchmark. Embedding speed depends on your CPU; the memory bound does not.
RSS includes allocator slack and varies a few percent between platforms and runs; the CI bound (1 GB) is deliberately generous so the test is stable across runners while still catching any grows-with-data regression by a wide margin.
Absolute numbers differ with the embedding model you configure (larger model, larger arena). The published run uses the default model greatmemory ships with.
We publish no measurements of other tools. The comparison we make is architectural — native ONNX with a bounded queue versus pooled WASM heaps — and you are encouraged to measure anything you like yourself; the corpus generator makes apples-to-apples easy.

Reproduce it

# from the greatmemory source tree
cargo test -p gm-cli --release -- --ignored ram_regression --nocapture

Requirements: Rust (stable), ~150 MB of disk for the model download on first run, and patience — expect tens of minutes on CPU. The test prints a sample line every 50 documents and a summary at the end. Send us your summary line and machine specs if you'd like your run added to the results table.

Raw output of the published run

[ram_regression] docs=  50  ingested=  2.6 MB  rss= 535.9 MB  peak= 535.9 MB
[ram_regression] docs= 100  ingested=  5.1 MB  rss= 530.5 MB  peak= 535.9 MB
[ram_regression] docs= 150  ingested=  7.7 MB  rss= 529.3 MB  peak= 535.9 MB
[ram_regression] docs= 200  ingested= 10.2 MB  rss= 535.8 MB  peak= 535.9 MB
[ram_regression] docs= 250  ingested= 12.8 MB  rss= 512.6 MB  peak= 535.9 MB
[ram_regression] docs= 300  ingested= 15.4 MB  rss= 578.1 MB  peak= 578.1 MB
[ram_regression] docs= 350  ingested= 17.9 MB  rss= 545.5 MB  peak= 578.1 MB
[ram_regression] docs= 400  ingested= 20.5 MB  rss= 547.7 MB  peak= 578.1 MB
[ram_regression] docs= 450  ingested= 23.1 MB  rss= 520.9 MB  peak= 578.1 MB
[ram_regression] docs= 500  ingested= 25.6 MB  rss= 545.8 MB  peak= 578.1 MB
[ram_regression] docs= 550  ingested= 28.2 MB  rss= 550.6 MB  peak= 578.1 MB
[ram_regression] docs= 600  ingested= 30.7 MB  rss= 543.4 MB  peak= 578.1 MB
[ram_regression] docs= 650  ingested= 33.3 MB  rss= 556.5 MB  peak= 578.1 MB
[ram_regression] docs= 700  ingested= 35.9 MB  rss= 543.7 MB  peak= 578.1 MB
[ram_regression] docs= 750  ingested= 38.4 MB  rss= 530.4 MB  peak= 578.1 MB
[ram_regression] docs= 800  ingested= 41.0 MB  rss= 526.9 MB  peak= 578.1 MB
[ram_regression] docs= 850  ingested= 43.6 MB  rss= 494.9 MB  peak= 578.1 MB
[ram_regression] docs= 900  ingested= 46.1 MB  rss= 560.0 MB  peak= 578.1 MB
[ram_regression] docs= 950  ingested= 48.7 MB  rss= 588.1 MB  peak= 588.1 MB
[ram_regression] docs=1000  ingested= 51.2 MB  rss= 541.3 MB  peak= 588.1 MB
[ram_regression] SUMMARY: ingested 51.2 MB across 1000 docs -> 31000 chunks; peak RSS 588.1 MB (limit 1024 MB)
test ram_regression_50mb_ingest_stays_under_1gb ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2619.96s