Benchmark methodology

Every number on the front page comes from a test anyone with the source tree can run. This page documents exactly what that test does, what it measures, what it deliberately does not measure, and how to reproduce it. The test ships with greatmemory itself: crates/gm-cli/tests/ram_regression.rs.

What is measured

The test measures peak RSS (resident set size: the physical RAM a process actually occupies) of a single process running greatmemory's full ingestion path while it ingests 51.2 MB of text. RSS is read with the memory-stats crate, which uses task_info on macOS and /proc on Linux, the same sources behind the numbers Activity Monitor and ps report.
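For readers who want to see where a resident-size figure comes from on Linux, here is a minimal sketch that derives RSS from the text of /proc/self/statm, one of the /proc files that exposes it. It is an illustration only, not the memory-stats crate's code. The function names and the page_size parameter are assumptions made for this example; 4096 bytes is a common page size, but the real value should be queried from the OS.

```rust
/// Parse the contents of /proc/<pid>/statm and return the resident set size in bytes.
/// The file holds space-separated page counts; the second field is resident pages.
/// `page_size` is passed in by the caller (assumption: 4096 on many systems).
pub fn rss_bytes_from_statm(statm: &str, page_size: u64) -> Option<u64> {
    let resident_pages: u64 = statm.split_whitespace().nth(1)?.parse().ok()?;
    resident_pages.checked_mul(page_size)
}

/// Read the current process's RSS on Linux.
/// Returns None on other platforms or if the file cannot be read or parsed.
pub fn current_rss_bytes(page_size: u64) -> Option<u64> {
    let statm = std::fs::read_to_string("/proc/self/statm").ok()?;
    rss_bytes_from_statm(&statm, page_size)
}
```

Keeping the parsing separate from the file read lets the arithmetic be checked against a fixed string without depending on the machine running it.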

The corpus: deterministic by construction

The test generates 1,000 documents of ~50 KB each (51.2 MB total), built from English-like sentences of 6–16 words drawn from a fixed 115-word vocabulary. Randomness comes from a seeded xorshift64* generator with the constant seed 0x6772_6561_746D_656D ("greatmem" in ASCII), so every run on every machine ingests a byte-identical corpus. No download, no dataset versioning problem, no cherry-picking: the corpus is constructed inside the test by 30 lines of code you can read.
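The idea of a seeded, reproducible corpus can be sketched as follows. This is not the code from ram_regression.rs. It uses the standard xorshift64* recurrence and the seed quoted above, but the eight-word VOCAB, the make_document helper, the sentence punctuation, and the modulo-based range function are all stand-ins invented for this example.

```rust
/// xorshift64* (Vigna's variant): three xor-shifts followed by a multiply.
pub struct XorShift64Star {
    state: u64,
}

impl XorShift64Star {
    pub fn new(seed: u64) -> Self {
        // A zero state would stay zero forever, so reject it up front.
        assert!(seed != 0, "xorshift seed must be non-zero");
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Value in `lo..=hi`; the slight modulo bias is acceptable for a test corpus.
    pub fn range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

/// The constant seed from the text: the bytes of "greatmem" in ASCII.
pub const SEED: u64 = 0x6772_6561_746D_656D;

// Hypothetical stand-in vocabulary; the real test uses a fixed 115-word list.
pub const VOCAB: [&str; 8] = [
    "memory", "stays", "flat", "while", "data", "grows", "every", "run",
];

/// Build one document of at least `target_bytes` bytes from 6-16 word sentences.
pub fn make_document(rng: &mut XorShift64Star, target_bytes: usize) -> String {
    let mut doc = String::with_capacity(target_bytes + 128);
    while doc.len() < target_bytes {
        let words = rng.range(6, 16);
        for i in 0..words {
            if i > 0 {
                doc.push(' ');
            }
            let idx = rng.range(0, VOCAB.len() as u64 - 1) as usize;
            doc.push_str(VOCAB[idx]);
        }
        doc.push_str(". ");
    }
    doc
}
```

Two generators built with XorShift64Star::new(SEED) produce identical output from make_document, which is the property the test relies on: no machine-dependent input enters the corpus.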

The system under test: the real pipeline

Nothing is mocked. The test wires up the same components the server uses, with the same settings:

Sampling and assertions

RSS is sampled every 50 documents (20 samples across the run) and once more after the pipeline fully drains; the peak across all samples is kept. The test then asserts three things, and fails if any is false:

The same test runs in CI on every push to main (Ubuntu, release build). If a future change makes memory grow with data, the build breaks. The bound is enforced, not promised.
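The sampling cadence described in this section (one sample every 50 documents, one more after the drain, peak kept across all of them) can be sketched like this. The ingest, drain, and read_rss closures are hypothetical hooks standing in for the real pipeline and RSS reader; PeakTracker and run_with_sampling are names invented for this example.

```rust
/// Tracks the highest RSS value seen across samples.
#[derive(Default)]
pub struct PeakTracker {
    pub peak_bytes: u64,
    pub samples: usize,
}

impl PeakTracker {
    pub fn record(&mut self, rss_bytes: u64) {
        self.peak_bytes = self.peak_bytes.max(rss_bytes);
        self.samples += 1;
    }
}

/// Documents between samples, per the text.
pub const SAMPLE_EVERY: usize = 50;

/// Ingest documents one by one, sampling RSS every SAMPLE_EVERY documents and
/// once more after the pipeline has drained. All three closures are
/// hypothetical stand-ins for the real components.
pub fn run_with_sampling<D>(
    docs: impl IntoIterator<Item = D>,
    mut ingest: impl FnMut(D),
    mut drain: impl FnMut(),
    mut read_rss: impl FnMut() -> u64,
) -> PeakTracker {
    let mut tracker = PeakTracker::default();
    for (i, doc) in docs.into_iter().enumerate() {
        ingest(doc);
        if (i + 1) % SAMPLE_EVERY == 0 {
            tracker.record(read_rss());
        }
    }
    drain();
    tracker.record(read_rss()); // final sample after the drain
    tracker
}
```

With 1,000 documents this records 21 samples: 20 during the run and one after the drain. A bound is then a plain comparison of peak_bytes against a limit; the summary line of the published run reports a limit of 1024 MB.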

The published run

The numbers on the front page come from a release-mode run on 2026-06-12: Apple M1 Pro, 16 GB RAM, macOS, total duration 2,620 seconds (~44 minutes; CPU embedding of 51.2 MB is slow, and that is the realistic cost, not an artifact). Peak RSS was 588.1 MB. RSS was already ~536 MB after the first 50 documents (2.6 MB ingested): that baseline is the ONNX model arena plus one batch of activations. From there to 51.2 MB ingested, samples stayed between 494.9 MB and 588.1 MB, within ±10% of that baseline, and never trended upward. That flatness, not the absolute number, is the design claim.
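The ±10% figure can be checked by hand from the published samples. The sketch below hard-codes the 20 rss values from the raw output of the published run and computes the peak and the largest relative deviation from the first sample. The function names are invented for this example and are not part of the test.

```rust
/// RSS samples (MB) copied from the published run's raw output, in order.
pub const RSS_MB: [f64; 20] = [
    535.9, 530.5, 529.3, 535.8, 512.6, 578.1, 545.5, 547.7, 520.9, 545.8,
    550.6, 543.4, 556.5, 543.7, 530.4, 526.9, 494.9, 560.0, 588.1, 541.3,
];

/// Highest sample in the series.
pub fn peak(samples: &[f64]) -> f64 {
    samples.iter().copied().fold(f64::MIN, f64::max)
}

/// Largest relative deviation (as a fraction) of any sample from the first one.
pub fn max_deviation_from_baseline(samples: &[f64]) -> f64 {
    let baseline = samples[0];
    samples
        .iter()
        .map(|s| ((s - baseline) / baseline).abs())
        .fold(0.0, f64::max)
}
```

On this data the largest deviation is the 588.1 MB sample, about 9.7% above the 535.9 MB baseline; the 494.9 MB low is about 7.7% below it.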

The idle figure

The 19 MB "idle server" number is measured differently: it is rss_bytes as reported by the server's own GET /v1/stats endpoint, right after booting the published 158 MB Docker image (before the embedding model is loaded — the model loads lazily on first ingest). It demonstrates the floor, not the working set.

Honest limitations

Reproduce it

# from the greatmemory source tree
cargo test -p gm-cli --release -- --ignored ram_regression --nocapture

Requirements: Rust (stable), ~150 MB of disk for the model download on first run, and patience — expect tens of minutes on CPU. The test prints a sample line every 50 documents and a summary at the end. Send us your summary line and machine specs if you'd like your run added to the results table.

Raw output of the published run

[ram_regression] docs=  50  ingested=  2.6 MB  rss= 535.9 MB  peak= 535.9 MB
[ram_regression] docs= 100  ingested=  5.1 MB  rss= 530.5 MB  peak= 535.9 MB
[ram_regression] docs= 150  ingested=  7.7 MB  rss= 529.3 MB  peak= 535.9 MB
[ram_regression] docs= 200  ingested= 10.2 MB  rss= 535.8 MB  peak= 535.9 MB
[ram_regression] docs= 250  ingested= 12.8 MB  rss= 512.6 MB  peak= 535.9 MB
[ram_regression] docs= 300  ingested= 15.4 MB  rss= 578.1 MB  peak= 578.1 MB
[ram_regression] docs= 350  ingested= 17.9 MB  rss= 545.5 MB  peak= 578.1 MB
[ram_regression] docs= 400  ingested= 20.5 MB  rss= 547.7 MB  peak= 578.1 MB
[ram_regression] docs= 450  ingested= 23.1 MB  rss= 520.9 MB  peak= 578.1 MB
[ram_regression] docs= 500  ingested= 25.6 MB  rss= 545.8 MB  peak= 578.1 MB
[ram_regression] docs= 550  ingested= 28.2 MB  rss= 550.6 MB  peak= 578.1 MB
[ram_regression] docs= 600  ingested= 30.7 MB  rss= 543.4 MB  peak= 578.1 MB
[ram_regression] docs= 650  ingested= 33.3 MB  rss= 556.5 MB  peak= 578.1 MB
[ram_regression] docs= 700  ingested= 35.9 MB  rss= 543.7 MB  peak= 578.1 MB
[ram_regression] docs= 750  ingested= 38.4 MB  rss= 530.4 MB  peak= 578.1 MB
[ram_regression] docs= 800  ingested= 41.0 MB  rss= 526.9 MB  peak= 578.1 MB
[ram_regression] docs= 850  ingested= 43.6 MB  rss= 494.9 MB  peak= 578.1 MB
[ram_regression] docs= 900  ingested= 46.1 MB  rss= 560.0 MB  peak= 578.1 MB
[ram_regression] docs= 950  ingested= 48.7 MB  rss= 588.1 MB  peak= 588.1 MB
[ram_regression] docs=1000  ingested= 51.2 MB  rss= 541.3 MB  peak= 588.1 MB
[ram_regression] SUMMARY: ingested 51.2 MB across 1000 docs -> 31000 chunks; peak RSS 588.1 MB (limit 1024 MB)
test ram_regression_50mb_ingest_stays_under_1gb ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2619.96s
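
If you want to compare your own run against the one above, the per-sample lines have a regular shape that is easy to parse. The following is a rough sketch of one way to do it; Sample, value_after, and parse_sample are hypothetical helpers written for this page, not something the test or the CLI provides.

```rust
/// One per-50-documents sample line from the test output.
#[derive(Debug, PartialEq)]
pub struct Sample {
    pub docs: u32,
    pub ingested_mb: f64,
    pub rss_mb: f64,
    pub peak_mb: f64,
}

/// Return the token that follows `key`, if both exist.
fn value_after<'a>(tokens: &[&'a str], key: &str) -> Option<&'a str> {
    let i = tokens.iter().position(|t| *t == key)?;
    tokens.get(i + 1).copied()
}

/// Parse a line such as
/// `[ram_regression] docs=  50  ingested=  2.6 MB  rss= 535.9 MB  peak= 535.9 MB`.
/// Returns None for any other line (the summary line, cargo's own output).
pub fn parse_sample(line: &str) -> Option<Sample> {
    let rest = line.trim().strip_prefix("[ram_regression]")?;
    // Turn every '=' into whitespace and drop the "MB" unit tokens, leaving
    // alternating key/value tokens: docs 50 ingested 2.6 rss 535.9 peak 535.9
    let normalized = rest.replace('=', " ");
    let tokens: Vec<&str> = normalized
        .split_whitespace()
        .filter(|t| *t != "MB")
        .collect();
    Some(Sample {
        docs: value_after(&tokens, "docs")?.parse().ok()?,
        ingested_mb: value_after(&tokens, "ingested")?.parse().ok()?,
        rss_mb: value_after(&tokens, "rss")?.parse().ok()?,
        peak_mb: value_after(&tokens, "peak")?.parse().ok()?,
    })
}
```

Feeding each line of your output through parse_sample and keeping the Some results yields the same series of rss values shown above, ready for a side-by-side comparison.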