API keys & air-gapped deployment
How greatmemory authenticates clients, how to create and rotate keys, and how to run everything with no internet access at all.
How verification works
Verification is entirely local — there is no license server, no
phone-home, no token issuer. A key is a shared secret: the server loads
GM_API_KEYS at startup and compares each request's
Authorization: Bearer <key> header against that list using a
constant-time comparison, so timing attacks cannot probe key bytes.
client ── Authorization: Bearer gm_a1b2… ──► gmem server
                                               │ constant-time compare vs
                                               │ GM_API_KEYS (in memory)
                                               ▼
                                  200 … or 401 {"error":"unauthorized"}
- If GM_API_KEYS is unset, auth is off: the local-first default (server binds to loopback, nothing else can reach it).
- If it is set, every route requires a valid key except GET /v1/healthz, which stays open so load balancers can probe.
- The MCP endpoint (/mcp) sits behind the same middleware.
Because verification is a local string comparison, an air-gapped client and server behave exactly like an internet-connected pair.
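One way to observe these rules from the client side is to compare HTTP status codes with and without a key. The sketch below is illustrative only: the function names are hypothetical, the `{"query":"ping"}` body is a placeholder, and sending a `Content-Type: application/json` header is an assumption. Only the routes, the Bearer header format, and the 200/401 outcomes come from the text above.

```shell
#!/bin/sh
# Hypothetical helpers; only the routes and the header format come from the docs.

# bearer_header KEY: print the header in the form "Authorization: Bearer <key>".
bearer_header() {
  printf 'Authorization: Bearer %s' "$1"
}

# healthz_status BASE_URL: status code of the open health route (no key sent).
healthz_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1/v1/healthz"
}

# search_status BASE_URL [KEY]: status code of a protected route.
# The Bearer header is sent only when a key is given.
# ASSUMPTION: the server accepts a JSON body with a Content-Type header.
search_status() {
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' -X POST "$1/v1/search" \
      -H "$(bearer_header "$2")" \
      -H 'Content-Type: application/json' \
      -d '{"query":"ping"}'
  else
    curl -s -o /dev/null -w '%{http_code}' -X POST "$1/v1/search" \
      -H 'Content-Type: application/json' \
      -d '{"query":"ping"}'
  fi
}
```

With GM_API_KEYS set on the server, calling `search_status` without a key should report 401 and calling it with a valid key should report 200, while `healthz_status` should succeed either way.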
Creating keys
Key generation is up to you — any high-entropy string works. A common convention:
echo "gm_$(openssl rand -hex 32)"
Configure one or more (comma-separated):
GM_API_KEYS=gm_abc...,gm_def... gmem serve
or in greatmemory.toml:
api_keys = ["gm_abc...", "gm_def..."]
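If several clients need keys at once, the convention above can be wrapped in a small helper. This is a minimal sketch, assuming `openssl` is on the PATH; `gen_key` and `gen_keys` are hypothetical names, not part of greatmemory.

```shell
#!/bin/sh
# Hypothetical helpers built on the "gm_" + 32 random bytes (hex) convention.

# gen_key: print one key of the form gm_<64 hex characters>.
gen_key() {
  printf 'gm_%s\n' "$(openssl rand -hex 32)"
}

# gen_keys N: print N fresh keys joined with commas,
# in the comma-separated form GM_API_KEYS expects.
gen_keys() {
  n=$1
  out=""
  while [ "$n" -gt 0 ]; do
    # ${out:+$out,} adds the separator only after the first key.
    out="${out:+$out,}$(gen_key)"
    n=$((n - 1))
  done
  printf '%s\n' "$out"
}
```

Print the result and store it in your env file or secret store before starting the server, since each client still needs its own key value.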
Clients authenticate with the GM_API_KEY environment variable (CLI) or by
sending the Bearer header directly (REST, SDKs, MCP over HTTP):
GM_API_KEY=gm_abc... gmem search "deploy checklist"
curl -H "Authorization: Bearer gm_abc..." \
     -H "Content-Type: application/json" \
     -X POST https://memory.example.internal/v1/search \
     -d '{"query":"deploy checklist"}'
Managing and rotating keys
Multiple keys make rotation a rolling operation with no downtime window for well-behaved clients:
- Add the new key to GM_API_KEYS (keep the old one) and restart.
- Move clients to the new key at their own pace.
- Remove the old key and restart. It is rejected from that moment.
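The first and last rotation steps amount to editing a comma-separated list. A minimal sketch of that edit, using hypothetical helper names (`add_key`, `remove_key`) and leaving the restart itself to whatever process manager you use:

```shell
#!/bin/sh
# Hypothetical helpers for editing a GM_API_KEYS-style comma-separated list.

# add_key LIST NEW: print LIST with NEW appended, unless it is already present.
add_key() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;            # already in the list
    *)
      if [ -n "$1" ]; then
        printf '%s,%s\n' "$1" "$2"
      else
        printf '%s\n' "$2"
      fi
      ;;
  esac
}

# remove_key LIST OLD: print LIST without any entry exactly equal to OLD.
remove_key() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -Fxv -- "$2" | paste -sd, -
}
```

A rotation would then run `add_key` and restart, wait for clients to switch, and finally run `remove_key` and restart again.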
Practical guidance:
- One key per client or service. Revoking one client then never affects the others, and access logs stay attributable.
- Keys are read at startup — changes require a restart (fast: the server starts in well under a second; the embedding model loads lazily).
- Protect wherever the keys live: chmod 600 on env files, or your platform's secret store (Railway variables, AWS SSM/Secrets Manager, Azure Key Vault references, GCP Secret Manager).
Fully air-gapped operation
greatmemory needs the network for exactly one thing, exactly once: the
embedding model (~100 MB) downloads on first ingest into
GM_DATA_DIR/models/. For hosts with no internet access:
- On any connected machine, run the server and ingest one document (or run the test suite) so the model lands in GM_DATA_DIR/models/.
- Copy that models/ directory to the restricted host's data directory.
- Start the server. Embedding, search, and storage now run fully offline.
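The copy step can be done with any file transfer you trust. The sketch below packs the models/ directory into a tarball for transport on removable media; `pack_models` and `unpack_models` are hypothetical names, and the archive format is a choice made here, not something greatmemory requires.

```shell
#!/bin/sh
# Hypothetical helpers for moving GM_DATA_DIR/models/ across an air gap.

# pack_models DATA_DIR ARCHIVE: run on the connected machine.
# Archives DATA_DIR/models so that it unpacks as "models/".
pack_models() {
  [ -d "$1/models" ] || { echo "no models/ directory under $1" >&2; return 1; }
  tar -czf "$2" -C "$1" models
}

# unpack_models ARCHIVE DATA_DIR: run on the restricted host.
# Restores models/ under the host's data directory, creating it if needed.
unpack_models() {
  mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

After unpacking, the restricted host's data directory contains the same models/ tree the connected machine downloaded, so the server should not attempt a download on first ingest.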
Fact distillation is optional and also stays local if you point GM_LLM
at an Ollama instance on the same network segment. With GM_LLM unset,
storage and retrieval work and only fact extraction is off.
Apart from that one-time model download, nothing in the binary calls out: no telemetry, no update checks, no license validation. If your firewall logs show any other traffic from greatmemory, it is going to the embedder or LLM endpoint you configured, and only if you chose a remote one.
Current limitations (v0.1)
Stated plainly so you can plan around them:
- Keys are compared in memory as plain strings — they are not hashed at rest. Treat the env file/secret store as the security boundary.
- Keys are all-or-nothing: any valid key can read and write every space. If you need hard tenant isolation today, run one instance per tenant (the 19 MB idle footprint makes that practical).
- No built-in expiry or audit UI. Revocation = remove the key and restart. Hashed keys, per-key space scoping, and a management UI are on the roadmap.
Security checklist for a public-facing server
- Set GM_API_KEYS before binding to anything other than loopback.
- Terminate TLS at your load balancer or reverse proxy (the server speaks plain HTTP).
- Restrict GM_CORS_ORIGINS to your real frontend origins.
- Keep /v1/healthz as the only thing your probe needs: it leaks nothing but {"status":"ok"}.
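Parts of this checklist can be checked mechanically before a deploy. The following is a rough sketch, not a greatmemory feature: all function names are hypothetical, and treating a `*` in GM_CORS_ORIGINS as "any origin" is an assumption the text above does not confirm. TLS termination and bind address are outside what it checks.

```shell
#!/bin/sh
# Hypothetical pre-deploy checks for the environment the server will start in.

# keys_configured: succeeds only when GM_API_KEYS is set and non-empty.
keys_configured() {
  [ -n "${GM_API_KEYS:-}" ]
}

# env_file_private FILE: succeeds only when FILE is a regular file with mode 600.
env_file_private() {
  [ -n "$(find "$1" -maxdepth 0 -type f -perm 600 2>/dev/null)" ]
}

# cors_restricted: fails when GM_CORS_ORIGINS contains "*".
# ASSUMPTION: "*" would mean "any origin"; the docs do not state this.
cors_restricted() {
  case "${GM_CORS_ORIGINS:-}" in
    *"*"*) return 1 ;;
    *) return 0 ;;
  esac
}

# preflight FILE: report every failed check on stderr; non-zero if any failed.
preflight() {
  rc=0
  keys_configured || { echo "GM_API_KEYS is not set" >&2; rc=1; }
  env_file_private "$1" || { echo "$1 is not a mode-600 file" >&2; rc=1; }
  cors_restricted || { echo "GM_CORS_ORIGINS contains a wildcard" >&2; rc=1; }
  return "$rc"
}
```

Running `preflight` against your env file in a deploy script gives a quick failure before the server is exposed, rather than after.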