Getting started
Local quickstart
Install gmem, start the server, store and search your first memory - all on your own machine.
From zero to a searchable memory in about sixty seconds, entirely on your own machine. No accounts, no API keys, no external services.
1. Install
Prebuilt binaries and Homebrew are coming soon. For now, build from the greatmemory source tree (requires a Rust toolchain):
# from the greatmemory source tree
cargo install --path crates/gm-cli
This installs a single binary named gmem - server, MCP server, and CLI in one.
2. Start the server
gmem serve
The server listens on http://127.0.0.1:7437 and keeps its data (a SQLite database and the embedding model cache) in ./.greatmemory/.
First run downloads a model. On first start, gmem fetches a small local embedding model (BGE-small, a few tens of MB). This is a one-time cost - after that, everything runs offline. If your first add or search feels slow, this download is why.
3. Store your first memory
With the CLI, in another terminal:
gmem add "The staging database runs Postgres 17 with pgvector."
It prints the new memory's id. The same thing works over HTTP:
curl -s http://127.0.0.1:7437/v1/memories \
-H 'Content-Type: application/json' \
-d '{"content": "The staging database runs Postgres 17 with pgvector."}'
{"id":"0196...","space":"default"}
Ingestion is asynchronous (the API returns 202 Accepted): the text is chunked and embedded in the background through a bounded queue, so adds return immediately and memory stays flat.
4. Search
gmem search "what database does staging use?"
0.016 The staging database runs Postgres 17 with pgvector.
Or get a ready-to-inject context block instead of scored chunks:
gmem search "staging setup" --mode context
Search is hybrid: vector similarity, BM25 full-text, and extracted facts are fused with reciprocal rank fusion, then boosted by recency.
5. Use it as prompt context
For agents, the most useful mode is a ready-to-inject context block:
gmem search "staging setup" --mode context
In an app, call the same behavior through REST before the model call:
curl -s http://127.0.0.1:7437/v1/search \
-H 'Content-Type: application/json' \
-d '{"query": "staging setup", "mode": "context", "max_tokens": 1000}'
Put the returned context into your system or developer prompt, then store any new durable fact after the turn.
6. Remove test data
gmem add and POST /v1/memories both return a memory id. Delete by id when a
test import should no longer be retrieved:
curl -s -X DELETE http://127.0.0.1:7437/v1/memories/0196...
For bulk imports, write the returned ids to a manifest and delete from that list. The full add-use-remove flow is in Ingest, use, and remove data.
7. Check status
gmem status
RSS: 312 MB
documents: 2
chunks: 2
active facts: 0
uptime: 84s
RSS is printed first on purpose - bounded memory is the headline feature, and you can verify it live at any time.
Next steps
- Plug it into Claude Code: Claude Code integration
- Expose it as a server with auth and Postgres: Server quickstart
- Import files and clean them up: Ingest, use, and remove data
- The full command reference: CLI reference