Local quickstart

Install gmem, start the server, then store and search your first memory, all on your own machine.

From zero to a searchable memory in about sixty seconds. No accounts, no API keys, no external services.

1. Install

Prebuilt binaries and a Homebrew formula are coming soon. For now, build from the greatmemory source tree (requires a Rust toolchain):

# from the greatmemory source tree
cargo install --path crates/gm-cli

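This installs a single binary named gmem that acts as HTTP server, MCP server, and CLI in one. Cargo places it in ~/.cargo/bin by default, so that directory needs to be on your PATH.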

2. Start the server

gmem serve

The server listens on http://127.0.0.1:7437 and keeps its data (a SQLite database and the embedding model cache) in ./.greatmemory/, relative to the directory where you run the command.

Note: the first run downloads a model. On first start, gmem fetches a small local embedding model (BGE-small, a few tens of MB). This is a one-time cost; after that, everything runs offline. If your first add or search feels slow, this download is why.

3. Store your first memory

With the CLI, in another terminal:

gmem add "The staging database runs Postgres 17 with pgvector."

It prints the new memory's id. The same thing works over HTTP:

curl -s http://127.0.0.1:7437/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{"content": "The staging database runs Postgres 17 with pgvector."}'

{"id":"0196...","space":"default"}

Ingestion is asynchronous (the API returns 202 Accepted): the text is chunked and embedded in the background through a bounded queue, so adds return immediately and the server's memory use stays flat. A new memory may therefore take a moment to show up in search results.
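Because ingestion runs in the background, a script that adds a memory and searches right away may run before indexing has finished. The following is a minimal sketch of a polling helper, not part of gmem: retry_until and memory_is_searchable are hypothetical names, and the check assumes that gmem search prints the matching text as in the sample output in this guide.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS is used up; returns 1 on timeout.
retry_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is coming.
    if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
  done
  return 1
}

# Hypothetical check: succeeds once the search output mentions the new memory.
memory_is_searchable() {
  gmem search "what database does staging use?" | grep -q "Postgres 17"
}
```

Usage: retry_until 10 1 memory_is_searchable && echo "indexed". Polling the search itself avoids relying on any status endpoint that this guide does not describe.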

4. Search

gmem search "what database does staging use?"

0.016  The staging database runs Postgres 17 with pgvector.

Or get a ready-to-inject context block instead of scored chunks:

gmem search "staging setup" --mode context

Search is hybrid: results from vector similarity, BM25 full-text search, and extracted facts are fused with reciprocal rank fusion (RRF), then boosted by recency.
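As an illustration of reciprocal rank fusion in general (not gmem's actual implementation), each result's fused score is the sum of 1/(k + rank) over the ranked lists it appears in. The sketch below assumes k = 60, a common default, and hypothetical ranking files with one id per line, best first. It leaves out the recency boost, whose formula this guide does not give.

```shell
# rrf_fuse K FILE...
# Each FILE is one retriever's ranking: one id per line, best first.
# Prints "score id" lines, highest fused score first.
rrf_fuse() {
  k="$1"; shift
  LC_ALL=C awk -v k="$k" '
    # FNR restarts at 1 for every file, so it is the 1-based rank in that list.
    { score[$1] += 1 / (k + FNR) }
    END { for (id in score) printf "%.4f %s\n", score[id], id }
  ' "$@" | LC_ALL=C sort -k1,1nr -k2,2
}
```

Usage: rrf_fuse 60 vector.txt bm25.txt facts.txt. With k = 60, a result ranked first in a single list scores 1/61, about 0.016, which is in line with the sample score above, though this guide does not state the constant gmem uses.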

5. Use it as prompt context

For agents, the most useful mode is a ready-to-inject context block:

gmem search "staging setup" --mode context

In an app, call the same behavior through REST before the model call:

curl -s http://127.0.0.1:7437/v1/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "staging setup", "mode": "context", "max_tokens": 1000}'

Put the returned context into your system or developer prompt, then store any new durable fact after the turn.
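For scripts that call a model, the request above can be wrapped in a small function. This is a sketch under stated assumptions: GMEM_URL, json_escape, search_payload, and fetch_context are hypothetical names introduced here, and the response is printed raw because this guide does not show its shape.

```shell
# Hypothetical override for the server address; defaults to the local server.
GMEM_URL="${GMEM_URL:-http://127.0.0.1:7437}"

# json_escape TEXT: escape backslashes and double quotes for a JSON string.
# Does not handle control characters such as newlines; fine for one-line queries.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# search_payload QUERY MAX_TOKENS: build the request body used in the curl example.
search_payload() {
  printf '{"query": "%s", "mode": "context", "max_tokens": %d}' \
    "$(json_escape "$1")" "$2"
}

# fetch_context QUERY [MAX_TOKENS]: POST to /v1/search and print the raw response.
fetch_context() {
  curl -s "$GMEM_URL/v1/search" \
    -H 'Content-Type: application/json' \
    -d "$(search_payload "$1" "${2:-1000}")"
}
```

Usage: context=$(fetch_context "staging setup" 1000), then insert $context into the prompt. Escaping the query matters as soon as it comes from user input rather than a fixed string.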

6. Remove test data

Both gmem add and POST /v1/memories return a memory id. To stop a test import from being retrieved, delete it by id:

curl -s -X DELETE http://127.0.0.1:7437/v1/memories/0196...

For bulk imports, write the returned ids to a manifest and delete from that list. The full add-use-remove flow is in Ingest, use, and remove data.
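The manifest approach can be sketched as follows. The names GMEM_URL, delete_urls, and delete_from_manifest are hypothetical, and the manifest is assumed to be a plain text file with one memory id per line.

```shell
# Hypothetical override for the server address; defaults to the local server.
GMEM_URL="${GMEM_URL:-http://127.0.0.1:7437}"

# delete_urls MANIFEST: print one DELETE URL per id, skipping blank lines.
delete_urls() {
  # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue
    printf '%s/v1/memories/%s\n' "$GMEM_URL" "$id"
  done < "$1"
}

# delete_from_manifest MANIFEST: send a DELETE request for every id listed.
delete_from_manifest() {
  delete_urls "$1" | while IFS= read -r url; do
    curl -s -X DELETE "$url"
  done
}
```

Usage: run delete_urls ids.txt first as a dry run, then delete_from_manifest ids.txt. If gmem add prints only the id on standard output (an assumption; check the actual output format), appending it with gmem add "..." >> ids.txt is enough to build the manifest.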

7. Check status

gmem status

RSS: 312 MB
documents: 2
chunks: 2
active facts: 0
uptime: 84s

RSS (resident set size, the physical memory the process currently holds) is printed first on purpose: bounded memory is the headline feature, and you can verify it live at any time.
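To check the figure from a script, the RSS line can be parsed and compared with a budget. This sketch assumes the "RSS: <n> MB" line format shown in the sample output; rss_mb and rss_within are hypothetical helper names, and the 512 MB budget in the usage line is an arbitrary example.

```shell
# rss_mb: read `gmem status`-style text on stdin and print the RSS number.
rss_mb() {
  awk '$1 == "RSS:" { print $2; exit }'
}

# rss_within LIMIT_MB: exit 0 if the RSS read from stdin is at or below the limit.
# Fails if no RSS line is found or the value is not an integer.
rss_within() {
  rss=$(rss_mb)
  [ -n "$rss" ] && [ "$rss" -le "$1" ]
}
```

Usage: gmem status | rss_within 512 || echo "RSS above budget".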

Next steps

Plug it into Claude Code: Claude Code integration
Expose it as a server with auth and Postgres: Server quickstart
Import files and clean them up: Ingest, use, and remove data
The full command reference: CLI reference