Configuration reference
Every environment variable, greatmemory.toml key, CLI flag, and feature toggle in one place.
Every knob in one place: the environment variables, the greatmemory.toml file,
the gmem CLI flags, and the feature toggles. greatmemory layers its
configuration so later layers win:
CLI flags > GM_* environment variables > greatmemory.toml > built-in defaults
- CLI flags - gmem serve --host --port --data-dir --config plus the feature toggles below; gmem mcp --data-dir --config. See the CLI reference.
- Environment - every GM_* variable in the tables below.
- greatmemory.toml - read from the current directory by default (skipped silently when absent); --config <path> selects an explicit file, which must exist. Unknown keys are an error, so typos fail fast.
- Defaults - local-first: loopback bind, SQLite, local embeddings, no auth, no LLM.
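The layering can be pictured as a first-match lookup across four mappings. The following is a minimal Python sketch of that idea, not greatmemory's actual code: the function name resolve, the dictionary shapes, and the key-to-variable mapping (port -> GM_PORT) are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional

# Two of the built-in defaults listed in the tables on this page.
DEFAULTS = {"host": "127.0.0.1", "port": 7437}


def resolve(key: str,
            cli: Mapping[str, Any],
            env: Mapping[str, str],
            toml: Mapping[str, Any],
            defaults: Mapping[str, Any] = DEFAULTS) -> Optional[Any]:
    """Return the value from the highest-priority layer that sets `key`."""
    env_name = "GM_" + key.upper()  # assumed naming rule, e.g. port -> GM_PORT
    if key in cli:                  # 1. CLI flag
        return cli[key]
    if env_name in env:             # 2. GM_* environment variable (raw string)
        return env[env_name]
    if key in toml:                 # 3. greatmemory.toml
        return toml[key]
    return defaults.get(key)        # 4. built-in default


# The environment beats the toml file; a CLI flag beats both.
from_env = resolve("port", cli={}, env={"GM_PORT": "8080"}, toml={"port": 9000})
from_cli = resolve("port", cli={"port": 8000}, env={"GM_PORT": "8080"}, toml={})
```

The sketch returns environment values as raw strings; a real loader would parse them into the target type before use.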
Booleans accept 1/0, true/false, yes/no, on/off (case-insensitive); any
other value fails startup. Comma-separated lists (GM_CORS_ORIGINS,
GM_API_KEYS) are trimmed and drop empty segments. An unparseable GM_PORT or
GM_EMBEDDER_DIM fails startup rather than being silently ignored.
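The parsing rules above are small enough to sketch in Python. The helper names (parse_bool, parse_list, parse_int) and the error messages are hypothetical; only the accepted spellings and the trim-and-drop behavior come from the text.

```python
from typing import List

TRUE_WORDS = {"1", "true", "yes", "on"}
FALSE_WORDS = {"0", "false", "no", "off"}


def parse_bool(name: str, raw: str) -> bool:
    """Accept the eight documented spellings, case-insensitively; reject the rest."""
    word = raw.lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    raise ValueError(f"{name}: expected a boolean, got {raw!r}")


def parse_list(raw: str) -> List[str]:
    """Split on commas, trim each segment, and drop empty segments."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def parse_int(name: str, raw: str) -> int:
    """Fail loudly on an unparseable integer such as a bad GM_PORT."""
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name}: expected an integer, got {raw!r}") from None
```

Raising on bad input, rather than falling back to a default, mirrors the fail-at-startup behavior described above.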
Server environment variables
| Variable | Default | Example | Purpose |
|---|---|---|---|
| GM_HOST | 127.0.0.1 | 0.0.0.0 | Bind address for serve |
| GM_PORT | 7437 | 8080 | Bind port for serve |
| GM_DATA_DIR | ./.greatmemory | /var/lib/greatmemory | Directory for the SQLite db and embedding model cache (created on demand) |
| GM_DB | (→ <data_dir>/greatmemory.db) | postgres://gm:pw@db:5432/gm | Database selector: a postgres:// or postgresql:// URL selects Postgres + pgvector; :memory: selects an in-memory SQLite db; anything else is a SQLite file path |
| GM_CORS_ORIGINS | (empty → any localhost) | https://app.example.com,https://admin.example.com | Comma-separated exact allowed CORS origins |
| GM_API_KEYS | (empty → no auth) | gm_k1,gm_k2 | Comma-separated bearer keys for /v1 and /mcp |
| GM_EMBEDDER | fastembed | ollama | Embedder kind: fastembed, ollama, openai, or fake |
| GM_EMBEDDER_URL | (per kind) | http://127.0.0.1:11434 | Base URL for HTTP embedders |
| GM_EMBEDDER_API_KEY | (unset) | sk-... | API key for the openai embedder (optional) |
| GM_EMBEDDER_MODEL | (unset) | nomic-embed-text | Model name (required for ollama/openai) |
| GM_EMBEDDER_DIM | (unset) | 768 | Embedding dimension (required for ollama/openai) |
| GM_LLM | none | ollama | LLM kind for fact extraction & reflection: none, ollama, or openai |
| GM_LLM_URL | (per kind) | https://api.example.com/v1 | Base URL (required for openai) |
| GM_LLM_API_KEY | (unset) | sk-... | API key for the openai LLM (optional) |
| GM_LLM_MODEL | (unset) | llama3 | Model name (required for ollama/openai) |
| GM_RERANK | none | lexical | Second-stage reranker: none, noop, or lexical |
| GM_GRAPH_EXPAND_HOPS | 0 | 1 | Knowledge-graph expansion hops during retrieval (0 disables) |
| GM_GRAPH_BACKEND | store | mysql://… | Knowledge-graph backend: store reuses the main DB; otherwise a separate mysql://, postgres://, or sqlite path |
| GM_CARDS_SEMANTIC_LINKING | false | true | Embed memory cards on create and auto-link by cosine similarity instead of keyword/tag overlap |
| GM_REFLECTION | true | false | Prospective reflection (RMM): auto-summarize each ingested document into a memory card. Needs an LLM; a no-op without one |
| GM_USEFULNESS | true | false | Live-Evo usefulness reweighting: nudge frequently-retrieved memories up the ranking |
| GM_TRUST | true | false | Trust gating (InjecMEM): quarantine prompt-injection-looking content from the knowledge graph and downgrade its trust score |
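The GM_DB row packs three cases into one value. As a rough illustration of that selector rule, here is a Python sketch; the function names and the returned labels are hypothetical, and only the three-way rule and the default path come from the table.

```python
import os
from typing import Optional


def classify_db(value: str) -> str:
    """Map a GM_DB value to the backend it selects, per the table above."""
    if value.startswith(("postgres://", "postgresql://")):
        return "postgres"        # Postgres + pgvector
    if value == ":memory:":
        return "sqlite-memory"   # in-memory SQLite db
    return "sqlite-file"         # anything else: a SQLite file path


def effective_db(gm_db: Optional[str], data_dir: str = "./.greatmemory") -> str:
    """When GM_DB is unset, fall back to <data_dir>/greatmemory.db."""
    return gm_db if gm_db else os.path.join(data_dir, "greatmemory.db")
```

Under this rule a mistyped scheme (for example postgre://) would be treated as a SQLite file path rather than rejected, so it is worth double-checking the URL prefix.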
Memory feature toggles
Three higher-level memory behaviors are on by default and can be turned off
independently - via greatmemory.toml, a GM_* env var, or a gmem serve flag
(the flag wins, then env, then toml). The research behind each is on the
Research & roadmap page.
| Feature | toml key | env var | serve flags |
|---|---|---|---|
| Prospective reflection (RMM) | features.reflection | GM_REFLECTION | --enable-reflection / --disable-reflection |
| Usefulness reweighting (Live-Evo) | features.usefulness | GM_USEFULNESS | --enable-usefulness / --disable-usefulness |
| Trust gating (InjecMEM) | features.trust | GM_TRUST | --enable-trust / --disable-trust |
# All three off (ingest-only, no LLM cards, plain relevance ranking, no quarantine):
GM_REFLECTION=false GM_USEFULNESS=false GM_TRUST=false gmem serve
# Same, via flags (flags override env/toml; passing both --enable-X and --disable-X is an error):
gmem serve --disable-reflection --disable-usefulness --disable-trust
- Reflection - each ingested document is summarized by the LLM into one agentic memory card (title, summary, keywords, tags), auto-linked to related cards. Requires GM_LLM; a no-op without one. Off → ingest without generating cards (you can still create them via /v1/cards or gmem card create).
- Usefulness - retrieval keeps a usefulness score per memory, raised when a memory is returned and decayed when it is not, then nudges frequently-used memories up the ranking. Off → purely relevance-ordered ranking.
- Trust gating - content that looks like a prompt-injection attempt is stored (auditable) but quarantined from the knowledge graph: no facts/edges extracted, trust score downgraded. Off → no quarantine; use only when every source is trusted.
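How a single toggle is settled (flag, then env, then toml, with conflicting flags rejected) can be sketched in Python as follows. This is an illustration under assumptions, not the project's code: resolve_toggle and its parameters are hypothetical names.

```python
from typing import Any, Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def resolve_toggle(name: str,
                   enable_flag: bool,
                   disable_flag: bool,
                   env: Mapping[str, str],
                   toml_features: Mapping[str, Any],
                   default: bool = True) -> bool:
    """Resolve one feature toggle: serve flag > GM_* env var > [features] key."""
    if enable_flag and disable_flag:
        raise ValueError(f"--enable-{name} and --disable-{name} cannot be combined")
    if enable_flag or disable_flag:
        return enable_flag                      # an explicit flag always wins
    raw = env.get("GM_" + name.upper())         # e.g. trust -> GM_TRUST
    if raw is not None:
        word = raw.lower()
        if word not in _TRUE | _FALSE:
            raise ValueError(f"GM_{name.upper()}: expected a boolean, got {raw!r}")
        return word in _TRUE
    return bool(toml_features.get(name, default))


# --disable-trust overrides both GM_TRUST=true and features.trust = true.
trust_on = resolve_toggle("trust", False, True, {"GM_TRUST": "true"}, {"trust": True})
```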
Other toggles
| Behavior | How to enable | Notes |
|---|---|---|
| Reranking | GM_RERANK=lexical | Off by default (none); reorders fused candidates before truncation |
| Graph expansion | GM_GRAPH_EXPAND_HOPS=1 | Off by default (0); walks the graph to pull in related facts |
| Semantic card linking | GM_CARDS_SEMANTIC_LINKING=true | Off by default; links cards by embedding cosine similarity |
| Separate graph backend | GM_GRAPH_BACKEND=mysql://… | Default store reuses the main DB |
Hosted model providers
Any OpenAI-compatible endpoint works as the LLM (GM_LLM=openai) and/or embedder
(GM_EMBEDDER=openai): point GM_*_URL at the provider's full versioned base
(for OpenAI itself that's https://api.openai.com/v1) and greatmemory appends only
/chat/completions or /embeddings, with Authorization: Bearer $GM_*_API_KEY.
That's why the major clouds slot in without code changes, even though their version
paths differ. The big three have dedicated walkthroughs:
| Provider | Auth | LLM | Embeddings | Guide |
|---|---|---|---|---|
| Google Vertex AI | OAuth token (gcloud) | yes | yes | Vertex AI |
| Amazon Bedrock | Bedrock API key | yes | no (use fastembed) | Bedrock |
| Azure OpenAI | resource API key | yes | yes | Azure OpenAI |
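To make the URL rule concrete, the Python sketch below joins a versioned base URL with the fixed suffix and builds the bearer header. The helper names are hypothetical, and stripping a trailing slash from the base is an assumption of this sketch rather than documented behavior.

```python
from typing import Dict, Optional

# The only path segments greatmemory is described as appending.
SUFFIX = {"llm": "/chat/completions", "embedder": "/embeddings"}


def endpoint(base_url: str, kind: str) -> str:
    """Join the provider's versioned base URL with the fixed suffix."""
    return base_url.rstrip("/") + SUFFIX[kind]


def auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Bearer header when a key is set; the key is optional per the tables."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}


openai_chat = endpoint("https://api.openai.com/v1", "llm")
gemini_embed = endpoint(
    "https://generativelanguage.googleapis.com/v1beta/openai", "embedder")
```

Because only the suffix is appended, the version segment (v1, v1beta/openai, and so on) has to be part of the base URL you configure.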
Gemini API (Google AI Studio)
The simplest Google path - one API key from
aistudio.google.com, no Google Cloud project
(for the project-based platform, see Vertex AI). Base URL
https://generativelanguage.googleapis.com/v1beta/openai:
# Fact extraction + prospective reflection via Gemini
GM_LLM=openai \
GM_LLM_URL=https://generativelanguage.googleapis.com/v1beta/openai \
GM_LLM_API_KEY=$GEMINI_API_KEY \
GM_LLM_MODEL=gemini-3.5-flash \
gmem serve
# ...and/or Gemini embeddings (alongside the LLM vars above):
GM_EMBEDDER=openai \
GM_EMBEDDER_URL=https://generativelanguage.googleapis.com/v1beta/openai \
GM_EMBEDDER_API_KEY=$GEMINI_API_KEY \
GM_EMBEDDER_MODEL=gemini-embedding-001 \
GM_EMBEDDER_DIM=3072
- Chat models: gemini-3.5-flash (latest GA flash - a good default), gemini-3.1-flash-lite (cheapest), or gemini-2.5-flash / gemini-2.5-pro (current as of June 2026; check the model list for newer GA models).
- Embedding model: gemini-embedding-001, default output 3072 dims - set GM_EMBEDDER_DIM=3072 (the dimension must match exactly or ingestion fails).
Embeddings note (all providers). greatmemory requires the returned vector length to equal
GM_EMBEDDER_DIM exactly and does not request a reduced dimension, so pick a model whose default output matches the dim you set. Changing the embedding model needs a fresh data dir (existing vectors won't match a new model).
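The exact-match requirement amounts to a length check on every returned vector. A minimal Python sketch, with a hypothetical function name and error message:

```python
from typing import Sequence


def check_dim(vector: Sequence[float], expected_dim: int) -> None:
    """Reject any embedding whose length differs from GM_EMBEDDER_DIM."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedder returned {len(vector)} dims, GM_EMBEDDER_DIM is {expected_dim}")


# A 768-dim vector passes against GM_EMBEDDER_DIM=768 and fails against 3072.
check_dim([0.0] * 768, 768)
```

Running one test embedding through such a check before ingesting real data is a cheap way to catch a model/dimension mismatch early.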
CLI client variables
The client commands (add, search, facts, status) talk to a running server
and use two variables of their own:
| Variable | Default | Purpose |
|---|---|---|
| GM_URL | http://127.0.0.1:7437 | Server base URL |
| GM_API_KEY | (unset) | Bearer token sent when the server has GM_API_KEYS configured |
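A custom client can honor the same two variables. The Python sketch below builds (but does not send) a request the way the table describes; build_request is a hypothetical helper, and /v1/healthz is used only because it is the one route this page names as always unauthenticated.

```python
import os
import urllib.request
from typing import Mapping


def build_request(path: str,
                  env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build a request from GM_URL and GM_API_KEY, using the documented defaults."""
    base = env.get("GM_URL", "http://127.0.0.1:7437").rstrip("/")
    request = urllib.request.Request(base + path)
    key = env.get("GM_API_KEY")
    if key:  # only send the header when a key is configured
        request.add_header("Authorization", f"Bearer {key}")
    return request


req = build_request("/v1/healthz", {"GM_API_KEY": "gm_prod_key"})
```

Passing the result to urllib.request.urlopen would perform the call against a running server.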
greatmemory.toml
All keys optional; this example shows everything, including the feature toggles:
host = "127.0.0.1"
port = 7437
data_dir = "./.greatmemory"
# db = "postgres://gm:password@localhost:5432/greatmemory"
cors_origins = ["http://localhost:3000"]
api_keys = ["gm_change_me"]
[embedder]
kind = "fastembed" # fastembed | ollama | openai | fake
# base_url = "http://127.0.0.1:11434"
# api_key = "sk-..."
# model = "nomic-embed-text"
# dim = 768
[llm]
kind = "none" # none | ollama | openai
# base_url = "http://127.0.0.1:11434"
# api_key = "sk-..."
# model = "llama3"
[rerank]
kind = "none" # none | noop | lexical
[graph]
backend = "store" # store | mysql://… | postgres://… | sqlite path | :memory:
[cards]
semantic_linking = false
[features]
reflection = true # prospective reflection (RMM); needs an LLM
usefulness = true # Live-Evo usefulness reweighting
trust = true # trust gating (InjecMEM)
graph_expand_hops = 0 # graph expansion during retrieval (0 = off)
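Since unknown keys are rejected, it can help to lint a config before deploying it. The Python sketch below checks an already-parsed config (for instance the dict produced by a TOML parser) against the key names shown in the example above. The allow-lists simply mirror that example as printed and are an assumption, not an authoritative schema; unknown_keys is a hypothetical helper.

```python
from typing import Any, Dict, List

# Key names copied from the example file above (assumed complete).
TOP_LEVEL = {"host", "port", "data_dir", "db", "cors_origins", "api_keys"}
TABLES = {
    "embedder": {"kind", "base_url", "api_key", "model", "dim"},
    "llm": {"kind", "base_url", "api_key", "model"},
    "rerank": {"kind"},
    "graph": {"backend"},
    "cards": {"semantic_linking"},
    "features": {"reflection", "usefulness", "trust", "graph_expand_hops"},
}


def unknown_keys(config: Dict[str, Any]) -> List[str]:
    """Return dotted names of keys that do not appear in the example file."""
    bad: List[str] = []
    for key, value in config.items():
        if key in TABLES and isinstance(value, dict):
            bad += [f"{key}.{sub}" for sub in value if sub not in TABLES[key]]
        elif key not in TOP_LEVEL:
            bad.append(key)
    return sorted(bad)


# Two typos: "hots" at the top level and "modle" inside [embedder].
typos = unknown_keys({"hots": "0.0.0.0", "embedder": {"kind": "fake", "modle": "x"}})
```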
Auth & CORS
Set GM_API_KEYS (or api_keys in toml) to one or more bearer keys. When
non-empty, every /v1 route except /v1/healthz - and the /mcp endpoint -
requires Authorization: Bearer <key>. By default (empty GM_CORS_ORIGINS) the
server allows any localhost origin; set an explicit comma-separated list to
replace that policy entirely (the permissive localhost default is then disabled).
See API keys & air-gapped use for rotation and offline operation.
GM_API_KEYS=gm_prod_key GM_CORS_ORIGINS=https://app.example.com gmem serve
GM_API_KEY=gm_prod_key gmem status # client side
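The auth and CORS rules above can be summarized as two small predicates. This Python sketch is illustrative only: the function names are hypothetical, and treating localhost, 127.0.0.1, and ::1 as "any localhost origin" is an assumption about what the default policy matches.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit


def is_authorized(path: str, authorization: Optional[str],
                  api_keys: Sequence[str]) -> bool:
    """Apply the bearer-key rule to /v1 routes and /mcp, exempting /v1/healthz."""
    protected = path.startswith("/v1/") or path == "/mcp" or path.startswith("/mcp/")
    if not api_keys or not protected or path == "/v1/healthz":
        return True                              # no auth configured, or exempt
    prefix = "Bearer "
    if authorization is None or not authorization.startswith(prefix):
        return False
    return authorization[len(prefix):] in api_keys


def origin_allowed(origin: str, cors_origins: Sequence[str]) -> bool:
    """An explicit list means exact match only; an empty list allows localhost."""
    if cors_origins:
        return origin in cors_origins            # localhost default is disabled
    return urlsplit(origin).hostname in {"localhost", "127.0.0.1", "::1"}
```

Note how a non-empty origin list replaces the localhost default entirely: a local dev origin must be listed explicitly once GM_CORS_ORIGINS is set.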