Configuration reference

Every environment variable, greatmemory.toml key, CLI flag, and feature toggle in one place.

Every knob in one place: the environment variables, the greatmemory.toml file, the gmem CLI flags, and the feature toggles. greatmemory layers its configuration, and each layer overrides the ones to its right in the order below:

CLI flags  >  GM_* environment variables  >  greatmemory.toml  >  built-in defaults
  • CLI flags - gmem serve --host --port --data-dir --config plus the feature toggles below; gmem mcp --data-dir --config. See the CLI reference.
  • Environment - every GM_* variable in the tables below.
  • greatmemory.toml - read from the current directory by default (skipped silently when absent); --config <path> selects an explicit file, which must exist. Unknown keys are an error, so typos fail fast.
  • Defaults - local-first: loopback bind, SQLite, local embeddings, no auth, no LLM.
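The precedence order above can be pictured as a chain of lookups. The following is a minimal Python sketch, not greatmemory's actual loader; the resolve helper and the layer dictionaries are hypothetical and only illustrate that the highest-priority layer defining a key wins.

```python
from collections import ChainMap


def resolve(cli_flags, env, toml_file, defaults):
    """Merge the four layers; earlier arguments take priority."""
    # ChainMap searches its mappings left to right and returns the first hit,
    # mirroring: CLI flags > GM_* env > greatmemory.toml > built-in defaults.
    return dict(ChainMap(cli_flags, env, toml_file, defaults))


# Illustrative values only (hypothetical layer contents).
config = resolve(
    cli_flags={"host": "0.0.0.0"},
    env={"port": 8080},
    toml_file={"port": 9000, "data_dir": "/var/lib/greatmemory"},
    defaults={"host": "127.0.0.1", "port": 7437, "data_dir": "./.greatmemory"},
)
```

Here the port comes from the environment layer even though the toml layer also sets it, and the host comes from the CLI layer.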

Booleans accept 1/0, true/false, yes/no, on/off (case-insensitive); any other value fails startup. For comma-separated lists (GM_CORS_ORIGINS, GM_API_KEYS), each segment is trimmed and empty segments are dropped. An unparseable GM_PORT or GM_EMBEDDER_DIM fails startup rather than being silently ignored.
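The parsing rules above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not greatmemory's parser; the function names are hypothetical, and whether surrounding whitespace is tolerated in boolean values is not stated in this page, so the sketch does not strip it.

```python
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Case-insensitive boolean; anything unrecognized is an error."""
    value = raw.lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def parse_list(raw: str) -> list[str]:
    """Split on commas, trim each segment, drop empty segments."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def parse_int(raw: str) -> int:
    """Fail loudly on garbage instead of falling back to a default."""
    return int(raw)  # raises ValueError for input such as "80a"
```

Raising instead of falling back to a default is the point: a typo such as GM_TRUST=flase stops startup rather than silently leaving a feature on.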

Server environment variables

| Variable | Default | Example | Purpose |
| --- | --- | --- | --- |
| GM_HOST | 127.0.0.1 | 0.0.0.0 | Bind address for serve |
| GM_PORT | 7437 | 8080 | Bind port for serve |
| GM_DATA_DIR | ./.greatmemory | /var/lib/greatmemory | Directory for the SQLite db and embedding model cache (created on demand) |
| GM_DB | (→ <data_dir>/greatmemory.db) | postgres://gm:pw@db:5432/gm | Database selector: a postgres:// or postgresql:// URL selects Postgres + pgvector; :memory: an in-memory SQLite db; anything else a SQLite file path |
| GM_CORS_ORIGINS | (empty → any localhost) | https://app.example.com,https://admin.example.com | Comma-separated exact allowed CORS origins |
| GM_API_KEYS | (empty → no auth) | gm_k1,gm_k2 | Comma-separated bearer keys for /v1 and /mcp |
| GM_EMBEDDER | fastembed | ollama | Embedder kind: fastembed, ollama, openai, or fake |
| GM_EMBEDDER_URL | (per kind) | http://127.0.0.1:11434 | Base URL for HTTP embedders |
| GM_EMBEDDER_API_KEY | (unset) | sk-... | API key for the openai embedder (optional) |
| GM_EMBEDDER_MODEL | (unset) | nomic-embed-text | Model name (required for ollama/openai) |
| GM_EMBEDDER_DIM | (unset) | 768 | Embedding dimension (required for ollama/openai) |
| GM_LLM | none | ollama | LLM kind for fact extraction & reflection: none, ollama, or openai |
| GM_LLM_URL | (per kind) | https://api.example.com/v1 | Base URL (required for openai) |
| GM_LLM_API_KEY | (unset) | sk-... | API key for the openai LLM (optional) |
| GM_LLM_MODEL | (unset) | llama3 | Model name (required for ollama/openai) |
| GM_RERANK | none | lexical | Second-stage reranker: none, noop, or lexical |
| GM_GRAPH_EXPAND_HOPS | 0 | 1 | Knowledge-graph expansion hops during retrieval (0 disables) |
| GM_GRAPH_BACKEND | store | mysql://… | Knowledge-graph backend: store reuses the main DB, or a separate mysql:// URL, postgres:// URL, or sqlite path |
| GM_CARDS_SEMANTIC_LINKING | false | true | Embed memory cards on create and auto-link by cosine similarity instead of keyword/tag overlap |
| GM_REFLECTION | true | false | Prospective reflection (RMM): auto-summarize each ingested document into a memory card. Needs an LLM; a no-op without one |
| GM_USEFULNESS | true | false | Live-Evo usefulness reweighting: nudge frequently-retrieved memories up the ranking |
| GM_TRUST | true | false | Trust gating (InjecMEM): quarantine prompt-injection-looking content from the knowledge graph and downgrade its trust score |
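The GM_DB selector rule in the table reduces to a three-way branch. A minimal Python sketch of that rule follows; classify_db and its return labels are hypothetical names chosen for illustration, and the real implementation may differ in details such as case handling.

```python
def classify_db(value: str) -> str:
    """Map a GM_DB value to the backend it would select."""
    # A postgres:// or postgresql:// URL selects Postgres + pgvector.
    if value.startswith(("postgres://", "postgresql://")):
        return "postgres"
    # The literal :memory: selects an in-memory SQLite database.
    if value == ":memory:":
        return "sqlite-memory"
    # Anything else is treated as a SQLite file path.
    return "sqlite-file"
```

For example, classify_db("postgres://gm:pw@db:5432/gm") yields "postgres", while a value such as "./data/gm.db" falls through to "sqlite-file".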

Memory feature toggles

Three higher-level memory behaviors are on by default and can be turned off independently - via greatmemory.toml, a GM_* env var, or a gmem serve flag (the flag wins, then env, then toml). The research behind each is on the Research & roadmap page.

| Feature | toml key | env var | serve flags |
| --- | --- | --- | --- |
| Prospective reflection (RMM) | features.reflection | GM_REFLECTION | --enable-reflection / --disable-reflection |
| Usefulness reweighting (Live-Evo) | features.usefulness | GM_USEFULNESS | --enable-usefulness / --disable-usefulness |
| Trust gating (InjecMEM) | features.trust | GM_TRUST | --enable-trust / --disable-trust |
# All three off (ingest-only, no LLM cards, plain relevance ranking, no quarantine):
GM_REFLECTION=false GM_USEFULNESS=false GM_TRUST=false gmem serve
# Same, via flags (flags override env/toml; both --enable-X and --disable-X is an error):
gmem serve --disable-reflection --disable-usefulness --disable-trust
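How a single toggle is resolved from its three sources can be sketched as follows. This is an illustrative Python model of the documented rule (flag, then env, then toml, then the default of on), not the actual gmem code; resolve_toggle and its parameters are hypothetical.

```python
from typing import Optional


def resolve_toggle(
    enable_flag: bool,
    disable_flag: bool,
    env_value: Optional[bool],
    toml_value: Optional[bool],
    default: bool = True,
) -> bool:
    """Return the effective state of one feature toggle."""
    # Passing both --enable-X and --disable-X is an error.
    if enable_flag and disable_flag:
        raise ValueError("both --enable-X and --disable-X were given")
    # A CLI flag beats everything else.
    if enable_flag:
        return True
    if disable_flag:
        return False
    # Then the GM_* environment variable, then greatmemory.toml.
    if env_value is not None:
        return env_value
    if toml_value is not None:
        return toml_value
    # The three memory features are on by default.
    return default
```

With this model, resolve_toggle(True, False, env_value=False, toml_value=False) is True, because the flag overrides both lower layers.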
  • Reflection - each ingested document is summarized by the LLM into one agentic memory card (title, summary, keywords, tags), auto-linked to related cards. Requires GM_LLM; a no-op without one. Off → ingest without generating cards (you can still create them via /v1/cards or gmem card create).
  • Usefulness - retrieval keeps a usefulness score per memory, raised when a memory is returned and decayed when it is not, then nudges frequently-used memories up the ranking. Off → purely relevance-ordered ranking.
  • Trust gating - content that looks like a prompt-injection attempt is stored (auditable) but quarantined from the knowledge graph: no facts/edges extracted, trust score downgraded. Off → no quarantine; use only when every source is trusted.
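To make the usefulness idea concrete, here is a toy Python sketch of a score that rises when a memory is returned, decays when it is not, and then nudges ranking. Everything numeric here is an assumption: the boost, decay, and weight constants, the additive blend, and the function names are hypothetical, since this page does not document the actual formula.

```python
def update_usefulness(scores, returned_ids, boost=0.1, decay=0.95):
    """Raise the score of returned memories, decay all the others."""
    for mem_id in scores:
        if mem_id in returned_ids:
            scores[mem_id] = min(1.0, scores[mem_id] + boost)  # hypothetical boost
        else:
            scores[mem_id] *= decay  # hypothetical decay factor


def rank(candidates, scores, weight=0.2):
    """Order (id, relevance) pairs by relevance plus a small usefulness nudge."""
    return sorted(
        candidates,
        key=lambda item: item[1] + weight * scores.get(item[0], 0.0),
        reverse=True,
    )


def rank_plain(candidates):
    """What 'off' means in this model: purely relevance-ordered ranking."""
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

In this model a frequently returned memory can overtake a slightly more relevant one, which is the behavior that turning the feature off removes.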

Other toggles

| Behavior | How to enable | Notes |
| --- | --- | --- |
| Reranking | GM_RERANK=lexical | Off by default (none); reorders fused candidates before truncation |
| Graph expansion | GM_GRAPH_EXPAND_HOPS=1 | Off by default (0); walks the graph to pull in related facts |
| Semantic card linking | GM_CARDS_SEMANTIC_LINKING=true | Off by default; links cards by embedding cosine similarity |
| Separate graph backend | GM_GRAPH_BACKEND=mysql://… | Default store reuses the main DB |

Hosted model providers

Any OpenAI-compatible endpoint works as the LLM (GM_LLM=openai) and/or embedder (GM_EMBEDDER=openai): point GM_*_URL at the provider's full versioned base (for OpenAI itself that's https://api.openai.com/v1) and greatmemory appends only /chat/completions or /embeddings, with Authorization: Bearer $GM_*_API_KEY. That's why the major clouds slot in without code changes, even though their version paths differ. The big three have dedicated walkthroughs:

| Provider | Auth | LLM | Embeddings | Guide |
| --- | --- | --- | --- | --- |
| Google Vertex AI | OAuth token (gcloud) | yes | yes | Vertex AI |
| Amazon Bedrock | Bedrock API key | yes | no (use fastembed) | Bedrock |
| Azure OpenAI | resource API key | yes | yes | Azure OpenAI |
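The "versioned base plus a fixed suffix" rule is what makes different providers interchangeable. A minimal Python sketch of how the final request URL and auth header could be formed is shown below; the helper names are hypothetical, and stripping a trailing slash from the base is an assumption of this sketch rather than documented behavior.

```python
_SUFFIX = {"chat": "/chat/completions", "embeddings": "/embeddings"}


def endpoint(base_url: str, kind: str) -> str:
    """Append only the operation suffix to the provider's versioned base."""
    # Assumption: a trailing slash on the base is tolerated and removed.
    return base_url.rstrip("/") + _SUFFIX[kind]


def auth_headers(api_key):
    """Bearer header when a key is set; no header otherwise."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}
```

Because the version segment (/v1, /v1beta/openai, and so on) lives in the base URL, endpoint("https://api.openai.com/v1", "chat") and endpoint("https://generativelanguage.googleapis.com/v1beta/openai", "chat") both resolve correctly without any provider-specific logic.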

Gemini API (Google AI Studio)

The simplest Google path - one API key from aistudio.google.com, no Google Cloud project (for the project-based platform, see Vertex AI). Base URL https://generativelanguage.googleapis.com/v1beta/openai:

# Fact extraction + prospective reflection via Gemini
GM_LLM=openai \
GM_LLM_URL=https://generativelanguage.googleapis.com/v1beta/openai \
GM_LLM_API_KEY=$GEMINI_API_KEY \
GM_LLM_MODEL=gemini-3.5-flash \
gmem serve

# ...and/or Gemini embeddings (alongside the LLM vars above):
GM_EMBEDDER=openai \
GM_EMBEDDER_URL=https://generativelanguage.googleapis.com/v1beta/openai \
GM_EMBEDDER_API_KEY=$GEMINI_API_KEY \
GM_EMBEDDER_MODEL=gemini-embedding-001 \
GM_EMBEDDER_DIM=3072
  • Chat models: gemini-3.5-flash (latest GA flash - a good default), gemini-3.1-flash-lite (cheapest), or gemini-2.5-flash / gemini-2.5-pro (current as of June 2026; check the model list for newer GA models).
  • Embedding model: gemini-embedding-001, default output 3072 dims - set GM_EMBEDDER_DIM=3072 (the dimension must match exactly or ingestion fails).

Embeddings note (all providers). greatmemory requires the returned vector length to equal GM_EMBEDDER_DIM exactly and does not request a reduced dimension, so pick a model whose default output matches the dim you set. Changing the embedding model needs a fresh data dir (existing vectors won't match a new model).
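The exact-match requirement amounts to a length check on every returned vector. A small Python sketch of that check, with a hypothetical function name and error message, looks like this:

```python
def check_dim(vector, expected_dim: int):
    """Reject any embedding whose length differs from GM_EMBEDDER_DIM."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dims, expected {expected_dim}"
        )
    return vector
```

Under this rule, a model that returns 3072-dim vectors while GM_EMBEDDER_DIM=768 fails on the first ingested document instead of storing mismatched vectors.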

CLI client variables

The client commands (add, search, facts, status) talk to a running server and use two variables of their own:

| Variable | Default | Purpose |
| --- | --- | --- |
| GM_URL | http://127.0.0.1:7437 | Server base URL |
| GM_API_KEY | (unset) | Bearer token sent when the server has GM_API_KEYS configured |
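A custom client can honor the same two variables. The Python sketch below only builds a request object and does not send it; build_request is a hypothetical helper, not part of greatmemory, and it uses /v1/healthz because that is the one route named on this page.

```python
import os
import urllib.request


def build_request(path: str, environ=os.environ) -> urllib.request.Request:
    """Build a request against GM_URL, adding a bearer token if GM_API_KEY is set."""
    base = environ.get("GM_URL", "http://127.0.0.1:7437").rstrip("/")
    request = urllib.request.Request(base + path)
    api_key = environ.get("GM_API_KEY")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


# Example with an explicit environment mapping instead of the real one.
example = build_request("/v1/healthz", {"GM_API_KEY": "gm_prod_key"})
```

Passing the environment as a parameter keeps the helper easy to test; in normal use the default os.environ is read.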

greatmemory.toml

All keys optional; this example shows everything, including the feature toggles:

host = "127.0.0.1"
port = 7437
data_dir = "./.greatmemory"
# db = "postgres://gm:password@localhost:5432/greatmemory"
cors_origins = ["http://localhost:3000"]
api_keys = ["gm_change_me"]

[embedder]
kind = "fastembed"          # fastembed | ollama | openai | fake
# base_url = "http://127.0.0.1:11434"
# api_key = "sk-..."
# model = "nomic-embed-text"
# dim = 768

[llm]
kind = "none"               # none | ollama | openai
# base_url = "http://127.0.0.1:11434"
# api_key = "sk-..."
# model = "llama3"

[rerank]
kind = "none"               # none | noop | lexical

[graph]
backend = "store"           # store | mysql://… | postgres://… | sqlite path | :memory:

[cards]
semantic_linking = false

[features]
reflection = true           # prospective reflection (RMM); needs an LLM
usefulness = true           # Live-Evo usefulness reweighting
trust = true                # trust gating (InjecMEM)

graph_expand_hops = 0       # graph expansion during retrieval (0 = off)

Auth & CORS

Set GM_API_KEYS (or api_keys in toml) to one or more bearer keys. When the list is non-empty, every /v1 route except /v1/healthz, as well as the /mcp endpoint, requires Authorization: Bearer <key>. By default (empty GM_CORS_ORIGINS) the server allows any localhost origin; set an explicit comma-separated list to replace that policy entirely (the permissive localhost default is then disabled). See API keys & air-gapped use for rotation and offline operation.

GM_API_KEYS=gm_prod_key GM_CORS_ORIGINS=https://app.example.com gmem serve
GM_API_KEY=gm_prod_key gmem status   # client side
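The auth and CORS rules above can be modeled as two small predicates. This Python sketch is an approximation for reasoning about the policy, not the server's code: the function names are hypothetical, matching /v1 routes by prefix is an assumption, and so is the set of hosts treated as "localhost".

```python
import hmac
from urllib.parse import urlsplit

# Assumption: what "any localhost origin" covers.
_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_authorized(path, authorization, api_keys) -> bool:
    """True if a request to `path` passes the bearer-key check."""
    if not api_keys:  # empty GM_API_KEYS means no auth
        return True
    if path == "/v1/healthz":  # the one exempt /v1 route
        return True
    if not (path.startswith("/v1/") or path == "/mcp"):
        return True  # outside the gated routes (assumption of this sketch)
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return False
    token = authorization[len(prefix):].encode()
    # Constant-time comparison against each configured key.
    return any(hmac.compare_digest(token, key.encode()) for key in api_keys)


def origin_allowed(origin: str, allowed_origins) -> bool:
    """Exact-match list if configured; otherwise any localhost origin."""
    if allowed_origins:
        return origin in allowed_origins
    return urlsplit(origin).hostname in _LOCAL_HOSTS
```

Note how a non-empty origin list replaces the localhost default entirely: with GM_CORS_ORIGINS=https://app.example.com, an origin of http://localhost:3000 is no longer accepted.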