For chat apps, custom agents, and anything else that speaks HTTP, use the REST API directly. Three calls cover the whole memory loop:
- POST /v1/memories stores anything the user said that is worth keeping.
- POST /v1/search with mode: "context" fetches a ready-to-inject context block before each model call.
- POST /v1/search with mode: "recall" returns raw scored chunks and facts, for when you assemble prompts yourself.
Start the server with gmem serve (default http://localhost:7437). CORS allows any localhost origin by default, so browser-based local apps work without configuration; set GM_CORS_ORIGINS for deployed origins. When GM_API_KEYS is set, send Authorization: Bearer <key> with every request.
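One way to keep the bearer token out of individual calls is to build the headers once. The sketch below is illustrative only: the helper name auth_headers and the client-side variable GM_CLIENT_KEY are assumptions made for this example, not names the server defines (GM_API_KEYS is the server-side setting).

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, adding a bearer token only when a key is available."""
    # GM_CLIENT_KEY is a hypothetical client-side variable chosen for this sketch.
    key = api_key or os.environ.get("GM_CLIENT_KEY")
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

With requests, the result can be applied to a whole session via session.headers.update(auth_headers()), so every later call carries the key.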
Use one space per end user (or per agent) to keep memories isolated.
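A minimal sketch of deriving space names from stable internal IDs, assuming a "kind-id" pattern like the user-42 used in the examples here. The helper space_for and the character rules are assumptions for illustration; check which characters your server actually accepts in a space name.

```python
import re


def space_for(kind: str, identifier: str) -> str:
    """Return a space name such as 'user-42' for a user or agent ID."""
    # Reject unusual input instead of rewriting it, so two different IDs
    # can never collapse into the same space and leak memories across users.
    if not re.fullmatch(r"[a-z]+", kind):
        raise ValueError(f"unsupported kind: {kind!r}")
    if not re.fullmatch(r"[A-Za-z0-9_]+", identifier):
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return f"{kind}-{identifier}"
```

Prefer an immutable ID (a database key, for instance) over a display name or email, since a renamed user would otherwise lose access to their earlier memories.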
curl
# Add a memory
curl -s http://localhost:7437/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{"content": "User prefers concise answers and works in UTC+5:30.", "space": "user-42"}'

# Search (scored chunks + facts)
curl -s http://localhost:7437/v1/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "user preferences", "space": "user-42"}'

# Ready-to-inject context block
curl -s http://localhost:7437/v1/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "user preferences", "space": "user-42", "mode": "context", "max_tokens": 1000}'

# All active facts grouped by predicate
curl -s 'http://localhost:7437/v1/profile?space=user-42'
Python (requests)
import requests

GM = "http://localhost:7437"
SPACE = "user-42"

session = requests.Session()
# When GM_API_KEYS is set on the server, send a key with every request:
# session.headers["Authorization"] = "Bearer gm_..."


def remember(content: str) -> str:
    r = session.post(f"{GM}/v1/memories", json={"content": content, "space": SPACE})
    r.raise_for_status()
    return r.json()["id"]  # 202: ingestion continues in the background


def memory_context(query: str, max_tokens: int = 1000) -> str:
    r = session.post(f"{GM}/v1/search", json={
        "query": query, "space": SPACE, "mode": "context", "max_tokens": max_tokens,
    })
    r.raise_for_status()
    return r.json()["context"]


def profile() -> dict:
    r = session.get(f"{GM}/v1/profile", params={"space": SPACE})
    r.raise_for_status()
    return r.json()["facts"]


# Before each LLM call:
user_message = "What should I focus on today?"  # placeholder for the incoming user turn
context = memory_context(user_message)
system_prompt = f"You have these memories about the user:\n{context}"
# ... call your model with system_prompt ...

# After the turn, store anything durable:
remember("User's name is Priya; she is building a Flutter app.")
TypeScript (fetch)
const GM = "http://localhost:7437";
const SPACE = "user-42";
// Add { Authorization: "Bearer gm_..." } to headers when GM_API_KEYS is set.

async function remember(content: string): Promise<string> {
  const res = await fetch(`${GM}/v1/memories`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content, space: SPACE }),
  });
  if (!res.ok) throw new Error(`remember failed: ${res.status}`);
  const { id } = await res.json();
  return id;
}

async function memoryContext(query: string, maxTokens = 1000): Promise<string> {
  const res = await fetch(`${GM}/v1/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, space: SPACE, mode: "context", max_tokens: maxTokens }),
  });
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  const { context } = await res.json();
  return context;
}

async function memoryProfile(): Promise<Record<string, unknown>> {
  // Encode the space so names with reserved characters survive the query string.
  const res = await fetch(`${GM}/v1/profile?space=${encodeURIComponent(SPACE)}`);
  if (!res.ok) throw new Error(`profile failed: ${res.status}`);
  const { facts } = await res.json();
  return facts;
}

// Inject memoryContext(userMessage) into your system prompt before each call,
// and remember(...) durable details after each turn.
Generated clients
The server publishes its OpenAPI spec — generated from the same code that serves it — at:
GET /v1/openapi.json
Point any OpenAPI client generator at it to get a typed client for your language. The full HTTP surface (memories CRUD, search, facts, profile, spaces, stats, health) is in the spec.
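Before running a generator, it can help to see what the spec exposes. The sketch below fetches the document and lists its operations using only the standard library. The helper names fetch_spec and list_operations are made up for this example, and it assumes the spec follows the standard OpenAPI layout with a top-level "paths" object; if GM_API_KEYS is set, the request would also need the Authorization header.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_spec(base_url: str = "http://localhost:7437") -> dict:
    """Download and parse the OpenAPI document from a running server."""
    with urlopen(f"{base_url}/v1/openapi.json") as resp:
        return json.load(resp)


def list_operations(spec: dict) -> list:
    """Return (METHOD, path) pairs sorted by path, then method."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            # Path items may also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                ops.append((method.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))
```

With the server running, list_operations(fetch_spec()) gives a quick inventory to compare against whatever client the generator produces.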
Tips
- Adds are asynchronous (202): a memory becomes searchable a moment after the call returns, once it has been chunked and embedded. Don't add and immediately search the same content in a tight loop.
- Facts need an LLM: configure GM_LLM (Ollama or any OpenAI-compatible API) to get distilled subject/predicate/object facts and the /v1/profile view. Without one, storage and hybrid search still work fully.
- Official TypeScript and Python SDKs generated from the OpenAPI spec are planned for v0.2; until then the snippets above are the whole integration.
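When a test or script genuinely needs to read back a memory it has just added, polling with backoff is gentler than a tight loop. This is a generic sketch, not part of the API: wait_until is a hypothetical helper, and the timings are arbitrary starting points.

```python
import time


def wait_until(check, timeout=10.0, interval=0.25, max_interval=2.0):
    """Call check() until it returns a truthy value, doubling the pause each time.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval)
```

For example, after remember("User's name is Priya; ...") from the Python snippet above, a script could call wait_until(lambda: "Priya" in memory_context("user name")). That check assumes the context block repeats the stored wording, which may not hold for every query.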