Agents

Custom agents (REST)

For chat apps, custom agents, and anything else that speaks HTTP, use the REST API directly. Three calls cover the whole memory loop:

  1. POST /v1/memories — store what the user said worth keeping
  2. POST /v1/search with mode: "context" — fetch a ready-to-inject context block before each model call
  3. POST /v1/search with mode: "recall" — raw scored chunks + facts if you assemble prompts yourself
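The two search calls differ only in their JSON body, which can be sketched as a small helper. The function name `search_body` is hypothetical, and restricting `mode` to the two values named above is this sketch's choice, not necessarily the server's. Whether `max_tokens` also applies in recall mode is not stated here, so the helper sends it only when given.

```python
from typing import Any, Dict, Optional


def search_body(query: str, space: str, mode: str = "recall",
                max_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/search (hypothetical helper)."""
    # Assumption: only the two modes named in the list above are accepted here.
    if mode not in ("recall", "context"):
        raise ValueError(f"unsupported mode: {mode!r}")
    body: Dict[str, Any] = {"query": query, "space": space, "mode": mode}
    # Only include max_tokens when the caller asked for a budget.
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body


# Raw scored chunks + facts, for callers that assemble prompts themselves.
recall_body = search_body("user preferences", "user-42")
# Ready-to-inject context block, capped at 1000 tokens.
context_body = search_body("user preferences", "user-42",
                           mode="context", max_tokens=1000)
```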

Start the server with gmem serve (default http://localhost:7437). CORS allows any localhost origin by default, so browser-based local apps work without configuration; for deployed origins, set GM_CORS_ORIGINS. When GM_API_KEYS is set, send Authorization: Bearer <key> with every request.
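As a minimal sketch of the authentication rule, the helper below attaches the Bearer header only when a key is supplied. It uses the standard library so it runs without dependencies; `build_request` is a hypothetical name and `gm_example` is a placeholder, not a real key.

```python
import json
import urllib.request
from typing import Optional


def build_request(base_url: str, path: str, payload: dict,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Prepare a JSON POST, adding Authorization only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Needed on every request once GM_API_KEYS is set on the server.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_request(
    "http://localhost:7437", "/v1/memories",
    {"content": "User prefers concise answers.", "space": "user-42"},
    api_key="gm_example",  # placeholder key for illustration
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```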

Use one space per end user (or per agent) to keep memories isolated.
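One way to keep that isolation reliable is to derive the space name from your own user ID in a single place. The sketch below is illustrative: `space_for` is a hypothetical helper, and the allowed character set is a conservative assumption, since the rules for space names are not stated here. It rejects unexpected IDs rather than rewriting them, because lossy rewriting could map two different users onto the same space.

```python
import re

# Assumption: letters, digits, underscore, and hyphen are safe in a space name.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def space_for(user_id: str, prefix: str = "user") -> str:
    """Map an application user ID to a space name such as 'user-42'."""
    if not _SAFE_ID.fullmatch(user_id):
        # Refuse instead of sanitizing, so distinct IDs never collide.
        raise ValueError(f"unsupported user id: {user_id!r}")
    return f"{prefix}-{user_id}"


space = space_for("42")
agent_space = space_for("7", prefix="agent")
```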

curl

# Add a memory
curl -s http://localhost:7437/v1/memories \
  -H 'Content-Type: application/json' \
  -d '{"content": "User prefers concise answers and works in UTC+5:30.", "space": "user-42"}'

# Search (scored chunks + facts)
curl -s http://localhost:7437/v1/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "user preferences", "space": "user-42"}'

# Ready-to-inject context block
curl -s http://localhost:7437/v1/search \
  -H 'Content-Type: application/json' \
  -d '{"query": "user preferences", "space": "user-42", "mode": "context", "max_tokens": 1000}'

# All active facts grouped by predicate
curl -s 'http://localhost:7437/v1/profile?space=user-42'

Python (requests)

import requests

GM = "http://localhost:7437"
SPACE = "user-42"
session = requests.Session()
# When GM_API_KEYS is set, attach the key to every request:
# session.headers["Authorization"] = "Bearer gm_..."

def remember(content: str) -> str:
    r = session.post(f"{GM}/v1/memories", json={"content": content, "space": SPACE})
    r.raise_for_status()
    return r.json()["id"]  # 202: ingestion continues in the background

def memory_context(query: str, max_tokens: int = 1000) -> str:
    r = session.post(f"{GM}/v1/search", json={
        "query": query, "space": SPACE, "mode": "context", "max_tokens": max_tokens,
    })
    r.raise_for_status()
    return r.json()["context"]

def profile() -> dict:
    r = session.get(f"{GM}/v1/profile", params={"space": SPACE})
    r.raise_for_status()
    return r.json()["facts"]

# Before each LLM call (user_message holds the incoming user turn):
context = memory_context(user_message)
system_prompt = f"You have these memories about the user:\n{context}"
# ... call your model with system_prompt ...

# After the turn, store anything durable:
remember("User's name is Priya; she is building a Flutter app.")

TypeScript (fetch)

const GM = "http://localhost:7437";
const SPACE = "user-42";
// Add { Authorization: "Bearer gm_..." } to headers when GM_API_KEYS is set.

async function remember(content: string): Promise<string> {
  const res = await fetch(`${GM}/v1/memories`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content, space: SPACE }),
  });
  if (!res.ok) throw new Error(`remember failed: ${res.status}`);
  const { id } = await res.json();
  return id;
}

async function memoryContext(query: string, maxTokens = 1000): Promise<string> {
  const res = await fetch(`${GM}/v1/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, space: SPACE, mode: "context", max_tokens: maxTokens }),
  });
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  const { context } = await res.json();
  return context;
}

async function memoryProfile(): Promise<Record<string, unknown>> {
  const res = await fetch(`${GM}/v1/profile?space=${encodeURIComponent(SPACE)}`);
  if (!res.ok) throw new Error(`profile failed: ${res.status}`);
  const { facts } = await res.json();
  return facts;
}

// Inject memoryContext(userMessage) into your system prompt before each call,
// and remember(...) durable details after each turn.

Generated clients

The server publishes its OpenAPI spec — generated from the same code that serves it — at:

GET /v1/openapi.json

Point any OpenAPI client generator at it to get a typed client for your language. The full HTTP surface (memories CRUD, search, facts, profile, spaces, stats, health) is in the spec.
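Before running a generator, it can help to see which operations a spec exposes. The sketch below walks the standard OpenAPI `paths` object of an already-parsed spec; `list_operations` is a hypothetical helper, and the `sample` dictionary is a stand-in built from endpoints mentioned on this page, not the real contents of /v1/openapi.json.

```python
from typing import Dict, List, Tuple

# Keys inside a path item that denote operations, per the OpenAPI specification.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: Dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from a parsed OpenAPI document."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


sample = {  # stand-in for json.load() of the downloaded spec
    "paths": {
        "/v1/memories": {"post": {}},
        "/v1/search": {"post": {}},
        "/v1/profile": {"get": {}, "parameters": []},
    }
}
operations = list_operations(sample)
```

Against a running server, the same function could be fed the parsed response of GET /v1/openapi.json.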

Tips

  • Adds are asynchronous (202): the call returns before ingestion finishes, and a memory becomes searchable only once it has been chunked and embedded. Avoid adding content and then immediately searching for it in a tight loop.
  • Facts need an LLM: configure GM_LLM (Ollama or any OpenAI-compatible API) to get distilled subject/predicate/object facts and the /v1/profile view. Without one, storage and hybrid search still work fully.
  • Official TypeScript and Python SDKs generated from the OpenAPI spec are planned for v0.2; until then the snippets above are the whole integration.
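When a test or script really does need to read back a memory it just added, the first tip suggests polling with a pause rather than a tight loop. A minimal sketch follows; `wait_until` is a hypothetical helper, and the timeout and interval defaults are illustrative rather than tuned to the server's ingestion time.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.25, max_interval: float = 2.0) -> bool:
    """Call check() with growing pauses until it returns True or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep, but never past the deadline.
        time.sleep(min(interval, remaining))
        # Double the pause each round, up to max_interval.
        interval = min(interval * 2, max_interval)
```

With the Python snippet above, a check might look like `wait_until(lambda: "Priya" in memory_context("user name"))`, assuming the stored text appears in the returned context block.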