Deployment

Google Cloud

greatmemory ships as a single container: one image (~158 MB), one port (7437), and either one volume (/data) or a Postgres URL (GM_DB). Any Google Cloud service that runs containers will work. Cloud Run with Cloud SQL is the recommended path; a GCE VM with Docker Compose is the alternative when you want SQLite on a stateful disk.

Environment variables

Variable               Purpose
GM_HOST                Bind address; the published image already sets 0.0.0.0
GM_PORT                Bind/container port (default 7437)
GM_DATA_DIR            Data directory; the image sets /data (SQLite db + embedding model cache)
GM_DB                  Postgres URL (postgres://...) to use Cloud SQL instead of SQLite
GM_DB_ASSUME_PGVECTOR  Set to 1 when pgvector is already installed and the GM_DB role cannot run CREATE EXTENSION
GM_API_KEYS            Comma-separated bearer keys; required on any non-loopback bind
GM_CORS_ORIGINS        Exact allowed origins; replaces the permissive localhost default
GM_EMBEDDER            fastembed (default, local ONNX) | ollama | openai
GM_LLM                 none (default) | ollama | openai; ollama or openai enables fact extraction
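You can check the rules in this table before you deploy. The sketch below is a hypothetical pre-flight helper. `check_env` is not part of greatmemory. It assumes an unset GM_HOST means the published image's 0.0.0.0 default, and it checks only the values the table lists.

```python
import ipaddress

# Allowed values, copied from the environment-variable table.
EMBEDDERS = {"fastembed", "ollama", "openai"}
LLMS = {"none", "ollama", "openai"}


def is_loopback(host: str) -> bool:
    """True for "localhost" or a loopback IP literal such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname: treat the bind as exposed


def check_env(env: dict[str, str]) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty = OK)."""
    problems = []
    # Assumption: an unset GM_HOST means the published image's 0.0.0.0.
    host = env.get("GM_HOST", "0.0.0.0")
    keys = [k.strip() for k in env.get("GM_API_KEYS", "").split(",") if k.strip()]
    if not is_loopback(host) and not keys:
        problems.append(f"GM_API_KEYS is required when binding to {host}")
    port = env.get("GM_PORT", "7437")
    if not port.isdecimal() or not 0 < int(port) < 65536:
        problems.append(f"GM_PORT is not a valid port: {port!r}")
    if "GM_DB" in env and not env["GM_DB"].startswith("postgres://"):
        problems.append("GM_DB must be a postgres:// URL")
    if env.get("GM_EMBEDDER", "fastembed") not in EMBEDDERS:
        problems.append(f"GM_EMBEDDER must be one of {sorted(EMBEDDERS)}")
    if env.get("GM_LLM", "none") not in LLMS:
        problems.append(f"GM_LLM must be one of {sorted(LLMS)}")
    return problems
```

For example, call check_env(dict(os.environ)) in a CI step or an entrypoint wrapper and stop if the list is non-empty. It only mirrors the table and does not replace whatever checks greatmemory itself performs.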

Path A: Cloud Run + Cloud SQL

Cloud Run's writable filesystem is in-memory and ephemeral. Files written to /data count against the instance's memory limit, and they disappear when the instance is recycled. On Cloud Run, SQLite is therefore not an option for durable state. Use Cloud SQL for PostgreSQL with the pgvector extension via GM_DB instead. (The /data model cache still works; it is simply re-downloaded on each new instance.)

  1. Create the database: a Cloud SQL for PostgreSQL instance (pgvector is supported on current versions). Enable the extension once in your database:
CREATE EXTENSION IF NOT EXISTS vector;

Your GM_DB role may be unable to run CREATE EXTENSION, for example because a DBA provisions extensions. In that case, have a privileged role create the extension once and set GM_DB_ASSUME_PGVECTOR=1. See Enterprise database (pgvector).

  2. Deploy:
gcloud run deploy greatmemory \
  --region <region> \
  --image <region>-docker.pkg.dev/<project>/<repo>/greatmemory:latest \
  --port 7437 \
  --cpu 1 --memory 2Gi \
  --min-instances 1 --max-instances 1 \
  --add-cloudsql-instances <project>:<region>:<instance> \
  --set-secrets GM_DB=gm-db-url:latest,GM_API_KEYS=gm-api-keys:latest \
  --set-env-vars GM_CORS_ORIGINS=https://app.example.com \
  --no-allow-unauthenticated   # or --allow-unauthenticated and rely on GM_API_KEYS

Notes:

  • --min-instances 1 matters. The embedding model loads into the instance's memory, and on a fresh instance it also downloads into the ephemeral /data. With scale-to-zero, every cold start pays that load. Keep one instance warm; a minimum instance is billed even while idle, which is the price of skipping those cold starts.
  • Connecting to Cloud SQL — two options:
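    • --add-cloudsql-instances (Unix socket): Cloud Run exposes the instance's socket directory at /cloudsql/<project>:<region>:<instance>. Your GM_DB URL must point at that directory instead of a TCP host. Postgres URLs usually carry a socket directory in the host query parameter, e.g. postgres://gm:pw@/gm?host=/cloudsql/<project>:<region>:<instance>. Access is controlled by IAM: the runtime service account needs the Cloud SQL Client role (roles/cloudsql.client). There are no IPs to manage.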
    • Private IP: give the Cloud SQL instance a private IP, attach the Cloud Run service to the same VPC (Direct VPC egress or a Serverless VPC Access connector), and use an ordinary postgres://gm:pw@10.x.x.x:5432/gm URL in GM_DB. This is the more conventional URL shape and avoids socket-path encoding.
  • Secrets: store the database URL and API keys in Secret Manager (--set-secrets), not plain env vars.
  • TLS is handled by Cloud Run's HTTPS endpoint automatically; the container speaks plain HTTP on 7437.
  • Health checks: point the startup/readiness probe at /v1/readyz (200 only once storage is reachable and migrated) and liveness at /v1/healthz; both are always unauthenticated. See Upgrades & migrations.
  • The deploy command above caps the service at one instance, but with Postgres several instances are safe. The engine runs fine with multiple replicas on one database, and each replica loads its own copy of the embedding model. Keep --max-instances 1 only if you have a reason to serialize writes.
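The fiddly part of the socket option is building the URL, especially when the password contains characters like @, : or / that must be percent-encoded. Below is a minimal sketch using Python's urllib.parse. It assumes greatmemory's Postgres driver accepts the libpq-style host query parameter for a socket directory. The helper names are hypothetical.

```python
from urllib.parse import quote


def _enc(part: str) -> str:
    # Percent-encode every reserved character, so "@", ":" and "/" in a password are safe.
    return quote(part, safe="")


def socket_url(user: str, password: str, db: str, connection_name: str) -> str:
    """GM_DB for --add-cloudsql-instances: socket directory in ?host=, no TCP host."""
    socket_dir = f"/cloudsql/{connection_name}"
    # "/" and ":" are legal in a query value, so the socket path stays readable.
    return f"postgres://{_enc(user)}:{_enc(password)}@/{_enc(db)}?host={quote(socket_dir, safe='/:')}"


def private_ip_url(user: str, password: str, db: str, ip: str, port: int = 5432) -> str:
    """GM_DB for a Cloud SQL private IP reached over the VPC."""
    return f"postgres://{_enc(user)}:{_enc(password)}@{ip}:{port}/{_enc(db)}"
```

For example, socket_url("gm", "p@ss:w/rd", "gm", "my-proj:us-central1:gm-pg") yields postgres://gm:p%40ss%3Aw%2Frd@/gm?host=/cloudsql/my-proj:us-central1:gm-pg. All the names in that call are placeholders. Store the resulting URL in Secret Manager as gm-db-url, the secret the deploy command maps to GM_DB.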

Path B: GCE VM + Docker Compose (stateful disk)

If you'd rather have SQLite and a real disk, a small Compute Engine VM is the straightforward alternative:

  1. Create a VM (e2-medium is plenty) with an attached persistent disk. Format the disk and mount it at /mnt/disks/gm-data, then install Docker and the Compose plugin.
  2. Run greatmemory with the data dir on the persistent disk:
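# /opt/greatmemory/docker-compose.yml
services:
  greatmemory:
    image: greatmemory          # your pushed image tag (e.g. in Artifact Registry)
    restart: unless-stopped
    ports:
      - "7437:7437"             # use "127.0.0.1:7437:7437" if a proxy on this VM terminates TLS
    volumes:
      - /mnt/disks/gm-data:/data
    environment:
      GM_API_KEYS: ${GM_API_KEYS}   # interpolated from /opt/greatmemory/.env or the shell
      GM_CORS_ORIGINS: https://app.example.com

Start it from the compose directory and check readiness:

cd /opt/greatmemory
docker compose up -d
curl -fsS http://127.0.0.1:7437/v1/readyz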
  3. Expose it: keep the VM private and reach it over the VPC/IAP, or put an HTTPS load balancer (or nginx/Caddy on the VM) in front for TLS. If you open port 7437 in a firewall rule, restrict the source ranges. GM_API_KEYS is mandatory either way.
  4. Backups: schedule persistent-disk snapshots; the whole SQLite store is one file (/mnt/disks/gm-data/greatmemory.db).
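A persistent-disk snapshot taken while greatmemory is writing is crash-consistent, and SQLite is designed to recover from that state. You may also want a standalone copy of the database file, for example to restore onto another VM. SQLite's online backup API can make one while the service keeps running. Below is a sketch with Python's standard sqlite3 module. The function name and the destination path in the usage note are hypothetical.

```python
import os
import sqlite3


def backup_sqlite(src: str, dest: str) -> str:
    """Copy the live SQLite store at src to dest; return PRAGMA integrity_check."""
    if not os.path.exists(src):
        # sqlite3.connect would silently create an empty database instead.
        raise FileNotFoundError(src)
    src_conn = sqlite3.connect(src)
    dest_conn = sqlite3.connect(dest)
    try:
        # With the default pages=-1 the whole database is copied in one step,
        # so dest is a consistent snapshot; busy/locked steps are retried.
        src_conn.backup(dest_conn)
        (status,) = dest_conn.execute("PRAGMA integrity_check").fetchone()
        return status
    finally:
        dest_conn.close()
        src_conn.close()
```

For example, run backup_sqlite("/mnt/disks/gm-data/greatmemory.db", "/var/backups/greatmemory.db") from cron on the VM. It returns "ok" when the copy passes the integrity check. Keep the copy off the data disk if it is meant to survive losing that disk.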

Storing the API key in Secret Manager

Create the secret once:

printf 'gm_%s' "$(openssl rand -hex 32)" | \
  gcloud secrets create gm-api-keys --data-file=-

Cloud Run: the --set-secrets GM_API_KEYS=gm-api-keys:latest flag in Path A exposes the secret to the container as the GM_API_KEYS environment variable. The service's runtime service account needs access:

gcloud secrets add-iam-policy-binding gm-api-keys \
  --member "serviceAccount:<runtime-sa>@<project>.iam.gserviceaccount.com" \
  --role roles/secretmanager.secretAccessor

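Cloud Run resolves :latest into the environment variable when an instance starts, so rotating a key takes two rounds. First, add a new version holding the old and new keys comma-separated, and deploy a new revision. Once clients use the new key, add a final version without the old key and deploy again.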
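The rotation comes down to computing two secret values. Below is a sketch in Python that assumes you retire one key at a time. new_key reproduces the openssl format (gm_ plus 64 hex characters). Both function names are hypothetical.

```python
import secrets


def new_key() -> str:
    # 32 random bytes as 64 lowercase hex characters, like `openssl rand -hex 32`.
    return "gm_" + secrets.token_hex(32)


def rotation_values(current: str, retiring: str, new: str) -> tuple[str, str]:
    """Return (transition, final) values for the gm-api-keys secret.

    transition: every current key plus the new one, so clients can switch.
    final: the current keys minus the retiring one, plus the new one.
    """
    keys = [k.strip() for k in current.split(",") if k.strip()]
    if retiring not in keys:
        raise ValueError("the key to retire is not in the current secret value")
    transition = ",".join(keys + [new])
    final = ",".join([k for k in keys if k != retiring] + [new])
    return transition, final
```

Add each value as a new version with gcloud secrets versions add gm-api-keys --data-file=-. Pipe it without a trailing newline, as the printf example does, and deploy a new revision after each.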

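GCE: fetch the secret at boot with the VM's service account and write it into the .env file that Compose reads. This needs the same IAM binding, and the VM also needs the cloud-platform access scope: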

umask 077
echo "GM_API_KEYS=$(gcloud secrets versions access latest \
  --secret gm-api-keys)" > /opt/greatmemory/.env

Security checklist

  • GM_API_KEYS set (long random values, in Secret Manager) — never expose the port without it
  • TLS terminated by Cloud Run / a load balancer; the container itself speaks plain HTTP
  • GM_CORS_ORIGINS set to your real origins
  • Cloud SQL reachable only via the connector or private IP — no public IP on the database
  • Durable state in Cloud SQL automated backups or persistent-disk snapshots
  • /v1/readyz for probes; watch rss_bytes from /v1/stats — it should stay flat
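You can also watch the probe and stats endpoints from a script, for instance after a deploy. Below is a sketch using only the Python standard library. It assumes /v1/stats returns a JSON object with a top-level numeric rss_bytes field and accepts the same bearer key as other API routes. The base URL, timeouts and the growth_ratio helper are illustrative choices. A Cloud Run service deployed with --no-allow-unauthenticated would also need an identity token, which this sketch does not handle.

```python
import json
import time
import urllib.request


def wait_ready(base: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll {base}/v1/readyz until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base}/v1/readyz", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTPError (e.g. not yet migrated).
            pass
        time.sleep(interval)
    return False


def rss_bytes(base: str, api_key: str) -> int:
    """Read rss_bytes from /v1/stats (assumed JSON shape and bearer auth)."""
    req = urllib.request.Request(
        f"{base}/v1/stats", headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return int(json.load(resp)["rss_bytes"])


def growth_ratio(samples: list[int]) -> float:
    """Last sample relative to the first; close to 1.0 means RSS is flat."""
    return samples[-1] / samples[0]
```

One way to apply the checklist is to sample rss_bytes every few minutes under normal traffic and alert if growth_ratio keeps climbing.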