Google Vertex AI

Use Vertex AI as greatmemory's LLM and embedder via the OpenAI-compatible endpoint - models, OAuth token auth, and embedding dimensions.

Use Google Vertex AI as greatmemory's LLM (fact extraction + prospective reflection) and, optionally, its embedder. Vertex exposes an OpenAI-compatible endpoint, so greatmemory talks to it through the openai provider kind - no code changes, just configuration.

Prefer a plain API key and no Google Cloud project? Use the Gemini API (Google AI Studio) integration, the simpler sibling of this page.

How it fits

greatmemory sends POST requests to {GM_LLM_URL}/chat/completions and {GM_EMBEDDER_URL}/embeddings with an Authorization: Bearer token. Each GM_*_URL is the full versioned base URL and greatmemory appends only the operation path, so Vertex's …/endpoints/openapi base works unchanged.
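Before starting greatmemory, it can help to send the same kind of request by hand and confirm that the base URL and token are accepted. The sketch below assumes BASE is built as in the Configuration section; vertex_chat_smoke is a hypothetical helper name, not part of greatmemory, and the request body is a minimal OpenAI-style chat payload.

```shell
# Hypothetical helper (not part of greatmemory): send one chat request to the
# Vertex OpenAI-compatible endpoint with the same shape greatmemory uses:
# POST {base}/chat/completions with an "Authorization: Bearer" header.
# Arguments: base URL, OAuth access token, optional model ID.
vertex_chat_smoke() {
  local base="$1" token="$2" model="${3:-google/gemini-2.5-flash}"
  curl -sS "$base/chat/completions" \
    -H "Authorization: Bearer $token" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$model\",\"messages\":[{\"role\":\"user\",\"content\":\"Reply with OK\"}]}"
}

# Usage, with BASE set as in the Configuration section:
#   vertex_chat_smoke "$BASE" "$(gcloud auth print-access-token)"
```

A JSON response containing a choices array means the URL, token, and model ID are all accepted; a 401 or 403 usually points at the token or project permissions.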

Prerequisites

  • A Google Cloud project with the Vertex AI API enabled, a region (e.g. us-central1), and an identity allowed to call the Gemini models you intend to use (for example via the Vertex AI User role, roles/aiplatform.user).
  • The gcloud CLI, authenticated with gcloud auth login.

Authentication

Vertex uses a short-lived OAuth access token as the bearer - there is no static API key. Mint one with:

gcloud auth print-access-token

The token expires after ~1 hour. That makes Vertex a great fit for experiments and batch ingestion; for a long-running server, restart with a fresh token or front it with a token-refreshing proxy. (Need a static credential instead? Use the Gemini API key.)
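One way to implement "restart with a fresh token" is a small wrapper that mints a new token for each run. This is a sketch under assumptions: run_with_fresh_token is a hypothetical name, it sets both GM_LLM_API_KEY and GM_EMBEDDER_API_KEY (harmless if the embedder is not Vertex), and it relies on coreutils timeout to stop the server before the token expires.

```shell
# Hypothetical wrapper (not part of greatmemory): mint a fresh OAuth token and
# pass it to both the LLM and embedder API-key variables of one command.
run_with_fresh_token() {
  local token
  token=$(gcloud auth print-access-token) || return 1
  GM_LLM_API_KEY="$token" GM_EMBEDDER_API_KEY="$token" "$@"
}

# Usage: restart the server every 50 minutes so it never holds a token older
# than the ~1 hour lifetime (export the other GM_* variables beforehand):
#   while :; do run_with_fresh_token timeout 50m gmem serve; done
```

Each restart briefly drops the server, so this suits batch ingestion better than latency-sensitive serving; a token-refreshing proxy avoids the gap.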

Configuration

LOCATION must appear in both the host and the path:

PROJECT=my-gcp-project
LOCATION=us-central1
BASE=https://$LOCATION-aiplatform.googleapis.com/v1/projects/$PROJECT/locations/$LOCATION/endpoints/openapi

# LLM: fact extraction + prospective reflection
GM_LLM=openai \
GM_LLM_URL=$BASE \
GM_LLM_API_KEY=$(gcloud auth print-access-token) \
GM_LLM_MODEL=google/gemini-2.5-flash \
gmem serve

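To use Vertex for embeddings as well, add the embedder variables to the same command. text-embedding-005 returns 768-dimensional vectors, so set GM_EMBEDDER_DIM=768:

GM_LLM=openai \
GM_LLM_URL=$BASE \
GM_LLM_API_KEY=$(gcloud auth print-access-token) \
GM_LLM_MODEL=google/gemini-2.5-flash \
GM_EMBEDDER=openai \
GM_EMBEDDER_URL=$BASE \
GM_EMBEDDER_API_KEY=$(gcloud auth print-access-token) \
GM_EMBEDDER_MODEL=google/text-embedding-005 \
GM_EMBEDDER_DIM=768 \
gmem serve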

A global location works too - drop the region prefix from the host: https://aiplatform.googleapis.com/v1/projects/$PROJECT/locations/global/endpoints/openapi.
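The two URL forms differ only in the host prefix, so they can be built by one function. A minimal sketch; vertex_base is a hypothetical helper name, not something greatmemory or gcloud provides.

```shell
# Hypothetical helper (not part of greatmemory): print the Vertex
# OpenAI-compatible base URL for a project and location. Regional locations
# prefix the host with the region; "global" uses the bare host.
vertex_base() {
  local project="$1" location="$2" host
  if [ "$location" = "global" ]; then
    host="aiplatform.googleapis.com"
  else
    host="${location}-aiplatform.googleapis.com"
  fi
  printf 'https://%s/v1/projects/%s/locations/%s/endpoints/openapi\n' \
    "$host" "$project" "$location"
}

# Usage:
#   BASE=$(vertex_base my-gcp-project us-central1)
```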

Models

Model IDs take a google/ prefix on Vertex. These are current as of June 2026; check the Vertex model list for newer GA models in your region.

| Role | Model value | Notes |
| --- | --- | --- |
| Chat (default) | google/gemini-2.5-flash | Good price/performance for JSON extraction |
| Chat (cheapest) | google/gemini-2.5-flash-lite | Lowest cost |
| Chat (highest quality) | google/gemini-2.5-pro | Complex reasoning |
| Embeddings | google/text-embedding-005 | Default 768 dims |
| Embeddings (large) | google/gemini-embedding-001 | Default 3072 dims |
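Because GM_EMBEDDER_DIM has to equal the model's default dimension, it can be derived from the model name instead of typed by hand. A sketch covering only the two embedding models listed in the table; embed_dim_for is a hypothetical helper name.

```shell
# Hypothetical helper (not part of greatmemory): print the default output
# dimension of the Vertex embedding models listed in the Models table, so
# GM_EMBEDDER_DIM can be derived from GM_EMBEDDER_MODEL.
embed_dim_for() {
  case "$1" in
    google/text-embedding-005)   echo 768 ;;
    google/gemini-embedding-001) echo 3072 ;;
    *) echo "unknown embedding model: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   GM_EMBEDDER_MODEL=google/text-embedding-005
#   GM_EMBEDDER_DIM=$(embed_dim_for "$GM_EMBEDDER_MODEL")
```

Failing on unknown models, rather than guessing a size, keeps a new model from silently starting with a mismatched dimension.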

Ingest large Google Cloud sources

The Vertex settings above only choose the LLM and embedder. The source data can come from any Google Cloud service, as long as you extract its text and POST it to greatmemory's /v1/memories endpoint. Adds are asynchronous, so a simple sequential loop like the ones below is usually enough for batch backfills.
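The Cloud Storage, BigQuery, and Cloud SQL loops below each end in the same curl call. A sketch of factoring it into one helper that also reports HTTP errors and retries transient failures; gm_post is a hypothetical name, and it assumes GM_URL is exported as in those examples.

```shell
# Hypothetical helper (not part of greatmemory): POST one JSON add request,
# read from stdin, to {GM_URL}/v1/memories. --fail turns HTTP errors into a
# non-zero exit status; --retry 3 re-sends on what curl treats as transient
# errors (timeouts and certain 4xx/5xx responses).
gm_post() {
  curl -sS --fail --retry 3 "$GM_URL/v1/memories" \
    -H 'Content-Type: application/json' \
    -d @- >/dev/null
}

# Usage: replace the final curl stage of a loop with
#   jq -n ... | gm_post || echo "add failed" >&2
```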

Cloud Storage documents

Use gcloud storage cp to copy a bucket prefix locally, then POST each text-like file into a dedicated space. The Google Cloud CLI documents gcloud storage cp as copying data between Cloud Storage and the local file system.

export GM_URL=http://127.0.0.1:7437
export SPACE=gcp-knowledge-base
BUCKET=my-company-docs

mkdir -p /tmp/gm-gcs
gcloud storage cp --recursive "gs://$BUCKET/policies" /tmp/gm-gcs

find /tmp/gm-gcs -type f \( -name '*.md' -o -name '*.txt' -o -name '*.json' -o -name '*.csv' \) -print0 |
while IFS= read -r -d '' file; do
  # Record the full gs:// URI as provenance (cp recreates policies/ locally)
  jq -n --rawfile content "$file" \
    --arg space "$SPACE" \
    --arg source "gs://$BUCKET/${file#/tmp/gm-gcs/}" \
    '{space:$space, content:("SOURCE: " + $source + "\n\n" + $content)}' |
  curl -fsS "$GM_URL/v1/memories" \
    -H 'Content-Type: application/json' \
    -d @- >/dev/null
done

For PDFs, DOCX, or slide decks, run your normal extractor first (for example pdftotext for PDF) and POST the extracted text. Keep the object path in the content prefix so retrieved memories retain provenance.
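The extractor step can be wrapped in one dispatch function so the Cloud Storage loop handles mixed file types. A sketch assuming pdftotext (from poppler-utils) and pandoc are installed; extract_text is a hypothetical helper name, and other formats such as slide decks still need their own extractor.

```shell
# Hypothetical helper (not part of greatmemory): print the plain text of one
# downloaded file. Assumes pdftotext (poppler-utils) for PDF and pandoc for
# DOCX; unknown extensions fail instead of posting binary data.
extract_text() {
  case "$1" in
    *.md|*.txt|*.json|*.csv) cat -- "$1" ;;
    *.pdf)  pdftotext -layout "$1" - ;;
    *.docx) pandoc -t plain "$1" ;;
    *) echo "no extractor for: $1" >&2; return 1 ;;
  esac
}

# Usage inside the Cloud Storage loop, instead of --rawfile:
#   extract_text "$file" | jq -Rs --arg space "$SPACE" --arg source "$src" \
#     '{space:$space, content:("SOURCE: " + $source + "\n\n" + .)}'
```

Piping the text into jq -Rs, rather than passing it with --arg, avoids command-line length limits on large documents.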

BigQuery rows

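For analytical records, run a query with the bq command-line tool, split its JSON result into one row per line with jq, and store one memory per row or per grouped entity. For very large tables, export them instead with bq extract --destination_format=NEWLINE_DELIMITED_JSON to Cloud Storage, download the files, and feed their lines into the same loop.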

export GM_URL=http://127.0.0.1:7437
export SPACE=gcp-bigquery

# bq query prints at most 100 rows unless --max_rows is raised
bq query \
  --use_legacy_sql=false \
  --format=json \
  --max_rows=100000 \
  'SELECT customer_id, updated_at, summary
   FROM `my_project.support.case_summaries`
   WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)' |
jq -c '.[]' |
while read -r row; do
  customer=$(jq -r '.customer_id' <<<"$row")
  jq -n \
    --arg space "$SPACE" \
    --arg customer "$customer" \
    --argjson row "$row" \
    '{space:$space, content:("BigQuery customer case summary for " + $customer + ":\n" + ($row|tojson))}' |
  curl -fsS "$GM_URL/v1/memories" \
    -H 'Content-Type: application/json' \
    -d @- >/dev/null
done

Cloud SQL for PostgreSQL

For operational data in Cloud SQL, connect with psql and emit JSON rows. Google documents Cloud SQL for PostgreSQL connections with the psql client; from there, the pipeline is the same as any Postgres source.

export GM_URL=http://127.0.0.1:7437
export SPACE=gcp-cloudsql
export PGHOST=<cloud-sql-host-or-connector-host>
export PGDATABASE=appdb
export PGUSER=readonly

psql -At -c "
  select json_build_object(
    'account_id', account_id,
    'title', title,
    'notes', notes,
    'updated_at', updated_at
  )
  from account_notes
  where updated_at > now() - interval '180 days';
" |
while read -r row; do
  account=$(jq -r '.account_id' <<<"$row")
  jq -n \
    --arg space "$SPACE" \
    --arg account "$account" \
    --argjson row "$row" \
    '{space:$space, content:("Cloud SQL account note for " + $account + ":\n" + ($row|tojson))}' |
  curl -fsS "$GM_URL/v1/memories" \
    -H 'Content-Type: application/json' \
    -d @- >/dev/null
done

Managed ETL with Dataflow

For scheduled or high-volume imports, run the same extract-to-text pattern in Google Cloud Dataflow and store returned memory ids in a BigQuery manifest table. See Cloud ETL & data management for the Dataflow diagram, Apache Beam job shape, update strategy, and delete-by-manifest cleanup flow.

Notes & caveats

  • Token expiry (~1 hour) is the main operational caveat - see above.
  • Embedding dimension must match exactly. greatmemory validates that the returned vector length equals GM_EMBEDDER_DIM and does not request a reduced dimension, so set the dim to the model's default (768 for text-embedding-005, 3072 for gemini-embedding-001).
  • Switching embedders needs a fresh data dir - existing vectors won't match a new model.
  • If you'd rather not send embeddings through the cloud, keep the local fastembed embedder and use Vertex for the LLM only.

See the Configuration reference for the full GM_* variable table and for the feature toggles that depend on the LLM (such as reflection).