Quickstart

The Mastra Memory Gateway is an OpenAI-compatible API proxy with built-in Observational Memory. Point any HTTP client, SDK, or framework at the gateway and every conversation is automatically remembered without any memory management code.

Create an account & get an API key

Go to gateway.mastra.ai and sign up for a Mastra account. During onboarding you'll receive a personal API key for authenticating requests. Copy the key to a safe location; you'll need it in the next step.

Make your first request

The gateway is compatible with the OpenAI Chat Completions API. Replace YOUR_API_KEY with your msk_-prefixed key. A successful response is a JSON object whose choices array contains the model's reply.

curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "Hello World!" }
    ]
  }'

The gateway routes the request to the model provider, returns the response, and automatically stores observations from the conversation.
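Because the gateway speaks the standard Chat Completions protocol, any HTTP client works. Here is a minimal Python sketch using only the standard library; the URL, headers, and body mirror the curl example above, and YOUR_API_KEY remains a placeholder. (The actual send is shown commented out so the snippet doesn't require a live key.)

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder -- substitute your real msk_ key

# Build the same request as the curl example above.
payload = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello World!"}],
}
req = urllib.request.Request(
    "https://gateway-api.mastra.ai/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# To send the request and read the reply from the choices array:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

Swapping in an OpenAI-compatible SDK instead of raw HTTP should also work, since only the base URL and API key differ from a direct OpenAI call.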

Add memory with thread and resource IDs

To enable persistent memory across requests, pass x-thread-id and x-resource-id headers. The gateway uses these to scope observations so that all requests with the same thread ID share the same conversation context.

curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: my-thread-1" \
  -H "x-resource-id: user-42" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "My name is Alex and I prefer concise answers." }
    ]
  }'

Send a follow-up request with the same x-thread-id. The gateway injects prior observations into the context automatically, so the model remembers details from earlier messages without your app managing any context.

curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: my-thread-1" \
  -H "x-resource-id: user-42" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "What is my name?" }
    ]
  }'

The model responds with "Alex" because the gateway loaded the observations from the previous request and injected them as context.
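The two-request pattern above generalizes into a small helper. This is a sketch under the same assumptions as the curl examples (standard library only; the build_chat_request function name is ours, while the endpoint and the x-thread-id / x-resource-id header names come from the examples above):

```python
import json
import urllib.request

GATEWAY_URL = "https://gateway-api.mastra.ai/v1/chat/completions"

def build_chat_request(api_key, thread_id, resource_id, content,
                       model="google/gemini-2.5-flash"):
    """Build a gateway request scoped to a memory thread.

    Requests sharing the same x-thread-id share conversation context;
    x-resource-id identifies the user the memory belongs to.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "x-thread-id": thread_id,
            "x-resource-id": resource_id,
        },
        method="POST",
    )

# Both turns use the same thread, so the gateway can inject the first
# turn's observations when answering the second.
first = build_chat_request("YOUR_API_KEY", "my-thread-1", "user-42",
                           "My name is Alex and I prefer concise answers.")
followup = build_chat_request("YOUR_API_KEY", "my-thread-1", "user-42",
                              "What is my name?")
# Send each with urllib.request.urlopen(...) and read
# response["choices"][0]["message"]["content"].
```

Keeping the thread and resource IDs in one helper makes it harder to accidentally send a follow-up turn into a different memory scope.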

Open the Threads page in the Memory Gateway dashboard to see your conversations and the observations extracted from each exchange.

Next steps