Features

The Mastra Memory Gateway combines an OpenAI-compatible API proxy with built-in memory and tool capabilities. This page covers each feature in detail.

Observational Memory

Observational Memory is the gateway's core differentiator. It automatically extracts and stores observations from every conversation, then injects relevant context into future requests. Your application does not need any memory management code.

How it works

  1. Your app sends a request with x-thread-id and x-resource-id headers
  2. The gateway loads existing observations for that thread and resource
  3. Observations are injected into the context
  4. The request is proxied to the model provider
  5. New observations are extracted from the response and stored
  6. The response is streamed back to your app

Your application only sends the current message. The gateway handles all context assembly behind the scenes.
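Under the hood, each proxied turn is an ordinary chat-completions request plus the two memory headers. As a minimal sketch (the key, model, and IDs are placeholders; send the result with any HTTP client), a per-turn request can be assembled like this:

```python
import json

GATEWAY_URL = "https://gateway-api.mastra.ai/v1/chat/completions"

def build_turn(api_key: str, thread_id: str, resource_id: str, text: str):
    """Build one proxied turn: the headers scope memory, and the body
    carries only the current message -- no manually assembled history."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-thread-id": thread_id,      # groups messages into a conversation
        "x-resource-id": resource_id,  # end user or entity that owns the thread
    }
    body = json.dumps({
        "model": "google/gemini-2.5-flash",
        "messages": [{"role": "user", "content": text}],
    })
    return headers, body

# A later turn of the same conversation: still only one message in the body.
headers, body = build_turn(
    "YOUR_API_KEY", "topic-physics-101", "student-jane",
    "How does that relate to inertia?",
)
```

The gateway injects the stored observations before forwarding the request upstream, so the body never needs to grow with the conversation.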

Thread and resource headers

Observations are scoped per thread. Each thread maintains its own observation history.

  • Thread ID (x-thread-id): Groups messages into a conversation. Use a unique ID per conversation, topic, or session.
  • Resource ID (x-resource-id): Identifies the end user or entity that owns the thread. Use this to associate threads with a user, account, or organization in your application.

Both headers are required to activate Observational Memory. If x-thread-id is provided without x-resource-id, the gateway returns a 400 error.

note

A thread is bound to its resource ID on creation. If a subsequent request sends a different x-resource-id for the same x-thread-id, the gateway rejects it with a 404 because one resource cannot access another resource's thread.

curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: topic-physics-101" \
  -H "x-resource-id: student-jane" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{ "role": "user", "content": "Explain Newton'\''s first law." }]
  }'

Memory configuration

Configure observational memory per project in the Memory Gateway dashboard under Settings → Observational Memory Thresholds.

  • observationTokens (number, optional, default 30000): Maximum token budget for the observation context injected into prompts.
  • reflectionTokens (number, optional, default 40000): Maximum token budget for reflection summaries generated from observations.

Model selection

Switch between models by changing the model field in your request. No provider configuration, SDK changes, or API key swaps required.

# Use Claude
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "anthropic/claude-sonnet-4-6", "messages": [{ "role": "user", "content": "Hello!" }] }'

# Switch to GPT, same endpoint, same key
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "openai/gpt-5.4", "messages": [{ "role": "user", "content": "Hello!" }] }'

The gateway supports hundreds of models across multiple providers. See the Models page for the full list of supported providers and routing details.
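Because only the model string changes between the two requests above, switching providers from code is a one-line change. A minimal sketch (model IDs taken from the examples above):

```python
import json

def chat_body(model: str, text: str) -> str:
    """Same request shape for every provider; only the model ID differs."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": text}],
    })

# Same endpoint, same key -- just a different model string per request.
claude_body = chat_body("anthropic/claude-sonnet-4-6", "Hello!")
gpt_body = chat_body("openai/gpt-5.4", "Hello!")
```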

Bring your own key

By default, requests are routed through shared infrastructure with no provider keys needed. With Bring Your Own Key (BYOK), you use your own provider API keys for direct access to OpenAI, Anthropic, Google, and other providers while keeping the gateway's memory and tool features.

note

BYOK is available on the Teams plan and above.

Configure in the dashboard

Add provider keys in the Memory Gateway dashboard under Settings → Bring your own key:

  1. Open your project settings
  2. In the Bring your own key section, click Add key
  3. Select a provider (OpenAI, Anthropic, Google, or a custom provider)
  4. Paste your provider API key

Once configured, all requests for that provider are routed directly using your key.

Per-request pass-through

To send a provider key with a single request instead of storing it, pass your provider key in the Authorization header and your Mastra key in X-Memory-Gateway-Authorization:

curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "X-Memory-Gateway-Authorization: Bearer YOUR_MASTRA_KEY" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: my-thread" \
  -H "x-resource-id: user-42" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'

The gateway authenticates with the Mastra key, resolves the provider from the model ID, and forwards the request using your provider key. Memory features continue to work normally.

API compatibility

The gateway exposes three proxy endpoints that match the native API formats. Visit the API reference for full details on each endpoint.

OpenAI Chat Completions API

OpenAI Chat Completions format. Works with any OpenAI-compatible SDK or HTTP client.

Anthropic Messages API

Anthropic Messages API format. Authenticate with the x-api-key header or via the Anthropic SDK. A successful response returns a content array containing the model's reply.

OpenAI Responses API

OpenAI Responses format for multi-turn, agentic workflows.
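The endpoints differ mainly in response shape. As a sketch (assuming the standard OpenAI choices/message layout and the standard Anthropic content-block layout, not any gateway-specific fields), a client can extract the reply text from either format:

```python
def reply_text(response: dict) -> str:
    """Extract the model's text from a Chat Completions or Messages response."""
    if "choices" in response:  # OpenAI Chat Completions shape
        return response["choices"][0]["message"]["content"]
    if "content" in response:  # Anthropic Messages shape: a list of content blocks
        return "".join(
            block["text"] for block in response["content"]
            if block.get("type") == "text"
        )
    raise ValueError("unrecognized response shape")
```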

Gateway tools

The gateway can inject server-side tools into requests. Tools are injected transparently: the model sees them as available functions, and the gateway handles execution.

The web_search tool gives models access to current information from the web. Enable it per project in the dashboard or per request with the x-gateway-tools header.

# Enable web search for this request
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-gateway-tools: web_search" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{ "role": "user", "content": "What happened in tech news today?" }]
  }'

Tool header overrides

The x-gateway-tools header controls tool injection per request:

  • web_search: Enable web search (even if not in project config)
  • none: Disable all gateway tools for this request
  • (omitted): Fall back to project-level tool configuration

Gateway tools are injected only when the request body doesn't already define a tool with the same name.

Streaming

All three proxy endpoints support streaming. The gateway passes through the upstream stream as-is, so standard SDK streaming patterns work without changes.

main.py
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway-api.mastra.ai/v1",
)

stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
    extra_headers={
        "x-thread-id": "story-thread",
        "x-resource-id": "user-1",
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Authentication

The gateway supports two authentication modes:

  • Direct: send Authorization: Bearer msk_.... Standard usage with gateway-managed provider keys.
  • Pass-through (BYOK): send X-Memory-Gateway-Authorization: Bearer msk_... plus Authorization: Bearer <provider-key>. Use your own provider key with gateway memory.

All API keys use the msk_ prefix and are created from the Mastra dashboard.

The Anthropic SDK sends credentials via x-api-key instead of Authorization. The gateway accepts both formats, so the Anthropic SDK works without any auth workarounds:

main.py
import anthropic

# x-api-key is sent automatically by the SDK
client = anthropic.Anthropic(
    api_key="msk_...",
    base_url="https://gateway-api.mastra.ai/v1",
)

Memory API

In addition to automatic memory through proxy requests, the gateway provides a REST API for direct memory management. Use it to create threads, retrieve conversation history, and inspect observations.

See the API reference for the full endpoint documentation.