Features

The Mastra Memory Gateway combines an OpenAI-compatible API proxy with built-in memory and tool capabilities. This page covers each feature in detail.

Observational Memory

Observational Memory is the gateway's core differentiator. It automatically extracts and stores observations from every conversation, then injects relevant context into future requests. Your application does not need any memory management code.

How it works

  1. Your app sends a request with x-thread-id and x-resource-id headers
  2. The gateway loads existing observations for that thread and resource
  3. Observations are injected into the context
  4. The request is proxied to the model provider
  5. New observations are extracted from the response and stored
  6. The response is streamed back to your app

Your application only sends the current message. The gateway handles all context assembly behind the scenes.
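Under the hood, each proxied turn is an ordinary chat-completions request plus the two memory headers. As a minimal sketch (the key, model, and IDs are placeholders; send the result with any HTTP client), a per-turn request can be assembled like this:

```python
import json

GATEWAY_URL = "https://gateway-api.mastra.ai/v1/chat/completions"

def build_turn(api_key: str, thread_id: str, resource_id: str, text: str):
    """Build one proxied turn: the headers scope memory, and the body
    carries only the current message -- no manually assembled history."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-thread-id": thread_id,      # groups messages into a conversation
        "x-resource-id": resource_id,  # end user or entity that owns the thread
    }
    body = json.dumps({
        "model": "google/gemini-2.5-flash",
        "messages": [{"role": "user", "content": text}],
    })
    return headers, body

# A later turn of the same conversation: still only one message in the body.
headers, body = build_turn(
    "YOUR_API_KEY", "topic-physics-101", "student-jane",
    "How does that relate to inertia?",
)
```

The gateway injects the stored observations before forwarding the request upstream, so the body never needs to grow with the conversation.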

Thread and resource headers

Observations are scoped per thread. Each thread maintains its own observation history.

  • Thread ID (x-thread-id): Groups messages into a conversation. Use a unique ID per conversation, topic, or session.
  • Resource ID (x-resource-id): Identifies the end user or entity that owns the thread. Use this to associate threads with a user, account, or organization in your application.

Both headers are required to activate Observational Memory. If x-thread-id is provided without x-resource-id, the gateway returns a 400 error.

note

A thread is bound to its resource ID on creation. If a subsequent request sends a different x-resource-id for the same x-thread-id, the gateway rejects it with a 404 because one resource cannot access another resource's thread.

curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: topic-physics-101" \
  -H "x-resource-id: student-jane" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{ "role": "user", "content": "Explain Newton'\''s first law." }]
  }'

Memory configuration

Configure observational memory per project in the Memory Gateway dashboard under Settings → Observational Memory Thresholds.

  • observationTokens (number, optional, default 30000): Maximum token budget for the observation context injected into prompts.
  • reflectionTokens (number, optional, default 40000): Maximum token budget for reflection summaries generated from observations.

Model selection

Switch between models by changing the model field in your request. No provider configuration, SDK changes, or API key swaps required.

# Use Claude
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "anthropic/claude-sonnet-4-6", "messages": [{ "role": "user", "content": "Hello!" }] }'

# Switch to GPT, same endpoint, same key
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "openai/gpt-5.4", "messages": [{ "role": "user", "content": "Hello!" }] }'

The gateway supports hundreds of models across multiple providers. See the Models page for the full list of supported providers and routing details.
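Because only the model string changes between the two requests above, switching providers from code is a one-line change. A minimal sketch (model IDs taken from the examples above):

```python
import json

def chat_body(model: str, text: str) -> str:
    """Same request shape for every provider; only the model ID differs."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": text}],
    })

# Same endpoint, same key -- just a different model string per request.
claude_body = chat_body("anthropic/claude-sonnet-4-6", "Hello!")
gpt_body = chat_body("openai/gpt-5.4", "Hello!")
```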

Bring your own key

By default, requests are routed through shared infrastructure with no provider keys needed. With Bring Your Own Key (BYOK), you use your own provider API keys for direct access to OpenAI, Anthropic, Google, and other providers while keeping the gateway's memory and tool features.

note

BYOK is available on the Teams plan and above.

Configure in the dashboard

Add provider keys in the Memory Gateway dashboard under Settings → Bring your own key:

  1. Open your project settings
  2. In the Bring your own key section, click Add key
  3. Select a provider (OpenAI, Anthropic, Google, or a custom provider)
  4. Paste your provider API key

Once configured, all requests for that provider are routed directly using your key.

Per-request pass-through

To send a provider key with a single request instead of storing it, pass your provider key in the Authorization header and your Mastra key in X-Memory-Gateway-Authorization:

curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "X-Memory-Gateway-Authorization: Bearer YOUR_MASTRA_KEY" \
  -H "Authorization: Bearer YOUR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: my-thread" \
  -H "x-resource-id: user-42" \
  -d '{
    "model": "openai/gpt-5.4",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'

The gateway authenticates with the Mastra key, resolves the provider from the model ID, and forwards the request using your provider key. Memory features continue to work normally.

API compatibility

The gateway exposes three proxy endpoints that match the native API formats. Visit the API reference for full details on each endpoint.

OpenAI Chat Completions API

OpenAI Chat Completions format. Works with any OpenAI-compatible SDK or HTTP client.

Anthropic Messages API

Anthropic Messages API format. Authenticate with the x-api-key header or via the Anthropic SDK. A successful response returns a content array containing the model's reply.

OpenAI Responses API

OpenAI Responses format for multi-turn, agentic workflows.
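The endpoints differ mainly in response shape. As a sketch (assuming the standard OpenAI choices/message layout and the standard Anthropic content-block layout, not any gateway-specific fields), a client can extract the reply text from either format:

```python
def reply_text(response: dict) -> str:
    """Extract the model's text from a Chat Completions or Messages response."""
    if "choices" in response:  # OpenAI Chat Completions shape
        return response["choices"][0]["message"]["content"]
    if "content" in response:  # Anthropic Messages shape: a list of content blocks
        return "".join(
            block["text"] for block in response["content"]
            if block.get("type") == "text"
        )
    raise ValueError("unrecognized response shape")
```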

Gateway tools

The gateway can inject server-side tools into requests. Tools are injected transparently: the model sees them as available functions, and the gateway handles execution.

The web_search tool gives models access to current information from the web. Enable it per project in the dashboard or per request with the x-gateway-tools header.

# Enable web search for this request
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-gateway-tools: web_search" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{ "role": "user", "content": "What happened in tech news today?" }]
  }'

Tool header overrides

The x-gateway-tools header controls tool injection per request:

  • web_search: Enable web search (even if not in project config)
  • none: Disable all gateway tools for this request
  • (omitted): Fall back to project-level tool configuration

Gateway tools are injected only when the request body doesn't already define a tool with the same name.

Streaming

All three proxy endpoints support streaming. The gateway passes through the upstream stream as-is, so standard SDK streaming patterns work without changes.

main.py
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway-api.mastra.ai/v1",
)

stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
    extra_headers={
        "x-thread-id": "story-thread",
        "x-resource-id": "user-1",
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Authentication

The gateway supports two authentication modes:

  • Direct: send Authorization: Bearer msk_.... Standard usage with gateway-managed provider keys.
  • Pass-through (BYOK): send X-Memory-Gateway-Authorization: Bearer msk_... plus Authorization: Bearer <provider-key>. Use your own provider key with gateway memory.

All API keys use the msk_ prefix and are created from the Mastra dashboard.

The Anthropic SDK sends credentials via x-api-key instead of Authorization. The gateway accepts both formats, so the Anthropic SDK works without any auth workarounds:

main.py
import anthropic

# x-api-key is sent automatically by the SDK
client = anthropic.Anthropic(
    api_key="msk_...",
    base_url="https://gateway-api.mastra.ai/v1",
)

Memory API

In addition to automatic memory through proxy requests, the gateway provides a REST API for direct memory management. Use it to create threads, retrieve conversation history, and inspect observations.

See the API reference for the full endpoint documentation.