Features
The Mastra Memory Gateway combines an OpenAI-compatible API proxy with built-in memory and tool capabilities. This page covers each feature in detail.
Observational Memory
Observational Memory is the gateway's core differentiator. It automatically extracts and stores observations from every conversation, then injects relevant context into future requests. Your application does not need any memory management code.
How it works
- Your app sends a request with x-thread-id and x-resource-id headers
- The gateway loads existing observations for that thread and resource
- Observations are injected into the context
- The request is proxied to the model provider
- New observations are extracted from the response and stored
- The response is streamed back to your app
Your application only sends the current message. The gateway handles all context assembly behind the scenes.
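The context-assembly step can be pictured as follows. This is an illustrative sketch of what the gateway does on your behalf, not its actual implementation; `assemble_context` is a hypothetical helper, and the system-message format is an assumption:

```python
def assemble_context(observations, current_message):
    """Sketch: prepend stored observations as a system message,
    then append the user's current message."""
    messages = []
    if observations:
        summary = "\n".join(f"- {o}" for o in observations)
        messages.append({
            "role": "system",
            "content": f"Known about this user:\n{summary}",
        })
    messages.append({"role": "user", "content": current_message})
    return messages
```

Your app never runs code like this; the gateway performs the equivalent assembly before proxying the request upstream.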
Thread and resource headers
Observations are scoped per thread. Each thread maintains its own observation history.
- Thread ID (x-thread-id): Groups messages into a conversation. Use a unique ID per conversation, topic, or session.
- Resource ID (x-resource-id): Identifies the end user or entity that owns the thread. Use this to associate threads with a user, account, or organization in your application.
Both headers are required to activate Observational Memory. If x-thread-id is provided without x-resource-id, the gateway returns a 400 error.
A thread is bound to its resource ID on creation. If a subsequent request sends a different x-resource-id for the same x-thread-id, the gateway rejects it with a 404 because one resource cannot access another resource's thread.
curl https://gateway-api.mastra.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "x-thread-id: topic-physics-101" \
-H "x-resource-id: student-jane" \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [{ "role": "user", "content": "Explain Newton'\''s first law." }]
}'
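Because x-thread-id without x-resource-id is rejected with a 400, it can help to enforce the pairing on the client side. A minimal sketch (`gateway_headers` is a hypothetical helper, not part of any SDK):

```python
def gateway_headers(api_key, thread_id=None, resource_id=None):
    """Build gateway request headers, enforcing the pairing rule:
    x-thread-id without x-resource-id would be rejected with a 400."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if thread_id is not None:
        if resource_id is None:
            raise ValueError("x-thread-id requires x-resource-id")
        headers["x-thread-id"] = thread_id
        headers["x-resource-id"] = resource_id
    return headers
```

Omitting both headers is valid; the request is then proxied without Observational Memory.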
Memory configuration
Configure observational memory per project in the Memory Gateway dashboard under Settings → Observational Memory Thresholds.
Model selection
Switch between models by changing the model field in your request. No provider configuration, SDK changes, or API key swaps required.
# Use Claude
curl https://gateway-api.mastra.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "model": "anthropic/claude-sonnet-4-6", "messages": [{ "role": "user", "content": "Hello!" }] }'
# Switch to GPT, same endpoint, same key
curl https://gateway-api.mastra.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "model": "openai/gpt-5.4", "messages": [{ "role": "user", "content": "Hello!" }] }'
The gateway supports hundreds of models across multiple providers. See the Models page for the full list of supported providers and routing details.
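Model IDs are namespaced as `<provider>/<model>`, and the gateway routes on the prefix. A sketch of that resolution step (`resolve_provider` is a hypothetical helper for illustration):

```python
def resolve_provider(model_id):
    """Split a namespaced model ID into (provider, model).
    The gateway routes requests based on the provider prefix."""
    provider, _, model = model_id.partition("/")
    return provider, model
```

This is why switching providers only requires changing the model field: the routing decision is derived entirely from the ID.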
Bring your own key
By default, requests are routed through shared infrastructure with no provider keys needed. With Bring Your Own Key (BYOK), you use your own provider API keys for direct access to OpenAI, Anthropic, Google, and other providers while keeping the gateway's memory and tool features.
BYOK is available on the Teams plan and above.
Configure in the dashboard
Add provider keys in the Memory Gateway dashboard under Settings → Bring your own key:
- Open your project settings
- In the Bring your own key section, click Add key
- Select a provider (OpenAI, Anthropic, Google, or a custom provider)
- Paste your provider API key
Once configured, all requests for that provider are routed directly using your key.
Per-request pass-through
To send a provider key with a single request instead of storing it, pass your provider key in the Authorization header and your Mastra key in X-Memory-Gateway-Authorization:
curl https://gateway-api.mastra.ai/v1/chat/completions \
-H "X-Memory-Gateway-Authorization: Bearer YOUR_MASTRA_KEY" \
-H "Authorization: Bearer YOUR_OPENAI_KEY" \
-H "Content-Type: application/json" \
-H "x-thread-id: my-thread" \
-H "x-resource-id: user-42" \
-d '{
"model": "openai/gpt-5.4",
"messages": [{ "role": "user", "content": "Hello!" }]
}'
The gateway authenticates with the Mastra key, resolves the provider from the model ID, and forwards the request using your provider key. Memory features continue to work normally.
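The pass-through header arrangement can be captured in a small helper. A minimal sketch (`passthrough_headers` is a hypothetical name, not part of any SDK):

```python
def passthrough_headers(mastra_key, provider_key):
    """BYOK pass-through: the Mastra key authenticates with the
    gateway, while the provider key is forwarded upstream."""
    return {
        "X-Memory-Gateway-Authorization": f"Bearer {mastra_key}",
        "Authorization": f"Bearer {provider_key}",
        "Content-Type": "application/json",
    }
```

Note that the roles of the two headers are the reverse of direct mode, where the Mastra key goes in Authorization.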
API compatibility
The gateway exposes three proxy endpoints that match the native API formats. Visit the API reference for full details on each endpoint.
OpenAI Chat Completions API
OpenAI Chat Completions format. Works with any OpenAI-compatible SDK or HTTP client.
Anthropic Messages API
Anthropic Messages API format. Use the x-api-key header or Anthropic SDK for authentication. A successful response returns a content array containing the model's reply.
OpenAI Responses API
OpenAI Responses format for multi-turn, agentic workflows.
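The same conversation takes slightly different shapes across these formats. For example, the Chat Completions and Anthropic Messages payloads below carry identical messages, but the Messages format additionally requires max_tokens (a requirement of Anthropic's native API; both bodies here are illustrative):

```python
# OpenAI Chat Completions format
chat_completions_body = {
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Anthropic Messages format: same messages, plus required max_tokens
anthropic_messages_body = {
    "model": "anthropic/claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}],
}
```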
Gateway tools
The gateway can inject server-side tools into requests. Tools are added transparently: the model sees them as available functions, and the gateway handles execution.
Web search
The web_search tool gives models access to current information from the web. Enable it per project in the dashboard or per request with the x-gateway-tools header.
# Enable web search for this request
curl https://gateway-api.mastra.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "x-gateway-tools: web_search" \
-d '{
"model": "google/gemini-2.5-flash",
"messages": [{ "role": "user", "content": "What happened in tech news today?" }]
}'
Tool header overrides
The x-gateway-tools header controls tool injection per request:
| Header value | Behavior |
|---|---|
| web_search | Enable web search (even if not in project config) |
| none | Disable all gateway tools for this request |
| (omitted) | Fall back to project-level tool configuration |
Gateway tools are only injected when the model doesn't already define a tool with the same name in the request body.
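That no-collision rule can be sketched as a merge over tool names. This is an illustrative model of the behavior, not the gateway's actual code; `inject_gateway_tools` is a hypothetical name:

```python
def inject_gateway_tools(request_tools, gateway_tools):
    """Merge gateway tools into a request's tool list, skipping any
    gateway tool whose name the request already defines."""
    existing = {t["function"]["name"] for t in request_tools}
    merged = list(request_tools)
    for tool in gateway_tools:
        if tool["function"]["name"] not in existing:
            merged.append(tool)
    return merged
```

So if your request body already defines a web_search function, the gateway's version is not injected and your definition wins.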
Streaming
All three proxy endpoints support streaming. The gateway passes through the upstream stream as-is, so standard SDK streaming patterns work without changes.
- Python
- TypeScript
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway-api.mastra.ai/v1",
)

stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
    extra_headers={
        "x-thread-id": "story-thread",
        "x-resource-id": "user-1",
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://gateway-api.mastra.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [{ role: "user", content: "Tell me a story." }],
  stream: true,
}, {
  headers: {
    "x-thread-id": "story-thread",
    "x-resource-id": "user-1",
  },
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
Authentication
The gateway supports two authentication modes:
| Mode | Headers | Use case |
|---|---|---|
| Direct | Authorization: Bearer msk_... | Standard usage with gateway-managed provider keys |
| Pass-through (BYOK) | X-Memory-Gateway-Authorization: Bearer msk_... + Authorization: Bearer <provider-key> | Use your own provider key with gateway memory |
All API keys use the msk_ prefix and are created from the Mastra dashboard.
The Anthropic SDK sends credentials via x-api-key instead of Authorization. The gateway accepts both formats, so the Anthropic SDK works without any auth workarounds:
import anthropic

# x-api-key is sent automatically by the SDK
client = anthropic.Anthropic(
    api_key="msk_...",
    base_url="https://gateway-api.mastra.ai/v1",
)
Memory API
In addition to automatic memory through proxy requests, the gateway provides a REST API for direct memory management. Use it to create threads, retrieve conversation history, and inspect observations.
See the API reference for the full endpoint documentation.