Quickstart
The Mastra Memory Gateway is an OpenAI-compatible API proxy with built-in Observational Memory. Point any HTTP client, SDK, or framework at the gateway and every conversation is automatically remembered without any memory management code.
Create an account & get an API key
Go to gateway.mastra.ai and sign up for a Mastra account. During onboarding you'll receive a personal API key to authenticate requests. Copy the key to a safe location; you'll need it in the next step.
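A general pattern (not specific to Mastra) is to read the key from an environment variable rather than hard-coding it in source. `MASTRA_API_KEY` is an illustrative variable name chosen here; the gateway does not require any particular name:

```python
import os

# Read the gateway key from the environment instead of hard-coding it.
# MASTRA_API_KEY is an illustrative name; use whatever convention your
# deployment follows.
api_key = os.environ.get("MASTRA_API_KEY", "")
if not api_key:
    print("warning: MASTRA_API_KEY is not set")
```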
Make your first request
The gateway is compatible with the OpenAI Chat Completions API. Replace YOUR_API_KEY with your msk_-prefixed key. A successful response returns a JSON object with a choices array containing the model's reply.
- cURL
- Python
- TypeScript
```bash
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "Hello World!" }
    ]
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway-api.mastra.ai/v1",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Hello World!"}
    ],
)

print(response.choices[0].message.content)
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://gateway-api.mastra.ai/v1",
});

const response = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "user", content: "Hello World!" },
  ],
});

console.log(response.choices[0].message.content);
```

The gateway routes the request to the model provider, returns the response, and automatically stores observations from the conversation.
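For reference, a minimal sketch of extracting the reply from the response JSON. The payload below is illustrative and trimmed; a real response also carries fields such as id, model, and usage:

```python
import json

# Illustrative response body in the OpenAI Chat Completions shape,
# trimmed to the fields this example reads.
raw = '{"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}'
data = json.loads(raw)

# The reply text lives at choices[0].message.content.
reply = data["choices"][0]["message"]["content"]
print(reply)  # Hello!
```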
Add memory with thread and resource IDs
To enable persistent memory across requests, pass the x-thread-id and x-resource-id headers. The gateway uses these to scope observations, so all requests with the same thread ID share the same conversation context.
- cURL
- Python
- TypeScript
```bash
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: my-thread-1" \
  -H "x-resource-id: user-42" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "My name is Alex and I prefer concise answers." }
    ]
  }'
```

```python
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "My name is Alex and I prefer concise answers."}
    ],
    extra_headers={
        "x-thread-id": "my-thread-1",
        "x-resource-id": "user-42",
    },
)
```

```typescript
const response = await client.chat.completions.create(
  {
    model: "google/gemini-2.5-flash",
    messages: [
      { role: "user", content: "My name is Alex and I prefer concise answers." },
    ],
  },
  {
    headers: {
      "x-thread-id": "my-thread-1",
      "x-resource-id": "user-42",
    },
  },
);
```

Send a follow-up request with the same x-thread-id. The gateway injects prior observations into the context automatically, so the model remembers details from earlier messages without your app managing any context.

```bash
curl https://gateway-api.mastra.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "x-thread-id: my-thread-1" \
  -H "x-resource-id: user-42" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [
      { "role": "user", "content": "What is my name?" }
    ]
  }'
```

```python
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "What is my name?"}
    ],
    extra_headers={
        "x-thread-id": "my-thread-1",
        "x-resource-id": "user-42",
    },
)
```

```typescript
const response = await client.chat.completions.create(
  {
    model: "google/gemini-2.5-flash",
    messages: [
      { role: "user", content: "What is my name?" },
    ],
  },
  {
    headers: {
      "x-thread-id": "my-thread-1",
      "x-resource-id": "user-42",
    },
  },
);
```

The model responds with "Alex" because the gateway loaded the observations from the previous request and injected them as context.
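When many requests share the same scope, a tiny helper keeps the header pair in one place. This is an illustrative sketch, not part of any Mastra SDK; `memory_headers` is a name invented here:

```python
def memory_headers(thread_id: str, resource_id: str) -> dict:
    """Build the extra headers that scope gateway memory to one thread and resource."""
    return {
        "x-thread-id": thread_id,
        "x-resource-id": resource_id,
    }

# Usage with the OpenAI Python client:
# client.chat.completions.create(
#     ...,
#     extra_headers=memory_headers("my-thread-1", "user-42"),
# )
```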
Open the Threads page in the Memory Gateway dashboard to see your conversations and the observations extracted from each exchange.