OpenAI SDK
The gateway is a drop-in replacement for the OpenAI API. Point the official openai Python SDK at the gateway, and every request is proxied to the model provider with observational memory applied automatically.
Before you begin
Complete the OpenAI Developer quickstart first. It covers SDK installation, API keys, and the base project setup.
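The quickstart covers installation in full; if you only need the SDK itself, it ships on PyPI (assuming a standard Python environment):

```shell
pip install openai
```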
Configure the client
Create an OpenAI client with your gateway API key and base URL.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway-api.mastra.ai/v1",
)

All subsequent examples use this client instance.
Chat completions
Send a standard chat completion request.
completion = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "What is 2+2? Reply with just the number."}
    ],
    max_tokens=20,
)
print(completion.choices[0].message.content)
# "4"

System messages
Set the model's behavior with a system message in the messages list.
completion = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "system", "content": "You are a calculator. Only respond with numbers, no words."},
        {"role": "user", "content": "What is 10 * 5?"},
    ],
    max_tokens=100,
)
print(completion.choices[0].message.content)
# "50"
Multi-turn conversations
Pass the full conversation history in the messages list so the model retains context across turns.
completion = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Remember this word: banana"},
        {"role": "assistant", "content": "Got it, I will remember it."},
        {"role": "user", "content": "What word did I ask you to remember? Reply with just the word."},
    ],
    max_tokens=100,
)
print(completion.choices[0].message.content)
# "banana"
Streaming
Pass stream=True to receive chunks incrementally.
stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Count from 1 to 5, separated by commas."}
    ],
    max_tokens=50,
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
# "1, 2, 3, 4, 5"

Memory with thread and resource IDs
Pass x-thread-id and x-resource-id via extra_headers to enable observational memory. The gateway stores observations per thread and injects them as context on subsequent requests.
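Since every memory-enabled call repeats the same two headers, it can help to build them in one place. A minimal helper sketch (with_memory is a hypothetical name, not part of the SDK or the gateway):

```python
def with_memory(thread_id: str, resource_id: str) -> dict:
    """Build the extra_headers dict the gateway reads for observational memory."""
    return {"x-thread-id": thread_id, "x-resource-id": resource_id}

# Usage:
#   client.chat.completions.create(..., extra_headers=with_memory("my-thread-1", "user-42"))
print(with_memory("my-thread-1", "user-42"))
```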
# First request: introduce yourself
client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "My name is Alex and I prefer concise answers."}
    ],
    extra_headers={
        "x-thread-id": "my-thread-1",
        "x-resource-id": "user-42",
    },
)

# Second request: the gateway remembers
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "What is my name?"}
    ],
    extra_headers={
        "x-thread-id": "my-thread-1",
        "x-resource-id": "user-42",
    },
)
print(response.choices[0].message.content)
# "Alex"

Streaming with memory
Combine streaming with memory headers to receive incremental responses that reference prior observations.
stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {"role": "user", "content": "Summarize what you know about me."}
    ],
    stream=True,
    extra_headers={
        "x-thread-id": "my-thread-1",
        "x-resource-id": "user-42",
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Related
- Features: Observational memory, streaming, BYOK, and gateway tools
- Models: Supported providers and model routing
- API reference: Complete endpoint documentation