OpenAI SDK
The gateway is a drop-in replacement for the OpenAI API. Point the official openai TypeScript SDK at the gateway, and every request is proxied to the model provider with automatic memory.
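Because the gateway speaks the OpenAI wire format, any HTTP client works as well as the SDK. A minimal sketch with fetch, assuming the standard /v1/chat/completions path and Bearer auth (the request is built but the network call is left commented so the sketch stays self-contained):

```typescript
// Build the same chat completion request by hand. Endpoint path and header
// names follow the OpenAI convention; the fetch call itself is commented out.
const body = {
  model: "google/gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello!" }],
};

const request = {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify(body),
};

// const res = await fetch("https://gateway-api.mastra.ai/v1/chat/completions", request);
// const completion = await res.json();
```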
Before you begin
Complete the OpenAI Developer quickstart first. It covers SDK installation, API keys, and the base project setup.
Configure the client
Create an OpenAI client with your gateway API key and base URL.
import OpenAI from "openai";
export const client = new OpenAI({
apiKey: "YOUR_API_KEY",
baseURL: "https://gateway-api.mastra.ai/v1",
});
All subsequent examples import this client instance from ./gateway.
Chat completions
Send a standard chat completion request. The gateway routes it to the model provider and returns an OpenAI-compatible response.
import { client } from "./gateway";
const completion = await client.chat.completions.create({
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "What is 2+2? Reply with just the number." },
],
max_tokens: 20,
});
console.log(completion.choices[0].message.content);
// "4"System messages
Set the model's behavior with a system message in the messages array.
const completion = await client.chat.completions.create({
model: "google/gemini-2.5-flash",
messages: [
{ role: "system", content: "You are a calculator. Only respond with numbers, no words." },
{ role: "user", content: "What is 10 * 5?" },
],
max_tokens: 100,
});
console.log(completion.choices[0].message.content);
// "50"
Multi-turn conversations
Pass the full conversation history in the messages array so the model retains context across turns.
const completion = await client.chat.completions.create({
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "Remember this word: banana" },
{ role: "assistant", content: "Got it, I will remember it." },
{ role: "user", content: "What word did I ask you to remember? Reply with just the word." },
],
max_tokens: 100,
});
console.log(completion.choices[0].message.content);
// "banana"
Streaming
Pass stream: true to receive chunks incrementally.
import { client } from "./gateway";
const stream = await client.chat.completions.create({
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "Count from 1 to 5, separated by commas." },
],
max_tokens: 50,
stream: true,
});
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta?.content;
if (delta) {
process.stdout.write(delta);
}
}
// "1, 2, 3, 4, 5"Memory with thread and resource IDs
Pass x-thread-id and x-resource-id headers to enable observational memory. The gateway stores observations per thread and injects them as context on subsequent requests.
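In the example below the headers are repeated on each call; they can also be factored into a reusable per-request options object. A sketch, with an illustrative helper name:

```typescript
// Illustrative sketch: build the per-request options once and reuse them on
// every call that belongs to the same thread.
function memoryOptions(threadId: string, resourceId: string) {
  return {
    headers: {
      "x-thread-id": threadId,
      "x-resource-id": resourceId,
    },
  };
}

const opts = memoryOptions("my-thread-1", "user-42");
// await client.chat.completions.create({ ... }, opts);
```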
import { client } from "./gateway";
// First request: introduce yourself
await client.chat.completions.create(
{
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "My name is Alex and I prefer concise answers." },
],
},
{
headers: {
"x-thread-id": "my-thread-1",
"x-resource-id": "user-42",
},
},
);
// Second request: the gateway remembers
const response = await client.chat.completions.create(
{
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "What is my name?" },
],
},
{
headers: {
"x-thread-id": "my-thread-1",
"x-resource-id": "user-42",
},
},
);
console.log(response.choices[0].message.content);
// "Alex"Tool calling
Define tools using the standard OpenAI function calling format. The gateway forwards tool definitions to the provider.
import OpenAI from "openai";
import { client } from "./gateway";
const helloWorldTool: OpenAI.ChatCompletionTool = {
type: "function",
function: {
name: "helloWorld",
description: "Returns a greeting for the given name",
parameters: {
type: "object",
properties: {
name: { type: "string", description: "The name to greet" },
},
required: ["name"],
},
},
};
// Step 1: Send the request with tools
const first = await client.chat.completions.create({
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "Use the helloWorld tool to greet Alice." },
],
tools: [helloWorldTool],
tool_choice: { type: "function", function: { name: "helloWorld" } },
max_tokens: 200,
});
const toolCall = first.choices[0].message.tool_calls![0];
const args = JSON.parse(toolCall.function.arguments);
// args.name === "Alice"
// Execute the tool locally
const result = { greeting: `Hello, ${args.name}!` };
// Step 2: Send the tool result back
const second = await client.chat.completions.create({
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "Use the helloWorld tool to greet Alice." },
first.choices[0].message,
{
role: "tool",
tool_call_id: toolCall.id,
content: JSON.stringify(result),
},
],
tools: [helloWorldTool],
max_tokens: 200,
});
console.log(second.choices[0].message.content);
// "Hello, Alice!"Streaming tool calls
Combine stream: true with tools to receive tool call arguments as incremental deltas.
const stream = await client.chat.completions.create({
model: "google/gemini-2.5-flash",
messages: [
{ role: "user", content: "Use the helloWorld tool to greet Bob." },
],
tools: [helloWorldTool],
tool_choice: { type: "function", function: { name: "helloWorld" } },
max_tokens: 200,
stream: true,
});
const toolCalls: Record<number, { id?: string; name?: string; args: string }> = {};
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta;
for (const tc of delta?.tool_calls ?? []) {
const idx = tc.index ?? 0;
const entry = toolCalls[idx] ?? (toolCalls[idx] = { args: "" });
if (tc.id) entry.id = tc.id;
if (tc.function?.name && !entry.name) entry.name = tc.function.name;
if (tc.function?.arguments) entry.args += tc.function.arguments;
}
}
const firstCall = toolCalls[0];
console.log(firstCall.name); // "helloWorld"
console.log(JSON.parse(firstCall.args)); // { name: "Bob" }
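The accumulation loop above pairs naturally with a dispatch table that maps tool names to local handlers. A sketch; the handler registry is illustrative, not part of the SDK:

```typescript
// Illustrative dispatch table: route each completed tool call to a local
// handler and build the `tool` role message to send back to the model.
type Handler = (args: Record<string, unknown>) => unknown;

const handlers: Record<string, Handler> = {
  helloWorld: (args) => ({ greeting: `Hello, ${String(args.name)}!` }),
};

function runToolCall(call: { id?: string; name?: string; args: string }) {
  const handler = handlers[call.name ?? ""];
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  const result = handler(JSON.parse(call.args));
  return {
    role: "tool" as const,
    tool_call_id: call.id ?? "",
    content: JSON.stringify(result),
  };
}

const message = runToolCall({ id: "call_1", name: "helloWorld", args: '{"name":"Bob"}' });
// message.content === '{"greeting":"Hello, Bob!"}'
```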
Responses API
The gateway also proxies the OpenAI Responses API (POST /v1/responses).
import { client } from "./gateway";
const response = await client.responses.create({
model: "openai/gpt-5.4-mini",
input: "What is 3+3? Reply with just the number.",
max_output_tokens: 10,
});
const message = response.output[0];
if (message.type === "message") {
const text = message.content[0];
if (text.type === "output_text") {
console.log(text.text); // "6"
}
}
Streaming responses
Pass stream: true to receive response events incrementally.
const stream = await client.responses.create({
model: "openai/gpt-5.4-mini",
input: "Count from 1 to 5, separated by commas.",
max_output_tokens: 50,
stream: true,
});
for await (const event of stream) {
if (event.type === "response.output_text.delta") {
process.stdout.write(event.delta);
}
}
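The delta events can also be accumulated into a single string. A sketch of the accumulation logic, exercised here with hand-built events rather than a live stream (the event shape assumes only the `type` and `delta` fields used above):

```typescript
// Accumulate `response.output_text.delta` events into one string. The helper
// works with any async iterable of events, so a fake stream stands in for a
// real one here.
type DeltaEvent = { type: string; delta?: string };

async function collectText(events: AsyncIterable<DeltaEvent>): Promise<string> {
  let text = "";
  for await (const event of events) {
    if (event.type === "response.output_text.delta" && event.delta) {
      text += event.delta;
    }
  }
  return text;
}

async function* fakeStream(): AsyncGenerator<DeltaEvent> {
  yield { type: "response.output_text.delta", delta: "1, 2, " };
  yield { type: "response.output_text.delta", delta: "3, 4, 5" };
  yield { type: "response.completed" };
}

const text = await collectText(fakeStream());
// text === "1, 2, 3, 4, 5"
```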
BYOK pass-through
BYOK is available on the Teams plan and above. To use your own provider API key instead of the gateway's provisioned keys, send both keys:
import OpenAI from "openai";
export const client = new OpenAI({
apiKey: "sk-your-openai-key", // Your provider key
baseURL: "https://gateway-api.mastra.ai/v1",
defaultHeaders: {
"X-Memory-Gateway-Authorization": "Bearer YOUR_API_KEY", // msk_ key
},
});
const completion = await client.chat.completions.create({
model: "openai/gpt-5.4-mini",
messages: [
{ role: "user", content: "Hello!" },
],
});
The gateway authenticates with the msk_ key, then forwards your provider key to OpenAI. Memory and observational features still work.
Related
- Features: Observational memory, streaming, BYOK, and gateway tools
- Models: Supported providers and model routing
- API reference: Complete endpoint documentation