
OpenAI SDK

The gateway is a drop-in replacement for the OpenAI API. Point the official openai TypeScript SDK at the gateway, and every request is proxied to the model provider with automatic memory.

Before you begin

Complete the OpenAI Developer quickstart first. It covers SDK installation, API keys, and the base project setup.

Configure the client

Create an OpenAI client with your gateway API key and base URL.

src/gateway.ts
import OpenAI from "openai";

export const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://gateway-api.mastra.ai/v1",
});

All subsequent examples import this client instance from ./gateway.

Chat completions

Send a standard chat completion request. The gateway routes it to the model provider and returns an OpenAI-compatible response.

src/chat.ts
import { client } from "./gateway";

const completion = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "user", content: "What is 2+2? Reply with just the number." },
  ],
  max_tokens: 20,
});

console.log(completion.choices[0].message.content);
// "4"

System messages

Set the model's behavior with a system message in the messages array.

const completion = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "system", content: "You are a calculator. Only respond with numbers, no words." },
    { role: "user", content: "What is 10 * 5?" },
  ],
  max_tokens: 100,
});

console.log(completion.choices[0].message.content);
// "50"

Multi-turn conversations

Pass the full conversation history in the messages array so the model retains context across turns.

const completion = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "user", content: "Remember this word: banana" },
    { role: "assistant", content: "Got it, I will remember it." },
    { role: "user", content: "What word did I ask you to remember? Reply with just the word." },
  ],
  max_tokens: 100,
});

console.log(completion.choices[0].message.content);
// "banana"

Streaming

Pass stream: true to receive chunks incrementally.

src/stream.ts
import { client } from "./gateway";

const stream = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "user", content: "Count from 1 to 5, separated by commas." },
  ],
  max_tokens: 50,
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) {
    process.stdout.write(delta);
  }
}
// "1, 2, 3, 4, 5"
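The loop above prints deltas as they arrive but never keeps the full reply. If you also need the complete text, a small helper can accumulate the chunks. This is a sketch, not an SDK feature; the Chunk type below is a hand-written subset of the SDK's ChatCompletionChunk.

```typescript
// Minimal chunk shape this helper relies on (a hand-written subset of
// the SDK's ChatCompletionChunk type).
type Chunk = { choices: Array<{ delta?: { content?: string | null } }> };

// Accumulate streamed content deltas into one string.
export async function collectStream(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}
```

With the stream from the example above, await collectStream(stream) returns the whole comma-separated count as one string.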

Memory with thread and resource IDs

Pass x-thread-id and x-resource-id headers to enable observational memory. The gateway stores observations per thread and injects them as context on subsequent requests.

src/memory.ts
import { client } from "./gateway";

// First request: introduce yourself
await client.chat.completions.create(
  {
    model: "google/gemini-2.5-flash",
    messages: [
      { role: "user", content: "My name is Alex and I prefer concise answers." },
    ],
  },
  {
    headers: {
      "x-thread-id": "my-thread-1",
      "x-resource-id": "user-42",
    },
  },
);

// Second request: the gateway remembers
const response = await client.chat.completions.create(
  {
    model: "google/gemini-2.5-flash",
    messages: [
      { role: "user", content: "What is my name?" },
    ],
  },
  {
    headers: {
      "x-thread-id": "my-thread-1",
      "x-resource-id": "user-42",
    },
  },
);

console.log(response.choices[0].message.content);
// "Alex"
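Both requests above repeat the same header pair. When many calls share one thread, a tiny helper keeps the scoping in one place. This is a hypothetical convenience, not part of the SDK or the gateway; memoryScope is a name invented here.

```typescript
// Hypothetical helper (not part of the SDK or the gateway): build the
// per-request options that scope a call to one memory thread and resource.
export function memoryScope(threadId: string, resourceId: string) {
  return {
    headers: {
      "x-thread-id": threadId,
      "x-resource-id": resourceId,
    },
  };
}
```

The calls above then read as client.chat.completions.create({ ... }, memoryScope("my-thread-1", "user-42")).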

Tool calling

Define tools using the standard OpenAI function calling format. The gateway forwards tool definitions to the provider.

src/tools.ts
import OpenAI from "openai";
import { client } from "./gateway";

const helloWorldTool: OpenAI.ChatCompletionTool = {
  type: "function",
  function: {
    name: "helloWorld",
    description: "Returns a greeting for the given name",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string", description: "The name to greet" },
      },
      required: ["name"],
    },
  },
};

// Step 1: Send the request with tools
const first = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "user", content: "Use the helloWorld tool to greet Alice." },
  ],
  tools: [helloWorldTool],
  tool_choice: { type: "function", function: { name: "helloWorld" } },
  max_tokens: 200,
});

const toolCall = first.choices[0].message.tool_calls![0];
const args = JSON.parse(toolCall.function.arguments);
// args.name === "Alice"

// Execute the tool locally
const result = { greeting: `Hello, ${args.name}!` };

// Step 2: Send the tool result back
const second = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "user", content: "Use the helloWorld tool to greet Alice." },
    first.choices[0].message,
    {
      role: "tool",
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    },
  ],
  tools: [helloWorldTool],
  max_tokens: 200,
});

console.log(second.choices[0].message.content);
// "Hello, Alice!"
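With more than one tool, the local-execution step generalizes to a name-to-function registry. The sketch below is a hypothetical pattern, not a gateway or SDK feature; toolRegistry and runToolCall are names invented here.

```typescript
// Hypothetical local tool registry: map tool names to implementations.
type ToolFn = (args: Record<string, unknown>) => unknown;

const toolRegistry: Record<string, ToolFn> = {
  helloWorld: (args) => ({ greeting: `Hello, ${String(args.name)}!` }),
};

// Execute one model-issued tool call and build the "tool" role message
// to send back in step 2.
export function runToolCall(call: {
  id: string;
  function: { name: string; arguments: string };
}) {
  const fn = toolRegistry[call.function.name];
  if (!fn) throw new Error(`Unknown tool: ${call.function.name}`);
  const result = fn(JSON.parse(call.function.arguments));
  return {
    role: "tool" as const,
    tool_call_id: call.id,
    content: JSON.stringify(result),
  };
}
```

In the two-step flow above, runToolCall(toolCall) produces the message appended between the assistant turn and the second request.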

Streaming tool calls

Combine stream: true with tools to receive tool call arguments as incremental deltas.

const stream = await client.chat.completions.create({
  model: "google/gemini-2.5-flash",
  messages: [
    { role: "user", content: "Use the helloWorld tool to greet Bob." },
  ],
  tools: [helloWorldTool],
  tool_choice: { type: "function", function: { name: "helloWorld" } },
  max_tokens: 200,
  stream: true,
});

const toolCalls: Record<number, { id?: string; name?: string; args: string }> = {};

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  for (const tc of delta?.tool_calls ?? []) {
    const idx = tc.index ?? 0;
    const entry = toolCalls[idx] ?? (toolCalls[idx] = { args: "" });
    if (tc.id) entry.id = tc.id;
    if (tc.function?.name && !entry.name) entry.name = tc.function.name;
    if (tc.function?.arguments) entry.args += tc.function.arguments;
  }
}

const firstCall = toolCalls[0];
console.log(firstCall.name); // "helloWorld"
console.log(JSON.parse(firstCall.args)); // { name: "Bob" }

Responses API

The gateway also proxies the OpenAI Responses API (POST /v1/responses).

src/responses.ts
import { client } from "./gateway";

const response = await client.responses.create({
  model: "openai/gpt-5.4-mini",
  input: "What is 3+3? Reply with just the number.",
  max_output_tokens: 10,
});

const message = response.output[0];
if (message.type === "message") {
  const text = message.content[0];
  if (text.type === "output_text") {
    console.log(text.text); // "6"
  }
}
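The nested type checks above get repetitive when you read several responses. As a sketch, a helper can flatten them; outputText is a name invented here, and the loose types below assume only the response shape shown above (recent versions of the SDK also expose a response.output_text convenience property).

```typescript
// Loose shapes matching the response structure shown above.
type OutputPart = { type: string; text?: string };
type OutputItem = { type: string; content?: OutputPart[] };

// Concatenate every output_text part across all message items.
export function outputText(response: { output: OutputItem[] }): string {
  let text = "";
  for (const item of response.output) {
    if (item.type !== "message") continue;
    for (const part of item.content ?? []) {
      if (part.type === "output_text" && part.text) text += part.text;
    }
  }
  return text;
}
```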

Streaming responses

Pass stream: true to receive response events incrementally.

const stream = await client.responses.create({
  model: "openai/gpt-5.4-mini",
  input: "Count from 1 to 5, separated by commas.",
  max_output_tokens: 50,
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

BYOK pass-through

BYOK is available on the Teams plan and above. To use your own provider API key instead of the gateway's provisioned keys, send both keys:

src/gateway.ts
import OpenAI from "openai";

export const client = new OpenAI({
  apiKey: "sk-your-openai-key",    // Your provider key
  baseURL: "https://gateway-api.mastra.ai/v1",
  defaultHeaders: {
    "X-Memory-Gateway-Authorization": "Bearer YOUR_API_KEY",  // msk_ key
  },
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-5.4-mini",
  messages: [
    { role: "user", content: "Hello!" },
  ],
});

The gateway authenticates with the msk_ key, then forwards your provider key to OpenAI. Memory and observational features still work.

  • Features: Observational memory, streaming, BYOK, and gateway tools
  • Models: Supported providers and model routing
  • API reference: Complete endpoint documentation