# Limits

The Mastra Memory Gateway enforces rate limits and pagination constraints to protect the service and ensure fair usage.

## Rate limits

Rate limits are applied per API key. When a limit is exceeded, the gateway returns a `429` status code with a `Retry-After` header.

### LLM proxy

The LLM proxy and LLM burst rules apply to `POST /v1/chat/completions`, `POST /v1/messages`, and `POST /v1/responses`. The global rule counts requests to every endpoint.

| Rule                   | Default limit  | Window     |
| ---------------------- | -------------- | ---------- |
| Global (all endpoints) | 5,000 requests | 60 seconds |
| LLM proxy              | 2,000 requests | 60 seconds |
| LLM burst              | 400 requests   | 10 seconds |
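A client can avoid most `429` responses by throttling itself against the sustained and burst rules together. The following is a minimal sketch of one way to do that, not part of the gateway: `SlidingWindowLimiter` and `LLM_PROXY_RULES` are hypothetical names, and the numbers are the defaults from the table above. The source does not say whether the gateway counts fixed or sliding windows, so the sketch uses a sliding window, which is the more conservative choice on the client side.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle for several (max_requests, window_seconds) rules."""

    def __init__(self, rules, clock=time.monotonic):
        self.rules = list(rules)
        self.clock = clock
        self.sent = deque()  # timestamps of requests already sent, oldest first

    def wait_time(self):
        """Return the seconds to wait until one more request fits every rule."""
        now = self.clock()
        longest = max(window for _, window in self.rules)
        # Forget timestamps that no rule can still count.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        delay = 0.0
        for max_requests, window in self.rules:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # This request must age out of the window before a slot frees up.
                blocker = recent[len(recent) - max_requests]
                delay = max(delay, blocker + window - now)
        return delay

    def record(self):
        """Note that a request was sent at the current time."""
        self.sent.append(self.clock())


# Assumed defaults copied from the table: sustained rule and burst rule.
LLM_PROXY_RULES = [(2000, 60.0), (400, 10.0)]
```

Before each request, a caller would sleep for `wait_time()` seconds and then call `record()`. Accounts with non-default limits would need different rule values.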

### Memory API

Applies to the `/v1/memory/*` endpoints. Read and write operations have separate limits.

| Rule                               | Default limit  | Window     |
| ---------------------------------- | -------------- | ---------- |
| Memory read (GET, HEAD)            | 1,200 requests | 60 seconds |
| Memory write (POST, PATCH, DELETE) | 600 requests   | 60 seconds |
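To illustrate the read/write split, the hypothetical helper below maps a request to the Memory API limit it would count against. The scope names are the ones reported in `X-RateLimit-Scope`, the numbers are the defaults from the table, and `memory_scope` itself is an illustrative name rather than anything the gateway provides.

```python
# Default per-minute budgets from the table above, keyed by scope name.
MEMORY_LIMITS = {"memory_read": 1200, "memory_write": 600}

READ_METHODS = {"GET", "HEAD"}
WRITE_METHODS = {"POST", "PATCH", "DELETE"}


def memory_scope(method, path):
    """Return the Memory API scope a request counts against, or None."""
    if not path.startswith("/v1/memory/"):
        return None  # not a Memory API endpoint
    verb = method.upper()
    if verb in READ_METHODS:
        return "memory_read"
    if verb in WRITE_METHODS:
        return "memory_write"
    return None  # method not listed in the table
```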

### Rate limit headers

Every response includes the following rate limit headers, except `Retry-After`, which is sent only with `429` responses:

| Header                  | Description                                                         |
| ----------------------- | ------------------------------------------------------------------- |
| `X-RateLimit-Limit`     | Maximum requests allowed in the current window                      |
| `X-RateLimit-Remaining` | Remaining requests in the current window                            |
| `X-RateLimit-Reset`     | Unix timestamp when the window resets                               |
| `X-RateLimit-Scope`     | Which limit applies (`llm_proxy`, `memory_read`, or `memory_write`) |
| `Retry-After`           | Seconds until the limit resets (only present on `429` responses)    |
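These headers can be read into a small structure so that a client can slow down before it reaches a limit. The sketch below assumes every value arrives as a plain integer string and that `X-RateLimit-Reset` is expressed in seconds; `RateLimitInfo`, `parse_rate_limit_headers`, and `seconds_until_reset` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_at: int               # X-RateLimit-Reset (Unix timestamp, assumed seconds)
    scope: str                  # X-RateLimit-Scope
    retry_after: Optional[int]  # Retry-After, present only on 429 responses


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Build a RateLimitInfo from a response's header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    retry_after = h.get("retry-after")
    return RateLimitInfo(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=int(h["x-ratelimit-reset"]),
        scope=h["x-ratelimit-scope"],
        retry_after=int(retry_after) if retry_after is not None else None,
    )


def seconds_until_reset(info: RateLimitInfo, now: float) -> float:
    """Return how long until the current window resets, never negative."""
    return max(0.0, info.reset_at - now)
```

In practice `now` would come from `time.time()`; it is a parameter here so the function stays easy to test.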

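### Rate limit error response

A `429` response uses the common error structure (see Error format) and adds two fields, `scope` and `retry_after_seconds`: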

```json
{
  "error": {
    "message": "Rate limit exceeded. Please retry after the reset time.",
    "type": "rate_limit_error",
    "scope": "llm_proxy",
    "retry_after_seconds": 12
  }
}
```
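A client can use either the `Retry-After` header or the `retry_after_seconds` field to decide how long to wait before retrying. The following sketch shows one way to combine them; `retry_delay` and its `fallback` parameter are hypothetical, and preferring the header over the body is an assumption, not documented gateway behavior.

```python
import json


def retry_delay(status, headers, body_text, fallback=1.0):
    """Return the seconds to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    # Prefer the Retry-After header when it holds a number of seconds.
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                break
    # Otherwise fall back to the retry_after_seconds field in the JSON body.
    try:
        error = json.loads(body_text).get("error", {})
        return float(error["retry_after_seconds"])
    except (ValueError, KeyError, TypeError, AttributeError):
        return fallback  # nothing usable in the response
```

A caller would typically sleep for the returned number of seconds and then resend the request, ideally with a cap on the number of attempts.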

## Pagination limits

List endpoints accept `limit` and `offset` query parameters: `limit` sets the page size and `offset` sets how many items to skip.

| Parameter                     | Default | Maximum |
| ----------------------------- | ------- | ------- |
| `limit` (threads, messages)   | 50      | 200     |
| `limit` (observation history) | 10      | 200     |
| `offset`                      | 0       | —       |
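Reading a full collection means requesting successive pages and advancing `offset` each time. The sketch below is one possible loop under stated assumptions: `fetch_page` is a hypothetical caller-supplied function that performs the request and returns a list of items, a page shorter than `limit` is taken to mean the end of the collection, and the requested `limit` is clamped on the client side because the source does not say how the gateway treats values above 200.

```python
MAX_LIMIT = 200  # documented maximum for `limit`


def iter_items(fetch_page, limit=50, start_offset=0):
    """Yield every item from a list endpoint, one page at a time.

    `fetch_page(limit, offset)` must return a list of items.
    """
    limit = max(1, min(limit, MAX_LIMIT))  # stay within the documented range
    offset = start_offset
    while True:
        items = fetch_page(limit, offset)
        yield from items
        if len(items) < limit:
            return  # a short (or empty) page is assumed to be the last one
        offset += len(items)
```

Each page costs one request against the memory read limit, so a larger `limit` reduces the number of requests needed to read the same data.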

## Authentication

- All API keys use the `msk_` prefix
- An invalid or missing API key returns a `401` error
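Because every key shares the `msk_` prefix, a client can catch an obviously misconfigured key before sending a request that would return `401`. The check below is a hypothetical convenience, not a validity test: only the gateway can confirm that a key is valid, and this page does not specify how the key is transmitted, so the sketch leaves that out.

```python
KEY_PREFIX = "msk_"


def looks_like_gateway_key(key):
    """Return True if `key` is a string with the msk_ prefix and something after it."""
    return (
        isinstance(key, str)
        and key.startswith(KEY_PREFIX)
        and len(key) > len(KEY_PREFIX)
    )
```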

## Error format

All error responses follow the same structure:

```json
{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type"
  }
}
```

Common error types:

| Type                    | Status code | Description                                                    |
| ----------------------- | ----------- | -------------------------------------------------------------- |
| `authentication_error`  | 401         | Invalid or missing API key                                     |
| `not_found`             | 404         | Thread or resource not found                                   |
| `rate_limit_error`      | 429         | Rate limit exceeded                                            |
| `invalid_request_error` | 400         | Malformed request or unsupported provider/endpoint combination |
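Since all errors share one structure, a client can turn the `type` field into a specific exception. The classes and the `error_from_response` function below are hypothetical names in a minimal sketch; types not listed in the table, and bodies that are not in the documented shape, fall back to the base class.

```python
import json


class GatewayError(Exception):
    """Base class for errors built from a gateway error response."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


class AuthenticationError(GatewayError): pass
class NotFoundError(GatewayError): pass
class RateLimitError(GatewayError): pass
class InvalidRequestError(GatewayError): pass


# Maps the documented `type` values to exception classes.
ERROR_CLASSES = {
    "authentication_error": AuthenticationError,
    "not_found": NotFoundError,
    "rate_limit_error": RateLimitError,
    "invalid_request_error": InvalidRequestError,
}


def error_from_response(status, body_text):
    """Build an exception from an HTTP status code and a response body."""
    try:
        error = json.loads(body_text)["error"]
        error_type, message = error["type"], error["message"]
    except (ValueError, KeyError, TypeError):
        # Body is not in the documented shape; keep the raw text.
        error_type, message = "unknown", body_text
    cls = ERROR_CLASSES.get(error_type, GatewayError)
    return cls(status, error_type, message)
```

A caller can then use `except RateLimitError` to retry while letting the other error types propagate.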