How to Architect a Multi-Tenant MCP Server for Enterprise B2B SaaS
Learn how to architect a multi-tenant MCP server with cryptographic URL scoping, proactive OAuth refresh, dynamic tool generation, and least-privilege filtering.
Orchestrating an AI agent on your laptop is deceptively easy. You define a persona, hand it a few Python functions, paste a vendor API key into a .env file, and watch the agent reason through tasks. But deploying that same agent into a production B2B SaaS environment exposes a massive architectural gap.
If you've shipped a local Model Context Protocol (MCP) server with credentials baked into environment variables and now need to scale it to thousands of B2B customers, the gap between those two states is brutal. A multi-tenant MCP server is one where a single deployed service securely brokers AI agent access to thousands of customer accounts, each with their own OAuth tokens, scopes, and tool permissions, without ever leaking credentials between tenants or to the LLM itself.
The framework handles the agentic reasoning—whether you're using CrewAI, AutoGen, or LangGraph, as covered in our guide to multi-agent frameworks—but it does not solve the enterprise integration problem. When your AI agent needs to act on behalf of your users inside external systems—reading Jira tickets, updating Salesforce opportunities, or pulling BambooHR employee records—you suddenly have to manage multi-tenant OAuth 2.0 lifecycles, handle vendor-specific rate limits, and ensure strict isolation between what different agents are allowed to access.
This guide walks through the architectural patterns that actually hold up in production, drawing on what's been learned the hard way across the industry. We will cover cryptographically scoped URLs, abstracted OAuth token management, dynamic tool generation, and enforcing least privilege at the server level. The goal: a setup that survives both an enterprise security review and a Tuesday morning at 10x traffic.
The Multi-Tenant MCP Server Bottleneck
Most MCP demos use the stdio transport with credentials read at boot time. As the MCP specification notes, no explicit authentication takes place between the MCP client and the MCP server in that flow; authorization for the services behind the MCP server is managed by passing environment variables.
That works for a developer's laptop. It collapses the moment your CRM agent needs to act on behalf of 4,000 distinct Salesforce instances. The core problem is that the MCP server is no longer a tool—it is a multi-tenant gateway. Each request needs to be routed to the correct customer's credentials, scoped to that customer's permissions, and isolated from every other tenant's data.
The MCP specification deliberately leaves this to you: the exact mechanism for authentication and authorization for requests made by an MCP server is outside the spec's scope. But you need one.
This is the difference between an integration demo and an integration platform. If you want context on the broader category, our 2026 architecture guide for SaaS PMs is a good primer. The rest of this post is about what the demo doesn't show you.
The Security Reality of Enterprise AI Agents
The stakes for getting this architecture right are high. According to Gartner, by the end of 2026, 40% of enterprise applications will feature task-specific AI agents, up from less than 5% today. Yet the infrastructure to secure these agents is lagging: a 2026 Gravitee survey found that only 24.4% of organizations have full visibility into which AI agents are communicating with each other or with external systems.
Every one of those agents is going to need authenticated access to SaaS systems on behalf of end users, and most engineering teams are still figuring out how to do that without committing security malpractice.
The failure mode you most want to avoid is passing OAuth tokens or API credentials directly through the AI model's context window. Security experts at Kiteworks explicitly warn against this pattern. The model has no business seeing credentials: the moment you pass them through the context, tool-use logs, prompt caches, and agent traces all become a leakage surface, just as with SaaS data PII redaction. It also exposes them to prompt injection attacks, where a malicious input could trick the model into leaking a token or performing unauthorized destructive actions.
The June 2025 MCP specification addresses this directly: when MCP servers need to call upstream APIs, they must act as OAuth clients to those services and obtain separate tokens. Never pass through the token received from the MCP client; doing so creates confused deputy vulnerabilities where downstream services may incorrectly trust tokens not intended for them.
Which means your architecture has to satisfy three properties at once:
- Tenant isolation: A token issued to Customer A's MCP server can never be used to read Customer B's data.
- Credential opacity: The AI model and the MCP client never see raw API tokens, refresh tokens, or vendor secrets.
- Scope enforcement: Tools available through the server are constrained to what the customer (and their plan) authorized.
To build a secure system, the MCP server must act as an impenetrable boundary, ideally employing a zero data retention architecture. The AI model requests an action, and the server independently resolves the authentication context, enforces permissions, and executes the request.
Core Architecture: Cryptographically Scoped Server URLs
A cryptographically scoped MCP URL is an endpoint that uses a securely hashed token within the URL path to authenticate the client, identify the specific tenant, and define the exact set of allowed tools, completely removing the need for the AI model to handle underlying API credentials.
The cleanest tenant isolation primitive for remote MCP is to make the server URL itself the security boundary. In a multi-tenant environment, you cannot use a single global MCP server. If Tenant A and Tenant B both connect to https://api.example.com/mcp, the server has no reliable way to know which tenant's Salesforce instance to query unless the AI model explicitly passes a tenant ID—which is a severe security risk.
Instead, the architecture must generate unique, self-contained MCP servers for every connected account. When a user connects a third-party integration, the platform mints a unique URL per integrated account:
https://api.example.com/mcp/<random-hex-token>
Here is how the generation pipeline works:
- The system creates a database record linking the new server to the specific tenant and integration.
- A random hexadecimal string is generated server-side to act as the public token.
- Critically, the raw token is never stored. It is hashed with an HMAC signing key.
- The hashed token is stored in a fast, distributed Key-Value (KV) store, mapping it to the tenant's context and integrated account.
- The system returns the raw URL to the client:
https://api.example.com/mcp/<raw_hex_token>.
When a request hits /mcp/:token, the gateway hashes the token, looks up the hashed value, and resolves it to the tenant + integrated account it represents.
```mermaid
sequenceDiagram
    participant Client as MCP Client<br>(Claude / ChatGPT)
    participant Edge as MCP Gateway
    participant KV as Token Store<br>(hashed lookup)
    participant Vault as Credential Store
    participant API as Third-Party API
    Client->>Edge: POST /mcp/{raw_token}<br>JSON-RPC tools/call
    Edge->>Edge: HMAC(raw_token, signing_key)
    Edge->>KV: GET hashed_token
    KV-->>Edge: {account_id, tenant_id, expires_at}
    Edge->>Vault: load credentials for account
    Vault-->>Edge: decrypted access_token
    Edge->>API: authenticated upstream call
    API-->>Edge: response
    Edge-->>Client: JSON-RPC result
```

Three properties fall out of this design:
- Stolen KV data is useless. If the lookup store leaks, attackers get HMAC digests they can't reverse. The raw token is only ever returned once—in the response that creates the server.
- No session sprawl. The URL is the session. There's no separate "login" step on the MCP client side.
- Cheap revocation. Deleting the KV entry kills the server instantly. No token blacklist, no propagation delay.
For higher-trust deployments, you can layer a second factor: require the MCP client to also send a Bearer token (your platform's API token) in the Authorization header on top of the URL token. That covers the case where the URL might leak through screenshots, logs, or config files.
Do not use predictable URLs (UUIDs derived from the account ID, sequential IDs, or anything timestamp-based). The token should be high-entropy random bytes. Treat the URL like a bearer credential because that's exactly what it is.
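Here is a minimal sketch of the mint-and-resolve flow, assuming Node's built-in crypto module; the `KV` interface and the `mintServerUrl` and `resolveTenant` names are illustrative, not from any particular framework:

```typescript
import { createHmac, randomBytes } from 'node:crypto'

// Illustrative KV interface; substitute Redis, Cloudflare KV, DynamoDB, etc.
interface KV {
  put(key: string, value: string): Promise<void>
  get(key: string): Promise<string | null>
  delete(key: string): Promise<void>
}

// HMAC signing key, held in the app environment and never in the KV store.
const SIGNING_KEY = process.env.MCP_URL_SIGNING_KEY!

const hashToken = (raw: string) =>
  createHmac('sha256', SIGNING_KEY).update(raw).digest('hex')

async function mintServerUrl(kv: KV, tenantId: string, accountId: string) {
  const raw = randomBytes(32).toString('hex') // 256 bits of entropy, never persisted
  await kv.put(hashToken(raw), JSON.stringify({ tenantId, accountId }))
  return `https://api.example.com/mcp/${raw}` // raw token leaves the server exactly once
}

async function resolveTenant(kv: KV, raw: string) {
  const entry = await kv.get(hashToken(raw)) // lookup by digest only
  if (!entry) throw new Error('unknown or revoked MCP server') // KV deletion == instant revocation
  return JSON.parse(entry) as { tenantId: string; accountId: string }
}
```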
Abstracting OAuth Token Management and Proactive Refresh
Handling OAuth 2.0 lifecycles is notoriously difficult. Once a request is routed to the right tenant, the next problem is keeping that tenant's upstream credentials alive. Most B2B SaaS APIs hand out access tokens with 30-60 minute lifetimes. If your AI agent tries to call a tool and the token is expired, the request fails. You cannot expect an LLM to handle a `401 Unauthorized` error, parse the `invalid_grant` response, execute a token refresh flow, and retry the request.
The pattern that scales: treat OAuth lifecycle as platform infrastructure, completely abstracted from the MCP layer. The platform must ensure that whenever the MCP server receives a tool call, the underlying credentials are valid. Read our deep dive on OAuth at Scale: The Architecture of Reliable Token Refreshes for a complete breakdown, but practically, this means:
1. Encrypted Credential Storage at Rest
Access tokens, refresh tokens, client secrets, and any other sensitive context fields get encrypted before they hit your database. They're decrypted only inside the request path that actually needs to call the upstream API.
2. Concurrency Control with Distributed Locks
Because AI agents often execute multiple tools in parallel, you will encounter severe race conditions. If an agent calls five tools simultaneously and the token is expired, all five requests will attempt to use the refresh token at the exact same time. The first request succeeds, but providers that enforce single-use refresh tokens immediately invalidate the old one. The other four requests fail with `invalid_grant`, and some providers treat that reuse as a sign of token theft and revoke the grant entirely, leaving the integration completely disconnected.
You need a per-account distributed lock (a mutex) around the refresh operation. When the first request detects an expired token, it acquires the lock. The other concurrent requests hit the lock and await the result. Once the first request finishes refreshing the token and updates the database, the lock releases, and the waiting requests proceed using the newly minted access token.
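A simplified sketch of that flow; the `db`, `locks`, and `refreshWithProvider` declarations below are stand-ins for your credential store, distributed lock service, and OAuth client:

```typescript
interface StoredCredentials { accessToken: string; refreshToken: string; expiresAt: number }

// Stand-ins for this sketch; wire them to your real infrastructure.
declare const db: {
  loadCredentials(accountId: string): Promise<StoredCredentials>
  saveCredentials(accountId: string, c: StoredCredentials): Promise<void>
}
declare const locks: { withLock<T>(key: string, fn: () => Promise<T>): Promise<T> }
declare function refreshWithProvider(refreshToken: string): Promise<StoredCredentials>

const SAFETY_BUFFER_MS = 30_000 // treat tokens expiring within 30s as already expired

async function getFreshAccessToken(accountId: string): Promise<string> {
  const creds = await db.loadCredentials(accountId)
  if (creds.expiresAt - Date.now() > SAFETY_BUFFER_MS) return creds.accessToken

  // Only one request per account runs the refresh; concurrent callers block here.
  return locks.withLock(`oauth-refresh:${accountId}`, async () => {
    // Re-read inside the lock: a peer may have refreshed while we waited.
    const latest = await db.loadCredentials(accountId)
    if (latest.expiresAt - Date.now() > SAFETY_BUFFER_MS) return latest.accessToken

    const next = await refreshWithProvider(latest.refreshToken)
    await db.saveCredentials(accountId, next) // persist the rotated token before releasing the lock
    return next.accessToken
  })
}
```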
3. Proactive Scheduled Alarms
Relying solely on on-demand refreshes increases latency for the end user, as the AI agent has to wait for the HTTP round-trip of the OAuth exchange. Don't wait for a 401 from the upstream API. Schedule a background worker or durable alarm to proactively refresh tokens 60 to 180 seconds before they expire. Combine that with a just-in-time check (with a 30-second safety buffer) on every API call as a fallback. By the time the AI agent makes a request, the token is already fresh.
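The scheduling itself is a one-liner on top of whatever durable timer you have. In this sketch, `scheduler.at` is an assumed stand-in for a cron entry, a delayed queue message, or a Durable Object alarm:

```typescript
// Stand-in for a durable timer facility.
declare const scheduler: { at(timestampMs: number, fn: () => Promise<void>): void }

const PROACTIVE_LEAD_MS = 120_000 // refresh ~2 minutes before expiry

function scheduleProactiveRefresh(accountId: string, expiresAt: number): void {
  const fireAt = Math.max(Date.now(), expiresAt - PROACTIVE_LEAD_MS)
  scheduler.at(fireAt, async () => {
    await getFreshAccessToken(accountId) // reuses the lock-guarded path above
  })
}
```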
4. Failure-Mode Webhooks
When a refresh ultimately fails (revoked grant, expired refresh token, account disabled), flip the integrated account into a `needs_reauth` state and notify the customer's app via webhook. Don't silently 500 every agent request after that.
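A sketch of that failure path; the `accounts` and `webhooks` helpers and the event shape are illustrative:

```typescript
// Stand-ins for this sketch.
declare const accounts: { setStatus(accountId: string, status: 'active' | 'needs_reauth'): Promise<void> }
declare const webhooks: { send(accountId: string, payload: object): Promise<void> }

async function handleRefreshFailure(accountId: string, err: { code?: string }): Promise<void> {
  // invalid_grant usually means a revoked grant or an expired refresh token.
  if (err.code === 'invalid_grant') {
    await accounts.setStatus(accountId, 'needs_reauth')
    await webhooks.send(accountId, {
      event: 'integration.needs_reauth',
      account_id: accountId,
      occurred_at: new Date().toISOString(),
    })
  }
}
```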
Dynamic Tool Generation vs. Hardcoded Endpoints
If your engineering team is hand-writing a custom handler function for every MCP tool, you are manufacturing technical debt. You write `list_hubspot_contacts`, then `list_salesforce_contacts`, then `list_pipedrive_contacts`, and a year later you have 4,000 lines of near-identical TypeScript that all break differently when a vendor rotates their pagination format.
Dynamic tool generation is the architectural pattern of deriving MCP tool definitions directly from an integration's API documentation and JSON schemas at runtime, eliminating the need to hand-code individual API connectors.
The scalable approach is to generate tools as data, from two sources you already maintain:
- Resource configuration: which endpoints exist, what HTTP methods they accept, which path placeholders they need.
- Documentation records: human-readable descriptions plus JSON Schema for query parameters and request bodies.
When an MCP client sends a tools/list request, the server dynamically constructs the available tools by intersecting the integration's defined endpoints with the available documentation records. If an endpoint lacks documentation, it is excluded from the MCP server. This acts as a strict quality gate, ensuring AI models only see endpoints with clear instructions.
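A sketch of that intersection, with simplified `Endpoint` and `DocRecord` shapes (both are assumptions for illustration; `toolName` is the naming helper shown next):

```typescript
interface JsonSchema { properties?: Record<string, unknown> }
interface Endpoint { resource: string; method: string }
interface DocRecord {
  resource: string
  method: string
  description: string
  querySchema?: JsonSchema
  bodySchema?: JsonSchema
}

// Intersect configured endpoints with documentation records; an endpoint
// without a doc record never becomes a tool.
function buildTools(integration: string, endpoints: Endpoint[], docs: DocRecord[]) {
  return endpoints.flatMap(ep => {
    const doc = docs.find(d => d.resource === ep.resource && d.method === ep.method)
    if (!doc) return [] // quality gate: undocumented endpoints are excluded
    return [{
      name: toolName(integration, ep.resource, ep.method),
      description: doc.description,
      inputSchema: {
        type: 'object',
        properties: { ...doc.querySchema?.properties, ...doc.bodySchema?.properties },
      },
    }]
  })
}
```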
Tool names are derived consistently:
```typescript
// Tool name derivation. snakeCase and singular are minimal helpers here;
// swap in a proper inflection library in production.
const snakeCase = (s: string) => s.toLowerCase().trim().replace(/\s+/g, '_')
const singular = (s: string) => s.replace(/s$/, '') // naive singularization

function toolName(integration: string, resource: string, method: string): string {
  switch (method) {
    case 'list':   return snakeCase(`list all ${integration} ${resource}`)
    case 'get':    return snakeCase(`get single ${integration} ${singular(resource)} by id`)
    case 'create': return snakeCase(`create a ${integration} ${singular(resource)}`)
    case 'update': return snakeCase(`update a ${integration} ${singular(resource)} by id`)
    case 'delete': return snakeCase(`delete a ${integration} ${singular(resource)} by id`)
    default:       return snakeCase(`${integration} ${resource} ${method}`)
  }
}
```

Schemas are injected with context-aware instructions. For list operations, you inject `limit` and `next_cursor` properties into the query schema automatically, with descriptions that explicitly tell the LLM how to handle pagination cursors ("Pass back exactly the cursor value you received, do not decode or modify"). For individual record operations (get, update, delete), you inject an `id` property. This is the boring infrastructure work that decides whether agents actually paginate correctly or hallucinate cursor values.
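A sketch of that injection step (the property names match the conventions above; the exact description strings are yours to tune):

```typescript
// Inject pagination and id properties before exposing a schema to the model.
function injectStandardProps(
  method: string,
  querySchema: { properties: Record<string, unknown>; required?: string[] },
) {
  if (method === 'list') {
    querySchema.properties.limit = {
      type: 'integer',
      description: 'Maximum number of records to return.',
    }
    querySchema.properties.next_cursor = {
      type: 'string',
      description:
        'Opaque pagination cursor from the previous response. ' +
        'Pass back exactly the cursor value you received; do not decode or modify it.',
    }
  }
  if (['get', 'update', 'delete'].includes(method)) {
    querySchema.properties.id = {
      type: 'string',
      description: 'Unique identifier of the record.',
    }
    querySchema.required = [...(querySchema.required ?? []), 'id']
  }
  return querySchema
}
```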
For more on this pattern, review our guide on Auto-Generated MCP Tools: Documentation-Driven Tool Creation for AI Agents (2026).
Enforcing Least Privilege: Method and Tag Filtering
AI models are unpredictable. If you expose a full CRUD (Create, Read, Update, Delete) API to an autonomous agent, it will eventually hallucinate a destructive action. You cannot rely on prompt engineering to prevent data loss. Security must be enforced at the infrastructure level.
A multi-tenant MCP server with full read/write access to every endpoint is a liability. The spec's requirement that servers respond with HTTP `401 Unauthorized` when authorization is required and not yet proven by the client is only the outer perimeter. Inside it, multi-tenant MCP servers require granular, server-level filtering. Two orthogonal axes of filtering give you most of what you need:
Method-Level Filtering
Beyond restricting resources, you must restrict operations. Let customers create read-only servers, write-only servers, or specific-method servers. The implementation is a small predicate:
```typescript
// Map coarse rules ('read', 'write', 'custom') onto concrete methods;
// any other rule must match the method name exactly.
function isMethodAllowed(method: string, allowed?: string[]): boolean {
  if (!allowed?.length) return true
  return allowed.some(rule => {
    switch (rule) {
      case 'read':   return ['get', 'list'].includes(method)
      case 'write':  return ['create', 'update', 'delete'].includes(method)
      case 'custom': return !['get', 'list', 'create', 'update', 'delete'].includes(method)
      default:       return method === rule
    }
  })
}
```

Tag-Based Resource Filtering
Integrations contain dozens of resources. A CRM integration might have endpoints for contacts, deals, tickets, and internal user directories. You rarely want an AI agent to have access to all of them. By tagging resources in your integration configuration, you can scope MCP servers to specific functional areas:
```typescript
tool_tags: {
  contacts: ['crm', 'sales'],
  deals: ['crm', 'sales'],
  tickets: ['support'],
  ticket_comments: ['support'],
  users: ['directory']
}
```

A server created with `{ tags: ['support'] }` only ever sees tools for tickets and ticket_comments. Combine with `{ methods: ['read'] }` and you've got a strictly read-only support agent that can't touch your CRM data.
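The matching predicate mirrors `isMethodAllowed`; a minimal sketch:

```typescript
// A resource is visible when the server has no tag filter, or when at least
// one of the resource's tags appears in the server's allowed tags.
function isResourceAllowed(resourceTags: string[], allowedTags?: string[]): boolean {
  if (!allowedTags?.length) return true
  return resourceTags.some(tag => allowedTags.includes(tag))
}

// Usage with the tool_tags config above:
// isResourceAllowed(['support'], ['support'])      -> true  (tickets)
// isResourceAllowed(['crm', 'sales'], ['support']) -> false (contacts)
```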
| Server profile | `methods` | `tags` | Example use case |
|---|---|---|---|
| Read-only support agent | `['read']` | `['support']` | Triage assistant |
| CRM data writer | `['write']` | `['crm']` | Lead enrichment agent |
| Full directory | `['read']` | `['directory']` | Org-chart Q&A bot |
| Custom searches only | `['custom']` | - | Search-augmented retrieval |
Validate filters at server-creation time. If the requested combination of methods and tags produces zero tools, reject the request with a clear error. This is also where short-lived servers shine. A contractor who needs MCP access for a week should get a server with `expires_at` set seven days out, after which the underlying token entries vanish on their own.
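A sketch of that creation-time guard, reusing the `isMethodAllowed` and `isResourceAllowed` predicates from above (the `CandidateTool` shape is illustrative):

```typescript
interface CandidateTool { name: string; method: string; tags: string[] }

// Reject server creation when the filter combination yields zero tools.
function validateServerFilters(
  candidates: CandidateTool[],
  methods?: string[],
  tags?: string[],
): CandidateTool[] {
  const visible = candidates.filter(
    t => isMethodAllowed(t.method, methods) && isResourceAllowed(t.tags, tags),
  )
  if (visible.length === 0) {
    throw new Error(
      `methods=${JSON.stringify(methods)} and tags=${JSON.stringify(tags)} match zero tools`,
    )
  }
  return visible
}
```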
Handling Flat Input Namespaces in JSON-RPC
A subtle but frustrating technical reality of the Model Context Protocol is how it structures arguments. When an AI client invokes a tool via a tools/call message, all arguments are delivered as a single, flat JSON object.
REST APIs, however, do not use flat namespaces. They strictly separate query parameters from request bodies. If an AI agent wants to update a Salesforce contact and also paginate the response, it might send:
```json
{
  "contact_id": "12345",
  "first_name": "Alice",
  "limit": 50
}
```

The proxy API layer needs to know that `first_name` goes in the JSON body, `limit` goes in the URL query string, and `contact_id` belongs in the URL path. Your gateway has to disambiguate. The pragmatic approach is to use the tool's JSON schemas as the splitter:
```typescript
function splitArguments(
  args: Record<string, unknown>,
  querySchema: { properties: Record<string, unknown> },
  bodySchema: { properties: Record<string, unknown> }
) {
  const queryKeys = new Set(Object.keys(querySchema?.properties ?? {}))
  const bodyKeys = new Set(Object.keys(bodySchema?.properties ?? {}))
  const query: Record<string, unknown> = {}
  const body: Record<string, unknown> = {}
  for (const [k, v] of Object.entries(args)) {
    if (queryKeys.has(k)) query[k] = v
    else if (bodyKeys.has(k)) body[k] = v
  }
  // For get/update/delete, lift `id` out of the query and into the URL path
  const id = query.id as string | undefined
  delete query.id
  return { query, body, id }
}
```

A few real-world edge cases worth flagging:
- Name collisions. If a query schema and a body schema both define a `name` property, pick one as canonical (typically query) or throw a validation error, depending on your strictness. Document the precedence so integration authors know.
- Cursor passthrough. LLMs love to "helpfully" decode opaque cursor strings. Your `next_cursor` description must explicitly forbid this. Most production-grade agents follow descriptions if they're firm enough.
- Custom methods. Things like `search`, `download`, or `import` don't fit CRUD shapes. Treat them as a separate dispatch path that takes the full args object and lets the underlying integration config decide what to do with it.
- Rate limit headers. Your gateway should pass upstream rate-limit signals to the caller, not absorb them. Surface the standardized `ratelimit-limit`, `ratelimit-remaining`, and `ratelimit-reset` headers per the IETF draft and let the calling agent (or its host framework) decide how to back off; see the sketch after this list. Trying to silently retry inside the MCP gateway just hides problems and burns through quotas faster.
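A sketch of that passthrough using the standard fetch `Response` and `Headers` types (header field names are case-insensitive in HTTP, and the exact names vary by IETF draft revision):

```typescript
// Copy upstream rate-limit signals onto the gateway response instead of
// absorbing them; the calling agent decides how to back off.
const RATE_LIMIT_HEADERS = [
  'ratelimit-limit',
  'ratelimit-remaining',
  'ratelimit-reset',
  'retry-after',
]

function passThroughRateLimits(upstream: Response, outHeaders: Headers): void {
  for (const name of RATE_LIMIT_HEADERS) {
    const value = upstream.headers.get(name)
    if (value !== null) outHeaders.set(name, value)
  }
}
```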
Infrastructure for the Agentic Era
A multi-tenant MCP server is, fundamentally, an infrastructure problem dressed up as an AI problem. The actual LLM integration requires very little code. The hard parts—URL-based tenant isolation, encrypted credential vaulting, proactive token refresh with concurrency control, declarative tool generation, least-privilege filtering, schema-driven argument splitting—are all classic distributed systems work. The MCP protocol just sits on top.
If you're building this in-house, give yourself a realistic estimate. A team of two senior engineers can usually get a working MVP in 6-8 weeks for a single integration. Each additional integration adds 1-3 weeks of OAuth quirks, pagination edge cases, and webhook verification work. Multiply by the number of CRMs, HRIS systems, ticketing tools, and accounting platforms your customers care about, and the headcount math gets uncomfortable.
The alternative is a managed unified API platform that hands you all of this as a primitive. Leveraging a platform that handles the OAuth lifecycle, provides per-account MCP URLs, and dynamically generates tools from normalized data models allows your engineers to focus on what actually matters: the agentic reasoning and the user experience.
Whichever path you pick, the architectural principles don't change. Isolate tenants cryptographically. Keep credentials out of the model context. Generate tools as data. Default to least privilege. Surface errors honestly. Get those right and your MCP layer holds up under enterprise scrutiny. Get them wrong and you're one prompt-injection demo away from a very bad week.
FAQ
- How do you isolate OAuth tokens between tenants in an MCP server?
- Mint a unique cryptographically random URL per integrated account, store only an HMAC of the token, and resolve it to a single tenant's credentials at request time. The OAuth tokens themselves stay encrypted in a credential store and never enter the AI model's context.
- Can AI agents manage OAuth token refreshes?
- No. Passing refresh tokens to an AI model exposes them to prompt injection and confused deputy vulnerabilities. The underlying integration platform must handle proactive token refreshes using distributed locks to prevent race conditions.
- How do you stop AI agents from calling write endpoints they shouldn't have access to?
- Apply method and tag filters at the server level when the MCP URL is created. Restrict by method category (read, write, custom) and by resource tag (such as 'support' or 'crm') to ensure the AI agent only sees authorized tools.
- Should the MCP server retry on 429 rate limit errors?
- No. The MCP gateway should pass upstream rate-limit signals back to the caller via standardized IETF headers and let the agent or host framework decide how to back off. Silently retrying inside the gateway burns through quotas faster.
- How do you map flat JSON-RPC inputs to complex API requests?
- Use the tool's defined query and body JSON schemas to split the flat arguments object from the MCP client into distinct query parameters, URL paths, and request bodies before proxying the request to the upstream API.