---
title: Implementing End-User OAuth Identity Passthrough for Remote MCP Servers
slug: implementing-end-user-oauth-identity-passthrough-for-remote-mcp-servers
date: 2026-05-13
author: Uday Gajavalli
categories: [Engineering, Security, "AI & Agents"]
excerpt: "A senior engineer's guide to implementing end-user OAuth identity passthrough for remote MCP servers using OAuth 2.1, PKCE, and dynamic tool generation."
tldr: "Securing AI agents requires abandoning shared API keys in favor of end-user OAuth identity passthrough, ensuring agents inherit exact human permissions via OAuth 2.1, PKCE, and audience-validated tokens."
canonical: https://truto.one/blog/implementing-end-user-oauth-identity-passthrough-for-remote-mcp-servers/
---

# Implementing End-User OAuth Identity Passthrough for Remote MCP Servers


If you are about to expose your B2B SaaS platform to AI agents over the [Model Context Protocol (MCP)](https://truto.one/what-is-an-mcp-server-the-2026-architecture-guide-for-saas-pms/), the single most important decision you will make is how the agent inherits the end-user's identity. Get it wrong, and you ship a shared service account masquerading as personalization. Get it right, and the agent operates inside the exact same permissions, scopes, and audit trail your human user already has.

Orchestrating an AI agent on a local machine is a solved problem. You define a persona, hand it a few Python functions, paste a vendor API key into your environment variables, and watch the agent reason through tasks. But deploying that same agent into a production B2B SaaS environment exposes a massive architectural gap. When your AI agent needs to act on behalf of your users inside external systems—reading Jira tickets, updating Salesforce opportunities, or pulling BambooHR employee records—you suddenly have to manage multi-tenant OAuth 2.0 lifecycles and ensure strict isolation between what different agents are allowed to access, a challenge we explored in our guide to [architecting a multi-tenant MCP server](https://truto.one/how-to-architect-a-multi-tenant-mcp-server-for-enterprise-b2b-saas/).

**End-user OAuth identity passthrough for remote MCP servers means the access token an AI agent presents to your MCP server is bound to a specific human user, scoped to their permissions in the downstream SaaS, and validated by your authorization server as the resource owner—never a static API key shared across tenants.** That sentence is the whole architectural battle. 

This guide breaks down the architectural patterns required to securely expose your SaaS platform to AI agents. We will cover the mechanics of OAuth 2.1 identity passthrough, handling the brutal realities of token lifecycles, managing third-party rate limits, and why dynamic tool generation is the only way to scale MCP servers in production. This is written for senior PMs and engineering leads who have already shipped OAuth integrations and are now being asked to make their product "AI-agent ready" without becoming the next confused-deputy CVE.

## The AI Agent Identity Crisis in B2B SaaS

The scale of the problem is no longer theoretical. A recent Cloud Security Alliance survey found that <cite index="1-1">82% of enterprises have unknown AI agents operating in their environments</cite>, and the same research reports <cite index="1-1">65% have experienced AI agent incidents in the past year</cite>, ranging from silent data exfiltration to operational outages. Microsoft's Cyber Pulse data adds that <cite index="1-1">roughly 80% of Fortune 500 companies are already running active AI agents</cite> in production workflows.

Most early multi-agent deployments rely on service accounts and static API keys. The developer hardcodes a credential, and the agent uses it to authenticate against an external system. In a single-tenant environment, this is risky. In a multi-tenant B2B SaaS environment, it is a catastrophic vulnerability waiting to happen.

Today, most agents authenticate with one of three patterns, and all three break at enterprise scale:

1.  **A shared service-account API key** copied into a `.env` file or secrets manager.
2.  **A long-lived OAuth refresh token** issued to a single internal user, reused across every customer.
3.  **A bearer token minted by the SaaS itself** with no upstream tie to the end-user's identity.

Each of these collapses the principle of least privilege. When an AI agent uses a shared API key, it inherits the aggregate privileges of that service account. If the agent is compromised—via prompt injection or a hallucinated reasoning loop—it has unbounded access to the entire connected application. The blast radius is massive. There is no way to revoke access for one user without breaking everyone. 

Enterprise buyers know this. They are no longer treating AI agent security as a future-state roadmap item. <cite index="2-9,2-10">Access tokens must be scoped and rotated in ephemeral, cloud-native environments, and agents blur the audit trail by blending delegated user authority with autonomous action.</cite> Identity passthrough—especially when combined with a [zero data retention architecture](https://truto.one/zero-data-retention-mcp-servers-building-soc-2-gdpr-compliant-ai-agents/) for SOC 2 compliance—is the only pattern that survives a rigorous security review.

## What is End-User OAuth Identity Passthrough?

End-user OAuth identity passthrough is an architectural pattern where an AI agent does not authenticate to a remote system using its own service credentials. Instead, the agent authenticates as the specific human user who invoked it, inheriting their exact permissions, scopes, and role-based access controls (RBAC) in the target SaaS application.

If User A asks an agent to summarize their open Jira tickets, the agent connects to the remote MCP server using an OAuth token explicitly tied to User A's Jira identity. The MCP server validates the token, extracts the user identity, and proxies the request to Jira using User A's specific credentials. If the agent hallucinates and tries to read User B's tickets, the target SaaS API rejects the request because User A's token lacks the necessary permissions.

Identity passthrough is a chain of three OAuth relationships that share a single subject claim: the human user.

```mermaid
sequenceDiagram
    participant U as End User
    participant A as AI Agent / MCP Client
    participant M as Your MCP Server<br>(Resource Server)
    participant AS as Authorization Server
    participant T as Third-Party SaaS<br>(Salesforce, Jira, etc.)

    A->>M: tools/call (no token)
    M-->>A: 401 + WWW-Authenticate<br>resource_metadata=...
    A->>AS: Authorization Code + PKCE (user consents)
    AS-->>A: Access token (aud=MCP, sub=user)
    A->>M: tools/call (Bearer token)
    M->>M: Validate aud, sub, scope & extract identity
    M->>T: API call with user's<br>downstream OAuth token
    T-->>M: Data scoped to user's permissions
    M-->>A: Tool response
```

The agent never sees the downstream SaaS token. Your MCP server never accepts a token minted for some other audience. The end-user explicitly consents to the scopes the agent will use. If the user is fired in your HRIS, deprovisioning their identity in the IdP cascades through every agent acting on their behalf, eliminating privilege creep.

> [!WARNING]
> A common mistake is to take the access token the MCP client presents and forward it directly to the downstream API. The MCP spec explicitly forbids this. <cite index="5-9,5-10,5-11">Never pass through the token received from the MCP client - this creates confused deputy vulnerabilities where downstream services may incorrectly trust tokens not intended for them. The June 2025 spec explicitly prohibits MCP servers from passing through tokens to upstream APIs.</cite> Mint or look up a separate, audience-scoped token for each downstream call.

## Implementing End-User OAuth Identity Passthrough for Remote MCP Servers

The Model Context Protocol's authorization specification is now explicit about what compliant remote servers must do to secure AI agent access. The November 2025 revision settled the architecture: <cite index="6-2">MCP auth implementations MUST implement OAuth 2.1 with appropriate security measures for both confidential and public clients.</cite> Building this requires a strict separation between the MCP client (the agent framework) and the remote MCP server (the integration middleware).

There are six moving parts you need to implement to do this correctly:

### 1. Treat your MCP server as a resource server, not an authorization server

<cite index="5-4,5-5,5-6">MCP servers act as OAuth 2.1 resource servers only, validating tokens issued by external, dedicated authorization servers. The MCP server's job is to validate tokens and enforce scopes internally, but not to manage user logins or token issuance.</cite> If you already run an IdP (WorkOS, Auth0, Okta, your own Keycloak), point your MCP server at it. Do not build a parallel authentication stack inside the protocol handler.

### 2. Publish Protected Resource Metadata (RFC 9728)

When an MCP client hits a protected tool without a token, you must respond with a 401 and a `WWW-Authenticate` header that points to a discovery document:

```http
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer realm="mcp",
  resource_metadata="https://api.yoursaas.com/.well-known/oauth-protected-resource"
```

That document declares your authorization server, supported scopes, and bearer methods. <cite index="16-3,16-4">MCP servers MUST implement OAuth 2.0 Protected Resource Metadata (RFC9728), and MCP clients MUST use it for authorization server discovery.</cite> Skip this, and Claude, ChatGPT, and Cursor's connector UIs literally cannot bootstrap your server.
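
The metadata document itself is small. A minimal sketch per RFC 9728 (the scope names are illustrative, not prescribed by the spec):

```json
{
  "resource": "https://api.yoursaas.com/mcp",
  "authorization_servers": ["https://auth.yoursaas.com"],
  "scopes_supported": ["mcp:tools:read", "mcp:tools:execute"],
  "bearer_methods_supported": ["header"]
}
```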

### 3. Mandate PKCE on every authorization code exchange

<cite index="10-2,10-3">OAuth 2.1 mandates PKCE for all clients using the Authorization Code flow. By adopting the 2.1 stack, MCP inherits this PKCE-by-default security posture, transforming PKCE from a patch into a foundational security layer.</cite> There are no exceptions for "trusted" agents. The MCP client (Claude, ChatGPT, or an internal LangGraph runner in a [multi-agent framework](https://truto.one/handling-auth-tool-sharing-in-multi-agent-frameworks-via-mcp/)) generates a `code_verifier`, hashes it into a `code_challenge`, and the authorization server only redeems the code if the verifier matches.

### 4. Validate the audience claim and extract context

The single most common production exploit in MCP deployments is audience confusion: a server accepts a token minted for a sibling service. <cite index="16-20,16-21,16-22">An attacker can compromise an MCP server if it accepts tokens issued for other resources. When an MCP server doesn't verify that tokens were specifically intended for it via the audience claim, it may accept tokens originally issued for other services.</cite>

When the MCP client sends a JSON-RPC request to the remote MCP server, it includes the user's access token in the `Authorization: Bearer` header. The remote MCP server must intercept this request before parsing the JSON-RPC payload. The middleware validates the JWT signature, checks the expiration, verifies the audience claim, and extracts the user identity (e.g., the `sub` claim). It then uses this identity to look up the corresponding third-party SaaS credentials (the integrated account) stored securely in your database.

A minimal Express-style middleware check, using the `jose` library for JWKS-backed verification:

```typescript
import { createRemoteJWKSet, jwtVerify } from 'jose'

// Fetches and caches the authorization server's signing keys
const jwks = createRemoteJWKSet(new URL('https://auth.yoursaas.com/.well-known/jwks.json'))

async function validateMcpToken(req, res, next) {
  const token = req.headers.authorization?.match(/^Bearer (.+)$/)?.[1]
  if (!token) return res.status(401).end()
  try {
    const { payload } = await jwtVerify(token, jwks, {
      issuer: 'https://auth.yoursaas.com',
      audience: 'https://api.yoursaas.com/mcp', // exact resource URI is mandatory
    })
    if (!payload.sub) return res.status(401).end()

    // Look up SaaS credentials based on the exact user identity
    req.user = {
      id: payload.sub,
      scopes: typeof payload.scope === 'string' ? payload.scope.split(' ') : [],
    }
    next()
  } catch {
    res.status(401).end() // bad signature, wrong audience, or expired
  }
}
```

### 5. Decide what to do about Dynamic Client Registration (DCR)

In a production environment, you cannot always hardcode client credentials into every deployed agent. <cite index="16-1">MCP clients and authorization servers SHOULD support the OAuth 2.0 Dynamic Client Registration Protocol (RFC 7591) to allow MCP clients to obtain OAuth client IDs without user interaction.</cite> Claude Desktop, in particular, <cite index="18-9">requires Dynamic Client Registration support and does not yet support a way for users to specify a client ID or secret</cite> for OAuth-based remote servers.

DCR solves a real problem: <cite index="19-6,19-7,19-8,19-9,19-10">without DCR, three MCP clients and five servers means potentially fifteen separate OAuth app registrations to manage. DCR collapses that - clients register once, dynamically, and the authorization server handles the rest.</cite> It also ensures that every agent instance has a unique cryptographic identity, making it possible to revoke access for a single compromised agent without impacting the rest of the fleet.
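
Under RFC 7591, registration is a single unauthenticated POST. A sketch with illustrative values (public MCP clients typically register with `token_endpoint_auth_method` set to `none` and rely on PKCE):

```http
POST /register HTTP/1.1
Host: auth.yoursaas.com
Content-Type: application/json

{
  "client_name": "Example MCP Host",
  "redirect_uris": ["https://client.example.com/oauth/callback"],
  "grant_types": ["authorization_code", "refresh_token"],
  "token_endpoint_auth_method": "none"
}
```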

But it ships with sharp edges. <cite index="20-16,20-17,20-18">Using DCR on a remote MCP server, you're effectively letting anyone in the world register as a client with your OAuth provider. There are no guardrails by default. Servers must accept registration requests from clients they've never seen before.</cite> Harden your `/register` endpoint with rate limits, redirect URI allowlists (no wildcards, ever), and ideally a software-statement signature so only verified MCP hosts can register.

### 6. Conditional API Token Authentication

Relying on a single credential, especially one embedded in the MCP server URL, is risky if that URL ever leaks into logs or configuration files. A defense-in-depth approach adds conditional API token authentication: the remote MCP server enforces a secondary check, requiring the caller to present both a valid MCP token (to identify the target integration) and a valid platform session cookie or API token (to prove they are an authenticated user of your SaaS application).
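
A sketch of the dual check, layered behind the JWT middleware above. The `x-api-token` header name and `verifyPlatformToken` helper are hypothetical; substitute whatever your platform uses for first-party API authentication:

```typescript
declare function verifyPlatformToken(token: string, userId: string): Promise<boolean> // hypothetical lookup

// Runs after validateMcpToken: even a valid MCP bearer token is not enough.
// The caller must also prove they hold an active credential for your platform.
async function requireApiTokenAuth(req, res, next) {
  const platformToken = req.headers['x-api-token']
  if (!req.user || typeof platformToken !== 'string') return res.status(401).end()
  if (!(await verifyPlatformToken(platformToken, req.user.id))) return res.status(401).end()
  next()
}
```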

## Architectural Challenges: Token Lifecycles and Rate Limits

Implementing the authentication handshake is only the first step. Operating a multi-tenant MCP server at scale introduces severe operational challenges around concurrency and vendor API quirks. Token lifecycle management and rate limits are what actually wake up your on-call engineer.

### The Thundering Herd: OAuth Token Refresh

Access tokens expire, typically every 30 to 60 minutes. For a human user, the client application refreshes a Salesforce access token invisibly in the background between clicks. But an AI agent making 200 tool calls in a five-minute reasoning loop will hit token expiry mid-conversation. If you wait until the token expires to refresh it, the API request will fail, throwing an error back to the agent and stalling its reasoning chain.

Worse, if an agent makes parallel tool calls (e.g., fetching 10 different Salesforce records concurrently), and the token is expired, you will hit a severe race condition. Ten concurrent threads will attempt to refresh the exact same OAuth token simultaneously. The provider will issue a new token to the first request and immediately revoke the old refresh token, causing the other nine requests to fail with an `invalid_grant` error, permanently breaking the connection.

To solve this, your infrastructure requires two mechanisms:

1.  **Proactive Refresh:** Schedule background tasks to refresh OAuth tokens 60 to 180 seconds before they actually expire, rather than on demand.
2.  **Mutex Locks for Refresh Operations:** When a token must be refreshed, the operation must be protected by a distributed lock. The first request acquires the lock and initiates the HTTP call to the provider; any concurrent requests for the same integrated account await the resolution of that single promise. Multiple tool calls for the same user are serialized, but Tenant A's refresh must never block Tenant B's: the mutex key is the integrated account ID (see the single-flight sketch after the diagram below).

```mermaid
flowchart LR
    A[Token stored<br>with expires_at] --> B{Time until<br>expiry < 60s?}
    B -- Yes --> C[Refresh now]
    B -- No --> D[Schedule refresh<br>60-180s before expiry]
    D --> E[Background worker<br>fires]
    E --> C
    C --> F{Refresh<br>succeeded?}
    F -- Yes --> G[Update token,<br>reschedule]
    F -- 4xx --> H[Mark needs_reauth,<br>fire webhook]
    F -- 5xx --> I[Retry with<br>exponential backoff]
```
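
A minimal single-flight sketch of the mutex pattern from item 2. The `loadToken` and `refreshWithProvider` helpers are hypothetical, and the in-process `Map` stands in for what should be a distributed lock (Redis or similar) once you run more than one server instance:

```typescript
interface Token { accessToken: string; expiresAt: number }

declare function loadToken(accountId: string): Promise<Token>            // hypothetical store read
declare function refreshWithProvider(accountId: string): Promise<Token>  // hypothetical OAuth refresh call

// One in-flight refresh per integrated account. Concurrent tool calls for
// the same account await the same promise; different tenants never block
// each other because the map key is the integrated account ID.
const inFlight = new Map<string, Promise<Token>>()

async function getFreshToken(accountId: string): Promise<Token> {
  const cached = await loadToken(accountId)
  if (cached.expiresAt - Date.now() > 60_000) return cached // still comfortably valid

  let refresh = inFlight.get(accountId)
  if (!refresh) {
    refresh = refreshWithProvider(accountId).finally(() => inFlight.delete(accountId))
    inFlight.set(accountId, refresh)
  }
  return refresh
}
```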

For a deeper treatment of refresh failure modes, see our guide to [handling OAuth token refresh failures in production](https://truto.one/handling-oauth-token-refresh-failures-in-production-for-third-party-integrations/).

### Handling Rate Limits the Right Way

Third-party vendor APIs have terrible, inconsistent rate limits. When your downstream API returns an HTTP 429 Too Many Requests, your MCP server has two honest choices: swallow it and retry, or surface it and let the caller decide.

The instinct of most backend engineers is to handle the retry and exponential backoff inside the middleware. For AI agents, this is an architectural mistake. Agents generate burst traffic that looks nothing like human usage patterns. If the middleware absorbs the 429 and stalls the HTTP connection for 30 seconds while it retries, the LLM waiting on the other side of the MCP connection will likely time out. Furthermore, a hidden retry storm inside your server will trip circuit breakers on the third-party API and get every tenant rate-limited.

The pragmatic pattern is to fail fast and **normalize rate-limit signals, passing the error through**. If the agent knows the API is exhausted, it can pivot its reasoning strategy—perhaps falling back to a different tool, querying a local cache, or asking the user to wait. The middleware should normalize the chaotic vendor-specific rate limit headers into the IETF standard draft headers:

```http
HTTP/1.1 429 Too Many Requests
ratelimit-limit: 100
ratelimit-remaining: 0
ratelimit-reset: 12
```

By passing these standardized headers back through the JSON-RPC response, the agent framework (or the developer writing the orchestration logic) reads `ratelimit-reset` and decides whether to retry, switch to another tool, or surface the delay to the user.
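
A sketch of that normalization layer, covering the most common vendor variants. Real vendors disagree on casing, units, and even whether `Retry-After` is seconds or an HTTP date, so treat the mapping as a starting point:

```typescript
// Map vendor-specific headers onto the IETF draft names before returning
// the 429 to the MCP client. Only numeric Retry-After values map cleanly
// onto ratelimit-reset (both are delta-seconds).
function normalizeRateLimitHeaders(upstream: Headers): Record<string, string> {
  const out: Record<string, string> = {}
  const limit = upstream.get('ratelimit-limit') ?? upstream.get('x-ratelimit-limit')
  const remaining = upstream.get('ratelimit-remaining') ?? upstream.get('x-ratelimit-remaining')
  const reset = upstream.get('ratelimit-reset') ?? upstream.get('retry-after')
  if (limit) out['ratelimit-limit'] = limit
  if (remaining) out['ratelimit-remaining'] = remaining
  if (reset && /^\d+$/.test(reset)) out['ratelimit-reset'] = reset
  return out
}
```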

> [!TIP]
> If you build retries into your MCP server, you remove the agent's ability to reason about cost and latency. A reasoning loop that thinks one tool call took 200ms when it actually took 14 seconds will make bad planning decisions. Honesty in errors beats clever absorption.

## Dynamic Tool Generation vs. Static Endpoints

The other half of an MCP server is the tool list. If you want to expose a SaaS integration to an AI agent, the naive approach is to handcode tool definitions or feed the agent the entire OpenAPI specification for that vendor.

This fails immediately in production. A typical enterprise SaaS OpenAPI spec (like Salesforce or Microsoft Dynamics) contains thousands of endpoints. Feeding that into an LLM context window consumes massive amounts of tokens, increases latency, and guarantees hallucinations. The agent will attempt to call endpoints that the user's OAuth scopes do not permit, or it will invent query parameters that do not exist.

Furthermore, static tools rot. Your Salesforce custom objects change. Your Jira workflow adds a status. A vendor deprecates an endpoint. Every change means a code deploy on your MCP server, a new release for the AI client to discover, and a guaranteed window where the agent calls a tool that no longer exists.

### Documentation-Driven Tool Derivation

The alternative is to derive tools dynamically from a documentation-driven manifest at request time, scoped specifically to the integrated account.

1.  The integration declares its resources and methods as data, not code.
2.  Each `(resource, method)` pair has a JSON Schema for query and body parameters plus a human-readable description.
3.  A tool only appears in the `tools/list` response if it has a documentation record. No docs, no tool.

This acts as a strict quality gate. It prevents half-built endpoints from leaking into agent context, and it gives PMs a clean lever to control which surface area is AI-ready.
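
A sketch of the derivation step, assuming a hypothetical `MethodDoc` record shape for the documentation manifest:

```typescript
// Hypothetical documentation record: tools are derived from these at
// request time, never hardcoded.
interface MethodDoc {
  resource: string     // e.g. "contacts"
  method: string       // e.g. "list"
  description: string  // human-readable text the LLM sees
  inputSchema: object  // JSON Schema for query/body parameters
}

// Only documented (resource, method) pairs the connected account is
// actually scoped for become tools. No docs, no tool.
function deriveTools(docs: MethodDoc[], grantedScopes: Set<string>) {
  return docs
    .filter((d) => grantedScopes.has(`${d.resource}:${d.method}`))
    .map((d) => ({
      name: `${d.method}_${d.resource}`,
      description: d.description,
      inputSchema: d.inputSchema,
    }))
}
```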

When a tool is dynamically generated, it provides a strict JSON Schema to the LLM:

```json
{
  "name": "list_all_hubspot_contacts",
  "description": "Fetch a list of contacts from HubSpot. Use the next_cursor from previous responses to paginate.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "limit": {
        "type": "string",
        "description": "The number of records to fetch"
      },
      "next_cursor": {
        "type": "string",
        "description": "Pass back exactly the cursor value you received without decoding or modifying it."
      }
    }
  }
}
```

Notice the explicit instruction regarding pagination. LLMs are notoriously bad at handling opaque cursor strings and love to "helpfully" decode them, breaking the pagination. The tool description must explicitly instruct the model to pass the cursor back unmodified.

For more on this pattern, see our walkthrough of [generating MCP servers for your SaaS users](https://truto.one/how-to-generate-mcp-servers-for-your-saas-users-2026-architecture-guide/).

## How Truto Solves MCP Server Authentication

Building this infrastructure from scratch—managing distributed locks for token refreshes, normalizing 429 headers across 100+ APIs, and dynamically generating JSON-RPC tools—is an engineering black hole. It pulls your team away from building your core product. For a single integration, it is reasonable to build this yourself. Once you cross the threshold of two or three providers per category, the math stops working.

Truto provides a production-ready unified API architecture for this exact problem. When a customer connects their third-party account (e.g., Salesforce, HubSpot, Jira) through Truto, the platform can automatically expose that specific connected account as an MCP-compatible tool server.

Here is how Truto handles the heavy lifting end-to-end:

*   **Per-Tenant Isolation:** Each connected account gets its own MCP endpoint at `/mcp/<token>`. The token cryptographically binds the server to one integrated account, so a request to one tenant's MCP URL physically cannot return data for another tenant.
*   **Automated Token Lifecycles:** Truto automatically manages the OAuth token lifecycle. Tokens are proactively refreshed on a randomized schedule (60-180 seconds before `expires_at`), and refreshes are serialized per account through a mutex lock to prevent thundering-herd race conditions.
*   **Conditional Defense in Depth:** Truto supports an optional `require_api_token_auth` flag on the MCP server, requiring the agent to present both the URL-embedded token and a valid platform API token in the `Authorization` header to execute tools.
*   **Dynamic Tool Derivation:** MCP servers in Truto are dynamically generated per-tenant based on integration documentation and schemas. Adding documentation for a new endpoint surfaces it as a tool on the next `tools/list` call. No deploy required.
*   **Transparent Rate Limiting:** Truto normalizes upstream rate limit info into standardized headers (`ratelimit-limit`, `ratelimit-remaining`, `ratelimit-reset`) per the IETF spec. When an upstream API returns HTTP 429, Truto passes that error directly to the caller, leaving retry and backoff strategy to whoever owns the agent loop.
*   **Method and Tag Filters:** A single connected account can produce multiple MCP servers. One can be scoped to read-only `support` tools, another to read/write `directory` tools, all backed by the same underlying credentials.

## Where to Take This Next

The gap between "my agent works on my laptop" and "my agent passes a Fortune 500 security review" is mostly the work outlined above. If you are about to ship MCP for your platform, here are your immediate action items:

1.  **Audit your current agent auth.** If anything in production is a shared service-account API key, that is the first thing to retire.
2.  **Wire up Protected Resource Metadata and audience-validated JWTs.** This is the table-stakes work that makes external connectors actually function.
3.  **Decide your DCR posture.** If you want Claude Desktop users to self-connect, you need DCR or a control plane in front of your IdP. 
4.  **Move refresh off the request path.** Proactive, mutex-protected refresh is the difference between an agent that pauses for OAuth latency mid-reasoning and one that operates smoothly.
5.  **Stop hardcoding tools.** Derive them from schema and documentation so your AI surface evolves with your API, not behind it.

If you are building an AI agent that needs to act on external SaaS data, you cannot rely on shared API keys and static OpenAPI specs. You need a secure, multi-tenant middleware layer that respects end-user identity. The protocol is finally stable enough to build against. The remaining decisions are yours.

> Stop burning engineering cycles on OAuth token refreshes and rate limit normalization. Talk to our team about how Truto's per-tenant MCP servers fit into your architecture so you can focus on agentic reasoning.
>
> [Talk to us](https://cal.com/truto/partner-with-truto)
