
How to Debug MCP Servers in Production: Local Inspector to Remote Transport

Debug production MCP servers end-to-end: JSON-RPC inspection, token lifecycle, agent-side 429 handling with code, and a StackOne vs Composio vs Truto platform comparison for AI agents.

Nachi Raman · 21 min read

Your MCP server runs flawlessly against the local Inspector. You wire it into Claude or Cursor, deploy it behind HTTPS, and the AI agent silently stops calling tools—or worse, calls them and reports cheerful success while the third-party API actually returned an HTTP 401 Unauthorized.

Debugging Model Context Protocol (MCP) servers in production requires isolating failures across three distinct layers: the JSON-RPC protocol, the transport mechanism, and the underlying third-party API. When your AI agent fails to fetch data from Salesforce or Jira, the root cause is rarely the LLM itself. It is usually a dropped HTTP connection, a schema mismatch in your tool definition, or a silent rate limit error that the client failed to handle.

This guide breaks down exactly how to debug MCP servers as you scale from local prototypes to production-grade remote deployments. We will walk through the workflow senior engineers and PMs at B2B SaaS companies use to ship reliable MCP servers when local STDIO is no longer enough.

The protocol has reached the kind of scale where these failure modes are now everyone's problem. MCP hit roughly 97 million monthly SDK downloads by March 2026, growing from 2 million at launch—a roughly 4,750% climb in just 16 months. Production debugging is no longer optional knowledge.

The Shift from Local STDIO to Remote MCP Servers

Most MCP development starts with the local STDIO transport. When you build an MCP server locally using the official SDKs, the AI client (like Claude Desktop) spawns your server as a child process and communicates by piping JSON-RPC messages directly over stdin and stdout.

It is trivial to debug. You can log to stderr with console.error (stdout is reserved for protocol messages, so a stray console.log there corrupts the JSON-RPC stream), attach a debugger, and read the raw protocol traffic in a single terminal. There is no network latency, no load balancer dropping idle connections, and no OAuth token expiration to worry about. As of April 2026, roughly 67% of MCP servers still run over local STDIO, while around 28% use Streamable HTTP for remote, OAuth-mediated workloads.

Moving that same server to production changes the picture. That 28% is where debugging gets hard.

Enterprise SaaS workloads—like multi-tenant Salesforce, Workday, or Jira access—cannot run as a subprocess on a single laptop. As detailed in our guide to architecting multi-tenant MCP servers, they need to be reachable, authenticated per tenant, horizontally scalable, and resilient to connection drops. You have to put your MCP server behind an API gateway, secure it with authentication, and route traffic over the internet. The moment you switch transports, your debug surface explodes into four distinct layers:

  • Network layer: TLS termination, API gateways, idle timeouts, and CDN buffering of Server-Sent Events (SSE) streams.
  • Auth layer: OAuth refresh tokens that expire mid-conversation, or bearer tokens silently dropped by reverse proxies.
  • State layer: Streamable HTTP sessions, resumability via Last-Event-ID, and sticky routing requirements.
  • Tenant layer: One AI agent accessing many customer accounts, each with its own credentials, schemas, and API quotas.
flowchart LR
  A[AI Client<br>Claude / ChatGPT / Cursor] -- JSON-RPC 2.0 --> B[Transport<br>STDIO or Streamable HTTP]
  B --> C[MCP Server]
  C --> D[OAuth Token Store]
  C --> E[Third-Party SaaS API<br>Salesforce / Jira / HubSpot]
  E -. 429 / 401 / 5xx .-> C
  C -. JSON-RPC error<br>or isError: true .-> A

Local STDIO collapses everything in this diagram into a single local process. Remote MCP forces you to debug every single hop. Debugging this shift requires understanding that the transport layer is entirely separate from the protocol layer. A perfectly valid JSON-RPC tool call will still fail if the underlying HTTP request gets blocked by a Web Application Firewall. If you are still weighing whether to own this infrastructure stack, our analysis on the hidden costs of custom MCP servers covers the long-tail maintenance work in detail.

Why AI Clients Hide Your MCP Errors

Claude Desktop, ChatGPT, Cursor, and custom LangGraph agents (which we cover in our guide to multi-agent MCP systems) are built to provide a smooth user experience and keep a conversation flowing. When an underlying MCP tool call fails, the client typically abstracts the raw JSON-RPC error into a generic natural language response, masking the technical root cause from the developer.

If you ask an AI agent to "List all recent Jira tickets," and the underlying MCP server returns an HTTP 401 Unauthorized because the OAuth token expired, the LLM will not show you the stack trace. It will confidently say, "I'm sorry, I couldn't access Jira right now," or worse, it might hallucinate a successful response based on old context.

This is a deliberate UX choice, not a bug. But it means you cannot debug a production MCP server through the AI client alone. You do not know if the failure was caused by a network timeout, a malformed query schema, a missing required parameter, or an authentication failure.

The failure modes most engineering teams hit, in rough order of frequency:

  1. Tool schema mismatch: The LLM sends customer_id as a string, your server's schema declares it as an integer, JSON Schema validation throws an error, and the client swallows it.
  2. Silent upstream failures: The third-party API returns an HTTP 200 OK with {"error": "invalid_grant"} in the body. Your MCP server blindly returns success. The agent confidently lies to the user.
  3. Transport drops: An SSE connection idles past your load balancer's 30-second timeout. The client reconnects, but session state was lost, and the tool call fails silently.
  4. Auth header stripping: A reverse proxy or WAF strips Authorization headers it doesn't recognize before they ever reach your MCP server.

None of these are visible from the chat window. You need a tool that speaks raw MCP to see exactly what the LLM sent and exactly what the server returned.
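The first failure mode, a type mismatch between what the LLM sends and what the schema declares, can be reproduced in a few lines. The sketch below is a minimal illustration, assuming a flat inputSchema that only uses the `required` and top-level `type` keywords; `validate_arguments` is a hypothetical helper, not part of any MCP SDK.

```python
# Hypothetical helper: checks tool arguments against a flat JSON Schema.
# Only "required" and per-property "type" are handled in this sketch.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_arguments(input_schema: dict, arguments: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the call is valid."""
    problems = []
    for name in input_schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = input_schema.get("properties", {}).get(name)
        if spec is None:
            continue  # JSON Schema allows additional properties by default
        declared = spec.get("type")
        expected = JSON_TYPES.get(declared)
        # bool is a subclass of int in Python, so reject it explicitly for numbers
        bool_as_number = isinstance(value, bool) and declared in ("integer", "number")
        if expected and (not isinstance(value, expected) or bool_as_number):
            problems.append(f"{name}: expected {declared}, got {type(value).__name__}")
    return problems


schema = {
    "type": "object",
    "properties": {"customer_id": {"type": "integer"}},
    "required": ["customer_id"],
}
print(validate_arguments(schema, {"customer_id": "1042"}))
```

Running the same check in your server and returning its output as the error text gives the model something concrete to correct, rather than a swallowed validation exception.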

Using the MCP Inspector for Local Contract Testing

The official MCP Inspector is an interactive, browser-based debugging client maintained alongside the protocol specification. Think of it as Postman for the Model Context Protocol. It allows developers to connect directly to an MCP server over STDIO, SSE, or Streamable HTTP, list available tools, inspect JSON schemas, and manually execute JSON-RPC requests without involving an LLM.

Before you ever connect your MCP server to an AI agent, you must validate the protocol contract. While testing and mocking MCP servers in CI/CD handles automated validation, local debugging starts with the Inspector.

You can run it against a local server with:

npx @modelcontextprotocol/inspector node ./build/server.js

Or for a remote production server:

npx @modelcontextprotocol/inspector
# Then point the UI at https://your-server.example.com/mcp

Here is the exact checklist of what to test before any LLM touches your server:

  1. Initialize handshake: Confirm the server returns the right protocolVersion and advertises the capabilities you expect (tools, resources, prompts). Mismatched protocol versions are a massive source of silent failures.
  2. tools/list output: Inspect every tool's inputSchema. Look for missing required arrays, ambiguous enum values, and vague description fields. If a description is vague, the LLM will invent arguments. If a parameter is required by the upstream API but missing from the schema, the tool will eventually fail.
  3. tools/call with edge cases: Manually construct JSON payloads. Send null, empty strings, oversized payloads, and known-invalid IDs. Verify the server returns a JSON-RPC error with a useful message, or a content block with isError: true.
  4. Pagination cursors: If your tools return cursors, call them twice and confirm the cursor round-trips byte-for-byte. LLMs love to "helpfully" decode or modify cursor strings.
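Part of the tools/list review can be automated. The following is a rough sketch of a lint pass over a tools/list result, assuming the standard `name`, `description`, and `inputSchema` fields; the function name and the 20-character threshold for a "vague" description are illustrative assumptions, not a recommendation from the MCP specification.

```python
# Hypothetical lint pass over the "tools" array of a tools/list response.
MIN_DESCRIPTION_CHARS = 20  # assumed threshold for "too vague"


def lint_tools(tools: list[dict]) -> list[str]:
    """Return warnings for schema problems that tend to confuse an LLM."""
    warnings = []
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        if len(tool.get("description", "")) < MIN_DESCRIPTION_CHARS:
            warnings.append(f"{name}: description missing or too short")
        schema = tool.get("inputSchema", {})
        properties = schema.get("properties", {})
        if properties and "required" not in schema:
            warnings.append(f"{name}: inputSchema has no 'required' array")
        for prop, spec in properties.items():
            if not spec.get("description"):
                warnings.append(f"{name}.{prop}: parameter has no description")
    return warnings


sample = [{
    "name": "create_a_hubspot_contact",
    "description": "Create contact",
    "inputSchema": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
    },
}]
for warning in lint_tools(sample):
    print(warning)
```

A pass like this only flags structural gaps. Whether a description is actually clear enough for a model still needs a human read in the Inspector.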

Calling tools/call by hand is where you catch schema mapping errors, before an LLM ever generates a payload.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "create_a_hubspot_contact",
    "arguments": {
      "email": "test@example.com",
      "first_name": "Alice"
    }
  }
}

If this call fails, the Inspector will show you the exact JSON-RPC error response. You might discover that the upstream API actually requires firstname instead of first_name. Fixing this at the documentation layer ensures the LLM generates the correct payload in production.

Tip

A March 2026 arXiv study analyzing 407 real-world MCP issues found that server settings, tool configuration, and host configuration were the most prevalent fault categories—not the protocol itself. Catching these with the Inspector before deployment saves hours of agent-side debugging.

Quickstart: Registering an MCP Server with Claude, ChatGPT, and Custom Agents

Once your MCP server passes the Inspector checks, the next step is wiring it into actual AI clients. If you are using a managed platform like Truto, server creation is a single API call. The response gives you a URL that encodes all the authentication and scoping you need - no additional config on the client side.

Creating an MCP Server via API

curl -X POST https://api.truto.one/integrated-account/<account_id>/mcp \
  -H "Authorization: Bearer <your_api_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "HubSpot Read-Only MCP",
    "config": {
      "methods": ["read"]
    },
    "expires_at": "2026-07-01T00:00:00Z"
  }'

The response includes a ready-to-use URL:

{
  "id": "abc-123",
  "name": "HubSpot Read-Only MCP",
  "config": { "methods": ["read"] },
  "expires_at": "2026-07-01T00:00:00Z",
  "url": "https://api.truto.one/mcp/a1b2c3d4e5f6..."
}

That url is the only thing the client needs. The token embedded in the URL authenticates the request and scopes it to a specific connected account. You can restrict the server to read-only methods, filter by tool tags (e.g., only support tools), and set an expiration date - all at creation time.

Registering with Claude

  1. Copy the MCP server URL from the API response.
  2. In Claude: Settings -> Connectors -> Add custom connector.
  3. Paste the URL and click Add.
  4. Claude discovers tools via MCP automatically. Custom connectors via remote MCP are available on Free, Pro, Max, Team, and Enterprise plans (Free is limited to one connector).

Registering with ChatGPT

  1. In ChatGPT: Settings -> Apps -> Advanced settings.
  2. Enable Developer mode (MCP support is behind this flag).
  3. Under MCP servers, add a new server with your URL and a descriptive name.
  4. Save. ChatGPT connects and lists available tools.

Developer Mode is available on Pro, Plus, Business, Enterprise, and Education accounts.

Connecting from a Python Agent

For custom agents built with LangChain, OpenAI Agents SDK, or a bare httpx client, point them at the URL with a standard HTTP POST:

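import httpx

MCP_URL = "https://api.truto.one/mcp/a1b2c3d4e5f6..."

# Discover available tools
response = httpx.post(MCP_URL, json={
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {}
}, timeout=30)
# Fail loudly on 401/429/5xx instead of parsing an error body as a tool list
response.raise_for_status()

tools = response.json()["result"]["tools"]
for tool in tools:
    # description is optional, so default to an empty string
    print(f"{tool['name']}: {tool.get('description', '')[:80]}")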

No SDK wrappers, no special auth headers - the token is in the URL path.

Token Lifecycle: Generation, Storage, and Validation

Understanding how MCP server tokens work end-to-end is essential for debugging authentication failures in production. When a token lookup fails, you need to know whether the issue is an expired TTL, a corrupted hash, or a revoked server.

sequenceDiagram
    participant Dev as Developer
    participant API as MCP Platform API
    participant KV as Token Store
    participant Sched as Expiry Scheduler

    Dev->>API: POST /integrated-account/:id/mcp<br>{name, config, expires_at}
    API->>API: Generate random hex token
    API->>API: HMAC-hash the raw token
    API->>KV: Store hashed_token -> metadata<br>(account_id, environment_id, team_id)
    API->>KV: Store token_id -> hashed_token<br>(reverse lookup for deletion)
    opt expires_at is set
        API->>KV: Set TTL on both entries
        API->>Sched: Schedule cleanup alarm
    end
    API-->>Dev: {id, url with raw token}
    Note over Dev: Raw token returned once.<br>Never stored by the platform.

Generation

When you create an MCP server, the platform generates a random hex string as the token. This raw token is immediately hashed using HMAC with a signing key before it is stored anywhere. The raw value is returned exactly once in the creation response - the platform never persists it. If you lose it, you create a new server.

Storage

Two entries are created for bidirectional lookup:

  • Forward entry (hashed_token -> metadata): Used during request authentication. Contains the integrated account ID, environment ID, team ID, and optional expiration.
  • Reverse entry (token_id -> hashed_token): Used during deletion. Given a database ID, look up the hashed token to delete the forward entry.

Both entries share the same TTL when the server has an expires_at.

Validation (Every Request)

On every incoming request to /mcp/<raw_token>, the server:

  1. Extracts the raw token from the URL path.
  2. HMAC-hashes it with the signing key.
  3. Looks up the hashed value in the token store.
  4. Checks expiration.
  5. Loads the associated integrated account and its config (method filters, tag filters).
  6. Proceeds to tool execution.

If any step fails - missing token, expired TTL, deleted account - the request gets a 401 Unauthorized before the JSON-RPC layer ever runs. This is the first thing to check when an agent goes silent after a server was working.
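The generation and validation steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's actual implementation: SHA-256 as the HMAC digest, an in-memory dict standing in for the token store, and the names `SIGNING_KEY`, `create_server`, and `authenticate` are all hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SIGNING_KEY = b"replace-with-a-real-secret"  # assumption: loaded from a secret manager
TOKEN_STORE: dict = {}  # stand-in for the key-value token store


def hash_token(raw_token: str) -> str:
    """HMAC the raw token so only the digest is ever persisted."""
    return hmac.new(SIGNING_KEY, raw_token.encode(), hashlib.sha256).hexdigest()


def create_server(account_id: str, expires_at: Optional[float] = None) -> str:
    """Generate a token, store only its hash, and return the raw value once."""
    raw_token = secrets.token_hex(32)
    TOKEN_STORE[hash_token(raw_token)] = {
        "account_id": account_id,
        "expires_at": expires_at,
    }
    return raw_token


def authenticate(raw_token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return token metadata, or None when the request should get a 401."""
    now = time.time() if now is None else now
    metadata = TOKEN_STORE.get(hash_token(raw_token))
    if metadata is None:
        return None  # unknown, revoked, or already cleaned-up token
    expires_at = metadata["expires_at"]
    if expires_at is not None and now >= expires_at:
        return None  # expired
    return metadata
```

Because the lookup key is a hash, a token that was mangled in transit (truncated by a proxy, re-encoded by a client) produces the same result as a revoked one: no entry and a 401. When debugging, compare the exact URL the client sent against the one returned at creation time before suspecting expiry.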

Expiration and TTL

Expiring servers are enforced at multiple levels. The token store entries carry a TTL, so lookups fail immediately after expiry. A scheduled cleanup alarm also fires to delete the database record and both token store entries, ensuring no stale data remains. The expires_at must be at least 60 seconds in the future at creation time.

You can update expiration with a PATCH request - set a new future datetime to extend, or set null to make the server permanent.

Anatomy of a Full JSON-RPC Tool Call

Seeing the complete request-response cycle - including pagination and error states - saves time when you are staring at agent logs trying to figure out why a tool call silently produced no results.

Successful List Call with Pagination

Request:

{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "list_all_hub_spot_contacts",
    "arguments": {
      "limit": "10"
    }
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "content": [{
      "type": "text",
      "text": "{\"result\": [{\"id\": \"501\", \"first_name\": \"Alice\"}, {\"id\": \"502\", \"first_name\": \"Bob\"}], \"next_cursor\": \"eyJhZnRlciI6NTAyfQ==\", \"request_id\": \"abc-123\"}"
    }]
  }
}

The next_cursor value must be passed back to the next call byte-for-byte. The tool schema explicitly instructs the LLM not to decode or modify it. If your agent is silently producing empty second pages, check whether the LLM is URL-decoding or base64-decoding the cursor.
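For agents that paginate in code rather than through the model, the safest approach is to treat the cursor as an opaque string. The sketch below assumes the cursor is passed back as a `next_cursor` argument (the argument name is an assumption; check your tool's inputSchema) and that `call_tool` is any function that sends a tools/call request and returns the parsed JSON-RPC response.

```python
import json
from typing import Callable, Iterator


def iter_records(call_tool: Callable[[str, dict], dict], tool_name: str,
                 arguments: dict, max_pages: int = 100) -> Iterator[dict]:
    """Yield records page by page, passing each cursor back unmodified."""
    args = dict(arguments)
    for _ in range(max_pages):
        response = call_tool(tool_name, args)
        # The tool result arrives as JSON text inside the first content block
        body = json.loads(response["result"]["content"][0]["text"])
        yield from body.get("result", [])
        cursor = body.get("next_cursor")
        if not cursor:
            return
        # Opaque string: no decoding, trimming, or re-encoding
        args = {**arguments, "next_cursor": cursor}
```

The `max_pages` cap is a guard against a server that keeps returning the same cursor, which would otherwise loop forever.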

Error Response with isError: true

When the upstream API fails, a well-built MCP server returns the error inside a successful JSON-RPC response with the isError flag:

{
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "content": [{
      "type": "text",
      "text": "{\"error\": \"INVALID_TYPE\", \"message\": \"Object type 'CustomWidget__c' not found. Check the resource name.\"}"
    }],
    "isError": true
  }
}

This lets the LLM read the error text and adjust its next tool call. If you instead return a protocol-level JSON-RPC error, most clients will give up rather than retry with corrected arguments.
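On the client side, both failure shapes need to be distinguished from success. A minimal sketch, assuming text content blocks that usually (but not always) carry JSON; `unpack_tool_result` is a hypothetical helper name.

```python
import json


def unpack_tool_result(response: dict) -> tuple:
    """Return (ok, payload) for a tools/call response.

    ok is False for both protocol-level JSON-RPC errors and tool results
    flagged with isError, so callers cannot mistake either for success.
    """
    if "error" in response:
        return False, response["error"]
    result = response.get("result", {})
    texts = [block.get("text", "") for block in result.get("content", [])
             if block.get("type") == "text"]
    raw = "\n".join(texts)
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        payload = raw  # not every tool returns JSON text
    return not result.get("isError", False), payload
```

Logging the `ok` flag alongside the payload for every call is often enough to spot the "agent reported success, upstream failed" class of bug.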

Diagnosing Remote Transport Errors and Rate Limits

Once your server is reachable over HTTPS, a new class of bugs appears. Streamable HTTP is stateful enough to break in interesting ways but stateless enough that you cannot rely on sticky sessions.

The transport checklist when an agent goes quiet:

  • Idle timeouts: If your server sits behind Cloudflare or AWS API Gateway, those proxies enforce strict timeout limits (often 30 to 60 seconds). If a complex third-party API search takes 45 seconds, the gateway kills the connection. The MCP client receives a 502 Bad Gateway or 504 Gateway Timeout. Send keep-alive events, or design the tool to return a job ID. The long-running task pattern covers this in depth.
  • Buffering: CDNs and reverse proxies sometimes buffer SSE responses, defeating streaming entirely. Set X-Accel-Buffering: no for Nginx and disable buffering at the CDN.
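  • CORS: Browser-based MCP clients need Access-Control-Allow-Origin, Mcp-Session-Id in Access-Control-Expose-Headers (it is a response header the client must read), and Mcp-Session-Id, Mcp-Protocol-Version, and Last-Event-ID in Access-Control-Allow-Headers (they are sent on requests).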
  • Session affinity: If your server stores per-session state, you need sticky routing. Stateless HTTP POST architectures sidestep this entirely.

Rate Limits Are the Caller's Problem

This catches teams off guard constantly. When an AI agent is given a task like "analyze all support tickets from the last year," it will aggressively call the list_tickets tool in a tight loop, paging through cursors as fast as the network allows. It will hit the third-party API's rate limit almost immediately.

When a third-party SaaS API returns an HTTP 429 Too Many Requests, a well-designed MCP server passes that error straight through to the caller with standardized headers. It does not silently retry, sleep, and pretend success.

At platforms like Truto, we pass upstream 429s through to the caller untouched and normalize the upstream rate limit info into IETF-standard headers:

HTTP/1.1 429 Too Many Requests
ratelimit-limit: 100
ratelimit-remaining: 0
ratelimit-reset: 42

The caller—meaning your AI agent, your client SDK, or your custom orchestrator—is responsible for inspecting these headers, applying exponential backoff, and adding jitter.

sequenceDiagram
    participant LLM as AI Agent
    participant MCP as MCP Server
    participant API as Third-Party SaaS

    LLM->>MCP: tools/call (list_records)<br>Cursor: page_4
    MCP->>API: GET /v1/records?page=4
    API-->>MCP: HTTP 429 Too Many Requests<br>Retry-After: 60
    MCP-->>LLM: JSON-RPC Error<br>Status: 429<br>Headers: ratelimit-reset
    Note over LLM: Agent pauses execution<br>Applies exponential backoff
    LLM->>MCP: tools/call (list_records)<br>Cursor: page_4

An MCP server that swallows 429s with internal retries by holding the connection open breaks the agent's ability to reason about cost, latency, and quota budgets. It also guarantees you will hit the gateway timeout limits mentioned earlier. For the deeper treatment of this pattern across many upstreams at once, see our guide on handling rate limits and retries across third-party APIs.

Agent-Side 429 Handling: Sample Implementation

Your agent or orchestrator needs to handle rate limits explicitly. Here is a minimal Python implementation using exponential backoff with jitter that reads the standardized ratelimit-reset header:

import httpx
import time
import random

def call_mcp_tool(mcp_url: str, tool_name: str, arguments: dict,
                  max_retries: int = 5) -> dict:
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments}
    }

    for attempt in range(max_retries):
        resp = httpx.post(mcp_url, json=payload, timeout=30)

        if resp.status_code == 429:
            # Prefer the server's reset hint over blind backoff
            reset_header = resp.headers.get("ratelimit-reset", "")
            if reset_header.isdigit():
                reset_seconds = int(reset_header)
            else:
                # Header missing or unparseable: exponential backoff from 60s
                reset_seconds = 60 * (2 ** attempt)
            jitter = random.uniform(0, min(reset_seconds * 0.1, 5))
            wait = reset_seconds + jitter
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)
            continue

        # Surface other HTTP failures (401, 5xx) instead of parsing their bodies
        resp.raise_for_status()
        result = resp.json()

        # Check for application-level errors
        tool_result = result.get("result", {})
        if tool_result.get("isError"):
            # content may be absent or empty, so guard before indexing
            content = tool_result.get("content") or [{}]
            print(f"Tool error: {content[0].get('text', 'unknown')}")
            return result  # Let the agent reason about the error

        return result

    raise TimeoutError(f"Exhausted {max_retries} retries for {tool_name}")

The key principle: use the ratelimit-reset header as the wait floor, then add jitter to avoid thundering-herd retries when multiple agents hit the same upstream. If the header is missing, fall back to exponential backoff starting at 60 seconds.

Normalizing Third-Party API Errors in MCP Responses

Error normalization is the process of translating inconsistent, proprietary error payloads from third-party APIs into a standardized format.

Third-party APIs return errors in roughly 404 different shapes. Slack will return an HTTP 200 OK with {"ok": false, "error": "channel_not_found"} in the body. Salesforce returns an HTTP 400 with a nested array: [{"errorCode": "INVALID_TYPE", "message": "..."}]. NetSuite stuffs structured errors into a free-form string. Older enterprise systems might return an HTTP 500 with a raw HTML stack trace.

If your MCP server blindly passes these raw shapes back as tool results, the LLM will either treat the call as successful (because the HTTP status was 200), try to parse HTML as JSON and crash, or hallucinate an explanation for a malformed error blob. The LLM needs a clear, concise string explaining exactly why the tool failed so it can adjust its parameters and try again.

The fix is to normalize errors at the MCP layer before they reach the model. Two patterns work in production:

1. Use isError: true in the tool result. The MCP specification allows you to return a structured error inside a successful JSON-RPC response. This signals to the AI client that the tool executed, but the outcome was a failure. This allows the LLM to read the text content, realize it provided a malformed argument, and autonomously issue a new tool call with the corrected format.

{
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "content": [{
      "type": "text",
      "text": "{\"error\": \"channel_not_found\", \"hint\": \"The channel may have been archived.\"}"
    }],
    "isError": true
  }
}

If you just throw a hard HTTP 500 error at the transport layer, the LLM cannot recover.

2. Use expression-based error extraction. Hardcoding error parsers per integration does not scale. At Truto, we use JSONata expressions at the integration or per-method level to map any provider error shape into a clean status code, message, and metadata block. A Slack 200-with-error becomes a proper 400. A nested Salesforce errors[0].message becomes the top-level message.
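To make the mapping concrete, here is a rough sketch of the same idea in plain Python rather than JSONata. It hardcodes the three shapes mentioned in this section, which is exactly the approach that stops scaling past a handful of providers, so read it as an illustration of the input and output shapes only. The function names and the normalized field names (`status`, `code`, `message`) are assumptions.

```python
import json


def normalize_upstream_error(status: int, body: str):
    """Return None on success, or a dict with status, code, and message."""
    try:
        data = json.loads(body)
    except ValueError:
        data = None  # non-JSON body, for example an HTML stack trace
    # Slack style: HTTP 200 with {"ok": false, "error": "..."}
    if isinstance(data, dict) and data.get("ok") is False:
        code = data.get("error", "unknown_error")
        return {"status": 400, "code": code, "message": code}
    # Salesforce style: HTTP 4xx with [{"errorCode": "...", "message": "..."}]
    if status >= 400 and isinstance(data, list) and data and isinstance(data[0], dict):
        first = data[0]
        return {"status": status, "code": first.get("errorCode", "unknown_error"),
                "message": first.get("message", "")}
    # OAuth style: {"error": "invalid_grant"} regardless of HTTP status
    if isinstance(data, dict) and data.get("error") == "invalid_grant":
        return {"status": 401, "code": "invalid_grant",
                "message": "Credentials expired; re-authentication needed"}
    if status >= 400:
        # Never forward raw provider bodies to the model
        return {"status": status, "code": "upstream_error",
                "message": f"Upstream returned HTTP {status}"}
    return None


def to_tool_result(error: dict) -> dict:
    """Wrap a normalized error as an MCP tool result flagged with isError."""
    return {"content": [{"type": "text", "text": json.dumps(error)}], "isError": True}
```

The useful property is that every branch ends in the same three fields, so the model and your logs see one error vocabulary no matter which provider failed.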

For a tour of the worst offenders and how to tame them, see our piece on 404 reasons third-party APIs can't get their errors straight.

A second massive benefit of normalization: when a normalized 401 Unauthorized propagates up, your platform can flag the integrated account as needs_reauth, fire a webhook, and surface a clear "reconnect Slack" prompt in your UI, rather than letting the agent loop endlessly on a dead token.

How Managed MCP Platforms Compare: StackOne vs Composio vs Truto

The honest read on debugging production MCP servers is that most of the pain is not in the JSON-RPC protocol itself. It is in everything around it: OAuth refresh storms, upstream schema drift, malformed errors, transport quirks, and per-tenant credential isolation.

You can absolutely build this yourself. Around 30% of MCP builders already route traffic through API gateways to handle scaling and security. However, debugging custom MCP servers is a massive drain on engineering resources. You end up spending more time writing error extraction regexes, handling token refresh edge cases, and debugging SSE connection drops than you do actually building AI features.

This is why engineering teams increasingly evaluate managed platforms. If you are comparing StackOne, Composio, and Truto as MCP server platforms for AI agents, the differences come down to where each platform draws the boundary between what it handles silently and what it exposes to your agent.

Architectural Comparison

| Dimension | StackOne | Composio | Truto |
| --- | --- | --- | --- |
| Integration count | 280+ connectors | 500-850+ apps | 200+ integrations (unified + proxy) |
| MCP approach | MCP gateway - single endpoint for all integrations | Rube universal server + Tool Router for dynamic discovery | Per-account MCP servers with cryptographic token URLs |
| Tool generation | Pre-built per-connector actions | Pre-built action library with SDK bindings | Dynamic - generated from integration resource definitions and documentation |
| Rate limit handling | Absorbs 429s internally with automatic retries | Absorbs 429s with built-in retry logic | Passes 429s through with IETF-standard ratelimit-* headers |
| Auth model | Connect Session with per-user OAuth | SDK-managed OAuth + API key enforcement | Per-tenant tokens with HMAC hashing; optional API token auth layer |
| Server scoping | Per-account filtering via dashboard | Tool Router selects relevant tools per prompt | Method filters (read/write/custom), tag filters, and TTL expiry per server |
| Data retention | Zero storage by default | Encryption at rest (credential vault) | Stateless pass-through - no data at rest |
| Open source | Defender (prompt injection) is open source | Core SDK is MIT-licensed (27k+ GitHub stars) | Closed source, cloud-hosted |
| Extra capabilities | Prompt injection defense (Defender), A2A protocol support | MCP Gateway for enterprise RBAC, on-premises deployment option | Unified API + Proxy API dual layer, JSONata-based error normalization |

StackOne: Full Abstraction

StackOne runs an execution engine that absorbs all network complexity on your behalf. It retries failed requests, queues rate-limited calls, and scans responses for prompt injection before they reach the LLM. The agent sends a request and gets a clean result - it never sees the retries or the backoff logic. This is ideal for teams that want zero infrastructure concern and can tolerate occasional opaque latency spikes when the platform is silently retrying behind the scenes. The trade-off: your agent cannot reason about upstream API state because it never receives rate limit signals.

Composio: Framework Breadth

Composio is optimized for fast prototyping and framework compatibility. With native SDKs for LangChain, CrewAI, OpenAI Agents SDK, Google ADK, and every major agent framework, it is typically the fastest path from zero to a working demo. The Tool Router dynamically selects relevant tools per prompt to keep context windows clean. The trade-off: the unified data model is secondary to the tool-calling interface. If you need normalized CRM or HRIS schemas across providers, you build that mapping yourself. Observability for production debugging is still maturing compared to enterprise-focused platforms.

Truto: Agent Control and Transparency

Truto takes a different approach by keeping the agent in the loop. Rate limit errors pass through with standardized headers so the agent or orchestrator decides when to retry. Tool generation is documentation-driven - if a resource method has a description and a JSON Schema, it becomes an MCP tool automatically, which eliminates schema mismatch errors between what the LLM sees and what the upstream API expects. Per-tenant servers with cryptographic tokens, method/tag filtering, and TTL expiry give fine-grained access control. The trade-off: fewer raw integration actions than Composio's catalog count, and the transparent rate-limit model requires your agent code to handle 429s explicitly (as shown in the sample code above).

For the full side-by-side comparison with pricing, security analysis, and prototype evaluation rubric, see our dedicated StackOne vs Composio vs Truto breakdown. Our buyer's guide to MCP server platforms covers additional platforms including Arcade.dev.

What a Managed Platform Changes for Debugging

Regardless of which platform you choose, the debugging surface area shrinks dramatically:

  • Stateless Transport: Instead of dealing with stateful SSE connections, managed platforms handle JSON-RPC 2.0 traffic over standard stateless HTTP POST endpoints. This completely eliminates the load balancing, buffering, and connection timeout headaches associated with remote deployments.
  • Dynamic Tool Generation: Tool generation is documentation-driven. Rather than hand-coding tool definitions for every single integration, the platform derives them directly from the integration's resource definitions. If a resource method has a description and a schema, it becomes an MCP tool. This guarantees the JSON schema exposed to the LLM perfectly matches the payload expected by the upstream proxy, eliminating schema mismatch errors entirely.
  • Automated Authentication: The platform refreshes OAuth tokens ahead of expiry, automatically injects the correct bearer token based on the cryptographic session ID, and maps the flat MCP input namespace into the correct query and body parameters.

If an authentication failure does occur, the platform's error expressions catch it, mark the account for re-authentication, and return a clean error to the LLM. You stop debugging transport layers and start shipping agentic workflows.

Production Testing Checklist: OAuth, TTL, Rate Limits, and Webhooks

Before you ship an MCP server to production, run these seven tests. Each one catches a specific class of failure that will not surface during development but will absolutely surface at 2 AM when your largest customer's agent stops working.

1. OAuth Token Rotation

Force-expire an OAuth token while a conversation is active. The platform should refresh the token transparently and the tool call should succeed on retry. If the tool returns a raw 401 instead of triggering a token refresh, your auth lifecycle has a gap.

2. TTL Server Expiry

Create an MCP server with a short expires_at (e.g., 5 minutes from now). Confirm that:

  • Tool calls succeed before expiry.
  • Tool calls return 401 Unauthorized after expiry.
  • The token store entries are cleaned up (no stale data).

3. Rate Limit Passthrough

Trigger an HTTP 429 from the upstream API (many sandbox environments have low rate limits for this purpose). Verify:

  • The MCP response includes ratelimit-reset in the headers.
  • Your agent code waits the specified duration and retries.
  • The retry succeeds.

4. Schema Validation

Send a tools/call with deliberately wrong argument types - a string where the schema expects an integer, a missing required field, a null ID. Verify the response contains a useful error message with isError: true, not a raw stack trace.

5. Pagination Cursor Round-Trip

Call a list tool, extract the next_cursor from the response, and feed it back unchanged in the next call. Verify page 2 returns fresh results. Then intentionally mangle the cursor (decode it, trim it, re-encode it) and verify the server returns a clear error rather than an empty result set.

6. Auth Header Propagation

If your MCP server uses additional API token authentication, confirm the Authorization header survives your entire infrastructure stack: reverse proxy, CDN, WAF, and load balancer. Strip-and-forward bugs in reverse proxies are the single most common cause of "it works locally but fails in production."

7. Error Normalization

Trigger a known error from each upstream provider you support. For example:

  • A Slack 200-with-error ({"ok": false, "error": "channel_not_found"}).
  • A Salesforce 400 with nested error array.
  • An expired OAuth token that returns a 200 with {"error": "invalid_grant"}.

Verify each one produces a normalized, structured error with isError: true in the MCP response, not raw provider noise.

Where to Go From Here

A production-ready MCP debugging workflow looks roughly like this:

  1. Build and test locally over STDIO with the MCP Inspector. Lock the tool schemas, test edge cases, and verify error contracts before any LLM touches them.
  2. Promote to remote Streamable HTTP behind your reverse proxy. Re-run the Inspector against the live URL to confirm CORS, auth header propagation, and transport behaviors survive the network.
  3. Instrument the JSON-RPC layer. Log every tools/call invocation with arguments, latency, upstream HTTP status, and normalized error code. AI clients will not surface these failures for you.
  4. Push retry and backoff to the caller. Standardize on IETF rate limit headers and let the agent or client SDK handle HTTP 429s with exponential backoff and jitter.
  5. Normalize upstream errors with expression-based extraction so the LLM sees a clean, structured failure signal with isError: true—never raw provider noise.
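Step 3 is the one most often skipped. A minimal sketch of a logging wrapper follows, assuming each tool handler returns a tuple of the upstream HTTP status and the MCP tool result; that handler signature, the logger name, and the record field names are assumptions for illustration.

```python
import json
import logging
import time
from typing import Callable, Tuple

logger = logging.getLogger("mcp.tools")


def call_with_logging(handler: Callable[[dict], Tuple[int, dict]],
                      tool_name: str, arguments: dict) -> dict:
    """Run a tool handler and emit one structured log line per invocation."""
    started = time.monotonic()
    upstream_status, result = handler(arguments)
    record = {
        "tool": tool_name,
        "arguments": arguments,  # redact secrets and PII before logging in production
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
        "upstream_status": upstream_status,
        "is_error": bool(result.get("isError")),
    }
    # default=str keeps logging from failing on non-serializable argument values
    logger.info(json.dumps(record, default=str))
    return result
```

One JSON line per call is enough to answer the question the chat window cannot: what did the model send, and what did the upstream actually return.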

If you would rather spend your engineering cycles on the agent logic instead of the transport plumbing, we are happy to walk through how Truto generates and operates MCP servers for hundreds of SaaS integrations out of the box.

FAQ

How do StackOne, Composio, and Truto compare as MCP server platforms for AI agents?
StackOne absorbs all network complexity (rate limits, retries) internally, giving agents clean results but hiding upstream state. Composio provides the broadest integration catalog (500-850+ apps) with fast framework-native SDKs, optimized for prototyping speed. Truto passes rate limit errors through with IETF-standard headers and dynamically generates MCP tools from integration documentation, giving agents full context to reason about failures. Choose StackOne for simplicity, Composio for breadth, Truto for agent control.
How do I register a remote MCP server with Claude Desktop or ChatGPT?
In Claude: go to Settings, then Connectors, then Add custom connector, and paste your MCP server URL. In ChatGPT: go to Settings, then Apps, then Advanced settings, enable Developer mode, and add your server URL under MCP servers. The URL alone handles authentication - no extra configuration needed if the token is embedded in the URL path.
How should an AI agent handle HTTP 429 rate limit errors from MCP servers?
Read the ratelimit-reset header from the response to determine how long to wait, add random jitter to avoid thundering-herd retries, and retry the same tool call after the wait period. Never have your MCP server silently retry internally - this hides upstream API state from the agent and causes unpredictable latency spikes.
How do MCP server tokens work for authentication?
The platform generates a random hex token at server creation time, HMAC-hashes it before storage, and returns the raw token once in the API response. On each request, the raw token from the URL is hashed and looked up in the token store. If the server has an expires_at, the token store entries carry a TTL and a cleanup alarm deletes stale records automatically.
What should I test before shipping an MCP server to production?
Run seven tests: OAuth token rotation mid-conversation, TTL server expiry behavior, rate limit passthrough with ratelimit-reset headers, schema validation with wrong argument types, pagination cursor round-trip integrity, auth header propagation through your infrastructure stack, and error normalization for each upstream provider's error format.
