How to Debug MCP Servers in Production: Local Inspector to Remote Transport
Debug production MCP servers end-to-end: JSON-RPC inspection, token lifecycle, agent-side 429 handling with code, and a StackOne vs Composio vs Truto platform comparison for AI agents.
Your MCP server runs flawlessly against the local Inspector. You wire it into Claude or Cursor, deploy it behind HTTPS, and the AI agent silently stops calling tools—or worse, calls them and reports cheerful success while the third-party API actually returned an HTTP 401 Unauthorized.
Debugging Model Context Protocol (MCP) servers in production requires isolating failures across three distinct layers: the JSON-RPC protocol, the transport mechanism, and the underlying third-party API. When your AI agent fails to fetch data from Salesforce or Jira, the root cause is rarely the LLM itself. It is usually a dropped HTTP connection, a schema mismatch in your tool definition, or a silent rate limit error that the client failed to handle.
This guide breaks down exactly how to debug MCP servers as you scale from local prototypes to production-grade remote deployments. We will walk through the workflow senior engineers and PMs at B2B SaaS companies use to ship reliable MCP servers when local STDIO is no longer enough.
The protocol has reached the kind of scale where these failure modes are now everyone's problem. MCP hit roughly 97 million monthly SDK downloads by March 2026, growing from 2 million at launch—a roughly 4,750% climb in just 16 months. Production debugging is no longer optional knowledge.
The Shift from Local STDIO to Remote MCP Servers
Most MCP development starts with the local STDIO transport. When you build an MCP server locally using the official SDKs, the AI client (like Claude Desktop) spawns your server as a child process and communicates by piping JSON-RPC messages directly over stdin and stdout.
It is trivial to debug. You can console.log to stderr, attach a debugger, and read the raw protocol traffic in a single terminal. There is no network latency, no load balancer dropping idle connections, and no OAuth token expiration to worry about. As of April 2026, roughly 67% of MCP servers still run over local STDIO, while around 28% use Streamable HTTP for remote, OAuth-mediated workloads.
Moving that same server to production changes the picture. That 28% is where everything gets harder.
Enterprise SaaS workloads—like multi-tenant Salesforce, Workday, or Jira access—cannot run as a subprocess on a single laptop. As detailed in our guide to architecting multi-tenant MCP servers, they need to be reachable, authenticated per tenant, horizontally scalable, and resilient to connection drops. You have to put your MCP server behind an API gateway, secure it with authentication, and route traffic over the internet. The moment you switch transports, your debug surface explodes into four distinct layers:
- Network layer: TLS termination, API gateways, idle timeouts, and CDN buffering of Server-Sent Events (SSE) streams.
- Auth layer: OAuth refresh tokens that expire mid-conversation, or bearer tokens silently dropped by reverse proxies.
- State layer: Streamable HTTP sessions, resumability via Last-Event-ID, and sticky routing requirements.
- Tenant layer: One AI agent accessing many customer accounts, each with its own credentials, schemas, and API quotas.
flowchart LR
A[AI Client<br>Claude / ChatGPT / Cursor] -- JSON-RPC 2.0 --> B[Transport<br>STDIO or Streamable HTTP]
B --> C[MCP Server]
C --> D[OAuth Token Store]
C --> E[Third-Party SaaS API<br>Salesforce / Jira / HubSpot]
E -. 429 / 401 / 5xx .-> C
C -. JSON-RPC error<br>or isError: true .-> A
Local STDIO collapses everything in this diagram into a single local process. Remote MCP forces you to debug every single hop. Debugging this shift requires understanding that the transport layer is entirely separate from the protocol layer. A perfectly valid JSON-RPC tool call will still fail if the underlying HTTP request gets blocked by a Web Application Firewall. If you are still weighing whether to own this infrastructure stack, our analysis on the hidden costs of custom MCP servers covers the long-tail maintenance work in detail.
Why AI Clients Hide Your MCP Errors
Claude Desktop, ChatGPT, Cursor, and custom LangGraph agents (which we cover in our guide to multi-agent MCP systems) are built to provide a smooth user experience and keep a conversation flowing. When an underlying MCP tool call fails, the client typically abstracts the raw JSON-RPC error into a generic natural language response, masking the technical root cause from the developer.
If you ask an AI agent to "List all recent Jira tickets," and the underlying MCP server returns an HTTP 401 Unauthorized because the OAuth token expired, the LLM will not show you the stack trace. It will confidently say, "I'm sorry, I couldn't access Jira right now," or worse, it might hallucinate a successful response based on old context.
This is a deliberate UX choice, not a bug. But it means you cannot debug a production MCP server through the AI client alone. You do not know if the failure was caused by a network timeout, a malformed query schema, a missing required parameter, or an authentication failure.
The failure modes most engineering teams hit, in rough order of frequency:
- Tool schema mismatch: The LLM sends customer_id as a string, your server's schema declares it as an integer, JSON Schema validation throws an error, and the client swallows it.
- Silent upstream failures: The third-party API returns an HTTP 200 OK with {"error": "invalid_grant"} in the body. Your MCP server blindly returns success. The agent confidently lies to the user.
- Transport drops: An SSE connection idles past your load balancer's 30-second timeout. The client reconnects, but session state was lost, and the tool call fails silently.
- Auth header stripping: A reverse proxy or WAF strips Authorization headers it doesn't recognize before they ever reach your MCP server.
None of these are visible from the chat window. You need a tool that speaks raw MCP to see exactly what the LLM sent and exactly what the server returned.
Using the MCP Inspector for Local Contract Testing
The official MCP Inspector is an interactive, browser-based debugging client maintained alongside the protocol specification. Think of it as Postman for the Model Context Protocol. It allows developers to connect directly to an MCP server over STDIO, SSE, or Streamable HTTP, list available tools, inspect JSON schemas, and manually execute JSON-RPC requests without involving an LLM.
Before you ever connect your MCP server to an AI agent, you must validate the protocol contract. While testing and mocking MCP servers in CI/CD handles automated validation, local debugging starts with the Inspector.
You can run it against a local server with:
npx @modelcontextprotocol/inspector node ./build/server.js
Or for a remote production server:
npx @modelcontextprotocol/inspector
# Then point the UI at https://your-server.example.com/mcp
Here is the exact checklist of what to test before any LLM touches your server:
- Initialize handshake: Confirm the server returns the right protocolVersion and advertises the capabilities you expect (tools, resources, prompts). Mismatched protocol versions are a massive source of silent failures.
- tools/list output: Inspect every tool's inputSchema. Look for missing required arrays, ambiguous enum values, and vague description fields. If a description is vague, the LLM will invent arguments. If a parameter is required by the upstream API but missing from the schema, the tool will eventually fail.
- tools/call with edge cases: Manually construct JSON payloads. Send null, empty strings, oversized payloads, and known-invalid IDs. Verify the server returns a JSON-RPC error with a useful message, or a content block with isError: true.
- Pagination cursors: If your tools return cursors, call them twice and confirm the cursor round-trips byte-for-byte. LLMs love to "helpfully" decode or modify cursor strings.
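The tools/list part of this checklist can also be run mechanically over a saved response. The following is a minimal sketch, not part of the Inspector: the function name lint_tools and the 20-character description threshold are assumptions chosen for illustration, and it only checks the fields named above (description, inputSchema.properties, required).

```python
from typing import Any


def lint_tools(tools: list[dict[str, Any]], min_description_len: int = 20) -> list[str]:
    """Return human-readable warnings for the `tools` array of a tools/list result."""
    warnings: list[str] = []
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        # A vague or missing description invites the LLM to invent arguments.
        if len((tool.get("description") or "").strip()) < min_description_len:
            warnings.append(f"{name}: description is missing or very short")
        schema = tool.get("inputSchema") or {}
        properties = schema.get("properties") or {}
        # Parameters without a `required` array are all treated as optional.
        if properties and "required" not in schema:
            warnings.append(f"{name}: inputSchema has properties but no 'required' array")
        for param, spec in properties.items():
            if not (spec.get("description") or "").strip():
                warnings.append(f"{name}.{param}: parameter has no description")
    return warnings


# Hypothetical tool definition used only to show the output shape.
example = [{
    "name": "get_contact",
    "description": "Get",
    "inputSchema": {"type": "object", "properties": {"id": {"type": "string"}}},
}]
for warning in lint_tools(example):
    print(warning)
```

A check like this is cheap enough to run in CI against the saved tools/list output, with the Inspector reserved for interactive exploration.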
When you test tools/call, you catch schema mapping errors natively.
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "create_a_hubspot_contact",
"arguments": {
"email": "test@example.com",
"first_name": "Alice"
}
}
}
If this call fails, the Inspector will show you the exact JSON-RPC error response. You might discover that the upstream API actually requires firstname instead of first_name. Fixing this at the documentation layer ensures the LLM generates the correct payload in production.
A March 2026 arXiv study analyzing 407 real-world MCP issues found that server settings, tool configuration, and host configuration were the most prevalent fault categories—not the protocol itself. Catching these with the Inspector before deployment saves hours of agent-side debugging.
Quickstart: Registering an MCP Server with Claude, ChatGPT, and Custom Agents
Once your MCP server passes the Inspector checks, the next step is wiring it into actual AI clients. If you are using a managed platform like Truto, server creation is a single API call. The response gives you a URL that encodes all the authentication and scoping you need - no additional config on the client side.
Creating an MCP Server via API
curl -X POST https://api.truto.one/integrated-account/<account_id>/mcp \
-H "Authorization: Bearer <your_api_token>" \
-H "Content-Type: application/json" \
-d '{
"name": "HubSpot Read-Only MCP",
"config": {
"methods": ["read"]
},
"expires_at": "2026-07-01T00:00:00Z"
}'
The response includes a ready-to-use URL:
{
"id": "abc-123",
"name": "HubSpot Read-Only MCP",
"config": { "methods": ["read"] },
"expires_at": "2026-07-01T00:00:00Z",
"url": "https://api.truto.one/mcp/a1b2c3d4e5f6..."
}
That url is the only thing the client needs. The token embedded in the URL authenticates the request and scopes it to a specific connected account. You can restrict the server to read-only methods, filter by tool tags (e.g., only support tools), and set an expiration date - all at creation time.
Registering with Claude
- Copy the MCP server URL from the API response.
- In Claude: Settings -> Connectors -> Add custom connector.
- Paste the URL and click Add.
- Claude discovers tools via MCP automatically. Custom connectors via remote MCP are available on Free, Pro, Max, Team, and Enterprise plans (Free is limited to one connector).
Registering with ChatGPT
- In ChatGPT: Settings -> Apps -> Advanced settings.
- Enable Developer mode (MCP support is behind this flag).
- Under MCP servers, add a new server with your URL and a descriptive name.
- Save. ChatGPT connects and lists available tools.
Developer Mode is available on Pro, Plus, Business, Enterprise, and Education accounts.
Connecting from a Python Agent
For custom agents built with LangChain, OpenAI Agents SDK, or a bare httpx client, point them at the URL with a standard HTTP POST:
import httpx
MCP_URL = "https://api.truto.one/mcp/a1b2c3d4e5f6..."
# Discover available tools
response = httpx.post(MCP_URL, json={
"jsonrpc": "2.0",
"id": 1,
"method": "tools/list",
"params": {}
})
tools = response.json()["result"]["tools"]
for tool in tools:
    print(f"{tool['name']}: {tool['description'][:80]}")
No SDK wrappers, no special auth headers - the token is in the URL path.
Token Lifecycle: Generation, Storage, and Validation
Understanding how MCP server tokens work end-to-end is essential for debugging authentication failures in production. When a token lookup fails, you need to know whether the issue is an expired TTL, a corrupted hash, or a revoked server.
sequenceDiagram
participant Dev as Developer
participant API as MCP Platform API
participant KV as Token Store
participant Sched as Expiry Scheduler
Dev->>API: POST /integrated-account/:id/mcp<br>{name, config, expires_at}
API->>API: Generate random hex token
API->>API: HMAC-hash the raw token
API->>KV: Store hashed_token -> metadata<br>(account_id, environment_id, team_id)
API->>KV: Store token_id -> hashed_token<br>(reverse lookup for deletion)
opt expires_at is set
API->>KV: Set TTL on both entries
API->>Sched: Schedule cleanup alarm
end
API-->>Dev: {id, url with raw token}
Note over Dev: Raw token returned once.<br>Never stored by the platform.
Generation
When you create an MCP server, the platform generates a random hex string as the token. This raw token is immediately hashed using HMAC with a signing key before it is stored anywhere. The raw value is returned exactly once in the creation response - the platform never persists it. If you lose it, you create a new server.
Storage
Two entries are created for bidirectional lookup:
- Forward entry (hashed_token -> metadata): Used during request authentication. Contains the integrated account ID, environment ID, team ID, and optional expiration.
- Reverse entry (token_id -> hashed_token): Used during deletion. Given a database ID, look up the hashed token to delete the forward entry.
Both entries share the same TTL when the server has an expires_at.
Validation (Every Request)
On every incoming request to /mcp/<raw_token>, the server:
- Extracts the raw token from the URL path.
- HMAC-hashes it with the signing key.
- Looks up the hashed value in the token store.
- Checks expiration.
- Loads the associated integrated account and its config (method filters, tag filters).
- Proceeds to tool execution.
If any step fails - missing token, expired TTL, deleted account - the request gets a 401 Unauthorized before the JSON-RPC layer ever runs. This is the first thing to check when an agent goes silent after a server was working.
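The generation and validation steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's implementation: SHA-256 as the HMAC digest, the in-memory TOKEN_STORE dict, the SIGNING_KEY value, and the function names are all hypothetical stand-ins.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SIGNING_KEY = b"example-signing-key"  # assumption: loaded from a secret manager in practice
TOKEN_STORE: dict[str, dict] = {}     # stand-in for the key-value token store


def hash_token(raw_token: str) -> str:
    """HMAC the raw token so only the hash is ever stored."""
    return hmac.new(SIGNING_KEY, raw_token.encode(), hashlib.sha256).hexdigest()


def create_server(account_id: str, expires_at: Optional[float] = None) -> str:
    """Generate a random hex token, store its hash, and return the raw value once."""
    raw_token = secrets.token_hex(32)
    TOKEN_STORE[hash_token(raw_token)] = {"account_id": account_id, "expires_at": expires_at}
    return raw_token


def authenticate(raw_token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the stored metadata, or None where a real server would answer 401."""
    now = time.time() if now is None else now
    entry = TOKEN_STORE.get(hash_token(raw_token))
    if entry is None:
        return None  # unknown, revoked, or already cleaned-up token
    if entry["expires_at"] is not None and now >= entry["expires_at"]:
        return None  # expired
    return entry


token = create_server("acct_1", expires_at=time.time() + 300)
print(authenticate(token) is not None)   # valid until expiry
print(authenticate("not-a-real-token"))  # None
```

Because only the hash is stored, a leaked copy of the token store does not reveal usable URLs, which is also why a lost raw token cannot be recovered.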
Expiration and TTL
Expiration is enforced at multiple levels. The token store entries carry a TTL, so lookups fail immediately after expiry. A scheduled cleanup alarm also fires to delete the database record and both token store entries, ensuring no stale data remains. The expires_at must be at least 60 seconds in the future at creation time.
You can update expiration with a PATCH request - set a new future datetime to extend, or set null to make the server permanent.
Anatomy of a Full JSON-RPC Tool Call
Seeing the complete request-response cycle - including pagination and error states - saves time when you are staring at agent logs trying to figure out why a tool call silently produced no results.
Successful List Call with Pagination
Request:
{
"jsonrpc": "2.0",
"id": 42,
"method": "tools/call",
"params": {
"name": "list_all_hub_spot_contacts",
"arguments": {
"limit": "10"
}
}
}
Response:
{
"jsonrpc": "2.0",
"id": 42,
"result": {
"content": [{
"type": "text",
"text": "{\"result\": [{\"id\": \"501\", \"first_name\": \"Alice\"}, {\"id\": \"502\", \"first_name\": \"Bob\"}], \"next_cursor\": \"eyJhZnRlciI6NTAyfQ==\", \"request_id\": \"abc-123\"}"
}]
}
}
The next_cursor value must be passed back to the next call byte-for-byte. The tool schema explicitly instructs the LLM not to decode or modify it. If your agent is silently producing empty second pages, check whether the LLM is URL-decoding or base64-decoding the cursor.
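When the orchestrator rather than the LLM drives pagination, the cursor never passes through the model at all. The sketch below assumes the response shape shown above (a JSON string in the first text block, with result and next_cursor keys). The argument name next_cursor on the request side and the call_tool callable are assumptions, so check your tool's inputSchema for the real parameter name.

```python
import json
from typing import Any, Callable, Iterator


def iterate_pages(call_tool: Callable[[dict], dict], tool_name: str,
                  arguments: dict[str, Any], max_pages: int = 50) -> Iterator[dict]:
    """Yield records across pages, passing each cursor back exactly as received."""
    cursor = None
    for _ in range(max_pages):
        args = dict(arguments)
        if cursor is not None:
            args["next_cursor"] = cursor  # opaque string: never decode or trim it
        response = call_tool({"name": tool_name, "arguments": args})
        payload = json.loads(response["result"]["content"][0]["text"])
        yield from payload.get("result", [])
        cursor = payload.get("next_cursor")
        if not cursor:
            break  # no further pages
```

Here call_tool is any function that takes the tools/call params and returns the parsed JSON-RPC response, which keeps the loop easy to test with a fake. The max_pages cap is a guard against a server that keeps returning the same cursor.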
Error Response with isError: true
When the upstream API fails, a well-built MCP server returns the error inside a successful JSON-RPC response with the isError flag:
{
"jsonrpc": "2.0",
"id": 42,
"result": {
"content": [{
"type": "text",
"text": "{\"error\": \"INVALID_TYPE\", \"message\": \"Object type 'CustomWidget__c' not found. Check the resource name.\"}"
}],
"isError": true
}
}
This lets the LLM read the error text and adjust its next tool call. If you instead return a protocol-level JSON-RPC error, or fail the request at the transport layer, most clients will give up rather than retry with corrected arguments.
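On the agent side, the two failure channels are worth telling apart in logs. A minimal sketch of that distinction, assuming the response has already been parsed into a dict. The function name classify_response and the three labels are illustrative only.

```python
from typing import Any


def classify_response(response: dict[str, Any]) -> tuple[str, str]:
    """Return (kind, detail), where kind is 'protocol_error', 'tool_error', or 'ok'."""
    if "error" in response:
        # Protocol-level failure: the tool never produced a result.
        err = response["error"]
        return "protocol_error", f"{err.get('code')}: {err.get('message')}"
    result = response.get("result", {})
    text = " ".join(
        block.get("text", "")
        for block in result.get("content", [])
        if block.get("type") == "text"
    )
    if result.get("isError"):
        # The tool ran and reported a failure the LLM can read and react to.
        return "tool_error", text
    return "ok", text
```

Logging the kind alongside each call makes it quick to see whether an agent is failing on bad arguments (tool_error) or on broken plumbing (protocol_error).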
Diagnosing Remote Transport Errors and Rate Limits
Once your server is reachable over HTTPS, a new class of bugs appears. Streamable HTTP is stateful enough to break in interesting ways but stateless enough that you cannot rely on sticky sessions.
The transport checklist when an agent goes quiet:
- Idle timeouts: If your server sits behind Cloudflare or AWS API Gateway, those proxies enforce strict timeout limits (often 30 to 60 seconds). If a complex third-party API search takes 45 seconds, the gateway kills the connection. The MCP client receives a 502 Bad Gateway or 504 Gateway Timeout. Send keep-alive events, or design the tool to return a job ID. The long-running task pattern covers this in depth.
- Buffering: CDNs and reverse proxies sometimes buffer SSE responses, defeating streaming entirely. Set X-Accel-Buffering: no for Nginx and disable buffering at the CDN.
- CORS: Browser-based MCP clients need Access-Control-Allow-Origin, plus Mcp-Session-Id, Mcp-Protocol-Version, and Last-Event-ID in Access-Control-Expose-Headers. The ones the client sends on requests also need to be listed in Access-Control-Allow-Headers to pass preflight.
- Session affinity: If your server stores per-session state, you need sticky routing. Stateless HTTP POST architectures sidestep this entirely.
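The buffering and CORS items reduce to a handful of response headers, and the keep-alive item to a periodic SSE comment frame. The sketch below is framework-agnostic and hypothetical: the function names are assumptions, and how you attach the headers depends on your web server. Listing the MCP headers under Access-Control-Allow-Headers as well covers the request side of preflight.

```python
def sse_response_headers(allowed_origin: str) -> dict[str, str]:
    """Response headers for a streaming MCP endpoint behind Nginx or a CDN."""
    mcp_headers = "Mcp-Session-Id, Mcp-Protocol-Version, Last-Event-ID"
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        # Tells Nginx not to buffer this response.
        "X-Accel-Buffering": "no",
        "Access-Control-Allow-Origin": allowed_origin,
        # Lets browser code read these headers from responses.
        "Access-Control-Expose-Headers": mcp_headers,
        # Lets browser code send these headers on requests (checked at preflight).
        "Access-Control-Allow-Headers": f"Content-Type, Authorization, {mcp_headers}",
    }


def keepalive_frame() -> bytes:
    """An SSE comment line: clients ignore it, but it counts as traffic for idle timers."""
    return b": keep-alive\n\n"
```

Sending keepalive_frame() at an interval shorter than the strictest idle timeout in your stack keeps long tool calls from being cut off mid-flight.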
Rate Limits Are the Caller's Problem
This catches teams off guard constantly. When an AI agent is given a task like "analyze all support tickets from the last year," it will aggressively call the list_tickets tool in a tight loop, paging through cursors as fast as the network allows. It will hit the third-party API's rate limit almost immediately.
When a third-party SaaS API returns an HTTP 429 Too Many Requests, a well-designed MCP server passes that error straight through to the caller with standardized headers. It does not silently retry, sleep, and pretend success.
At platforms like Truto, we pass upstream 429s through to the caller untouched and normalize the upstream rate limit info into IETF-standard headers:
HTTP/1.1 429 Too Many Requests
ratelimit-limit: 100
ratelimit-remaining: 0
ratelimit-reset: 42
The caller (your AI agent, your client SDK, or your custom orchestrator) is responsible for inspecting these headers, applying exponential backoff, and adding jitter.
sequenceDiagram
participant LLM as AI Agent
participant MCP as MCP Server
participant API as Third-Party SaaS
LLM->>MCP: tools/call (list_records)<br>Cursor: page_4
MCP->>API: GET /v1/records?page=4
API-->>MCP: HTTP 429 Too Many Requests<br>Retry-After: 60
MCP-->>LLM: JSON-RPC Error<br>Status: 429<br>Headers: ratelimit-reset
Note over LLM: Agent pauses execution<br>Applies exponential backoff
LLM->>MCP: tools/call (list_records)<br>Cursor: page_4
An MCP server that swallows 429s, retrying internally while holding the connection open, breaks the agent's ability to reason about cost, latency, and quota budgets. It also guarantees you will hit the gateway timeout limits mentioned earlier. For the deeper treatment of this pattern across many upstreams at once, see our guide on handling rate limits and retries across third-party APIs.
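On the server side, passing a 429 through mostly means translating whatever the upstream sent into the ratelimit-* names shown above. The following is a rough sketch of that translation, not Truto's implementation. The function name is hypothetical, and it deliberately handles only numeric values, because Retry-After can also be an HTTP date and the meaning of X-RateLimit-Reset varies by provider.

```python
from typing import Mapping


def normalize_rate_limit_headers(upstream: Mapping[str, str]) -> dict[str, str]:
    """Map common upstream rate limit headers onto ratelimit-* names."""
    lowered = {key.lower(): value.strip() for key, value in upstream.items()}
    normalized: dict[str, str] = {}

    for field in ("limit", "remaining"):
        value = lowered.get(f"ratelimit-{field}") or lowered.get(f"x-ratelimit-{field}")
        if value and value.isdigit():
            normalized[f"ratelimit-{field}"] = value

    # Only delta-seconds are accepted here. HTTP-date Retry-After values and
    # X-RateLimit-Reset (epoch vs. seconds differs by provider) need
    # per-provider handling and are left out of this sketch.
    reset = lowered.get("ratelimit-reset") or lowered.get("retry-after")
    if reset and reset.isdigit():
        normalized["ratelimit-reset"] = reset
    return normalized


print(normalize_rate_limit_headers({"Retry-After": "60", "X-RateLimit-Remaining": "0"}))
```

Omitting a header the server cannot interpret is safer than guessing, since the caller can fall back to its own backoff schedule when ratelimit-reset is absent.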
Agent-Side 429 Handling: Sample Implementation
Your agent or orchestrator needs to handle rate limits explicitly. Here is a minimal Python implementation using exponential backoff with jitter that reads the standardized ratelimit-reset header:
import httpx
import time
import random

def call_mcp_tool(mcp_url: str, tool_name: str, arguments: dict,
                  max_retries: int = 5) -> dict:
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments}
    }
    for attempt in range(max_retries):
        resp = httpx.post(mcp_url, json=payload, timeout=30)
        if resp.status_code == 429:
            # Prefer the server's reset hint over blind backoff
            reset_header = resp.headers.get("ratelimit-reset", "")
            if reset_header.isdigit():
                reset_seconds = int(reset_header)
            else:
                # Header missing or malformed: exponential backoff from 60s
                reset_seconds = 60 * (2 ** attempt)
            jitter = random.uniform(0, min(reset_seconds * 0.1, 5))
            wait = reset_seconds + jitter
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)
            continue
        # Surface 401s and 5xx responses instead of failing on JSON parsing
        resp.raise_for_status()
        result = resp.json()
        # Check for application-level errors
        tool_result = result.get("result", {})
        if tool_result.get("isError"):
            content = tool_result.get("content") or [{}]
            print(f"Tool error: {content[0].get('text', 'unknown')}")
        # Return errors too, so the agent can reason about them
        return result
    raise TimeoutError(f"Exhausted {max_retries} retries for {tool_name}")
The key principle: use the ratelimit-reset header as the wait floor, then add jitter to avoid thundering-herd retries when multiple agents hit the same upstream. If the header is missing, fall back to exponential backoff starting at 60 seconds.
Normalizing Third-Party API Errors in MCP Responses
Error normalization is the process of translating inconsistent, proprietary error payloads from third-party APIs into a standardized format.
Third-party APIs return errors in roughly 404 different shapes. Slack will return an HTTP 200 OK with {"ok": false, "error": "channel_not_found"} in the body. Salesforce returns an HTTP 400 with a nested array: [{"errorCode": "INVALID_TYPE", "message": "..."}]. NetSuite stuffs structured errors into a free-form string. Older enterprise systems might return an HTTP 500 with a raw HTML stack trace.
If your MCP server blindly passes these raw shapes back as tool results, the LLM will either treat the call as successful (because the HTTP status was 200), try to parse HTML as JSON and crash, or hallucinate an explanation for a malformed error blob. The LLM needs a clear, concise string explaining exactly why the tool failed so it can adjust its parameters and try again.
The fix is to normalize errors at the MCP layer before they reach the model. Two patterns work in production:
1. Use isError: true in the tool result.
The MCP specification allows you to return a structured error inside a successful JSON-RPC response. This signals to the AI client that the tool executed, but the outcome was a failure. This allows the LLM to read the text content, realize it provided a malformed argument, and autonomously issue a new tool call with the corrected format.
{
"jsonrpc": "2.0",
"id": 42,
"result": {
"content": [{
"type": "text",
"text": "{\"error\": \"channel_not_found\", \"hint\": \"The channel may have been archived.\"}"
}],
"isError": true
}
}
If you just throw a hard HTTP 500 error at the transport layer, the LLM cannot recover.
2. Use expression-based error extraction.
Hardcoding error parsers per integration does not scale. At Truto, we use JSONata expressions at the integration or per-method level to map any provider error shape into a clean status code, message, and metadata block. A Slack 200-with-error becomes a proper 400. A nested Salesforce errors[0].message becomes the top-level message.
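To make the target shape concrete, here is the same mapping written as plain Python rather than JSONata. It is a simplified illustration and swaps in hand-written branches for the expression engine. The function names, the provider labels, and the output keys (status, error, message) are assumptions, and only the Slack and Salesforce shapes described above are covered.

```python
import json
from typing import Any, Optional


def normalize_error(provider: str, status: int, body: Any) -> Optional[dict]:
    """Return a uniform error dict, or None when the response is not an error."""
    # Slack: HTTP 200 with {"ok": false, "error": "..."} becomes a 400.
    if provider == "slack" and isinstance(body, dict) and body.get("ok") is False:
        code = body.get("error", "unknown_error")
        return {"status": 400, "error": code, "message": f"Slack rejected the request: {code}"}
    # Salesforce: HTTP 4xx with [{"errorCode": ..., "message": ...}].
    if provider == "salesforce" and status >= 400 and isinstance(body, list) and body:
        first = body[0] if isinstance(body[0], dict) else {}
        return {"status": status, "error": first.get("errorCode", "UNKNOWN"),
                "message": first.get("message", "")}
    # Anything else with an error status: keep a short, bounded message.
    if status >= 400:
        return {"status": status, "error": "upstream_error", "message": str(body)[:200]}
    return None


def to_tool_result(normalized: dict) -> dict:
    """Wrap a normalized error as an MCP tool result with isError set."""
    return {"content": [{"type": "text", "text": json.dumps(normalized)}], "isError": True}
```

The per-provider branches are exactly what stops scaling past a handful of integrations, which is the argument for moving them into declarative expressions.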
For a tour of the worst offenders and how to tame them, see our piece on 404 reasons third-party APIs can't get their errors straight.
A second massive benefit of normalization: when a normalized 401 Unauthorized propagates up, your platform can flag the integrated account as needs_reauth, fire a webhook, and surface a clear "reconnect Slack" prompt in your UI, rather than letting the agent loop endlessly on a dead token.
How Managed MCP Platforms Compare: StackOne vs Composio vs Truto
The honest read on debugging production MCP servers is that most of the pain is not in the JSON-RPC protocol itself. It is in everything around it: OAuth refresh storms, upstream schema drift, malformed errors, transport quirks, and per-tenant credential isolation.
You can absolutely build this yourself. Around 30% of MCP builders already route traffic through API gateways to handle scaling and security. However, debugging custom MCP servers is a massive drain on engineering resources. You end up spending more time writing error extraction regexes, handling token refresh edge cases, and debugging SSE connection drops than you do actually building AI features.
This is why engineering teams increasingly evaluate managed platforms. If you are comparing StackOne, Composio, and Truto as MCP server platforms for AI agents, the differences come down to where each platform draws the boundary between what it handles silently and what it exposes to your agent.
Architectural Comparison
| Dimension | StackOne | Composio | Truto |
|---|---|---|---|
| Integration count | 280+ connectors | 500-850+ apps | 200+ integrations (unified + proxy) |
| MCP approach | MCP gateway - single endpoint for all integrations | Rube universal server + Tool Router for dynamic discovery | Per-account MCP servers with cryptographic token URLs |
| Tool generation | Pre-built per-connector actions | Pre-built action library with SDK bindings | Dynamic - generated from integration resource definitions and documentation |
| Rate limit handling | Absorbs 429s internally with automatic retries | Absorbs 429s with built-in retry logic | Passes 429s through with IETF-standard ratelimit-* headers |
| Auth model | Connect Session with per-user OAuth | SDK-managed OAuth + API key enforcement | Per-tenant tokens with HMAC hashing; optional API token auth layer |
| Server scoping | Per-account filtering via dashboard | Tool Router selects relevant tools per prompt | Method filters (read/write/custom), tag filters, and TTL expiry per server |
| Data retention | Zero storage by default | Encryption at rest (credential vault) | Stateless pass-through - no data at rest |
| Open source | Defender (prompt injection) is open source | Core SDK is MIT-licensed (27k+ GitHub stars) | Closed source, cloud-hosted |
| Extra capabilities | Prompt injection defense (Defender), A2A protocol support | MCP Gateway for enterprise RBAC, on-premises deployment option | Unified API + Proxy API dual layer, JSONata-based error normalization |
StackOne: Full Abstraction
StackOne runs an execution engine that absorbs all network complexity on your behalf. It retries failed requests, queues rate-limited calls, and scans responses for prompt injection before they reach the LLM. The agent sends a request and gets a clean result - it never sees the retries or the backoff logic. This is ideal for teams that want zero infrastructure concern and can tolerate occasional opaque latency spikes when the platform is silently retrying behind the scenes. The trade-off: your agent cannot reason about upstream API state because it never receives rate limit signals.
Composio: Framework Breadth
Composio is optimized for fast prototyping and framework compatibility. With native SDKs for LangChain, CrewAI, OpenAI Agents SDK, Google ADK, and every major agent framework, it is typically the fastest path from zero to a working demo. The Tool Router dynamically selects relevant tools per prompt to keep context windows clean. The trade-off: the unified data model is secondary to the tool-calling interface. If you need normalized CRM or HRIS schemas across providers, you build that mapping yourself. Observability for production debugging is still maturing compared to enterprise-focused platforms.
Truto: Agent Control and Transparency
Truto takes a different approach by keeping the agent in the loop. Rate limit errors pass through with standardized headers so the agent or orchestrator decides when to retry. Tool generation is documentation-driven - if a resource method has a description and a JSON Schema, it becomes an MCP tool automatically, which eliminates schema mismatch errors between what the LLM sees and what the upstream API expects. Per-tenant servers with cryptographic tokens, method/tag filtering, and TTL expiry give fine-grained access control. The trade-off: fewer raw integration actions than Composio's catalog count, and the transparent rate-limit model requires your agent code to handle 429s explicitly (as shown in the sample code above).
For the full side-by-side comparison with pricing, security analysis, and prototype evaluation rubric, see our dedicated StackOne vs Composio vs Truto breakdown. Our buyer's guide to MCP server platforms covers additional platforms including Arcade.dev.
What a Managed Platform Changes for Debugging
Regardless of which platform you choose, the debugging surface area shrinks dramatically:
- Stateless Transport: Instead of dealing with stateful SSE connections, managed platforms handle JSON-RPC 2.0 traffic over standard stateless HTTP POST endpoints. This completely eliminates the load balancing, buffering, and connection timeout headaches associated with remote deployments.
- Dynamic Tool Generation: Tool generation is documentation-driven. Rather than hand-coding tool definitions for every single integration, the platform derives them directly from the integration's resource definitions. If a resource method has a description and a schema, it becomes an MCP tool. This guarantees the JSON schema exposed to the LLM perfectly matches the payload expected by the upstream proxy, eliminating schema mismatch errors entirely.
- Automated Authentication: The platform refreshes OAuth tokens ahead of expiry, automatically injects the correct bearer token based on the cryptographic session ID, and maps the flat MCP input namespace into the correct query and body parameters.
If an authentication failure does occur, the platform's error expressions catch it, mark the account for re-authentication, and return a clean error to the LLM. You stop debugging transport layers and start shipping agentic workflows.
Production Testing Checklist: OAuth, TTL, Rate Limits, and Webhooks
Before you ship an MCP server to production, run these seven tests. Each one catches a specific class of failure that will not surface during development but will absolutely surface at 2 AM when your largest customer's agent stops working.
1. OAuth Token Rotation
Force-expire an OAuth token while a conversation is active. The platform should refresh the token transparently and the tool call should succeed on retry. If the tool returns a raw 401 instead of triggering a token refresh, your auth lifecycle has a gap.
2. TTL Server Expiry
Create an MCP server with a short expires_at (e.g., 5 minutes from now). Confirm that:
- Tool calls succeed before expiry.
- Tool calls return 401 Unauthorized after expiry.
- The token store entries are cleaned up (no stale data).
3. Rate Limit Passthrough
Trigger an HTTP 429 from the upstream API (many sandbox environments have low rate limits for this purpose). Verify:
- The MCP response includes ratelimit-reset in the headers.
- Your agent code waits the specified duration and retries.
- The retry succeeds.
4. Schema Validation
Send a tools/call with deliberately wrong argument types - a string where the schema expects an integer, a missing required field, a null ID. Verify the response contains a useful error message with isError: true, not a raw stack trace.
5. Pagination Cursor Round-Trip
Call a list tool, extract the next_cursor from the response, and feed it back unchanged in the next call. Verify page 2 returns fresh results. Then intentionally mangle the cursor (decode it, trim it, re-encode it) and verify the server returns a clear error rather than an empty result set.
6. Auth Header Propagation
If your MCP server uses additional API token authentication, confirm the Authorization header survives your entire infrastructure stack: reverse proxy, CDN, WAF, and load balancer. Strip-and-forward bugs in reverse proxies are the single most common cause of "it works locally but fails in production."
7. Error Normalization
Trigger a known error from each upstream provider you support. For example:
- A Slack 200-with-error ({"ok": false, "error": "channel_not_found"}).
- A Salesforce 400 with nested error array.
- An expired OAuth token that returns a 200 with {"error": "invalid_grant"}.
Verify each one produces a normalized, structured error with isError: true in the MCP response, not raw provider noise.
Where to Go From Here
A production-ready MCP debugging workflow looks roughly like this:
- Build and test locally over STDIO with the MCP Inspector. Lock the tool schemas, test edge cases, and verify error contracts before any LLM touches them.
- Promote to remote Streamable HTTP behind your reverse proxy. Re-run the Inspector against the live URL to confirm CORS, auth header propagation, and transport behaviors survive the network.
- Instrument the JSON-RPC layer. Log every tools/call invocation with arguments, latency, upstream HTTP status, and normalized error code. AI clients will not surface these failures for you.
- Push retry and backoff to the caller. Standardize on IETF rate limit headers and let the agent or client SDK handle HTTP 429s with exponential backoff and jitter.
- Normalize upstream errors with expression-based extraction so the LLM sees a clean, structured failure signal with isError: true, never raw provider noise.
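The instrumentation step can start as a thin wrapper around whatever function executes a tool. This is a minimal sketch with hypothetical names: it assumes the handler takes (name, arguments) and returns a dict that may carry upstream_status and error_code keys, which your own handler may not do. Arguments can contain customer data, so redact them before logging in a real deployment.

```python
import json
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("mcp.tools")


def instrumented(handler: Callable[[str, dict], dict],
                 emit: Callable[[str], Any] = logger.info) -> Callable[[str, dict], dict]:
    """Wrap a tool handler so every call emits one structured log line."""
    def wrapper(name: str, arguments: dict) -> dict:
        started = time.perf_counter()
        record: dict[str, Any] = {"tool": name, "arguments": arguments}
        try:
            outcome = handler(name, arguments)
            record["upstream_status"] = outcome.get("upstream_status")
            record["error_code"] = outcome.get("error_code")
            return outcome
        except Exception as exc:
            record["error_code"] = type(exc).__name__
            raise
        finally:
            # Runs on success and failure, so latency is always recorded.
            record["latency_ms"] = round((time.perf_counter() - started) * 1000, 1)
            emit(json.dumps(record, default=str))
    return wrapper
```

One JSON line per call is enough to answer the usual first questions: which tool, with what arguments, how slow, and what the upstream actually returned.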
If you would rather spend your engineering cycles on the agent logic instead of the transport plumbing, we are happy to walk through how Truto generates and operates MCP servers for hundreds of SaaS integrations out of the box.
FAQ
- How do StackOne, Composio, and Truto compare as MCP server platforms for AI agents?
- StackOne absorbs all network complexity (rate limits, retries) internally, giving agents clean results but hiding upstream state. Composio provides the broadest integration catalog (500-850+ apps) with fast framework-native SDKs, optimized for prototyping speed. Truto passes rate limit errors through with IETF-standard headers and dynamically generates MCP tools from integration documentation, giving agents full context to reason about failures. Choose StackOne for simplicity, Composio for breadth, Truto for agent control.
- How do I register a remote MCP server with Claude Desktop or ChatGPT?
- In Claude: go to Settings, then Connectors, then Add custom connector, and paste your MCP server URL. In ChatGPT: go to Settings, then Apps, then Advanced settings, enable Developer mode, and add your server URL under MCP servers. The URL alone handles authentication - no extra configuration needed if the token is embedded in the URL path.
- How should an AI agent handle HTTP 429 rate limit errors from MCP servers?
- Read the ratelimit-reset header from the response to determine how long to wait, add random jitter to avoid thundering-herd retries, and retry the same tool call after the wait period. Never have your MCP server silently retry internally - this hides upstream API state from the agent and causes unpredictable latency spikes.
- How do MCP server tokens work for authentication?
- The platform generates a random hex token at server creation time, HMAC-hashes it before storage, and returns the raw token once in the API response. On each request, the raw token from the URL is hashed and looked up in the token store. If the server has an expires_at, the token store entries carry a TTL and a cleanup alarm deletes stale records automatically.
- What should I test before shipping an MCP server to production?
- Run seven tests: OAuth token rotation mid-conversation, TTL server expiry behavior, rate limit passthrough with ratelimit-reset headers, schema validation with wrong argument types, pagination cursor round-trip integrity, auth header propagation through your infrastructure stack, and error normalization for each upstream provider's error format.