How to Build a Comprehensive Developer API Reference with Runnable Examples
A senior engineering playbook for building interactive API documentation that reduces TTFC, handles rate limits, and ships MCP-layer PII redaction with token vault patterns for AI agent data governance.
When enterprise procurement teams evaluate your B2B SaaS product, the real decision-maker is rarely the person holding the budget. The true buyer is a lead architect or staff engineer who evaluates your platform by opening your documentation, finding a code snippet, and attempting to run it. If you sell B2B SaaS with a public API, the highest-leverage asset your product team can ship is a comprehensive developer API reference with runnable examples—one where an evaluating engineer can paste a snippet, hit run, and get a real 200 OK from a real provider in under five minutes. Everything else (feature matrix, pricing page, sales decks) is downstream of that single experience.
Developers evaluate APIs based on friction. Static Markdown reference pages and Swagger dumps no longer cut it. If your step-by-step developer tutorial requires them to spend three hours reverse-engineering undocumented payloads, guessing OAuth scopes, or writing custom retry logic from scratch, your product fails the technical evaluation. Modern documentation has to ship interactive consoles, copy-pasteable snippets for multiple languages, authentication that actually works on the first try, and machine-readable formats that AI agents can consume.
Building on our framework for publishing end-to-end developer tutorials, this guide is for senior PMs and DevRel leaders who are tired of "write better docs" platitudes. We will cover the structural anatomy of an effective reference, the painful engineering realities of runnable code samples (OAuth, rate limits, idempotency), the architectural shift that lets you scale examples across dozens of integrations without rewriting them per provider, and how to prepare your reference for the AI agents that are increasingly the primary consumer of your API.
Why Time to First Call (TTFC) Is Your Most Important API Metric
Static documentation is dead. Modern developers expect to interact with your API directly from the browser, using their own credentials, against real endpoints. The metric that governs this entire experience is Time to First Call (TTFC).
TTFC measures the elapsed time from a developer landing on your docs (or signing up for your service) to executing their first successful, authenticated API request that returns a non-error response. It is the single most important metric for developer conversion and the strongest predictor of whether an evaluating engineer becomes a paying customer.
A high TTFC leads to massive drop-off rates during developer onboarding. Industry surveys show that the early-stage quit rate for developers is between 50% and 70% when they encounter friction in API documentation. If your API reference is just a list of endpoints without context, developers will simply close the tab and evaluate your competitor.
Conversely, optimizing TTFC generates massive returns. Postman ran an experiment across multiple API publishers and found that developers were 1.7 times faster making their first call when using a collection provided by the API publisher, with some publishers showing developers making a successful call 1.7 to 56 times faster when using a forked collection. In one case study, PayPal reduced their time to first call from 60 minutes to one minute by shipping a ready-to-run collection.
Those are not vanity numbers. They are conversion numbers. Postman's data director has explicitly argued that if you are not investing in TTFC as your most important API metric, you are limiting the size of your potential developer base throughout your remaining adoption funnel. Every minute you shave off TTFC compounds across every developer who ever touches your docs.
A warning from the same source that most teams ignore: be careful of artificially hacking TTFC, perhaps by hiding away the tricky parts or ignoring the gotchas, as you may be shifting the friction to the implementation stage. A runnable example that papers over OAuth scope errors or rate limits will simply move the abandonment from minute five to minute fifty. Your reference must handle the hard parts honestly.
Measure TTFC objectively. Instrument the time between signup and the first request that returns a 2xx response from your API. Then track it weekly. If it goes up after a docs change, roll back.
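A minimal sketch of that instrumentation follows. It assumes you already log two event types per developer, `signup` and `api_request`, with the latter carrying an HTTP status. The event shape and names are hypothetical; they are not the schema of any particular analytics product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Event:
    developer_id: str
    kind: str                      # hypothetical event names: "signup" or "api_request"
    at: datetime
    status: Optional[int] = None   # HTTP status, only set for "api_request"


def ttfc_per_developer(events: list[Event]) -> dict[str, timedelta]:
    """Elapsed time from signup to the first 2xx response, per developer."""
    signups: dict[str, datetime] = {}
    first_ok: dict[str, datetime] = {}
    for event in sorted(events, key=lambda e: e.at):
        if event.kind == "signup":
            signups.setdefault(event.developer_id, event.at)
        elif (event.kind == "api_request" and event.status is not None
              and 200 <= event.status < 300):
            first_ok.setdefault(event.developer_id, event.at)
    # Developers with no successful call are left out here; report them as
    # drop-off instead of letting them silently disappear from the metric.
    return {
        dev: first_ok[dev] - signed_up
        for dev, signed_up in signups.items()
        if dev in first_ok and first_ok[dev] >= signed_up
    }


def median_ttfc(events: list[Event]) -> Optional[timedelta]:
    durations = list(ttfc_per_developer(events).values())
    return median(durations) if durations else None
```

To track the metric weekly, bucket developers by signup week and chart `median_ttfc` per bucket. A docs change that shifts the median upward then shows up as a visible step.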
The Anatomy of a Comprehensive Developer API Reference
Building a high-converting API reference requires moving beyond generic, auto-generated documentation. A reference that converts has five non-negotiable layers. Skip any one of them and your TTFC bloats.
1. Unified Schemas and Predictable Data Models
Developers hate surprises. Every endpoint that touches a Contact, Invoice, or Ticket should reference the same typed schema with explicit enums, required fields, and example payloads. There should be no drift between resources.
If your platform integrates with 50 different CRMs, your documentation cannot force the developer to learn 50 different data models. Your API reference must present a single, canonical JSON Schema for each resource type. For example, a unified Contact model should look identical whether the underlying data came from Salesforce, HubSpot, or Pipedrive.
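As an illustration, a canonical Contact model can be pinned down as a typed structure with explicit required fields and a closed enum. Here that structure is a Python dataclass. The field names and the `LifecycleStage` values are assumptions made for the sketch, not a published schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LifecycleStage(str, Enum):
    # Hypothetical enum: every provider-specific stage must map onto one of these
    LEAD = "lead"
    CUSTOMER = "customer"
    OTHER = "other"


@dataclass
class Contact:
    id: str                                   # required, always a string
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    emails: list[str] = field(default_factory=list)
    lifecycle_stage: LifecycleStage = LifecycleStage.OTHER

    @classmethod
    def from_dict(cls, data: dict) -> "Contact":
        """Build a Contact, rejecting missing ids and undocumented enum values."""
        if data.get("id") in (None, ""):
            raise ValueError("Contact.id is required")
        return cls(
            id=str(data["id"]),
            first_name=data.get("first_name"),
            last_name=data.get("last_name"),
            emails=list(data.get("emails") or []),
            # LifecycleStage(...) raises ValueError for values outside the enum
            lifecycle_stage=LifecycleStage(data.get("lifecycle_stage", "other")),
        )
```

Rejecting unknown enum values at the boundary is the point. A provider value the schema does not document becomes a loud error instead of silent drift.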
2. Interactive "Try It" Consoles
An interactive console lets developers inject their API keys directly into the documentation UI, fill in a path parameter, and execute a live request against a sandbox or live environment. This feature alone slashes TTFC because it removes the immediate need to configure a local development environment. A runnable sample repo for headless vs iframe integrations is still necessary for deeper architectural evaluations.
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Docs as API Reference UI
    participant Proxy as API Gateway
    participant Upstream as Third-Party API
    Dev->>Docs: Inputs API Key & Request Body
    Docs->>Proxy: POST /unified/crm/contacts
    Proxy->>Upstream: Maps to native format & executes
    Upstream-->>Proxy: Returns HTTP 201 Created
    Proxy-->>Docs: Normalizes to Unified Schema
    Docs-->>Dev: Renders JSON Response in UI
```

3. Copy-Pasteable Code Snippets for Multiple Languages
While a "Try It" console is great for immediate validation, developers ultimately need to write code. Your reference must provide runnable snippets in the languages your customers actually use (cURL, Node.js, Python, Go, Ruby). These snippets must be generated from the spec, never hand-edited, and they must be complete—including import statements, client initialization, error handling, and response parsing.
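Here is a toy illustration of "generated from the spec, never hand-edited": a function that renders a cURL snippet from an operation description. The function and its dictionary inputs are simplified, hypothetical stand-ins for an OpenAPI operation. A real pipeline would use an OpenAPI-aware generator instead.

```python
import json
import shlex
from typing import Optional
from urllib.parse import urlencode


def render_curl(base_url: str, method: str, path: str,
                query: Optional[dict] = None, body: Optional[dict] = None,
                token_env: str = "API_KEY") -> str:
    """Render a copy-pasteable cURL command from a simplified operation description."""
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urlencode(query)
    parts = [f"curl -X {method.upper()} {shlex.quote(url)}"]
    # Reference the environment variable instead of inlining a secret; double
    # quotes let the shell expand it when the snippet runs.
    parts.append(f'-H "Authorization: Bearer ${token_env}"')
    if body is not None:
        parts.append("-H 'Content-Type: application/json'")
        parts.append(f"-d {shlex.quote(json.dumps(body))}")
    return " \\\n  ".join(parts)
```

The same operation data can feed Node.js, Python, and Go templates. The languages then cannot drift apart, because none of them is edited by hand.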
4. Honest Error Tables
Every endpoint must list the actual error codes it can return, what they mean, and what the caller should do (retry, refresh token, give up, escalate). Burying errors in a global page doesn't help a developer debugging a specific POST request.
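One way to keep per-endpoint error tables honest is to drive them from a single remediation map that both the docs build and the code snippets consume. The status-to-action defaults below are illustrative assumptions, not a rule every API follows. Each endpoint page would override whatever differs for it.

```python
from enum import Enum
from typing import Optional


class Action(str, Enum):
    RETRY = "retry"                  # transient: back off, then try again
    REFRESH_TOKEN = "refresh_token"  # credentials expired or were revoked
    GIVE_UP = "give_up"              # the request itself is wrong; fix it first
    ESCALATE = "escalate"            # needs a human: permissions or account state


Remediation = tuple[str, Action]

# Illustrative defaults; each endpoint page overrides what differs for it.
DEFAULT_REMEDIATION: dict[int, Remediation] = {
    400: ("Malformed request or invalid field value", Action.GIVE_UP),
    401: ("Missing, expired, or revoked credentials", Action.REFRESH_TOKEN),
    403: ("Authenticated, but missing a scope or permission", Action.ESCALATE),
    404: ("Resource does not exist or is not visible to this account", Action.GIVE_UP),
    429: ("Rate limited", Action.RETRY),
    503: ("Upstream temporarily unavailable", Action.RETRY),
}


def remediation_for(status: int,
                    overrides: Optional[dict[int, Remediation]] = None) -> Remediation:
    table = {**DEFAULT_REMEDIATION, **(overrides or {})}
    if status in table:
        return table[status]
    if 500 <= status < 600:
        return ("Unexpected server error", Action.RETRY)
    return ("Undocumented status code", Action.ESCALATE)


def error_table_markdown(overrides: Optional[dict[int, Remediation]] = None) -> str:
    """Render the per-endpoint error table from the same data the snippets use."""
    table = {**DEFAULT_REMEDIATION, **(overrides or {})}
    rows = ["| Status | Meaning | What to do |", "|---|---|---|"]
    for status in sorted(table):
        meaning, action = table[status]
        rows.append(f"| {status} | {meaning} | {action.value} |")
    return "\n".join(rows)
```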
5. Markdown Twins for Machine Readability
Every HTML page in your API reference should have a .md sibling. This is a raw version of the page that an LLM or agent can fetch and reason over without HTML parsing overhead.
What Separates Good Reference Docs from Great Ones
| Capability | Static Swagger Dump | Comprehensive Reference |
|---|---|---|
| Code samples | One language, hand-written | Multi-language, generated from spec |
| Authentication | "See OAuth section" | One-click token injection in console |
| Error responses | Listed once at top | Per-endpoint with remediation steps |
| Pagination | Inconsistent across endpoints | Single normalized pattern |
| Machine readability | OpenAPI only | OpenAPI + Markdown twins + llms.txt |
| Try It | Iframe to a generic explorer | Live calls with the user's real account |
How to Build Runnable API Examples That Actually Work
The difference between a theoretical code snippet and a truly runnable example lies in how you handle edge cases. Real-world software engineering is messy. Vendor API docs are often terrible, edge cases go undocumented, and rate limits are aggressively enforced.
When you publish developer API recipes, your code must acknowledge and handle these realities. This is where most teams fail. They publish snippets that work for a curated happy-path example, then break the moment a developer swaps in their own credentials.
Handling Authentication Without Hand-Holding
Authentication is the number one cause of TTFC failure. Most runnable examples die on line one because the SDK install instructions are wrong, the OAuth scope is missing, or the example uses a static API key in a flow that actually needs a refreshable token. Make the authentication boilerplate explicit in every snippet.
Your API examples should default to using long-lived API keys or pre-provisioned sandbox tokens for initial testing. A good runnable example uses an environment variable named exactly what your dashboard calls it, and the snippet should fail loudly with a useful message if the variable is unset.
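A minimal Python version of that fail-loud rule might look like the helper below. The variable names match the cURL example that follows. `MissingCredentialError` and the hint text are illustrative.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is absent from the environment."""


def require_env(name: str, hint: str) -> str:
    """Return an environment variable's value, or fail with an actionable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingCredentialError(f"{name} is not set. {hint}")
    return value
```

In a runnable example this would be the first thing executed. For instance, `api_key = require_env("TRUTO_API_KEY", "Create an API token in your dashboard and export it.")` makes the failure point at setup instead of surfacing later as an opaque 401.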
```bash
export TRUTO_API_KEY="sk_live_..."
export TRUTO_INTEGRATED_ACCOUNT_ID="ia_..."

curl -H "Authorization: Bearer $TRUTO_API_KEY" \
  "https://api.truto.one/unified/crm/contacts?integrated_account_id=$TRUTO_INTEGRATED_ACCOUNT_ID&limit=10"
```

For OAuth flows, document the exact scopes per endpoint. "Salesforce requires `api refresh_token`" is not enough. The developer also needs to know that listing opportunities requires field-level read access on `Opportunity.Amount`, which their sandbox admin probably didn't grant. Reference docs that ship with explicit scope tables save hours per developer.
Standardizing Rate Limits and Retries
Rate limit handling is where most published code samples lie. They show a single happy-path call and skip what happens on the 100th request when the provider returns HTTP 429 Too Many Requests.
Every third-party API handles rate limits differently. Some use standard headers, some put rate limit data in the JSON body, and others simply drop connections. If you are using a unified API layer, do not assume the platform absorbs 429s for you. Truto, for example, deliberately passes 429s straight through to the caller. It normalizes the upstream rate limit signals into standardized headers that follow the IETF RateLimit header fields draft:
- `ratelimit-limit`: The maximum number of requests permitted in the current window.
- `ratelimit-remaining`: The number of requests remaining in the current window.
- `ratelimit-reset`: The number of seconds until the current rate limit window resets.
The caller's script is responsible for retry and backoff. That trade-off is intentional—it gives you precise control over retry budgets—but it means your reference docs need to ship a real backoff snippet, not a TODO comment.
Here is an example of a production-ready TypeScript wrapper that you should include in your API reference to show developers how to handle API rate limits:
```typescript
async function fetchWithRetry(url: string, options: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status === 429) {
      // Out of retries: stop instead of sleeping one more time for nothing
      if (attempt === maxRetries - 1) break;

      // Default to exponential backoff: 1s, 2s, 4s, ...
      let waitTimeMs = Math.pow(2, attempt) * 1000;

      // Prefer the normalized header when it is present and numeric
      const resetHeader = response.headers.get('ratelimit-reset');
      const resetValue = resetHeader ? parseInt(resetHeader, 10) : NaN;
      if (!Number.isNaN(resetValue)) {
        waitTimeMs = resetValue > 1_000_000_000
          ? Math.max(0, resetValue * 1000 - Date.now()) // Unix timestamp (seconds)
          : resetValue * 1000;                          // delta seconds
      }

      console.warn(`Rate limited. Retrying in ${waitTimeMs}ms...`);
      await new Promise(resolve => setTimeout(resolve, waitTimeMs));
      continue;
    }

    if (!response.ok) {
      // Non-429 errors are not retried here; surface them to the caller
      throw new Error(`HTTP Error: ${response.status}`);
    }
    return response;
  }
  throw new Error('Max retries exceeded after HTTP 429 responses');
}
```

If your developers prefer Python, providing a robust requests loop is equally important:
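```python
import time

import requests


def call_with_backoff(url, headers, max_retries=5):
    for attempt in range(max_retries):
        r = requests.get(url, headers=headers, timeout=30)
        if r.status_code == 429:
            if attempt == max_retries - 1:
                break  # out of retries: don't sleep before giving up
            # Prefer the normalized ratelimit-reset header (delta seconds);
            # fall back to exponential backoff if it is missing or malformed.
            try:
                reset = int(r.headers.get("ratelimit-reset", 2 ** attempt))
            except ValueError:
                reset = 2 ** attempt
            time.sleep(min(max(reset, 0), 60))  # cap any single wait at 60s
            continue
        r.raise_for_status()
        return r.json()
    raise RuntimeError("Exceeded retry budget")
```

By providing these exact snippets, you eliminate the friction of developers having to guess how to handle 429s from your API.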
Idempotency and Writes
Read examples are easy. Write examples are where reputations are made or destroyed. Every runnable POST or PATCH example should include an idempotency key, a clear rollback story, and a sandbox account that the developer can hit without polluting production data. If you cannot offer a sandbox, document exactly what the example will create and how to delete it afterward.
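The point of an idempotency key is that it is generated once per logical write and reused on every retry. The sketch below demonstrates that rule against an in-memory fake server, so it runs without network access. The `Idempotency-Key` header name follows the IETF Idempotency-Key draft. Whether a specific API honors that header is an assumption to verify, and `Send` and `FakeServer` are hypothetical stand-ins.

```python
import uuid
from typing import Callable

# Hypothetical transport: sends (payload, headers) and returns (status, body).
# In a real snippet this would wrap requests.post(..., json=payload, headers=headers).
Send = Callable[[dict, dict], tuple[int, dict]]


def create_with_idempotency(send: Send, payload: dict, max_attempts: int = 3) -> dict:
    # Generate the key ONCE per logical write and reuse it on every retry, so a
    # retry after a lost response cannot create a second record.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    status = None
    for _ in range(max_attempts):
        status, body = send(payload, headers)  # backoff omitted for brevity
        if 200 <= status < 300:
            return body
        if status < 500 and status != 429:
            raise RuntimeError(f"Write rejected with HTTP {status}: {body}")
    raise RuntimeError(f"Write failed after {max_attempts} attempts (last HTTP {status})")


class FakeServer:
    """Stand-in API that honors Idempotency-Key, for demonstrating retries."""

    def __init__(self, lose_first_responses: int = 1):
        self.created: list[dict] = []
        self._by_key: dict[str, dict] = {}
        self._lose = lose_first_responses

    def send(self, payload: dict, headers: dict) -> tuple[int, dict]:
        key = headers["Idempotency-Key"]
        if key not in self._by_key:            # first time: actually create
            record = {"id": len(self.created) + 1, **payload}
            self.created.append(record)
            self._by_key[key] = record
        if self._lose > 0:                     # write succeeded, response "lost"
            self._lose -= 1
            return 503, {}
        return 201, self._by_key[key]          # replay returns the original record
```

In this sketch the first response is lost after the write succeeds. The retry reuses the same key and gets the original record back instead of creating a duplicate.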
Scaling Runnable Examples Across Dozens of Integrations
Here is the brutal math that breaks most documentation teams: if you publish a CRM integration, you do not have one API to document. You have Salesforce, HubSpot, Pipedrive, Zoho, Close, and 30 others. Each has different field names, pagination styles, OAuth quirks, and rate limits. Writing 30 separate "create a contact" examples is a maintenance treadmill that ends with stale docs and massive technical debt.
The architectural answer is a unified API layer with a single canonical schema per category. You document one endpoint, POST /unified/crm/contacts, and every provider behind it accepts the same payload. Your runnable example works against every CRM behind that endpoint.
Under the hood, this works because integration behavior is defined declaratively, not as separate code paths per provider. At Truto, the same generic execution engine that handles a HubSpot contact list also handles Salesforce, Pipedrive, and Zoho. Integration-specific behavior lives entirely as data: JSON configuration blobs and JSONata mapping expressions, not as if (provider === 'hubspot') branches.
```mermaid
flowchart LR
    A[Developer] -->|One snippet| B[Unified API]
    B --> C[Generic execution engine]
    C --> D[Declarative mapping<br>per provider]
    D --> E[HubSpot]
    D --> F[Salesforce]
    D --> G[Pipedrive]
    D --> H[30+ more CRMs]
```

Because of this architecture, you do not need to write integration-specific documentation. You document the unified schema once. The same code path, the same HTTP request, and the same runnable example handle every supported platform.
The trade-off most teams gloss over: a unified schema is only as expressive as its lowest common denominator. Custom fields, provider-specific objects, and edge-case attributes still need an escape hatch. The honest reference page documents both—the unified path for 80% of use cases and the passthrough or proxy API for the 20% that need native fidelity.
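The idea that integration behavior lives in data, not in `if (provider === 'hubspot')` branches, can be sketched in a few lines, passthrough escape hatch included. Truto's actual mappings are JSONata expressions. The dotted-path dictionaries, the simplified HubSpot and Salesforce payload shapes, and the `remote_data` key below are stand-ins chosen so the example runs without a JSONata dependency.

```python
from typing import Any

# Per-provider configuration is pure data: unified field -> dotted source path.
MAPPINGS: dict[str, dict[str, str]] = {
    "hubspot": {
        "id": "id",
        "first_name": "properties.firstname",
        "last_name": "properties.lastname",
        "email": "properties.email",
    },
    "salesforce": {
        "id": "Id",
        "first_name": "FirstName",
        "last_name": "LastName",
        "email": "Email",
    },
}


def get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; missing keys yield None."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def to_unified(provider: str, raw: dict) -> dict:
    """One generic code path for every provider; only the mapping data differs."""
    mapping = MAPPINGS[provider]
    unified = {name: get_path(raw, path) for name, path in mapping.items()}
    # Escape hatch: keep the native payload for custom fields and
    # provider-specific attributes the unified schema does not cover.
    unified["remote_data"] = raw
    return unified
```

Adding a provider here means adding a dictionary entry, not a code branch. That is why the documentation for the unified endpoint does not change when a provider is added.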
Preparing Your API Reference for AI Agents and LLMs
API strategy is shifting rapidly, and this is the shift most documentation teams are sleeping through. Postman's 2025 State of the API Report found that the Model Context Protocol (MCP) is emerging as the connective layer between AI agents and APIs, letting machines discover, understand, and invoke APIs. According to the same report, 70% of developers are aware of MCP but only 10% use it regularly.
The implication: documentation that is readable only by humans is becoming a competitive liability. In the near future, the "developer" evaluating your API will not be a human reading a web page—it will be an autonomous agent attempting to build an integration on behalf of a user.
The report is direct about what changes: as AI agents become primary API consumers, the APIs designed with machine-readable schemas, predictable patterns, and comprehensive documentation will integrate faster and more reliably than those built only for human consumption. The headline statistic from the survey is stark: nearly one in four developers (24.3%) are already designing APIs with AI agents in mind, a fundamental shift that signals the rise of machine-consumable APIs.
If your documentation relies heavily on client-side React rendering, complex CSS hiding, or iframe-based Swagger embeds, AI agents will fail to read it. They will hallucinate endpoints and produce integrations that break at runtime. To prepare your reference for machine consumption, ship four concrete artifacts.
1. The Docs MCP Server
The Model Context Protocol (MCP) is an open standard that allows AI models to securely interact with external tools and data sources. Exposing your API documentation via an MCP server allows an AI assistant (like Claude, Cursor, or ChatGPT) to dynamically search your reference, read endpoint schemas, and generate perfectly formatted code without scraping HTML.
By exposing your reference as discoverable tools, you reduce AI hallucinations in API integrations. You also make it far more likely that when an agent writes a runnable example, the code actually compiles.
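To make this concrete, here is a transport-free sketch of a docs server. It answers the MCP `tools/list` and `tools/call` JSON-RPC methods with a single hypothetical `search_docs` tool. The page index and the naive keyword matching are assumptions for illustration. A real server would use an official MCP SDK, a proper transport with the `initialize` handshake, and a real search index.

```python
from typing import Any

# Hypothetical in-memory index: Markdown twin URL -> page content
DOC_PAGES = {
    "/docs/crm/contacts/list.md": "# List contacts\nGET /unified/crm/contacts. Supports limit and cursor.",
    "/docs/crm/contacts/create.md": "# Create contact\nPOST /unified/crm/contacts with a Contact body.",
}

SEARCH_TOOL = {
    "name": "search_docs",
    "description": "Search the API reference and return matching Markdown pages.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def _error(req_id: Any, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC 2.0 request. No transport or initialize handshake."""
    method, req_id = request.get("method"), request.get("id")
    if method == "tools/list":
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": [SEARCH_TOOL]}}
    if method != "tools/call":
        return _error(req_id, -32601, f"Method not found: {method}")
    params = request.get("params", {})
    if params.get("name") != "search_docs":
        return _error(req_id, -32602, f"Unknown tool: {params.get('name')}")
    terms = str(params.get("arguments", {}).get("query", "")).lower().split()
    hits = [
        f"Source: {url}\n\n{body}"
        for url, body in DOC_PAGES.items()
        if terms and all(term in body.lower() for term in terms)
    ]
    text = "\n\n---\n\n".join(hits) or "No matching pages."
    return {"jsonrpc": "2.0", "id": req_id,
            "result": {"content": [{"type": "text", "text": text}]}}
```

Returning the Markdown twin content, with its source URL, gives the agent the canonical text instead of whatever it could scrape from rendered HTML.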
2. The llms.txt Index
Place an llms.txt index at your docs root that lists every page with a short description. Agents that follow the llms.txt convention look here first to understand the layout of your documentation.
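A build step can generate this index from the same page inventory that drives the HTML docs. The sketch follows the layout proposed by the llms.txt convention: an H1 title, a blockquote summary, then H2 sections of Markdown links with short descriptions. The `Page` type and the section names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Page:
    section: str      # e.g. "CRM"
    title: str
    url: str          # should point at the page's Markdown twin
    description: str


def build_llms_txt(site_name: str, summary: str, pages: list[Page]) -> str:
    lines = [f"# {site_name}", "", f"> {summary}"]
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    for section, items in sections.items():  # dicts keep first-seen order
        lines += ["", f"## {section}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.description}" for p in items]
    return "\n".join(lines) + "\n"
```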
3. The llms-full.txt Dump
Generate an llms-full.txt file: a concatenated, plaintext dump of your core documentation pages and API group overviews, separated by standard dividers. It provides a single, high-density context file for offline ingestion. Skip the verbose method-level pages so you stay under standard context windows while still giving the LLM the overall architecture of your platform.
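Here is a sketch of that build step. It assumes each page carries a hypothetical `kind` tag so that method-level pages can be skipped. It also uses a rough character budget as a stand-in for a real token count.

```python
from dataclasses import dataclass

DIVIDER = "\n\n---\n\n"  # assumption: any consistent separator works


@dataclass
class DocPage:
    url: str
    markdown: str
    kind: str  # hypothetical tag: "guide", "overview", or "method"


def build_llms_full(pages: list[DocPage], max_chars: int = 400_000) -> str:
    """Concatenate guides and API group overviews; skip method-level pages."""
    parts: list[str] = []
    total = 0
    for page in pages:
        if page.kind == "method":
            continue
        chunk = f"Source: {page.url}\n\n{page.markdown.strip()}"
        added = len(chunk) + (len(DIVIDER) if parts else 0)
        if total + added > max_chars:
            break  # stop before exceeding the rough size budget
        parts.append(chunk)
        total += added
    return DIVIDER.join(parts)
```

Order the input pages from most to least important. When the budget runs out, the pages dropped are then the least useful ones.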
4. Per-Page Markdown Twins
Every HTML page in your API reference should have a .md sibling. For example, if your HTML guide lives at /docs/crm/contacts/list, serve the raw Markdown at /docs/crm/contacts/list.md.
When generating these Markdown twins, your build system should automatically strip internal YAML frontmatter, ensure standard Markdown tables are used for parameters, and prepend a source URL (> Source: https://api.yoursite.com/...) so that when a human developer pastes the text into an LLM, the model has a canonical reference point.
These .md twins should be served with an X-Robots-Tag: noindex HTTP header to prevent traditional search engines from flagging them as duplicate content, while still leaving them fully accessible to AI web scrapers.
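The twin rules above can be sketched as a small build-and-serve helper. The base URL is a placeholder. The frontmatter regex assumes the common `---` delimited block at the top of the file.

```python
import re

# Matches a leading YAML frontmatter block delimited by --- lines.
FRONTMATTER = re.compile(r"\A---\r?\n.*?\r?\n---\r?\n", re.DOTALL)


def build_twin(html_path: str, source_markdown: str,
               base_url: str = "https://docs.example.com") -> tuple[str, str]:
    """Return (twin_path, twin_body) for an HTML page's Markdown sibling."""
    twin_path = html_path.rstrip("/") + ".md"
    body = FRONTMATTER.sub("", source_markdown, count=1).lstrip()
    header = f"> Source: {base_url}{html_path}"
    return twin_path, f"{header}\n\n{body}"


def twin_response_headers() -> dict[str, str]:
    # Keep twins out of search indexes without blocking direct fetches.
    return {
        "Content-Type": "text/markdown; charset=utf-8",
        "X-Robots-Tag": "noindex",
    }
```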
Hallucinations are a documentation problem. Do not assume LLMs will figure out your docs from HTML alone. Without explicit Markdown twins and an MCP entrypoint, agents fall back to scraping—which produces hallucinated parameters, invented enum values, and fabricated endpoints.
How MCP Servers Handle Data Retention and Security for AI Agents
Making your API reference machine-readable is only half the story. The moment an AI agent starts calling live APIs through an MCP server, every tool response carries real customer data - names, emails, employee IDs, deal amounts - straight into the LLM's context window. If your MCP server platform stores or caches those payloads, you have just created an uncontrolled copy of your customer's regulated data in someone else's infrastructure.
An AI agent might pull one employee record today, list all open deals tomorrow, and create a Jira ticket next week - all based on a conversation with a human user. If the MCP server platform stores every payload that passes through it, you are accumulating a growing, unpredictable dataset of your customers' most sensitive information.
This is the data governance gap that most teams building MCP integrations overlook. Using MCP servers introduces significant risks because the resources they provide access to typically contain sensitive information, including personally identifiable information (PII), financial data that could facilitate fraud, and even proprietary or competitively valuable information. Every MCP-connected AI agent is a non-human identity that most governance programs aren't built to handle. These identities accumulate over time, retain access after projects end or employees leave, and don't fit neatly into traditional IAM models designed for users and service accounts.
The architectural answer is a zero data retention MCP server that operates as a stateless pass-through proxy. It processes API payloads entirely in memory, maps schemas on the fly, and returns results directly to the LLM without writing a single byte of customer data to disk. This is the foundation for building SOC 2 and GDPR compliant AI agents on MCP. But statelessness alone is not enough. The data still passes through the LLM's context window, and that window is the new attack surface.
Why PII Redaction Must Live in the MCP Server Layer
You might assume PII redaction belongs in the LLM client or a downstream filter. It does not. The MCP server is the only point in the architecture where you have full visibility into the structured payload before it enters the LLM context.
This requires input and output filtering at the MCP layer: redacting PII before sending to agents, masking sensitive fields in tool responses, or disabling tools entirely when they necessarily expose prohibited data.
Three reasons this has to happen at the MCP server, not elsewhere:
- The LLM context window is a one-way door. Once data enters the model's context, you cannot un-send it. The model may reference it in future turns, include it in completions sent to other tools, or (in hosted deployments) have it logged by the model provider. Redaction after the context window is too late.
- Structured payloads enable field-level precision. MCP tool responses are typed JSON objects with known schemas. You know that `email` is an email and `ssn` is a social security number because the schema tells you. This is far more reliable than running regex over free-form text after the LLM has already seen the raw data.
- The MCP server owns the authentication context. It knows which integrated account is calling, what scopes are active, and what the customer's data residency requirements are. This context is not available downstream.
```mermaid
flowchart LR
    A[Third-Party API] -->|Raw payload| B[MCP Server]
    B -->|Redaction layer| C[Redacted payload +<br>token map]
    C -->|Clean JSON| D[LLM Context Window]
    D -->|Write-back with tokens| B
    B -->|De-tokenize +<br>forward to API| A
```

When customer contracts and regulatory obligations prohibit exposing PII to AI agents, wiring agents directly to Jira, logs, or other systems is a non-starter. The solution is a small, transparent proxy that sits between agents and tools and redacts sensitive data on the fly.
JSONata Redaction Patterns for MCP Tool Responses
If your integration layer already uses JSONata for declarative schema mapping (as Truto does for normalizing provider-specific payloads), you can apply the same expression engine to PII redaction. JSONata's $replace function accepts regex patterns, making it well-suited for field-level masking directly within the mapping pipeline.
Here are the core patterns:
Email redaction - Replace the local part while preserving the domain for debugging:
```jsonata
$replace(email, /^[^@]+/, "[REDACTED]")
```

Input: "jane.doe@acme.com" → Output: "[REDACTED]@acme.com"
Phone number masking - Keep the country code, mask the rest:
```jsonata
$replace(phone, /(\+?\d{1,3})[\d\s-]+/, "$1-XXX-XXXX")
```

Input: "+1-555-867-5309" → Output: "+1-XXX-XXXX"
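SSN / National ID full redaction:

```jsonata
$replace(ssn, /\d{3}-?\d{2}-?\d{4}/, "[SSN-REDACTED]")
```

Name tokenization - Replace with a deterministic placeholder that preserves referential integrity across the response. JSONata has no built-in hash function, so `$hash` below is a custom function your host application must register. The JavaScript jsonata library supports this via `registerFunction` or evaluation bindings.

```jsonata
$replace(first_name, /^.*$/, "[PERSON_" & $string($hash(first_name)) & "]")
```

This produces a stable token like [PERSON_4a8f2c] so the LLM can still reason about "this person" across multiple fields without seeing the real name.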
Composing redaction into a full mapping expression:
In a declarative mapping layer, you apply these patterns inside the response transformation. A simplified example for a CRM contacts endpoint:
```jsonata
{
  "id": id,
  "first_name": $replace(first_name, /^.*$/, "[NAME_REDACTED]"),
  "last_name": $replace(last_name, /^.*$/, "[NAME_REDACTED]"),
  "email": $replace(email, /^[^@]+/, "[REDACTED]"),
  "phone": $replace(phone, /(\+?\d{1,3})[\d\s-]+/, "$1-XXX-XXXX"),
  "company": company,
  "deal_stage": deal_stage,
  "created_at": created_at
}
```

Notice that `company`, `deal_stage`, and `created_at` pass through untouched. Good redaction policy is selective - you mask the fields that identify individuals while preserving the business context the LLM needs to do useful work.
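JSONata regular expressions follow JavaScript syntax. One cheap way to unit-test these patterns is therefore to port them to Python's `re` module, where these particular patterns behave the same way. Treat this as a convenience harness that rests on that assumption, not as a substitute for testing the JSONata expressions themselves.

```python
import re

# Python ports of the JSONata patterns above; JSONata's "$1" becomes r"\1".
PATTERNS = {
    "email": (re.compile(r"^[^@]+"), "[REDACTED]"),
    "phone": (re.compile(r"(\+?\d{1,3})[\d\s-]+"), r"\1-XXX-XXXX"),
    "ssn": (re.compile(r"\d{3}-?\d{2}-?\d{4}"), "[SSN-REDACTED]"),
}


def redact(field: str, value):
    """Apply the ported pattern; missing values stay missing."""
    if value is None:
        return None
    pattern, replacement = PATTERNS[field]
    return pattern.sub(replacement, value)
```

A harness like this surfaces real quirks. Without a leading `+`, the phone pattern keeps the first three digits (`"5558675309"` becomes `"555-XXX-XXXX"`), and for a US number those digits are the area code, not a country code.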
Common SaaS Payload Redaction: CRM, HRIS, and Tickets
Different SaaS categories expose different PII risk profiles. Here is a practical field-level redaction policy for the three most common integration categories:
CRM payloads (Salesforce, HubSpot, Pipedrive):
| Field | Action | Rationale |
|---|---|---|
| `first_name`, `last_name` | Tokenize | LLM needs referential identity, not real names |
| `email` | Mask local part | Domain still useful for company identification |
| `phone` | Mask digits | Country code useful for locale detection |
| `deal_amount`, `deal_stage` | Pass through | Business context, not PII |
| `company_name` | Pass through | Typically public information |
| `notes`, `description` | Regex scan + mask | Free-text fields may contain embedded PII |
HRIS payloads (BambooHR, Workday, Personio):
| Field | Action | Rationale |
|---|---|---|
| `employee_name`, `personal_email` | Tokenize / mask | Regulated under GDPR and most privacy frameworks |
| `ssn`, `national_id` | Full redact | Never expose to LLM under any circumstance |
| `salary`, `compensation` | Full redact | Highly sensitive; rarely needed for agent tasks |
| `department`, `title`, `location` | Pass through | Organizational context, low PII risk |
| `date_of_birth` | Full redact | Directly identifiable |
| `emergency_contact` | Full redact | Third-party PII; higher regulatory exposure |
Ticketing payloads (Jira, Zendesk, ServiceNow):
| Field | Action | Rationale |
|---|---|---|
| `reporter_email`, `assignee_email` | Mask local part | Agent needs to distinguish reporters, not identify them |
| `ticket_body`, `comments` | Regex scan + mask | Customers often paste credentials, IPs, stack traces |
| `ticket_title` | Pass through | Usually safe; flag if it contains email patterns |
| `priority`, `status`, `labels` | Pass through | Workflow context |
| `attachments` | Block or skip | Binary content cannot be reliably scanned inline |
Free-text fields are the hardest. Fields like notes, ticket_body, and comments regularly contain embedded emails, phone numbers, and even passwords pasted by end users. A field-level policy is not enough for these - you need regex scanning over the string content itself.
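The tables above can be encoded as data and enforced by one function. In the sketch below, the action names, the `CRM_POLICY` dictionary, and the deliberately simple free-text regexes are illustrative assumptions. Production scanning usually pairs patterns like these with an NER-based detector, because names in free text (such as "Jane" in the usage data) slip past regexes.

```python
import re
from typing import Any, Callable

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_free_text(text: str) -> str:
    # SSNs first, otherwise the looser PHONE pattern would swallow them
    text = SSN.sub("[SSN]", text)
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


STRING_ACTIONS: dict[str, Callable[[str], str]] = {
    "pass": lambda v: v,
    "mask_local": lambda v: re.sub(r"^[^@]+", "[REDACTED]", v),
    "mask_digits": lambda v: re.sub(r"(\+?\d{1,3})[\d\s-]+", r"\1-XXX-XXXX", v),
    "scan": scan_free_text,
}

# The CRM table above, encoded as data (HRIS and ticketing get their own dicts)
CRM_POLICY = {
    "first_name": "tokenize", "last_name": "tokenize",
    "email": "mask_local", "phone": "mask_digits",
    "deal_amount": "pass", "deal_stage": "pass", "company_name": "pass",
    "notes": "scan", "description": "scan",
}


def apply_policy(record: dict, policy: dict[str, str],
                 tokenize: Callable[[str, str], str]) -> dict:
    out: dict[str, Any] = {}
    for key, value in record.items():
        action = policy.get(key, "redact")  # fail closed: unknown fields are dropped
        if action == "redact":
            continue
        if not isinstance(value, str):
            # Non-strings (numbers, None) survive only when explicitly passed through
            out[key] = value if action == "pass" else None
        elif action == "tokenize":
            out[key] = tokenize(value, key)
        else:
            out[key] = STRING_ACTIONS[action](value)
    return out
```

Failing closed on unknown fields is the key design choice. A field the upstream API adds tomorrow is dropped until someone classifies it, instead of leaking by default.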
The Short-Lived Token Vault Pattern for Write-Back Operations
Redaction creates a problem for write operations. If an AI agent reads a redacted contact list, reasons about it, and then needs to update a specific record, the agent only has [PERSON_4a8f2c] - not the real name. You need a way to reverse the redaction for the write-back without ever exposing the raw value to the LLM.
The solution is a short-lived, in-memory token vault that maps redaction tokens to original values. The vault lives in the MCP server's process memory, scoped to a single session, and expires automatically after a short TTL.
A session map storing [PERSON_1] to the real name is, by definition, a PII store - it needs the same controls as the data it protects: encryption at rest, a short TTL, strict session scoping, audit logging on every de-redaction, and authorization checks before any reverse lookup.
Here is a minimal TypeScript implementation:
```typescript
import { createHmac, randomBytes } from 'crypto';

interface VaultEntry {
  original: string;
  fieldType: string;
  createdAt: number;
}

class RedactionVault {
  private store = new Map<string, VaultEntry>();
  private ttlMs: number;
  private cleanupInterval: ReturnType<typeof setInterval>;
  // Per-vault secret: tokens stay stable within a session, but nobody can
  // reverse a low-entropy value (phone, SSN) by hashing candidate guesses.
  private secret = randomBytes(32);

  constructor(ttlSeconds = 300) {
    this.ttlMs = ttlSeconds * 1000;
    // Sweep expired entries every 30 seconds
    this.cleanupInterval = setInterval(() => this.sweep(), 30_000);
  }

  /** Replace a PII value with a stable, reversible token */
  tokenize(value: string, fieldType: string): string {
    const digest = createHmac('sha256', this.secret)
      .update(`${fieldType}:${value}`)
      .digest('hex')
      .slice(0, 12);
    const token = `[${fieldType.toUpperCase()}_${digest}]`;
    // Always (re)write the entry so re-reading the data refreshes its TTL
    this.store.set(token, { original: value, fieldType, createdAt: Date.now() });
    return token;
  }

  /** Reverse a token back to the original value for write-back */
  detokenize(token: string): string | null {
    const entry = this.store.get(token);
    if (!entry) return null;
    if (Date.now() - entry.createdAt > this.ttlMs) {
      this.store.delete(token);
      return null;
    }
    return entry.original;
  }

  /** Remove all entries older than TTL */
  private sweep(): void {
    const now = Date.now();
    for (const [token, entry] of this.store) {
      if (now - entry.createdAt > this.ttlMs) {
        this.store.delete(token);
      }
    }
  }

  /** Tear down the cleanup interval */
  destroy(): void {
    clearInterval(this.cleanupInterval);
    this.store.clear();
  }
}
```

Using it in the MCP tool response pipeline:
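```typescript
// When handling a tools/call response for a "list contacts" tool.
// getSessionVault is a hypothetical helper returning one RedactionVault per MCP session.
const vault = getSessionVault(sessionId);

// Vault-issued tokens look like [NAME_1a2b3c4d]; literal placeholders such as
// [REDACTED] do not match and are left alone.
const TOKEN_PATTERN = /^\[[A-Z_]+_[0-9a-f]{8,64}\]$/;

// Leave null or empty fields as-is instead of minting a token for them
const tok = (value: unknown, fieldType: string) =>
  typeof value === 'string' && value !== '' ? vault.tokenize(value, fieldType) : value;

function redactPayload(contacts: any[]): any[] {
  return contacts.map(contact => ({
    ...contact,
    first_name: tok(contact.first_name, 'name'),
    last_name: tok(contact.last_name, 'name'),
    email: tok(contact.email, 'email'),
    phone: tok(contact.phone, 'phone'),
  }));
}

// When handling a write-back (e.g., "update contact"). Top-level fields only.
function detokenizePayload(body: Record<string, any>): Record<string, any> {
  const result: Record<string, any> = {};
  for (const [key, value] of Object.entries(body)) {
    if (typeof value === 'string' && TOKEN_PATTERN.test(value)) {
      const original = vault.detokenize(value);
      // Fail loudly: never forward an expired or unknown token to the upstream API
      if (original === null) {
        throw new Error(`Field "${key}" references an expired or unknown token; re-read the record first`);
      }
      result[key] = original;
    } else {
      result[key] = value;
    }
  }
  return result;
}
```

The equivalent Python implementation: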
```python
import hashlib
import hmac
import secrets
import threading
import time
from typing import Optional


class RedactionVault:
    def __init__(self, ttl_seconds: int = 300):
        self._store: dict[str, dict] = {}
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        # Per-vault secret: tokens stay stable within a session, but nobody can
        # reverse a low-entropy value (phone, SSN) by hashing candidate guesses.
        self._key = secrets.token_bytes(32)

    def tokenize(self, value: str, field_type: str) -> str:
        digest = hmac.new(
            self._key, f"{field_type}:{value}".encode(), hashlib.sha256
        ).hexdigest()[:12]
        token = f"[{field_type.upper()}_{digest}]"
        with self._lock:
            # Always (re)write so re-reading the data refreshes the TTL
            self._store[token] = {
                "original": value,
                "field_type": field_type,
                "created_at": time.time(),
            }
        return token

    def detokenize(self, token: str) -> Optional[str]:
        with self._lock:
            entry = self._store.get(token)
            if not entry:
                return None
            if time.time() - entry["created_at"] > self._ttl:
                del self._store[token]
                return None
            return entry["original"]

    def sweep(self) -> None:
        """Drop expired entries; unlike the TypeScript version, call this on a timer."""
        with self._lock:
            now = time.time()
            expired = [
                k for k, v in self._store.items()
                if now - v["created_at"] > self._ttl
            ]
            for k in expired:
                del self._store[k]
```

Key design constraints for the token vault:
- TTL must be short. Five minutes is a reasonable default. If the LLM session lasts longer, the agent can re-read the data, which registers the tokens again with a fresh TTL.
- Scope to a single session. Never share a vault across MCP sessions or users. Each MCP token URL should map to its own vault instance.
- No disk persistence. The vault lives entirely in process memory. If the process restarts, the vault is gone. This is a feature, not a bug - it guarantees zero data retention.
- Log detokenization events, not values. Record that a detokenization occurred (token ID, timestamp, calling tool) but never log the original value.
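The logging rule in the last bullet might look like the sketch below. The audit record's field names and the `session_id` and `tool` parameters are assumptions. The one invariant that matters is that the original value never reaches the log.

```python
import json
import logging
import time

audit_log = logging.getLogger("mcp.redaction.audit")


def audit_detokenization(session_id: str, token: str, tool: str, resolved: bool) -> str:
    """Emit a structured audit record for a detokenization attempt.

    Only metadata is recorded; callers must never pass the original value here.
    """
    record = json.dumps({
        "event": "detokenize",
        "session_id": session_id,
        "token": token,        # the token itself reveals nothing about the value
        "tool": tool,
        "resolved": resolved,  # False means the token was expired or unknown
        "ts": time.time(),
    })
    audit_log.info(record)
    return record
```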
Testing and Verification Checklist for Redaction and Token Swap
Redaction code that ships without tests is worse than no redaction at all - it creates a false sense of compliance. Here is a concrete checklist for validating your implementation:
Redaction correctness:
- Every field in your redaction policy is covered by a unit test with realistic sample data
- Free-text fields (`notes`, `ticket_body`, `comments`) are tested with embedded emails, phone numbers, and SSN patterns
- Edge cases: empty strings, `null` values, unicode names (e.g., `José`, `Müller`), and extremely long values do not crash the redaction logic
- Redacted output contains zero raw PII when piped through a PII scanner (regex-based or NER-based)
- Fields marked "pass through" in the policy actually pass through unchanged
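The redaction checks above translate directly into plain assertions. This is a minimal sketch: the field policy, `redact_record`, and the regex patterns are illustrative assumptions (a simplified masker with placeholder labels rather than vault tokens), and production policies usually pair regexes with NER-based detection. One caution for the long-value case: this email pattern backtracks quadratically on very long runs without whitespace, such as base64 blobs, so include one in your edge-case data and cap or chunk scanned text if it turns out slow.

```python
import re
from typing import Any

# Illustrative patterns. SSN runs before PHONE so "123-45-6789" is not labeled a phone.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

# Hypothetical policy: "mask" replaces the whole value, "pass" leaves it alone,
# "scan" runs the patterns over free text. Unknown fields default to "scan".
POLICY = {
    "email": "mask",
    "firstname": "mask",
    "notes": "scan",
    "ticket_body": "scan",
    "lifecycle_stage": "pass",
}


def redact_record(record: dict[str, Any]) -> dict[str, Any]:
    out: dict[str, Any] = {}
    for key, value in record.items():
        rule = POLICY.get(key, "scan")
        if not isinstance(value, str) or value == "" or rule == "pass":
            # null, numbers, empty strings, and pass-through fields are untouched.
            out[key] = value
        elif rule == "mask":
            out[key] = f"[{key.upper()}_REDACTED]"
        else:
            text = value
            for label, pattern in PATTERNS.items():
                text = pattern.sub(f"[{label}]", text)
            out[key] = text
    return out


def test_redaction_checklist() -> None:
    out = redact_record({
        "email": "jane.doe@acme.com",
        "firstname": "José",
        "notes": "Reach her at m.muller@acme.de or +1 (555) 867-5309, SSN 123-45-6789",
        "lifecycle_stage": "lead",
        "phone": None,
        "company": "",
    })
    assert out["email"] == "[EMAIL_REDACTED]"
    assert out["firstname"] == "[FIRSTNAME_REDACTED]"
    assert out["notes"] == "Reach her at [EMAIL] or [PHONE], SSN [SSN]"
    assert out["lifecycle_stage"] == "lead"  # pass-through really passes through
    assert out["phone"] is None and out["company"] == ""
    long_text = "lorem ipsum " * 10_000  # long, but whitespace-separated
    assert redact_record({"notes": long_text})["notes"] == long_text
```

Defaulting unknown fields to "scan" rather than "pass" means a newly added upstream field is at least pattern-checked before anyone updates the policy.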
Token vault correctness:
- Tokenize the same value twice - confirm the same token is returned (deterministic hashing)
- Detokenize a valid token within TTL - confirm the original value is returned
- Detokenize after TTL expiry - confirm
nullis returned - Detokenize a token from a different session vault - confirm
nullis returned (no cross-session leakage) - Process restart clears all tokens (no disk persistence)
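The vault checks can run without real sleeps if the clock is injectable. This sketch uses a trimmed-down, hypothetical `ClockedVault` that mirrors the token vault above but takes a `clock` callable, plus a `FakeClock` that the test sets by hand. It re-registers a token on every `tokenize` call, so repeated reads restart the TTL. A fresh instance stands in for a process restart, since nothing is persisted to disk.

```python
import hashlib
import threading
import time
from typing import Callable, Optional


class ClockedVault:
    """Trimmed-down token vault with an injectable clock (hypothetical test double)."""

    def __init__(self, ttl_seconds: int = 300, clock: Callable[[], float] = time.time):
        self._store: dict[str, tuple[str, float]] = {}
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()

    def tokenize(self, value: str, field_type: str) -> str:
        digest = hashlib.sha256((value + field_type).encode()).hexdigest()[:8]
        token = f"[{field_type.upper()}_{digest}]"
        with self._lock:
            self._store[token] = (value, self._clock())  # (re)start the TTL
        return token

    def detokenize(self, token: str) -> Optional[str]:
        with self._lock:
            entry = self._store.get(token)
            if entry is None:
                return None
            value, created_at = entry
            if self._clock() - created_at > self._ttl:
                del self._store[token]
                return None
            return value


class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def test_vault_checklist() -> None:
    clock = FakeClock()
    vault = ClockedVault(ttl_seconds=300, clock=clock)
    other_session = ClockedVault(ttl_seconds=300, clock=clock)

    token = vault.tokenize("jane.doe@acme.com", "email")
    assert vault.tokenize("jane.doe@acme.com", "email") == token  # deterministic
    clock.now = 299.0
    assert vault.detokenize(token) == "jane.doe@acme.com"        # within TTL
    assert other_session.detokenize(token) is None                # no cross-session leak
    assert ClockedVault(clock=clock).detokenize(token) is None    # "restart": empty vault
    clock.now = 301.0
    assert vault.detokenize(token) is None                        # past TTL
```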
End-to-end flow:
- Call a read tool (e.g., `list_all_hub_spot_contacts`) and verify the response contains only redacted values
- Take a redacted response, simulate an LLM deciding to update a record, pass the tokenized values to a write tool (e.g., `update_a_hub_spot_contact_by_id`), and verify the MCP server correctly detokenizes before forwarding to the upstream API
- Confirm the upstream API receives the original, unredacted values in the write request
- Simulate a TTL expiry mid-conversation: read data, wait beyond the TTL, attempt a write-back, and verify the server returns a clear error (not a silent pass-through of token strings)
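The write-back steps hinge on one function: the detokenization pass an MCP server runs on tool arguments before calling the upstream API. This is a sketch, assuming tokens follow the `[FIELD_xxxxxxxx]` format produced by the token vault above; it accepts any `lookup(token)` callable, such as the vault's `detokenize`. `detokenize_args` and `ExpiredTokenError` are hypothetical names. Raising on an unknown or expired token is what turns the TTL-expiry scenario into a clear error instead of a silent pass-through of token strings.

```python
import re
from typing import Any, Callable, Optional

# Matches tokens such as [EMAIL_0a1b2c3d]: field label, underscore, 8 hex chars.
TOKEN_RE = re.compile(r"\[[A-Z0-9_]+_[0-9a-f]{8}\]")


class ExpiredTokenError(Exception):
    """A write-back argument holds a token the session vault no longer knows."""


def detokenize_args(args: Any, lookup: Callable[[str], Optional[str]]) -> Any:
    """Recursively swap vault tokens in tool arguments for their original values."""
    if isinstance(args, dict):
        return {key: detokenize_args(value, lookup) for key, value in args.items()}
    if isinstance(args, list):
        return [detokenize_args(item, lookup) for item in args]
    if isinstance(args, str):
        def swap(match: "re.Match[str]") -> str:
            original = lookup(match.group(0))
            if original is None:
                # Fail loudly rather than forwarding "[EMAIL_...]" upstream.
                raise ExpiredTokenError(
                    f"{match.group(0)} expired or unknown; re-read the record and retry"
                )
            return original
        return TOKEN_RE.sub(swap, args)
    return args
```

In an end-to-end test, point `lookup` at the session vault and replace the upstream client with a stub that records the payload it receives; the final check is then a plain equality assertion on the recorded payload.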
Troubleshooting and Monitoring Signals for Redaction
Once redaction is in production, you need observability into whether it is actually working. Silent failures - where PII leaks through because a new field was added to an upstream API response - are the most dangerous kind.
Signals to monitor:
| Signal | What It Tells You | Alert Threshold |
|---|---|---|
| Detokenization failure rate | Tokens expired or vault miss | > 5% of write-back attempts |
| Redaction pattern match count per response | Whether the regex patterns are actually firing | Drops to zero on an endpoint that previously had matches |
| New unrecognized fields in API responses | Upstream provider added fields your policy does not cover | Any new field containing string values |
| Vault size (entries per session) | Memory pressure and session scope correctness | > 10,000 entries in a single vault instance |
| Detokenization latency | Whether the vault lookup is becoming a bottleneck | p99 > 5ms |
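A small periodic job can evaluate the thresholds in the table. This sketch assumes you already aggregate per-endpoint counters into a hypothetical `RedactionStats` record; the field names are illustrative and the thresholds are copied from the table. The unknown-field check is the same schema diff recommended below for upstream changes: any string field not named in the policy gets flagged.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RedactionStats:
    """Per-endpoint counters for one monitoring window (hypothetical shape)."""
    writeback_attempts: int = 0
    detokenize_failures: int = 0
    pattern_matches: int = 0
    previous_pattern_matches: int = 0
    string_fields_seen: set[str] = field(default_factory=set)
    max_vault_entries: int = 0
    detokenize_latencies_ms: list[float] = field(default_factory=list)


def p99(samples: list[float]) -> float:
    # Nearest-rank percentile: good enough for alerting.
    ordered = sorted(samples)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


def check_signals(stats: RedactionStats, policy_fields: set[str]) -> list[str]:
    """Return one alert string for every threshold from the table that is crossed."""
    alerts = []
    if stats.writeback_attempts and (
        stats.detokenize_failures / stats.writeback_attempts > 0.05
    ):
        alerts.append("detokenization failure rate above 5% of write-backs")
    if stats.previous_pattern_matches > 0 and stats.pattern_matches == 0:
        alerts.append("redaction patterns stopped matching on this endpoint")
    unreviewed = stats.string_fields_seen - policy_fields
    if unreviewed:
        alerts.append(f"string fields not covered by policy: {sorted(unreviewed)}")
    if stats.max_vault_entries > 10_000:
        alerts.append("vault above 10,000 entries")
    if stats.detokenize_latencies_ms and p99(stats.detokenize_latencies_ms) > 5.0:
        alerts.append("detokenization p99 above 5 ms")
    return alerts
```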
Common failure modes:
- Upstream schema change. The provider adds a new field (e.g., `personal_phone_2`) that your redaction policy does not cover. Fix: run a schema diff on every integration update cycle and flag new string fields for review.
- Nested objects. A CRM returns a `related_contacts` array inside the main contact object. Your top-level redaction misses the nested PII. Fix: apply redaction recursively, or flatten nested objects before the redaction pass.
- Encoding mismatch. A provider returns HTML-encoded values (`jane.doe&#64;acme.com`) that bypass your email regex. Fix: decode HTML entities before running redaction patterns.
- Token collision. Two different PII values produce the same short hash. An 8-character hex hash has about 4.3 billion possible values, but by the birthday bound the chance of at least one collision in a vault reaches roughly 1% at about 9,300 distinct values and 50% at about 77,000. Fix: increase the hash length or use the full SHA-256 digest with a field-type prefix, and refuse to reuse a token that already maps to a different value.
- LLM leaking redacted tokens in user-facing output. The agent writes `[NAME_4a8f2c1d]` in a customer-facing message. Fix: add a post-processing step on the LLM output that catches token patterns and replaces them with generic labels like "the contact."
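Three of these fixes take only a few lines each. This is a minimal sketch, assuming a single illustrative email regex and the vault's 8-hex-character token format. `redact_deep` recurses into dicts and lists and decodes HTML entities with `html.unescape` before matching. `scrub_output` swaps any leaked vault token for a generic label. Both function names are hypothetical. Decoding also changes non-PII text the agent sees (`R&amp;D` becomes `R&D`), which is usually what you want for an LLM but worth knowing.

```python
import html
import re
from typing import Any

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Vault tokens look like [NAME_4a8f2c1d]: label, underscore, 8 hex chars.
VAULT_TOKEN_RE = re.compile(r"\[[A-Z0-9_]+_[0-9a-f]{8}\]")


def redact_deep(value: Any) -> Any:
    """Redact at every depth, so arrays such as related_contacts are covered."""
    if isinstance(value, dict):
        return {key: redact_deep(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_deep(item) for item in value]
    if isinstance(value, str):
        # Decode entities first so "jane.doe&#64;acme.com" is seen as an email.
        return EMAIL_RE.sub("[EMAIL]", html.unescape(value))
    return value


def scrub_output(text: str, label: str = "the contact") -> str:
    """Replace vault tokens that leak into user-facing LLM output."""
    return VAULT_TOKEN_RE.sub(label, text)
```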
Run a PII canary test weekly. Inject a synthetic payload with known PII patterns through your MCP server and verify every field comes out redacted. This catches regressions faster than waiting for a compliance audit.
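A canary harness can be as small as a fixed synthetic payload plus a substring check on the serialized output. In this sketch, `pipeline` is a placeholder for whatever sends a payload through your MCP server's redaction path (the real wiring depends on your server), and `run_canary` is a hypothetical name. The values are deliberately fake: `example.com` is reserved for documentation, 555-0100 through 555-0199 are reserved for fictional use, and SSNs beginning with 000 are never issued.

```python
import json
from typing import Any, Callable

# Synthetic PII only; never seed a canary with real customer data.
CANARY_PAYLOAD = {
    "email": "canary.user@example.com",
    "phone": "+1 202 555 0199",
    "ssn": "000-12-3456",
    "notes": "Call +1 202 555 0199 or write to canary.user@example.com",
    "related_contacts": [{"email": "canary.two@example.com"}],
}
CANARY_VALUES = [
    "canary.user@example.com",
    "+1 202 555 0199",
    "000-12-3456",
    "canary.two@example.com",
]


def run_canary(pipeline: Callable[[dict], Any]) -> list[str]:
    """Return every canary value that survives the pipeline; an empty list is a pass.

    An exact-substring check misses partial leaks (e.g. only the last four digits),
    so treat it as a floor, not a full PII scan.
    """
    output = json.dumps(pipeline(CANARY_PAYLOAD), ensure_ascii=False)
    return [value for value in CANARY_VALUES if value in output]
```

Wiring `run_canary` into a weekly scheduled job and failing loudly on a non-empty result gives you the regression signal without waiting for an audit.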
Strategic Wrap-Up and Next Steps
Publishing a comprehensive developer API reference with runnable examples is not a documentation task—it is a core product growth strategy. Technical evaluators do not have the patience to debug your platform's idiosyncratic rate limits or decipher undocumented authentication flows. They want to paste a script, hit enter, and see a 200 OK.
A comprehensive developer API reference is the highest-ROI piece of content your team will ever ship. The discipline is straightforward, even if the execution is not:
- Measure TTFC. Pick a number this quarter. Beat it next quarter. Instrument the elapsed time between signup and the first 2xx response.
- Generate snippets from your OpenAPI spec, not by hand. Hand-written snippets drift the moment your API changes.
- Ship honest authentication and rate limit handling in every example. Provide complete code snippets that explicitly handle HTTP 429 errors using standardized headers like `ratelimit-reset`. Hiding the gotchas just moves the abandonment downstream.
- Standardize your data models so developers only have to learn one schema. If you ship more than five integrations, get serious about a unified schema and declarative provider mappings. Maintaining parallel reference trees is a tax that compounds.
- Treat AI agents as a first-class audience. Ship Markdown twins, `llms.txt`, and MCP servers. The 24% of teams already designing APIs with AI agents in mind will out-ship the 76% who aren't.
- Enforce data governance at the MCP layer. Redact PII before it enters the LLM context, use short-lived token vaults for write-back operations, and monitor for redaction failures. Zero data retention is table stakes - field-level redaction is the next bar enterprise buyers will expect.
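The 429 handling described in the authentication and rate limit bullet above can be sketched against a transport-agnostic `send()` callable, so it wraps any HTTP client. Check these assumptions against each provider. `ratelimit-reset` and `Retry-After` are read as delta-seconds; some providers send a Unix timestamp instead, and `Retry-After` may be an HTTP date, which this sketch ignores in favor of backoff. Also, only idempotent requests, or writes sent with an idempotency key, should be retried. `call_with_retry` and `wait_seconds` are hypothetical names.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def wait_seconds(headers: Mapping[str, str], attempt: int,
                 base: float = 0.5, cap: float = 30.0) -> float:
    """Prefer the server's hint; otherwise back off exponentially with full jitter."""
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("retry-after", "ratelimit-reset"):
        raw = lowered.get(name, "").strip()
        if raw.isdigit():
            # Assumed to be delta-seconds; check whether your provider sends epoch time.
            return float(raw)
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry HTTP 429s only; any other status is returned to the caller as-is."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(wait_seconds(headers, attempt))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Injecting `sleep` keeps the example testable without real waits. Full jitter spreads retries from many clients so they do not all hit the API at the same instant after a shared limit resets.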
If you are evaluating whether to build the unified layer that makes one set of runnable examples work across hundreds of providers, or buy it, that is a tractable conversation. Either path is defensible. Pretending you can hand-maintain integration-specific reference pages forever is not.
FAQ
- What is Time to First Call (TTFC) in API documentation?
- TTFC is the elapsed time from a developer signing up to executing their first successful, authenticated API request that returns a non-error response. Postman's experiments show developers make a successful call 1.7 to 56 times faster when given a runnable example, making TTFC the most direct predictor of API conversion.
- How should runnable API examples handle rate limits?
- Every runnable example that performs reads or writes should ship explicit retry-and-backoff logic for HTTP 429 responses. Examples should demonstrate how to read standardized rate limit headers (like ratelimit-reset) and implement exponential backoff rather than retrying blindly.
- How do you scale runnable examples across dozens of third-party integrations?
- Adopt a unified schema per category (CRM, HRIS, accounting) with a declarative mapping layer per provider. This lets you write one runnable example for a unified endpoint that works against Salesforce, HubSpot, Pipedrive, and 30 others without maintaining integration-specific code snippets.
- What is an API Markdown twin for AI agents?
- A Markdown twin is a raw .md version of an API reference page, stripped of internal YAML frontmatter and served with a canonical source URL, designed specifically to be parsed by LLMs and AI agents without the overhead of HTML scraping.
- Do AI agents actually consume API reference documentation?
- Yes. Postman's 2025 State of the API Report found that 70% of developers are aware of MCP (the Model Context Protocol) and 24.3% are already designing APIs with AI agents in mind. To support this, you should ship an llms.txt index, Markdown twins, and an MCP server.
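The Markdown twin definition can be made concrete with a sketch that strips a YAML frontmatter block and returns response headers for serving the raw file. It assumes frontmatter is delimited by `---` lines at the very top of the file. It conveys the canonical source URL with an HTTP `Link` header using the `canonical` relation (RFC 6596) and the `text/markdown` media type (RFC 7763); putting the URL in the body instead is an equally valid choice. `markdown_twin` is a hypothetical helper.

```python
import re
from typing import Dict, Tuple

# A leading "---" line, the YAML block, and a closing "---" line.
FRONTMATTER_RE = re.compile(r"\A---\s*\n.*?\n---\s*\n", re.DOTALL)


def markdown_twin(source: str, canonical_url: str) -> Tuple[str, Dict[str, str]]:
    """Return the agent-facing body and response headers for one reference page."""
    body = FRONTMATTER_RE.sub("", source, count=1).lstrip("\n")
    headers = {
        "Content-Type": "text/markdown; charset=utf-8",
        # Canonical link relation sent as an HTTP Link header.
        "Link": f'<{canonical_url}>; rel="canonical"',
    }
    return body, headers
```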