PII Redaction for MCP: Stop Leaking SaaS Data to LLMs
Architectural patterns for redacting PII and standardizing ATS data from Greenhouse, Lever, and Workday before it reaches LLMs via MCP - with code examples, field-level decision matrices, and compliance checklists.
If you are wiring AI agents into Salesforce, Workday, or Jira through the Model Context Protocol, every tool call is a potential data breach.
Exposing your B2B SaaS application to AI models used to require building custom, point-to-point API connectors for every single LLM provider. The Model Context Protocol (MCP) changed that architecture entirely. By acting as a universal standard for tool calling, MCP collapses the N x M integration problem into a simple N + M hub-and-spoke model. You build one MCP server, and your product instantly works with Claude, ChatGPT, Cursor, and custom LangChain agents.
But this architectural shift introduces a massive security vulnerability. When an AI agent invokes an MCP tool like list_all_workday_employees, most integrations return the entire JSON payload from the upstream API by default. The MCP server pulls the record from the SaaS API and hands the raw JSON to the LLM, and that payload now lives inside a third-party model provider's context window. Social Security numbers, unhashed passwords, salary data, customer emails, and support ticket bodies with PHI are all exposed.
PII redaction at the MCP boundary is the only architectural pattern that lets you ship AI agents without expanding your SOC 2 scope to include every model provider your customers prefer. This guide breaks down the architectural patterns for masking sensitive data before it reaches the LLM, the trade-offs between static redaction and context-aware tokenization, and why a zero data retention proxy layer is the only defensible way to handle third-party SaaS data.
The AI Agent Security Gap: Why PII Redaction is Mandatory for MCP
When AI agents use MCP to query SaaS APIs, they inadvertently pull raw Personally Identifiable Information (PII) into LLM context windows. Engineering teams often underestimate the blast radius of this data exposure until they fail an InfoSec audit.
The regulatory and security ground has shifted hard in the last twelve months. Fifty percent of all enterprise cybersecurity incident response efforts will focus on incidents involving custom-built AI-driven applications by 2028, according to Gartner. As Christopher Mixter, VP Analyst at Gartner, put it: "AI is evolving quickly, yet many tools - especially custom-built AI applications - are being deployed before they're fully tested. These systems are complex, dynamic and difficult to secure over time. Most security teams still lack clear processes for handling AI-related incidents."
The compliance side is just as unforgiving. Through 2027, manual AI compliance processes will expose 75% of regulated organizations to fines exceeding 5% of their global revenue. For a SaaS company with $200M in annual revenue, 5% is $10M, so that figure is the floor, not the ceiling, for a single bad audit. The core issue is that these systems are highly dynamic: an agent might decide to pull an invoice from QuickBooks today and an employee record from BambooHR tomorrow.
The risk of LLM data ingestion is not theoretical. Researchers at Truffle Security trawled through the December 2024 Common Crawl archive, consisting of 400TB of web data gathered from 2.67 billion web pages, and found 11,908 live secrets using their open source secret scanner, TruffleHog. Similarly, security researchers at Lasso Security recently analyzed an open-source training dataset used for LLM development on Hugging Face and found nearly 12,000 live API keys, passwords, and credentials exposed in the clear.
If credentials leak that easily into LLM training data, expecting your customers' PII to stay inside a model's context window is wishful thinking. When you send customer data to an LLM provider, you lose control over where that data is cached, how it is logged, and whether it will be used to train future foundation models.
Once data hits a third-party LLM, you cannot un-send it. Provider data retention policies, log analysis pipelines, and abuse-monitoring queues all become sub-processors of your customers' regulated data. Redaction has to happen before the tool call returns to the model.
What Counts as Sensitive Data in a SaaS API Response
Most engineering teams underestimate the surface area of sensitive data. A typical GET /contacts response from HubSpot or Salesforce isn't just a name and email. It often contains a sprawling JSON tree of liabilities:
- Direct identifiers: email, phone, full name, SSN, tax ID, employee ID.
- Quasi-identifiers: date of birth, zip code, IP address, device ID (re-identifiable when combined).
- Financial: bank routing, card last-four, salary, compensation history (Workday, BambooHR).
- Health: anything in a Zendesk or Intercom ticket from a healthcare customer (instant PHI).
- Secrets: webhook URLs, API tokens, OAuth refresh tokens stored in custom fields.
- Free-text fields: notes, descriptions, comments. These are the worst offenders, because regex misses unstructured PII.
A helpdesk integration is the canonical worst case. A user types their credit card into a support ticket. Your AI agent calls list_zendesk_tickets, the LLM ingests the body, and now Anthropic or OpenAI has logged a PCI violation in their abuse pipeline. If your architecture allows an LLM to request a resource and directly receive the raw SaaS API response, you are flying blind. You need an interception layer.
Standardizing ATS Responses for LLM Consumption
The PII problem gets significantly harder when you are pulling candidate data from multiple Applicant Tracking Systems. Greenhouse, Lever, Workday, and Ashby each expose their own schemas, field names, and nesting structures. An AI agent that needs to reason across candidates from different ATS platforms cannot do so if every API returns a different shape of JSON - and it definitely cannot do so safely if sensitive fields appear in unpredictable locations.
This is the ATS normalization problem, and it has a direct impact on your redaction architecture.
Why ATS Schema Divergence Breaks Redaction
Consider a simple example: fetching a candidate record. Greenhouse returns first_name and last_name as top-level fields on the candidate object. Lever uses an opportunity-based model where the candidate's name sits inside a nested contact object. Workday exposes candidate data through its SOAP and REST recruiting services with entirely different field paths and naming conventions.
If your redaction layer relies on hard-coded JSON paths like $.candidate.social_security_number, it will silently miss the same data in a Lever response where the path is $.opportunity.contact.ssn, or in a Workday response where it is nested three levels deep inside a custom field group. Path rules written against one provider's schema break the moment you add a second ATS provider.
Mapping ATS Fields to a Common Model
The solution is to normalize ATS responses into a unified schema before redaction runs. A unified ATS data model maps the core recruiting entities - Candidates, Applications, Jobs, Departments, Offices, Interview Stages, Scorecards, Offers, Reject Reasons, EEOC data, Activities, Attachments, and Users - into a single, consistent schema regardless of the underlying provider.
Here is what a normalized candidate record looks like after passing through a unified ATS API:
```json
{
  "id": "cand_8832",
  "first_name": "Priya",
  "last_name": "Sharma",
  "email_addresses": ["priya.sharma@example.com"],
  "phone_numbers": ["+1-555-0192"],
  "applications": [
    {
      "id": "app_2291",
      "job_id": "job_445",
      "status": "active",
      "current_stage": "Technical Interview",
      "applied_at": "2026-03-15T09:00:00Z"
    }
  ],
  "tags": ["senior", "referred"],
  "custom_fields": {
    "visa_status": "H-1B",
    "expected_compensation": 185000
  }
}
```

This shape is identical whether the underlying source is Greenhouse, Lever, Workday, or Ashby. That consistency is what makes your redaction rules portable. You write one JSONata transform that targets $.email_addresses, $.phone_numbers, and $.custom_fields.expected_compensation, and it works across every ATS provider your customers use.
This is exactly how Truto's Unified ATS API works. The normalization layer maps provider-specific field names and nesting structures into a standardized schema covering the full recruiting data model. The same JSONata transforms that redact PII also run on top of this normalized output, so adding a new ATS provider does not require new redaction rules.
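To make the normalize-then-redact idea concrete, here is a minimal TypeScript sketch. The two raw shapes (GreenhouseLikeCandidate and LeverLikeOpportunity) are simplified, hypothetical stand-ins for provider payloads, not the real Greenhouse or Lever schemas, and UnifiedCandidate covers only a few fields of the normalized record above. The point is that a single redaction rule, written against the unified field, covers both providers.

```typescript
// Hypothetical, simplified provider shapes (not the real Greenhouse/Lever schemas).
interface GreenhouseLikeCandidate {
  id: number;
  first_name: string;
  last_name: string;
  email_addresses: { value: string }[];
}

interface LeverLikeOpportunity {
  id: string;
  contact: { name: string; emails: string[] };
}

// Subset of the unified candidate model shown above.
interface UnifiedCandidate {
  id: string;
  first_name: string;
  last_name: string;
  email_addresses: string[];
}

function fromGreenhouseLike(c: GreenhouseLikeCandidate): UnifiedCandidate {
  return {
    id: String(c.id),
    first_name: c.first_name,
    last_name: c.last_name,
    email_addresses: c.email_addresses.map((e) => e.value),
  };
}

function fromLeverLike(o: LeverLikeOpportunity): UnifiedCandidate {
  // Naive split of a single display name; real normalizers need more care.
  const [first = "", ...rest] = o.contact.name.trim().split(/\s+/);
  return {
    id: o.id,
    first_name: first,
    last_name: rest.join(" "),
    email_addresses: [...o.contact.emails],
  };
}

// One redaction rule, written once against the unified schema.
function redactEmails(c: UnifiedCandidate): UnifiedCandidate {
  return { ...c, email_addresses: c.email_addresses.map(() => "[REDACTED]") };
}
```

Adding a third provider in this sketch means writing another from* function. redactEmails does not change.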
Field-Level PII Classification for ATS Data
Once your ATS responses are normalized, you can apply a consistent field-level classification matrix. Not every field needs the same treatment. The right strategy depends on the field's sensitivity tier and the LLM task the agent is performing.
| Unified Field | Sensitivity Tier | Regulation | Default Action | When to Loosen |
|---|---|---|---|---|
| candidate.email_addresses | PII | GDPR Art. 6 | Tokenize (`<EMAIL_N>`) | Never for external LLMs |
| candidate.phone_numbers | PII | GDPR Art. 6 | Tokenize (`<PHONE_N>`) | Never for external LLMs |
| candidate.first_name | PII | GDPR Art. 6 | Pass through or tokenize | Tokenize if cross-referencing with external data |
| candidate.last_name | PII | GDPR Art. 6 | Tokenize (`<LAST_N>`) | Pass through for internal scheduling agents only |
| application.status | Safe | n/a | Pass through | n/a |
| application.current_stage | Safe | n/a | Pass through | n/a |
| job.title, job.department | Safe | n/a | Pass through | n/a |
| scorecard.ratings | Internal | SOC 2 | Pass through | n/a |
| scorecard.comments | Internal + PII risk | SOC 2, GDPR | NER scan + tokenize names | n/a |
| custom_fields.visa_status | Sensitive PII | GDPR | Drop or redact | Include only for compliance reporting agents |
| custom_fields.expected_compensation | Confidential | SOC 2 | Replace with band | Include exact value only for offer-approval agents |
| eeoc.race, eeoc.gender | Special Category | GDPR Art. 9 (racial or ethnic origin), Title VII | Drop by default | Pass through only for anonymized aggregate reporting |
| attachments (resumes) | PII-dense | GDPR, HIPAA | Block from LLM context | Allow only parsed, redacted summary |
| activities.body (notes) | PII risk (free text) | GDPR | NER scan + tokenize | n/a |
| reject_reasons | Safe | n/a | Pass through | n/a |
The key principle: application status, job metadata, and pipeline stage information are almost always safe to pass through. Candidate contact details, compensation data, EEOC demographics, and free-text notes are almost never safe without transformation. Build your default policy around this split.
How PII Redaction Works in an MCP Architecture
To understand where to place your redaction logic, you need to understand the MCP JSON-RPC lifecycle. When an MCP client (like Claude Desktop) connects to an MCP server, every request and response is a JSON-RPC 2.0 message, carried over stdio for local servers or over HTTP POST (the Streamable HTTP transport) for remote ones.
The pattern is simple in concept: insert a redaction step between the upstream SaaS response and the JSON-RPC tools/call reply that the MCP client receives. When the LLM decides to use a tool, it sends a tools/call request. The arguments arrive as a single flat object. The MCP server executes the API request against the third-party SaaS platform, receives the response, and wraps it in an MCP-compliant result object.
PII redaction must happen exactly between the moment the SaaS API returns the data and the moment the MCP server constructs the JSON-RPC result.
Here is the architectural flow for a secure MCP proxy layer:
```mermaid
sequenceDiagram
    participant LLM as AI Agent (MCP Client)
    participant Gateway as MCP Proxy Gateway
    participant Redact as Redaction Layer
    participant SaaS as Upstream SaaS API
    LLM->>Gateway: JSON-RPC tools/call<br>{"name": "get_employee", "arguments": {"id": "123"}}
    Gateway->>SaaS: GET /api/v1/employees/123<br>Authorization: Bearer token
    SaaS-->>Gateway: HTTP 200 OK (Raw JSON with PII)
    Gateway->>Redact: Scan + transform payload
    Redact-->>Gateway: Sanitized payload
    Gateway-->>LLM: JSON-RPC Response (PII-free)
```

If you inspect the raw payload from the upstream SaaS API, it might look like this:
```json
{
  "id": "emp_89324",
  "first_name": "Jane",
  "last_name": "Doe",
  "email": "jane.doe@enterprise.com",
  "social_security_number": "999-99-9999",
  "base_salary": 125000,
  "department": "Engineering"
}
```

Your proxy layer must intercept this payload, apply a masking policy, and return the sanitized version to the LLM inside the standard MCP content array:
```json
{
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "content": [{
      "type": "text",
      "text": "{\"id\": \"emp_89324\", \"first_name\": \"Jane\", \"department\": \"Engineering\", \"email\": \"<EMAIL_1>\", \"social_security_number\": \"[REDACTED]\"}"
    }]
  }
}
```

This interception guarantees that the LLM only ever sees the data it strictly needs to perform its reasoning. A robust redaction layer must maintain three non-negotiable properties:
- Pre-model inspection. Gateways analyze content before it reaches the model, not after. Once data hits an LLM, it's already exposed.
- Deterministic transformation. Redaction must produce the same output for the same input across requests, otherwise pagination cursors and follow-up tool calls get confused.
- Reversibility for write-back. If the agent later calls update_contact, your gateway needs a way to swap masked tokens back to real values, or the agent will overwrite a real email with <EMAIL_1>.
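The interception point can be sketched as a pure function. It takes the upstream JSON, applies a redaction callback, and only then builds the JSON-RPC result in the content-array shape shown above. The names buildToolResult, RedactFn, and redactEmployee are hypothetical. A real MCP server would also report upstream failures (for example through the result's isError flag) and would draw tokens from a session vault rather than hard-coding <EMAIL_1>.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type RedactFn = (payload: Json) => Json;

interface ToolCallResult {
  jsonrpc: "2.0";
  id: number | string;
  result: { content: { type: "text"; text: string }[] };
}

// Redaction runs after the upstream response arrives and before the JSON-RPC
// result is constructed, so the raw payload is never serialized for the model.
function buildToolResult(id: number | string, upstream: Json, redact: RedactFn): ToolCallResult {
  const sanitized = redact(upstream);
  return {
    jsonrpc: "2.0",
    id,
    result: { content: [{ type: "text", text: JSON.stringify(sanitized) }] },
  };
}

// Hypothetical policy for the employee payload above.
const redactEmployee: RedactFn = (payload) => {
  if (payload === null || typeof payload !== "object" || Array.isArray(payload)) return payload;
  const out: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(payload)) {
    if (key === "last_name" || key === "base_salary") continue; // dropped entirely
    if (key === "social_security_number") out[key] = "[REDACTED]";
    else if (key === "email") out[key] = "<EMAIL_1>"; // placeholder; use a session vault in practice
    else out[key] = value;
  }
  return out;
};
```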
The redaction layer typically combines three detection strategies:
| Strategy | Catches | Misses |
|---|---|---|
| Regex / Luhn / IBAN validators | SSNs, credit cards, IBANs, structured IDs | Names, addresses, contextual PII |
| Named-entity recognition (NER) | Person names, locations, organizations in free text | Domain-specific identifiers (employee IDs, custom field PHI) |
| Schema-driven field rules | Known sensitive fields by JSON path | New fields added by the SaaS vendor |
Production systems run all three. Microsoft Presidio combined with spaCy and a YAML rule file is the open-source baseline. OpenAI's Privacy Filter, a small model with frontier-level personal data detection that is designed for high-throughput privacy workflows and performs context-aware PII detection in unstructured text, is a viable drop-in for the NER step.
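The first row of that table is the cheapest layer. Here is a minimal TypeScript sketch of a card-number detector that pairs a regex with a Luhn checksum, so that digit runs which fail the checksum (order numbers, IDs) are left alone. The pattern and the <CARD> placeholder are illustrative assumptions. Production recognizers, such as Presidio's credit card recognizer, handle more formats and context.

```typescript
// Luhn checksum over a string of digits.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Runs of 13-19 digits, optionally separated by single spaces or hyphens.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Mask only candidates that pass the checksum.
function maskCardNumbers(text: string): string {
  return text.replace(CARD_CANDIDATE, (match) => {
    const digits = match.replace(/[ -]/g, "");
    return luhnValid(digits) ? "<CARD>" : match;
  });
}
```

For example, maskCardNumbers("card 4111 1111 1111 1111 ok") returns "card <CARD> ok", while the same number with a final digit of 2 fails the checksum and is returned unchanged.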
Techniques for Masking SaaS Data: Static Redaction vs. Context-Aware Tokenization
Once you have the interception layer in place, you must choose how to alter the data. This is where most implementations fail. Replacing every sensitive field with [REDACTED] is the lazy answer, and it actively kills agent reasoning.
Static Redaction (Regex and Declarative Mapping)
Static redaction involves identifying sensitive fields by key name or regex pattern and replacing their values with a static string like [REDACTED], null, or ***. This approach is fast, deterministic, and easy to audit.
However, naive masking breaks LLM context. As noted by AI security firm Protecto AI, removing data entirely degrades AI accuracy. Consider this Salesforce contact list:
```json
[
  { "id": "003", "email": "jane@acme.com", "company": "Acme" },
  { "id": "004", "email": "jane@acme.com", "company": "Acme" },
  { "id": "005", "email": "bob@globex.com", "company": "Globex" }
]
```

If you use static redaction, every email turns into [REDACTED]. The agent can no longer answer "which contacts are duplicates?": contacts 003 and 004 share an address and 005 does not, but after redaction all three hold the identical string. Worse, when the user says "send a follow-up to Jane," the agent has nothing to reference. The same failure hits support workflows: if every submitter's email address is replaced with one static string, the model loses the ability to group tickets by user.
Context-Aware Tokenization
To solve the context degradation problem, enterprise architectures use context-aware tokenization or synthetic data replacement. Enterprise gateways replace actual values with semantically meaningful stand-ins: type-labeled placeholder tokens such as <EMAIL_1>, or format-preserving synthetic values.
Instead of wiping out the data, the tokenization engine replaces it with a synthetic but consistent value:
```json
[
  { "id": "003", "email": "<EMAIL_1>", "company": "<COMPANY_1>" },
  { "id": "004", "email": "<EMAIL_1>", "company": "<COMPANY_1>" },
  { "id": "005", "email": "<EMAIL_2>", "company": "<COMPANY_2>" }
]
```

If the LLM sees <EMAIL_1> across five different Jira tickets, it can correctly reason that the same user submitted all five tickets. It can generate a summary report based on that correlation.
When the LLM outputs its final response or attempts to execute a write operation (like update_jira_ticket), the proxy layer intercepts the outbound request, detokenizes <EMAIL_1> back to jane@acme.com, and forwards the request to the SaaS API. This is the same pattern LiteLLM uses with Presidio: for 'replace' operations, the gateway can check the LLM response and replace the masked token with the user-submitted values.
To implement this safely, you must keep a short-lived, in-memory token vault keyed to the conversation or session. Never persist these mappings to disk. If you write the token map to a database, you have just rebuilt the exact data retention problem you were trying to solve.
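Here is a minimal sketch of both halves of that pattern. The first is a registry that holds one in-memory token map per session and discards it after a TTL or when the conversation ends. The second is a helper that walks outbound tool arguments and swaps known tokens back to real values before the write is forwarded. SessionVaults, detokenizeArgs, the fixed TTL from session creation, and the <TYPE_N> token format are assumptions for illustration. Nothing here is written to disk.

```typescript
type Args = string | number | boolean | null | Args[] | { [key: string]: Args };

interface SessionEntry {
  tokens: Map<string, string>; // token -> real value, held in memory only
  expiresAt: number;
}

class SessionVaults {
  private sessions = new Map<string, SessionEntry>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the token map for a live session, starting a fresh one if it expired.
  vaultFor(sessionId: string, now: number = Date.now()): Map<string, string> {
    let entry = this.sessions.get(sessionId);
    if (entry === undefined || entry.expiresAt <= now) {
      entry = { tokens: new Map<string, string>(), expiresAt: now + this.ttlMs };
      this.sessions.set(sessionId, entry);
    }
    return entry.tokens;
  }

  // Call periodically; expired mappings are simply discarded.
  sweep(now: number = Date.now()): void {
    this.sessions.forEach((entry, id) => {
      if (entry.expiresAt <= now) this.sessions.delete(id);
    });
  }

  // Call when the conversation ends.
  end(sessionId: string): void {
    this.sessions.delete(sessionId);
  }

  get size(): number {
    return this.sessions.size;
  }
}

const TOKEN_PATTERN = /<[A-Z]+_\d+>/g;

// Swap tokens in outbound tool arguments back to real values before forwarding.
// Unknown tokens throw, so a placeholder is never written into a SaaS record.
function detokenizeArgs(value: Args, tokens: Map<string, string>): Args {
  if (typeof value === "string") {
    return value.replace(TOKEN_PATTERN, (token) => {
      const real = tokens.get(token);
      if (real === undefined) throw new Error(`Unknown token ${token}`);
      return real;
    });
  }
  if (Array.isArray(value)) return value.map((v) => detokenizeArgs(v, tokens));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Args } = {};
    for (const [key, v] of Object.entries(value)) out[key] = detokenizeArgs(v, tokens);
    return out;
  }
  return value;
}
```

Throwing on an unknown token is a deliberate choice in this sketch. Forwarding it unchanged would reproduce the "overwrite a real email with <EMAIL_1>" failure.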
Redaction vs. Hashing vs. Tokenization: Choosing Per Field
The previous section explains why static redaction kills LLM reasoning and why tokenization preserves it. But in practice, you need to pick from three distinct strategies - redaction, deterministic hashing, and reversible tokenization - and the right choice depends on whether the agent needs to correlate records, write data back, or simply not see the value at all.
Strategy 1: Hard Redaction (Drop or Replace)
Use hard redaction for fields the LLM should never reason about. Government IDs, raw compensation figures, and EEOC demographic data fall into this bucket.
```typescript
// Hypothetical input record for this example.
declare const candidate: Record<string, any>;

function redactFields(record: Record<string, any>, dropKeys: string[]): Record<string, any> {
  const output = { ...record };
  for (const key of dropKeys) {
    if (key in output) {
      output[key] = "[REDACTED]";
    }
  }
  return output;
}

// Usage: SSN and exact salary are never useful to the LLM
const sanitized = redactFields(candidate, [
  "social_security_number",
  "tax_id",
  "base_salary",
  "eeoc_race",
  "eeoc_gender",
]);
```

When to use it: the field is high-sensitivity, the LLM does not need it for the current task, and you have no write-back requirement.
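redactFields only inspects top-level keys, so it would miss an SSN nested inside a Workday-style custom field group. A recursive variant, sketched below under the same match-by-key-name assumption, walks objects and arrays and redacts matching keys at any depth without mutating the input.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

function redactDeep(value: JsonValue, dropKeys: ReadonlySet<string>): JsonValue {
  if (Array.isArray(value)) return value.map((v) => redactDeep(v, dropKeys));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: JsonValue } = {};
    for (const [key, child] of Object.entries(value)) {
      // Match on key name at any depth; a matching key's whole value is replaced.
      out[key] = dropKeys.has(key) ? "[REDACTED]" : redactDeep(child, dropKeys);
    }
    return out;
  }
  return value;
}
```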
Strategy 2: Deterministic Hashing (HMAC-SHA256)
Deterministic hashing is the right choice when the LLM needs to correlate records without seeing real values and you do not need to reverse the process. Keyed hashing with HMAC-SHA256 is a sensible default: it is deterministic (the same input produces the same hash) and supports GDPR Article 17 right-to-erasure lookups without storing PII. Use a per-tenant secret so hashes are meaningful within a tenant but useless outside it. A plain, unkeyed SHA-256 of an email address can be reversed by hashing likely addresses and comparing the results.
```typescript
import { createHmac } from "crypto";

// Hypothetical inputs: a candidate record and a per-tenant secret from your secret store.
declare const candidate: Record<string, any>;
declare const TENANT_SECRET: string;

function hmacHash(value: string, tenantSecret: string): string {
  return createHmac("sha256", tenantSecret)
    .update(value.toLowerCase().trim()) // normalize so trivial variants hash the same
    .digest("hex")
    .slice(0, 16); // truncate for readability in LLM context
}

function hashFields(
  record: Record<string, any>,
  fieldsToHash: string[],
  tenantSecret: string
): Record<string, any> {
  const output = { ...record };
  for (const key of fieldsToHash) {
    if (output[key]) {
      output[key] = `<HASH_${hmacHash(String(output[key]), tenantSecret)}>`;
    }
  }
  return output;
}

// Same email always produces the same hash within a tenant
const hashed = hashFields(candidate, ["email", "phone"], TENANT_SECRET);
// Illustrative shape (these hash values are placeholders):
// { email: "<HASH_a3f8b2c1e9d04712>", phone: "<HASH_7e2f9a1b3c5d6801>" }
```

When to use it: the LLM needs to detect duplicates, group by submitter, or count unique entities, but never needs to display or write back the original value.
Strategy 3: Reversible Tokenization (Session-Scoped Vault)
Reversible tokenization is required when the agent reads a value, reasons about it, and then writes it back. The session-scoped vault maps tokens to real values in memory and detokenizes on the write path.
```typescript
class SessionTokenVault {
  private forward = new Map<string, string>(); // real -> token
  private reverse = new Map<string, string>(); // token -> real
  private counter = new Map<string, number>(); // per-type counter

  tokenize(value: string, type: string): string {
    if (this.forward.has(value)) return this.forward.get(value)!;
    const count = (this.counter.get(type) ?? 0) + 1;
    this.counter.set(type, count);
    const token = `<${type}_${count}>`;
    this.forward.set(value, token);
    this.reverse.set(token, value);
    return token;
  }

  detokenize(token: string): string | undefined {
    return this.reverse.get(token);
  }

  // Call this when the session ends
  destroy(): void {
    this.forward.clear();
    this.reverse.clear();
    this.counter.clear();
  }
}

// Hypothetical input record for this example.
declare const candidate: { email: string };

// Usage in a redaction pipeline
const vault = new SessionTokenVault();
candidate.email = vault.tokenize(candidate.email, "EMAIL");
// "priya.sharma@example.com" -> "<EMAIL_1>"

// On write-back, detokenize before forwarding to the SaaS API
const realEmail = vault.detokenize("<EMAIL_1>");
// "priya.sharma@example.com"
```

When to use it: the agent must read a contact list and then send an email, update a record, or move a candidate to the next stage by referencing a specific person.
Never mix strategies for the same field in the same session. If email is hashed in a read response but needs to be detokenized for a write, the agent will send a hash to the SaaS API and corrupt the record. Pick one strategy per field per policy and stick with it.
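One way to enforce one strategy per field is to declare it in a single per-field map and check it on the write path. A hashed or redacted placeholder that arrives in outbound arguments is rejected instead of being forwarded. FIELD_STRATEGY and assertWritable are hypothetical names, and the placeholder checks assume the <HASH_...> and [REDACTED] formats used in this article.

```typescript
type Strategy = "redact" | "hash" | "tokenize";

// Hypothetical per-field policy: exactly one strategy per field.
const FIELD_STRATEGY: ReadonlyMap<string, Strategy> = new Map<string, Strategy>([
  ["social_security_number", "redact"],
  ["phone", "hash"],
  ["email", "tokenize"],
]);

const HASH_PLACEHOLDER = /^<HASH_[0-9a-f]+>$/;

// Run on outbound tool arguments before detokenizing and forwarding.
function assertWritable(args: Record<string, unknown>): void {
  for (const [field, value] of Object.entries(args)) {
    const strategy = FIELD_STRATEGY.get(field);
    if (strategy === "hash" || strategy === "redact") {
      throw new Error(`Field "${field}" uses ${strategy}; it cannot be written back`);
    }
    if (typeof value === "string" && (HASH_PLACEHOLDER.test(value) || value === "[REDACTED]")) {
      throw new Error(`Field "${field}" carries a non-reversible placeholder`);
    }
  }
}
```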
Implementing a Centralized PII Gateway for AI Agents
Security policies fail when they are decentralized. The biggest mistake teams make is putting redaction logic inside the AI agent's prompt or a client-side library. A prompt instruction cannot redact anything: by the time the model reads "ignore SSNs," the SSN is already in its context window, and if an engineer edits the system prompt and accidentally removes that instruction, even that weak safeguard disappears. Client-side libraries fare little better. Every new agent re-implements the rules, drifts, and eventually leaks, a problem that compounds quickly in multi-agent systems. Without a centralized control layer, each application implements its own filtering logic, which leads to inconsistent security and compliance gaps.
PII redaction must happen at a centralized gateway or proxy layer. The MCP server already holds the privileged credentials, possesses the schema knowledge, and sits at the perfect network position. It is the natural enforcement point. A centralized gateway provides several architectural advantages:
1. Schema-Driven Field Masking with JSONata
For structured fields, declarative transforms beat imperative code. A response transformation expressed as JSONata is auditable, reviewable in a pull request, and trivially diffable for compliance teams. Instead of writing custom Python or Node.js code for every single integration, you define a JSONata expression that maps the raw payload to a sanitized schema.
Here is an example JSONata expression that drops, masks, and hashes sensitive HR fields in a raw Workday API response. $hash is not a JSONata built-in; the expression assumes a custom hashing function (for example, a keyed HMAC) is bound when the expression is evaluated:

```jsonata
(
  /* $hash: custom function bound at evaluation time, not a JSONata built-in */
  $maskLastName := function($str) { $substring($str, 0, 1) & "***" };
  $map($$, function($v) {
    {
      "id": $v.id,
      "first_name": $v.first_name,
      "last_name": $maskLastName($v.last_name),
      "email": "<EMAIL_" & $hash($v.email) & ">",
      "social_security_number": null,
      "compensation": $exists($v.base_salary) ? "<REDACTED>" : null,
      "manager_id": $v.manager_id,
      "department": $v.department
    }
  })
)
```

This pattern (declarative response mapping with a transformation language) is exactly how Truto already handles unified API normalization. The same hooks that map provider-specific name fields into the unified first_name field can also drop, hash, or tokenize the value. See our JSONata mapping guide for the broader pattern.
2. Free-Text Scanning for Tickets and Notes
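Structured field rules don't help when the SaaS payload contains a raw Zendesk ticket body or a Salesforce note. Run free-text fields through a detector before serializing. The sketch below calls Presidio's analyzer and anonymizer through their REST endpoints (Presidio itself is a Python project); the service URLs are placeholders, and lodash's get/set resolve dotted field paths:

```typescript
import { get, set } from "lodash";

// Assumption: presidio-analyzer and presidio-anonymizer run as HTTP services
// (for example, the official Docker images). These base URLs are placeholders.
const ANALYZER_URL = "http://localhost:5002";
const ANONYMIZER_URL = "http://localhost:5001";

async function postJson(url: string, body: unknown): Promise<any> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${url} returned ${res.status}`);
  return res.json();
}

async function sanitizeFreeText(payload: any, fields: string[], scoreThreshold = 0.5) {
  for (const path of fields) {
    const text = get(payload, path);
    if (typeof text !== "string" || text.length === 0) continue;
    // Detect PII spans in the free-text field.
    const findings = await postJson(`${ANALYZER_URL}/analyze`, {
      text,
      language: "en",
      entities: ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN", "CREDIT_CARD", "IBAN_CODE"],
      score_threshold: scoreThreshold,
    });
    // Replace the detected spans using the anonymizer's default operator.
    const anonymized = await postJson(`${ANONYMIZER_URL}/anonymize`, { text, analyzer_results: findings });
    set(payload, path, anonymized.text);
  }
  return payload;
}
```

Run the detector with a confidence threshold appropriate to the field. Ticket bodies in healthcare-adjacent products should err toward over-redaction; product feedback fields can be more permissive.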
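Per-field thresholds can live next to the field list. The sketch below filters detector findings by a per-path score threshold and replaces the surviving spans with <ENTITY_TYPE> placeholders. It works right to left so that earlier offsets stay valid. The Finding shape mirrors the entity_type, start, end, and score fields of Presidio's analyzer results. The paths, threshold values, and simple overlap handling are illustrative assumptions, not recommendations.

```typescript
interface Finding {
  entity_type: string;
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  score: number;
}

// Hypothetical thresholds: a lower threshold means more aggressive redaction.
const THRESHOLDS: ReadonlyMap<string, number> = new Map([
  ["ticket.body", 0.3], // healthcare-adjacent: err toward over-redaction
  ["feedback.comment", 0.7], // product feedback: more permissive
]);
const DEFAULT_THRESHOLD = 0.5;

function applyFindings(text: string, findings: Finding[], path: string): string {
  const threshold = THRESHOLDS.get(path) ?? DEFAULT_THRESHOLD;
  const kept = findings
    .filter((f) => f.score >= threshold)
    .sort((a, b) => b.start - a.start); // right to left
  let out = text;
  let lastStart = Infinity;
  for (const f of kept) {
    if (f.end > lastStart) continue; // skip spans overlapping one already replaced
    out = out.slice(0, f.start) + `<${f.entity_type}>` + out.slice(f.end);
    lastStart = f.start;
  }
  return out;
}
```

With the findings PERSON (score 0.85) and PHONE_NUMBER (score 0.4) on "Call Jane at 555-0192", the ticket.body path masks both spans, while feedback.comment masks only the name.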
3. Tool-Surface Minimization and Namespace Resolution
The other half of the gateway story is which tools you expose at all. Instead of exposing every endpoint a SaaS API offers, a gateway can dynamically generate MCP tools based strictly on approved documentation records. If your agent only needs ticket subjects and statuses, do not ship an attachments.download tool. Documentation-driven tool generation forces a deliberate review for every endpoint that touches an LLM. We covered this pattern deeply in Auto-Generated MCP Tools.
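At its simplest, tool-surface minimization is an allowlist that is applied both when the server answers tools/list and when it receives tools/call. The sketch assumes a simplified ToolDefinition shape (name, description, and a JSON Schema inputSchema) and a hypothetical APPROVED set maintained through review. Tools that are not approved are never advertised and cannot be invoked.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown> };
}

// Filter the tool list before it is returned to the MCP client.
function advertisedTools(all: ToolDefinition[], approved: ReadonlySet<string>): ToolDefinition[] {
  // Unreviewed endpoints (e.g. attachment downloads) are never exposed to the model.
  return all.filter((tool) => approved.has(tool.name));
}

// Defense in depth: reject calls to tools that were never advertised.
function assertCallable(name: string, approved: ReadonlySet<string>): void {
  if (!approved.has(name)) throw new Error(`Tool "${name}" is not exposed`);
}
```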
Furthermore, when an MCP client calls a tool, all arguments arrive in a single flat JSON object. A centralized gateway uses predefined JSON Schemas to split these arguments into query parameters and body parameters before forwarding them to the proxy API handlers. Because only schema-declared arguments are forwarded, a prompt-injected tool call cannot smuggle values into HTTP headers or query parameters that the schema does not define.
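Here is a minimal sketch of that split. It assumes each tool carries two hypothetical schemas, one for query parameters and one for body fields, and it treats any argument outside both as an error rather than forwarding it. Query values are assumed to be scalars, so they are stringified.

```typescript
interface ObjectSchema {
  properties: Record<string, unknown>;
  required?: string[];
}

interface SplitArgs {
  query: Record<string, string>;
  body: Record<string, unknown>;
}

const hasOwn = (obj: object, key: string) => Object.prototype.hasOwnProperty.call(obj, key);

function splitArguments(
  args: Record<string, unknown>,
  querySchema: ObjectSchema,
  bodySchema: ObjectSchema,
): SplitArgs {
  const query: Record<string, string> = {};
  const body: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    if (hasOwn(querySchema.properties, key)) {
      query[key] = String(value); // assumes scalar query values
    } else if (hasOwn(bodySchema.properties, key)) {
      body[key] = value;
    } else {
      // Undeclared keys are rejected, never mapped to headers or extra query params.
      throw new Error(`Unexpected argument "${key}"`);
    }
  }
  for (const key of [...(querySchema.required ?? []), ...(bodySchema.required ?? [])]) {
    if (!hasOwn(args, key)) throw new Error(`Missing required argument "${key}"`);
  }
  return { query, body };
}
```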
4. Strict Rate Limit Handling
AI agents are notoriously aggressive when scraping data. A centralized gateway protects your infrastructure by enforcing rate limits, but it is a critical architectural requirement that the gateway does not absorb or silently retry upstream rate limit errors. Truto, for example, explicitly passes upstream HTTP 429 errors directly to the caller and normalizes the upstream rate limit information into standardized headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset) following the IETF RateLimit header fields draft. The AI agent's orchestration layer remains in full control of retry logic and exponential backoff, which keeps system behavior predictable.
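Here is a sketch of that pass-through behavior. When the upstream returns 429, the gateway hands the same status and body back to the caller with normalized ratelimit-* headers and does not retry. The upstream header names (x-ratelimit-*, retry-after) vary by vendor and are assumptions. So are the choices to read x-ratelimit-reset as a Unix epoch in seconds, to treat retry-after as numeric seconds, and to express ratelimit-reset as seconds until reset.

```typescript
interface GatewayResponse {
  status: number;
  headers: Record<string, string>; // assumed lower-cased
  body: string;
}

function normalizeRateLimit(upstream: GatewayResponse, nowSeconds: number): GatewayResponse {
  const h = upstream.headers;
  const headers: Record<string, string> = {};
  if (h["x-ratelimit-limit"] !== undefined) headers["ratelimit-limit"] = h["x-ratelimit-limit"];
  if (h["x-ratelimit-remaining"] !== undefined) headers["ratelimit-remaining"] = h["x-ratelimit-remaining"];
  // Assumed: x-ratelimit-reset is an epoch timestamp; retry-after is delta seconds.
  const resetEpoch = Number(h["x-ratelimit-reset"]);
  if (Number.isFinite(resetEpoch)) {
    headers["ratelimit-reset"] = String(Math.max(0, Math.ceil(resetEpoch - nowSeconds)));
  } else if (h["retry-after"] !== undefined) {
    headers["ratelimit-reset"] = h["retry-after"];
  }
  // Status and body pass through unchanged: a 429 stays a 429, and nothing retries here.
  return { status: upstream.status, headers, body: upstream.body };
}
```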
Policy Recipes: What to Show the LLM for Common ATS Tasks
The classification matrix tells you the sensitivity tier of each field. But the action you take also depends on the task. An AI agent screening candidates for a hiring manager needs different data than an agent generating an anonymized diversity report. Here are field-level policy recipes for the most common ATS agent workflows.
Candidate Screening and Shortlisting
The agent reads candidate profiles and applications, compares them against a job description, and produces a ranked shortlist.
- Pass through: first_name, application.status, application.current_stage, job.title, job.department, scorecard.ratings, tags
- Tokenize (reversible): email_addresses, phone_numbers, last_name. The agent may need to reference specific candidates in follow-up actions.
- Replace with band: custom_fields.expected_compensation. Replace exact figures with a band (e.g., "$170K-$190K") so the agent can assess fit without seeing the precise number.
- Drop: eeoc.*, social_security_number, tax_id, attachments (raw resume files)
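The "replace with band" step can be as simple as snapping the figure to a fixed-width bucket. This sketch assumes aligned bands (default width $20K) and USD formatting. Band width is a policy decision, and aligned buckets will not always match hand-picked ranges like the example above.

```typescript
// Snap an amount to an aligned band, e.g. 185000 -> "$180K-$200K" with width 20000.
function toBand(amount: number, width: number = 20000): string {
  if (!Number.isFinite(amount) || amount < 0 || width <= 0) return "[REDACTED]";
  const lower = Math.floor(amount / width) * width;
  const upper = lower + width;
  const k = (n: number) => `$${Math.round(n / 1000)}K`;
  return `${k(lower)}-${k(upper)}`;
}
```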
Interview Scheduling
The agent coordinates calendar availability and moves candidates to the next interview stage.
- Pass through: first_name, last_name, application.current_stage, interview.scheduled_at, interview.interviewers
- Tokenize (reversible): email_addresses. Required for sending calendar invites on write-back.
- Drop: everything else. The agent does not need compensation, EEOC data, scorecards, or notes to schedule an interview.
Diversity and Compliance Reporting
The agent aggregates EEOC data across applications and departments to build anonymized dashboards.
- Pass through: eeoc.race, eeoc.gender, eeoc.veteran_status, eeoc.disability_status, application.status, reject_reasons, job.department
- Hash (deterministic, irreversible): candidate.id. Allows counting unique candidates without re-identification.
- Drop: first_name, last_name, email_addresses, phone_numbers, compensation, scorecard.comments, activities.body
The key here: EEOC fields are only safe for the LLM when every other identifying field has been dropped or irreversibly hashed. Pass EEOC data alongside a name and you have just created a re-identification vector.
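That rule can be enforced mechanically before a reporting payload leaves the gateway. The sketch below rejects any record that carries an eeoc field alongside a direct identifier, and it allows the candidate id only in <HASH_...> form. The identifier list and the hash placeholder format are illustrative assumptions and should mirror your own drop list.

```typescript
const HASHED_ID = /^<HASH_[0-9a-f]+>$/;
const DIRECT_IDENTIFIERS: readonly string[] = ["first_name", "last_name", "email_addresses", "phone_numbers"];

function identifyingFields(record: Record<string, unknown>): string[] {
  const found = DIRECT_IDENTIFIERS.filter((key) => record[key] !== undefined);
  const id = record["id"];
  // The candidate id may appear only in irreversibly hashed form.
  if (id !== undefined && !(typeof id === "string" && HASHED_ID.test(id))) found.push("id");
  return found;
}

function assertSafeForEeocReporting(record: Record<string, unknown>): void {
  if (record["eeoc"] === undefined || record["eeoc"] === null) return;
  const leaked = identifyingFields(record);
  if (leaked.length > 0) {
    throw new Error(`EEOC data alongside identifiers: ${leaked.join(", ")}`);
  }
}
```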
Offer Approval Workflow
The agent prepares an offer package for a hiring manager's approval.
- Pass through: first_name, last_name, job.title, job.department, offer.start_date, offer.status
- Pass through (exact): offer.salary, offer.equity, offer.bonus. This is the one workflow where exact compensation must reach the agent.
- Tokenize: email_addresses, phone_numbers
- Drop: eeoc.*, social_security_number, scorecard.comments
The salary-band-vs-exact-figure decision comes down to the task. Screening agents get bands. Offer-approval agents get exact numbers. Reporting agents get neither. Encode this as a named policy in your gateway config, not as a per-agent prompt instruction.
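A named-policy config might look like the sketch below. Each task maps unified field paths to an action, and the gateway selects the policy from the tool's configuration, not from anything the model says. The policy names, paths, and action set are assumptions that mirror the recipes above, trimmed for length. Any field a policy does not list falls through to drop.

```typescript
type Action = "pass" | "tokenize" | "hash" | "band" | "drop";
type PolicyName = "screening" | "scheduling" | "diversity_reporting" | "offer_approval";

const POLICIES: Record<PolicyName, ReadonlyMap<string, Action>> = {
  screening: new Map<string, Action>([
    ["first_name", "pass"], ["application.status", "pass"], ["job.title", "pass"],
    ["email_addresses", "tokenize"], ["last_name", "tokenize"],
    ["custom_fields.expected_compensation", "band"],
  ]),
  scheduling: new Map<string, Action>([
    ["first_name", "pass"], ["last_name", "pass"], ["application.current_stage", "pass"],
    ["email_addresses", "tokenize"],
  ]),
  diversity_reporting: new Map<string, Action>([
    ["eeoc.race", "pass"], ["eeoc.gender", "pass"], ["application.status", "pass"],
    ["candidate.id", "hash"],
  ]),
  offer_approval: new Map<string, Action>([
    ["first_name", "pass"], ["last_name", "pass"], ["job.title", "pass"],
    ["offer.salary", "pass"], ["email_addresses", "tokenize"],
  ]),
};

// Default deny: unlisted or newly added vendor fields never reach the LLM.
function resolveAction(policy: PolicyName, path: string): Action {
  return POLICIES[policy].get(path) ?? "drop";
}
```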
End-to-End: How Normalization and Redaction Fit the Proxy Flow
Putting it all together, here is the complete pipeline when an AI agent calls list_ats_candidates through an MCP gateway backed by a unified API:
```mermaid
sequenceDiagram
    participant Agent as AI Agent (MCP Client)
    participant GW as MCP Gateway
    participant Unified as Unified API Layer
    participant ATS as Upstream ATS<br>(Greenhouse / Lever / Workday)
    Agent->>GW: tools/call "list_ats_candidates"<br>{"job_id": "job_445", "status": "active"}
    GW->>Unified: GET /unified/ats/candidates?job_id=job_445
    Unified->>ATS: Provider-specific API call<br>(schema, auth, pagination handled)
    ATS-->>Unified: Raw provider response
    Unified-->>GW: Normalized candidate records
    Note over GW: 1. Apply task-specific policy<br>2. Run schema-driven field rules<br>3. Run NER on free-text fields<br>4. Tokenize / hash / redact per matrix
    GW-->>Agent: JSON-RPC response<br>(PII-free, LLM-friendly)
```

The normalization step (Unified API Layer) solves the schema divergence problem: every ATS response arrives in the same shape. The redaction step (MCP Gateway) applies the task-specific policy from the decision matrix. Because both layers run in a pass-through architecture, no candidate data is persisted at any point in the pipeline.
This two-stage design (normalize first, then redact) means that when your customer connects a new ATS provider, your redaction rules work immediately: no new field mappings on your side, no new regex patterns, no new code.
Zero Data Retention: The Ultimate Defense for Third-Party SaaS Data
Redacting PII in transit is only half the battle. Redaction prevents PII from reaching the LLM. It does not prevent PII from being stored by the MCP infrastructure itself.
If the infrastructure routing your MCP calls stores a copy of the unredacted third-party data on its own disks, you have created a massive compliance liability. When your AI agent connects to Salesforce or BambooHR through a managed MCP server platform, the data flowing through that server is governed by that platform's data retention policy. If the platform caches API responses in a database to speed up subsequent queries, it becomes a sub-processor holding copies of your customers' highly regulated enterprise data.
Your SOC 2 scope immediately expands. Your GDPR obligations multiply. Enterprise InfoSec teams will flag your application during procurement when they see a third-party caching their HR records.
As detailed in our breakdown of MCP Server Data Retention Policies, the major LLM providers explicitly wash their hands of what happens during a tool call. OpenAI's documentation states that data sent to remote MCP servers is subject entirely to the third-party's retention policies.
The only defensible architecture for handling sensitive SaaS data is a pass-through proxy with strictly zero data retention. In a zero data retention architecture, the platform acts entirely as a conduit. It receives the request from the LLM, resolves the OAuth tokens from a secure key-value store, proxies the request to the upstream SaaS API, applies the JSONata redaction transformations in memory, and returns the response to the LLM. The underlying SaaS data is never written to a database table, never stored in a durable state mechanism, and never logged in plain text.
Ask any MCP platform vendor exactly three questions: (1) Do you store the response body of tool calls? (2) For how long, and where? (3) Is that storage in scope of your SOC 2 report? If they hesitate, walk away.
For a deeper dive into this compliance posture, review our guide on Building SOC 2 & GDPR Compliant AI Agents.
GDPR and HIPAA Compliance Checklist for MCP Data Flows
Redaction is the technical control, but technical controls alone do not satisfy regulators. PII redaction is the data governance layer that makes compliance possible; you still need consent logic, data retention policies, privacy notices, breach procedures, and, where GDPR Article 37 requires one, a Data Protection Officer. Here is a practical checklist for teams shipping AI agents that touch ATS or HR data through MCP.
GDPR Requirements
- Lawful basis (Art. 6): Document the legal basis for processing candidate data through your AI agent. Legitimate interest is common for B2B SaaS, but the agent's specific purpose must be documented.
- Data minimization (Art. 5): Minimization and purpose limitation, together with privacy by design (Art. 25), must be enforced at the operation level, not just declared in policy or procurement contracts. Your MCP tools should request only the fields the agent needs - not the entire candidate record.
- Special category data (Art. 9): EEOC demographics, disability status, and health information require explicit consent or a specific legal exemption. Default policy: drop these fields entirely unless the agent is performing anonymized aggregate reporting.
- Right to erasure (Art. 17): If you use deterministic keyed hashing, you can recompute the hash for a data subject to locate and confirm deletion of derived records without retaining the raw values. If you use reversible tokenization, the session vault must be destroyed when the conversation ends.
- Records of processing (Art. 30): For AI-driven processing, this means documented, attributable evidence of every agent interaction with personal data. Log which tool was called, which policy was applied, and which fields were redacted - but never log the unredacted values.
- Data Processing Agreement: If you use a cloud provider or an LLM as a service, you must have a signed Data Processing Agreement (Art. 28) specifying each party's responsibilities. This applies to your MCP server platform, your LLM provider, and any NER service in the pipeline.
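The minimization and Article 30 items can share one code path: project each response onto a per-tool allowlist, and emit a processing record that names what was removed without containing it. This is a sketch under assumptions; TOOL_ALLOWLISTS, minimize, and ProcessingRecord are hypothetical names, and a production record would also carry the agent and customer identity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class ProcessingRecord:
    """One processing entry: tool, policy, and field *names* only, never values."""
    tool: str
    policy: str
    fields_returned: list[str]
    fields_dropped: list[str]
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


# Hypothetical per-tool allowlists: each tool exposes only what its purpose needs.
TOOL_ALLOWLISTS: dict[str, set[str]] = {
    "get_candidate_status": {"id", "current_stage", "job_id"},
}


def minimize(tool: str, record: dict[str, Any]) -> tuple[dict[str, Any], ProcessingRecord]:
    # Fail closed: a tool with no declared allowlist returns nothing, not everything.
    allowed = TOOL_ALLOWLISTS.get(tool, set())
    kept = {k: v for k, v in record.items() if k in allowed}
    audit = ProcessingRecord(
        tool=tool,
        policy=f"allowlist:{tool}",
        fields_returned=sorted(kept),
        fields_dropped=sorted(set(record) - allowed),
    )
    return kept, audit
```

Failing closed on undeclared tools is the design choice that matters: a new tool added without a reviewed allowlist cannot silently return a full candidate record.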
HIPAA Requirements (Healthcare Customers)
- Business Associate Agreement: Every AI vendor processing PHI on your organization's behalf must execute a HIPAA-compliant BAA before deployment. Vendors that will not execute a BAA cannot be used in PHI-touching workflows.
- Minimum necessary standard: The HIPAA Minimum Necessary Rule requires that access be limited to what is needed for the specific purpose. Your MCP tool surface must be restricted so agents cannot pull entire patient or employee records when they only need a status field.
- Audit controls: Implement operation-level audit logging for AI-PHI interactions. HIPAA's audit controls standard (45 CFR 164.312(b)) requires mechanisms that record and examine activity in systems containing ePHI. For AI agents, tamper-evident audit logs capturing agent identity, which PHI records were accessed (by reference, not value), the operation performed, and the human authorizer go a long way toward satisfying that standard.
- Encryption in transit: All MCP transport between client, gateway, and upstream SaaS API must use TLS. No exceptions.
- Risk assessment: Conduct HIPAA risk assessments to identify risks to the integrity, confidentiality, and availability of PHI when used in AI technology. Assessments should be conducted regularly, especially when there are changes to existing processes or technology.
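HIPAA does not prescribe how to make audit logs tamper-evident; hash chaining is one common technique. In this sketch (the class name and fields are hypothetical), each entry commits to the previous entry's hash, so editing or deleting an earlier entry breaks verification for everything after it. In practice you would also anchor the latest hash somewhere the log writer cannot modify, such as write-once storage.

```python
import hashlib
import json
from typing import Any


class HashChainedAuditLog:
    """Append-only log where each entry includes the hash of the one before it.

    Entries reference PHI by record ID, never by value.
    """

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    @staticmethod
    def _digest(body: dict[str, Any]) -> str:
        # sort_keys makes the serialization, and therefore the hash, deterministic
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, agent_id: str, record_ref: str, operation: str, authorizer: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "agent_id": agent_id,
            "record_ref": record_ref,  # e.g. "employee/42", not the PHI itself
            "operation": operation,
            "authorizer": authorizer,
            "prev": prev,
        }
        digest = self._digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True
```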
SOC 2 Considerations
- Sub-processor documentation: Your SOC 2 report must enumerate every system that touches customer data. If your MCP server platform stores response bodies, it is a sub-processor.
- Data retention evidence: Auditors will ask for proof that API response payloads are not persisted. A zero-data-retention architecture with pass-through proxying is the cleanest answer.
- Access controls: MCP tools should enforce per-integration, per-customer credential isolation. One customer's OAuth tokens must never be accessible to another customer's agent sessions.
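One way to make per-customer isolation structural rather than a runtime check is to key credentials by (tenant, integration) and take the tenant only from the authenticated session, never from tool-call arguments the model controls. A minimal sketch with hypothetical class names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSession:
    session_id: str
    tenant_id: str  # bound at authentication time, immutable afterwards


class CredentialStore:
    """OAuth tokens keyed by (tenant_id, integration_id)."""

    def __init__(self) -> None:
        self._tokens: dict[tuple[str, str], str] = {}

    def put(self, tenant_id: str, integration_id: str, token: str) -> None:
        self._tokens[(tenant_id, integration_id)] = token

    def token_for(self, session: AgentSession, integration_id: str) -> str:
        # The tenant comes from the session, so no prompt or tool argument
        # can name another customer's tenant.
        key = (session.tenant_id, integration_id)
        if key not in self._tokens:
            raise LookupError(f"no credential for {integration_id!r} in this tenant")
        return self._tokens[key]
```

Because token_for has no tenant parameter, a cross-tenant lookup is not something the agent can even express.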
The Honest Trade-offs of Gateway-Side Redaction
This architectural rigor is not free. Senior engineers should plan for these realities when implementing gateway-side redaction:
- Latency. NER on a 50KB ticket body adds 50-200ms per call. Caching detectors in-process and running regex first helps, but agent loops will feel slower.
- False positives. A product name that looks like a person's name will get tokenized. Build an allowlist for known-safe terms per integration.
- False negatives. No detector catches everything. Layer schema rules, regex, and NER, and accept that adversarial inputs (creative formatting of SSNs) will sometimes slip through. Assume a 5-10% false-negative rate on free text and design downstream controls to compensate.
- Reversibility complexity. Token-to-value swap on write paths is the single most common source of bugs. Test it explicitly with multi-turn agent traces.
- Schema drift. When Salesforce adds a new custom field your customer uses for SSNs, your schema-rule list will not know about it. Pair redaction with field-discovery and schema-drift monitoring.
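Reversibility is easiest to test when the vault is small and explicit. The sketch below is session-scoped reversible tokenization; the [PII_xxxxxxxx] token format and the SessionVault name are hypothetical. Two properties are worth copying: the same value gets the same token for the whole session, so multi-turn references stay consistent, and detokenize fails loudly on an unknown token instead of writing a placeholder into the customer's system of record.

```python
import re
import secrets


class SessionVault:
    """Reversible tokenization scoped to one conversation; destroy it at session end."""

    TOKEN_RE = re.compile(r"\[PII_[0-9a-f]{8}\]")

    def __init__(self) -> None:
        self._to_token: dict[str, str] = {}
        self._to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        if value not in self._to_token:
            token = f"[PII_{secrets.token_hex(4)}]"
            while token in self._to_value:  # regenerate on the rare collision
                token = f"[PII_{secrets.token_hex(4)}]"
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def detokenize(self, text: str) -> str:
        """Swap tokens back to real values on the write path (agent to SaaS API)."""
        def swap(match: re.Match) -> str:
            token = match.group(0)
            if token not in self._to_value:
                # Never write an unresolved token upstream.
                raise KeyError(f"unknown token {token}")
            return self._to_value[token]
        return self.TOKEN_RE.sub(swap, text)

    def destroy(self) -> None:
        self._to_token.clear()
        self._to_value.clear()
```

A multi-turn test then reads a record (tokenize), drafts a write that references the token, detokenizes it, and reads again to confirm the same token comes back. Note that if the model alters a token's format, the regex will not match and the placeholder passes through unchanged, so a post-write check for token-like strings is a sensible extra guard.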
Why Truto's Pass-Through Architecture Solves the MCP Security Problem
Building a centralized, zero-data-retention MCP gateway that handles OAuth token lifecycles, JSONata transformations, NER scanning pipelines, and dynamic tool generation is a massive engineering undertaking. This is exactly the infrastructure Truto provides out of the box.
Truto is a unified API and MCP server platform built around design choices that line up directly with strict InfoSec compliance requirements:
- Pass-Through by Default (Zero Data Retention): Truto's proxy API never stores the third-party SaaS data flowing through its MCP servers. The payload is fetched, processed in memory, returned to the caller, and discarded. Your customers' data is not warehoused on our side, which removes data at rest from the sub-processor risk surface.
- Unified ATS Schema for Consistent Redaction: Truto's Unified ATS API normalizes candidate, application, job, scorecard, and EEOC data from Greenhouse, Lever, Workday, Ashby, and dozens of other providers into a single schema. Your redaction rules target one set of field paths and work across every connected ATS - no per-provider masking logic required.
- Declarative Transforms via JSONata: Truto's built-in JSONata transformation layer allows your engineering team to declaratively strip or tokenize sensitive fields from API responses before they ever reach the LLM. Redaction is configuration, not a fork of our code.
- Documentation-Driven Tool Exposure: Truto derives MCP tool definitions dynamically from integration resources and documentation records. An integration's resource only becomes an MCP tool if you explicitly document it, ensuring AI agents cannot access unauthorized endpoints.
- Transparent Rate Limiting: Truto passes upstream HTTP 429 rate limit errors directly to your agent with standardized headers based on the IETF RateLimit header fields draft (ratelimit-limit, ratelimit-remaining, ratelimit-reset), so your orchestration layer keeps precise control over execution flow and retry backoff.
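On the agent side, honoring a passed-through 429 might look like the sketch below. The function name and defaults are hypothetical, and it assumes ratelimit-reset carries delta-seconds until the window resets (as in earlier drafts of the IETF RateLimit fields). When no usable hint is present, it falls back to capped exponential backoff with full jitter.

```python
import random
from collections.abc import Mapping
from typing import Optional


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a 429."""
    if status != 429:
        return None
    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    reset = lowered.get("ratelimit-reset")
    if reset is not None:
        try:
            # Assumed delta-seconds; clamp to [0, cap].
            return min(max(float(reset), 0.0), cap)
        except ValueError:
            pass  # malformed header: fall through to backoff
    # No usable hint: capped exponential backoff with full jitter.
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Keeping this logic in the orchestration layer, rather than hiding retries inside the proxy, is what lets the agent decide whether to wait, switch tasks, or surface the delay to the user.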
What Truto does not do: it is not a full DLP product. If you need adaptive NER on free text with custom entity training, pair Truto's transform layer with Presidio, OpenAI's Privacy Filter, or a commercial DLP gateway in front of the MCP endpoint. The pass-through architecture supports that composition cleanly.
Where to Start This Quarter
Exposing enterprise SaaS data to LLMs requires extreme architectural discipline. You cannot rely on prompt engineering to protect PII, and you cannot route sensitive payloads through platforms that cache your customers' data. If you are bringing AI agents anywhere near customer SaaS data, run this sequence:
- Inventory the tool surface. List every MCP tool your agent can call and every field each one returns. This step alone surfaces a large share of leaks.
- Classify fields by sensitivity. Tag fields as PII, PHI, PCI, quasi-identifier, or safe. Get legal and security to sign off on the matrix.
- Pick a redaction strategy per class. Decide whether to drop, hash, tokenize, or pass through. Document the rule for each field.
- Centralize the enforcement at the MCP layer. Not in the agent. Not in the prompt. At the gateway.
- Verify zero data retention upstream. Read the SOC 2 report of your MCP infrastructure. If raw payloads land on disk anywhere, your redaction is theater.
- Test reversibility on write paths. Send a multi-turn agent trace that reads, then writes, then reads again. Ensure tokens round-trip correctly.
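The first three steps can live as reviewable data rather than scattered code. This sketch encodes a field classification matrix, a default strategy per class, and per-field overrides; the field paths, enum names, and defaults are hypothetical and should be replaced with whatever legal and security sign off on. Unclassified fields, such as a newly added custom field, are dropped by default.

```python
from enum import Enum


class Sensitivity(Enum):
    PII = "pii"
    PHI = "phi"
    PCI = "pci"
    QUASI = "quasi_identifier"
    SAFE = "safe"


class Strategy(Enum):
    DROP = "drop"
    HASH = "hash"
    TOKENIZE = "tokenize"
    PASS = "pass"


# Hypothetical matrix for illustration only.
FIELD_CLASSES: dict[str, Sensitivity] = {
    "candidate.id": Sensitivity.SAFE,
    "candidate.email": Sensitivity.PII,
    "candidate.ssn": Sensitivity.PII,
    "candidate.zip_code": Sensitivity.QUASI,
    "candidate.current_stage": Sensitivity.SAFE,
}

DEFAULT_BY_CLASS: dict[Sensitivity, Strategy] = {
    Sensitivity.PII: Strategy.DROP,
    Sensitivity.PHI: Strategy.DROP,
    Sensitivity.PCI: Strategy.DROP,
    Sensitivity.QUASI: Strategy.HASH,
    Sensitivity.SAFE: Strategy.PASS,
}

# Per-field overrides of the class default (e.g. email is needed on write paths).
FIELD_OVERRIDES: dict[str, Strategy] = {
    "candidate.email": Strategy.TOKENIZE,
}


def plan_for(field_path: str) -> Strategy:
    cls = FIELD_CLASSES.get(field_path)
    if cls is None:
        return Strategy.DROP  # fail closed on fields nobody has classified
    return FIELD_OVERRIDES.get(field_path, DEFAULT_BY_CLASS[cls])


def unclassified(returned_fields: list[str]) -> list[str]:
    """Inventory check: fields a tool returns that the matrix does not cover."""
    return sorted(f for f in returned_fields if f not in FIELD_CLASSES)
```

Running unclassified over each tool's observed fields in CI doubles as a cheap schema-drift alarm.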
PII redaction is not a feature you bolt on the week before an InfoSec review. It is an architectural choice that determines whether your agent product can be sold to a regulated customer at all.
FAQ
- How do I standardize API responses from different ATS platforms like Greenhouse, Lever, and Workday for LLM consumption?
- Use a unified ATS API that normalizes provider-specific schemas into a common data model covering candidates, applications, jobs, scorecards, and other recruiting entities. This gives your redaction rules a single set of field paths to target regardless of the underlying ATS provider.
- What is the difference between PII redaction, hashing, and tokenization for AI agent data?
- Hard redaction replaces sensitive values with a static placeholder and is irreversible - use it for data the LLM should never see. Deterministic hashing (HMAC-SHA256) produces consistent one-way outputs for correlation without exposing real values. Reversible tokenization maps values to session-scoped tokens that can be swapped back on write paths.
- Should I send exact salary data or salary bands to an LLM via MCP?
- It depends on the task. Candidate screening agents should receive salary bands (e.g., $170K-$190K) to assess fit without exact figures. Offer approval agents need exact compensation to prepare packages. Reporting agents should receive neither - only aggregated statistics.
- What GDPR and HIPAA requirements apply to AI agents accessing ATS data through MCP?
- GDPR requires a documented lawful basis, data minimization at the operation level, special handling for Article 9 data like EEOC demographics, and records of processing for every agent interaction. HIPAA requires a Business Associate Agreement, minimum necessary access controls, operation-level audit logging, and regular risk assessments.
- How does a unified ATS API help with PII redaction for MCP servers?
- A unified ATS API normalizes field names and structures from every provider into a consistent schema. This means you write redaction rules once against the normalized output, and they work automatically when a customer connects any supported ATS - eliminating the need for per-provider masking logic.