How to Ensure Zero Data Retention When Processing Third-Party API Payloads
Learn how to architect a pass-through API proxy with zero data retention to pass enterprise SIG Core reviews and close B2B SaaS deals faster.
Your enterprise deal just died in procurement. The buyer's InfoSec team found that your integration vendor caches their HRIS records and CRM contacts on shared infrastructure for 30 to 60 days. They flagged it as an unmanaged sub-processor, refused to sign, and moved on.
If you're processing third-party API payloads containing sensitive CRM, HRIS, or financial data, storing that data at rest is a massive liability. Enterprise procurement teams will actively block your deals if your integration architecture relies on caching their regulated data on unverified third-party infrastructure. To pass strict InfoSec reviews and ship integrations fast, you need an architecture that processes data in transit without ever writing it to a database.
This guide breaks down exactly how to architect a Zero Data Retention (ZDR) integration pipeline — why legacy sync-and-store architectures fail enterprise security audits, how to build a stateless pass-through proxy, and how to use declarative mapping languages to normalize payloads entirely in memory.
The Enterprise Procurement Wall: Why Data Retention Kills Deals
When you sell B2B SaaS to mid-market companies, integration velocity is your primary bottleneck. When you move your SaaS integration strategy upmarket to enterprise, compliance becomes the binary go/no-go for revenue.
Your account executive moves a six-figure deal to the final stages, and the buyer's procurement team sends over a Standardized Information Gathering (SIG) questionnaire — a structured risk assessment published by Shared Assessments. SIG Core is an extensive assessment with over 600 questions covering 21 risk categories, designed to assess third parties that store or manage highly sensitive or regulated information such as payment card data or genetic records.
Domain 10 — Third-Party Risk Management — acts as a tripwire. One question in particular will stop your deal cold: "Does any third-party sub-processor store, cache, or replicate our data?"
If your application relies on a middleware vendor that caches your customer's data to handle API retries or pagination, your answer is yes. That yes triggers a cascade of follow-up questions about data residency, encryption at rest, retention policies, breach notification timelines, and sub-processor agreements. If that vendor refuses to sign a Business Associate Agreement (BAA) or lacks the required compliance certifications, the deal stops dead. What you need is an integration tool that doesn't store customer data in the first place.
The financial stakes driving this scrutiny are massive. IBM's 2024 Cost of a Data Breach Report found the global average cost of a data breach reached a record $4.88 million — a 10% increase from 2023 and the largest spike since the pandemic. In the United States specifically, the average breach cost leads the world at $9.36 million per incident.
Enterprise InfoSec teams aren't being paranoid. They're doing math. Every additional sub-processor that stores sensitive data is another node in the blast radius of a potential breach. You are trading a short-term engineering convenience for a permanent compliance blocker.
What is Zero Data Retention in API Processing?
Zero Data Retention (ZDR) in API processing is an architectural pattern where payload data is processed entirely in-memory and immediately discarded, ensuring no sensitive information is ever written to disk, databases, or secondary storage logs.
For third-party integration platforms, ZDR means:
- In-memory processing: All data transformations, filtering, and mapping occur in volatile memory (RAM) and are wiped immediately after the HTTP response is sent.
- No persistent caching: The system does not write payload data to Redis, Memcached, or database tables to handle pagination or rate limiting.
- No retry queues on disk: Failed requests are handled by the client or via stateless queuing systems that only store metadata, not the payload itself.
- Redacted logging: API logs only capture metadata (status codes, latency, endpoint URLs) and actively strip request/response body payloads.
This isn't a new concept. Major technology providers are already shifting toward this model to satisfy enterprise demands. OpenAI offers Zero Data Retention for approved enterprise API customers, ensuring data is processed in-memory and not used for training — inputs and outputs are never logged and are not retained for application state. Brave Search API markets itself as the only search API offering true ZDR to reduce liability exposure for AI companies. LandingAI provides a ZDR option where documents are processed in-memory and immediately discarded. Your integration layer must meet this same standard.
ZDR also aligns directly with legal requirements. Article 5(1)(c) of the General Data Protection Regulation (GDPR) states that personal data shall be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed" — a principle known as data minimization. If the purpose of your integration layer is to transform and relay data, then storing that data exceeds what is necessary.
ZDR vs. data minimization: ZDR is the architectural implementation of the GDPR's data minimization principle applied to API middleware. Data minimization says "don't collect more than you need." ZDR says "don't persist anything at all — process it in transit."
The Flaws of Traditional Sync-and-Store Integration Architectures
Most integration platforms — whether built in-house or purchased from a vendor — follow a sync-and-store pattern:
flowchart LR
A[Third-Party API<br>e.g. Salesforce] -->|Pull data| B[Integration<br>Middleware]
B -->|Store in| C[(Middleware<br>Database)]
C -->|Serve from cache| D[Your Application]

When you request a list of contacts from Salesforce through a traditional aggregator, the aggregator does not just proxy your request. It continuously polls the Salesforce API in the background, downloads the customer's entire CRM database, normalizes the data into its own proprietary schema, and stores it in a massive multi-tenant database. When you query their API, you're actually querying their cached copy of your customer's data.
This architecture has real engineering benefits — it reduces latency, avoids third-party rate limits, and makes pagination trivial. But it creates severe compliance problems:
| Problem | What Happens |
|---|---|
| Sub-processor expansion | The middleware vendor becomes a data processor under GDPR and must be listed as a sub-processor in your DPA. You must now explain to enterprise InfoSec why a third-party startup holds 30 to 60 days of their employee records. |
| Data residency violations | Customer data may be stored in regions that violate contractual or legal requirements. |
| Data staleness and retention ambiguity | You're querying a cache, so data is inherently stale. If a user deletes a sensitive record in the source system, that record might persist in the aggregator's database for weeks until the next sync cycle completes. |
| Security honeypots | Centralizing thousands of companies' CRM and HRIS data into a single multi-tenant database creates a high-value target for attackers. |
| Audit complexity | You now need to audit your vendor's security controls — encryption at rest, access policies, breach notification — in addition to your own. |
The worst part? Many teams don't realize the compliance cost until their first enterprise deal is on the line. By then, ripping out a deeply embedded sync-and-store integration layer is a multi-quarter project.
Passing enterprise security reviews when using third-party API aggregators is nearly impossible when the vendor's architecture fundamentally violates data minimization principles.
How to Architect a Pass-Through API Proxy
To achieve true Zero Data Retention, you must decouple the configuration of the integration from the execution of the payload. The architecture must act as a stateless proxy layer that translates requests on the fly.
Here is how a pass-through execution engine handles a unified API request without storing data at rest:
graph TD
Client[Client Application] -->|Unified Request<br>GET /crm/contacts| Proxy[Stateless Execution Engine]
subgraph Zero Storage Boundary
Proxy -->|1. Load Config| DB[(Configuration DB<br>No Payload Data)]
DB -->|Returns JSONata Mapping| Proxy
Proxy -->|2. In-Memory Transform| Proxy
Proxy -->|3. Native Request<br>GET /services/data/v59.0/query| Provider[Third-Party API<br>Salesforce, HubSpot]
Provider -->|4. Native Response| Proxy
Proxy -->|5. In-Memory Transform| Proxy
end
Proxy -->|6. Unified Response| Client

Here are the technical requirements for making this work:
1. Configuration as Data, Not Code
Instead of writing integration-specific code (if provider == 'salesforce'), the system stores the blueprint of the API — base URLs, authentication schemes, pagination strategies, and rate limit rules — as a JSON blob in a configuration database. The database contains zero integration-specific columns and zero payload data.
This is a critical distinction: operational metadata (tokens, refresh schedules, account configuration, JSONata expressions) lives in the database, while customer payload data (the actual CRM contacts, employee records, and financial transactions) never touches persistent storage.
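To make the "configuration as data" idea concrete, here is a minimal Python sketch. The blueprint fields and the `resolve_request` helper are illustrative, not any vendor's actual schema — the point is that one generic function can serve every provider because all provider-specific knowledge lives in data:

```python
# A provider "blueprint" stored as plain data in a configuration DB row.
# Field names here are hypothetical, chosen only to illustrate the pattern.
SALESFORCE_BLUEPRINT = {
    "base_url": "https://example.my.salesforce.com",
    "auth": {"type": "oauth2", "header": "Authorization", "scheme": "Bearer"},
    "pagination": {"style": "cursor", "cursor_path": "nextRecordsUrl"},
    "resources": {
        "contacts": {
            "path": "/services/data/v59.0/query",
            # The JSONata mapping is itself just a string in the config row.
            "mapping": '{"id": Id, "first_name": FirstName}',
        }
    },
}

def resolve_request(blueprint: dict, resource: str) -> dict:
    """Build an outbound request description from configuration alone.

    There is no `if provider == ...` branch anywhere: the same code path
    works for any blueprint, which is what keeps the runtime generic."""
    res = blueprint["resources"][resource]
    return {
        "url": blueprint["base_url"] + res["path"],
        "pagination": blueprint["pagination"]["style"],
        "mapping": res["mapping"],
    }
```

Adding a new provider then means inserting a new blueprint row, not deploying new code.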
2. Just-in-Time Credential Injection
When a request arrives, the engine retrieves the target account's encrypted credentials. It decrypts the OAuth token in memory, applies it to the outbound request header, and immediately discards the decrypted value. If the token is expired, the engine proactively refreshes it using a mutex lock to prevent race conditions, updates the encrypted storage, and proceeds with the request. The system must store OAuth tokens and API keys to function — that's unavoidable. But there's a hard line between stored credentials and customer data flowing through the system.
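A minimal sketch of just-in-time credential injection, under stated assumptions: the `decrypt` stub below stands in for a real KMS or envelope-decryption call (in production you would use something like AWS KMS or `cryptography.fernet`), and base64 is used here only to keep the example runnable:

```python
import base64

def decrypt(ciphertext: bytes, key: bytes) -> str:
    # Stand-in for a real decryption call against a KMS or local key.
    # Base64 is NOT encryption; it only makes this sketch self-contained.
    return base64.b64decode(ciphertext).decode()

def build_outbound_headers(encrypted_token: bytes, key: bytes) -> dict:
    """Decrypt the OAuth token in memory, use it once, let it go out of scope.

    The plaintext token exists only in this function's frame and in the
    outbound request header for the duration of the call; it is never
    logged, cached, or written to persistent storage."""
    token = decrypt(encrypted_token, key)
    headers = {"Authorization": f"Bearer {token}"}
    del token  # drop the local reference immediately after use
    return headers
```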
3. In-Memory Payload Transformation
All data mapping — from the third-party's native schema to your unified schema — must happen in memory. No intermediate writes to a database or file system. The transformation engine receives a JSON object, applies a mapping function, and outputs a new JSON object. Each request is self-contained: the proxy doesn't remember previous requests and doesn't maintain a local copy of the third-party's data. Every request goes directly to the source API, gets the freshest data, and returns it.
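As an imperative analogue of what the declarative mappings later in this guide do, the transform step can be pictured as a pure function — a hypothetical sketch, not Truto's actual code:

```python
def to_unified_contact(native: dict) -> dict:
    """Pure in-memory transform: one JSON object in, one out.

    No database handle, no file writes, no module-level state -- the
    function's only effect is its return value, so nothing survives
    the request beyond the HTTP response."""
    return {
        "id": native.get("Id"),
        "first_name": native.get("FirstName"),
        "last_name": native.get("LastName"),
        "email": native.get("Email"),
        "remote_data": native,  # pass the raw payload through untouched
    }
```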
4. Client-Side Pagination and Real-Time Rate Limiting
Traditional systems cache data to handle pagination. A pass-through proxy handles it dynamically. The configuration defines the pagination style (cursor, offset, link header). The engine extracts the next-page cursor from the third-party response, maps it to a unified cursor format, and passes it directly back to the client. The client holds the state, not the middleware.
Similarly, when the third-party API returns a 429 Too Many Requests, the engine detects the rate limit, extracts the Retry-After header, normalizes it, and passes the 429 directly back to the client. No retry queues persist the payload to disk.
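Both behaviors can be sketched as small stateless helpers. The field names (`records`, `nextRecordsUrl`) are illustrative examples of one provider's response shape, not a universal schema:

```python
def normalize_page(provider_response: dict, cursor_path: str) -> dict:
    """Map a provider's next-page cursor into a unified envelope.

    The cursor goes straight back to the client, which holds all
    pagination state; the proxy remembers nothing between requests."""
    return {
        "results": provider_response.get("records", []),
        "next_cursor": provider_response.get(cursor_path),  # client keeps this
    }

def normalize_rate_limit(status: int, headers: dict):
    """Translate a provider 429 into a unified error, passed straight through.

    Returns None for non-429 responses; no retry queue ever sees the payload."""
    if status != 429:
        return None
    return {
        "error": "rate_limited",
        "retry_after_seconds": int(headers.get("Retry-After", "60")),
    }
```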
5. No Full-Body Request Logging
This one catches people off guard. Your observability stack probably logs full request and response bodies by default. If those bodies contain employee SSNs or patient records, your logging infrastructure just became a data retention liability. A zero-storage architecture logs metadata (status codes, latency, error types) but strips or redacts payload content from all logs.
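One way to enforce this is to strip payload keys before a record ever reaches the logging framework — a minimal sketch, assuming a structured-logging setup where each log entry is a dict (the key names in `REDACTED_KEYS` are illustrative):

```python
import json
import logging

# Keys that may contain payload bodies; illustrative, tune to your log schema.
REDACTED_KEYS = {"body", "request_body", "response_body"}

def log_request(logger: logging.Logger, record: dict) -> dict:
    """Log only metadata; drop payload bodies before they reach any sink.

    Redaction happens before the record is handed to the logging
    framework, so no handler, file, or log shipper ever sees the payload."""
    safe = {k: v for k, v in record.items() if k not in REDACTED_KEYS}
    logger.info(json.dumps(safe))
    return safe
```

The same idea can be implemented as a `logging.Filter` if you need it applied globally rather than at each call site.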
The honest trade-off: Pass-through architectures are slower than cached ones. Walking through 50 pages of a third-party API on every request adds real latency. If you need sub-100ms response times on integration data, a pure pass-through won't get you there. You'll need to cache data in your own infrastructure (where you control retention, encryption, and residency) rather than relying on middleware to do it. Some platforms offer opt-in synced data stores for exactly this reason — the key is that the default path should be zero-storage.
Using JSONata for Stateless Payload Transformation
The hardest part of a pass-through integration architecture isn't the proxying — it's the real-time data transformation. You need to convert Salesforce's FirstName to your unified first_name, translate HubSpot's filterGroups into a common query format, and normalize completely different pagination schemes — all without writing the data to disk.
If you write imperative code to handle this, you end up with sprawling, unmaintainable microservices and the dreaded if (provider === 'hubspot') { ... } pattern that scales linearly with the number of integrations. Every new provider means new code, new tests, new deployments. Instead, you need a declarative transformation language.
Truto's zero-code architecture relies heavily on JSONata — a functional query and transformation language purpose-built for reshaping JSON objects. JSONata is ideal for ZDR because it is:
- Side-effect free: Expressions are pure functions. They take an input, apply a transformation, and return an output without mutating external state or writing to disk.
- Turing-complete: Despite being declarative, JSONata supports complex conditionals, string manipulation, array transforms, and custom functions necessary for mapping convoluted enterprise APIs.
- Storable as configuration: A JSONata expression is just a string. It can live in a database column alongside integration configuration data, which means adding support for a new API is a configuration change, not a code deployment.
Here's what a stateless field mapping looks like for a Salesforce contact:
/* Transform a Salesforce contact into a unified schema */
{
"id": Id,
"first_name": FirstName,
"last_name": LastName,
"email": Email,
"phone": Phone,
"company_name": Account.Name,
"created_at": CreatedDate,
"updated_at": LastModifiedDate,
"remote_data": $
}

And here's a more complex example mapping a HubSpot contact with custom date formatting:
(
$formatDate := function($date) { $substring($date, 0, 10) };
{
"id": id,
"first_name": properties.firstname,
"last_name": properties.lastname,
"email": properties.email,
"phone": properties.phone,
"created_at": $formatDate(createdAt),
"updated_at": $formatDate(updatedAt),
"remote_data": $
}
)

Both expressions run entirely in memory. The engine evaluates the expression against the incoming payload, constructs the unified JSON object, sends it to the client via the open HTTP connection, and garbage-collects the memory. The data never touches a hard drive.
Declarative mappings scale as data — far easier to manage and audit than imperative code sprawl. Each new integration is a new configuration row, not a new microservice.
Handling Complex Orchestration Statelessly

Sometimes a single unified request requires fetching data from multiple third-party endpoints (e.g., fetching a contact, then fetching their associated company). A pass-through engine handles this via "Before" and "After" pipeline steps. The engine executes the first request, holds the partial result in memory, executes the second request, merges the results using JSONata, and returns the final payload. No intermediate results are written to storage.
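The pipeline pattern can be sketched as two dependent fetches merged inside a single request scope. The fetch functions below are stand-ins for real HTTP calls, and all names are hypothetical:

```python
def fetch_contact(contact_id: str) -> dict:
    # Stand-in for the first pipeline step (a real HTTP call in production).
    return {"id": contact_id, "name": "Ada Lovelace", "company_id": "c-9"}

def fetch_company(company_id: str) -> dict:
    # Stand-in for the dependent "after" step.
    return {"id": company_id, "name": "Analytical Engines Ltd"}

def get_contact_with_company(contact_id: str) -> dict:
    """Two dependent fetches merged in one request scope.

    The partial result lives only in local variables; once the merged
    payload is returned over the open connection, nothing about this
    orchestration persists anywhere."""
    contact = fetch_contact(contact_id)
    company = fetch_company(contact["company_id"])
    return {**contact, "company": company}  # merged entirely in memory
```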
Passing the SIG Core Questionnaire with a Zero-Storage Architecture
Let's bring this back to the deal stuck in procurement. Your enterprise prospect sent a SIG Core questionnaire. Here's how your answers change depending on your integration architecture:
| SIG Question Category | Sync-and-Store Answer | Zero-Storage Answer |
|---|---|---|
| Does any sub-processor store customer data? | Yes — middleware caches CRM/HRIS data for 30-60 days | No — middleware processes data in transit only |
| Data residency controls? | Depends on middleware vendor's infrastructure | N/A — no data at rest in middleware |
| Encryption at rest for stored data? | Must verify middleware vendor's encryption practices | N/A — no data at rest to encrypt |
| Data retention and deletion policy? | Must align with middleware vendor's retention schedule | N/A — nothing to retain or delete |
| Breach notification for sub-processor? | Must establish breach notification chain with middleware vendor | Reduced scope — middleware has no payload data to breach |
With a zero-storage architecture, you don't need to declare the integration vendor as a sub-processor for payload data. They still handle operational metadata (OAuth tokens, account configurations), but the sensitive customer data — the employee records, the financial transactions, the patient information — never touches their disk.
This doesn't make the questionnaire disappear. You still need to demonstrate that your own application handles data responsibly. But it removes an entire category of questions about third-party data handling, which is often the section that kills deals.
Where Truto Fits
Truto's unified API uses a pass-through proxy architecture by default. When your application calls GET /unified/crm/contacts, Truto calls the third-party API in real time, applies JSONata-based transformations in memory, and returns the unified response. No customer payload data is written to Truto's databases.
The architectural underpinning is a generic execution engine that reads declarative configuration (integration endpoints, auth schemes, pagination strategies) and declarative mappings (JSONata expressions for schema translation). A single code path handles every integration — Salesforce, HubSpot, Workday, and 100+ others — without any provider-specific logic in the runtime. This means there's no integration-specific database table or column where your data could accidentally end up.
When the enterprise auditor asks, "Does any third-party sub-processor store, cache, or replicate our data?", you can confidently answer "No." Truto processes the data in memory and routes it directly to your application, drastically simplifying SOC 2 and SIG Core audits.
Where the trade-off gets real: Truto also offers a synced data feature called SuperQuery (backed by TimescaleDB) for use cases that genuinely need cached, SQL-queryable integration data — like building dashboards or running analytics across thousands of records. This feature stores data. It's explicitly opt-in, and enabling it means you are introducing data persistence through Truto. The default Unified API path remains strictly pass-through.
According to the same IBM 2024 report, organizations making extensive use of AI and automation in security prevention workflows saw average breach costs reduced by $2.2 million compared to those that didn't. A zero-storage architecture goes further: it eliminates the risk of cached-data breaches in your middleware entirely, protecting both your customers and your balance sheet.
If your primary concern is finding an integration tool that doesn't store customer data, the key question to ask any vendor is: "In the default API path, is customer payload data ever written to persistent storage — including logs, retry queues, and caches?" If the answer involves qualifications about retention windows, you're looking at a sync-and-store architecture dressed up with a short TTL.
What to Do Next
If you're evaluating which integration tools are best for enterprise compliance, here's a concrete checklist:
- Audit your current architecture. Map every location where third-party customer data is persisted — databases, caches, log files, retry queues, analytics pipelines. You'll probably find more copies than you expected.
- Separate operational metadata from payload data. Your integration layer needs to store tokens and configuration. It should not need to store the actual data flowing through it.
- Ask vendors the hard questions. Request architecture diagrams showing exactly where data is written. Ask for their data flow documentation, not just their marketing page.
- Accept the latency trade-off — or own the cache yourself. If you need fast reads over integration data, cache it in your own infrastructure where you control retention, encryption, and residency. Don't outsource that responsibility to middleware.
- Validate with a real SIG Core dry-run. Before your next enterprise deal hits procurement, fill out the third-party risk management sections of a SIG Core questionnaire yourself. Identify the questions you can't answer cleanly and fix the architecture before it costs you revenue.
For a deeper walkthrough of the procurement process, see our guide on how to pass enterprise security reviews when using third-party API aggregators. For a detailed comparison of zero-storage platforms, read why Truto is the best zero-storage unified API for compliance-strict SaaS.
Stop letting your six-figure enterprise deals die in procurement. Architect your integration layer for compliance from day one.
FAQ
- What is zero data retention in API processing?
- Zero Data Retention (ZDR) is an architectural pattern where API payloads are processed entirely in-memory and never written to disk, databases, or persistent storage. The system receives, transforms, and delivers data without retaining any copies on the middleware provider's infrastructure.
- How does zero data retention help pass enterprise security reviews?
- When your integration middleware doesn't store customer payload data, you don't need to declare it as a data sub-processor for sensitive information. This eliminates an entire category of SIG Core questionnaire questions about third-party data residency, retention policies, encryption at rest, and breach notification chains — which is often the section that kills deals.
- What is the difference between pass-through and sync-and-store integration architectures?
- Sync-and-store architectures pull data from third-party APIs, cache it in the middleware's multi-tenant database, and serve your application from that cache. Pass-through architectures call the third-party API in real-time for every request, transform the response in memory, and return it without persisting anything. Pass-through is slower but avoids the compliance liabilities of storing customer data on middleware infrastructure.
- How does JSONata enable stateless API transformations?
- JSONata is a functional, side-effect-free query language that maps and transforms JSON payloads entirely in memory. Because expressions are pure functions stored as configuration strings, they eliminate the need to write imperative integration code or store data in a database during the translation process.
- What data does a zero-storage integration platform actually store?
- A zero-storage platform still stores operational metadata like encrypted OAuth tokens, account connection status, webhook configurations, and declarative integration mappings. The distinction is that customer payload data — the actual CRM contacts, employee records, and financial transactions flowing through the system — is never persisted.