
The Operational Runbook for Declarative Syncs and Compliance

A step-by-step playbook to build the integrations your sales team asks for - from prioritization and PRD templates to declarative, zero-data-retention sync pipelines.

Roopendra Talekar · 23 min read

Your six-figure enterprise deal just hit a wall. Not because the prospect disliked the product demo, and not because a competitor undercut you on price. It died in procurement because the infosec team opened the vendor risk assessment questionnaire and found your integration layer listed as a stateful third-party data sub-processor.

An operational runbook for declarative syncs and compliance is a structured document that defines how your team configures, deploys, monitors, and audits configuration-driven data pipelines—without writing provider-specific code and without caching third-party data in a middleware layer. If you are an engineering leader or PM at a B2B SaaS company shipping integrations while passing SOC 2, HIPAA, or enterprise vendor risk assessments, this is the playbook you need.

When B2B SaaS companies move upmarket, their integration strategy must evolve. The custom Python scripts and Zapier templates that sustained your SMB tier will collapse under the weight of enterprise requirements. Enterprise buyers demand native, reliable connections to their customized systems, but they outright refuse to let their highly sensitive data sit in a third-party caching layer just so your engineering team can handle API retries.

This guide provides a step-by-step framework to move from brittle, hand-rolled sync scripts to declarative pipelines that are auditable, compliant, and operationally sane.

The Enterprise Integration Bottleneck

Moving upmarket exposes SaaS companies to a completely different class of security scrutiny. Procurement teams are hyper-vigilant. They scrutinize every node in your architecture that touches their data. The moment your integration middleware stores third-party customer data—even temporarily for retry buffers or workflow state—it becomes a data sub-processor in the eyes of procurement.

The cost of getting this wrong is severe. IBM's Cost of a Data Breach Report 2024 put the global average cost of a breach at $4.88 million, up 10% from the prior year and the largest yearly jump since the pandemic, and 70% of breached organizations reported that the breach caused significant or very significant disruption.

Meanwhile, integration capability has become a competitive requirement. According to BetterCloud's State of SaaS report, companies used an average of 106 SaaS apps in 2024, and every one of those apps is a potential integration point your customers will ask about. Demand for integration has never been higher, and integration platform as a service (iPaaS) is the largest segment of the integration platform market serving that demand.

If each integration requires a dedicated sync script, and each script takes an engineer three weeks to build and harden, your integration backlog will outpace your hiring plan. To capture enterprise revenue without expanding headcount or failing SOC 2 audits, engineering teams must adopt declarative data sync pipelines built on zero-data-retention architecture.

Building the Integrations Sales Actually Asks For

Sales just pinged engineering: "We need a BambooHR integration by next quarter or we lose a $200K deal." This happens weekly at every growing B2B SaaS company. The problem is not that sales asks for integrations - it is that engineering has no structured way to evaluate, prioritize, and ship them fast enough to keep pace with the pipeline.

Integrations come up in 60% of all sales deals, and product integrations are deal breakers for 84% of customers. During vendor assessment, 44% of buyers cite a software provider's ability to provide integration support as a primary concern. In G2's 2024 Buyer Behavior Report, integration capability ranks as a top buying consideration: it is the #1 factor for teams purchasing marketing, sales, customer service, and customer success software.

Despite this, most B2B SaaS companies treat integrations as a feature request queue - a customer asks for Salesforce, so you build Salesforce; another asks for NetSuite, so you build NetSuite, and six months later you have eight integrations with no strategy behind them. The result is engineering spending 30-40% of their time on integration support instead of building product.

A Prioritization Framework That Works

Stop treating integration requests as a flat backlog. Score each request across three dimensions:

| Dimension | Weight | How to Measure |
|---|---|---|
| Revenue impact | 40% | Total pipeline value of deals blocked or influenced by this integration |
| Request frequency | 30% | Number of unique prospects and customers who asked in the last 6 months |
| Technical overlap | 30% | How much of the unified model this integration shares with existing connectors |

Pull integration mentions from your CRM's closed-lost reasons and deal notes. If "no BambooHR integration" appears in five lost deals worth $150K each, that is $750K in attributed pipeline - a number that justifies engineering investment far more than any gut feeling.

Technical overlap is the dimension most teams ignore. If you already support three HRIS providers, adding a fourth is dramatically cheaper than adding your first accounting integration - the unified schema, webhook event types, and sync job templates already exist. In a declarative system, a new provider within an existing category is often just a new JSON config and a set of JSONata mapping expressions.
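The scoring above reduces to a weighted sum once each dimension is on a common scale. The Python sketch below is one way to compute it, assuming revenue and frequency are normalized against the largest value in the current backlog and overlap is estimated by hand as a 0-1 fraction; the `IntegrationRequest` type and the normalization rule are illustrative choices, not a prescribed formula.

```python
from dataclasses import dataclass

# Weights from the prioritization table
WEIGHTS = {"revenue": 0.40, "frequency": 0.30, "overlap": 0.30}


@dataclass
class IntegrationRequest:
    name: str
    pipeline_usd: float   # pipeline blocked or influenced by the missing integration
    unique_requests: int  # unique prospects/customers asking in the last 6 months
    overlap: float        # share of the unified model already supported, 0.0-1.0


def rank(requests: list[IntegrationRequest]) -> list[tuple[str, float]]:
    """Return (name, score) pairs, highest score first."""
    # Normalize against the backlog maximum; "or 1" avoids dividing by zero
    max_pipeline = max((r.pipeline_usd for r in requests), default=0) or 1
    max_requests = max((r.unique_requests for r in requests), default=0) or 1
    scored = []
    for r in requests:
        score = (
            WEIGHTS["revenue"] * r.pipeline_usd / max_pipeline
            + WEIGHTS["frequency"] * r.unique_requests / max_requests
            + WEIGHTS["overlap"] * r.overlap
        )
        scored.append((r.name, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


backlog = [
    # Five lost deals at $150K each = $750K attributed pipeline
    IntegrationRequest("BambooHR", 5 * 150_000, 5, 0.9),
    IntegrationRequest("NetSuite", 900_000, 3, 0.1),
]
print(rank(backlog))
```

With these inputs BambooHR scores about 0.90 and NetSuite about 0.61: BambooHR's request frequency and category overlap outweigh NetSuite's larger pipeline.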

Measuring What Matters: Time-to-First-Sync

The single most important metric for your integration POC is time-to-first-sync: how long it takes from "we decided to build this" to "real customer data flows through the pipeline."

For hand-rolled scripts, time-to-first-sync is typically 2-4 weeks per provider. That includes building the OAuth flow, implementing pagination, writing field mappings, handling errors, and deploying the code. With a declarative pipeline, time-to-first-sync should collapse to hours or days, because you are writing configuration, not code.

Track time-to-first-sync for every integration you ship. If it is not improving over time, your architecture has a scaling problem.
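Tracking the metric needs only two timestamps per integration. A minimal sketch, assuming you record when the build decision was made and when the first real customer record arrived (both as ISO 8601 strings without a time zone suffix); `ttfs_hours` and `is_improving` are hypothetical helper names, and comparing the median of the latest three integrations against the earlier ones is just one simple trend test.

```python
from datetime import datetime
from statistics import median


def ttfs_hours(decided_at: str, first_sync_at: str) -> float:
    """Hours from 'we decided to build this' to the first real customer record."""
    delta = datetime.fromisoformat(first_sync_at) - datetime.fromisoformat(decided_at)
    return delta.total_seconds() / 3600


def is_improving(history: list[float], window: int = 3) -> bool:
    """True if the median of the latest `window` integrations beats the earlier median."""
    if len(history) < 2 * window:
        return False  # too few integrations to call a trend
    return median(history[-window:]) < median(history[:-window])


# Oldest first: three hand-rolled scripts, then three declarative configs
history = [504.0, 480.0, 360.0, 30.0, 20.0, 6.0]
print(is_improving(history))
```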

Imperative Scripts vs. Declarative Data Pipelines

Most engineering teams start building integrations imperatively. A customer requests a Zendesk integration, so a developer writes a Node.js script. They implement Zendesk's specific cursor pagination, write a function to handle Zendesk's OAuth flow, and hardcode the mapping to your internal database schema.

Then a customer asks for Jira. The developer writes another script in Python, this time handling Jira's offset pagination, Basic Auth, and custom field structures. Six months later, you have a graveyard of brittle ETL scripts.

An imperative sync script tells the computer how to fetch data: write the HTTP client, handle pagination manually, manage OAuth refresh logic, serialize query parameters, parse responses, track cursors. The "how" is spread across hundreds of lines of code per provider.

The imperative approach (pseudocode)

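# zendesk_sync.py - one of many provider-specific scripts
import time

import requests


def sync_tickets(account, last_sync_date):
    """Emit tickets updated after last_sync_date (e.g. "2024-11-20T14:15:00Z")."""
    token = refresh_oauth_token(account)  # you wrote this
    url = f"https://{account.subdomain}.zendesk.com/api/v2/tickets"
    cursor = None
    while True:
        resp = requests.get(
            url,
            headers={"Authorization": f"Bearer {token}"},
            # requests omits None-valued params, so page 1 has no page[after]
            params={"page[size]": 100, "page[after]": cursor,
                    "sort": "updated_at"},
            timeout=30,
        )
        if resp.status_code == 429:
            # Honor the provider's Retry-After (seconds), defaulting to 60
            time.sleep(int(resp.headers.get("Retry-After", 60)))
            continue
        resp.raise_for_status()
        data = resp.json()
        for ticket in data["tickets"]:
            # Filter client-side; same-format ISO 8601 UTC strings compare correctly
            if ticket["updated_at"] > last_sync_date:
                normalized = map_zendesk_ticket(ticket)  # hand-written mapping
                emit_record(normalized)
        meta = data.get("meta", {})
        if not meta.get("has_more"):
            break
        cursor = meta["after_cursor"]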

You will write a nearly identical file for ServiceNow, Freshdesk, and every other ticketing provider. When an upstream provider alters an endpoint, your team drops product work to push an emergency fix.

A declarative data pipeline fundamentally changes this dynamic. Instead of writing code that dictates how to fetch data, you write configuration that describes what the end state should look like: which resources to sync, what the dependency graph looks like, and how to map fields between schemas.

The declarative approach (configuration)

{
  "integration_name": "zendesk",
  "resources": [
    {
      "resource": "ticketing/users",
      "method": "list"
    },
    {
      "resource": "ticketing/tickets",
      "method": "list",
      "query": {
        "updated_at": { "gt": "{{previous_run_date}}" }
      }
    },
    {
      "resource": "ticketing/comments",
      "method": "list",
      "depends_on": "ticketing/tickets",
      "query": {
        "ticket_id": "{{resources.ticketing.tickets.id}}"
      }
    }
  ]
}

In a declarative system, the runtime engine is a generic pipeline. It reads this sync job configuration together with each provider's integration config, which describes how to talk to that API. Pagination strategy, auth handling, field normalization, and cursor tracking are all resolved from configuration data, not from code you wrote. Adding a new integration becomes a data operation, not a code deployment. Swapping Zendesk for Freshdesk means changing the integration_name field. For a deeper look at this architectural shift, read our guide on Declarative Data Sync Pipelines: Ship Integrations as Config, Not Code.

The Compliance Mandate: Why Zero Data Retention Wins

When you sell to healthcare, finance, or government buyers, your architecture will collide with their vendor risk assessment (VRA) process. Here is the part that kills enterprise deals: data residency in your integration middleware.

Many legacy iPaaS and unified API vendors rely on stateful architectures. They ingest data from the third-party API, store it in their own managed databases to normalize it, and then serve it to your application. This triggers a cascade of compliance liabilities:

  • SOC 2 Confidentiality criteria require you to identify, retain, and dispose of confidential data according to documented policies. Under the Confidentiality and Availability Trust Services Criteria in particular, auditors expect to see that your organization has determined how long data should be kept, why it is retained, and how it is securely disposed of when it is no longer needed.
  • HIPAA requires covered entities and business associates to retain required compliance documentation for six years and to enforce strict access controls on any system that processes PHI.
  • Enterprise VRAs will ask exactly where customer payload data is stored, who has access, and whether it can run inside their VPC.

If the integration vendor suffers a breach, your customer's data is exposed. Security teams understand this risk and will block deals if your integration layer retains sensitive payloads.

The alternative is a zero-data-retention architecture. In this model, the integration layer acts entirely as a stateless proxy. Data is fetched from the third-party API, transformed in memory using JSONata expressions, and streamed directly to your application or webhook endpoint. The payload is never written to disk by the middleware.

sequenceDiagram
    participant App as Your Application
    participant Proxy as Zero-Retention Pipeline
    participant API as Third-Party API
    
    App->>Proxy: Execute Sync Job (Declarative)
    Proxy->>API: Fetch Page 1 (OAuth injected in-flight)
    API-->>Proxy: Raw JSON Payload
    Note over Proxy: Transform in-memory<br>via JSONata mapping
    Proxy-->>App: Normalized Data Stream
    Note over Proxy: Payload discarded.<br>No data persisted to disk.
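The flow in the diagram can be sketched as a Python generator: each page is fetched, transformed in memory, and yielded to the consumer, and nothing is written to disk or to a queue. `fetch_page` and `transform` are hypothetical stand-ins for the provider call and the JSONata mapping; this illustrates the pattern, not any specific engine's internals.

```python
from typing import Callable, Iterator, Optional

Page = tuple[list[dict], Optional[str]]  # (records, next_cursor)


def stream_sync(
    fetch_page: Callable[[Optional[str]], Page],
    transform: Callable[[dict], dict],
) -> Iterator[dict]:
    """Yield normalized records page by page without persisting anything.

    Only the current page is referenced; it is dropped as soon as the next
    page is fetched.
    """
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(cursor)
        for raw in records:
            yield transform(raw)  # transform in memory, hand straight to the consumer
        if cursor is None:
            return


# Usage with two fake pages standing in for the third-party API
PAGES = {
    None: ([{"Id": 1, "Email": "a@example.com"}], "p2"),
    "p2": ([{"Id": 2, "Email": "b@example.com"}], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return PAGES[cursor]


def to_unified(raw: dict) -> dict:
    return {"id": str(raw["Id"]), "email": raw["Email"]}


for record in stream_sync(fake_fetch, to_unified):
    print(record)
```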

This architecture bypasses the heaviest VRA blockers. Because the integration layer does not store data at rest, the compliance burden shrinks dramatically.

Warning

Trade-off to acknowledge: Zero data retention means you cannot query previously synced data from the middleware layer itself. If your product needs local queryability (e.g., fast search across historical records), you must land data into your own datastore and manage retention there. The benefit is that you control the data lifecycle, not a third-party middleware vendor.

For a complete breakdown of passing these security reviews, see our On-Prem Deployment & Compliance Guide for SaaS Integrations.

Why a Declarative, Zero-Storage Unified API Wins

When you combine the prioritization framework above with a declarative architecture, the economics change fundamentally.

For engineering: Adding a new integration within an existing category takes days instead of weeks. The unified schema already exists. Pagination, auth, and error handling are solved at the platform level. The only work is writing JSON config and JSONata mappings - both data operations that do not require code review, CI/CD pipelines, or deployment.

For sales: Time-to-integration shrinks from "next quarter" to "next sprint." Sales can confidently commit to integration timelines because delivery risk is lower. A new HRIS provider does not require rearchitecting the data pipeline - it reuses the same sync jobs, webhook events, and error handling as every other HRIS provider.

For compliance: Zero data retention eliminates the integration layer as a data sub-processor. Your VRA responses stay clean. SOC 2 audit scope stays narrow. Customer data flows through the pipeline without being written to the middleware's storage.

For ops: Every integration follows the same operational patterns. Monitoring dashboards, alerting rules, and runbook procedures work identically whether you are syncing from Salesforce or a provider you added last week. One code path, one set of behaviors, one set of failure modes.

In the 2024 State of SaaS Integrations report, 92% of respondents said customers with integrations are less likely to churn. The faster you can ship a requested integration, the faster that deal closes and that customer sticks. A declarative, zero-storage architecture turns integrations from a linear engineering cost into a configuration exercise. The 50th integration is not meaningfully harder than the 5th.

Building Your Operational Runbook for Declarative Syncs

Transitioning to a declarative pipeline requires a structured operational runbook. It is not a design document; it is the step-by-step playbook your team follows to configure, deploy, and maintain data syncs in production without writing custom code.

Step 1: Define Your Unified Schema

Before connecting to any third-party API, define the exact schema your application expects to receive. This isolates your core product from the chaos of third-party data models.

If you are syncing CRM contacts, your application should expect a standardized object regardless of whether the data comes from Salesforce, HubSpot, or Pipedrive.

Your schema should include:

| Field | Type | Description | Example |
|---|---|---|---|
| id | string | Provider-native record ID | "003xx000001234" |
| first_name | string | Contact first name | "Jane" |
| email_addresses | array | Email objects with email and is_primary | [{"email": "jane@acme.com", "is_primary": true}] |
| custom_fields | object | Provider-specific fields not in the common model | {"industry__c": "SaaS"} |
| remote_data | object | Raw provider response (optional) | Full API response |
| created_at | ISO 8601 | Record creation timestamp | "2024-06-15T10:30:00Z" |
| updated_at | ISO 8601 | Record last-modified timestamp | "2024-11-20T14:15:00Z" |

The custom_fields escape hatch is non-negotiable. Enterprise Salesforce instances can have hundreds of custom fields, and your unified schema will never cover all of them. A good declarative system lets you customize mappings at the environment or even per-account level without changing the base schema. We cover this pattern in depth in our guide on shipping API connectors as data-only operations.
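If your application is written in Python, the schema table above might be expressed as typed dataclasses like the sketch below. `last_name` is included because the mapping examples in the next step produce it; the `from_mapped` constructor and `parse_ts` are hypothetical helpers that parse the ISO 8601 timestamps and keep `custom_fields` and `remote_data` as free-form dictionaries.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class EmailAddress:
    email: str
    is_primary: bool = False


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse ISO 8601; fromisoformat() accepts a trailing 'Z' only from Python 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


@dataclass
class UnifiedContact:
    id: str                                                      # provider-native record ID
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    email_addresses: list[EmailAddress] = field(default_factory=list)
    custom_fields: dict[str, Any] = field(default_factory=dict)  # escape hatch
    remote_data: Optional[dict[str, Any]] = None                 # raw provider response
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None

    @classmethod
    def from_mapped(cls, data: dict[str, Any]) -> "UnifiedContact":
        """Build a contact from the output of a response mapping."""
        return cls(
            id=str(data["id"]),
            first_name=data.get("first_name"),
            last_name=data.get("last_name"),
            email_addresses=[EmailAddress(**e) for e in data.get("email_addresses") or []],
            custom_fields=data.get("custom_fields") or {},
            remote_data=data.get("remote_data"),
            created_at=parse_ts(data.get("created_at")),
            updated_at=parse_ts(data.get("updated_at")),
        )
```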

Step 2: Configure Field Mappings with JSONata

With your schema defined, configure the transformation expressions that convert each provider's response format into your common model. JSONata is a functional query and transformation language specifically designed for JSON. It is Turing-complete, side-effect free, and storable as a string in a database.

Instead of writing a JavaScript function to parse a HubSpot response, you write a JSONata expression.

HubSpot Response Mapping Example (YAML Config):

response_mapping: >-
  (
    {
      "id": response.id.$string(),
      "first_name": response.properties.firstname,
      "last_name": response.properties.lastname,
      "email_addresses": [
        response.properties.email ? { "email": response.properties.email, "is_primary": true }
      ]
    }
  )

For more complex mappings, such as a Salesforce contact, a single JSONata expression handles field renaming, type coercion, array filtering, and custom field extraction:

response.{
  "id": Id.$string(),
  "first_name": FirstName,
  "last_name": LastName,
  "email_addresses": [{ "email": Email, "is_primary": true }],
  "phone_numbers": $filter([
    { "number": Phone, "type": "phone" },
    { "number": MobilePhone, "type": "mobile" }
  ], function($v) { $v.number }),
  "created_at": CreatedDate,
  "updated_at": LastModifiedDate,
  "custom_fields": $sift($, function($v, $k) { $k ~> /__c$/i and $boolean($v) })
}

This mapping is executed by the generic pipeline engine. If a customer needs a custom field mapped, you update the JSONata string for their specific environment. It can be versioned, overridden per customer, and hot-swapped. No code deployment is required.
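For readers new to JSONata, the following plain-Python function approximates what the Salesforce-style expression above computes. It is only a reading aid: the engine evaluates the JSONata string directly, and there are small semantic differences (for example, JSONata omits keys whose value is undefined, while this version keeps them as None). `map_salesforce_contact` is a hypothetical name.

```python
from typing import Any


def map_salesforce_contact(response: dict[str, Any]) -> dict[str, Any]:
    """Plain-Python approximation of the JSONata contact mapping above."""
    phones = [
        {"number": response.get("Phone"), "type": "phone"},
        {"number": response.get("MobilePhone"), "type": "mobile"},
    ]
    return {
        "id": str(response["Id"]),                      # Id.$string()
        "first_name": response.get("FirstName"),
        "last_name": response.get("LastName"),
        "email_addresses": [{"email": response.get("Email"), "is_primary": True}],
        # $filter(...): keep only phone entries whose number is truthy
        "phone_numbers": [p for p in phones if p["number"]],
        "created_at": response.get("CreatedDate"),
        "updated_at": response.get("LastModifiedDate"),
        # $sift(...): keys ending in "__c" (case-insensitive) with truthy values
        "custom_fields": {
            k: v for k, v in response.items() if k.lower().endswith("__c") and v
        },
    }


sample = {"Id": "003xx000001234", "FirstName": "Jane", "LastName": "Doe",
          "Email": "jane@acme.com", "Phone": "+1-555-0100", "MobilePhone": None,
          "Industry__c": "SaaS", "Region__c": "", "CreatedDate": "2024-06-15T10:30:00Z"}
print(map_salesforce_contact(sample))
```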

Step 3: Set Up Incremental Sync Cursors

Re-syncing tens of thousands of records on every job run will quickly exhaust API rate limits. Full syncs are expensive, so once the initial data pull completes, subsequent runs should fetch only records that changed since the last successful run.

In a declarative system, this is typically a binding on a timestamp parameter within your sync job configuration:

{
  "resource": "ticketing/tickets",
  "method": "list",
  "query": {
    "updated_at": { "gt": "{{previous_run_date}}" }
  }
}

The runtime engine automatically tracks previous_run_date as the completion timestamp of the last successful sync run for that account. On the first run, it defaults to epoch (1970-01-01T00:00:00.000Z), effectively performing a full sync.

Document in your runbook:

  • When to trigger a full re-sync: Schema changes, mapping updates, data corruption recovery.
  • How cursors are scoped: Per sync job and per integrated account (not global).
  • What happens if a run fails partway: The cursor should not advance; the next run retries from the same checkpoint.
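The three cursor rules above fit in a small class. A minimal sketch, assuming an in-memory dict stands in for durable storage and that `sync` raises on any failure; `CursorStore` and `utc_now_iso` are hypothetical names. The cursor key is the (sync job, integrated account) pair, the default is the epoch, and the cursor advances only after `sync` returns normally.

```python
from datetime import datetime, timezone
from typing import Callable

EPOCH = "1970-01-01T00:00:00.000Z"


def utc_now_iso() -> str:
    """Current UTC time as ISO 8601 with millisecond precision and a Z suffix."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


class CursorStore:
    """previous_run_date per (sync job, integrated account), never global."""

    def __init__(self) -> None:
        self._cursors: dict[tuple[str, str], str] = {}

    def get(self, job_id: str, account_id: str) -> str:
        # No successful run yet: the epoch default turns this into a full sync
        return self._cursors.get((job_id, account_id), EPOCH)

    def run(self, job_id: str, account_id: str, sync: Callable[[str], None]) -> None:
        started_at = utc_now_iso()
        sync(self.get(job_id, account_id))  # any exception propagates...
        # ...so this line only runs after a fully successful sync
        self._cursors[(job_id, account_id)] = started_at

    def reset(self, job_id: str, account_id: str) -> None:
        """Force a full re-sync (schema change, mapping update, recovery)."""
        self._cursors.pop((job_id, account_id), None)
```

One design choice differs from the engine described above: this sketch stores the run's start time rather than its completion time, so records modified while a long run is in progress are fetched again on the next run. Re-fetching a few records is cheaper than silently missing them.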

Step 4: Define Resource Dependencies

Real-world syncs are rarely flat. Comments belong to tickets. Contacts belong to accounts. Your pipeline config needs to express these relationships declaratively:

{
  "resource": "ticketing/comments",
  "method": "list",
  "depends_on": "ticketing/tickets",
  "query": {
    "ticket_id": "{{resources.ticketing.tickets.id}}"
  }
}

This tells the runtime: for each ticket fetched in the ticketing/tickets step, fetch the associated comments. The depends_on field creates the dependency graph; the placeholder syntax dynamically injects parent record fields into child queries. Your runbook should document the dependency tree for each pipeline and the expected execution order.
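A runtime could resolve `depends_on` roughly as sketched below: run a resource only after its parent has been fetched, then call the child once per parent record, substituting `{{resources.<path>.<field>}}` with that record's field. This handles only that placeholder form, and `run_pipeline` and `fetch` are hypothetical stand-ins for the engine and the provider call.

```python
import re
from typing import Any, Callable, Iterator

PLACEHOLDER = re.compile(r"\{\{resources\.(.+)\.(\w+)\}\}")


def run_pipeline(
    resources: list[dict[str, Any]],
    fetch: Callable[[str, dict[str, Any]], list[dict[str, Any]]],
) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (resource, record) pairs, running each child once per parent record."""
    fetched: dict[str, list[dict[str, Any]]] = {}
    pending = list(resources)
    while pending:
        # A step is ready when it has no parent or its parent has already run
        ready = [r for r in pending if r.get("depends_on") in (None, *fetched)]
        if not ready:
            raise ValueError("depends_on cycle or unknown parent resource")
        for step in ready:
            pending.remove(step)
            name, query = step["resource"], step.get("query", {})
            parent = step.get("depends_on")
            # Root resources run once; children run once per parent record
            contexts = fetched[parent] if parent else [{}]
            records: list[dict[str, Any]] = []
            for parent_record in contexts:
                rendered = {
                    key: PLACEHOLDER.sub(lambda m: str(parent_record[m.group(2)]), value)
                    if isinstance(value, str) else value
                    for key, value in query.items()
                }
                records.extend(fetch(name, rendered))
            fetched[name] = records
            yield from ((name, rec) for rec in records)


config = [
    {"resource": "ticketing/comments", "method": "list",
     "depends_on": "ticketing/tickets",
     "query": {"ticket_id": "{{resources.ticketing.tickets.id}}"}},
    {"resource": "ticketing/tickets", "method": "list"},
]


def fake_fetch(resource: str, query: dict[str, Any]) -> list[dict[str, Any]]:
    if resource == "ticketing/tickets":
        return [{"id": 1}, {"id": 2}]
    return [{"ticket_id": query["ticket_id"], "body": "hi"}]


for resource, record in run_pipeline(config, fake_fetch):
    print(resource, record)
```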

The Integration PRD: Lifecycle Stories and Acceptance Criteria

Most integration PRDs stop at the happy path: "user connects account, data syncs." That is roughly 20% of the work. The other 80% - token expiry, rate limiting, partial failures, API deprecation - is where integrations break in production and erode customer trust.

Every integration PRD must include lifecycle stories for these six scenarios:

Required Lifecycle Stories

1. Happy Path

  • User authorizes the connection (OAuth or API key)
  • Initial full sync completes and records appear in the product
  • Incremental syncs run on schedule and capture changes

2. Re-authentication

  • The OAuth refresh token is revoked at the provider
  • The system detects the 401, marks the account as needs_reauth
  • The end user receives a notification and re-authorizes
  • Sync resumes from the last successful cursor position

3. Rate Limiting

  • The provider returns HTTP 429 during a large sync
  • The pipeline passes the 429 through to the consumer with normalized rate limit headers
  • The consumer backs off and retries without data loss

4. Pagination Edge Cases

  • A sync runs against an account with 500K+ records
  • The cursor must not advance if a page fetch fails
  • Switching pagination formats between provider API versions does not break existing connections

5. Observability

  • Ops can view sync run status, record counts, and error rates per integrated account
  • Auth failures trigger alerts within 5 minutes
  • No PII appears in logs or metrics

6. Deprecation

  • The provider deprecates an API version
  • The integration config is updated to point to the new endpoints
  • Existing connected accounts continue working without re-authorization

Acceptance Criteria Checklist

Before any integration ships to production, verify:

  • OAuth flow tested with a real provider account (not just sandbox)
  • Full sync completes for an account with 1,000+ records
  • Incremental sync correctly filters by updated_at cursor
  • Rate limit response returns normalized headers to the caller
  • Token refresh works proactively (before expiry) and reactively (after 401)
  • needs_reauth webhook fires when refresh fails
  • Proxy API passthrough works for endpoints not covered by the unified model
  • Custom fields from the provider appear in the custom_fields object
  • remote_data contains the full original API response
  • Error expression extracts meaningful messages from provider-specific error formats

Handling Rate Limits, Retries, and Pagination at Scale

Abstracting API communication sounds great in theory, but the operational realities of third-party APIs still apply. APIs go down, tokens expire, and rate limits are exceeded.

How Declarative Systems Handle Pagination

A resilient declarative engine abstracts pagination entirely. The pipeline config specifies the pagination strategy (cursor-based, offset-based, page-number, link header) and the response field that contains the next cursor. The runtime handles the loop. You never write while (cursor) { ... } again.

Different providers use different pagination schemes. HubSpot uses cursor-based pagination with an after parameter. Salesforce uses SOQL query locators. Zendesk uses cursor-based pagination with page[after]. A good declarative system encodes these differences in the integration config rather than in your application code, and streams the normalized records back to your application as a single continuous data stream.
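One loop can serve several pagination schemes when the scheme is data. The sketch below assumes hypothetical config keys (`strategy`, `records_field`, `cursor_field`, `cursor_param`, `size_param`, `page_size`) and covers cursor, offset, and page-number pagination; link headers and Salesforce query locators would need additional branches. The HubSpot-style config in the usage reads the `results` and `paging.next.after` fields of HubSpot's CRM list responses.

```python
from typing import Any, Callable, Iterator

STRATEGIES = {"cursor", "offset", "page"}


def dig(data: Any, path: str) -> Any:
    """Follow a dotted path like 'paging.next.after'; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def paginate(get: Callable[[dict[str, Any]], dict[str, Any]],
             cfg: dict[str, Any]) -> Iterator[Any]:
    """Yield every record; the pagination strategy comes from cfg, not code."""
    strategy = cfg["strategy"]
    if strategy not in STRATEGIES:
        raise ValueError(f"unsupported pagination strategy: {strategy}")
    size = cfg.get("page_size", 100)
    params: dict[str, Any] = {cfg.get("size_param", "limit"): size}
    if strategy == "offset":
        params["offset"] = 0
    elif strategy == "page":
        params["page"] = 1
    while True:
        body = get(dict(params))  # pass a copy so callers can't mutate our state
        records = dig(body, cfg["records_field"]) or []
        yield from records
        if strategy == "cursor":
            cursor = dig(body, cfg["cursor_field"])
            if not cursor:
                return  # no next cursor: last page
            params[cfg["cursor_param"]] = cursor
        elif len(records) < size:
            return  # short page: offset/page strategies are done
        elif strategy == "offset":
            params["offset"] += size
        else:
            params["page"] += 1


hubspot = {"strategy": "cursor", "records_field": "results", "page_size": 2,
           "cursor_field": "paging.next.after", "cursor_param": "after"}
pages = {None: {"results": [{"id": "1"}, {"id": "2"}], "paging": {"next": {"after": "2"}}},
         "2": {"results": [{"id": "3"}]}}
print([r["id"] for r in paginate(lambda p: pages[p.get("after")], hubspot)])
```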

Rate Limits: You Own the Backoff

Many unified API platforms claim to "handle" rate limits automatically by absorbing the errors and silently retrying on their own servers. This is a dangerous anti-pattern for enterprise systems. Silent retries mask underlying architectural flaws, cause unpredictable latency spikes, and often result in the middleware storing your data in a queue while it waits for the rate limit window to reset—violating zero-data-retention requirements.

A more resilient architecture takes the opposite approach: pass rate limit errors directly to the caller, but normalize the metadata.

When an upstream API returns HTTP 429 (Too Many Requests), your pipeline should pass that error directly back to your application. Because every API formats rate limit headers differently, the pipeline normalizes the upstream rate limit information into the standardized headers defined in the IETF RateLimit header fields draft:

  • ratelimit-limit: The maximum number of requests permitted in the current window.
  • ratelimit-remaining: The number of requests remaining in the current window.
  • ratelimit-reset: The number of seconds until the current window resets.

Your runbook should specify how your application reads these headers:

  • Backoff strategy: Implement exponential backoff with jitter (typically base * 2^attempt + random(0, 1000)ms) in your application's state machine.
  • Max retries: 3-5 for transient errors, 0 for auth failures.
  • Alerting thresholds: Notify ops when a sync job hits rate limits more than N times in a single run.
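On the consumer side, the policy above might look like the following sketch. It assumes `ratelimit-reset` carries the number of seconds until the window resets, as in the IETF RateLimit header fields draft (if your pipeline emits an epoch timestamp instead, subtract the current time first), that header names arrive lowercase, and that `request` is a hypothetical callable returning `(status, headers, body)`. The 500 ms base delay is illustrative.

```python
import random
import time
from typing import Any, Callable, Mapping

MAX_RETRIES = 4        # transient errors: within the 3-5 range above
BASE_DELAY_MS = 500    # illustrative base for exponential backoff

Response = tuple[int, Mapping[str, str], Any]  # (status, headers, body)


class AuthError(Exception):
    """Auth failures get zero retries: the account needs re-authorization."""


def backoff_ms(attempt: int, headers: Mapping[str, str]) -> float:
    """Prefer the normalized reset hint; else base * 2^attempt + jitter."""
    reset = headers.get("ratelimit-reset")
    if reset is not None:
        return float(reset) * 1000  # assumed: seconds until the window resets
    return BASE_DELAY_MS * 2 ** attempt + random.uniform(0, 1000)


def call_with_retries(request: Callable[[], Response],
                      sleep: Callable[[float], None] = time.sleep) -> tuple[int, Any]:
    """Retry 429 and 5xx responses; return (status, body) for anything else."""
    for attempt in range(MAX_RETRIES + 1):
        status, headers, body = request()
        if status == 401:
            raise AuthError("HTTP 401: mark the account needs_reauth")
        transient = status == 429 or status >= 500
        if not transient:
            return status, body
        if attempt < MAX_RETRIES:
            sleep(backoff_ms(attempt, headers) / 1000)
    raise RuntimeError(f"giving up after {MAX_RETRIES} retries (last status {status})")
```

Injecting `sleep` keeps the policy testable without real waits; in production it defaults to `time.sleep`.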

Error Handling Modes

Declarative sync pipelines typically offer two error handling strategies that must be documented in your runbook:

  1. Ignore and continue (default): Log the error, emit an error event, and proceed to the next resource. This is the right default for large syncs where a single 404 on one record shouldn't abort 10,000 successful ones.
  2. Fail fast: Halt the entire pipeline on the first error. Use this for critical syncs (like financial reconciliation) where partial data is worse than no data.
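The two modes differ only in what happens when a resource fails. A sketch, assuming each step is a `(resource, fetch)` pair and events are plain dicts named after the webhook events listed later in this runbook; `run_steps` and `PipelineFailed` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator


class PipelineFailed(Exception):
    """Raised in fail-fast mode; maps to a sync_job_run:failed event."""


def run_steps(steps: Iterable[tuple[str, Callable[[], list]]],
              mode: str = "ignore") -> Iterator[dict]:
    """Yield record and error events; mode is 'ignore' (default) or 'fail_fast'."""
    for resource, fetch in steps:
        try:
            for record in fetch():
                yield {"event": "sync_job_run:record", "resource": resource, "data": record}
        except Exception as exc:
            if mode == "fail_fast":
                raise PipelineFailed(f"{resource}: {exc}") from exc
            # Ignore and continue: report the error, then move to the next resource
            yield {"event": "sync_job_run:record_error", "resource": resource,
                   "error": str(exc)}
```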

Deploying and Monitoring Compliant Integrations

Once the declarative syncs are configured, the final phase of the runbook is operational monitoring and credential management.

Proactive OAuth Token Management

OAuth access tokens typically expire after 30 to 60 minutes. If a long-running sync job is executing when a token expires, the job will fail. Waiting for a 401 Unauthorized error before attempting a refresh is a reactive, fragile strategy.

Your operational runbook must mandate proactive token renewal. A well-designed system schedules renewal ahead of expiry: immediately after an OAuth token is acquired, the platform schedules an alarm to fire 60 to 180 seconds before the token's expiration time. When the alarm fires, the system proactively negotiates a new access token.
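Scheduling the alarm is a small calculation on the token's `expires_in` value (the lifetime in seconds that OAuth 2.0 token responses commonly include). The sketch below picks a random lead time between 60 and 180 seconds so that refreshes for many accounts do not all fire at once, and uses `threading.Timer` as a stand-in for whatever durable scheduler your platform uses; `refresh_delay` and `schedule_refresh` are hypothetical names.

```python
import random
import threading
from typing import Callable, Optional

MIN_LEAD_S, MAX_LEAD_S = 60, 180  # refresh 60-180 s before expiry


def refresh_delay(expires_in: float, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before refreshing a token that expires in `expires_in` seconds."""
    lead = (rng or random).uniform(MIN_LEAD_S, MAX_LEAD_S)
    # Very short-lived tokens: refresh now instead of scheduling in the past
    return max(0.0, expires_in - lead)


def schedule_refresh(expires_in: float, refresh: Callable[[], None]) -> threading.Timer:
    """Arm a one-shot alarm right after a token is acquired."""
    timer = threading.Timer(refresh_delay(expires_in), refresh)
    timer.daemon = True
    timer.start()
    return timer
```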

Handling Concurrency with Mutex Locks

In enterprise environments, multiple sync jobs, webhooks, and user requests might use the same integrated account simultaneously. When the token expires, you risk a race condition where five concurrent processes all attempt to refresh it at the same moment. Providers that rotate refresh tokens may treat reuse of an already-consumed refresh token as a sign of compromise and revoke it, leaving you with a disconnected customer.

To prevent this, the token refresh logic must be protected by a distributed mutex lock. When the first process detects an expired token (or the proactive alarm fires), it acquires a lock for that specific account ID. Subsequent concurrent requests see the lock and await the resolution of the first operation. Once the new token is acquired, the lock is released, and all pending requests proceed using the fresh credentials.
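Within a single process, the same single-flight pattern can be sketched with one `asyncio.Lock` per account, as below. This is a simplification: a lock held in memory does not coordinate separate workers, so production systems need a distributed lock keyed by the same account ID. `TokenManager` and the `refresh` callable are hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable


class TokenManager:
    """Single-flight token refresh per integrated account.

    asyncio.Lock only coordinates coroutines in one process; with several
    workers, replace it with a distributed lock keyed by the same account ID.
    """

    def __init__(self, refresh: Callable[[str], Awaitable[str]]) -> None:
        self._refresh = refresh  # hypothetical call to the provider's token endpoint
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._tokens: dict[str, str] = {}

    async def fresh_token(self, account_id: str, stale_token: str) -> str:
        """Return a token newer than stale_token, refreshing at most once."""
        async with self._locks[account_id]:
            current = self._tokens.get(account_id)
            if current is not None and current != stale_token:
                return current  # another caller refreshed while we waited
            new_token = await self._refresh(account_id)
            self._tokens[account_id] = new_token
            return new_token
```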

Your runbook should also cover retry policies for refresh failures. Retryable errors (5xx) get automatic multi-hour backoff; non-retryable errors (such as invalid_grant) stop retrying immediately, mark the account as needs_reauth, and surface a re-authentication prompt to the end user. For a deep dive on this topic, see our article on handling OAuth token refresh failures in production.
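The retry policy can be expressed as a function from the token endpoint's HTTP status to an action. In this sketch the one-minute starting delay and four-hour cap are illustrative values for the multi-hour backoff, 429 is treated as transient alongside 5xx, and `next_refresh_action` is a hypothetical name. Because RFC 6749 returns `invalid_grant` with HTTP 400, the sketch treats every other 4xx as non-retryable rather than matching on 401 alone.

```python
from typing import Optional

BASE_RETRY_S = 60           # illustrative: first retry after one minute
MAX_RETRY_S = 4 * 60 * 60   # illustrative cap: four hours between attempts


def next_refresh_action(status: int, attempt: int) -> tuple[str, Optional[int]]:
    """Map a failed refresh (HTTP status, zero-based attempt) to (action, delay_seconds)."""
    if status >= 500 or status == 429:
        # Provider-side or throttling failure: retry with a growing, capped delay
        return "retry", min(BASE_RETRY_S * 2 ** attempt, MAX_RETRY_S)
    # Any other 4xx (invalid_grant arrives as HTTP 400 under RFC 6749, an invalid
    # client may arrive as 401): retrying cannot succeed, so prompt the end user
    return "needs_reauth", None
```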

Monitoring Without Logging Sensitive Payloads

Enterprise compliance requires strict auditability. Security teams need to know exactly who connected an account, when a sync job ran, and what errors occurred. However, you cannot log the actual data payloads, or you violate the zero-data-retention policy.

Compliance-aware monitoring means your runbook must specify that the platform only logs metadata:

  • Sync job run ID, start time, end time, and status.
  • Target API endpoint and HTTP method.
  • Record counts per resource (fetched, emitted, errored).
  • Normalized error messages and HTTP status codes (extracted via JSONata error expressions, stripping out response bodies containing PII).
  • Token refresh events and rate limit occurrences.

If an API returns a 400 Bad Request because a specific email address is malformed, the error expression should extract the structural reason for the failure without logging the actual email address into your observability stack.
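An allow-list keeps payloads out of logs more reliably than trying to strip them afterward. A minimal sketch, assuming run metadata arrives as a flat dict; the key names and `log_line` are illustrative, and the email regex is only a coarse backstop behind error expressions that extract structural reasons in the first place.

```python
import json
import re
from typing import Any

# Only these keys may reach the observability stack
ALLOWED_KEYS = {"run_id", "started_at", "ended_at", "status", "endpoint", "method",
                "http_status", "fetched", "emitted", "errored", "error"}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def redact(message: str) -> str:
    """Replace anything that looks like an email address before logging."""
    return EMAIL.sub("<redacted-email>", message)


def log_line(event: dict[str, Any]) -> str:
    """Serialize allow-listed metadata only; response bodies never pass through."""
    safe = {k: v for k, v in event.items() if k in ALLOWED_KEYS}
    if isinstance(safe.get("error"), str):
        safe["error"] = redact(safe["error"])
    return json.dumps(safe, sort_keys=True)
```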

Webhook Delivery for Real-Time Sync Events

Declarative sync pipelines emit structured webhook events that your application can consume in real-time:

| Event | When | Contains |
|---|---|---|
| sync_job_run:started | Pipeline begins execution | Job ID, account ID, timestamp |
| sync_job_run:record | Each record is fetched and normalized | Unified record data |
| sync_job_run:record_error | A single record fetch fails | Error details, resource, HTTP status |
| sync_job_run:completed | Pipeline finishes successfully | Summary counts, duration |
| sync_job_run:failed | Pipeline aborts (fail-fast mode) | Error details |
| sync_job_run:rate_limited | Upstream returns 429 | Provider, resource, retry-after |

Your application receives these events, writes the record data to your own datastore, and manages retention according to your own policies. The integration middleware never persists the data.

sequenceDiagram
    participant App as Your Application
    participant Engine as Sync Engine
    participant API as Third-Party API

    App->>Engine: Trigger sync run<br>(job_id, account_id)
    Engine->>API: Fetch page 1 (with auth, pagination)
    API-->>Engine: Response (records + cursor)
    Engine->>App: Webhook: sync_job_run:record<br>(normalized data)
    Engine->>API: Fetch page 2
    API-->>Engine: Response (records + cursor)
    Engine->>App: Webhook: sync_job_run:record<br>(normalized data)
    Engine->>App: Webhook: sync_job_run:completed
    Note over Engine: No data persisted<br>in middleware
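A receiving endpoint only needs to route these events and own persistence. The sketch below assumes a hypothetical payload shape (`event`, plus a `payload` carrying `resource`, `record`, and `account_id`) and uses a dict as the application's datastore; `make_handler` is a hypothetical name, and the sketch omits webhook signature verification and retention enforcement, which a production handler would need.

```python
from typing import Any, Callable

Store = dict[tuple[str, str], dict[str, Any]]  # (resource, record id) -> record


def make_handler(store: Store,
                 alert: Callable[[str], None]) -> Callable[[dict[str, Any]], None]:
    """Route sync_job_run:* webhook events; the application owns persistence."""

    def handle(event: dict[str, Any]) -> None:
        kind, payload = event["event"], event.get("payload", {})
        if kind == "sync_job_run:record":
            # Upsert by (resource, id) so a re-delivered event is harmless
            record = payload["record"]
            store[(payload["resource"], record["id"])] = record
        elif kind in ("sync_job_run:failed", "sync_job_run:rate_limited"):
            alert(f"{kind} for account {payload.get('account_id')}")
        # started, completed, record_error: metrics only in this sketch

    return handle
```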

Quick Ship Playbook: From Audit to Production

This is the condensed, action-oriented version of everything above. Follow these six phases to go from "sales asked for it" to "customers are using it."

Phase 1: Audit (Day 1)

Pull every integration mention from your CRM:

  • Closed-lost deal notes citing missing integrations
  • Feature requests from existing customers
  • Competitive loss reports where integrations were a factor

Compile into a single ranked list using the prioritization framework from the earlier section.

Phase 2: Rank and Select (Day 2)

Score each integration across revenue impact, request frequency, and technical overlap. Group candidates by unified model category (CRM, HRIS, ATS, Accounting, Ticketing). Pick the top-scored integration within a category you already support - this minimizes schema design work and maximizes reuse.

Phase 3: Define the Unified Model (Days 3-5)

If your target category already has a unified schema, skip this step. If not, define the schema:

  • Identify the 5-7 core resources (e.g., for HRIS: employees, departments, locations, time-off requests)
  • Define the canonical fields for each resource
  • Include custom_fields and remote_data escape hatches for provider-specific data

Phase 4: Run the POC (Days 5-10)

Set up the integration config and measure time-to-first-sync:

  1. Configure the provider's base URL, auth scheme, and resource endpoints as JSON config
  2. Write JSONata response mappings for each resource
  3. Write JSONata query mappings for filters and search
  4. Connect a real account and verify data flows end-to-end

If time-to-first-sync exceeds 3 days for a provider within an existing category, investigate what is blocking you.

Phase 5: Ship (Days 10-15)

  • Enable the integration in production
  • Verify proactive token refresh works
  • Confirm sync job webhooks fire correctly
  • Document any provider-specific quirks in your internal runbook

Phase 6: Escape Hatch (Ongoing)

Not every API endpoint fits neatly into a unified model. A Proxy API is your escape hatch - it lets your application make arbitrary calls through the connected account's credentials without requiring a unified mapping.

Document which endpoints are covered by the unified model and which require proxy passthrough. As usage patterns emerge, promote frequently-used proxy calls into proper unified model resources.
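Deciding which proxy calls to promote is easier with data. The sketch below counts proxy calls per endpoint and flags the busiest ones. The class name and the threshold are hypothetical; in practice these counts would come from your request logs or metrics pipeline.

```python
from __future__ import annotations

from collections import Counter


class ProxyUsageTracker:
    """Counts proxy passthrough calls per endpoint to find promotion candidates."""

    def __init__(self, promote_threshold: int = 100) -> None:
        # Hypothetical threshold: calls per review window before an endpoint
        # is worth modeling as a unified resource.
        self.promote_threshold = promote_threshold
        self.calls: Counter[tuple[str, str]] = Counter()

    def record(self, method: str, path: str) -> None:
        self.calls[(method.upper(), path)] += 1

    def promotion_candidates(self) -> list[tuple[str, str]]:
        # most_common() orders by call count, busiest endpoints first.
        return [ep for ep, n in self.calls.most_common() if n >= self.promote_threshold]
```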

| Milestone | Existing Category | New Category |
|---|---|---|
| POC (first sync working) | 1-3 days | 5-10 days |
| Shippable connector (core CRUD) | 3-5 days | 10-15 days |
| Production-hardened (edge cases, error handling) | 1-2 weeks | 2-3 weeks |
| Iterate (custom fields, related resources, webhooks) | Ongoing | Ongoing |

These timelines assume a declarative pipeline architecture. With hand-rolled scripts, multiply each estimate by 3-5x.

Examples: Tenant Overrides, Passthrough, and Pagination

Tenant Override: Custom Field Mapping for a Single Customer

Enterprise customers customize their CRM instances heavily. One customer's Salesforce org might have a Territory__c field that needs to appear in the unified contact schema. Instead of modifying the base mapping for all customers, a declarative system lets you apply an override at the account level:

{
  "unified_model_override": {
    "crm": {
      "contacts": {
        "list": {
          "response_mapping": "response.{ \"id\": Id.$string(), \"first_name\": FirstName, \"last_name\": LastName, \"territory\": Territory__c, \"custom_fields\": $sift($, function($v, $k) { $k ~> /__c$/i and $boolean($v) }) }"
        }
      }
    }
  }
}

This override is stored on the integrated account record. It deep-merges with the base mapping, so all standard fields continue working. Only this customer's account sees the territory field in the response. No code change, no deployment, no risk to other customers.
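A recursive deep merge is one way such an override could be combined with the base config. This is a sketch under assumed semantics: nested objects merge key by key, and any non-object value in the override (such as a `response_mapping` string) replaces the base value wholesale.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base.

    Nested dicts merge key by key; any other value in override (strings,
    lists, numbers) replaces the base value wholesale.
    """
    merged = deepcopy(base)  # never mutate the shared base config
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged
```

Under these semantics, the override's `response_mapping` string replaces the base string instead of merging into it. That is one reason the example above restates `id`, `first_name`, and `last_name` alongside `territory`. Sibling keys, such as the `get` method or other resources, are left untouched.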

Passthrough via Proxy API

When a customer needs access to a provider-specific endpoint that the unified model does not cover - like HubSpot's deal pipeline stage history - use the Proxy API:

GET /proxy/deal-pipeline-stages?integrated_account_id=abc123

The request passes through the connected account's authentication and rate limit handling, but skips the unified mapping layer entirely. The raw provider response comes back as-is. Your product can consume unified data for standard operations and fall back to the proxy for provider-specific features, all through the same authenticated connection.
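From the application side, the choice between the unified and proxy paths can live in one small routing helper. The base URL, the `/unified/` prefix, and the coverage set below are all hypothetical. Only the `/proxy/...?integrated_account_id=` shape mirrors the request shown above.

```python
from __future__ import annotations

from urllib.parse import urlencode

API_BASE = "https://api.example.com"  # hypothetical base URL for your unified API
# Hypothetical coverage list, kept in sync with your runbook.
UNIFIED_RESOURCES = {"crm/contacts", "crm/companies"}


def resource_url(resource: str, integrated_account_id: str, **params: object) -> str:
    """Route to the unified endpoint when covered, else to the raw proxy."""
    path = resource.strip("/")
    prefix = "unified" if path in UNIFIED_RESOURCES else "proxy"
    query = urlencode({"integrated_account_id": integrated_account_id, **params})
    return f"{API_BASE}/{prefix}/{path}?{query}"
```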

Pagination Handling in Config

Different providers paginate differently. In a declarative system, pagination is a configuration property, not code:

Cursor-based (HubSpot-style):

{
  "pagination": {
    "format": "cursor",
    "config": {
      "cursor_field": "paging.next.after",
      "cursor_param": "after",
      "page_size_param": "limit",
      "default_page_size": 100
    }
  }
}

Offset-based (Jira-style):

{
  "pagination": {
    "format": "offset",
    "config": {
      "offset_param": "startAt",
      "page_size_param": "maxResults",
      "default_page_size": 50,
      "total_field": "total"
    }
  }
}

The runtime engine reads the pagination format from config and applies the correct strategy. Your application receives a continuous stream of results with a next_cursor token - regardless of how the upstream provider actually paginates.
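Here is a minimal sketch of how a runtime engine might interpret the two pagination configs above. It assumes a caller-supplied `fetch(params)` that performs one HTTP call and returns the parsed JSON page, and a hypothetical `items_field` naming the record array (for example `"results"` for HubSpot or `"issues"` for Jira search). Exposing a `next_cursor` to clients is not shown.

```python
from __future__ import annotations

from typing import Any, Callable, Iterator


def get_path(obj: Any, dotted: str) -> Any:
    """Resolve a dotted path such as "paging.next.after"; None if absent."""
    for part in dotted.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def paginate(pagination: dict, fetch: Callable[[dict], dict],
             items_field: str) -> Iterator[Any]:
    """Yield every record, choosing the strategy from config alone."""
    cfg = pagination["config"]
    params: dict[str, Any] = {cfg["page_size_param"]: cfg["default_page_size"]}

    if pagination["format"] == "cursor":
        while True:
            page = fetch(dict(params))
            yield from page.get(items_field, [])
            cursor = get_path(page, cfg["cursor_field"])
            if not cursor:
                return  # no next cursor: last page
            params[cfg["cursor_param"]] = cursor

    elif pagination["format"] == "offset":
        offset = 0
        while True:
            params[cfg["offset_param"]] = offset
            page = fetch(dict(params))
            items = page.get(items_field, [])
            yield from items
            offset += len(items)
            total = get_path(page, cfg["total_field"])
            if not items or (total is not None and offset >= total):
                return  # empty page, or every record reported by "total" seen

    else:
        raise ValueError(f"unknown pagination format: {pagination['format']}")
```

Both branches yield plain records, so downstream code never needs to know which strategy the provider uses.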

Zero-Storage Sync Configuration

A complete sync job configuration that fetches CRM contacts incrementally with zero data retention in the middleware:

{
  "integrated_account_id": "abc123",
  "sync_config": {
    "resources": [
      {
        "resource": "crm/contacts",
        "method": "list",
        "query": {
          "updated_at": { "gt": "{{previous_run_date}}" }
        }
      }
    ]
  },
  "trigger": {
    "type": "cron",
    "expression": "0 */6 * * *"
  },
  "webhook_url": "https://your-app.com/webhooks/sync"
}

Every fetched record is transformed in memory via JSONata, emitted to your webhook endpoint, and discarded. The middleware stores only the sync cursor and job metadata - never the customer's data.
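A single sync run under this model can be sketched as follows. `fetch`, `transform`, and `emit` are injected stand-ins for the HTTP call, the JSONata mapping, and the webhook POST. The epoch default for the first run and the one-cursor-per-account keying are assumptions.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, MutableMapping

PLACEHOLDER = "{{previous_run_date}}"
EPOCH = "1970-01-01T00:00:00Z"  # assumed cursor for the very first run


def render(template: Any, previous_run_date: str) -> Any:
    """Recursively substitute the cursor placeholder in a query template."""
    if isinstance(template, dict):
        return {k: render(v, previous_run_date) for k, v in template.items()}
    if isinstance(template, str):
        return template.replace(PLACEHOLDER, previous_run_date)
    return template


def run_sync(job: dict,
             cursors: MutableMapping[str, str],
             fetch: Callable[[str, str, dict], Iterable[dict]],
             transform: Callable[[dict], dict],
             emit: Callable[[str, str, dict], None],
             started_at: str) -> int:
    """One run: fetch, transform in memory, emit, discard. Returns records emitted."""
    key = job["integrated_account_id"]  # assumption: one cursor per account
    previous = cursors.get(key, EPOCH)
    emitted = 0
    for spec in job["sync_config"]["resources"]:
        query = render(spec.get("query", {}), previous)
        for record in fetch(spec["resource"], spec["method"], query):
            # The record lives only in this loop iteration; nothing is persisted.
            emit(job["webhook_url"], spec["resource"], transform(record))
            emitted += 1
    # Advance the cursor only after every emit succeeded, and to the run's
    # start time so records changed mid-run are picked up next time.
    cursors[key] = started_at
    return emitted
```

If `emit` raises partway through, the cursor is not advanced and the next run delivers the same records again. This at-least-once behavior is why the webhook consumer must process records idempotently.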

Your Runbook Checklist

Before deploying any declarative sync pipeline to production, your runbook should have documented answers to each of the following:

  • Unified schema defined for each resource category (CRM, HRIS, ticketing, etc.)
  • Field mappings configured and tested for each provider you support
  • Incremental sync cursors enabled with documented reset procedures
  • Error handling mode specified per pipeline (ignore vs. fail-fast)
  • Rate limit backoff strategy implemented in your webhook consumer
  • OAuth token refresh monitored with alerting on needs_reauth events
  • Webhook delivery endpoint deployed with idempotent record processing
  • Monitoring dashboards showing sync run status, error rates, and latency
  • Compliance documentation confirming zero data retention in the middleware layer
  • Scheduled triggers configured (cron expressions for recurring syncs)
  • Full re-sync procedure documented for disaster recovery scenarios
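For the idempotent-processing item, one common pattern is an upsert keyed by the record's natural ID that ignores deliveries no newer than what is already stored. The sketch below assumes every record carries `id` and an ISO-8601 `updated_at` in a single consistent UTC format, so that string comparison orders them correctly. A production version would use a database upsert rather than an in-memory dict.

```python
from __future__ import annotations

from typing import Any


class IdempotentConsumer:
    """Applies webhook records so redeliveries and stale retries are no-ops."""

    def __init__(self) -> None:
        # Keyed by (resource, record id); stands in for a database table.
        self.store: dict[tuple[str, str], dict[str, Any]] = {}

    def handle(self, resource: str, record: dict[str, Any]) -> bool:
        """Return True if the record changed state, False if it was ignored."""
        key = (resource, record["id"])
        current = self.store.get(key)
        if current is not None and current["updated_at"] >= record["updated_at"]:
            return False  # duplicate or out-of-order delivery
        self.store[key] = record  # upsert keyed by the record's natural id
        return True
```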

Scaling Integrations Without Scaling Headcount

The transition from imperative scripts to declarative data pipelines is not just a technical refactor; it is a strategic necessity for B2B SaaS companies moving upmarket.

Building all of this infrastructure from scratch—a generic execution engine, declarative configs for every provider, field mapping with a transformation language, pagination abstraction, token lifecycle management, and webhook delivery—would consume your engineering team for quarters. That is the practical argument for using a unified API platform that already implements this architecture.

Truto uses a generic execution engine driven entirely by JSON configuration and JSONata expressions. Adding a new provider is a data operation, not a code deployment. The platform acts as a pass-through proxy with zero data retention, which keeps third-party payloads out of storage in the integration layer and keeps your SOC 2 audit scope narrow.

Enterprise buyers will not compromise on security, and they will not wait six months for your engineering team to build custom connectors. By adopting a zero-data-retention architecture, defining integrations purely as configuration data, and standardizing rate limit and authentication handling, you can grow your integration catalog without growing your engineering headcount at the same rate.

You eliminate the maintenance burden of custom code, remove a common blocker in strict vendor risk assessments, and free your engineering team to focus on building your core product.

FAQ

What is a declarative data sync pipeline?
A declarative data sync pipeline defines what data to fetch and how to map it using configuration (JSON/YAML) rather than imperative code for each provider. A generic execution engine reads that configuration and handles pagination, auth, and error handling.
How does zero data retention help with SOC 2 compliance?
When your integration middleware processes data in flight without caching customer payloads, that layer holds no stored third-party data to cover with retention policies or to expose in a breach, which substantially shrinks your SOC 2 audit scope for that layer.
How should a unified API handle rate limits?
Instead of silently absorbing errors in a black-box queue, a reliable unified API passes HTTP 429 errors directly to the caller and normalizes upstream rate limit data into standardized headers (based on the IETF RateLimit header fields draft), so your application controls backoff predictably.
What is incremental syncing in a data pipeline?
Incremental syncing uses a cursor (typically the timestamp of the last successful run) to fetch only records that changed since the previous sync. Compared to full re-syncs, this sharply reduces the number of API calls, which lowers the risk of exhausting rate limits and cuts processing time.
Why do stateful integration platforms fail enterprise security reviews?
Platforms that cache third-party payload data act as sub-processors. This increases vendor risk and complicates SOC 2 or HIPAA compliance for enterprise procurement teams, often leading to blocked deals.
