Developer Tutorial: How to Build JSONata Mappings for API Integrations

Learn how to replace hardcoded API adapters with declarative JSONata mappings and build ETL-free data sync pipelines. A step-by-step guide covering field mapping, transforms, incremental checkpointing, and error normalization.

Sidharth Verma · 22 min read

If you lead engineering or product at a B2B SaaS company, you already know that building integrations is a financial drain. And if you've ever maintained a mapHubSpotContact() function next to a mapSalesforceContact() function next to twenty more, you know where the money goes. Your team spends weeks writing custom code to connect with third-party APIs, then spends the rest of the year maintaining those connections as vendors deprecate endpoints or change their pagination strategies.

Hardcoded per-vendor adapters do not scale. Most engineering leaders eventually rip them out, and JSONata - a declarative, JSON-native transformation language - is what makes that practical: it lets you express the same mappings as data instead of code. Teams that go looking for a step-by-step JSONata mapping tutorial are usually trying to escape this maintenance trap. They want to stop writing brittle API adapters in Python or Node.js and start treating API integration as a data transformation problem.

This tutorial walks through how to write production-grade JSONata mappings for response transformation, custom object handling, query translation, and error normalization, with concrete, working code samples you can lift straight into a connector.

The goal is not to teach you JSONata syntax from scratch (the official docs do that well). The goal is to show you the patterns that actually hold up when a vendor ships a breaking change at 2am on a Friday.

Why Hardcoded API Integrations Fail at Scale

Most engineering teams start building integrations using the strategy pattern. They define a common interface in their application and write a separate adapter class for every third-party API. You write a mapHubSpotContact() function. It works. Then you add Salesforce, and you write mapSalesforceContact(). Then Pipedrive, Zoho, and Close.

The strategy pattern looks fine when you have three integrations. It rots once you cross ten. Every connector ends up with its own adapter class, its own field-name translation, its own pagination quirks, its own date format, and its own custom error envelope. The codebase becomes littered with conditional logic.

// The integration maintenance nightmare
function normalizeContact(provider, rawData) {
  if (provider === 'hubspot') {
    return {
      id: rawData.id,
      first_name: rawData.properties.firstname,
      last_name: rawData.properties.lastname
    };
  } else if (provider === 'salesforce') {
    return {
      id: rawData.Id,
      first_name: rawData.FirstName,
      last_name: rawData.LastName
    };
  }
  // 50 more else-if blocks follow...
}

The branching logic quietly metastasizes across the codebase until a one-line vendor schema change becomes a two-week task that touches six files. The cost shows up in two places: build and maintain.

  • Build cost: A Forrester Total Economic Impact study commissioned by MuleSoft found that, before automating with a platform, organizations spent roughly 168 hours of developer time per API integration on average. That is a full month of senior engineering effort for one connector you haven't even shipped a feature on yet.
  • Maintain cost: Vendors deprecate endpoints, rotate auth schemes, and ship undocumented field changes constantly. Industry estimates put the annual cost of maintaining a single custom integration between $10,000 and $100,000 depending on complexity, instability, and change frequency.

There's a deeper architectural problem too. Code-per-integration architectures grow maintenance burden linearly with the number of integrations. A bug fix in your Salesforce handler does not help the HubSpot handler. A pagination improvement for Pipedrive does not propagate to Zoho. Every connector is its own little fiefdom that has to be tested, deployed, and on-call'd separately.

API schema normalization is arguably the hardest problem in B2B product integrations because software vendors fundamentally disagree on how to model reality. In HubSpot, a Contact is a relatively flat object. In Salesforce, the schema is a sprawling web of standard and custom objects. The escape hatch is to stop treating integrations as code and start treating them as configuration.
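
To make "integrations as configuration" concrete before JSONata enters the picture, here is a minimal sketch in plain JavaScript. The FIELD_MAPS table and the getPath helper are hypothetical names used only for illustration; a production system would store JSONata expressions rather than simple dot paths, but the shape of the idea is the same.

```javascript
// Hypothetical per-provider field maps: unified field -> dot path in the raw payload.
// In a real system these would be rows in a database, not constants in code.
const FIELD_MAPS = {
  hubspot: { id: "id", first_name: "properties.firstname", last_name: "properties.lastname" },
  salesforce: { id: "Id", first_name: "FirstName", last_name: "LastName" },
};

// Walk a dot-separated path, returning undefined if any segment is missing.
function getPath(obj, path) {
  return path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// One generic function replaces every provider-specific branch.
function normalizeContact(provider, rawData) {
  const fieldMap = FIELD_MAPS[provider];
  if (!fieldMap) throw new Error(`No mapping configured for provider: ${provider}`);
  return Object.fromEntries(
    Object.entries(fieldMap).map(([unified, path]) => [unified, getPath(rawData, path)])
  );
}
```

Adding a provider in this sketch means adding a data entry, not another else-if branch. Dot paths run out of expressive power quickly (no conditionals, no array reshaping), which is the gap a transformation language fills.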

What Is JSONata and Why It's the Right Tool

JSONata is a declarative, open-source query and transformation language purpose-built for JSON data, inspired by the location-path semantics of XPath 3.1. It's a declarative functional language based on the map/filter/reduce paradigm, exposed through a lightweight syntax that lets you focus on the intention of the query rather than the programming constructs that control evaluation.

Unlike jq, which is highly optimized for command-line parsing but difficult to read, or MuleSoft's DataWeave, which is proprietary, JSONata is open-source and highly expressive. In practical terms, that means a JSONata expression is a string. You can store it in a database column, version-control it, hot-swap it without redeploying, and evaluate it at runtime against any JSON payload.

Why this matters for API integrations:

  • No deploys to ship a mapping change. A new field shows up in HubSpot? Update one database row.
  • Pure functions, no side effects. Easy to test, easy to reason about, impossible to leak state between requests.
  • Turing-complete. Conditionals, recursion, string manipulation, array transforms, and custom functions - if you can describe a transformation, you can express it entirely as string-based configuration.
  • Battle-tested at enterprise scale. IBM z/OS Connect Designer uses JSONata as an open source expression language for querying and transforming JSON data, and AWS Step Functions exposes JSONata function libraries for String, Numeric, Aggregation, Boolean, Array, Object, Date/Time, and High Order operations. Stedi also positions JSONata as the core transformation engine for EDI and B2B data exchanges. This is not a niche tool.

By treating API integration as a data transformation problem rather than a software engineering problem, you can ship connectors faster. The rest of this guide uses the patterns Truto has battle-tested across 100+ CRM, HRIS, ATS, accounting, ticketing, and ERP integrations. The examples are written against the reference JSONata implementation. A few helpers used below ($removeEmptyItems, $difference, $mapValues, and $blob) are not part of the standard JSONata function library, so the embedding runtime has to register them as custom functions.

flowchart LR
  A[Unified API Request] --> B[Query Mapping<br>JSONata]
  B --> C[Third-Party API Call]
  C --> D[Response Mapping<br>JSONata]
  D --> E[Error Expression<br>JSONata]
  E --> F[Unified Response]
  
  classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px;

Step 1: Basic Response Mapping with JSONata

Let us start with a foundational example: mapping a flat third-party API response to a unified schema.

Assume your internal application expects a unified contact object. Salesforce returns data in a flat, PascalCase format with a couple of nested address blocks. A raw Salesforce response looks like this:

{
  "Id": "003xx000004TmiQ",
  "FirstName": "John",
  "LastName": "Doe",
  "Title": "VP Engineering",
  "Email": "john@example.com",
  "Phone": "+1-415-555-0101",
  "MobilePhone": "+1-415-555-0102",
  "MailingStreet": "123 Market St",
  "MailingCity": "San Francisco",
  "MailingState": "CA",
  "MailingPostalCode": "94103",
  "CreatedDate": "2024-01-15T10:30:00Z",
  "LastModifiedDate": "2024-06-20T14:15:00Z"
}

Instead of writing JavaScript to map this, we write a JSONata expression. JSONata evaluates the input JSON and constructs a new object based on the structure you define:

response.{
  "id": Id,
  "first_name": FirstName,
  "last_name": LastName,
  "name": $join($removeEmptyItems([FirstName, LastName]), " "),
  "title": Title,
  "email_addresses": [{ "email": Email, "is_primary": true }],
  "phone_numbers": [
    Phone ? { "number": Phone, "type": "phone" },
    MobilePhone ? { "number": MobilePhone, "type": "mobile" }
  ],
  "addresses": [{
    "street_1": MailingStreet,
    "city": MailingCity,
    "state": MailingState,
    "postal_code": MailingPostalCode
  }],
  "created_at": CreatedDate,
  "updated_at": LastModifiedDate
}

A few things to notice that beginners often miss:

  • Declarative structure: We are simply describing the desired output shape and pointing to the input fields.
  • Conditional evaluation: The ? ternary without an else evaluates to undefined, which JSONata silently drops from the surrounding array or object. That's how you build conditional fields and array entries without if statements.
  • Derived fields: $join($removeEmptyItems([FirstName, LastName]), " ") synthesizes a derived field. The $removeEmptyItems ensures we do not get a leading or trailing space if one of the names is missing. You're not limited to 1:1 mappings.
  • Configuration over code: The expression is a single string. Store it next to the integration config in your database. To support a new contact field tomorrow, you edit one row.

Compare this to the equivalent in a per-integration adapter file: ~30 lines of branching JavaScript that needs a code review, a CI run, and a deploy every time the schema shifts.
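
For readers who want to check their understanding of the expression, the sketch below mirrors it in plain JavaScript. It is a comparison aid, not part of the mapping itself: the function name is hypothetical, and the handling of empty name parts assumes $removeEmptyItems behaves as described above.

```javascript
// Plain-JavaScript mirror of the JSONata mapping above, for comparison only.
function mapSalesforceContact(response) {
  // Mirrors the assumed behavior of $removeEmptyItems: drop missing or empty name parts.
  const nameParts = [response.FirstName, response.LastName].filter(Boolean);
  // Mirrors "Phone ? {...}": entries whose source field is missing are left out.
  const phoneNumbers = [
    response.Phone && { number: response.Phone, type: "phone" },
    response.MobilePhone && { number: response.MobilePhone, type: "mobile" },
  ].filter(Boolean);

  return {
    id: response.Id,
    first_name: response.FirstName,
    last_name: response.LastName,
    name: nameParts.join(" "),
    title: response.Title,
    email_addresses: [{ email: response.Email, is_primary: true }],
    phone_numbers: phoneNumbers,
    addresses: [{
      street_1: response.MailingStreet,
      city: response.MailingCity,
      state: response.MailingState,
      postal_code: response.MailingPostalCode,
    }],
    created_at: response.CreatedDate,
    updated_at: response.LastModifiedDate,
  };
}
```

A function like this can double as a reference implementation in unit tests: run both it and the stored expression against the same fixture and compare the results.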

Step 2: Handling Custom Objects and Nested Arrays

Basic key-value mapping is easy. The real challenge in B2B integrations is handling custom fields and nested, polymorphic arrays. Custom objects are the default state of enterprise SaaS deployments, not the exception.

This is where naive mappers collapse. Real enterprise APIs nest data inside properties blobs, prefix custom fields with sigils like __c, or return polymorphic arrays. Let us look at HubSpot. HubSpot nests all contact data inside a properties object, and any field that is not a default HubSpot property is considered a custom field.

Here is a raw HubSpot response:

{
  "id": "12345",
  "properties": {
    "firstname": "John",
    "lastname": "Doe",
    "email": "john@example.com",
    "hs_additional_emails": "john.alt@example.com;j.doe@example.com",
    "phone": "+1-415-555-0101",
    "hs_whatsapp_phone_number": "+1-415-555-0199",
    "deal_stage_custom": "negotiation",
    "acv_estimate__c": 45000
  }
}

The mapping needs to: pull nested standard properties, split a semicolon-delimited string into an array, normalize phone types, and dynamically capture any non-default property (like deal_stage_custom and acv_estimate__c) into a custom_fields blob.

Here is the JSONata expression to handle this complexity:

(
  $defaultProps := [
    "firstname", "lastname", "email", "hs_additional_emails", 
    "phone", "hs_whatsapp_phone_number"
  ];
  $customKeys := $difference($keys(response.properties), $defaultProps);
  
  {
    "id": response.id,
    "first_name": response.properties.firstname,
    "last_name": response.properties.lastname,
    "email_addresses": $append(
      response.properties.email ? [{ "email": response.properties.email, "is_primary": true }] : [],
      response.properties.hs_additional_emails
        ? response.properties.hs_additional_emails.$split(";").{ "email": $ }
        : []
    ),
    "phone_numbers": [
      response.properties.phone
        ? { "number": response.properties.phone, "type": "phone" },
      response.properties.hs_whatsapp_phone_number
        ? { "number": response.properties.hs_whatsapp_phone_number, "type": "whatsapp" }
    ],
    "custom_fields": response.properties.$sift(function($v, $k) { $k in $customKeys })
  }
)

Let us break down exactly what this JSONata pattern unlocks:

  1. Dynamic Custom-Field Discovery: We define $defaultProps as an array of the standard fields we explicitly know about. We use $keys() to get every key in the HubSpot properties object, then $difference() to find the keys that are not in our default list. Custom fields are discovered at runtime, so nobody has to hand-maintain a list of them.
  2. Dynamic Object Filtering: $sift() is the JSONata equivalent of Object.fromEntries(Object.entries(...).filter(...)) - it iterates over the properties object, keeping only the key-value pairs where the key exists in our $customKeys array.
  3. Array Manipulation: The split-and-map pattern (hs_additional_emails.$split(";").{ "email": $ }) turns a delimited string into an array of objects in one line. We use $append to safely merge the primary email and additional emails.

This expression handles any number of custom fields dynamically. If a HubSpot administrator adds 50 new custom fields tomorrow, this JSONata expression will automatically collect them and place them in the custom_fields object. Your integration code never has to change.
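
The Object.fromEntries comparison from the breakdown above can be spelled out directly. This is a plain JavaScript sketch of the same two ideas (custom-field filtering and split-and-map); the function names are hypothetical and it is meant for comparison, not as a replacement for the expression.

```javascript
// Standard HubSpot properties the mapping already handles explicitly.
const DEFAULT_PROPS = new Set([
  "firstname", "lastname", "email", "hs_additional_emails",
  "phone", "hs_whatsapp_phone_number",
]);

// Keep every property whose key is not in the default set.
function extractCustomFields(properties) {
  return Object.fromEntries(
    Object.entries(properties).filter(([key]) => !DEFAULT_PROPS.has(key))
  );
}

// Split a semicolon-delimited string into [{ email }], tolerating a missing value.
function splitAdditionalEmails(value) {
  return value ? value.split(";").map((email) => ({ email })) : [];
}
```

Run against the sample HubSpot payload, extractCustomFields keeps only deal_stage_custom and acv_estimate__c, which is what the $sift call is expected to produce.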

For a deeper walkthrough of enterprise edge cases - Salesforce __c fields, picklist normalization, lookup relationships - see our step-by-step guide to mapping custom objects with JSONata.

Tip

Keep your default-property list as a JSONata variable at the top of the expression. When a vendor adds a new "standard" field, you update one line, and your custom-field detection logic still works correctly.

Step 3: Query and Request Body Translation

Response mapping is the easy half. Schema normalization also covers outbound requests, and that is the harder half: translating a unified request - say, ?status=active&updated_after=2025-01-01 - into whatever query dialect the third-party API speaks.

CRMs are the worst offenders here. HubSpot expects queries as a POST request to a search endpoint using a complex filterGroups array. Salesforce expects a GET request with a SOQL WHERE clause. Pipedrive wants flat query params. You can use JSONata to translate the unified query parameters into the exact syntax the provider requires.

Here's a unified filter object your internal application might generate:

{
  "first_name": "John",
  "email_addresses": [{ "email": "john@example.com" }],
  "updated_after": "2025-01-01T00:00:00Z"
}

Translating to HubSpot's filterGroups syntax in a POST body using JSONata:

rawQuery.{
  "filterGroups": [{
    "filters": [
      first_name ? {
        "propertyName": "firstname",
        "operator": "CONTAINS_TOKEN",
        "value": first_name
      },
      email_addresses ? {
        "propertyName": "email",
        "operator": "IN",
        "values": [email_addresses.email]
      },
      updated_after ? {
        "propertyName": "lastmodifieddate",
        "operator": "GTE",
        "value": updated_after
      }
    ]
  }]
}

Translating the same unified filter to Salesforce SOQL as a query parameter requires entirely different logic, but it is still just a JSONata expression evaluating the same unified input:

(
  $clauses := [
    first_name ? "FirstName LIKE '%" & first_name & "%'",
    email_addresses ? "Email IN ('" &
      $join(email_addresses.email, "','") & "')",
    updated_after ? "LastModifiedDate >= " & updated_after
  ];
  {
    "q": "SELECT Id, FirstName, LastName, Email FROM Contact" &
      ($count($clauses) > 0 ? " WHERE " & $join($clauses, " AND ") : "")
  }
)

The caller's request is identical. JSONata allows you to chain ternary operators to handle multiple optional query parameters. You build highly dynamic search queries by checking for the existence of unified fields and appending them to a filter array only if they are present in the request. When Salesforce shifts SOQL syntax or HubSpot adds a new operator, you edit a string - you don't ship a release.

Warning

Handling complex conditionals and SOQL injection

SOQL injection is real. If your unified filter values come from end users, sanitize them before concatenation. The JSONata expression itself doesn't escape strings - that's your application's job. A safer pattern is to use bound parameters wherever the third-party API supports them, and reserve string concatenation for vendor-controlled enum values.
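
As one way to do that sanitization in the application layer, the sketch below escapes values before they are handed to the mapping. It assumes SOQL's documented backslash escapes for single quotes, backslashes, and the LIKE wildcards; the function names are hypothetical, and it should be read as a starting point rather than a complete defense.

```javascript
// Escape a value for use inside a single-quoted SOQL string literal.
// Backslash must be escaped first so later escapes are not doubled.
function escapeSoqlLiteral(value) {
  return String(value).replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// For LIKE patterns, also escape the wildcard characters % and _.
function escapeSoqlLike(value) {
  return escapeSoqlLiteral(value).replace(/[%_]/g, (ch) => "\\" + ch);
}

// Build the FirstName clause from untrusted input.
function firstNameClause(firstName) {
  return "FirstName LIKE '%" + escapeSoqlLike(firstName) + "%'";
}
```

Unquoted values such as the updated_after timestamp need a different check: validate that they parse as a date-time before concatenation, since there is no quoting to escape into.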

Step 4: Normalizing API Errors with JSONata

API downtime and unhandled errors can cause severe financial losses, making resilient error handling critical. Industry estimates put the cost of a single day of API disruption between $10,000 and $500,000 depending on how core the integration is to revenue.

Vendor error envelopes are a special kind of chaos. Some use standard HTTP status codes with JSON error bodies. Others return custom XML structures. Some APIs return a 200 OK status for every request and embed the error inside the response body. Without normalization, your retry logic and re-auth flows can't tell signal from noise.

If you do not normalize these errors, your application will assume a 200 OK from Slack is a successful request, even if the payload says {"ok": false, "error": "invalid_auth"}.

The pattern: write a JSONata expression whose job is to inspect the raw response and produce a structured { status, message } object. If the expression returns undefined, the standard HTTP status check runs. Otherwise, your runtime throws an error with the normalized status code.
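
The control flow described above can be sketched as follows. Everything here is hypothetical scaffolding: errorExpression stands in for the evaluated JSONata expression, and the response shape ({ status, data }) is an assumption about what the runtime passes in.

```javascript
// Hypothetical runtime step. `errorExpression` stands in for the evaluated
// JSONata expression: it returns { status, message } or undefined.
function checkResponse(response, errorExpression) {
  const normalized = errorExpression ? errorExpression(response) : undefined;

  if (normalized !== undefined) {
    // The expression recognized a vendor-specific failure.
    const err = new Error(normalized.message || "Upstream API error");
    err.status = normalized.status || 500;
    throw err;
  }

  // No opinion from the expression: fall back to the plain HTTP status check.
  if (response.status >= 400) {
    const err = new Error(`Upstream API returned HTTP ${response.status}`);
    err.status = response.status;
    throw err;
  }
  return response.data;
}

// Stand-in for a Slack-style rule: HTTP 200 whose body reports a failure.
const slackRule = (res) =>
  res.data && res.data.ok === false
    ? { status: res.data.error === "invalid_auth" ? 401 : 500, message: res.data.error }
    : undefined;
```

With this split, the expression only has to recognize vendor-specific failures; ordinary 4xx and 5xx responses still flow through the default status check.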

Slack: 200 OK with error body

$not(data.ok) ? {
  "status": $mapValues(data.error, {
    "invalid_auth": 401,
    "token_expired": 401,
    "missing_scope": 403,
    "channel_not_found": 404,
    "ratelimited": 429,
    "internal_error": 500
  }),
  "message": $mapValues(data.error, {
    "invalid_auth": "Authentication failed.",
    "token_expired": "OAuth token has expired.",
    "ratelimited": "Rate limit exceeded."
  })
}

In this example, $not(data.ok) triggers the error evaluation only when Slack explicitly signals a failure. The $mapValues function acts as a lookup table, translating Slack's proprietary string codes into standard HTTP status codes. If the expression evaluates to a 401 status, your system knows immediately that the token is invalid and can trigger an automatic re-authentication flow.

Freshdesk: Correcting Semantic Errors (429 that's actually a 402)

Sometimes APIs return semantically incorrect HTTP status codes. For example, Freshdesk returns a 429 Too Many Requests when a customer's API plan does not include API access. This is not a rate limit; it is a payment issue. You can use JSONata to inspect the headers and correct the status code:

status = 429 and $not($exists(headers.`retry-after`)) ? {
  "status": 402,
  "message": "API access is not included in this plan."
}

A real rate limit returns a 429 with a retry-after header. Without that header, the JSONata expression remaps the error to 402 Payment Required, preventing your system from blindly applying exponential backoff to an endpoint that will never succeed.

GraphQL pattern (e.g. Linear, Fireflies):

GraphQL APIs bury errors in data.errors[] with vendor-specific codes:

$exists(data.errors) ? data.errors[0].{
  "status": $mapValues(extensions.code, {
    "AUTHENTICATION_ERROR": 401,
    "FORBIDDEN": 403,
    "NOT_FOUND": 404
  }),
  "message": message
}

A normalized error layer is what lets your retry, alerting, and re-auth systems actually do their jobs - rather than firing pager alerts because Slack returned a 200 your code thought meant success.

Info

A unified API platform should pass real HTTP 429 errors through to the caller and surface upstream rate-limit headers in a standardized form - typically the IETF ratelimit-limit, ratelimit-remaining, and ratelimit-reset headers. The caller owns retry and backoff strategy, because only the caller knows what the right behavior is for their workload. Don't trust a platform that silently swallows 429s.
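
On the caller side, a retry policy built on those headers might look like the sketch below. It assumes header names are lower-cased and that ratelimit-reset carries the number of seconds until the window resets; check what your platform actually sends before relying on either assumption. The function name and defaults are hypothetical.

```javascript
// Decide how long to wait before retrying, or return null to give up.
// `headers` is a plain object with lower-cased header names (an assumption).
function retryDelayMs(status, headers, attempt, maxAttempts = 5) {
  if (status !== 429 || attempt >= maxAttempts) return null;

  // Prefer the server's hint when present (assumed to be seconds until reset).
  const resetSeconds = Number.parseFloat(headers["ratelimit-reset"]);
  if (Number.isFinite(resetSeconds) && resetSeconds >= 0) {
    return resetSeconds * 1000;
  }

  // Otherwise fall back to capped exponential backoff: 1s, 2s, 4s, ... up to 30s.
  return Math.min(1000 * 2 ** attempt, 30000);
}
```

Because only a normalized 429 triggers a retry here, a response remapped to 402 (as in the Freshdesk case) returns null immediately instead of burning through backoff attempts.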

Step 5: Declarative Data Sync Pipelines Without ETL Tools

The previous steps showed how JSONata handles individual request-response cycles. But most B2B integrations need more than one-off API calls - they need recurring data pipelines that pull records from third-party APIs, transform them, and deliver them to your data stores. This is where engineering teams typically reach for heavyweight ETL tools like Airflow, Fivetran, or custom cron-based scripts.

The cost of that choice is real. Custom ETL pipelines carry $50,000 to $500,000+ in initial development costs, with ongoing maintenance that typically runs 20-30% of the initial build annually. Even self-hosted open-source ETL tools require $500 to $2,000/month in infrastructure plus 10-40 hours/month of engineering maintenance.

There's a lighter path. The same declarative, JSONata-driven approach that handles field mapping can drive entire sync pipelines - no DAGs, no orchestrator infrastructure, no ETL framework. Define your entire data pipeline as a JSON manifest. Declare which resources to fetch, how they depend on each other, what transformations to apply, and where to deliver the results. The runtime handles pagination, checkpointing, error recovery, and webhook delivery. You never write pipeline orchestration code.

The Complete Sync Job Manifest (Annotated)

Here is a complete, production-ready sync job definition that pulls users, contacts (filtered to recently changed), tickets, and per-ticket comments from Zendesk. Every field is annotated below:

{
  "integration_name": "zendesk",
  "args_schema": {
    "ticket_sync_start_date": {
      "type": "string",
      "format": "date-time"
    }
  },
  "resources": [
    {
      "resource": "ticketing/users",
      "method": "list"
    },
    {
      "name": "all-contacts",
      "resource": "ticketing/contacts",
      "method": "list",
      "persist": false
    },
    {
      "name": "filtered-contacts",
      "type": "transform",
      "config": {
        "expression": "resources.ticketing.contacts[updated_at >= %.%.%.previous_run_date]"
      },
      "depends_on": "all-contacts",
      "persist": true
    },
    {
      "resource": "ticketing/tickets",
      "method": "list",
      "query": {
        "updated_at": {
          "gt": "{{args.ticket_sync_start_date|previous_run_date}}"
        }
      }
    },
    {
      "resource": "ticketing/comments",
      "method": "list",
      "depends_on": "ticketing/tickets",
      "query": {
        "ticket_id": "{{resources.ticketing.tickets.id}}"
      }
    }
  ]
}

| Field | Purpose |
| --- | --- |
| integration_name | Identifies the third-party API. The runtime loads the corresponding integration config (base URL, auth, pagination) from this identifier. |
| args_schema | JSON Schema defining runtime arguments. Here, ticket_sync_start_date is an optional datetime override for the initial sync window. |
| resource | The unified API resource path. Format: unified_api_name/resource_name. The runtime resolves this to the provider's actual endpoint using the integration config. |
| method | list for paginated collection fetches, get for single-record retrieval. |
| depends_on | Creates an execution dependency. ticketing/comments won't execute until ticketing/tickets completes. For each ticket returned, the runtime fires a separate comments request with the ticket's ID. |
| query | Query parameters passed to the API. Supports placeholders (double-curly syntax) that resolve at runtime - including references to parent resource fields and runtime arguments. |
| name | Required identifier when a resource is referenced by a transform or spool node via depends_on. |
| type: "transform" | Declares this node as a JSONata transform rather than an API call. The config.expression field holds the JSONata expression. |
| persist | Controls whether records from this node are delivered to your webhook. Set to false on raw fetches and true on filtered/transformed output to avoid duplicate delivery. |

The placeholder {{args.ticket_sync_start_date|previous_run_date}} uses conditional fallback syntax: use the runtime argument if provided, otherwise fall back to the per-tenant checkpoint. This lets you override the sync window for initial loads while defaulting to incremental behavior on recurring runs.
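
The fallback behavior can be pictured with a small resolver. This is an illustrative sketch only: resolvePlaceholder is a hypothetical function, and the real runtime's placeholder engine may support more syntax (type hints, embedded placeholders) than this handles.

```javascript
// Resolve a "{{a.b|c}}" placeholder against a context object.
// Alternatives separated by "|" are tried left to right; the first defined value wins.
function resolvePlaceholder(template, context) {
  const match = /^\{\{(.+)\}\}$/.exec(template.trim());
  if (!match) return template; // not a placeholder, return unchanged

  for (const path of match[1].split("|")) {
    const value = path
      .trim()
      .split(".")
      .reduce((cur, key) => (cur == null ? undefined : cur[key]), context);
    if (value !== undefined && value !== null) return value;
  }
  return undefined;
}
```

Given a context with an empty args object and a previous_run_date, the example placeholder resolves to the checkpoint; once args.ticket_sync_start_date is supplied, that value takes precedence.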

JSONata Transform Nodes for Provider Data

Transform nodes sit between the raw API fetch and webhook delivery, letting you reshape, filter, or enrich data using JSONata - without modifying any upstream mapping configuration.

Filtering stale records when the API lacks server-side filtering:

Some APIs don't support updated_at filters natively. You can fetch everything and filter client-side with a transform node:

{
  "name": "all-contacts",
  "resource": "ticketing/contacts",
  "method": "list",
  "persist": false
},
{
  "name": "recent-contacts",
  "type": "transform",
  "config": {
    "expression": "resources.ticketing.contacts[updated_at >= %.%.%.previous_run_date]"
  },
  "depends_on": "all-contacts",
  "persist": true
}

The expression resources.ticketing.contacts[updated_at >= %.%.%.previous_run_date] is a JSONata filter predicate. It accesses the fetched contacts array through the resources context object and keeps only records where updated_at is on or after the last successful sync timestamp. The %.%.% syntax navigates up the context hierarchy to access the root-level previous_run_date binding.
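
One caveat worth testing for: when both operands are strings, >= compares them character by character, so the filter is only reliable if updated_at and previous_run_date share the same timestamp format. The plain JavaScript sketch below shows the same filter comparing parsed instants instead; the function name is hypothetical, and whether a given runtime normalizes timestamps for you is something to verify rather than assume.

```javascript
// Keep records updated at or after the checkpoint, comparing instants rather than strings.
// Records whose updated_at does not parse are dropped (NaN >= x is false).
function filterUpdatedSince(records, previousRunDate) {
  const checkpoint = Date.parse(previousRunDate);
  if (Number.isNaN(checkpoint)) throw new Error(`Invalid checkpoint: ${previousRunDate}`);
  return records.filter((record) => Date.parse(record.updated_at) >= checkpoint);
}
```

For example, "2024-06-20T14:15:00.500Z" is a later instant than "2024-06-20T14:15:00Z", yet it sorts before it as a string because "." precedes "Z". A string comparison would drop that record; the parsed comparison keeps it.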

Combining paginated content into a single payload:

For APIs that split a logical document across paginated blocks (like Notion page content), you can use spool nodes to accumulate all pages, then a transform node to merge them:

{
  "name": "page-blocks",
  "resource": "knowledge-base/page-content",
  "method": "list",
  "query": { "page": { "id": "{{args.page_id}}" } },
  "persist": false
},
{
  "name": "all-blocks",
  "type": "spool",
  "depends_on": "page-blocks"
},
{
  "name": "merged-content",
  "type": "transform",
  "config": {
    "expression": "$blob($reduce(resources.`knowledge-base`.`page-content`, function($acc, $v) { $acc & $v.body.content }, ''), { \"type\": \"text/markdown\" })"
  },
  "depends_on": "all-blocks",
  "persist": true
}

The spool node accumulates all paginated results into a single dataset. The downstream transform node then uses $reduce to concatenate every block's content into one markdown string, wrapped in a $blob() call that tags the output with a MIME type. The webhook receives a single event with the complete document - not hundreds of individual block events.

Tip

If a resource name contains hyphens (like knowledge-base), wrap it in backticks in your JSONata expression: resources.`knowledge-base`.`page-content`. Underscored names don't need this treatment.

Per-Tenant Checkpoint: Incremental Sync Without State Management

Every sync pipeline needs to answer one question: "Where did I leave off?" Traditional ETL pipelines force you to manage checkpoint state yourself - storing cursors in a key-value store, tracking high-water marks in a database table, handling clock skew between services.

Declarative sync jobs handle this automatically through a per-tenant checkpoint called previous_run_date. Here's how the lifecycle works:

stateDiagram-v2
  [*] --> FirstRun: Sync Job created
  FirstRun --> Running: previous_run_date =<br>1970-01-01T00:00:00.000Z
  Running --> Completed: All resources fetched successfully
  Completed --> NextRun: previous_run_date =<br>completion timestamp
  NextRun --> Running: Only fetch records<br>updated since checkpoint
  Running --> Failed: Error during sync
  Failed --> NextRun: previous_run_date unchanged<br>(retries from last good state)

Key behaviors:

  • Initial value: 1970-01-01T00:00:00.000Z - effectively "fetch everything" on the first run.
  • Updated on success only. If a sync run fails, previous_run_date stays at the last successful completion timestamp. The next run automatically retries the full window.
  • Scoped per sync job + per integrated account. Tenant A's Zendesk checkpoint is completely independent of Tenant B's. You never have cross-tenant state leakage.
  • Overridable at runtime. Pass "ignore_previous_run": true in the sync job run request to force a full re-sync. This is your backfill command - useful for data recovery, schema migrations, or onboarding a new customer who needs historical data.

To use the checkpoint in your sync job, bind it to the resource's query parameters:

{
  "resource": "ticketing/tickets",
  "method": "list",
  "query": {
    "updated_at": {
      "gt": "{{previous_run_date}}"
    }
  }
}

Every subsequent run fetches only tickets modified since the last successful sync. No cron job managing watermarks. No checkpoint table migrations.

Forcing a full re-sync (backfill):

{
  "sync_job_id": "7279a917-b447-4629-9e46-a1eeb791ad6b",
  "integrated_account_id": "7ae7b0ab-c6a7-4f29-aec1-1f123517af5d",
  "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c",
  "ignore_previous_run": true
}

This resets the sync window to the epoch, pulling every record as if it were the first run. The checkpoint updates normally on completion, so the next scheduled run picks up incrementally from there.
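
The checkpoint rules above (epoch default, update on success only, per-job and per-account scope, backfill override) fit in a few lines. The class below is an in-memory illustration with hypothetical names, not the runtime's actual storage layer.

```javascript
const EPOCH = "1970-01-01T00:00:00.000Z";

// In-memory stand-in for the runtime's checkpoint storage (hypothetical).
class CheckpointStore {
  constructor() {
    this.checkpoints = new Map();
  }

  // Checkpoints are scoped per sync job and per integrated account.
  key(syncJobId, accountId) {
    return `${syncJobId}:${accountId}`;
  }

  // Value bound to previous_run_date for this run.
  windowStart(syncJobId, accountId, { ignorePreviousRun = false } = {}) {
    if (ignorePreviousRun) return EPOCH;
    return this.checkpoints.get(this.key(syncJobId, accountId)) ?? EPOCH;
  }

  // Advance the checkpoint only when the run succeeded.
  finishRun(syncJobId, accountId, { succeeded, completedAt }) {
    if (succeeded) this.checkpoints.set(this.key(syncJobId, accountId), completedAt);
  }
}
```

A failed run leaves the stored value untouched, so the next call to windowStart returns the same window and the retry covers everything the failed run missed.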

Pagination and Incremental Sync Patterns

Declarative sync pipelines need to handle two orthogonal concerns: paginating through large result sets within a single run, and incrementally syncing only changed records across runs.

Pagination is handled by the integration config, not by your sync job definition. When you specify "method": "list" on a resource, the runtime automatically paginates through the full result set using the strategy defined in the integration's configuration - cursor-based, offset-based, page-number, or link-header. You don't write pagination logic. The sync job delivers every page of results to your webhook.

Incremental sync is your responsibility to declare, but the runtime manages the state. Bind previous_run_date to whatever filter parameter the API supports:

{
  "resource": "ticketing/tickets",
  "method": "list",
  "query": {
    "updated_at": { "gt": "{{previous_run_date}}" }
  }
}

When the API doesn't support server-side date filtering, combine a full fetch with a transform node that filters client-side:

{
  "name": "all-users",
  "resource": "ticketing/users",
  "method": "list",
  "persist": false
},
{
  "name": "changed-users",
  "type": "transform",
  "config": {
    "expression": "resources.ticketing.users[updated_at >= %.%.%.previous_run_date]"
  },
  "depends_on": "all-users",
  "persist": true
}

Dependent pagination handles the common pattern where you need to fan out requests based on parent records. In the Zendesk example, fetching comments requires a ticket_id from each ticket:

{
  "resource": "ticketing/comments",
  "method": "list",
  "depends_on": "ticketing/tickets",
  "query": {
    "ticket_id": "{{resources.ticketing.tickets.id}}"
  }
}

The runtime iterates over every ticket from the parent resource and fires a fully-paginated comments fetch for each one. You declare the dependency; the runtime handles the fan-out and pagination for each child request.
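
For intuition, the fan-out amounts to a paginated loop nested inside a loop over parents. In the sketch below, fetchPage and fetchChildPage are hypothetical functions that return { items, nextCursor }; the real runtime's pagination strategy comes from the integration config and may not be cursor-based.

```javascript
// Follow cursors until the API reports no next page.
// `fetchPage(query, cursor)` is a hypothetical function returning { items, nextCursor }.
async function listAll(fetchPage, query) {
  const items = [];
  let cursor;
  do {
    const page = await fetchPage(query, cursor);
    items.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor);
  return items;
}

// For each parent record, run a fully paginated child fetch keyed by the parent's id.
async function fanOut(parents, fetchChildPage) {
  const results = [];
  for (const parent of parents) {
    const children = await listAll(fetchChildPage, { ticket_id: parent.id });
    results.push(...children);
  }
  return results;
}
```

The sketch processes parents one at a time for simplicity. It also makes the cost visible: one parent page of 100 tickets turns into at least 100 child requests, which matters for rate limits.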

Recursive fetching handles tree structures where records can have child records of the same type (like folders containing subfolders):

{
  "resource": "file-storage/drive-items",
  "method": "list",
  "recurse": {
    "if": "{{resources.file-storage.drive-items.has_children:bool}}",
    "config": {
      "query": {
        "parent": { "id": "{{resources.file-storage.drive-items.id}}" }
      }
    }
  }
}

The recurse block defines a condition (has_children is true) and a config that re-invokes the same resource with the parent ID as a filter. The runtime handles the recursion depth automatically.
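
Conceptually this is a depth-first traversal. The sketch below shows the idea with a hypothetical listChildren function and an explicit maxDepth guard; the guard is an assumption added for safety in the example, not a documented runtime setting.

```javascript
// Depth-first traversal of a folder tree.
// `listChildren(parentId)` is a hypothetical function returning that folder's items.
async function listRecursively(listChildren, parentId = null, maxDepth = 10, depth = 0) {
  const items = await listChildren(parentId);
  const all = [...items];
  if (depth >= maxDepth) return all; // guard against runaway or cyclic trees

  for (const item of items) {
    // Mirrors the "if" condition in the recurse block.
    if (item.has_children) {
      all.push(...(await listRecursively(listChildren, item.id, maxDepth, depth + 1)));
    }
  }
  return all;
}
```

Items at each level are collected before the traversal descends, so a folder appears in the result ahead of its own children.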

End-to-End: Source, Transform, Webhook Delivery

Let's trace a complete sync run from trigger to data landing in your system.

sequenceDiagram
  participant App as Your Application
  participant Truto as Sync Runtime
  participant Zen as Zendesk API
  participant WH as Your Webhook Endpoint
  
  App->>Truto: POST /sync-job-run<br>{sync_job_id, integrated_account_id, webhook_id}
  Truto->>WH: sync_job_run:started
  
  Truto->>Zen: GET /ticketing/users (page 1)
  Zen-->>Truto: 100 users + next_cursor
  Truto->>WH: sync_job_run:record (user 1..100)
  Truto->>Zen: GET /ticketing/users (page 2)
  Zen-->>Truto: 50 users (last page)
  Truto->>WH: sync_job_run:record (user 101..150)
  
  Truto->>Zen: GET /ticketing/tickets?updated_at[gt]=checkpoint
  Zen-->>Truto: 30 tickets (incremental)
  Truto->>WH: sync_job_run:record (ticket 1..30)
  
  loop For each ticket
    Truto->>Zen: GET /ticketing/comments?ticket_id=T_N
    Zen-->>Truto: Comments for ticket T_N
    Truto->>WH: sync_job_run:record (comments)
  end
  
  Truto->>WH: sync_job_run:completed
  Note over Truto: previous_run_date updated<br>to completion timestamp

1. Trigger the sync. Your application sends a POST request to create a sync job run, passing the sync job ID, the integrated account ID (which tenant's Zendesk), and the webhook ID (where to deliver records).

2. Resource execution. The runtime processes resources in dependency order. Independent resources (users, tickets) run in sequence. Dependent resources (comments) wait for their parent to complete. Every list resource is automatically paginated - the runtime follows cursors until the API signals no more pages.

3. Transform application. If a resource has a downstream transform node, the runtime evaluates the JSONata expression against each batch of fetched records. Only nodes with persist: true emit webhook events.

4. Webhook delivery. Each record is delivered to your webhook endpoint as a sync_job_run:record event. Your endpoint receives a structured payload containing the unified record data. Errors during individual record fetches are delivered as sync_job_run:record_error events - the sync continues processing remaining resources by default.

5. Checkpoint update. On successful completion, previous_run_date is updated to the completion timestamp. The next run automatically picks up only records changed after this point.

6. Scheduling. For recurring syncs, create a cron trigger instead of manually firing runs:

{
  "sync_job_id": "d7fd45d6-136a-4244-aeb9-b6439bfa8b71",
  "integrated_account_id": "6680c7ff-9f0e-45be-9915-a7334dc37f23",
  "webhook_id": "a5b21886-3b4d-4fd0-9956-ffc0714d701c",
  "cron_expression": "0 */6 * * *"
}

This runs the sync every 6 hours (cron is evaluated in UTC), automatically using the checkpoint from the last successful run. No external scheduler needed.
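
On the receiving side, the four event types from the walkthrough above can be handled by a small dispatcher. The sketch below is illustrative only: the event names come from the steps above, but the envelope keys ("event", "payload") and the SyncRunSink class are assumptions, so check the real payload shape before relying on them.

```python
class SyncRunSink:
    """Collects the events of one sync run in memory."""

    def __init__(self) -> None:
        self.status = "idle"
        self.records = []
        self.errors = []

    def handle(self, event: dict) -> None:
        kind = event["event"]           # assumed envelope key
        payload = event.get("payload")  # assumed envelope key
        if kind == "sync_job_run:started":
            self.status = "running"
        elif kind == "sync_job_run:record":
            self.records.append(payload)
        elif kind == "sync_job_run:record_error":
            # Record errors do not stop the run by default, so just log them.
            self.errors.append(payload)
        elif kind == "sync_job_run:completed":
            self.status = "completed"
        else:
            raise ValueError(f"unknown event type: {kind}")


sink = SyncRunSink()
for event in [
    {"event": "sync_job_run:started"},
    {"event": "sync_job_run:record", "payload": {"resource": "ticketing/users", "id": "u1"}},
    {"event": "sync_job_run:record_error", "payload": {"resource": "ticketing/comments", "message": "timeout"}},
    {"event": "sync_job_run:record", "payload": {"resource": "ticketing/tickets", "id": "t1"}},
    {"event": "sync_job_run:completed"},
]:
    sink.handle(event)

print(sink.status, len(sink.records), len(sink.errors))  # completed 2 1
```

In a real endpoint the handler would write records to your datastore (ideally as idempotent upserts keyed by record ID) instead of appending to a list.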

Error handling modes: By default, the runtime ignores individual record errors and continues processing, delivering sync_job_run:record_error events for each failure. If you need strict consistency, set "error_handling": "fail_fast" on the sync job run to halt on the first error. When a run fails, the checkpoint is not updated, so the next run retries the entire window automatically.
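
The interaction between the two error modes and the checkpoint is easier to see in code. The following is a simplified model, not the runtime's implementation: run_sync and the fetcher callables are hypothetical, and "continue" is a placeholder name for the default mode, which the text above does not name.

```python
def run_sync(fetchers, checkpoint, completed_at, error_handling="continue"):
    """Run each fetch; advance the checkpoint only if the run completes."""
    records, errors = [], []
    for fetch in fetchers:
        try:
            records.append(fetch())
        except Exception as exc:
            errors.append(str(exc))
            if error_handling == "fail_fast":
                # Failed run: keep the old checkpoint so the window is retried.
                return {"status": "failed", "checkpoint": checkpoint,
                        "records": records, "errors": errors}
    return {"status": "completed", "checkpoint": completed_at,
            "records": records, "errors": errors}


def ok(record_id):
    return lambda: {"id": record_id}


def boom():
    raise RuntimeError("upstream 500")


fetchers = [ok("t1"), boom, ok("t2")]
old, new = "2024-05-01T00:00:00Z", "2024-05-01T06:00:00Z"

lenient = run_sync(fetchers, old, new)
strict = run_sync(fetchers, old, new, error_handling="fail_fast")

print(lenient["status"], lenient["checkpoint"], len(lenient["records"]))
# completed 2024-05-01T06:00:00Z 2
print(strict["status"], strict["checkpoint"], len(strict["records"]))
# failed 2024-05-01T00:00:00Z 1
```

Because a failed run retries the whole window, records delivered before the failure arrive again on the next run, which is one more reason to make the webhook consumer idempotent.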

The entire pipeline - from trigger through paginated fetches, dependency resolution, JSONata transforms, incremental checkpointing, and webhook delivery - is defined by two JSON documents: the sync job manifest and the integration config. No DAG definitions. No orchestrator containers. No ETL framework.

How Truto Runs JSONata at Scale

Building a few JSONata expressions is easy. The interesting engineering problem is what happens when you have hundreds of these expressions across dozens of integrations and thousands of customers. Managing them requires a dedicated architecture.

Truto is built entirely on this declarative philosophy. The platform contains zero integration-specific code. The runtime engine is a generic pipeline that takes a declarative configuration describing how to talk to a third-party API, and evaluates the JSONata mapping describing how to translate the data. The same code path that handles a HubSpot CRM contact listing also handles Salesforce, Pipedrive, and Zoho - without knowing or caring which one it is talking to.

Three pieces that matter if you're evaluating this architecture:

1. Three-level override hierarchy. Because integration behavior is entirely data-driven, Truto enables per-customer customization of the unified API behavior without deploying code. Mappings can be overridden at the platform, environment, and individual account level. If one enterprise customer has 147 custom Salesforce fields nobody else has, you ship a JSONata override scoped to that single connected account - no fork, no code change, no deploy. The runtime deep-merges the three layers at request time. See 3-Level API Mapping for the full architecture.

2. Extended JSONata functions for API work. Truto ships an extended JSONata implementation with helpers like $mapValues() (the lookup-table function used in every error example above), $firstNonEmpty() (handles vendor APIs that return errors in three different shapes depending on the endpoint), and $convertQueryToSql() (turns a JSON filter object into a SOQL WHERE clause). These exist because the same five problems show up in every integration.

3. Adding an integration is a data operation, not a code operation. A new connector is a JSON config (base URL, auth scheme, pagination strategy, resource endpoints) plus a set of JSONata mappings for each unified resource. Both live in the database. The engine handles the HTTP requests, the standardized headers, and the OAuth token refreshes. The JSONata expressions handle the domain-specific data shaping. The same engine that runs 100+ integrations today runs the 101st without a deploy. Read Hot-Swappable API Integrations for how that works in practice.
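
The three-level override from the first point can be pictured as a recursive dictionary merge. This is a minimal sketch under assumptions: the source only says the layers are deep-merged at request time, so the precedence shown here (account over environment over platform), the deep_merge helper, and the example mapping keys are illustrative rather than Truto's actual schema.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins and nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_mapping(platform: dict, environment: dict, account: dict) -> dict:
    """Assumed precedence: account overrides environment overrides platform."""
    return deep_merge(deep_merge(platform, environment), account)


# Hypothetical layers; the values stand in for JSONata path expressions.
platform = {"contact": {"first_name": "properties.firstname",
                        "email": "properties.email"}}
environment = {"contact": {"email": "properties.work_email"}}
account = {"contact": {"custom_fields": {"region": "properties.region__c"}}}

resolved = resolve_mapping(platform, environment, account)
print(resolved["contact"]["email"])          # properties.work_email
print(resolved["contact"]["custom_fields"])  # {'region': 'properties.region__c'}
```

Because each layer only states what differs, the account-level override for a single enterprise customer stays small even when the platform mapping is large.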

The Honest Trade-offs

JSONata is not a free lunch. A few things to weigh:

  • Debuggability. A 200-line JSONata expression is harder to step through than 200 lines of TypeScript. Use the JSONata Exerciser for development, and keep expressions small and composable.
  • Performance. Evaluation cost scales with expression complexity. For high-throughput sync workloads, profile your hot mappings. Most of the time, network latency to the third-party API dwarfs JSONata cost - but it's worth measuring.
  • Hiring. Your team will need to learn JSONata. The learning curve is shorter than DataWeave or XSLT, but it's still a curve.
  • Custom logic that escapes JSONata's expressiveness. Multi-step orchestration (call API A, use the result to call API B, merge into the response) usually needs a runtime concept beyond pure expressions. Plan for an escape hatch for before/after steps.

The trade-off math almost always favors declarative configuration once you cross ~10 integrations or have any per-customer customization requirements. Below that threshold, a well-organized adapter pattern can be fine.

Where to Go From Here

Stop writing code for third-party API integrations. Every switch statement based on a provider name is a liability that will cost your team hours of maintenance when the vendor inevitably changes their schema. By adopting a declarative architecture powered by JSONata, you turn API integration into a data operation.

Three concrete next steps depending on where you are:

  1. If you're prototyping: Pull up the JSONata Exerciser, paste in a real third-party API response, and write a mapping that produces your unified shape. You'll feel the productivity difference within an hour.
  2. If you're evaluating architecture for a new integration platform: Start from the assumption that integration behavior should be data. Every place your design forces code, ask whether a JSONata expression plus a config field could replace it. See our developer guide to JSONata mapping examples for more advanced patterns.
  3. If you're drowning in adapter maintenance: The migration path is gradual. Pick one integration with the highest churn, rewrite its mapping as JSONata config, and measure the difference in time-to-fix on the next schema change. The ROI compounds with every additional connector you migrate.

Declarative mappings won't fix bad vendor APIs - nothing does. But they make every other layer of your integration stack cheaper to operate, easier to reason about, and faster to evolve. That's the trade you actually want to be making in 2026.

FAQ

What is JSONata and why use it for API integrations?
JSONata is a declarative, open-source query and transformation language for JSON data. It suits API integrations because expressions are stored as strings (so you can hot-swap mappings without a code deploy) and are side-effect free, while the language itself is Turing-complete. Enterprise platforms like IBM z/OS Connect and AWS Step Functions embed it natively.

How does JSONata compare to writing custom adapter code in TypeScript or Python?
Adapter code scales linearly in maintenance burden with the number of integrations - every connector has its own tests and deploys. JSONata expressions are configuration: one generic runtime evaluates them all, schema changes are edited in a database row rather than a code file, and per-customer overrides become possible without forking code.

How do you handle custom objects and fields with JSONata?
You can use JSONata functions like `$keys()` and `$difference()` to dynamically identify fields that are not part of a standard schema, and `$sift()` to filter object keys by predicate. This allows you to automatically discover and group unrecognized properties into a dedicated `custom_fields` object.

Can JSONata translate API query parameters?
Yes. JSONata can transform a standardized query object (like `email=test@example.com`) into provider-specific syntax, such as a Salesforce SOQL `WHERE` clause or a HubSpot `filterGroups` array. The caller's request stays identical across integrations.

How do I normalize API errors that come back as HTTP 200 OK?
Write a JSONata error expression that inspects the response body and returns a structured `{ status, message }` object. For Slack-style APIs, check `$not(data.ok)` and use `$mapValues()` to translate the vendor's proprietary error string to a real HTTP status (e.g., `invalid_auth` to 401). If the expression returns `undefined`, your runtime falls through to standard HTTP status checking.
