What is the difference between a canonical data model and a unified API?

A canonical data model is a design pattern — a schema plus mapping logic you build yourself. A unified API is a managed service that implements that pattern for you, including the canonical schema, per-provider mappings, authentication, pagination, and ongoing maintenance behind a single API endpoint.

When should I build my own common data model instead of using a unified API?

Building in-house makes sense when you have 1-3 stable integrations and a small team. Beyond 5-10 integrations, especially with enterprise customers requesting more, the maintenance burden of an in-house CDM typically exceeds the cost of a unified API platform.

What are common data model examples in B2B SaaS?

Common examples include a unified CRM Contact schema (normalizing Salesforce, HubSpot, and Pipedrive contacts), an HRIS Employee model (normalizing BambooHR, Workday, and Gusto employees), and an ATS Candidate model (normalizing Greenhouse, Lever, and Ashby applicants).

What is a Common Data Model in APIs? (2026 Architecture Guide)

Q: What is a common data model in APIs?

A common data model (also called a canonical data model) is a provider-independent JSON schema that standardizes entities like Contacts, Employees, or Invoices across multiple third-party APIs. Each provider's data is mapped to and from this shared schema, so your application only reads and writes one format.

A common data model in APIs (also called a canonical data model or CDM) is a standardized, intermediary schema that sits between your application and every third-party API you integrate with. Instead of writing bespoke data transformation logic for Salesforce, then HubSpot, then Pipedrive, you map each provider's response to one shared schema. Your application code reads and writes against that single schema, and the mapping layer handles translation in both directions.

If you're here, you're probably staring at a growing list of integration requests and wondering whether to keep hand-rolling provider-specific code or invest in a normalized data layer. This guide covers exactly what a common data model is, how it works at the API level, where traditional CDMs break down, and when a unified API is the better architecture.

The N² Integration Problem in B2B SaaS

Enterprise buyers don't purchase isolated software. They purchase nodes in a massive, interconnected graph of data. According to BetterCloud's State of SaaS Report, the average organization uses between 106 and 131 different SaaS applications. Every one of those tools has its own API, its own field naming conventions, its own pagination style, and its own authentication quirks.

Without a common data model, connecting n systems point-to-point requires a translator for every ordered pair of systems: n(n − 1) in total, which grows as n². Introducing a canonical data model reduces that to 2n. Each new system only needs to be mapped twice: once to translate its native format into the canonical model, and once to translate the canonical model back. A solution consisting of just 6 applications requires 30 point-to-point message translators without a CDM and only 12 when using one. At 20 integrations, you're looking at 380 point-to-point mappings versus 40 canonical ones.
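The figures above come from counting one translator per direction for every pair of systems, against two translators per system with a canonical model. A minimal sketch of that arithmetic (the function names are illustrative, not from any library):

```typescript
// Directed translators needed when every system maps to every other system.
function pointToPointMappings(systems: number): number {
  return systems * (systems - 1);
}

// Translators needed with a canonical model: one inbound and one outbound per system.
function canonicalMappings(systems: number): number {
  return 2 * systems;
}

for (const n of [6, 20]) {
  console.log(
    `${n} systems: ${pointToPointMappings(n)} point-to-point vs ${canonicalMappings(n)} canonical`
  );
}
```

The gap widens with every system you add, because the point-to-point count grows quadratically and the canonical count grows linearly.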

The cost is real. Custom API integrations typically cost between $50,000 and $150,000 per integration per year, including development, QA, monitoring, and ongoing support. Annual maintenance alone adds 10–20% of the initial build cost as vendors update their APIs. Multiply that across a dozen CRMs your customers use, and you've burned through a year of engineering budget before writing a line of core product code.

For a startup trying to ship five CRM integrations to unblock enterprise sales deals, the math simply doesn't work. You can't dedicate half your engineering team to reading third-party API documentation and mapping vendor-specific fields. You need a scalable architectural pattern. You need to evaluate the true cost of building SaaS integrations in-house.

What is a Common Data Model in APIs?

A common data model in APIs is a provider-independent schema that defines canonical entities (like "Contact", "Employee", or "Invoice") with standardized field names, types, and relationships. Every integration maps its native data to and from this schema, so consuming applications only interact with one predictable format.

The concept traces back to the Enterprise Integration Patterns catalog by Gregor Hohpe and Bobby Woolf: design a canonical data model that is independent from any specific application, and require each application to produce and consume messages in this common format. The canonical data model provides an additional level of indirection between each application's individual data formats. Instead of requiring point-to-point mappings between every system, each system maps to a single shared hub.

graph TD
    subgraph Point-to-Point Architecture
        A[Salesforce] <--> B[HubSpot]
        A <--> C[Zendesk]
        B <--> C
        A <--> D[Jira]
        B <--> D
        C <--> D
    end

    subgraph Canonical Data Model Architecture
        E[Salesforce] <--> Hub((Common<br>Data<br>Model))
        F[HubSpot] <--> Hub
        G[Zendesk] <--> Hub
        H[Jira] <--> Hub
    end

In API terms, a CDM typically includes:

Entity definitions — A Contact has id, first_name, last_name, email_addresses, phone_numbers, created_at, updated_at
Standardized enums — Employment status is always active, inactive, or terminated — not A, 1, or TRUE
Consistent relationships — A Contact always references an Account by account.id, regardless of whether the source system calls it CompanyId, organization_id, or account_ref
Predictable pagination — Always next_cursor / prev_cursor, even if the source API uses page numbers, offsets, or link headers

When your application needs to fetch a list of customer contacts, it sends a standardized GET /contacts request to the common data model. The integration layer translates that request into the specific query language required by the target third-party API, executes the request, and normalizes the response back into the standard Contact schema. Your application only ever talks to the hub.
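To make the entity and enum bullets above concrete, here is a rough sketch of how they might look as TypeScript types. The interface name, the nullability choices, and the raw status codes in the lookup table are assumptions for illustration. Real provider codes vary by vendor.

```typescript
// Hypothetical canonical Contact; field names follow the entity list above.
interface UnifiedContactEntity {
  id: string;
  first_name: string | null;
  last_name: string | null;
  email_addresses: { email: string; is_primary?: boolean }[];
  phone_numbers: { number: string; type: string }[];
  account?: { id: string };
  created_at: string;
  updated_at: string;
}

type EmploymentStatus = "active" | "inactive" | "terminated";

// Assumed raw codes, purely for illustration.
const STATUS_BY_RAW_VALUE: Record<string, EmploymentStatus> = {
  A: "active",
  "1": "active",
  TRUE: "active",
  I: "inactive",
  "0": "inactive",
  FALSE: "inactive",
  T: "terminated",
};

// Map a provider-specific status code onto the canonical enum, failing loudly on unknowns.
function normalizeEmploymentStatus(raw: string | number | boolean): EmploymentStatus {
  const key = String(raw).trim().toUpperCase();
  const mapped = STATUS_BY_RAW_VALUE[key];
  if (mapped === undefined) {
    throw new Error(`Unmapped employment status: ${key}`);
  }
  return mapped;
}

const exampleContact: UnifiedContactEntity = {
  id: "12345",
  first_name: "John",
  last_name: "Doe",
  email_addresses: [{ email: "john.doe@example.com", is_primary: true }],
  phone_numbers: [{ number: "+1-555-0123", type: "phone" }],
  account: { id: "acc_1" },
  created_at: "2024-01-15T10:30:00Z",
  updated_at: "2024-01-15T10:30:00Z",
};
```

Throwing on an unmapped code is a deliberate choice in this sketch: silently defaulting to active would hide mapping gaps.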

How API Schema Normalization Works in Practice

Let's make this concrete. You're building a product that needs to read contacts from both HubSpot and Salesforce. Here's what the raw API responses look like.

The HubSpot Native Payload

HubSpot represents a Contact as a thin wrapper around a flat set of properties. Most of the valuable data lives inside a properties object, and phone numbers are split across separate keys.

{
  "id": "12345",
  "properties": {
    "firstname": "John",
    "lastname": "Doe",
    "email": "john.doe@example.com",
    "phone": "+1-555-0123",
    "mobilephone": "+1-555-0999",
    "jobtitle": "VP of Engineering"
  },
  "createdAt": "2024-01-15T10:30:00Z"
}

The Salesforce Native Payload

Salesforce uses a completely different structure. Fields are PascalCase, custom fields carry a __c suffix, and phone numbers are broken out into specific functional categories.

{
  "Id": "003xxx00001abcde",
  "FirstName": "John",
  "LastName": "Doe",
  "Email": "john.doe@example.com",
  "Phone": "+1-555-0123",
  "MobilePhone": "+1-555-0999",
  "Title": "VP of Engineering",
  "CreatedDate": "2024-01-15T10:30:00.000+0000"
}

Without a CDM, your application absorbs this complexity directly:

// This is what your codebase looks like without a CDM
let name, email;
if (provider === 'hubspot') {
  name = data.properties.firstname + ' ' + data.properties.lastname;
  email = data.properties.email;
} else if (provider === 'salesforce') {
  name = data.FirstName + ' ' + data.LastName;
  email = data.Email;
}
// ... repeat for 15 more CRMs

The Common Data Model Output

With a common data model, both responses normalize to the same structure. The HubSpot record is shown here; the Salesforce record differs only in its id value:

{
  "id": "12345",
  "first_name": "John",
  "last_name": "Doe",
  "title": "VP of Engineering",
  "email_addresses": [
    {
      "email": "john.doe@example.com",
      "is_primary": true
    }
  ],
  "phone_numbers": [
    {
      "number": "+1-555-0123",
      "type": "phone"
    },
    {
      "number": "+1-555-0999",
      "type": "mobile"
    }
  ],
  "created_at": "2024-01-15T10:30:00Z"
}

Notice how the common data model converts flat string fields into typed arrays (like phone_numbers), standardizes date-time formats into ISO 8601, and maps unique identifiers to a simple id string. Your application code becomes one path, regardless of provider. The complexity lives in the mapping layer, not in your business logic.
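As a rough illustration of that mapping layer, the sketch below normalizes the two payloads shown earlier into one shape. The interface and function names are hypothetical, and it assumes the HubSpot mobile number arrives in the mobilephone property.

```typescript
interface NormalizedContact {
  id: string;
  first_name: string;
  last_name: string;
  title: string;
  email_addresses: { email: string; is_primary: boolean }[];
  phone_numbers: { number: string; type: "phone" | "mobile" }[];
  created_at: string;
}

interface HubSpotContactRaw {
  id: string;
  properties: {
    firstname?: string; lastname?: string; email?: string;
    phone?: string; mobilephone?: string; jobtitle?: string;
  };
  createdAt: string;
}

interface SalesforceContactRaw {
  Id: string; FirstName: string; LastName: string; Email: string;
  Phone?: string; MobilePhone?: string; Title: string; CreatedDate: string;
}

// Turn "2024-01-15T10:30:00.000+0000" (or an ISO string) into "2024-01-15T10:30:00Z".
function toIsoUtc(timestamp: string): string {
  const withColon = timestamp.replace(/([+-]\d{2})(\d{2})$/, "$1:$2");
  return new Date(withColon).toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Build the typed phone array, skipping fields the provider left empty.
function toPhoneNumbers(phone?: string, mobile?: string): NormalizedContact["phone_numbers"] {
  const out: NormalizedContact["phone_numbers"] = [];
  if (phone) out.push({ number: phone, type: "phone" });
  if (mobile) out.push({ number: mobile, type: "mobile" });
  return out;
}

function fromHubSpot(raw: HubSpotContactRaw): NormalizedContact {
  const p = raw.properties;
  return {
    id: raw.id,
    first_name: p.firstname ?? "",
    last_name: p.lastname ?? "",
    title: p.jobtitle ?? "",
    email_addresses: p.email ? [{ email: p.email, is_primary: true }] : [],
    phone_numbers: toPhoneNumbers(p.phone, p.mobilephone),
    created_at: toIsoUtc(raw.createdAt),
  };
}

function fromSalesforce(raw: SalesforceContactRaw): NormalizedContact {
  return {
    id: raw.Id,
    first_name: raw.FirstName,
    last_name: raw.LastName,
    title: raw.Title,
    email_addresses: [{ email: raw.Email, is_primary: true }],
    phone_numbers: toPhoneNumbers(raw.Phone, raw.MobilePhone),
    created_at: toIsoUtc(raw.CreatedDate),
  };
}

const hubspotRaw: HubSpotContactRaw = {
  id: "12345",
  properties: {
    firstname: "John", lastname: "Doe", email: "john.doe@example.com",
    phone: "+1-555-0123", mobilephone: "+1-555-0999", jobtitle: "VP of Engineering",
  },
  createdAt: "2024-01-15T10:30:00Z",
};

const salesforceRaw: SalesforceContactRaw = {
  Id: "003xxx00001abcde", FirstName: "John", LastName: "Doe",
  Email: "john.doe@example.com", Phone: "+1-555-0123", MobilePhone: "+1-555-0999",
  Title: "VP of Engineering", CreatedDate: "2024-01-15T10:30:00.000+0000",
};

console.log(fromHubSpot(hubspotRaw));
console.log(fromSalesforce(salesforceRaw));
```

Downstream code only ever sees NormalizedContact, so adding a third CRM means adding one more mapper rather than touching business logic.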

The hard part isn't field renaming — it's the semantic differences. HubSpot stores multiple emails as a semicolon-delimited string in hs_additional_emails. Salesforce spreads phone numbers across Phone, MobilePhone, HomePhone, OtherPhone, AssistantPhone, and Fax. A ticketing system like Zendesk splits users into Agents and Requesters, while Jira treats everyone as an Issue assignee. Your CDM's mapping layer needs to handle all of this — and that's where most in-house implementations start to buckle. (We've written more about this in our guide on why schema normalization is the hardest problem in SaaS integrations.)
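Two of those semantic differences can be sketched in a few lines. The helper names and the unified type labels (home, other, assistant, fax) are assumptions for illustration, and a production mapper would also need to handle deduplication and validation.

```typescript
// Split HubSpot's semicolon-delimited hs_additional_emails into separate entries.
function splitAdditionalEmails(
  raw: string | null | undefined
): { email: string; is_primary: boolean }[] {
  if (!raw) return [];
  return raw
    .split(";")
    .map((email) => email.trim())
    .filter((email) => email.length > 0)
    .map((email) => ({ email, is_primary: false }));
}

// The six Salesforce Contact phone fields named above, paired with assumed unified labels.
const SALESFORCE_PHONE_FIELDS = [
  ["Phone", "phone"],
  ["MobilePhone", "mobile"],
  ["HomePhone", "home"],
  ["OtherPhone", "other"],
  ["AssistantPhone", "assistant"],
  ["Fax", "fax"],
] as const;

// Collect whichever phone fields are populated into one typed array.
function collectSalesforcePhones(
  record: Record<string, unknown>
): { number: string; type: string }[] {
  const phones: { number: string; type: string }[] = [];
  for (const [field, type] of SALESFORCE_PHONE_FIELDS) {
    const value = record[field];
    if (typeof value === "string" && value !== "") {
      phones.push({ number: value, type });
    }
  }
  return phones;
}

console.log(splitAdditionalEmails("a@example.com; b@example.com"));
console.log(collectSalesforcePhones({ Phone: "+1-555-0123", Fax: "+1-555-0100" }));
```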

Info

Normalization goes both ways. A common data model must also translate your unified query parameters (e.g., ?email=john@example.com) into the native query language of the target API. For HubSpot, this means constructing a complex filterGroups JSON payload. For Salesforce, it means dynamically generating a SOQL WHERE clause. The common data model abstracts this entirely.
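A simplified sketch of that outbound translation for a single email filter follows. The function names are hypothetical, the selected Salesforce columns are an arbitrary choice, and a real translator would cover more operators and fields.

```typescript
interface HubSpotSearchBody {
  filterGroups: {
    filters: { propertyName: string; operator: "EQ"; value: string }[];
  }[];
}

// Unified ?email=... filter expressed as a HubSpot CRM search request body.
function toHubSpotSearch(email: string): HubSpotSearchBody {
  return {
    filterGroups: [
      { filters: [{ propertyName: "email", operator: "EQ", value: email }] },
    ],
  };
}

// Escape backslashes and single quotes before embedding a value in a SOQL string literal.
function escapeSoql(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// The same unified filter expressed as a SOQL query against the Contact object.
function toSoql(email: string): string {
  return `SELECT Id, FirstName, LastName, Email FROM Contact WHERE Email = '${escapeSoql(email)}'`;
}

console.log(JSON.stringify(toHubSpotSearch("john@example.com")));
console.log(toSoql("john@example.com"));
```

Escaping matters here because the filter value comes from your caller; interpolating it unescaped would open the door to SOQL injection.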

The Flaw in Traditional Canonical Data Models

Traditional CDMs work well inside controlled enterprise environments where you own all the systems. They struggle in B2B SaaS for three reasons.

1. The Lowest Common Denominator Problem

When you force fifty different APIs into a single schema, you make compromises. A canonical schema for "Contact" can only include fields that exist across most providers. But enterprise customers don't live in the lowest common denominator. They have custom fields on Salesforce (Lead_Score__c), custom properties on HubSpot (deal_priority), and proprietary objects that don't map to anything in your schema.

A Salesforce instance at a Fortune 500 company might have hundreds of custom objects and fields tracking highly specific business logic. If your canonical data model strips out these custom fields because they don't exist in the standard schema, you've actively destroyed the value of the integration for your enterprise customer. A rigid CDM forces you to either ignore this data or bolt on an untyped custom_fields bag that your application can't reason about. Neither option is acceptable. This is the hidden cost of rigid schemas, and many unified APIs carry it too.

2. Bidirectional Mapping Is a Different Beast

Reading data into a CDM is the easy direction. Writing it back is where things fall apart. Creating a contact in Salesforce requires different mandatory fields than creating one in HubSpot. Salesforce needs LastName; HubSpot technically requires nothing. Your CDM's create schema has to express these provider-specific validation rules, which chips away at the purpose of a common model.
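One way to keep those write-side rules out of application code is a per-provider table of required fields, checked before the request is sent. This is only a sketch: the table covers just the two rules mentioned above, and the names are hypothetical.

```typescript
type CrmProvider = "salesforce" | "hubspot";

// Required-on-create unified fields per provider, per the rules described above.
const REQUIRED_ON_CREATE: Record<CrmProvider, string[]> = {
  salesforce: ["last_name"],
  hubspot: [],
};

// Return the unified fields that are missing or empty for this provider's create call.
function missingCreateFields(
  provider: CrmProvider,
  contact: Record<string, unknown>
): string[] {
  return REQUIRED_ON_CREATE[provider].filter((field) => {
    const value = contact[field];
    return value === undefined || value === null || value === "";
  });
}

const draft = { first_name: "John", email_addresses: [{ email: "john.doe@example.com" }] };
console.log(missingCreateFields("salesforce", draft));
console.log(missingCreateFields("hubspot", draft));
```

The same draft passes for one provider and fails for another, which is exactly the leak in the abstraction this section describes.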

3. The Maintenance Treadmill

The integration landscape has evolved from monolithic ESBs toward distributed patterns. Modern businesses run on dozens of specialized cloud apps, each evolving at its own pace. Every time one of those systems changes its data format — a new field, a renamed attribute, a restructured payload — multiple translators need updating. The maintenance burden is proportional to the number of connections, not the number of systems.

If you're maintaining your own CDM in-house, every API version bump from a vendor becomes a mapping update ticket. HubSpot ships API changes roughly quarterly. Salesforce has three major releases per year. Multiply that by 20+ providers and you need a dedicated team just for mapping maintenance.

Common Data Model vs. Unified API

A common data model is a design pattern — a JSON schema plus the mapping logic you build around it. A unified API is a product that implements that pattern as a service: it gives you the canonical schema, the per-provider mappings, the authentication handling, the pagination normalization, and the ongoing maintenance — all behind a single API endpoint.

Aspect	In-House CDM	Unified API
Schema design	You design and maintain	Pre-built, covers 20+ categories
Mapping logic	You write per provider	Handled by the platform
Auth management	You build OAuth flows per provider	Managed (token refresh, re-auth)
Pagination	You normalize per provider	Consistent cursor-based interface
Rate limit handling	You implement backoff and retries per provider	Platform-managed
New provider support	Weeks to months of engineering	Available immediately if the platform already supports the provider
Custom fields	You decide the escape hatch	Depends on platform architecture
Ongoing maintenance	Your engineering team	Platform vendor

Building an in-house common data model means you own the schema, but you also own all the infrastructure required to support it. A true integration layer is not just data mapping. It requires authentication state management — securely storing OAuth credentials, handling token refreshes, and managing expiration lifecycles across dozens of providers. It requires pagination normalization across cursor-based, offset-based, and link-header approaches. It requires rate limit handling with exponential backoff, circuit breakers, and retry queues that respect each provider's specific rate limit headers. And it requires normalizing incoming webhooks from third-party systems into standardized event payloads.
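Pagination normalization is one of the smaller pieces, and even it takes some care. The sketch below wraps an offset-based provider page in a cursor interface. The offset:N cursor format and the type names are assumptions for illustration. Production systems usually make the cursor opaque.

```typescript
interface OffsetPage<T> {
  items: T[];
  offset: number;
  limit: number;
  total: number;
}

interface CursorPage<T> {
  result: T[];
  next_cursor: string | null;
  prev_cursor: string | null;
}

// Cursor format is an assumption: a readable "offset:N" string.
function encodeCursor(offset: number): string {
  return `offset:${offset}`;
}

// Decode a cursor back to an offset; a null cursor means "start from the beginning".
function decodeCursor(cursor: string | null): number {
  if (cursor === null) return 0;
  const match = /^offset:(\d+)$/.exec(cursor);
  if (!match) throw new Error(`Invalid cursor: ${cursor}`);
  return Number(match[1]);
}

// Present an offset-based provider page through the unified cursor interface.
function toCursorPage<T>(page: OffsetPage<T>): CursorPage<T> {
  const nextOffset = page.offset + page.limit;
  const prevOffset = Math.max(page.offset - page.limit, 0);
  return {
    result: page.items,
    next_cursor: nextOffset < page.total ? encodeCursor(nextOffset) : null,
    prev_cursor: page.offset > 0 ? encodeCursor(prevOffset) : null,
  };
}

console.log(toCursorPage({ items: ["a", "b"], offset: 2, limit: 2, total: 5 }));
```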

A unified API provides the common data model out of the box, alongside all of this supporting infrastructure. Instead of spending months building an integration framework, your team routes requests through the unified API provider. The provider handles the OAuth handshake, refreshes tokens shortly before they expire, executes the HTTP request, normalizes the pagination, maps the data into the canonical schema, and returns the unified response.
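Refreshing shortly before expiry can be as simple as comparing the stored expiry against the clock with a safety margin. The 60-second margin and the StoredToken shape below are assumptions for illustration, not any platform's actual behavior.

```typescript
interface StoredToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

// Refresh when the token is already expired or will expire within the safety margin.
function shouldRefresh(token: StoredToken, nowMs: number, marginMs: number = 60000): boolean {
  return token.expiresAtMs - nowMs <= marginMs;
}

const sampleToken: StoredToken = { accessToken: "example-token", expiresAtMs: 1_000_000 };
console.log(shouldRefresh(sampleToken, 900_000)); // well before expiry
console.log(shouldRefresh(sampleToken, 950_000)); // inside the margin
```

Passing the clock in as an argument keeps the check deterministic and easy to test.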

That said, unified APIs have their own trade-offs. You're introducing a dependency on a third-party service in your data path. You need to evaluate how the platform handles latency, data residency, custom fields, and what happens when their mapping doesn't cover your edge case. Not every unified API is built the same way.

The right question isn't "CDM or unified API?" — it's "How much of the mapping layer do I want to own?" When you're integrating with 2–3 systems, point-to-point is often the pragmatic choice. Once you're past 10 providers with active customers demanding new integrations monthly, the economics shift hard toward a managed solution.

The Truto Approach: Extensible Schemas Without Integration-Specific Code

Most unified API platforms claim they solve the common data model problem. Under the hood, many maintain separate code paths for each provider — if (hubspot) { ... } else if (salesforce) { ... }. This creates the same maintenance burden they promise to eliminate, just behind their own API.

Truto's architecture works differently. The runtime engine is completely generic. It does not contain a single line of integration-specific code. No switch statements on provider names. No dedicated handler files per CRM. Instead, all provider-specific behavior is expressed as data (a pattern we recommend for normalizing data models across different CRMs): JSON configuration for how to call the API, and JSONata expressions for how to transform the response.

1. JSONata for Declarative Mapping

Instead of hardcoding data transformations in TypeScript or Python, Truto uses JSONata expressions to map disparate third-party APIs into a unified schema. JSONata is a lightweight, functional query and transformation language for JSON data. It supports conditionals, string manipulation, array transforms, and custom functions — all in a declarative expression format.

Here's what a response mapping expression looks like for normalizing HubSpot contacts:

response.{
  "id": $string(id),
  "first_name": properties.firstname,
  "last_name": properties.lastname,
  "email_addresses": [
    properties.email ? { "email": properties.email, "is_primary": true },
    properties.hs_additional_emails
      ? properties.hs_additional_emails.$split(";").{ "email": $ }
  ],
  "phone_numbers": [
    properties.phone ? { "number": properties.phone, "type": "phone" },
    properties.mobilephone ? { "number": properties.mobilephone, "type": "mobile" }
  ],
  "created_at": createdAt,
  "updated_at": updatedAt
}

The same engine evaluates a completely different mapping for Salesforce — building SOQL queries, flattening PascalCase fields, parsing six phone number types — without any code change. Both produce the identical common data model output. Every field mapping, query translation, and conditional logic rule is stored as declarative configuration. Adding a new integration or fixing a broken field mapping is a data operation, not a software release.

2. `remote_data` Solves the Lowest Common Denominator Problem

To ensure developers are never locked out of custom fields that don't fit the common data model, Truto automatically appends a remote_data object to every unified response. This object contains the exact, unmodified JSON payload returned by the third-party API.

If an enterprise customer has a highly specific custom field in their Salesforce instance (e.g., Region_Routing_Code__c), it's right there in remote_data. You get the benefit of a clean, canonical schema for standard fields, without losing access to the proprietary data your customers rely on.
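From the consumer's side, reading such a field might look like the sketch below. The response object is a hand-written illustration rather than real API output, and the EMEA-7 value and helper name are made up.

```typescript
interface UnifiedRecord {
  id: string;
  first_name: string;
  last_name: string;
  // The provider's original payload, passed through unmodified.
  remote_data: Record<string, unknown>;
}

// Hypothetical unified response for a Salesforce contact with a custom field.
const unifiedResponse: UnifiedRecord = {
  id: "003xxx00001abcde",
  first_name: "John",
  last_name: "Doe",
  remote_data: {
    Id: "003xxx00001abcde",
    FirstName: "John",
    LastName: "Doe",
    Region_Routing_Code__c: "EMEA-7",
  },
};

// Read a provider-specific string field without widening the canonical schema.
function readRemoteString(record: UnifiedRecord, field: string): string | undefined {
  const value = record.remote_data[field];
  return typeof value === "string" ? value : undefined;
}

console.log(readRemoteString(unifiedResponse, "Region_Routing_Code__c"));
```

Typing remote_data as unknown values forces a runtime check at the point of use, which is a reasonable price for data the canonical schema does not describe.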

3. Three-Level Override Hierarchy for Enterprise Customization

No two enterprise deployments are exactly alike. A common data model must be flexible enough to accommodate tenant-specific quirks. Truto handles this through a three-level configuration override hierarchy:

Platform level — The base canonical data model and default mappings provided by Truto.
Environment level — Overrides applied to a specific customer environment (e.g., a customer whose Salesforce instance renamed Phone to Business_Phone__c).
Account level — Overrides applied to a single, specific integrated account for edge cases.

Overrides are deep-merged, so you only specify what's different. A customer doesn't need to rewrite the entire mapping — just the fields that diverge from the standard. If one enterprise customer needs their HubSpot industry field mapped to a custom unified field, you apply a JSONata override strictly to their integrated account. The rest of your customer base remains unaffected. This lets you offer bespoke integrations to your largest enterprise buyers without branching your codebase.
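The deep-merge behavior can be illustrated with a small recursive function. This is a sketch of the general technique, not Truto's implementation. The config shape is invented, and replacing arrays wholesale instead of merging them is an assumption.

```typescript
type MappingValue = string | number | boolean | null | MappingValue[] | MappingConfig;
type MappingConfig = { [key: string]: MappingValue };

function isPlainObject(value: MappingValue | undefined): value is MappingConfig {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge `override` onto `base`; scalars and arrays in `override` replace the base value.
function deepMerge(base: MappingConfig, override: MappingConfig): MappingConfig {
  const merged: MappingConfig = { ...base };
  for (const key of Object.keys(override)) {
    const value = override[key];
    if (value === undefined) continue;
    const existing = merged[key];
    merged[key] =
      isPlainObject(existing) && isPlainObject(value) ? deepMerge(existing, value) : value;
  }
  return merged;
}

// Later levels win: platform defaults, then environment overrides, then account overrides.
function resolveMapping(
  platform: MappingConfig,
  environment: MappingConfig,
  account: MappingConfig
): MappingConfig {
  return deepMerge(deepMerge(platform, environment), account);
}

const resolved = resolveMapping(
  { phone_numbers: { phone: "Phone", mobile: "MobilePhone" }, title: "Title" },
  { phone_numbers: { phone: "Business_Phone__c" } },
  { industry: "Industry" }
);
console.log(JSON.stringify(resolved));
```

Only the phone mapping changes for that environment. The mobile and title mappings are inherited from the platform defaults.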

4. Dynamic Resource Resolution

Many third-party APIs don't map cleanly to standard CRUD operations. An HRIS platform might require you to call /employees/full-time for one query and /employees/contractors for another, even though both represent the same unified Employee entity. HubSpot routes to different endpoints depending on whether you're listing all contacts, searching with filters, or pulling contacts from a specific list.

The mapping configuration handles this with expression-based routing:

{
  "resource": {
    "expression": "rawQuery.search_term ? 'contacts-search' : 'contacts'",
    "resources": ["contacts", "contacts-search"]
  }
}

The generic engine evaluates the expression and routes to the correct endpoint. To your engineering team, it looks like a simple GET /contacts request (or GET /employees in the HRIS case). Behind the scenes, the common data model handles the multi-step orchestration and endpoint routing. No conditional logic in application code. No provider-specific branching.
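Conceptually, the routing step might look like the sketch below. It uses a plain TypeScript function as a stand-in for the JSONata expression in the config, and the ResourceRule type and resolveResource helper are hypothetical.

```typescript
type RawQuery = Record<string, string | undefined>;

interface ResourceRule {
  // Stand-in for the JSONata expression string in the configuration above.
  pick: (rawQuery: RawQuery) => string;
  // Resources the expression is allowed to resolve to.
  resources: string[];
}

const contactsRule: ResourceRule = {
  pick: (rawQuery) => (rawQuery["search_term"] ? "contacts-search" : "contacts"),
  resources: ["contacts", "contacts-search"],
};

// Evaluate the rule and reject anything outside the declared resource list.
function resolveResource(rule: ResourceRule, rawQuery: RawQuery): string {
  const chosen = rule.pick(rawQuery);
  if (rule.resources.indexOf(chosen) === -1) {
    throw new Error(`Expression resolved to undeclared resource: ${chosen}`);
  }
  return chosen;
}

console.log(resolveResource(contactsRule, {}));
console.log(resolveResource(contactsRule, { search_term: "doe" }));
```

Checking the result against the declared list keeps a bad expression from routing to an arbitrary endpoint.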

When to Build Your Own CDM vs. Buying a Unified API

Here's a practical decision framework:

Scenario	Recommendation
1–3 integrations, stable providers, small team	Build in-house. Point-to-point is fine at this scale.
5–10 integrations, enterprise customers asking for more	Seriously evaluate a unified API. The maintenance cost is about to explode.
10+ integrations across multiple categories (CRM + HRIS + ATS)	Buy. Unless integrations are your core product, the engineering cost is not defensible.
You need real-time, sub-second access to raw provider data	Consider a hybrid: unified API for most traffic, direct integration for latency-critical paths.
Regulated industry with strict data residency requirements	Evaluate platforms that don't store customer data. Not all unified APIs work the same way here.

Info

Key trade-off to consider: Using any unified API — Truto included — means adding a service in your critical data path. Evaluate latency requirements, data residency policies, and fallback behavior before committing. For some use cases (like high-frequency trading data or sub-10ms latency requirements), a direct integration may still be the right call.

What This Means for Your Integration Strategy

A common data model is not a new idea — it's a form of enterprise application integration intended to reduce costs and standardize on agreed data definitions. What's changed is the execution environment: modern B2B SaaS teams aren't integrating 6 internal systems behind a firewall. They're integrating with their customers' tools — dozens of them, each configured differently, each with custom fields and undocumented behaviors.

The pattern still works. The implementation needs to be far more flexible than traditional CDMs allowed.

If you're building integrations today, start with these concrete steps:

Audit your integration surface area. How many providers do your customers actually use? How many are on the roadmap? If the number exceeds five, the n² math starts working against you.
Define your canonical entities first. Even if you decide to build in-house, spending a week designing your Contact, Company, and Deal schemas before writing any mapping code will save months of rework later.
Never throw away the original response. Whatever CDM approach you choose, always preserve the raw provider data. Custom fields, metadata, and provider-specific IDs will be needed sooner than you think.
Separate your mapping layer from your business logic. If your application code contains if (provider === 'salesforce') statements, you've coupled two concerns that should be independent. A declarative mapping layer — whether built with JSONata, JMESPath, or your own DSL — is worth the upfront investment.
Evaluate the maintenance cost honestly. The first integration is fun. The fifteenth is a grind. The true cost isn't building the mapping — it's maintaining it when vendors push breaking changes at 2 AM on a Friday.