Tradeoffs Between Real-time and Cached Unified APIs
Compare real-time, cached, and declarative pass-through API architectures. Includes a decision matrix, TCO scenarios, and guidance on when to skip ETL for SaaS data sync.
When evaluating unified APIs, the first question is which type fits your use case: one that fetches data in real time, at the moment you request it, or one that serves data cached and refreshed on a schedule.
How real-time unified APIs work
Real-time unified APIs deliver just-in-time data. Each request is translated into a call to the underlying API at the moment you make it, so you always get the current state of the source system, the provider stores nothing, and there is no scheduling infrastructure to run.
Pros
- Your customer's data is not stored: the provider persists nothing, which simplifies data privacy.
- Fresh data every time: every response reflects the current state of the source system.
- No scheduling infrastructure required: data is fetched on demand, so there are no sync jobs to run or monitor.
Tradeoff
Limited querying and filtering: you can only filter as far as the underlying API allows, so complex queries may require multiple API calls plus processing in your own code. For instance, to list users in Asana you first obtain the workspace ID, then the teams in that workspace, and finally the users of each team. Every hop is another round trip to the source API.
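To make the round-trip cost concrete, here is a minimal Python sketch that walks a hierarchy shaped like the Asana example. `FakeSource` and its routes are hypothetical in-memory stand-ins for the source API, not Asana's or any unified API's real client; the point is that one logical query ("all users") becomes several sequential calls, and any further filtering happens in your code afterwards.

```python
from dataclasses import dataclass


@dataclass
class FakeSource:
    """Hypothetical in-memory stand-in for a source API; counts round trips."""
    routes: dict[str, list[dict]]
    calls: int = 0

    def get(self, path: str) -> list[dict]:
        self.calls += 1  # each call models one HTTP request to the source
        return self.routes.get(path, [])


def list_users(api: FakeSource) -> list[dict]:
    """Walk workspace -> teams -> users, because users are only reachable via teams."""
    users: dict[str, dict] = {}
    for workspace in api.get("/workspaces"):
        for team in api.get(f"/workspaces/{workspace['gid']}/teams"):
            for user in api.get(f"/teams/{team['gid']}/users"):
                users[user["gid"]] = user  # dedupe users who belong to several teams
    return list(users.values())


api = FakeSource(routes={
    "/workspaces": [{"gid": "w1"}],
    "/workspaces/w1/teams": [{"gid": "t1"}, {"gid": "t2"}],
    "/teams/t1/users": [{"gid": "u1", "email": "ana@acme.test"}],
    "/teams/t2/users": [{"gid": "u1", "email": "ana@acme.test"},
                        {"gid": "u2", "email": "bo@acme.test"}],
})
users = list_users(api)
# Filtering (e.g. by email domain) still happens client-side, after all the calls.
print(len(users), api.calls)  # 2 4
```

For a single workspace with N teams this costs 2 + N calls, which is why latency and source rate limits show up quickly on real-time reads.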
[Figure: Real-time unified APIs vs. cached unified APIs]
How cached unified APIs work
Cached unified APIs fetch data from the underlying APIs at periodic intervals, store it in a database, and serve your requests from that copy. The sync frequency can be adjusted to suit your use case.
A note on the real-time feature in cached unified APIs
While cached APIs may offer webhooks to approximate real-time updates, there is still a lag: the data changes in your customer's system (say, Asana), the webhook relays the change, and only then does the scheduling infrastructure update the copy in your system. Moreover, not every app provides webhooks, and those that do may not send events for the objects you are interested in.
Pros
- Advanced filtering: because the provider holds a full copy of the data, it can build an API on top of its cache that filters on specific fields directly, without you following a sequence of API calls.
- Overcome rate limits imposed by the underlying APIs: since requests are served from the provider's store, your reads do not count against the underlying APIs' rate limits.
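A cached provider's advantage is easiest to see in miniature. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the provider's cache; the table layout and function names are hypothetical. A scheduled job upserts snapshots from the source, and reads become a single filtered query that never touches the source API.

```python
import sqlite3


def open_cache() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # stand-in for the provider's database
    conn.execute(
        "CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT, team TEXT, active INTEGER)"
    )
    return conn


def sync_into_cache(conn: sqlite3.Connection, records: list[dict]) -> None:
    # One scheduled sync run: overwrite rows with the latest snapshot from the source.
    # Source rate limits are only hit here, on the sync schedule, not on every read.
    conn.executemany(
        "INSERT OR REPLACE INTO users (id, email, team, active) "
        "VALUES (:id, :email, :team, :active)",
        records,
    )
    conn.commit()


def active_users_in_team(conn: sqlite3.Connection, team: str) -> list[str]:
    # Reads hit the cache: one filtered query instead of a chain of API calls.
    rows = conn.execute(
        "SELECT email FROM users WHERE team = ? AND active = 1 ORDER BY email",
        (team,),
    )
    return [email for (email,) in rows]


conn = open_cache()
sync_into_cache(conn, [
    {"id": "u1", "email": "ana@acme.test", "team": "eng", "active": 1},
    {"id": "u2", "email": "bo@acme.test", "team": "eng", "active": 0},
    {"id": "u3", "email": "cy@acme.test", "team": "sales", "active": 1},
])
print(active_users_in_team(conn, "eng"))  # ['ana@acme.test']
```

The freshness tradeoff follows directly: between sync runs, the query returns whatever the last snapshot contained.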
Tradeoffs
- Need for scheduling infrastructure: something has to run the periodic jobs that refresh the data. When the unified API provider runs that scheduler, more frequent syncs typically require a costlier pricing tier.
- Data storage and privacy: customer data is stored with the unified API provider.
- Compromise on real-time data: the data is only as fresh as the last sync, which may or may not be a disadvantage depending on your use case.
The ETL-free alternative: declarative pass-through pipelines
There is a third architectural pattern that does not fit neatly into the real-time vs. cached binary: declarative pass-through sync. Instead of landing your customers' SaaS data in a warehouse before forwarding it to your application (the warehouse ETL + reverse ETL path), a pass-through pipeline streams data directly from the source API into your own datastore. No intermediate storage. No transformation layer running in a warehouse you have to govern.
A declarative data sync pipeline defines what data to fetch - which resources, which fields, in what order - without procedural code for HTTP calls, pagination, authentication, or error handling. The pipeline runtime handles all of that generically. This makes it an ETL-free data sync strategy purpose-built for embedded B2B SaaS integration.
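A minimal Python sketch of the idea, assuming a hypothetical config format (a plain dict mirroring JSON) and a hypothetical `fetch(path, cursor)` callable that returns pages shaped like `{"data": [...], "next_cursor": ...}`. This is not Truto's actual pipeline schema; it only shows the split between a config that declares resources, fields, and order, and a generic runtime that paginates the same way for every resource. A real runtime would also handle auth, retries, and transforms.

```python
from typing import Callable, Iterator, Optional

# Hypothetical declarative config: resources, fields, and order. No HTTP or paging code.
PIPELINE = {
    "resources": [
        {"name": "contacts", "path": "/contacts", "fields": ["id", "email"]},
        {"name": "tickets", "path": "/tickets", "fields": ["id", "contact_id", "status"]},
    ]
}

Fetch = Callable[[str, Optional[str]], dict]
Sink = Callable[[str, dict], None]


def paginate(fetch: Fetch, path: str) -> Iterator[dict]:
    """Generic cursor pagination, shared by every resource."""
    cursor: Optional[str] = None
    while True:
        page = fetch(path, cursor)
        yield from page["data"]
        cursor = page.get("next_cursor")
        if not cursor:
            return


def run_pipeline(config: dict, fetch: Fetch, sink: Sink) -> None:
    for resource in config["resources"]:  # execution order comes from the config
        for record in paginate(fetch, resource["path"]):
            # Project only the declared fields; records stream straight to the sink.
            sink(resource["name"], {f: record.get(f) for f in resource["fields"]})


# Fake two-page source standing in for a real SaaS API.
PAGES = {
    ("/contacts", None): {"data": [{"id": "c1", "email": "a@x.test", "phone": "555"}],
                          "next_cursor": "p2"},
    ("/contacts", "p2"): {"data": [{"id": "c2", "email": "b@x.test"}], "next_cursor": None},
    ("/tickets", None): {"data": [{"id": "t1", "contact_id": "c1", "status": "open"}]},
}
out: dict[str, list[dict]] = {}
run_pipeline(PIPELINE, lambda path, cursor: PAGES[(path, cursor)],
             lambda name, row: out.setdefault(name, []).append(row))
```

Adding a resource means adding a config entry rather than writing a connector; the sink here is a dict of lists, but in practice it would write to your own datastore.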
To be clear: this is not about replacing data warehouses for analytics. If your data team needs cross-customer dashboards or ML training sets, a warehouse is the right tool. But if your product simply needs to pull your customer's tickets, contacts, or employees into your own database for operational use, the double-ETL path (warehouse ingest + reverse ETL back out) adds cost, latency, and compliance surface that a pass-through pipeline avoids entirely.
Decision matrix: architecture trade-offs
The table below compares three common architectures for syncing third-party SaaS data into your application. Use it to anchor the conversation with your engineering and security teams.
| Factor | Declarative pass-through | Warehouse ETL + reverse ETL | Cached unified API |
|---|---|---|---|
| Data freshness | Near real-time (webhook) or scheduled (minutes to hours) | Batch - typically 15-60 min ingest + transformation delay | Periodic cache refresh, provider-controlled intervals |
| Bidirectional writes | Supported via real-time unified API writes; conflict resolution is application-level | Reverse ETL handles outbound; inbound is a separate pipeline | Limited - most cached providers are read-heavy |
| Intermediate data storage | None - data flows through in memory, only sync metadata persists | Warehouse stores a full copy of customer data | Provider stores a full copy of customer data |
| Compliance surface (GDPR, SOC 2) | Minimal - no intermediate data processor to govern | Warehouse becomes a data processor; requires DPAs, retention policies, DSAR procedures | Provider is a data processor; review their sub-processor list |
| Querying & filtering | Depends on your own datastore schema | Full SQL power in the warehouse | Provider-built API on cached data; advanced filtering |
| Rate limit handling | Pipeline handles backoff and retry; your app still bound by source API limits | Decoupled - warehouse absorbs the load; app queries the warehouse | Provider absorbs all rate limits |
| Bulk/historical extraction | Works but requires careful checkpoint management for large backfills | Purpose-built for bulk; warehouse handles scale natively | Handled by provider's sync infrastructure |
| Ongoing engineering cost | Low - config-driven; no per-connector code | High - orchestration (Airflow/dbt), warehouse compute, reverse ETL tool | Low if provider covers your integrations |
| Vendor lock-in | Config is portable if based on open formats (JSON/JSONata) | Heavy - tied to warehouse + orchestrator + reverse ETL vendor | Medium - normalized schema may not transfer |
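The rate-limit row notes that a pass-through pipeline handles backoff and retry itself while your app stays bound by the source's limits. A minimal Python sketch of such a retry loop, assuming a hypothetical `RateLimited` exception raised by your HTTP layer on a throttled response (e.g. HTTP 429). It honors a Retry-After hint when present and otherwise uses exponential backoff with full jitter, which is one common strategy, not necessarily what any particular runtime does.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical signal for a throttled request (e.g. HTTP 429)."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, if the source sent Retry-After


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base: float = 0.5,
                 cap: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # give up; the source's limit still applies to you
            if exc.retry_after is not None:
                delay = exc.retry_after  # honor the source's own hint first
            else:
                # Full jitter: random delay up to an exponentially growing ceiling.
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)


# Simulated source: throttled twice, then succeeds.
outcomes = iter([RateLimited(), RateLimited(retry_after=2.0), "page-1"])


def flaky_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays: list[float] = []
result = with_backoff(flaky_call, sleep=delays.append)  # record delays instead of sleeping
```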
Sample TCO and latency scenarios
The numbers below are illustrative, built from industry benchmarks. Use them as starting points for your own spreadsheet, not as guarantees.
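If you want to rebuild these tables with your own inputs, the arithmetic is simple enough to script. A minimal Python sketch; the $100/hour loaded engineering rate is an assumption for illustration, not a figure from the scenarios. With that rate, the Scenario 1 warehouse column sums to the ~$4,000-8,500 estimate shown there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    """A low/high monthly cost estimate in dollars."""
    low: float
    high: float

    def __add__(self, other: "CostRange") -> "CostRange":
        return CostRange(self.low + other.low, self.high + other.high)


def monthly_tco(platform: CostRange, warehouse: CostRange, reverse_etl: CostRange,
                eng_hours: CostRange, hourly_rate: float) -> CostRange:
    """Sum tooling costs plus engineering time priced at an assumed hourly rate."""
    engineering = CostRange(eng_hours.low * hourly_rate, eng_hours.high * hourly_rate)
    return platform + warehouse + reverse_etl + engineering


# Scenario 1, warehouse ETL + reverse ETL column, at an assumed $100/hour.
warehouse_path = monthly_tco(
    platform=CostRange(1_000, 2_000),
    warehouse=CostRange(500, 1_500),
    reverse_etl=CostRange(500, 1_000),
    eng_hours=CostRange(20, 40),
    hourly_rate=100,
)
print(warehouse_path)  # CostRange(low=4000, high=8500)
```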
Scenario 1: Early-stage SaaS, 5 integrations, 200 customers
You need CRM and ticketing data flowing into your product. Each customer connects one or two accounts.
| Cost line | Declarative pass-through | Warehouse ETL + reverse ETL |
|---|---|---|
| Integration platform | ~$500-1,500/mo (unified API) | ~$1,000-2,000/mo (ELT tool) |
| Warehouse compute | $0 | ~$500-1,500/mo |
| Reverse ETL tool | $0 | ~$500-1,000/mo |
| Engineering maintenance | ~5 hrs/mo (config tweaks) | ~20-40 hrs/mo (pipeline monitoring, dbt models, orchestration) |
| Estimated monthly total | ~$1,000-2,500 | ~$4,000-8,500 |
| Typical end-to-end latency | 1-15 minutes | 30-90 minutes |
At this stage, the warehouse path costs roughly three to four times as much per month, with no proportional benefit for operational data use cases.
Scenario 2: Growth-stage SaaS, 15 integrations, 1,000 customers
You have HRIS, CRM, ATS, and ticketing categories. Some customers have 50K+ records.
| Cost line | Declarative pass-through | Warehouse ETL + reverse ETL |
|---|---|---|
| Integration platform | ~$2,000-4,000/mo | ~$3,000-5,000/mo (ELT tool, higher MAR tier) |
| Warehouse compute | $0 | ~$2,000-5,000/mo |
| Reverse ETL tool | $0 | ~$1,500-3,000/mo |
| Engineering headcount | ~0.25 FTE | ~0.5-1 FTE |
| GDPR compliance overhead | Minimal (no intermediate store) | DPAs, retention policies, DSAR procedures for warehouse |
| Estimated monthly total | ~$5,000-8,000 | ~$12,000-25,000+ |
| Typical end-to-end latency | 5-30 minutes | 45 minutes - 2 hours |
The compliance gap widens here. Every intermediate data store becomes a data processor under GDPR, requiring data processing agreements, retention policies, and deletion procedures for a system that exists purely as plumbing.
Scenario 3: Enterprise SaaS with regulated data (healthcare, fintech)
You are handling employee PII or financial data. Your customers require SOC 2 Type II and single-tenant isolation.
| Cost line | Declarative pass-through | Warehouse ETL + reverse ETL |
|---|---|---|
| Integration platform | ~$4,000-8,000/mo | ~$5,000-8,000/mo |
| Warehouse (single-tenant) | $0 | ~$5,000-15,000/mo |
| Reverse ETL | $0 | ~$3,000-5,000/mo |
| Compliance & audit | Lower surface area; fewer sub-processors to document | Full DPIAs for warehouse processing; vendor risk assessments at ~$1,000-5,000 per vendor |
| Estimated monthly total | ~$6,000-12,000 | ~$18,000-40,000+ |
For regulated data, every intermediate storage layer is a liability. GDPR fines can reach up to 4% of annual worldwide turnover, and healthcare data breaches cost approximately $7.42 million per incident on average. A pass-through architecture that retains zero customer data after delivery removes an entire class of risk from your compliance posture.
Scenario 4: Analytics-heavy product needing cross-customer aggregation
Your product builds dashboards that aggregate data across a customer's multiple SaaS tools.
| Cost line | Declarative pass-through | Warehouse ETL + reverse ETL |
|---|---|---|
| Fit | Poor - pass-through delivers to your DB, but cross-source joins require you to build the query layer | Strong - warehouse is designed for analytical queries across large datasets |
This is the scenario where a warehouse earns its keep. If your product's core value depends on SQL-level ad-hoc querying across multiple data sources, pay for the warehouse. A pass-through pipeline is the wrong tool here.
When to avoid pass-through declarative pipelines
Declarative sync pipelines are not a universal answer. Skip them when:
- Complex bidirectional sync with conflict resolution: If two systems can modify the same record simultaneously, you need application-level conflict resolution logic that goes beyond what a declarative config can express. Think CRM field-level merge rules or inventory count reconciliation.
- Sub-second event-driven workflows: If you need to react to a webhook event within milliseconds (fraud detection, real-time pricing), a scheduled sync pipeline is too slow. Use webhooks paired with a real-time unified API instead.
- Bulk historical extraction at warehouse scale: Backfilling 50 million rows from a customer's Salesforce instance is a job for a purpose-built ELT tool with native CDC support. A pass-through pipeline handles incremental syncs well, but initial bulk loads at massive scale will test its limits.
- Cross-source analytical joins: If you need to JOIN a customer's HubSpot contacts with their Zendesk tickets and their Stripe invoices in a single SQL query, you need a warehouse. Pass-through delivers data to your operational database, but it is not an analytics engine.
- Radically different per-customer transformation logic: If every customer needs a fundamentally different transformation pipeline (not just different field mappings, but different processing graphs), a code-based orchestrator like Airflow or Dagster gives you more flexibility. That said, declarative transforms with JSONata handle more customization than most teams expect: per-customer field mappings, conditional logic, and computed fields are all expressible in config.
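The bulk-extraction caveat above, and the decision matrix's note about checkpoint management, both come down to persisting a cursor between batches; that cursor is the "sync metadata" a pass-through pipeline keeps. A minimal Python sketch under stated assumptions: the in-memory `SOURCE` list stands in for an API you can query by modification time, records carry hypothetical `updated_at` and `id` fields, and the checkpoint lives in a JSON file. Delivering a batch before saving the checkpoint gives at-least-once delivery, so your sink should upsert idempotently.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def sort_key(record: dict) -> tuple:
    # Compound cursor: the id breaks ties between records sharing a timestamp.
    return (record["updated_at"], record["id"])


def load_checkpoint(path: Path) -> tuple:
    if path.exists():
        return tuple(json.loads(path.read_text()))
    return (-1, "")  # sorts before any real record


def incremental_sync(source: list, state_path: Path,
                     sink: Callable[[list], None], batch_size: int = 2) -> None:
    cursor = load_checkpoint(state_path)
    while True:
        # A real source would take the cursor as a query parameter instead.
        batch = [r for r in sorted(source, key=sort_key) if sort_key(r) > cursor][:batch_size]
        if not batch:
            return
        sink(batch)  # deliver first...
        cursor = sort_key(batch[-1])
        state_path.write_text(json.dumps(list(cursor)))  # ...then advance: a crash replays, never skips


SOURCE = [{"id": "a", "updated_at": 1}, {"id": "b", "updated_at": 1},
          {"id": "c", "updated_at": 2}]
state = Path(tempfile.mkdtemp()) / "cursor.json"


def crashing_sink(batch: list) -> None:
    if batch[0]["id"] == "c":
        raise RuntimeError("simulated crash mid-backfill")


try:
    incremental_sync(SOURCE, state, crashing_sink)
except RuntimeError:
    pass

delivered: list = []
incremental_sync(SOURCE, state, delivered.extend)  # resumes after record b, delivers only c
```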
Recommendations for hybrid patterns (partial writeback)
Most real-world architectures end up as hybrids. Here are patterns that work well:
Pattern 1: Pass-through reads, real-time API writes
The most common pattern for B2B SaaS. Use a declarative sync pipeline to pull data from your customers' SaaS accounts into your operational database on a schedule (every 15 minutes, every hour). When your application needs to write back - creating a contact, updating a ticket status, pushing a lead score - make a direct write through the real-time unified API.
This avoids the latency of a reverse ETL batch job for writes while still giving you structured, queryable data locally. The real-time API handles OAuth, rate limiting, and field mapping on the write path. Your application owns the conflict resolution logic because it knows the business context.
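A minimal Python sketch of Pattern 1's write path, with hypothetical names throughout: `synced` is a record your sync pipeline already landed in your database, `FakeUnifiedApi` stands in for a real-time unified API client, and records carry a `version` field (many real APIs expose `updated_at` or an ETag instead). Before writing, the application checks that the remote record has not changed since the last sync, and raises so business logic can decide what wins.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """The remote record changed after our last sync; the app must resolve it."""


@dataclass
class FakeUnifiedApi:
    """Hypothetical stand-in for a real-time unified API client."""
    records: dict = field(default_factory=dict)

    def get(self, resource: str, rid: str) -> dict:
        return dict(self.records[(resource, rid)])

    def update(self, resource: str, rid: str, changes: dict) -> dict:
        rec = self.records[(resource, rid)]
        rec.update(changes)
        rec["version"] += 1
        return dict(rec)


def write_back(api: FakeUnifiedApi, resource: str, local_row: dict, changes: dict) -> dict:
    # Reads came from the local synced copy; writes go straight to the source.
    remote = api.get(resource, local_row["id"])
    if remote["version"] != local_row["version"]:
        raise Conflict(f"{resource}/{local_row['id']} changed remotely")
    return api.update(resource, local_row["id"], changes)


api = FakeUnifiedApi({("tickets", "t1"): {"id": "t1", "status": "open", "version": 3}})
synced = {"id": "t1", "status": "open", "version": 3}  # row from the last sync
updated = write_back(api, "tickets", synced, {"status": "closed"})
```

The get-then-update check leaves a small race window; where the source API supports conditional writes (for example HTTP `If-Match` with ETags), prefer that.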
Pattern 2: Declarative sync for operational data, warehouse for analytics
Run pass-through pipelines for the data your product needs operationally - employee records, CRM contacts, support tickets. Separately, run a traditional ELT pipeline into a warehouse for the data your analytics team needs for cross-customer reporting, churn models, or usage dashboards.
The key insight: these two data paths serve different consumers with different freshness and query requirements. Forcing both through a single warehouse path over-engineers the operational use case and under-serves the analytical one.
Pattern 3: Selective writeback with override mappings
For products that need partial bidirectional sync - reading most data but writing back a subset of fields - use a declarative pipeline with per-resource write mappings. Define which fields your application is the source of truth for, and sync only those fields back. The unified API's override hierarchy lets you customize write mappings per customer or even per connected account without code changes.
This pattern works well for scenarios like: pushing a computed lead score back into a CRM, updating a custom field with your product's output, or syncing task status between your app and a project management tool.
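A minimal Python sketch of the field-ownership idea in Pattern 3. The three-layer merge (base mapping, then per-customer, then per-account overrides) only models the layering described above; it is not Truto's actual override format, and the CRM field names are hypothetical.

```python
from typing import Optional

# Hypothetical base mapping: unified field -> CRM field.
BASE_MAPPING = {"lead_score": "lead_score__c", "stage": "StageName"}


def resolve_mapping(base: dict, customer: Optional[dict] = None,
                    account: Optional[dict] = None) -> dict:
    """Later layers win: base < per-customer < per-connected-account."""
    merged = dict(base)
    for layer in (customer or {}, account or {}):
        merged.update(layer)
    return merged


def writeback_payload(row: dict, owned_fields: list, mapping: dict) -> dict:
    """Only fields your application is the source of truth for are written back."""
    return {mapping[f]: row[f] for f in owned_fields if f in row}


mapping = resolve_mapping(BASE_MAPPING, customer={"lead_score": "Lead_Score__c"})
row = {"id": "006A", "lead_score": 87, "stage": "Negotiation"}
print(writeback_payload(row, ["lead_score"], mapping))  # {'Lead_Score__c': 87}
```

Here `stage` is read from the CRM but never written back, because the application does not own it.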
The right architecture depends on your data's destination, not its source. If data flows into your product's operational database, a declarative pass-through pipeline is almost always cheaper, faster, and easier to govern than a warehouse round-trip. If data flows into analytical models, the warehouse earns its complexity.
Truto's Approach
At Truto, we prioritize flexibility and offer four options for our customers:
Real-time unified API
Truto's default option, providing real-time data without storing customer data, is ideal for just-in-time data needs.
Sync to your database with Truto Daemon
Fetch and store data from our unified APIs in your database, enabling advanced querying and filtering while maintaining data privacy.
The Daemon runs within your VPC or cloud infrastructure and uses the same real-time Unified API that is available to all of our customers. Its code is also open source on GitHub.
Sync data with Truto RapidBridge
If you don't want to run and maintain the Daemon in your own infrastructure, Truto also offers RapidBridge, which fetches data from the Unified APIs and delivers it to your webhook endpoint. This means you don't have to maintain scheduling infrastructure to fetch the data periodically.
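On the receiving side, a webhook endpoint typically verifies each request and upserts records idempotently, since deliveries can repeat. A minimal Python sketch using only the standard library; the payload shape (`resource` plus a `records` array), the HMAC-SHA256 hex signature, and the shared secret are hypothetical assumptions for illustration, so check Truto's documentation for the actual RapidBridge payload and signing scheme. HTTP routing is omitted: `handle_webhook` receives the raw body and the signature value and returns a status code.

```python
import hashlib
import hmac
import json
import sqlite3

SECRET = b"replace-with-your-webhook-secret"  # hypothetical shared secret


def verify(body: bytes, signature_hex: str, secret: bytes = SECRET) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)  # constant-time comparison


def handle_webhook(conn: sqlite3.Connection, body: bytes, signature_hex: str) -> int:
    if not verify(body, signature_hex):
        return 401
    event = json.loads(body)
    # Upsert keyed by (resource, id) so a redelivered event is harmless.
    conn.executemany(
        "INSERT OR REPLACE INTO records (resource, id, data) VALUES (?, ?, ?)",
        [(event["resource"], r["id"], json.dumps(r)) for r in event["records"]],
    )
    conn.commit()
    return 200


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (resource TEXT, id TEXT, data TEXT, PRIMARY KEY (resource, id))"
)
body = json.dumps({"resource": "contacts",
                   "records": [{"id": "c1", "email": "a@x.test"}]}).encode()
sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
```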
Fetch from our database with Truto SuperQuery
Retrieve data stored in our database, allowing richer querying and filtering without the need to manage your databases. Choose between single-tenant or multi-tenant instances as per your requirements.
Understanding the nuances of real-time and cached unified APIs empowers product and engineering managers to select the most suitable approach for their specific data access needs. Choose wisely, considering factors like data freshness, privacy, querying capabilities, and the necessity for scheduling infrastructure.
FAQ
- What is a declarative data sync pipeline?
- A declarative data sync pipeline defines what data to fetch - which resources, fields, and order - without procedural code for HTTP calls, pagination, or authentication. The pipeline runtime handles execution generically, making it an ETL-free approach to SaaS data synchronization.
- When should I use a warehouse ETL pipeline instead of a pass-through sync?
- Use a warehouse ETL pipeline when you need cross-source analytical joins (e.g., joining CRM contacts with billing data in SQL), bulk historical extraction at massive scale, or when your product's core value depends on ad-hoc querying across multiple data sources.
- What are the compliance advantages of pass-through pipelines over cached or warehouse-based approaches?
- Pass-through pipelines retain zero customer data after delivery - only sync metadata persists. This eliminates the intermediate data store that would otherwise become a data processor under GDPR, requiring data processing agreements, retention policies, and data subject access request procedures.
- Can I use both pass-through sync and a warehouse together?
- Yes. A common hybrid pattern uses declarative pass-through pipelines for operational data your product needs (contacts, tickets, employees) and a separate ELT pipeline into a warehouse for analytics, reporting, and ML workloads. Each path serves different consumers with different requirements.
- What is the cost difference between declarative pass-through and warehouse ETL for SaaS integrations?
- For a growth-stage SaaS with 15 integrations and 1,000 customers, a pass-through pipeline typically costs $5,000-8,000/month versus $12,000-25,000+ for warehouse ETL plus reverse ETL. The gap widens with regulated data where compliance overhead for intermediate storage adds significant cost.