
How to Create Provider-Specific API Runbooks (With Tested Templates & Code)

Stop firefighting third-party API failures. Learn how to build provider-specific API runbooks for Salesforce, NetSuite, and HubSpot with tested templates.

Yuvraj Muley · 16 min read

Your on-call engineer just got paged at 3 AM because a Salesforce sync threw REQUEST_LIMIT_EXCEEDED. The runbook they pull up says "check rate limits and retry." Useless. Salesforce doesn't return HTTP 429 with a Retry-After header like HubSpot. It returns HTTP 403 against an API request allocation measured over a rolling 24-hour window, so there is no fixed reset time to wait for, and your retry loop is about to spend the next several hours piling rejected calls onto a quota that is already gone.

If your engineering team is spending their Tuesday mornings debugging silent webhook drops and undocumented OAuth failures, you do not need another integration. You need an operational framework to stop the bleeding. As we explored in our guide on why SaaS integrations break after launch, when you create provider-specific runbooks with tested examples, you transition your team from chaotic firefighting to predictable, measurable maintenance.

Every third-party API has its own governance model, error vocabulary, and recovery semantics. A standardized template that pretends Salesforce, NetSuite, and HubSpot behave alike will fail at the exact moment you need it. This guide shows you how to create provider-specific runbooks with tested examples—the structure to use, the quirks to document for the three APIs that break most often, and how to stop writing runbooks as static documents and start expressing them as executable configuration.

If you don't yet have a baseline operational playbook for your integrations layer, start with our foundational guide on how to create an operational runbook and monitoring playbook before going provider-specific.

The Myth of the Generic API Integration Runbook

A generic API integration runbook is a dangerous illusion. You cannot write a single standard operating procedure (SOP) that covers Salesforce, NetSuite, and HubSpot. They fail in fundamentally different ways.

When you tell an on-call engineer to "check the rate limits," that instruction means entirely different things depending on the upstream provider. Consider three failure modes that all look like "the integration is broken":

  • Salesforce: A trigger fails because a synchronous transaction is capped at 100 SOQL queries and 50,000 returned records. There is no Retry-After. There is no 429. There is a LimitException and a transaction that already rolled back. The API request allocation is a separate budget: checking it means reading the Sforce-Limit-Info header for the rolling 24-hour count.
  • NetSuite: A RESTlet starts returning 429s mid-sync because the account's concurrency pool is full: against a 15-slot pool, the 16th simultaneous request is rejected immediately with a 429 or SSS_REQUEST_LIMIT_EXCEEDED error. Backoff doesn't help if the noisy neighbor is your own marketing job stealing concurrency slots across the entire customer tenant.
  • HubSpot: A bulk export hits a wall because HubSpot enforces a daily quota of roughly 650,000 to 1 million calls per account, depending on tier, plus a burst cap of up to roughly 190 calls per 10-second window, and the daily counter doesn't reset until midnight in the account's configured timezone. You must check the X-HubSpot-RateLimit-Secondly header.

Generic runbooks lead to extended downtime because they force responders to context-switch and read third-party API documentation under pressure. A provider-specific runbook codifies the exact quirks, undocumented behaviors, and error payloads of a specific API into an executable checklist. One generic playbook cannot resolve all three. The semantics are completely different: synchronous governor limit, account-wide concurrency cap, daily quota plus burst window. The runbook for each must be written as if the others don't exist.

The True Cost of API Maintenance in 2026

Integration maintenance is no longer a minor operational nuisance. It is a board-level financial liability. If you are a product manager or engineering leader, you must understand the math behind integration downtime to justify the time spent building these runbooks. Before your VP of Engineering signs off on "yet another doc project," anchor the conversation in real numbers.

API maintenance and troubleshooting consume a massive portion of engineering capacity. A 2024 Lunar.dev survey of more than 200 software companies found that 60% report spending too much time troubleshooting third-party APIs, and that hidden incremental cost compounds on top of direct API consumption fees. In the same survey, 36% of companies say they spend more time troubleshooting APIs than developing new features, and 88% report that API issues require weekly attention.

The burden falls heavily on data and backend teams. Engineers can spend nearly half of their time manually building and maintaining data pipelines and integrations. Fivetran reports that data engineers spend 44% of their time on manual pipeline maintenance, costing organizations well into six figures annually.

The financial cost of integration downtime is catastrophic, making operational runbooks a necessity rather than a luxury. According to a study by Oxford Economics, 100% of organizations experienced revenue loss from outages in the past year, with an estimated average cost of $9,000 per minute—translating to $540,000 per hour of downtime for enterprise systems.

At the enterprise level, the numbers are even more punishing. Siemens' 2024 True Cost of Downtime research found that unplanned downtime costs Fortune Global 500 companies 11% of their annual turnover, nearly $1.5 trillion combined.

Every additional hour your team spends decoding undocumented provider quirks at 3 AM is an hour not spent on the product. Provider-specific runbooks aren't documentation hygiene. They are the cheapest insurance policy you can buy against integration-driven downtime. If you are planning your integration roadmap, review the SaaS product manager's integration rollout playbook to ensure you are costing these builds correctly upfront.

Core Components of a Provider-Specific Runbook

Every provider-specific runbook should cover the same six sections. The content of each section changes per API, but the structure must stay consistent so on-call engineers can navigate without thinking. If any of these are missing, your responders will eventually hit a dead end during an incident.

1. Authentication Lifecycles and Recovery: Document exactly how the API authenticates. Detail the token type, the token expiration window, the refresh token lifespan, and the exact error payload returned when a token is fully revoked. Include the SQL query or script required to force a manual token refresh in your database.

2. Pagination Contracts and Quirks: APIs handle pagination differently. Document whether the API uses cursor-based pagination, offset-based pagination, or Link headers. Explicitly state whether the cursor is opaque, the maximum page size, and the exact behavior when a query exceeds the maximum allowed offset or reaches the last page.

3. Rate Limit Models: List the exact HTTP headers the provider uses to communicate rate limits. Document the burst window, the daily/monthly window, which headers carry remaining budget, and what the 429 (or non-429) error looks like. Define the expected exponential backoff strategy for this specific provider.

4. Error Normalization Mapping: Third-party APIs are notorious for returning HTTP 200 OK responses with error payloads in the body, or using generic HTTP 400 Bad Request statuses for complex validation failures. Document the specific JSON paths required to extract the actual human-readable error message. Map provider error codes to the four categories you actually care about: retryable, auth_failure, client_error, permanent_data_error.

5. Webhook Verification Procedures: If the integration relies on inbound webhooks, document the exact cryptographic signature validation required (HMAC, JWT, Basic Auth). Specify replay protection mechanisms and verification challenge handling. Include a script to manually generate a signature to test your local webhook ingestion endpoints.

6. Schema Drift Hotspots: Document custom fields, custom objects, and fields that change shape based on the customer's edition or tier. For strategies on mitigating these shifts, see our guide on how to handle breaking API changes across 100+ SaaS integrations without code deploys.

Every section should answer two questions: What is the exact symptom? and What is the exact action? Vague guidance like "add retries" must be replaced with concrete code paths, tested in staging.
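The "script to manually generate a signature" from section 5 can be very small. Here is a minimal sketch, assuming the provider signs the raw request body with HMAC-SHA256 and sends the hex digest in a header. That scheme is an assumption: some providers use Base64, or include a timestamp or the request URL in the signed string, so check the provider's docs before copying this. The function names and the sample secret are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: str, raw_body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body.

    Assumption: the provider signs the body only, with a shared secret.
    """
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, raw_body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid timing side channels."""
    expected = sign_payload(secret, raw_body)
    return hmac.compare_digest(expected, received_signature)


if __name__ == "__main__":
    # Hypothetical test payload for a local webhook ingestion endpoint
    body = b'{"event": "contact.updated", "id": "101"}'
    print(sign_payload("test-secret", body))
```

Sign the exact bytes you send. Re-serializing the JSON before signing changes whitespace and key order, and the signature will no longer match.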

flowchart TD
  A[Alert fires] --> B{Auth lifecycle<br>covered?}
  B -- yes --> C{Rate limit model<br>covered?}
  B -- no --> X[Page integration owner]
  C -- yes --> D{Error category<br>known?}
  C -- no --> X
  D -- retryable --> E[Apply documented backoff]
  D -- auth_failure --> F[Trigger reauth flow]
  D -- permanent --> G[Quarantine + open ticket]

A runbook that doesn't let the on-call engineer reach a leaf node in under 60 seconds is a runbook that won't be used.
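The same decision tree can be expressed as a lookup that alerting automation calls. This is a sketch of the flowchart above, not a prescribed implementation. The function name and action strings are hypothetical, and routing unknown categories to the integration owner is an assumption the flowchart does not spell out.

```python
def next_action(auth_covered: bool, rate_limit_covered: bool, error_category: str) -> str:
    """Walk the triage flowchart and return the leaf action for an alert."""
    # Either coverage gap sends the responder straight to the integration owner
    if not auth_covered or not rate_limit_covered:
        return "page_integration_owner"

    actions = {
        "retryable": "apply_documented_backoff",
        "auth_failure": "trigger_reauth_flow",
        "permanent": "quarantine_and_open_ticket",
    }
    # Assumption: an unrecognized category is escalated rather than guessed at
    return actions.get(error_category, "page_integration_owner")
```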

How to Create a Provider-Specific Runbook for Salesforce (With Tested Examples)

Salesforce is the undisputed heavyweight of CRM integrations. It is also the single most common source of "works in dev, fails at scale" pain. Three behaviors break naive runbooks:

SOQL Governor Limits Are Not Rate Limits: In synchronous Apex execution, the platform caps you at 100 SOQL queries and 50,000 returned records per transaction; asynchronous transactions get 200 queries with the same 50,000-record ceiling. Exceeding either does not produce a 429. It throws System.LimitException, and the transaction is gone. Your runbook needs an explicit "governor limit ≠ rate limit" callout.

OFFSET Pagination Is a Trap: One line that has cost teams entire weekends: the maximum SOQL OFFSET value is 2,000 rows. Deep pagination with LIMIT/OFFSET stops working past that point: a larger offset is rejected with a NUMBER_OUTSIDE_VALID_RANGE error instead of returning the next page, so a sync built on OFFSET never sees row 2,001. Document the workaround (use WHERE Id > lastSeenId ORDER BY Id or the QueryLocator/Bulk API) and put it at the top of the pagination section.
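The keyset workaround is easier to get right with a concrete shape in front of you. Here is a minimal sketch, assuming a `run_query` callable (hypothetical) that executes a SOQL string and returns a list of record dicts containing `Id`. The helper names and the default page size are illustrative choices, not Salesforce requirements.

```python
from typing import Callable, Dict, Iterator, List, Optional


def build_keyset_query(sobject: str, fields: List[str],
                       last_seen_id: Optional[str], page_size: int = 2000) -> str:
    """Build one page of a keyset-paginated SOQL query (no OFFSET involved)."""
    # last_seen_id must come from a previous Salesforce response, never from user input
    where = f" WHERE Id > '{last_seen_id}'" if last_seen_id else ""
    return f"SELECT {', '.join(fields)} FROM {sobject}{where} ORDER BY Id LIMIT {page_size}"


def iterate_records(run_query: Callable[[str], List[Dict]], sobject: str,
                    fields: List[str], page_size: int = 2000) -> Iterator[Dict]:
    """Yield every record by repeatedly asking for rows after the last seen Id."""
    last_seen_id = None
    while True:
        page = run_query(build_keyset_query(sobject, fields, last_seen_id, page_size))
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left
        last_seen_id = page[-1]["Id"]
```

The `fields` list has to include `Id`, otherwise the loop has nothing to resume from.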

OAuth Refresh Token Failures: Salesforce refresh tokens can be revoked by an admin, by password rotation, or by hitting the session policy limit. The error returned is invalid_grant with error_description: expired access/refresh token. Your runbook must say: do not retry, mark the account as needs_reauth, notify the customer.
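The "do not retry" rule is worth encoding, because a retry loop on a dead refresh token never succeeds. Here is a minimal sketch, assuming the token endpoint's JSON body has already been parsed into a dict. The function name and returned keys are hypothetical, and treating every other failure as transient is an assumption you should tighten for your own stack.

```python
def handle_refresh_failure(status_code: int, body: dict) -> dict:
    """Decide what to do after a failed Salesforce token refresh."""
    if status_code == 400 and body.get("error") == "invalid_grant":
        # The refresh token is gone for good: stop retrying and ask the customer to reconnect
        return {"retry": False, "account_status": "needs_reauth", "notify_customer": True}
    # Assumption: anything else (timeouts, 5xx) is transient and safe to retry
    return {"retry": True, "account_status": "active", "notify_customer": False}
```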

For deeper architectural context on mapping custom fields, read our guide on how to handle custom fields and custom objects in Salesforce via API and how to handle custom Salesforce fields across enterprise customers.

Here is a tested runbook template and classification snippet for Salesforce:

# Runbook: Salesforce REST API
 
## 1. Authentication Failure Modes
Salesforce uses OAuth 2.0. The most common failure is the `invalid_grant` error during token refresh.
 
**Symptoms:**
- API returns HTTP 400 with `{"error": "invalid_grant", "error_description": "expired access/refresh token"}`
 
**Root Causes:**
- The user's Salesforce administrator revoked the OAuth app.
- The user's password expired (in some Salesforce org configurations, this invalidates refresh tokens).
- The user exceeded the limit of five active authorizations per connected app, so Salesforce revoked the oldest refresh token.
 
**Recovery Action:**
- The token cannot be recovered programmatically. 
- Trigger the `reauth_required` email flow to the customer.
- Mark the integrated account status as `needs_reauth` in the database.
 
## 2. Rate Limit Constraints
Salesforce enforces a rolling 24-hour limit based on the customer's license type.
 
**Detection:**
- Check the `Sforce-Limit-Info` header on successful responses.
- Format: `api-usage=25000/100000` (Used/Total).
- When exceeded, Salesforce returns HTTP 403 Forbidden with the error code `REQUEST_LIMIT_EXCEEDED`.
 
**Recovery Action:**
- Do NOT apply standard exponential backoff. The limit will not reset for up to 24 hours.
- Pause all sync jobs for this specific tenant.
- Alert the customer that they must contact their Salesforce Account Executive to purchase more API calls, or wait for the rolling window to clear.
 
## 3. SOQL Query Quirks and Custom Objects
Salesforce uses SOQL (Salesforce Object Query Language) instead of standard REST filters.
 
**Known Issue: INVALID_FIELD**
- If a customer deletes a custom field (e.g., `Internal_Score__c`) that our sync job is actively querying, Salesforce returns HTTP 400 with the error code `INVALID_FIELD`.
- **Recovery:** Invalidate the cached schema for this tenant. Re-run the field discovery job via the `/services/data/v59.0/sobjects/Contact/describe` endpoint to rebuild the valid field list.
 
**Known Issue: Query Timeout**
- SOQL queries time out if they take longer than 120 seconds, returning `QUERY_TIMEOUT`.
- **Recovery:** Reduce the `LIMIT` clause or add highly selective indexed filters (like `LastModifiedDate > YESTERDAY`).

def classify_salesforce_error(response, body):
    """Map a Salesforce REST error to a runbook recovery category.

    `response` is any object with a `status_code` attribute;
    `body` is the raw response text.
    """
    if response.status_code == 401:
        return "auth_failure"  # token expired/revoked - trigger reauth
    if response.status_code == 403 and "REQUEST_LIMIT_EXCEEDED" in body:
        # API request allocation exhausted - rolling 24-hour window, do not retry
        return "rate_limit_daily"
    if response.status_code == 400 and "QUERY_TIMEOUT" in body:
        return "retryable_after_query_optimization"
    if "INVALID_FIELD" in body or "NO_SUCH_COLUMN" in body:
        return "schema_drift"  # custom field renamed/deleted
    return "unknown"
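The detection step in the runbook, reading `Sforce-Limit-Info`, is easy to automate so you pause before the 403 arrives. Here is a minimal sketch that parses the `api-usage=used/total` format shown above. The function names and the 90% pause threshold are hypothetical.

```python
import re
from typing import Optional, Tuple

_API_USAGE = re.compile(r"api-usage=(\d+)/(\d+)")


def parse_api_usage(header_value: Optional[str]) -> Optional[Tuple[int, int]]:
    """Extract (used, total) from an Sforce-Limit-Info header value."""
    match = _API_USAGE.search(header_value or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def should_pause_sync(header_value: Optional[str], threshold: float = 0.9) -> bool:
    """Return True once usage crosses the threshold (an assumed 90% by default)."""
    usage = parse_api_usage(header_value)
    if usage is None:
        return False  # header missing or malformed: no signal, keep going
    used, total = usage
    return total > 0 and used / total >= threshold
```

Call it on every successful response, not only on failures. By the time the 403 shows up there is nothing left to protect.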

How to Create a Provider-Specific Runbook for NetSuite (With Tested Examples)

NetSuite is what happens when an ERP becomes a platform. It is notorious for its steep learning curve, relying heavily on SuiteQL, custom SuiteScripts, and aggressive concurrency limits. Integrating with NetSuite requires a completely different operational mindset than standard REST APIs. The runbook here is twice as long as everything else and three times as critical.

Concurrency, Not Rate Limits: The word "rate limit" doesn't really describe NetSuite. Concurrency governance regulates simultaneous requests against your account at any given moment. A base account in Service Tier 1 has a limit of 15 concurrent requests, which increases by 10 for each additional SuiteCloud Plus (SC+) license. An account's concurrency cap is shared across all of its integrations—SOAP, REST, and RESTlet calls combined. If your limit is 15 concurrent requests, the 16th arriving at the same millisecond is rejected immediately. Document this loud and clear: your shiny new integration competes with the customer's existing Boomi, Celigo, and Magento connectors for the same pool.
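Because the cap counts simultaneous requests, the client-side control is a semaphore, not a token bucket. Here is a minimal sketch of a per-tenant limiter for threaded workers. The class name, the timeout, and the choice of `max_slots` are hypothetical. It only governs your own workers and cannot see what the customer's other integrations are using, so set it well below the account's cap.

```python
import threading
from contextlib import contextmanager


class TenantConcurrencyLimiter:
    """Cap in-flight requests per tenant (hypothetical helper)."""

    def __init__(self, max_slots: int):
        self._max_slots = max_slots
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore_for(self, tenant_id: str) -> threading.BoundedSemaphore:
        # Create each tenant's semaphore lazily, guarded against races
        with self._lock:
            if tenant_id not in self._semaphores:
                self._semaphores[tenant_id] = threading.BoundedSemaphore(self._max_slots)
            return self._semaphores[tenant_id]

    @contextmanager
    def slot(self, tenant_id: str, timeout: float = 30.0):
        """Hold one concurrency slot for the duration of the with-block."""
        semaphore = self._semaphore_for(tenant_id)
        if not semaphore.acquire(timeout=timeout):
            raise TimeoutError(f"no free concurrency slot for tenant {tenant_id}")
        try:
            yield
        finally:
            semaphore.release()
```

Usage is a `with limiter.slot(tenant_id):` block around each outbound NetSuite call, so a slot is released even when the request raises.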

Authentication (TBA vs OAuth 2.0): The recommended pattern for high-volume integrations is to use TBA (OAuth 1.0a) or OAuth 2.0, not legacy session logins. NetSuite advises updating SOAP integrations to TBA to allow for more flexible concurrency. Your runbook must list the exact failure string for an expired TBA token and the different string for a revoked OAuth 2.0 grant.

REST vs SOAP vs SuiteQL: Document when to use each surface. SuiteQL is unbeatable for queryable reads, but single calls are capped at 100,000 rows. SOAP is needed for some legacy operations (like tax rate lookups). REST is the modern default. The runbook must specify which surface each resource uses, because the recovery path differs.

Monitoring: No NetSuite runbook is complete without pointing at the Concurrency Monitor at Setup > Integration > Concurrency Monitor, which provides a real-time and historical graph of concurrency usage. Look for peak rejections, where red bars indicate rejected requests. If your runbook just says "check NetSuite," you've failed.

For a comprehensive look at the underlying architecture required to support this runbook, review the final boss of ERPs: architecting a reliable NetSuite API integration.

Here is a tested runbook template for NetSuite:

# Runbook: Oracle NetSuite REST/SuiteTalk API
 
## 1. Authentication Failure Modes
NetSuite supports OAuth 2.0 and Token-Based Authentication (TBA). We use OAuth 2.0 Machine-to-Machine (M2M) where possible.
 
**Symptoms:**
- API returns HTTP 401 Unauthorized with `Invalid login attempt`.
 
**Root Causes:**
- The integration record in NetSuite was disabled by the administrator.
- The user's role was modified, removing the 'Log in using Access Tokens' permission.
 
**Recovery Action:**
- Escalate to the customer's NetSuite Administrator.
- Provide them with the exact path: Setup > Integration > Manage Integrations, and verify the state is 'Enabled'.
 
## 2. Concurrency Limit Constraints (The 429 Problem)
NetSuite limits the number of simultaneous requests a single account can make. This is the most common cause of failure.
 
**Detection:**
- API returns HTTP 429 Too Many Requests.
- The body contains: `{"error": {"code": "WS_CONCURRENCY_LIMIT_EXCEEDED"}}` or `SSS_REQUEST_LIMIT_EXCEEDED`.
 
**Recovery Action:**
- This is a short-term limit. Apply immediate exponential backoff (retry after 2s, 4s, 8s).
- If the error persists for more than 5 minutes, another integration in the customer's NetSuite environment is hogging the connection pool.
- Throttle our internal queue workers for this specific tenant to 1 concurrent request.
 
## 3. SuiteQL and Metadata Quirks
NetSuite's REST API does not expose all objects natively. We rely on the SuiteQL endpoint (`/query/v1/suiteql`) for deep data extraction.
 
**Known Issue: Multi-Subsidiary Context**
- If a query fails with `Record does not exist` for a record we know exists, it is a subsidiary routing issue.
- **Recovery:** Ensure the request header `Cookie` includes the correct active subsidiary, or that the OAuth token is scoped to a role with cross-subsidiary access.
 
**Known Issue: SOAP Fallbacks**
- Certain tax rates and legacy custom records cannot be queried via SuiteQL.
- **Recovery:** If the REST endpoint returns 404, fall back to the legacy SOAP web services endpoint (`/services/NetSuitePort_2023_1`).
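The 2s, 4s, 8s backoff from section 2 of the template can be sketched as below. It assumes a `send` callable (hypothetical) that performs one request and returns `(status_code, body)`. The `sleep` parameter exists only so the schedule can be tested without waiting.

```python
import time
from typing import Any, Callable, Sequence, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, Any]],
                      delays: Sequence[float] = (2.0, 4.0, 8.0),
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Retry a request on HTTP 429, waiting the documented delay before each retry."""
    status, body = send()
    for delay in delays:
        if status != 429:
            break
        sleep(delay)
        status, body = send()
    # Still 429 here means the pool is being held by someone else:
    # the caller should throttle this tenant rather than keep retrying
    return status, body
```

If the final status is still 429, the next step is the throttle from the runbook, not another retry loop.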

How to Create a Provider-Specific Runbook for HubSpot (With Tested Examples)

HubSpot offers a modern, developer-friendly REST API, but it is the API teams underestimate most. The docs are clean, the SDKs are polished, and the rate limit will still ambush you through aggressive tiers and complex search payloads.

Two Independent Rate Limit Windows: For accounts on the standard tier, the daily limit is 650,000 requests per day; enterprise subscriptions get 1 million requests per day. The burst limit is typically 150 to 190 requests per 10 seconds. Meanwhile, the CRM search API is brutal and often overlooked—it is capped at 4 requests per second across all search endpoints. Two independent rolling windows mean two independent failure modes. Your runbook must distinguish them via the policyName field in the 429 body: DAILY vs SECONDLY.
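Telling the two windows apart is a one-field check. Here is a minimal sketch, assuming the 429 body has been parsed into a dict with a `policyName` key as described above. The function name and returned action strings are hypothetical, and matching any policy name that contains `SECONDLY` is an assumption made to tolerate variants of the burst policy name.

```python
def classify_hubspot_429(body: dict) -> str:
    """Pick the recovery path for a HubSpot 429 based on its policyName."""
    policy = str(body.get("policyName", "")).upper()
    if policy == "DAILY":
        return "pause_until_daily_reset"
    if "SECONDLY" in policy:
        # Burst window: short sleep, then resume
        return "sleep_and_retry"
    return "unknown_policy"
```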

Honor Retry-After, Not Exponential Backoff: HubSpot includes a Retry-After header in 429 responses; it is not a suggestion, it is a signal that your integration is violating rate limits and must pause. Naive exponential backoff across worker threads turns into a retry storm and burns more quota. Document this explicitly: read Retry-After, sleep for that many seconds, then resume.
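Reading the header is a few lines. Here is a minimal sketch that handles the delta-seconds form of `Retry-After`. The function name and the 10-second fallback are hypothetical. The HTTP-date form of the header is not parsed here and falls through to the default.

```python
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 10.0) -> float:
    """Return how long to sleep before retrying, based on Retry-After."""
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                return max(0.0, float(value))
            except (TypeError, ValueError):
                return default  # e.g. an HTTP-date, which this sketch does not parse
    return default
```

Share the resulting deadline across all workers for the tenant. If each thread sleeps independently and wakes at once, you have rebuilt the retry storm.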

Use filterGroups Instead of N+1 Queries: HubSpot's CRM Search API takes a filterGroups body that lets you batch up to 100 IDs per request, but only at 4 RPS. One search call returns 100 contacts. One hundred individual GETs burn 100 calls against your daily quota. The runbook should explicitly forbid the latter pattern.

POST /crm/v3/objects/contacts/search
{
  "filterGroups": [{
    "filters": [{
      "propertyName": "hs_object_id",
      "operator": "IN",
      "values": ["101", "102", "103"]
    }]
  }],
  "properties": ["email", "firstname", "lastname"],
  "limit": 100
}
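Building those request bodies for an arbitrary list of IDs is a chunking problem. Here is a minimal sketch that splits IDs into batches of 100 and produces one search payload per batch, in the same shape as the request above. The function name is hypothetical.

```python
from typing import Dict, List, Sequence


def build_search_payloads(object_ids: Sequence, properties: Sequence[str],
                          batch_size: int = 100) -> List[Dict]:
    """Return one CRM search body per batch of IDs."""
    payloads = []
    for start in range(0, len(object_ids), batch_size):
        batch = object_ids[start:start + batch_size]
        payloads.append({
            "filterGroups": [{
                "filters": [{
                    "propertyName": "hs_object_id",
                    "operator": "IN",
                    # IN always takes an array, even for a single ID
                    "values": [str(object_id) for object_id in batch],
                }]
            }],
            "properties": list(properties),
            "limit": batch_size,
        })
    return payloads
```

For 250 contacts this yields three search calls instead of 250 individual GETs.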

Here is a tested runbook template for HubSpot:

# Runbook: HubSpot CRM API
 
## 1. Rate Limit Constraints (Multi-Tiered)
HubSpot enforces both a daily limit and a secondly burst limit.
 
**Detection:**
- API returns HTTP 429 Too Many Requests.
- Check the headers:
  - `X-HubSpot-RateLimit-Daily`: Total daily allocation.
  - `X-HubSpot-RateLimit-Daily-Remaining`: Calls left today.
  - `X-HubSpot-RateLimit-Secondly`: Burst limit (typically between 100 and 190 per 10 seconds, depending on tier).
  - `X-HubSpot-RateLimit-Secondly-Remaining`: Burst calls left.
 
**Recovery Action:**
- If `Daily-Remaining` is 0: Pause all syncs until the daily counter resets at midnight in the account's configured timezone. Alert the customer.
- If `Secondly-Remaining` is 0: Apply a hard sleep of the duration specified in the `Retry-After` header, then retry the request. Do not use standard exponential backoff.
 
## 2. Search API and FilterGroups
HubSpot uses a POST endpoint (`/crm/v3/objects/contacts/search`) for querying, using a complex `filterGroups` array.
 
**Known Issue: 400 Bad Request on Search**
- **Symptom:** Payload rejected with `Operator IN requires an array of values`.
- **Root Cause:** A query mapping attempted to pass a single string to an `IN` operator instead of an array.
- **Recovery:** Update the JSON mapping configuration to wrap single values in an array before sending to the search endpoint.
 
## 3. Pagination Quirks
HubSpot uses cursor-based pagination.
 
**Known Issue: 10,000 Record Limit**
- **Symptom:** The search endpoint refuses to return records past the 10,000th result, even with a valid cursor.
- **Recovery:** Switch the sync job to windowed queries. Sort by `createdate` ascending, fetch 10,000 records, note the last `createdate`, and start a new query filtering for `createdate > [last_date]`.
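The windowing recovery for the 10,000-record limit can be sketched as a generator. It assumes a `search(after_createdate, cursor)` callable (hypothetical) that runs the search sorted by `createdate` ascending and returns `(records, next_cursor)`. One caveat is an assumption to verify against your data: a strict `createdate >` filter can skip records that share the exact timestamp of the last record in a window.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

SearchFn = Callable[[Optional[Any], Optional[Any]], Tuple[List[Dict], Optional[Any]]]


def iterate_by_createdate(search: SearchFn, window_cap: int = 10_000) -> Iterator[Dict]:
    """Page through search results in createdate windows of at most window_cap rows."""
    last_createdate = None
    while True:
        fetched_in_window = 0
        cursor = None
        window_last = None
        while True:
            records, cursor = search(last_createdate, cursor)
            yield from records
            fetched_in_window += len(records)
            if records:
                window_last = records[-1]["createdate"]
            # Stop before the cursor would cross the provider's result cap
            if cursor is None or fetched_in_window >= window_cap:
                break
        if fetched_in_window < window_cap:
            return  # the window ended before the cap, so the dataset is exhausted
        last_createdate = window_last  # start the next window after the last row seen
```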

Turning Runbooks into Executable Configuration

Here is the unspoken trap with provider-specific runbooks: the moment you finish writing the Salesforce runbook, Salesforce ships an API change. Your runbook is now wrong. You've added engineering toil, not removed it. For a deeper look at managing these shifts, read our framework on how to survive API deprecations across 50+ SaaS integrations.

Writing and maintaining provider-specific runbooks is an excellent operational practice, but it is ultimately a manual patch for a systemic architectural problem. If your engineers are constantly referencing runbooks to write custom error handlers and backoff scripts, your integration architecture is too rigid.

The better model is to express provider quirks as data the runtime evaluates, not as Confluence pages a human reads at 3 AM. This is the design behind Truto's unified API layer. Truto eliminates the need for manual, code-heavy runbooks by handling provider-specific quirks purely as configuration data. The platform operates on a declarative JSONata architecture, meaning there is zero integration-specific code in the runtime logic.

Instead of writing a custom Node.js handler to catch Salesforce invalid_grant errors and another to catch NetSuite concurrency limits, Truto normalizes these behaviors at the platform level. Adding or fixing an integration is a data operation, not a code deploy. For a deep dive into this architecture, read zero integration-specific code: how to ship API connectors as data-only operations.

Here is how Truto automates the most painful parts of your runbooks:

Automated Error Normalization

Truto uses JSONata-based error expressions to map non-standard third-party errors into structured, predictable HTTP responses. A Salesforce 400 with INVALID_FIELD becomes a normalized schema_drift error your application code already knows how to handle. A NetSuite SSS_REQUEST_LIMIT_EXCEEDED becomes a standard 429. Your internal systems only ever have to handle standard HTTP errors, regardless of how badly the upstream provider formats them.

Proactive Token Management

OAuth token refresh failures are a leading cause of integration downtime. Truto eliminates this by automatically refreshing OAuth tokens with a 30-second buffer. Furthermore, the platform pre-schedules token refreshes 60 to 180 seconds before expiry. If a refresh fails, Truto automatically marks the account as needs_reauth and fires a standardized integrated_account:authentication_error webhook to your system, turning a 3 AM page into a customer email the next morning.

Standardized Rate Limit Headers

One deliberate trade-off worth being honest about: Truto does not automatically retry or absorb rate limit errors, because opaque retries lead to system gridlock. Instead, when an upstream API returns an HTTP 429, Truto passes that error to the caller but normalizes the upstream rate limit information into standardized IETF headers: ratelimit-limit, ratelimit-remaining, and ratelimit-reset. Your application can implement a single, unified backoff strategy that works across all 100+ integrations, whether you are talking to HubSpot, Salesforce, or NetSuite.
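A single backoff routine built on those headers might look like the sketch below. It assumes `ratelimit-reset` carries the number of seconds until the window resets, which is how the IETF draft describes it, but confirm that against the responses you actually receive. The function name and the fallback delay are hypothetical.

```python
from typing import Mapping


def seconds_until_retry(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return how long to wait before the next call, from normalized rate limit headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    try:
        remaining = int(normalized["ratelimit-remaining"])
        reset = float(normalized["ratelimit-reset"])
    except (KeyError, TypeError, ValueError):
        return default  # headers missing or malformed: fall back to a fixed delay
    # Budget left: go now. Budget exhausted: wait out the window.
    return max(0.0, reset) if remaining <= 0 else 0.0
```

Because the header names are the same for every provider, this one function replaces the per-provider parsing in the runbooks above.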

Warning

A unified API platform is not a substitute for an operational runbook. It's a substitute for writing the same runbook fifty times. You still need a documented escalation path, an on-call rotation, and customer-facing status communication.

Where to Go From Here

Stop writing prose runbooks that go stale the day after you ship them. The minimum viable provider-specific runbook is:

  1. A six-section template covering auth, pagination, rate limits, error normalization, webhooks, and schema drift.
  2. Tested code snippets for each error classification, not English prose.
  3. Direct links to the provider's monitoring surfaces (Concurrency Monitor for NetSuite, API call usage dashboard for HubSpot, Event Log for Salesforce).
  4. A clear ownership model—who updates this runbook when the provider ships a breaking change?

The further step is moving the recovery logic itself into configuration. Whether you build that platform internally or use one off the shelf, the goal is the same: the runtime knows what to do when HubSpot returns a DAILY 429, when Salesforce returns invalid_grant, or when NetSuite rejects a call that exceeds the account's concurrency cap. Your on-call engineer reads the runbook for escalation, not for recovery.

If your team is spending more than a sprint a month firefighting provider-specific quirks, the math on building this layer in-house is already against you. Talk to us about how Truto handles all of the above as declarative configuration, so your runbooks can shrink instead of grow.

FAQ

What should a provider-specific API runbook always include?
Six sections: auth lifecycle (token type and refresh failure modes), pagination contract, rate limit model, error normalization, webhook verification, and schema drift hotspots. Each section must answer 'exact symptom' and 'exact action' - not vague guidance like 'add retries.'
Why doesn't exponential backoff work for HubSpot 429 errors?
HubSpot returns a Retry-After header that indicates exactly how long to wait. Exponential backoff applied across multiple worker threads creates a retry storm that burns more daily quota and worsens the lockout. Always honor Retry-After instead.
How is NetSuite's rate limiting different from a normal API rate limit?
NetSuite governs concurrency, not requests per second. A base Service Tier 1 account allows 15 simultaneous requests across all integrations (SOAP, REST, RESTlet combined), increasing by 10 per SuiteCloud Plus license. Exceeding it returns a 429 immediately.
What's the safest way to paginate Salesforce SOQL queries?
Avoid OFFSET pagination beyond 2,000 rows - that's the hard cap. Use keyset pagination with WHERE Id > lastSeenId ORDER BY Id, or switch to the Bulk API/QueryLocator for very large datasets. Document this prominently in your Salesforce runbook because a sync that relies on OFFSET stops at row 2,000 and never reaches the rest of the dataset.
