OAuth at Scale: The Architecture of Reliable Token Refreshes
How Truto's OAuth token refresh architecture powers the Unified Calendar API - covering concurrency, proactive refresh, rate limit normalization, and provider capabilities for Google, Outlook, and Calendly.
If you have ever built a Salesforce integration in a weekend, you know the feeling of triumph when that first 200 OK comes back. You store the access token, maybe toss the refresh token into a database column, and call it a day.
Then, three months later, your error logs light up.
Tokens are expiring mid-sync. Refresh requests are hitting race conditions because two background jobs tried to refresh the same token simultaneously. A customer changed their password, revoking all tokens, and your app is still hammering the API until you get rate-limited.
At Truto, we maintain connections to over 100 different SaaS platforms—from HRIS systems like Workday to CRMs like HubSpot. We process millions of API requests, and every single one relies on a valid, fresh credential.
We learned the hard way that managing OAuth token lifecycles is not a storage problem; it is a distributed systems problem.
Here is the engineering deep dive into how we architected a token refresh system that handles concurrency, proactive renewal, and graceful failure at scale.
The "Happy Path" is a Lie
The OAuth 2.0 spec provides a framework, but every provider implements it with their own chaotic flair.
- Expiry Times: Some tokens last 1 hour, some 24 hours, some never expire until used.
- Refresh Behavior: Some providers rotate the refresh token every time you use it (refreshing the refresh token). If you fail to capture the new one due to a network blip, you are locked out forever.
- Concurrency: If two processes try to refresh the same token at the exact same second, many providers will invalidate both requests, assuming a replay attack.
To handle this, we moved away from simple "check and refresh" logic to a multi-layered architecture involving proactive alarms, mutex locks, and self-healing state machines.
Layer 1: The Two-Pronged Refresh Strategy
Waiting for a token to expire before refreshing it is a recipe for latency and failed user requests. Conversely, refreshing it too aggressively wastes API quota. We use a hybrid approach: Proactive Alarms and Just-in-Time (JIT) Checks.
1. Proactive Refresh (The Background Worker)
Whenever a token is created or updated, we schedule work in our auth layer to run at a randomized point 60 to 180 seconds before the token expires. The schedule is per account, not a coarse global cron.
Why the randomization? To prevent thundering herds. If 10,000 accounts were connected at 9:00 AM, we don't want 10,000 refresh requests firing exactly at 9:59 AM. Spreading the load ensures stability.
When that scheduled refresh runs, it negotiates a new token, updates durable storage, and re-encrypts the new credentials.
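The scheduling rule above can be sketched in a few lines. This is a minimal illustration, not Truto's implementation: the function name, the constants, and the injectable random source are all assumptions made for the example.

```typescript
// Hypothetical helper: names and signature are illustrative only.
const MIN_LEAD_MS = 60_000; // refresh no later than 60s before expiry
const MAX_LEAD_MS = 180_000; // refresh no earlier than 180s before expiry

/**
 * Returns the epoch-millisecond timestamp at which the proactive refresh should run.
 * `random` is injectable so the jitter can be made deterministic in tests.
 */
function computeRefreshAt(
  expiresAtMs: number,
  nowMs: number,
  random: () => number = Math.random
): number {
  // Pick a lead time uniformly between 60s and 180s to spread out refreshes.
  const leadMs = MIN_LEAD_MS + random() * (MAX_LEAD_MS - MIN_LEAD_MS);
  // Never schedule in the past: a token already inside the window is refreshed now.
  return Math.max(nowMs, expiresAtMs - leadMs);
}
```

Because each account draws its own lead time, tokens issued in the same second end up refreshing across a two-minute window instead of all at once.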
2. Just-in-Time Safety Net
We cannot rely solely on background jobs. Clocks drift, schedulers miss edge cases, and sometimes a token expires faster than the provider claimed.
Before every single API request—whether it's a proxy call or a sync job—our infrastructure checks the token's validity. We use a 30-second buffer logic:

    // Simplified logic: treat the token as expired if it expires within the next 30 seconds.
    // token.expiresAt is assumed to be an epoch timestamp in milliseconds.
    const EXPIRY_BUFFER_MS = 30_000;
    if (token.expiresAt < Date.now() + EXPIRY_BUFFER_MS) {
      await refreshCredentials();
    }

This buffer is critical. Without it, a token might be valid when the check runs but expire 100ms later while the request is in flight to Salesforce.
Layer 2: Solving Concurrency with Mutex Locks
This is where most in-house integrations fail.
Imagine a scenario:
- A scheduled sync job starts for Customer A.
- Simultaneously, Customer A triggers a manual "Test Connection" in your UI.
- Both processes see the token is about to expire.
- Both processes fire a request to POST /oauth/token.
The Result: The provider receives two refresh requests. It processes the first, invalidates the old refresh token, and issues a new one. Then it processes the second request (using the now-invalid old refresh token), throws an error, and potentially revokes the entire grant for security reasons. Your customer is now disconnected.
The Solution: Per-account serialized refresh (mutex)
To prevent this, we wrap the refresh operation in a mutex (mutual exclusion) lock scoped to each integrated account.
Conceptually, each account has a single serializer for refresh work. When a refresh is requested:
- The request attempts to acquire a lock for that specific integrated_account_id.
- If no operation is in progress, it proceeds, arms a short watchdog timeout (e.g., 30s) so a stuck provider cannot block the account forever, and executes the refresh.
- If a refresh is already running, the second request simply awaits the same in-flight operation and reuses its result.
This ensures that no matter how many concurrent callers need a fresh token, we only send one HTTP request to the provider. Everyone else gets the outcome of that single successful call.
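The behaviour described above (one refresh per account, with later callers joining the in-flight operation) is often called single-flight. The sketch below shows the idea inside a single process. The class and method names are hypothetical, and a multi-instance deployment would need some form of cross-process coordination that this example does not attempt.

```typescript
// In-process sketch only; RefreshCoordinator and its methods are hypothetical names.
class RefreshCoordinator<T> {
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly watchdogMs: number;

  constructor(watchdogMs: number = 30_000) {
    this.watchdogMs = watchdogMs;
  }

  /** Runs `refresh` for the account, or joins the refresh that is already running. */
  run(accountId: string, refresh: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(accountId);
    if (existing) return existing; // reuse the in-flight operation and its result

    const flight = this.withWatchdog(refresh()).finally(() => {
      // Release the slot whether the refresh succeeded, failed, or timed out.
      this.inFlight.delete(accountId);
    });
    this.inFlight.set(accountId, flight);
    return flight;
  }

  /** Rejects if the work does not settle within the watchdog window. */
  private withWatchdog(work: Promise<T>): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("refresh timed out")), this.watchdogMs);
    });
    return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
  }
}
```

Two concurrent calls to `run("acct_1", ...)` share one promise, so the provider sees a single POST. Once that promise settles the slot is freed, and the next caller starts a fresh refresh.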
Why not just use a database row lock? Row locks and external stores like Redis work well for many teams. We optimized for strictly serialized refresh per account with very low coordination overhead. Concurrent proxy traffic, sync jobs, and scheduled refreshes all converge on a single in-flight refresh, and there is no separate lock service to tune for every deployment shape.
Layer 3: Handling "Invalid Grant" and Re-Auth
Sometimes, refresh fails. The user might have uninstalled the app, changed their password, or an admin might have revoked the token.
When we receive a fatal error (like HTTP 400 invalid_grant or HTTP 401 Unauthorized), retrying is futile. We need to involve the human.
The needs_reauth State Machine
- Detection: We catch invalid_grant errors specifically. Transient errors (HTTP 500s) trigger a retry with exponential backoff. Auth errors trigger a state change.
- State Update: The account status is flipped from active to needs_reauth.
- Notification: We fire a webhook event: integrated_account:authentication_error.
This allows our customers to listen for this event and immediately show a "Reconnect" banner in their UI.
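The detection step comes down to sorting refresh failures into "retry" and "stop and ask the human". The sketch below is one way to express that. The type and function names are hypothetical, the "unhandled" bucket is an assumption for statuses the text does not cover, and the 1-second base and 30-second cap on the backoff are illustrative values.

```typescript
// Hypothetical types; the status codes and the invalid_grant string come from the text above.
type RefreshFailure = { status: number; error?: string };
type FailureAction = "needs_reauth" | "retry" | "unhandled";

function classifyRefreshFailure(failure: RefreshFailure): FailureAction {
  // Fatal: the grant is gone, so retrying cannot succeed.
  if (failure.error === "invalid_grant" || failure.status === 401) return "needs_reauth";
  // Transient: provider-side errors are retried with backoff.
  if (failure.status >= 500 && failure.status <= 599) return "retry";
  // Assumption: anything else is surfaced to the caller rather than guessed at.
  return "unhandled";
}

// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s.
// The base and cap are assumed values, not documented behaviour.
function retryDelayMs(attempt: number): number {
  return Math.min(1_000 * 2 ** attempt, 30_000);
}
```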
Auto-Reactivation
We also support auto-reactivation. If an account is in needs_reauth but a subsequent API call succeeds (perhaps the user fixed the issue on the provider side, or it was a false positive from the API), we automatically flip the status back to active and fire an integrated_account:reactivated event.
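Taken together, the two transitions form a small state machine. The sketch below models it as a pure function. The statuses and event names are taken from the text, while the signal names and the choice to emit no event when the status does not change are assumptions made for this example.

```typescript
type AccountStatus = "active" | "needs_reauth";
type Signal = "auth_error" | "api_call_succeeded"; // hypothetical signal names
type Transition = { status: AccountStatus; event: string | undefined };

function nextStatus(current: AccountStatus, signal: Signal): Transition {
  if (current === "active" && signal === "auth_error") {
    return { status: "needs_reauth", event: "integrated_account:authentication_error" };
  }
  if (current === "needs_reauth" && signal === "api_call_succeeded") {
    return { status: "active", event: "integrated_account:reactivated" };
  }
  // Assumption: no status change means no webhook, so repeated failures do not re-notify.
  return { status: current, event: undefined };
}
```

Keeping the transition logic free of side effects makes it easy to test every path without a database or a webhook sender in the loop.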
Security at Rest
Storing thousands of access and refresh tokens requires paranoia. As part of our strict security standards, we never store tokens in plain text.
- Encryption: All sensitive fields (access_token, refresh_token, client_secret) are encrypted using AES-GCM before hitting the database.
- Masking: When developers list accounts via our API, these fields are masked. They are only decrypted inside the secure enclave of the refresh service just before being used.
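For readers building this themselves, here is a minimal sketch of field-level AES-GCM encryption using Node's built-in crypto module. The function names and the colon-separated storage format are assumptions for illustration. Key management (where the 32-byte key lives and how it is rotated) is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical storage format: base64(iv):base64(authTag):base64(ciphertext).
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce; must be unique for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // integrity tag, verified on decrypt
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(":");
}

function decryptField(stored: string, key: Buffer): string {
  const [iv, tag, ciphertext] = stored.split(":").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the stored value was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM is an authenticated mode, so a modified ciphertext fails loudly on decrypt instead of yielding a corrupted token.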
How This Powers the Unified Calendar API
Everything above - proactive refreshes, mutex locks, reauth detection - is the foundation that keeps Truto's Unified Calendar API running reliably. The Unified Calendar API is a real-time pass-through: when you call GET /unified/calendar/events, Truto calls Google or Microsoft in the same request cycle, maps the response into a normalized schema, and returns it. No calendar data is cached or stored.
This means every single call depends on a valid OAuth token being available instantly. There is no fallback to a local cache if the token has expired. Calendar providers are a particularly demanding stress test for token management - they combine short-lived tokens (typically ~1 hour for both Google and Microsoft) with high-frequency access patterns like availability polling, event sync, and webhook-driven updates across potentially thousands of connected accounts.
The Unified Calendar API normalizes six core entities across providers: Calendars (container for time-based entries), Events (appointments and meetings), Availability (free/busy windows), Contacts (attendees and participants), EventTypes (booking templates for providers like Calendly), and Attachments (files linked to events). You write your integration logic once against this schema; Truto handles the per-provider translation using JSONata expressions at runtime.
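To make "write once against this schema" concrete, here is a plain TypeScript sketch of the kind of translation involved. It is not how Truto does it (the text above says the mapping runs as JSONata expressions), and the unified field names are hypothetical. The provider fields are real: Google Calendar events carry a summary, Microsoft Graph events carry a subject.

```typescript
// Hypothetical unified shape; the real schema and field names may differ.
type UnifiedEvent = { id: string; title: string; startTime: string; endTime: string };

// Subsets of the provider payloads, limited to the fields used here.
type GoogleTime = { dateTime?: string; date?: string }; // `date` is used for all-day events
type GoogleEvent = { id: string; summary?: string; start: GoogleTime; end: GoogleTime };
type GraphTime = { dateTime: string; timeZone: string };
type OutlookEvent = { id: string; subject?: string; start: GraphTime; end: GraphTime };

function fromGoogle(e: GoogleEvent): UnifiedEvent {
  return {
    id: e.id,
    title: e.summary ?? "",
    startTime: e.start.dateTime ?? e.start.date ?? "",
    endTime: e.end.dateTime ?? e.end.date ?? "",
  };
}

function fromOutlook(e: OutlookEvent): UnifiedEvent {
  // Graph returns the time zone separately; this sketch passes dateTime through unchanged.
  return { id: e.id, title: e.subject ?? "", startTime: e.start.dateTime, endTime: e.end.dateTime };
}
```

Client code that consumes UnifiedEvent never needs to know which provider the event came from.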
Authentication & OAuth for Calendar Providers
Each calendar provider has its own OAuth configuration, scopes, and token behavior. Truto handles the full lifecycle - authorization redirect, token exchange, encrypted storage, and proactive refresh - so your application never touches provider credentials directly.
| Provider | Auth Type | Token Lifetime | Typical Scopes |
|---|---|---|---|
| Google Calendar | OAuth 2.0 | ~1 hour | calendar.readonly, calendar.events, calendar.calendars |
| Outlook Calendar | OAuth 2.0 (Microsoft identity platform) | ~1 hour | Calendars.ReadWrite, User.Read |
| Calendly | OAuth 2.0 | Varies | Read/write access to scheduling data |
| Cal.com | API Key | No expiry | N/A (static key) |
For Google Calendar, Truto's OAuth app is CASA Tier 2 certified - your customers can connect without hitting Google's unverified app warnings or connection limits. For Microsoft, Truto handles the Microsoft identity platform flow including tenant-specific authorization endpoints.
API-key-based providers like Cal.com skip the token refresh cycle entirely. The key is stored encrypted and injected into every request.
The proactive refresh architecture described in Layer 1 applies identically to calendar tokens. Google Calendar tokens expire roughly every hour. Without pre-expiry refresh, a scheduled availability check could fail mid-polling because the token expired between the first and second page of results. The 60-180 second pre-expiry refresh window prevents this.
Bring Your Own OAuth App
Truto provides a default OAuth app for each calendar provider so you can start building immediately. For production, you will likely want to use your own.
You supply your OAuth client credentials (client ID and secret) at the environment integration level - a pure configuration change, no code deployment. From that point, Truto uses your OAuth app for all authorization flows with that provider. The connection flow through Truto's Link SDK shows your app name and branding to the end user.
Why this matters for calendar integrations specifically:
- Your consent screen, your brand. Users see your app name when granting calendar access, not a third party's.
- Credential portability. If you ever switch providers, the OAuth tokens belong to your app. You migrate without forcing every customer to re-authenticate.
- Scope control. You decide exactly which calendar scopes to request. If your product only reads events, drop write scopes entirely.
The token refresh, encryption, and concurrency protections described in the layers above work identically regardless of whether you use Truto's OAuth app or your own.
Rate Limit Handling and Normalized Headers
Calendar providers signal rate limits in provider-specific ways. Google returns HTTP 429 with a Retry-After header. Microsoft Graph uses 429 with Retry-After plus custom RateLimit-* headers. Other providers get creative.
Truto normalizes all of this. Regardless of what the provider returns, your application sees the same headers:
| Header | Description | Example Value |
|---|---|---|
| Retry-After | Seconds until you can retry | 30 |
| ratelimit-limit | Total requests allowed in the window | 600 |
| ratelimit-remaining | Requests remaining | 42 |
| ratelimit-reset | Seconds until the window resets | 58 |
If the provider signals a rate limit - even via a non-429 status code or a custom header - Truto always returns HTTP 429 to your application. One rate-limit handler covers Google, Microsoft, Calendly, and every other integration.
These headers are also returned on successful (2xx) responses, so you can monitor your rate limit headroom proactively without waiting to get throttled.
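Since the headers arrive on every response, a client can read them after each call and slow down before it gets throttled. The sketch below parses the four headers into a typed object. The HeaderReader interface, the function names, and the 10% threshold are assumptions made for this example.

```typescript
// Minimal reader interface so the sketch works with fetch's Headers or any similar object.
interface HeaderReader {
  get(name: string): string | null;
}

type RateLimitInfo = {
  limit: number | undefined;
  remaining: number | undefined;
  resetSeconds: number | undefined;
  retryAfterSeconds: number | undefined;
};

function toNumber(value: string | null): number | undefined {
  if (value === null || value.trim() === "") return undefined;
  const n = Number(value);
  return Number.isFinite(n) ? n : undefined; // ignore anything that is not a plain number
}

function readRateLimit(headers: HeaderReader): RateLimitInfo {
  return {
    limit: toNumber(headers.get("ratelimit-limit")),
    remaining: toNumber(headers.get("ratelimit-remaining")),
    resetSeconds: toNumber(headers.get("ratelimit-reset")),
    retryAfterSeconds: toNumber(headers.get("retry-after")),
  };
}

// Hypothetical policy: back off once less than 10% of the window's quota is left.
function shouldThrottle(info: RateLimitInfo, threshold: number = 0.1): boolean {
  if (info.limit === undefined || info.remaining === undefined || info.limit <= 0) return false;
  return info.remaining / info.limit < threshold;
}
```

With the example values from the table (600 allowed, 42 remaining), the ratio is 0.07, so this policy would start pacing requests until the window resets.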
Retry and Backoff Strategy
When you receive a 429 from Truto, here is a practical client-side pattern:

    async function callWithBackoff(
      request: () => Promise<Response>,
      maxRetries = 5
    ): Promise<Response> {
      // One initial attempt plus up to maxRetries retries
      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const response = await request();
        if (response.status !== 429) return response;
        // Retries exhausted: fail now rather than sleeping before the throw
        if (attempt === maxRetries) break;
        // A missing or non-numeric Retry-After parses to 0 or NaN and falls through to backoff
        const retryAfter = parseInt(
          response.headers.get('Retry-After') || '0', 10
        );
        const backoff = retryAfter > 0
          ? retryAfter * 1000
          : Math.min(1000 * Math.pow(2, attempt), 30_000);
        // Add jitter to avoid synchronized retry storms
        const jitter = Math.random() * 1000;
        await new Promise(r => setTimeout(r, backoff + jitter));
      }
      throw new Error('Max retries exceeded');
    }

Key points:
- Always prefer Retry-After when present. The value comes from the provider, normalized to seconds by Truto, and reflects actual quota recovery timing.
- Fall back to exponential backoff when Retry-After is absent: 1s, 2s, 4s, 8s, 16s, capped at 30s.
- Add jitter if you have many workers hitting the same calendar provider. Without it, retries synchronize and create a second thundering herd.
Provider Capabilities: Proxy vs Unified
Not every calendar provider exposes the same operations. This matrix shows what is available through Truto's Unified Calendar API (normalized schema at /unified/calendar/*) vs the Proxy API (raw provider pass-through at /proxy/*):
| Operation | Google Calendar | Outlook Calendar | Calendly | Cal.com |
|---|---|---|---|---|
| List calendars | Unified + Proxy | Unified + Proxy | Unified + Proxy | Unified + Proxy |
| List events | Unified + Proxy | Unified + Proxy | Unified + Proxy | Unified + Proxy |
| Get event | Unified + Proxy | Unified + Proxy | Unified + Proxy | Unified + Proxy |
| Create event | Unified + Proxy | Unified + Proxy | Via EventTypes | Unified + Proxy |
| Update event | Unified + Proxy | Unified + Proxy | Limited | Unified + Proxy |
| Delete event | Unified + Proxy | Unified + Proxy | Cancel only | Unified + Proxy |
| Availability (free/busy) | Unified + Proxy | Unified + Proxy | Unified + Proxy | Unified + Proxy |
| Event types | N/A | N/A | Unified + Proxy | Unified + Proxy |
| Attachments | Unified + Proxy | Unified + Proxy | N/A | N/A |
| Contacts/Attendees | Unified + Proxy | Unified + Proxy | Unified + Proxy | Unified + Proxy |
| Webhooks | Unified | Unified | Unified | N/A |
"Unified + Proxy" means the operation is available through both the normalized Unified Calendar API and the raw Proxy API. The Proxy API gives you access to every provider-specific field and parameter. The Unified API normalizes responses into Truto's standard calendar schema so you can swap providers without changing client code. "N/A" means the provider does not support that concept at the API level.
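If your product branches on these differences, it can help to encode the matrix as data rather than scattering provider checks through the code. The sketch below transcribes four rows of the table above. The identifiers and the helper function are hypothetical and not part of any Truto SDK.

```typescript
type Provider = "google" | "outlook" | "calendly" | "calcom";
type Support = "unified+proxy" | "unified" | "via-event-types" | "limited" | "cancel-only" | "n/a";

// Transcribed from four rows of the matrix above; the keys are illustrative identifiers.
const capabilities: Record<string, Record<Provider, Support>> = {
  createEvent: { google: "unified+proxy", outlook: "unified+proxy", calendly: "via-event-types", calcom: "unified+proxy" },
  deleteEvent: { google: "unified+proxy", outlook: "unified+proxy", calendly: "cancel-only", calcom: "unified+proxy" },
  attachments: { google: "unified+proxy", outlook: "unified+proxy", calendly: "n/a", calcom: "n/a" },
  webhooks: { google: "unified", outlook: "unified", calendly: "unified", calcom: "n/a" },
};

/** True when the operation is available through the Unified API without caveats. */
function hasFullUnifiedSupport(operation: string, provider: Provider): boolean {
  const level = capabilities[operation]?.[provider];
  return level === "unified+proxy" || level === "unified";
}
```

A UI could use a check like this to hide an "attach file" button for Calendly accounts instead of surfacing a provider error at request time.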
Summary: The Checklist for Reliable Auth
If you are building this in-house, ensure your architecture covers these bases:
- Buffer your expiry checks: Don't wait for 0 seconds remaining.
- Serialize your refreshes: Never let two threads refresh the same token.
- Handle revocation gracefully: Distinguish between "API is down" (retry) and "Token is dead" (alert user).
- Secure your storage: Encrypt at rest, always.
Or, you can offload this entirely. At Truto, we treat authentication infrastructure as a core product, so you can focus on the data, not the handshake.
FAQ
- What is the Truto Unified Calendar API?
- The Truto Unified Calendar API is a real-time pass-through API that normalizes calendar operations (events, availability, calendars, contacts, event types, attachments) across Google Calendar, Outlook Calendar, Calendly, and Cal.com into a single consistent schema. No calendar data is cached or stored - Truto calls the provider in real time, maps the response, and returns it.
- How does Truto handle OAuth token refresh for calendar providers?
- Truto uses a two-pronged strategy: proactive refresh (scheduled 60-180 seconds before token expiry) and just-in-time checks (with a 30-second buffer before every API call). A per-account mutex lock prevents race conditions when multiple requests try to refresh the same token simultaneously.
- Can I use my own OAuth app with Truto's Calendar API?
- Yes. Truto provides a default OAuth app to get started quickly, but you can supply your own client credentials at the environment integration level. This gives you control over consent screen branding, scope selection, and credential portability if you ever switch providers.
- How does Truto normalize rate limits across calendar providers?
- Truto standardizes all rate limit signals into HTTP 429 responses with consistent headers: Retry-After (seconds), ratelimit-limit, ratelimit-remaining, and ratelimit-reset. These headers are returned on both error and success responses, so you can write one rate-limit handler for all calendar providers.
- Which calendar providers does Truto support?
- Truto supports Google Calendar, Outlook Calendar, Calendly, and Cal.com through both the Unified Calendar API (normalized schema) and the Proxy API (raw pass-through). Apple Calendar (iCloud) is not supported due to CalDAV protocol limitations including lack of OAuth and webhooks.