
Developer Tutorial: Pulling User Lists End-to-End From Any SaaS API

A complete developer tutorial for programmatically extracting user directories, roles, and access levels from any SaaS API using a Unified Directory architecture.

Yuvraj Muley · 14 min read

If your engineering team is tasked with pulling user lists, roles, and access levels from every SaaS application your customers use, you are looking at a massive architectural hurdle. B2B software buyers expect your platform to automatically discover who has access to their tools. If you intend to publish a developer tutorial or build this internally, pulling user lists end-to-end requires moving beyond one-off API scripts and adopting a highly scalable, unified architecture.

Building this yourself, one integration at a time, is a trap. Picture how it starts: you are sitting in a sprint planning meeting, and your product manager drops a seemingly simple requirement: "Our customers need to pull a list of users from all their SaaS apps so we can run automated access reviews."

You do the math. According to ElectroIQ's 2024 data, the average company uses 130 different SaaS applications, and 75% of employees are expected to acquire, modify, or create technology without IT's oversight by 2027. Your engineering team can realistically build, test, and maintain a few production-grade API integrations per quarter. Building point-to-point connectors for over a hundred platforms means this feature will sit on the roadmap for years. You are staring down terrible vendor API documentation, aggressive rate limits, and undocumented edge cases.

Identity sprawl is directly breaking product roadmaps and introducing massive security risks. 58% of organizations struggle to enforce proper privilege management, and 51% lack the ability to properly offboard dormant identities and accounts, according to the 2025-2026 State of SaaS Security report from Valence Security and the Cloud Security Alliance. Tripwire's 2025 State of SaaS Security report notes that 41% of breaches are caused by overprivileged accounts. Furthermore, 31% of companies have experienced former employees accessing assets stored in SaaS applications after departure (Security Magazine). Every orphaned account is a breach vector waiting to be exploited.

This guide provides a practical, step-by-step technical blueprint for programmatically extracting user directories from disparate SaaS APIs without writing bespoke integration code for each vendor.

Why You Need a Unified Approach to Pulling User Lists

The core problem is straightforward: user data is scattered across hundreds of SaaS platforms, each with its own API, authentication scheme, data model, and pagination strategy.

Here is what the math actually looks like for a mid-stage B2B SaaS company building an access review or directory sync feature:

Factor Reality
Avg. SaaS apps per customer 100+
Engineering time per connector 2-4 weeks (auth + mapping + pagination + testing)
Ongoing maintenance per connector ~20% of initial build effort annually
API documentation quality Wildly inconsistent
Breaking changes per year 2-5 per vendor on average

Multiply that across even 50 integrations and you have consumed your entire engineering roadmap for years. This is why teams adopt a Unified Directory API approach - a single interface that normalizes user data from many providers into a common schema.

The architectural pattern looks like this:

flowchart LR
    A["Your Application"] -->|Single API Call| B["Unified Directory API"]
    B --> C["Salesforce"]
    B --> D["Zendesk"]
    B --> E["Jira"]
    B --> F["Slack"]
    B --> G["100+ more..."]

Instead of writing 100 integrations, you write one. The normalization layer handles the ugly differences between providers.

The Limitations of SCIM and Standard IdP Integrations

The standard engineering reflex is to integrate with Okta, Microsoft Entra ID, or Google Workspace and call it a day. If a customer wants user data, they should just push it via the System for Cross-domain Identity Management (SCIM) protocol.

This approach fails upon contact with reality. Connecting to major Identity Providers (IdPs) is table stakes, but it leaves a massive blind spot across the long tail of identity. If you have built SCIM integrations before, you already know the primary failure modes:

1. The SSO Tax limits coverage. Many applications do not support SCIM at all. For those that do, vendors often lock it behind expensive enterprise tiers. As noted by Zluri in their 2025 analysis, most SaaS vendors bundle SCIM with SSO in their enterprise pricing tiers, meaning buyers must upgrade to plans that cost two to four times the base price just to enable it.

Specific examples of this SCIM gatekeeping include:

  • Salesforce supports SCIM 2.0 provisioning, but only on Enterprise ($175/user/month) and Unlimited ($350/user/month) editions.
  • ServiceNow requires Enterprise plans ($50-75/user/month at scale).
  • Atlassian gates SCIM behind an Atlassian Guard subscription that stacks on top of your existing Jira and Confluence licenses.

Your mid-market customers will simply not have SCIM enabled for half their stack.

2. SCIM is a push protocol, not a pull protocol. As covered in our guide to directory integrations, SCIM automates the exchange of user identity data from the client to the service provider. It was designed to automate the provisioning lifecycle - pushing user creates, updates, and deactivates from an IdP to a service provider.

It was not designed to answer the question: "Who has access to this app right now, and what permissions do they have?" It is not designed to audit existing access, discover shadow IT, or pull granular, application-specific permissions that were assigned manually by an administrator inside the app.

If a department head buys a specialized design tool on a corporate credit card and manually invites five team members, Okta has no idea those accounts exist. To get an accurate picture of identity, you must query the application's API directly.
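Once you can query the application directly, surfacing those shadow accounts reduces to a diff between the app-side user list and what the IdP reports. A minimal sketch, assuming the unified user shape used later in this article (field names are illustrative):

```python
def find_shadow_accounts(app_users, idp_emails):
    """Return app-side accounts whose email the IdP has never seen.

    app_users: user dicts pulled from the application's own API
    idp_emails: the email list your IdP (e.g. Okta) reports
    """
    known = {email.lower() for email in idp_emails}
    return [u for u in app_users if u["email"].lower() not in known]
```

Run against the design-tool example above, this would flag all five manually invited accounts, since none of them exist in Okta.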

Step 1: Authenticating Across Multiple SaaS Platforms

The first wall you hit when pulling user data from multiple SaaS APIs is authentication. Every vendor has a different scheme, and even vendors using the "same" standard (OAuth 2.0) implement it differently.

Here is what you are actually dealing with across a typical integration portfolio:

  • OAuth 2.0 Authorization Code Flow - Salesforce, HubSpot, Slack, Microsoft 365
  • OAuth 2.0 Client Credentials - ServiceNow, some Atlassian endpoints
  • API Key in Header - Zendesk, many smaller SaaS tools
  • Bearer Token with Custom Headers - Vendors that require extra tenant identifiers or custom auth expressions alongside a token
  • Session-Based Auth - Legacy apps that require a login step to obtain a session cookie

Managing OAuth state is notoriously difficult. Access tokens expire. Refresh tokens get revoked. If you build this in-house, your database will quickly fill up with token refresh logic, encrypted credential storage, and state management tables.

A well-designed abstraction layer stores the auth configuration declaratively - specifying the OAuth URLs, scopes, and token endpoints per provider - and then executes the correct flow at runtime without integration-specific code.
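A toy version of that declarative layer looks like config-as-data plus a single dispatcher. The provider entries, URLs, and field names below are assumptions for illustration, not any platform's actual schema:

```python
# Auth behavior expressed as data, not per-provider code.
# All entries below are illustrative.
AUTH_CONFIGS = {
    "salesforce": {
        "type": "oauth2_authorization_code",
        "authorize_url": "https://login.salesforce.com/services/oauth2/authorize",
        "token_url": "https://login.salesforce.com/services/oauth2/token",
        "scopes": ["api", "refresh_token"],
    },
    "zendesk": {
        "type": "api_key_header",
        "header": "Authorization",
        "format": "Basic {credential}",
    },
}

def build_auth_header(provider, credentials):
    """Resolve the correct auth header at runtime from config alone."""
    config = AUTH_CONFIGS[provider]
    if config["type"] == "oauth2_authorization_code":
        return {"Authorization": f"Bearer {credentials['access_token']}"}
    if config["type"] == "api_key_header":
        return {config["header"]: config["format"].format(credential=credentials["key"])}
    raise ValueError(f"Unsupported auth type: {config['type']}")
```

Adding a new provider means adding a config entry, not a new code path.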

Using a unified API platform like Truto abstracts this away entirely. Truto handles the entire OAuth token lifecycle securely. The platform schedules work ahead of token expiry, automatically refreshing OAuth tokens shortly before they expire without manual intervention.

Here is a visualization of how a unified platform handles the OAuth flow while keeping your application logic clean:

sequenceDiagram
    participant User
    participant App as Your App
    participant Truto
    participant SaaS as SaaS API (e.g. Salesforce)

    User->>App: Clicks "Connect Salesforce"
    App->>Truto: Request Link Token<br>(Tenant ID)
    Truto-->>App: Returns Secure UI URL
    App->>User: Redirect to Auth UI
    User->>SaaS: Approves OAuth Scopes
    SaaS-->>Truto: Returns Auth Code
    Truto->>SaaS: Exchanges Code for Tokens
    Truto-->>App: Webhook: Connection Successful
    Note over Truto,SaaS: Truto autonomously refreshes<br>tokens before they expire

When you need to pull users, you do not pass tokens. You simply pass the Truto Tenant ID (or Account ID), and the platform injects the correct, unexpired credentials into the outbound request.

Here is what a typical authenticated request looks like from the caller's perspective when using a unified API:

curl -X GET "https://api.truto.one/unified/directory/users" \
  -H "Authorization: Bearer YOUR_TRUTO_TOKEN" \
  -H "X-Integrated-Account-ID: customer_salesforce_account_id"

One request. One auth header. The unified layer resolves which provider, which auth scheme, and which credentials to use for that specific customer's connected account.

Step 2: Mapping Disparate APIs to a Unified Directory Schema

Once authenticated, you face the data modeling problem. Every SaaS application structures its user object differently.

A raw Salesforce user response looks like this:

{
  "Id": "0055e000001VbO2AAK",
  "Username": "jdoe@example.com",
  "LastName": "Doe",
  "FirstName": "John",
  "IsActive": true,
  "ProfileId": "00e5e000000Qk12AAC"
}

A raw Zendesk user response looks like this:

{
  "id": 9873843,
  "name": "John Doe",
  "email": "jdoe@example.com",
  "role": "agent",
  "suspended": false,
  "organization_id": 443212
}

Jira returns displayName, emailAddress, and groups. Slack gives you real_name, profile.email, and is_admin.

If you write custom code to parse these, you are building technical debt. Your downstream code needs a custom parser per provider. Instead, you need a declarative mapping layer. By shipping connectors as data-only operations, you can normalize these payloads into a single, predictable Unified Directory schema without writing integration-specific code.

Using JSONata (a lightweight query and transformation language for JSON), the platform maps the disparate fields into a common model:

{
  "id": "usr_abc123",
  "remote_id": "0055e000001VbO2AAK",
  "first_name": "John",
  "last_name": "Doe",
  "email": "jdoe@example.com",
  "role": "00e5e000000Qk12AAC",
  "status": "active",
  "department": "Engineering",
  "groups": [
    { "id": "grp_1", "name": "Platform Team" }
  ],
  "created_at": "2024-03-15T09:00:00Z",
  "raw": { /* original provider response */ }
}

The raw field is important. Any normalization layer that throws away the original response is hiding data you will eventually need. Custom fields, provider-specific permission objects, license tier information - your compliance team will ask for it, and it should be there.
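To make the mapping idea concrete, here is a simplified stand-in for the declarative layer in Python. Real platforms express these rules in JSONata; the dict-plus-lambda syntax below is a sketch, using the Salesforce and Zendesk payloads shown above:

```python
# Per-provider field rules as data. A rule is either a source field name
# or a small transformation function -- a stand-in for JSONata expressions.
MAPPINGS = {
    "salesforce": {
        "remote_id": "Id",
        "first_name": "FirstName",
        "last_name": "LastName",
        "email": "Username",
        "status": lambda u: "active" if u["IsActive"] else "inactive",
    },
    "zendesk": {
        "remote_id": "id",
        "email": "email",
        "role": "role",
        "status": lambda u: "inactive" if u["suspended"] else "active",
    },
}

def normalize_user(provider, raw):
    """Map a raw provider payload into the unified schema."""
    unified = {}
    for field, rule in MAPPINGS[provider].items():
        unified[field] = rule(raw) if callable(rule) else raw.get(rule)
    unified["raw"] = raw  # never throw away the original payload
    return unified
```

Adding a provider means adding a mapping entry, not a new parser - and the raw key keeps the original response available downstream.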

flowchart TB
    subgraph Providers
        SF["Salesforce<br>Username, Profile.Name,<br>UserRole.Name"]
        ZD["Zendesk<br>email, role,<br>organization_id"]
        JR["Jira<br>displayName,<br>emailAddress, groups"]
    end
    subgraph Mapping["Declarative Mapping Layer"]
        M["Field-level<br>transformation expressions<br>(JSONata)"]
    end
    subgraph Output["Unified Schema"]
        U["first_name, last_name,<br>email, role, status,<br>department, groups"]
    end
    SF --> M
    ZD --> M
    JR --> M
    M --> U

When evaluating unified APIs, always check whether the mapping layer is extensible. Can you add custom field mappings per customer? Enterprise Salesforce orgs routinely use custom permission sets and profile structures that will not match any static schema.

This means you only ever write one code path in your application. You request /unified/directory/users from the platform, and you receive the exact same JSON structure whether the underlying tool is Jira, Salesforce, Zendesk, or GitHub.

Step 3: Handling Pagination and Rate Limits Gracefully

Extracting a directory of 10,000 users will immediately trigger API rate limits and require strict pagination handling.

Pagination is never consistent across providers. You will encounter:

  • Cursor-based pagination (Slack, Stripe) - uses an opaque token to fetch the next page
  • Offset-based pagination (Salesforce SOQL, many REST APIs) - uses offset and limit parameters
  • Link header pagination (GitHub, some HRIS systems) - next page URL in the response headers
  • Page number pagination (older APIs) - simple page=N parameters

A unified platform normalizes pagination. You pass a standard cursor parameter to the unified endpoint, and the platform translates that into the specific pagination style required by the upstream provider. You request the first page, get a next_cursor in the response, and pass it back for the next page.
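Assuming the results plus next_cursor response shape used throughout this article, the caller-side loop collapses into one small generator that works identically for every provider:

```python
def iterate_pages(fetch_page):
    """Yield every item from a cursor-paginated endpoint.

    fetch_page: any callable that takes a cursor (None for the first page)
    and returns a dict with "results" and "next_cursor" keys. In practice
    you would wire this to your HTTP client; injecting it keeps the
    pagination logic testable.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["results"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

The provider might paginate by cursor, offset, or Link header underneath; your code only ever sees this loop.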

Rate limits require a different approach. Every SaaS vendor enforces different limits (Salesforce gives you a daily API call budget, Slack uses per-method per-workspace limits, Zendesk uses per-minute windows), and exceeding them means dropped requests or temporary bans.

Warning

Architectural Note on Rate Limits Truto does not retry, throttle, or apply backoff on rate limit errors. When an upstream API returns an HTTP 429 (Too Many Requests), Truto passes that error directly to the caller.

Absorbing rate limits sounds helpful, but it creates unpredictable latency spikes and masks underlying architectural issues in your sync engine. Silent retries inside a middleware layer make it impossible for you to implement intelligent request scheduling.

Instead, Truto normalizes upstream rate limit information into standardized headers per the IETF specification:

  • ratelimit-limit: The maximum number of requests permitted in the current window.
  • ratelimit-remaining: The number of requests remaining in the current window.
  • ratelimit-reset: The time at which the current rate limit window resets.

Your application is responsible for reading these headers and implementing retry and exponential backoff logic.

Here is a practical example of how a senior engineer should handle user extraction with standard backoff logic in Node.js:

async function extractUsersWithBackoff(tenantId, cursor = null, retriesLeft = 5) {
  let url = `https://api.truto.one/unified/directory/users`;
  if (cursor) url += `?cursor=${encodeURIComponent(cursor)}`;

  try {
    const response = await fetch(url, {
      headers: {
        'Authorization': `Bearer ${process.env.TRUTO_API_KEY}`,
        'x-integrated-account-id': tenantId
      }
    });

    if (response.status === 429) {
      if (retriesLeft <= 0) throw new Error('Rate limit retries exhausted');

      // Read the standardized IETF headers (header values arrive as strings)
      const resetTime = parseInt(response.headers.get('ratelimit-reset'), 10);
      const waitSeconds = Number.isFinite(resetTime)
        ? Math.max(1, resetTime - Math.floor(Date.now() / 1000))
        : 5;

      console.log(`Rate limited. Waiting ${waitSeconds} seconds.`);
      await new Promise(resolve => setTimeout(resolve, waitSeconds * 1000));

      // Retry the exact same request with one fewer retry remaining
      return extractUsersWithBackoff(tenantId, cursor, retriesLeft - 1);
    }

    if (!response.ok) throw new Error(`API Error: ${response.status}`);

    return await response.json();

  } catch (error) {
    console.error('Extraction failed:', error);
    throw error;
  }
}

If your stack is Python, here is a minimal retry handler using the same normalized headers:

import time
import requests

def fetch_users_with_retry(url, headers, max_retries=3):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 429:
            reset_header = response.headers.get('ratelimit-reset')
            if reset_header:
                # Wait until the advertised reset time (the point at which
                # the window resets, per the normalized headers above)
                backoff = max(1, int(reset_header) - int(time.time()))
            else:
                # No header available - fall back to exponential backoff
                backoff = 2 ** attempt * 2
            print(f"Rate limited. Retrying in {backoff}s...")
            time.sleep(backoff)
            continue

        response.raise_for_status()
        return response.json()

    raise Exception("Max retries exceeded")

This pattern ensures that your extraction pipeline respects the upstream provider's infrastructure while maintaining high throughput.

Step 4: Automating the Extraction Pipeline

With authentication, data mapping, and rate limits handled, the final step is orchestrating the extraction pipeline. User directories are not static. Employees join, leave, and change roles daily. Pulling user lists once is a demo. Pulling them continuously on a schedule, reliably, is a production feature.

Here is the extraction pipeline architecture you should target:

sequenceDiagram
    participant Scheduler as Your Scheduler
    participant App as Your App
    participant API as Unified Directory API
    participant SaaS as Upstream SaaS

    Scheduler->>App: Trigger extraction (cron / webhook)
    App->>API: GET /unified/directory/users
    API->>SaaS: Fetch users (with auth, pagination)
    SaaS-->>API: User data (page 1)
    API-->>App: Normalized users + next_cursor
    loop Until all pages fetched
        App->>API: GET /unified/directory/users?cursor=...
        API->>SaaS: Fetch next page
        SaaS-->>API: User data (page N)
        API-->>App: Normalized users + next_cursor
    end
    App->>App: Diff against last snapshot
    App->>App: Flag new, removed, or changed users
    App->>App: Trigger alerts or workflows

The key elements of a production pipeline:

  1. Scheduled triggers: Run extraction on a cadence that matches your compliance requirements (e.g., daily for SOC 2 access reviews, hourly for near-real-time offboarding alerts). A distributed task queue (like Celery, Temporal, or BullMQ) triggers a sync job for a specific tenant.
  2. Incremental syncing: Use updated_after filters when supported by the upstream API to avoid re-fetching the entire user list on every run.
  3. Extraction Loop: The worker calls the unified endpoint, following the normalized next_cursor until it returns null. It respects the ratelimit-reset headers if an HTTP 429 is encountered.
  4. Snapshot diffing: The worker compares the extracted unified user list against the state stored in your database. It flags new accounts, suspended accounts, and role changes.
  5. Alerting on anomalies: If shadow IT is detected (e.g., an active user in Zendesk who is suspended in Okta), the system generates an alert for the security team. Orphaned accounts and privilege escalations should trigger immediate notifications.
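The snapshot-diffing step (item 4) is worth sketching, since it is where the pipeline turns raw lists into signal. A minimal version over two snapshots of unified users - keying on email here for readability, though a stable remote_id is the safer join key:

```python
def diff_snapshots(previous, current):
    """Compare two user snapshots; return (new, removed, changed) lists."""
    prev = {u["email"]: u for u in previous}
    curr = {u["email"]: u for u in current}
    new = [curr[e] for e in curr.keys() - prev.keys()]
    removed = [prev[e] for e in prev.keys() - curr.keys()]
    # "Changed" here means a role or status transition -- exactly the
    # events an access review cares about
    changed = [
        curr[e] for e in curr.keys() & prev.keys()
        if (curr[e].get("role"), curr[e].get("status"))
        != (prev[e].get("role"), prev[e].get("status"))
    ]
    return new, removed, changed
```

Each of the three output lists maps to a different downstream action: provisioning records for new accounts, offboarding checks for removed ones, and alerts for role or status changes.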

Do not store raw user data longer than you need it. If you are building access review features, your customers' security teams will ask about your data retention policies. Design for minimal data retention from the start - it is far easier than retrofitting it later.

For the extraction itself, a practical implementation loops through all of a customer's connected accounts and pulls the user list from each:

import requests
 
def extract_all_users(api_base, token, integrated_accounts):
    all_users = []
    for account in integrated_accounts:
        cursor = None
        while True:
            params = {"cursor": cursor} if cursor else {}
            resp = requests.get(
                f"{api_base}/unified/directory/users",
                headers={
                    "Authorization": f"Bearer {token}",
                    "X-Integrated-Account-ID": account["id"]
                },
                params=params
            )
            # Handle rate limits with the retry handler above; fail fast on other errors
            resp.raise_for_status()
            data = resp.json()
            all_users.extend(data["results"])
            cursor = data.get("next_cursor")
            if not cursor:
                break
    return all_users

This script pulls normalized user objects from Salesforce, Zendesk, Jira, Slack, and whatever other accounts the customer has connected - all through the same code path, the same schema, and the same pagination interface.

Handling Complex Upstream APIs with the Proxy API

Occasionally, you will encounter modern SaaS platforms (like Linear or modern GitHub endpoints) that expose their data exclusively via GraphQL. Integrating a GraphQL API into a standard RESTful extraction pipeline usually requires writing custom query builders and managing completely different operational logic.

A top-tier unified API platform solves this by providing a Proxy API that exposes GraphQL-backed integrations as RESTful CRUD resources. The platform handles the placeholder-driven request building and response extraction behind the scenes. Your extraction worker simply makes a standard GET /proxy/linear/users request, and the platform translates that into the necessary GraphQL query, returning a flat JSON array that fits perfectly into your existing pipeline.
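As a toy illustration of the translation such a proxy performs - the query shape and response envelope below are assumptions for the sketch, not Linear's exact schema:

```python
def rest_to_graphql(resource, fields):
    """Build the GraphQL query a proxy layer might issue for a
    REST-style "list this resource" request."""
    return "{ %s { nodes { %s } } }" % (resource, " ".join(fields))

def unwrap_graphql(resource, response):
    """Flatten the nested GraphQL envelope into the plain array
    a REST caller expects."""
    return response["data"][resource]["nodes"]
```

Your pipeline keeps issuing flat GET requests; the proxy builds the query on the way in and unwraps the envelope on the way out.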

What This Architecture Actually Buys You

Let's be direct about the trade-offs. Using a unified API for user list extraction is not free of downsides:

  • You are adding a dependency. If the unified API goes down, your extraction pipeline stops. Evaluate uptime SLAs carefully.
  • Edge cases exist. Some providers expose user data through non-standard endpoints (GraphQL-only APIs, SOAP services, custom report endpoints). Make sure your provider covers these through flexible proxy layers.
  • Custom fields need custom mappings. A static unified schema will not capture your customer's custom Salesforce permission sets or Workday business process security groups. You need a layer that supports per-customer mapping overrides.

What you do get in return:

  • Months of engineering time back. Instead of building 50+ auth flows, pagination handlers, and schema parsers, you build one integration against a unified API.
  • Consistent security posture. Credential management, token refresh, and encryption are handled in one place instead of scattered across dozens of connector implementations.
  • Faster time-to-coverage. Adding a new SaaS provider to your access review feature takes hours instead of sprints.

For teams building access review features, compliance dashboards, or automated offboarding workflows, this architecture is the difference between shipping in weeks and shipping in years.

Next Steps for Your Integration Roadmap

Pulling user lists end-to-end across a highly fragmented SaaS ecosystem requires discipline. Relying on SCIM will leave you blind to shadow IT. Building point-to-point API connectors will drain your engineering resources and leave you managing a fragile web of OAuth tokens and custom pagination loops.

The practical next steps are straightforward:

  1. Audit your coverage gap. List every SaaS app your customers need user data from. Check which ones support SCIM, which offer a REST API, and which have no programmatic access at all.
  2. Evaluate whether a unified API covers your providers. No abstraction layer covers everything. Identify the gap between what is available out of the box and what you would need to build yourself.
  3. Prototype the extraction pipeline. Start with 3-5 high-priority providers, validate the schema mapping meets your needs, and test rate limit handling under realistic load.
  4. Design for incremental sync from day one. Full user list pulls are fine for initial loads. Ongoing syncs need to be incremental or you will burn through API rate limits and waste compute.

The identity sprawl problem is getting worse, not better. 46% of organizations struggle to monitor non-human identities, and 56% are concerned with over-privileged API access. The companies that solve user visibility at scale will have a genuine competitive advantage - both as a product feature and as a security posture.

Stop writing bespoke API connectors. Standardize your data models, respect upstream rate limits, and focus your engineering cycles on the core value of your product.

FAQ

Why is SCIM not enough for pulling user lists from SaaS apps?
SCIM is a push-based provisioning protocol, not a pull-based audit protocol. It was designed to push user creates and updates from an IdP to a service provider, not to query existing access. It fails to capture shadow IT and manually created accounts, and is often gated behind expensive enterprise tiers.
How do unified APIs handle different rate limits across SaaS platforms?
A unified API normalizes upstream rate limit data into standardized IETF headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset). The caller is responsible for reading these headers and implementing exponential backoff logic when receiving an HTTP 429 error.
Do I need to write custom code to map disparate user endpoints?
No. Modern unified APIs use declarative mapping (like JSONata) to transform diverse API responses into a single, predictable JSON object without requiring integration-specific code, while still preserving the raw provider payload.
How do I handle different pagination types across multiple SaaS APIs?
SaaS APIs use cursor-based, offset-based, link header, or page number pagination. A unified API normalizes these into a single cursor-based interface so your code uses one pagination pattern regardless of the upstream provider.
How should I handle OAuth token expiration when extracting data?
You should offload token management to a platform that autonomously handles the lifecycle. The platform schedules work ahead of token expiry to refresh tokens, allowing you to pass a static tenant ID instead of managing raw tokens and refresh logic in your database.
