What is an MCP server for Databricks data access in 2026?

An MCP (Model Context Protocol) server for Databricks exposes Unity Catalog metadata, SQL Warehouse execution, and Jobs APIs as tools that AI agents can discover and call over a standard JSON-RPC/SSE transport. In 2026 there are three paths: Databricks managed MCP servers (hosted by Databricks, governed by Unity Catalog, with endpoints like /api/2.0/mcp/sql and /api/2.0/mcp/genie/{space_id}), custom servers hosted as Databricks Apps with built-in OAuth, and self-hosted open-source projects like databricks-mcp-server that you run in Docker or Kubernetes.

How do I deploy databricks-mcp-server with Docker or Helm?

For Docker, pull or build the image, then run it with DATABRICKS_HOST, DATABRICKS_CLIENT_ID and DATABRICKS_CLIENT_SECRET (or DATABRICKS_TOKEN for development), DATABRICKS_SQL_WAREHOUSE_ID, and DATABRICKS_MCP_TOOLS_INCLUDE set as environment variables. Use stdio for local clients and --transport sse --port 8080 for remote agents. For Helm, mount OAuth credentials from a Kubernetes Secret, raise the ingress proxy_read_timeout and proxy_send_timeout to accommodate long-lived SSE streams, and run at least three replicas behind an HPA. Load only the tool modules your agent needs (e.g. unity_catalog,sql,jobs) to keep tool-selection accuracy high.

How does multi-tenant OAuth work for a Databricks MCP server?

Each tenant registers a Databricks service principal (or does a user OAuth flow) in their own workspace, and your service stores the workspace URL, client ID, encrypted client secret, and refresh token per tenant. At request time, a token manager mints a short-lived access token via /oidc/v1/token, caches it, and refreshes 60-180 seconds before expiry. The MCP server pulls the tenant id from a routing header or subpath and asks the token manager for a valid token before calling Databricks. If you'd rather not hold refresh tokens directly, Databricks-managed external MCP connections use Unity Catalog managed OAuth so the platform handles the exchange for you.
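The token-manager flow described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of any particular server: TenantTokenManager, TenantCredentials, and the injected TokenFetcher are hypothetical names, and the fetcher assumes the service-principal client-credentials grant against /oidc/v1/token with the all-apis scope.

```typescript
// Hypothetical per-tenant token cache; all names here are illustrative.
interface TenantCredentials {
  workspaceUrl: string; // e.g. https://<workspace-host>
  clientId: string;
  clientSecret: string; // decrypted in memory only
}

interface CachedToken {
  accessToken: string;
  expiresAtMs: number;
}

// Injected so the cache logic can be exercised without a network call.
type TokenFetcher = (
  creds: TenantCredentials,
) => Promise<{ access_token: string; expires_in: number }>;

class TenantTokenManager {
  private readonly cache = new Map<string, CachedToken>();
  private readonly fetchToken: TokenFetcher;
  private readonly refreshSkewMs: number;
  private readonly now: () => number;

  constructor(
    fetchToken: TokenFetcher,
    refreshSkewMs = 120_000, // refresh 2 minutes early, inside the 60-180 s window
    now: () => number = () => Date.now(),
  ) {
    this.fetchToken = fetchToken;
    this.refreshSkewMs = refreshSkewMs;
    this.now = now;
  }

  async getToken(tenantId: string, creds: TenantCredentials): Promise<string> {
    const cached = this.cache.get(tenantId);
    // Reuse the cached token only while it is comfortably short of expiry.
    if (cached && cached.expiresAtMs - this.refreshSkewMs > this.now()) {
      return cached.accessToken;
    }
    const { access_token, expires_in } = await this.fetchToken(creds);
    this.cache.set(tenantId, {
      accessToken: access_token,
      expiresAtMs: this.now() + expires_in * 1000,
    });
    return access_token;
  }
}

// One possible fetcher (assumption: service principal, client-credentials grant).
const fetchDatabricksToken: TokenFetcher = async (creds) => {
  const res = await fetch(`${creds.workspaceUrl}/oidc/v1/token`, {
    method: 'POST',
    headers: {
      Authorization: `Basic ${btoa(`${creds.clientId}:${creds.clientSecret}`)}`,
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    body: new URLSearchParams({ grant_type: 'client_credentials', scope: 'all-apis' }),
  });
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  return res.json();
};
```

In this sketch two concurrent requests for the same tenant can both trigger a refresh. A production version would probably share one in-flight promise per tenant.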

Which MCP tools should a Databricks agent expose for SQL, Unity Catalog, and Jobs?

At minimum: uc_list_catalogs, uc_list_schemas, uc_list_tables, and uc_describe_table for discovery; dbsql_execute for SQL against a Warehouse (with row_limit and wait_timeout_seconds parameters); jobs_submit_run plus a jobs_get_run polling companion for long-running notebooks and pipelines. Keep tool descriptions specific - vague names produce vague selections. For long-running jobs, always return run_id immediately and let the agent orchestrator poll; never block a single tool call while a job runs.
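The submit-then-poll split described above might look something like this. It is a sketch under assumptions: JobsClient is a hypothetical interface standing in for whatever Databricks client the server uses, RunStatus is a simplified shape rather than the full Jobs API response, and pollUntilDone lives in the orchestrator, not in a tool handler.

```typescript
// Simplified run shape for illustration; the real Jobs API response carries more fields.
type LifeCycleState =
  | 'PENDING' | 'RUNNING' | 'TERMINATING'
  | 'TERMINATED' | 'SKIPPED' | 'INTERNAL_ERROR';

interface RunStatus {
  run_id: number;
  life_cycle_state: LifeCycleState;
  result_state?: string; // e.g. "SUCCESS" or "FAILED" once terminated
}

// Hypothetical client interface, injected so the handlers can be tested offline.
interface JobsClient {
  submitRun(notebookPath: string): Promise<{ run_id: number }>;
  getRun(runId: number): Promise<RunStatus>;
}

const TERMINAL_STATES: ReadonlySet<LifeCycleState> = new Set<LifeCycleState>([
  'TERMINATED', 'SKIPPED', 'INTERNAL_ERROR',
]);

// jobs_submit_run handler: returns as soon as the run is accepted.
async function jobsSubmitRun(client: JobsClient, args: { notebook_path: string }) {
  const { run_id } = await client.submitRun(args.notebook_path);
  return { run_id };
}

// jobs_get_run handler: one status check per call, never waits for completion.
async function jobsGetRun(client: JobsClient, args: { run_id: number }) {
  const run = await client.getRun(args.run_id);
  return { ...run, done: TERMINAL_STATES.has(run.life_cycle_state) };
}

// Orchestrator-side loop: this is the only place that waits.
async function pollUntilDone(
  client: JobsClient,
  runId: number,
  intervalMs = 5000,
  maxPolls = 120,
) {
  for (let i = 0; i < maxPolls; i++) {
    const run = await jobsGetRun(client, { run_id: runId });
    if (run.done) return run;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Run ${runId} did not finish after ${maxPolls} polls`);
}
```

Keeping the wait loop out of the tool handlers means each tool call stays short, which also sidesteps SSE and proxy timeouts on long jobs.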

How do I connect AutoGen, CrewAI, or LangGraph to a Databricks MCP server?

All three frameworks have first-class MCP client support. AutoGen uses autogen_ext.tools.mcp.SseMcpToolAdapter, CrewAI wraps a server via crewai_tools.MCPServerAdapter, and LangGraph uses langchain-mcp-adapters' MultiServerMCPClient. Point each at your MCP server's SSE URL, pass an X-Tenant-Id header for multi-tenant routing, and hand the returned tools to your Agent/create_react_agent. Cap the tool surface, force fully-qualified table names in the system prompt, and enforce a read-only regex on SQL at the server layer rather than trusting the model.
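A server-side read-only check could be sketched like this. The function names and the keyword lists are assumptions for illustration, and a regex is not a SQL parser: treat it as one conservative layer alongside read-only grants in Unity Catalog, not as a replacement for them.

```typescript
// Statements must start with a read-style keyword...
const READ_ONLY_START = /^(select|with|show|describe|explain)\b/i;
// ...and must not contain a write or DDL keyword anywhere.
const FORBIDDEN =
  /\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|copy|optimize|vacuum)\b/i;

// Remove block and line comments so keywords cannot hide behind them.
function stripSqlComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, ' ').replace(/--[^\n]*/g, ' ');
}

function isReadOnlySql(sql: string): boolean {
  // Drop comments, surrounding whitespace, and a single trailing semicolon.
  const cleaned = stripSqlComments(sql).trim().replace(/;\s*$/, '');
  // Any remaining semicolon means more than one statement: reject.
  if (cleaned.includes(';')) return false;
  return READ_ONLY_START.test(cleaned) && !FORBIDDEN.test(cleaned);
}
```

The check is deliberately strict: a query that merely contains the word "delete" inside a string literal is rejected too, which is the safer failure mode for an agent-facing tool.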

How to Publish Implementation-Focused API & Code Examples (2026 Guide)

Ship runnable API examples that cut Time to First Call, plus a 2026 playbook for deploying a Databricks MCP server for Unity Catalog, SQL, and Jobs data access.

Uday Gajavalli · May 26, 2026 · 28 min read

B2B SaaS evaluations are won or lost in the terminal. When an enterprise staff engineer evaluates your API, they do not read your marketing copy. They find a code snippet, paste it into their IDE, and hit run. If you sell B2B SaaS with a public API, the highest-leverage asset your product team can ship is a set of implementation-focused code examples that get an evaluating engineer to a successful 200 OK in under five minutes. Not a Swagger dump. Not a marketing page with screenshots. A copy-pasteable script that runs against a real provider, authenticates, returns real data, and proves your platform is worth a deeper look.

Developers evaluate APIs based on friction, and that first run decides whether you deserve their next two hours. If your documentation requires an engineer to spend three hours reverse-engineering undocumented payloads, guessing OAuth scopes, or writing custom retry logic from scratch, they will abandon the integration. If your example fails on line one because the SDK install path is wrong, or your rate-limit headers are non-standard, you have lost the technical evaluation, and the deal is downstream of that loss.

This guide provides a concrete framework for structuring, writing, and publishing API code examples that convert evaluating developers into active users, complementing our advice on building runnable, step-by-step developer tutorials. We will examine the metrics that dictate API adoption, the structural anatomy of a high-converting code snippet, how to handle the painful realities of authentication and rate limits, and how to scale your documentation across hundreds of third-party integrations using a unified API architecture.

Why Time to First Call (TTFC) Dictates API Adoption

Time to First Call (TTFC) is a developer experience metric that measures the elapsed time from a developer landing on your documentation or signing up for your service to executing their first successful, authenticated API request that returns a non-error response.

If you sell B2B SaaS with a public API, TTFC is the single most important metric governing your technical adoption rate. Postman treats TTFC as a baseline measure of developer onboarding success. According to Postman's internal data, developers make a successful first call 1.7 to 56 times faster when they start from a provided, runnable collection instead of from scratch. That is the gap between a developer giving up at minute 17 and one shipping a prototype before lunch.

Despite this data, most SaaS companies treat their API documentation as an afterthought. Postman's State of the API Report revealed that 39% of developers cite inconsistent documentation as their biggest roadblock to integrating third-party services.

PayPal is a useful case study here. By focusing on runnable examples and Postman Collections, PayPal cut its time to first call from hours to one minute and its testing time from hours to minutes, and reported that Postman Enterprise saved one hour of developer time per week. That is not just a documentation win. It is a revenue lever.

When a senior engineer evaluates your platform, they are not looking for a theoretical explanation of your architecture. They are looking for proof that your API behaves predictably. Every minute they spend debugging your authentication flow or deciphering a generic 400 Bad Request error increases the likelihood that they will recommend a competitor to their procurement team. By publishing implementation-focused code examples, you directly engineer your product's TTFC, reducing friction and accelerating the sales cycle.

Tip

Measure TTFC objectively. Pull the timestamp of account creation and the timestamp of the first successful (2xx) API response on that account, then graph the distribution. If your p50 is over 10 minutes, your code examples are the problem, not your sales team.
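As a rough sketch of that measurement, the snippet below computes TTFC in minutes per account and reads off percentiles with the nearest-rank method. The AccountEvents shape is an assumption; in practice the two timestamps would come from your signup table and API gateway logs.

```typescript
// Hypothetical input shape: one row per account.
interface AccountEvents {
  createdAt: string;             // ISO 8601 timestamp of account creation
  firstSuccessAt: string | null; // ISO 8601 timestamp of first 2xx call, if any
}

// TTFC in minutes for every account that reached a successful call, sorted ascending.
function ttfcMinutes(accounts: AccountEvents[]): number[] {
  const minutes: number[] = [];
  for (const account of accounts) {
    if (account.firstSuccessAt === null) continue; // never made a successful call
    const elapsed =
      (Date.parse(account.firstSuccessAt) - Date.parse(account.createdAt)) / 60_000;
    if (Number.isFinite(elapsed) && elapsed >= 0) minutes.push(elapsed);
  }
  return minutes.sort((a, b) => a - b);
}

// Nearest-rank percentile over an ascending array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}
```

Accounts that never reach a successful call are excluded here, so it is worth tracking their share separately: a healthy p50 alongside a large never-succeeded group still points to an onboarding problem.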

For a deeper dive into structuring long-form tutorials around this metric, see our guide on how to publish an end-to-end developer tutorial with API examples.

The Anatomy of a High-Converting API Code Example

Writing a "good" code snippet is an exercise in reducing cognitive load. Whether you are writing standard documentation or publishing developer API recipes, a runnable code example is not a syntax-highlighted block of pseudo-code. It is an executable artifact that must satisfy five strict constraints:

Self-contained: Every import, environment variable, and helper function is visible in the snippet. If the snippet relies on an undocumented helper function or an unmentioned SDK initialization step, it is useless.
Auth-aware: The reader knows exactly where the token comes from and how to obtain one. Better yet, if the user is logged into your documentation portal, the snippet should auto-populate with their sandbox credentials.
Real-data: It hits a real endpoint and returns a real-looking response, not a mocked stub. Do not use "string" or 0 as placeholder values in your JSON payloads. Use realistic data like "jane.doe@example.com" and 15000.
Copy-pasteable: One click copies the entire block. No <YOUR_KEY_HERE> scavenger hunts buried mid-snippet (put placeholders in environment variables at the top).
Errors handled: At minimum, a non-2xx branch shows what the developer should do.

Stripe established the industry gold standard for API documentation by utilizing a three-column layout. This design pattern places contextual navigation on the left, natural language explanations in the center, and live, copy-pasteable code examples in multiple languages on the right. You don't need to copy Stripe's exact design system, but you should copy the intent: the developer should never have to leave the page to understand what a request does.

graph TD
    A[Developer reads docs] --> B{Snippet is self-contained?}
    B -- No --> C[Developer searches for missing imports]
    C --> D[Developer gets frustrated and leaves]
    B -- Yes --> E[Developer pastes code into IDE]
    E --> F{Snippet handles auth?}
    F -- No --> G[Developer spends 2 hours debugging OAuth scopes]
    F -- Yes --> H[Successful 200 OK Response]
    H --> I[Developer approves technical evaluation]

Here is the difference between a low-converting and a high-converting example for the same task—listing CRM contacts:

Low-converting (typical vendor doc):

GET /contacts
Authorization: Bearer <token>

High-converting:

// Lists CRM contacts. Requires TRUTO_API_KEY and INTEGRATED_ACCOUNT_ID env vars.
// Get yours at https://truto.one/dashboard → Integrated Accounts.
 
const res = await fetch(
  `https://api.truto.one/unified/crm/contacts?integrated_account_id=${process.env.INTEGRATED_ACCOUNT_ID}&limit=25`,
  {
    headers: {
      Authorization: `Bearer ${process.env.TRUTO_API_KEY}`,
      Accept: 'application/json',
    },
  }
);
 
if (!res.ok) {
  console.error('Error:', res.status, await res.text());
  process.exit(1);
}
 
const { data, next_cursor } = await res.json();
console.log(`Fetched ${data.length} contacts. Next cursor: ${next_cursor ?? 'none'}`);

The second example is many times longer than the first, but it answers four implicit questions: where do credentials come from, what is the URL pattern, how do I handle errors, and how do I paginate. Those four questions are exactly what eats the first 15 minutes of a developer's evaluation.

Handling the Messy Realities: Auth and Rate Limits in Code

Hello-world examples are easy to write. Production-ready examples are difficult because production systems have to deal with expiring tokens and aggressive rate limits. Developers don't fail at the happy path. They fail at the seams—OAuth refresh, expired tokens, 429 backoff, idempotency keys, and the long tail of provider-specific quirks. Your examples need to acknowledge these without burying the reader in 200 lines of boilerplate.

Abstracting OAuth Token Management

OAuth 2.0 authorization code flows require managing access tokens, refresh tokens, and expiry windows. The wrong move is to inline the entire authorization-code flow into every snippet. The right move is to separate getting a token (a one-time setup step) from using a token (the per-request snippet). Link out to a dedicated auth quickstart, then assume the token exists in the example itself.

Your integration architecture should handle token lifecycle management server-side so your client-side code examples remain pristine. For example, Truto refreshes OAuth tokens proactively. The platform schedules work to refresh credentials 60 to 180 seconds before expiry. Because the platform manages this state durably, the developer's code example only needs to focus on making the actual API request. The authentication header is injected automatically by the proxy layer. That absence of refresh logic is itself a marketing asset. A developer reading a 12-line example that just works against a long-lived integration knows, intuitively, that someone else is doing the hard part.

Standardizing Rate Limit Headers

Rate limiting is a painful reality of software engineering and the single biggest source of "why is my integration randomly broken at 3 AM" tickets. Every third-party SaaS provider handles rate limits differently: some return HTTP 429, some return HTTP 403, some use Retry-After, some use X-RateLimit-Reset, and some lie about their actual limits.

Do not claim your platform magically absorbs rate limit errors. Radical honesty is required here. When an upstream API returns an HTTP 429, Truto passes that error to the caller. Silently retrying behind the scenes hides real capacity problems and breaks idempotency assumptions. However, Truto normalizes the upstream rate limit information into standardized headers per the IETF draft specification:

ratelimit-limit: The maximum number of requests permitted in the current window.
ratelimit-remaining: The number of requests remaining in the current window.
ratelimit-reset: The number of seconds until the current rate limit window resets (the IETF draft defines this as a delta, not an absolute timestamp).

By standardizing these headers across 100+ integrations, you allow developers to write a single, clean exponential backoff wrapper in their code. Your documentation should explicitly provide this wrapper:

// Example of a production-ready API wrapper using standardized IETF headers
async function fetchWithBackoff(url: string, options: RequestInit, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    
    // Handle 429 Rate Limits cleanly using IETF standard headers
    if (response.status === 429) {
      const reset = Number(response.headers.get('ratelimit-reset'));
      const nowSeconds = Math.floor(Date.now() / 1000);
      // The IETF draft defines ratelimit-reset as seconds until reset; tolerate epoch timestamps too
      const headerWait = reset > nowSeconds ? reset - nowSeconds : reset;
      // Fall back to exponential backoff when the header is missing or invalid
      const waitSeconds = Number.isFinite(headerWait) && headerWait > 0 ? headerWait : Math.pow(2, attempt);
      
      const jitter = Math.random() * 0.5;
      console.warn(`Rate limited. Retrying in ${(waitSeconds + jitter).toFixed(2)} seconds...`);
      
      await new Promise(resolve => setTimeout(resolve, (waitSeconds + jitter) * 1000));
      continue;
    }
    
    if (!response.ok) {
      throw new Error(`API call failed: ${response.status} ${response.statusText}`);
    }
    
    return response.json();
  }
  throw new Error('Max retries exceeded after rate limits');
}

Building HIPAA-Compliant Healthcare API Integrations

Healthcare SaaS is where bad integration patterns carry the highest real-world cost. The average cost of a healthcare data breach reached $9.77 million, keeping healthcare the most expensive industry for breaches for over a decade. Worse, 8 out of 14 mega-breaches in 2024 involved business associates of HIPAA-covered entities - the exact third-party vendor layer where integrations live. If your code examples for healthcare integrations don't demonstrate compliant patterns, developers will ship insecure code to production.

The core principle: your integration layer should never become a second PHI warehouse. Every pattern below is designed around a single constraint - electronic Protected Health Information (ePHI) passes through the system but is never persisted by it. For a full treatment of the legal requirements, BAA obligations, and architectural decisions, see our detailed guide on how to build HIPAA-compliant integrations for healthcare SaaS.

Stateless Proxy: The Zero-Persistence Pattern

A stateless pass-through proxy handles ePHI exclusively in memory. The request streams in from the caller, gets forwarded to the upstream EHR or healthcare API, and the response streams back. At no point does the proxy buffer the full payload to disk, write it to a queue, or store it in a cache.

// Stateless pass-through proxy: ePHI lives only in memory during request lifecycle
async function proxyFhirRequest(req: Request): Promise<Response> {
  // 1. Resolve target EHR endpoint and credentials from encrypted credential store
  const { ehrBaseUrl, accessToken } = await resolveIntegration(
    req.headers.get('x-integrated-account-id')!
  );
 
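  // 2. Stream request body through - never buffer full payload to disk
  const { pathname, search } = new URL(req.url);
  const hasBody = req.method !== 'GET' && req.method !== 'HEAD';
  // Keep the query string: FHIR searches are expressed as query parameters
  const upstreamRes = await fetch(`${ehrBaseUrl}${pathname}${search}`, {
    method: req.method,
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: 'application/fhir+json',
      // Forward the caller's content type on writes so the EHR can parse the body
      ...(hasBody
        ? { 'Content-Type': req.headers.get('content-type') ?? 'application/fhir+json' }
        : {}),
    },
    // Note: Node.js fetch also requires `duplex: 'half'` when the body is a stream
    body: hasBody ? req.body : undefined,
  });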
 
  // 3. Log metadata only (see Safe Logging section below)
  logRequestMetadata({
    integratedAccountId: req.headers.get('x-integrated-account-id'),
    method: req.method,
    path: new URL(req.url).pathname,
    status: upstreamRes.status,
    timestamp: new Date().toISOString(),
    // No body content, no patient identifiers, no PHI
  });
 
  // 4. Stream response directly back to caller - proxy retains nothing
  return new Response(upstreamRes.body, {
    status: upstreamRes.status,
    headers: filterResponseHeaders(upstreamRes.headers),
  });
}

This is the architectural approach Truto takes for healthcare integrations. The platform acts as a stateless proxy - credentials are resolved from an encrypted store, the request is forwarded, and the response streams back. No ePHI is written to disk, cached, or queued at the integration layer.
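The proxy example calls a filterResponseHeaders helper without showing it. One plausible shape is sketched below; the exact header lists are an assumption and should be adjusted to your runtime. The idea is to drop hop-by-hop headers, which describe the upstream connection, and the encoding and length headers, which no longer match once fetch has transparently decompressed the body.

```typescript
// Hop-by-hop headers describe a single connection and must not be forwarded.
const HOP_BY_HOP_HEADERS = new Set([
  'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
  'te', 'trailer', 'transfer-encoding', 'upgrade',
]);

// fetch() decodes compressed bodies, so upstream encoding/length would be wrong;
// upstream cookies have no business reaching the caller either.
const ALSO_DROPPED_HEADERS = new Set(['content-encoding', 'content-length', 'set-cookie']);

function filterResponseHeaders(upstream: Headers): Headers {
  const filtered = new Headers();
  upstream.forEach((value, name) => {
    const key = name.toLowerCase();
    if (!HOP_BY_HOP_HEADERS.has(key) && !ALSO_DROPPED_HEADERS.has(key)) {
      filtered.append(key, value);
    }
  });
  return filtered;
}
```

A stricter variant would use an allow-list (content-type, etag, last-modified, the rate limit headers) instead of a deny-list, which is usually the safer default for a proxy that carries ePHI.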

Secure Token Storage and Rotation

The only sensitive data your integration layer should persist is credentials - OAuth tokens, API keys, and client secrets. These should be encrypted at rest (AES-256 is the usual choice) and rotated on a defined schedule. The current HIPAA Security Rule classifies encryption as an "addressable" safeguard; the proposed Security Rule updates eliminate that classification entirely, making encryption of data at rest and TLS 1.2 or higher for data in transit mandatory requirements.

import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';
 
const ALGORITHM = 'aes-256-gcm';
const IV_LENGTH = 12; // 96 bits, recommended for GCM
 
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_LENGTH);
  const cipher = createCipheriv(ALGORITHM, key, iv);
 
  const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const authTag = cipher.getAuthTag();
 
  // Store as iv:authTag:ciphertext (base64-encoded)
  return [iv, authTag, encrypted].map(b => b.toString('base64')).join(':');
}
 
function decryptToken(stored: string, key: Buffer): string {
  const [ivB64, tagB64, dataB64] = stored.split(':');
  const decipher = createDecipheriv(
    ALGORITHM, key,
    Buffer.from(ivB64, 'base64')
  );
  decipher.setAuthTag(Buffer.from(tagB64, 'base64'));
 
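  // Concatenate as Buffers before decoding so multi-byte characters are never split
  return Buffer.concat([
    decipher.update(Buffer.from(dataB64, 'base64')),
    decipher.final(),
  ]).toString('utf8');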
}

Tokens are decrypted into memory only at request time and discarded after use. Truto encrypts all stored credentials at rest using AES-256-GCM and proactively refreshes OAuth tokens 60 to 180 seconds before expiry. When a key rotation occurs, the platform re-encrypts all affected credentials under the new key without downtime.
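Re-encryption during key rotation is easier when each stored value records which key sealed it. The sketch below is one way to do that, not a description of Truto's internals: the keyId prefix, the KeyRing map, and the function names are all assumptions made for illustration.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Hypothetical key ring: key id -> 32-byte AES key. Key ids must not contain ':'.
type KeyRing = Map<string, Buffer>;

// Stored format: keyId:iv:authTag:ciphertext (last three base64-encoded).
function sealToken(plaintext: string, keyId: string, keys: KeyRing): string {
  const key = keys.get(keyId);
  if (!key) throw new Error(`Unknown key id: ${keyId}`);
  const iv = randomBytes(12); // fresh 96-bit IV for every encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const parts = [iv, cipher.getAuthTag(), data].map((b) => b.toString('base64'));
  return [keyId, ...parts].join(':');
}

function openToken(stored: string, keys: KeyRing): { keyId: string; plaintext: string } {
  const [keyId, ivB64, tagB64, dataB64] = stored.split(':');
  const key = keys.get(keyId);
  if (!key) throw new Error(`Unknown key id: ${keyId}`);
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(ivB64, 'base64'));
  decipher.setAuthTag(Buffer.from(tagB64, 'base64'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(dataB64, 'base64')),
    decipher.final(), // throws if the auth tag does not verify
  ]).toString('utf8');
  return { keyId, plaintext };
}

// Re-encrypt under the new key; values already on the new key are left untouched.
function rotateToken(stored: string, newKeyId: string, keys: KeyRing): string {
  const { keyId, plaintext } = openToken(stored, keys);
  return keyId === newKeyId ? stored : sealToken(plaintext, newKeyId, keys);
}
```

Because old and new keys coexist in the key ring, rotation can run as a background sweep while reads keep working, and the old key is retired only once no stored value references it.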

Warning

Never log decrypted tokens. If a token appears in your application logs - even in a staging environment - that log stream is now subject to the full HIPAA Security Rule, including six-year retention and access controls.

Safe Logging: Metadata Only With Payload Hashing

HIPAA's Security Rule (45 CFR §164.312(b)) requires audit controls that record and examine activity in systems handling ePHI. The trap: if your logs contain ePHI, the logs themselves become protected data subject to the same encryption, access control, and six-year retention requirements. The solution is to log metadata about requests and hash payloads for integrity verification without storing their contents.

import { createHash } from 'node:crypto';
 
interface HipaaAuditEntry {
  timestamp: string;
  requestId: string;
  integratedAccountId: string;
  method: string;
  fhirResourceType: string;   // "Patient", "Observation", etc.
  action: 'read' | 'search' | 'create' | 'update' | 'delete';
  httpStatus: number;
  responseTimeMs: number;
  payloadHash: string;         // SHA-256 of body - proves integrity, contains no PHI
  resourceCount?: number;      // how many resources returned (not the resources themselves)
  sourceIp: string;
}
 
function buildAuditEntry(
  req: Request, res: Response, body: ArrayBuffer, durationMs: number
): HipaaAuditEntry {
  return {
    timestamp: new Date().toISOString(),
    requestId: req.headers.get('x-request-id') ?? crypto.randomUUID(),
    integratedAccountId: req.headers.get('x-integrated-account-id') ?? 'unknown',
    method: req.method,
    fhirResourceType: extractResourceType(new URL(req.url).pathname),
    action: mapMethodToAction(req.method),
    httpStatus: res.status,
    responseTimeMs: durationMs,
    payloadHash: createHash('sha256').update(new Uint8Array(body)).digest('hex'),
    sourceIp: req.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'unknown',
  };
}
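buildAuditEntry relies on two helpers that are not shown, extractResourceType and mapMethodToAction. A possible sketch follows. It assumes request paths shaped like /fhir/R4/Patient/123, where the resource type is the first UpperCamelCase segment, and it adds an optional pathname argument (not present in the call above) so reads and searches can be told apart.

```typescript
type AuditAction = 'read' | 'search' | 'create' | 'update' | 'delete';

// FHIR resource type names are UpperCamelCase letters only (Patient, Observation, ...).
const RESOURCE_TYPE = /^[A-Z][A-Za-z]+$/;

// Assumption: the first UpperCamelCase path segment is the resource type.
function extractResourceType(pathname: string): string {
  const segment = pathname.split('/').find((s) => RESOURCE_TYPE.test(s));
  return segment ?? 'unknown';
}

function mapMethodToAction(method: string, pathname?: string): AuditAction {
  switch (method.toUpperCase()) {
    case 'POST':
      // POST [type]/_search is a search in FHIR; any other POST is a create here.
      return pathname?.endsWith('/_search') ? 'search' : 'create';
    case 'PUT':
    case 'PATCH':
      return 'update';
    case 'DELETE':
      return 'delete';
    default: {
      // GET/HEAD: without a path we cannot distinguish, so default to read.
      if (!pathname) return 'read';
      const segments = pathname.split('/').filter(Boolean);
      const last = segments[segments.length - 1] ?? '';
      // A path ending at the resource type is a type-level search; an id means a read.
      return RESOURCE_TYPE.test(last) ? 'search' : 'read';
    }
  }
}
```

Both helpers only ever look at the method and path, so nothing from the request or response body reaches the audit record through them.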

The payloadHash field is the key design decision. It gives you a cryptographic fingerprint of every payload that passed through the system - useful for forensic reconstruction and compliance audits - without the log entry itself containing the payload. If an auditor needs to verify what data was transmitted at a specific time, the hash can be compared against the source system's records. One caveat: an unkeyed hash of a small or guessable payload can be brute-forced, so consider a keyed hash (HMAC-SHA-256 with a secret held outside the log store) where that is a concern.

Idempotent Writes and Retry Strategies Without PHI Persistence

Write operations against healthcare APIs fail. EHR systems enforce aggressive rate limits, connections drop, and tokens expire mid-request. The challenge is retrying safely without persisting PHI between attempts. FHIR's conditional create mechanism provides native idempotency support:

async function idempotentFhirWrite(
  ehrBaseUrl: string,
  accessToken: string,
  resourceType: string,
  payload: object,
  idempotencyKey: string, // e.g., "identifier=http://your-system|12345"
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch(`${ehrBaseUrl}/${resourceType}`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/fhir+json',
        'If-None-Exist': idempotencyKey, // FHIR conditional create prevents duplicates
      },
      body: JSON.stringify(payload), // from caller's in-memory request, never queued
    });
 
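    // 201 Created (new resource) and 200 OK (conditional create matched an existing one)
    // are safe outcomes; 409 Conflict is treated here as "already exists" as well.
    // Multiple matches return 412 Precondition Failed, which falls through to the 4xx branch.
    if ([200, 201, 409].includes(res.status)) return res;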
 
    // 429 Rate Limited - back off using Retry-After header
    if (res.status === 429) {
      const retryAfter = parseInt(res.headers.get('retry-after') ?? '0', 10);
      const waitMs = (retryAfter > 0 ? retryAfter : Math.pow(2, attempt)) * 1000;
      await new Promise(r => setTimeout(r, waitMs + Math.random() * 500));
      continue;
    }
 
    // 5xx Server Error - transient, retry with exponential backoff
    if (res.status >= 500) {
      await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
      continue;
    }
 
    // 4xx Client Error (not 429) - deterministic failure, do not retry
    throw new Error(`FHIR write failed: ${res.status}`);
  }
  throw new Error(`Max retries exceeded for ${resourceType} write`);
}

The critical constraint: the payload object lives only in the caller's request memory. It is not serialized to a retry queue, not written to a dead-letter store, and not cached between attempts. If the calling process crashes, the write is simply not retried - which is the correct behavior when the alternative is unencrypted PHI sitting in a queue.

SMART on FHIR: Backend Services Proxy Pattern

SMART Backend Services is the standard authentication pattern for system-to-system EHR access with no user in the loop. It uses the OAuth 2.0 client credentials grant with asymmetric JWT-based client authentication - no browser redirects, no user consent screens. The client registers a public key with the EHR authorization server and authenticates by signing a short-lived JWT assertion with the corresponding private key.

import { SignJWT, importPKCS8 } from 'jose';
 
async function discoverSmartEndpoints(fhirBaseUrl: string) {
  const res = await fetch(`${fhirBaseUrl}/.well-known/smart-configuration`);
  if (!res.ok) throw new Error(`SMART discovery failed: ${res.status}`);
  const config = await res.json();
  return { tokenEndpoint: config.token_endpoint };
}
 
async function getSmartBackendToken(
  tokenEndpoint: string,
  clientId: string,
  privateKeyPem: string,
  keyId: string, // "kid" of the public key registered with the EHR authorization server
  scopes: string[]
): Promise<{ accessToken: string; expiresIn: number }> {
  const privateKey = await importPKCS8(privateKeyPem, 'RS384');
 
  // Build JWT client assertion per SMART Backend Services spec (HL7)
  // The spec requires a kid header so the server can pick the matching public key
  const assertion = await new SignJWT({})
    .setProtectedHeader({ alg: 'RS384', kid: keyId, typ: 'JWT' })
    .setIssuer(clientId)
    .setSubject(clientId)
    .setAudience(tokenEndpoint)
    .setJti(crypto.randomUUID())
    .setExpirationTime('5m')
    .sign(privateKey);
 
  const res = await fetch(tokenEndpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'client_credentials',
      scope: scopes.join(' '),
      client_assertion_type: 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
      client_assertion: assertion,
    }),
  });
 
  if (!res.ok) throw new Error(`SMART token request failed: ${res.status}`);
  const { access_token, expires_in } = await res.json();
  return { accessToken: access_token, expiresIn: expires_in };
}

The developer's side of this is simple: they make a standard Truto API call and never touch JWKs, JWT signing, or token endpoint discovery. The platform resolves the SMART configuration from the EHR's .well-known/smart-configuration endpoint, generates short-lived JWT assertions, exchanges them for access tokens, and encrypts the resulting tokens at rest. When the token approaches expiry, the platform re-authenticates automatically.

sequenceDiagram
    participant Dev as Developer
    participant Proxy as Integration Proxy
    participant Auth as EHR Auth Server
    participant EHR as EHR FHIR API

    Dev->>Proxy: GET /unified/healthcare/patients
    Proxy->>Proxy: Decrypt stored private key
    Proxy->>Auth: POST /token (JWT assertion)
    Auth-->>Proxy: access_token (short-lived)
    Proxy->>EHR: GET /Patient (Bearer token)
    EHR-->>Proxy: FHIR Bundle (streams through)
    Proxy->>Proxy: Log metadata + payload hash only
    Proxy-->>Dev: FHIR Bundle (no PHI persisted)

For a complete walkthrough of BAA requirements, the minimum necessary rule, and how to architect your integration layer for HIPAA compliance from the ground up, see our detailed guide: How to Build HIPAA-Compliant Integrations for Healthcare SaaS.

Scaling Documentation Across 100+ Integrations

Writing a perfect code example for one API is manageable. Maintaining perfect code examples for 100 distinct SaaS APIs is an operational nightmare. Here is where most PM strategies fall apart.

Most unified API platforms solve the multi-integration problem with brute force. Behind the scenes, they maintain separate code paths for each integration—if (provider === 'hubspot') { ... } else if (provider === 'salesforce') { ... }. They templatize tutorials and rely on a content team to keep them fresh. This degrades quickly. When HubSpot ships a new pagination format, you don't just update one tutorial—you update the SDK, the example, the architecture diagram, the field mapping notes, and 30 cross-references.

A unified API architecture changes the math. Truto takes a radically different architectural approach. The entire platform contains zero integration-specific code. The runtime engine is a generic pipeline that reads declarative configuration and executes it. Integration-specific behavior is defined entirely as data—JSON configuration blobs in the database and JSONata expressions mapping the data models.

This architectural shift changes the economics of API documentation. If your platform exposes one generic endpoint pattern for listing CRM contacts, the same code example works for every CRM you support. You do not need to publish 100 different code examples for creating a CRM contact. You publish one generic, runnable code example that targets the unified model. The generic execution pipeline handles the translation to Salesforce, HubSpot, Pipedrive, Zoho, or Close automatically.

flowchart LR
    A[Your code example] --> B[Unified API endpoint]
    B --> C[Generic execution engine]
    C --> D[Config + JSONata mapping]
    D --> E1[HubSpot]
    D --> E2[Salesforce]
    D --> E3[Pipedrive]
    D --> E4[Zoho]
    D --> E5[...50+ providers]

Per-Customer Customization Without Code

Enterprise SaaS customers inevitably have custom fields and heavily modified data models. If a user needs to map a custom Salesforce object, they should not have to write custom code or wait for your engineering team to deploy a new endpoint.

The trade-off worth being honest about: unified APIs work best when the underlying providers genuinely share semantics. For deeply provider-specific behavior, Truto handles this through a three-level override hierarchy using JSONata as the universal transformation language:

Platform Base: The default mapping that works for most customers.
Environment Override: A customer's environment can override any aspect of the mapping without affecting other environments.
Account Override: Individual connected accounts can have their own mapping overrides for highly specific custom fields.

Because these customizations happen via configuration rather than code, your primary API examples remain universally applicable. The developer writes the same clean API request, and the JSONata engine handles the bespoke data transformation in flight. For detailed examples of how to document this architecture, review our guide on 3-Level API Mapping: Per-Customer Data Model Overrides Without Code.
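The precedence rule behind the three levels can be illustrated with a small merge function. This is a sketch of the idea only: it is not Truto's configuration schema, the mapping values are placeholder expression strings, and resolveMapping is a hypothetical name.

```typescript
// Unified field name -> mapping expression (placeholder strings in this sketch).
type FieldMapping = Record<string, string>;
type OverrideLevel = 'platform' | 'environment' | 'account';

interface ResolvedMapping {
  mapping: FieldMapping;
  source: Record<string, OverrideLevel>; // which level supplied each field
}

// Later levels win: account overrides environment, which overrides the platform base.
function resolveMapping(
  platformBase: FieldMapping,
  environmentOverride: FieldMapping = {},
  accountOverride: FieldMapping = {},
): ResolvedMapping {
  const levels: Array<[OverrideLevel, FieldMapping]> = [
    ['platform', platformBase],
    ['environment', environmentOverride],
    ['account', accountOverride],
  ];
  const mapping: FieldMapping = {};
  const source: Record<string, OverrideLevel> = {};
  for (const [level, overrides] of levels) {
    for (const [field, expression] of Object.entries(overrides)) {
      mapping[field] = expression;
      source[field] = level;
    }
  }
  return { mapping, source };
}
```

Recording the source level alongside each resolved field makes it much easier to answer the support question "why does this account map the field differently?" without reading three configuration documents.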

The Role of AI and Code Churn in Modern DevEx

The proliferation of AI coding assistants has fundamentally altered how developers interact with API documentation. Developers are now pasting your examples directly into Cursor, Claude Code, or GitHub Copilot, then asking the LLM to extend them.

This introduces a severe new problem: high-velocity hallucination. GitClear analyzed around a billion lines of code over five years, with 211 million meaningful line changes used for their research. Their analysis found that the share of copy-pasted code blocks increased 8x during 2024, and code churn—new code revised within two weeks—nearly doubled from 3.1% to 5.7%. The share of refactored lines dropped from 24.1% in 2020 to just 9.5% in 2024.

Translation: AI tools make developers ship more code, faster, with more duplication and less reuse. AI models are highly confident and frequently wrong when guessing third-party API schemas, pagination strategies, and authentication headers. If your example is wrong, ambiguous, or relies on undocumented behavior, the AI confidently extrapolates that error into 200 lines of broken code that ship to production.

Clear, implementation-focused API documentation is the only defense against AI-generated garbage code. When you publish strict, machine-readable developer API references with runnable examples, you ground the AI models.

This is why MCP (Model Context Protocol) servers matter for API publishers. Exposing your docs and integration capabilities as an MCP server lets AI agents discover endpoints, parameters, and examples without you having to anticipate every prompt. Auto-generated MCP tool definitions keep the AI-facing surface in sync with the human-facing one.

Warning

AI-generated client code is only as good as the example it starts from. Treat your top 20 code snippets as production code: version them, test them in CI against the real API, and fail builds when they break. "Documentation drift" is now a runtime risk.
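As a starting point for that CI gate, a check can pull every Python snippet out of a docs page and fail the build if one no longer parses. This is a minimal sketch under assumptions: the docs are Markdown with fenced `python` blocks, the helper names are hypothetical, and it only catches syntax errors. A real pipeline would go further and execute the snippets against a sandbox API.

```python
import re

# Built at runtime so this example can itself sit inside a fenced block.
FENCE_MARK = "`" * 3
SNIPPET = re.compile(
    re.escape(FENCE_MARK) + r"python\n(.*?)" + re.escape(FENCE_MARK),
    re.DOTALL,
)


def extract_python_snippets(markdown: str) -> list[str]:
    """Return the body of every fenced python block in a Markdown document."""
    return [match.group(1) for match in SNIPPET.finditer(markdown)]


def find_broken_snippets(markdown: str) -> list[tuple[int, str]]:
    """Return (snippet index, error message) for snippets that fail to parse."""
    broken = []
    for index, source in enumerate(extract_python_snippets(markdown)):
        try:
            compile(source, f"<snippet {index}>", "exec")
        except SyntaxError as exc:
            broken.append((index, str(exc.msg)))
    return broken
```

In CI, a non-empty result from `find_broken_snippets` would fail the job.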

MCP Server for Databricks Data Access in 2026

Data-warehouse-native AI agents are among the fastest-growing agent categories in 2026, and Databricks sits at the center of that trend. If your product needs an agent to answer questions grounded in lakehouse data, run governed SQL, or trigger Databricks Jobs, an MCP server is now the standard interface. This section is the practical companion to everything above: how to stand up a Databricks Model Context Protocol server for data access, what the tool schemas look like, and how to wire it into AutoGen, CrewAI, and LangGraph.

There are three broad paths, and picking the right one determines almost every downstream decision:

Databricks managed MCP servers. Ready-to-use servers that connect agents to Unity Catalog, Vector Search indexes, Genie Spaces, and custom functions with no setup. Databricks hosts them and manages authentication, and Unity Catalog enforces permissions so agents access only what you grant.
Custom MCP server on Databricks Apps. Databricks Apps provide out-of-the-box OAuth, Git-based deployment, and built-in permissions, letting you turn legacy services into MCP servers in minutes.
Self-hosted open-source databricks-mcp-server. You run the process yourself (Docker, Kubernetes, or a laptop), authenticate with a PAT or service principal OAuth, and expose it to any MCP client.

The managed servers are the fastest path if you are already inside Databricks. The self-hosted option gives you the most control - useful for multi-tenant SaaS products that need to fan out across many customer workspaces.

Prerequisites and Architecture

Before you deploy anything, get the identity and permissions story right. The identity attached to your token needs USE CATALOG on catalogs, USE SCHEMA on schemas, SELECT on tables you want to query or describe, and CAN_USE on the SQL Warehouse used for query execution and lineage. For production or automated scenarios, a service principal with narrowly defined permissions is strongly preferred over a personal access token.
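To make the grant list concrete, the sketch below generates the Unity Catalog SQL for one service principal. The helper names are hypothetical, and it assumes the principal is referenced by its application ID in backticks. The warehouse permission is not included because, as far as I know, it is managed through the warehouse's permission settings rather than a SQL GRANT.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_grants(principal: str, catalog: str, schema: str, tables: list[str]) -> list[str]:
    """Return the minimal read-only GRANT statements for one schema."""
    who = quote_ident(principal)
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    grants = [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
    ]
    # SELECT is granted per table so the agent sees nothing it was not given.
    grants += [
        f"GRANT SELECT ON TABLE {sch}.{quote_ident(table)} TO {who}"
        for table in tables
    ]
    return grants


for statement in build_grants("sp-app-id", "main", "sales", ["orders"]):
    print(statement)
```

Granting SELECT table by table is more verbose than a schema-level grant, but it keeps the agent's reach explicit and easy to audit.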

Minimum checklist:

A Databricks workspace with Unity Catalog enabled.
A SQL Warehouse (serverless or classic) - you'll need its ID for SQL execution and lineage.
A service principal with an OAuth client ID and client secret (or a scoped PAT for dev).
MCP client of choice: Claude Desktop, Cursor, VS Code Copilot, or a custom agent framework.
For external MCP servers registered through Databricks: the server must use the Streamable HTTP transport, because Databricks only supports external MCP servers that use Streamable HTTP.

The reference architecture for a self-hosted, multi-tenant deployment:

flowchart LR
    subgraph agents ["Agent Layer"]
      A1["AutoGen agent"]
      A2["CrewAI crew"]
      A3["LangGraph graph"]
    end
    subgraph mcp ["MCP Layer"]
      M["databricks-mcp-server<br>(HTTP/SSE)"]
      TR["Tenant router<br>+ OAuth cache"]
    end
    subgraph dbx ["Databricks Workspace (per tenant)"]
      UC["Unity Catalog"]
      SQL["SQL Warehouse"]
      JOBS["Jobs API"]
      VS["Vector Search"]
    end
    A1 --> M
    A2 --> M
    A3 --> M
    M --> TR
    TR -->|"per-tenant token"| UC
    TR -->|"per-tenant token"| SQL
    TR -->|"per-tenant token"| JOBS
    TR -->|"per-tenant token"| VS

The key design choice is the tenant router: the MCP server itself is stateless, but a thin routing layer maps the incoming MCP session to the correct workspace hostname, service principal, and OAuth token bundle. We'll implement that below.
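The first job of that routing layer is to turn an incoming request into a tenant id. The sketch below shows one way to do it, assuming an `X-Tenant-Id` header with a `/t/{tenant}/...` subpath as a fallback. Both conventions and the `resolve_tenant` helper are hypothetical, and in production the tenant should be derived from authenticated credentials rather than trusted from a client-supplied value.

```python
import re
from typing import Mapping, Optional

# Restrict tenant ids to a safe character set before using them as lookup keys.
TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")
SUBPATH = re.compile(r"^/t/([^/]+)(?:/|$)")


def resolve_tenant(headers: Mapping[str, str], path: str) -> Optional[str]:
    """Return the tenant id from the routing header or URL subpath, else None."""
    lowered = {key.lower(): value for key, value in headers.items()}
    candidate = lowered.get("x-tenant-id")
    if candidate is None:
        match = SUBPATH.match(path)
        candidate = match.group(1) if match else None
    if candidate is None or not TENANT_ID.fullmatch(candidate):
        return None  # unknown or malformed: the caller should reject the request
    return candidate
```

The returned id is what the token manager later uses to load the workspace URL and credentials for the upstream call.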

Deploying databricks-mcp-server (Docker and Helm)

The open-source ecosystem has consolidated on a few implementations. The databricks-mcp-server project is a comprehensive MCP server built on the official Databricks Python SDK, providing 263 tools and 8 prompt templates across 28 service domains including Unity Catalog, SQL, Compute, Jobs, Pipelines, Serving, Vector Search, Apps, Lakebase, Dashboards, Genie, Secrets, IAM, Connections, Experiments, and Delta Sharing. It delegates authentication entirely to the Databricks SDK, so PAT, OAuth, Azure AD, and service principal auth all work automatically, and you can include or exclude tool modules via environment variables.

For lighter deployments, the RafaelCartenet mcp-databricks-server focuses specifically on Unity Catalog metadata, data discovery, lineage analysis, and intelligent SQL execution. Pick the comprehensive one if your agents need write operations (Jobs, Serving, Repos); pick the focused one if all you need is read-only SQL + UC exploration.

Docker (stdio for local, SSE for remote):

# Build once
docker build -t databricks-mcp:latest .
 
# Local stdio transport (Claude Desktop, Cursor)
docker run -i \
  -e DATABRICKS_HOST=https://your-workspace.cloud.databricks.com \
  -e DATABRICKS_TOKEN=dapi... \
  -e DATABRICKS_SQL_WAREHOUSE_ID=abc123def456 \
  databricks-mcp:latest
 
# Remote SSE transport (agent frameworks, multi-tenant services)
docker run -d -p 8080:8080 \
  -e DATABRICKS_HOST=https://your-workspace.cloud.databricks.com \
  -e DATABRICKS_CLIENT_ID=$SP_CLIENT_ID \
  -e DATABRICKS_CLIENT_SECRET=$SP_CLIENT_SECRET \
  -e DATABRICKS_SQL_WAREHOUSE_ID=abc123def456 \
  -e DATABRICKS_MCP_TOOLS_INCLUDE=unity_catalog,sql,jobs \
  databricks-mcp:latest --transport sse --port 8080

With 263 tools available, load only the modules you need via DATABRICKS_MCP_TOOLS_INCLUDE - fewer tools improve agent tool-selection accuracy. A CRM agent that only needs SQL and UC should never load the Jobs, IAM, or Delta Sharing modules.

Helm chart for Kubernetes (illustrative values.yaml):

# values.yaml
replicaCount: 3
image:
  repository: your-registry/databricks-mcp
  tag: "1.0.0"
  pullPolicy: IfNotPresent
 
service:
  type: ClusterIP
  port: 8080
 
env:
  DATABRICKS_HOST: https://your-workspace.cloud.databricks.com
  DATABRICKS_SQL_WAREHOUSE_ID: abc123def456
  DATABRICKS_MCP_TOOLS_INCLUDE: unity_catalog,sql,jobs
  MCP_TRANSPORT: sse
  MCP_PORT: "8080"
 
# Pull OAuth credentials from a Kubernetes Secret, never inline them
secretRefs:
  - name: databricks-sp-credentials
    keys:
      - DATABRICKS_CLIENT_ID
      - DATABRICKS_CLIENT_SECRET
 
ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"   # long-running SSE
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  hosts:
    - host: mcp-databricks.internal.example.com
      paths:
        - path: /
          pathType: Prefix
 
resources:
  requests: { cpu: 250m, memory: 512Mi }
  limits:   { cpu: 1000m, memory: 1Gi }
 
podDisruptionBudget:
  minAvailable: 2
 
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

Create the secret separately so credentials never sit in Git:

kubectl create secret generic databricks-sp-credentials \
  --from-literal=DATABRICKS_CLIENT_ID=$SP_CLIENT_ID \
  --from-literal=DATABRICKS_CLIENT_SECRET=$SP_CLIENT_SECRET \
  -n mcp
 
helm upgrade --install databricks-mcp ./chart -f values.yaml -n mcp

The two things engineers get wrong here: forgetting to raise the ingress read/send timeouts (SSE connections stay open for minutes or longer), and running a single replica (an agent stampede on one pod surfaces as mysterious timeouts).

Example MCP Tool JSON Schemas (Unity Catalog, SQL, Jobs)

MCP tool schemas are just JSON Schema wrapped in the MCP tools/list response envelope. The important part is that names and descriptions are the actual signal the LLM uses to decide which tool to call - vague names produce vague tool selection.
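For orientation, this is roughly what that envelope looks like on the wire: a JSON-RPC 2.0 response whose result carries a `tools` array. The sketch builds one by hand with a deliberately tiny tool definition. The `tools_list_response` helper is hypothetical, and a real server would get this from its MCP SDK rather than assembling it manually.

```python
import json


def tools_list_response(request_id, tools):
    """Wrap tool definitions in the JSON-RPC envelope used for tools/list."""
    return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": tools}}


example_tool = {
    "name": "uc_list_schemas",
    "description": "List schemas in a Unity Catalog catalog.",
    "inputSchema": {
        "type": "object",
        "properties": {"catalog": {"type": "string"}},
        "required": ["catalog"],
    },
}

response = tools_list_response(1, [example_tool])
print(json.dumps(response, indent=2))
```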

Unity Catalog: list tables in a schema.

{
  "name": "uc_list_tables",
  "description": "List all tables in a Unity Catalog schema. Returns table names, table types (MANAGED, EXTERNAL, VIEW), and comments. Use this after uc_list_schemas to enumerate data assets before describing or querying them.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "catalog": {
        "type": "string",
        "description": "Unity Catalog catalog name (e.g. 'main', 'analytics_prod')."
      },
      "schema": {
        "type": "string",
        "description": "Schema name inside the catalog (e.g. 'sales', 'default')."
      },
      "include_columns": {
        "type": "boolean",
        "default": false,
        "description": "When true, include column names and types in the response. Use sparingly for large schemas."
      }
    },
    "required": ["catalog", "schema"],
    "additionalProperties": false
  }
}

Databricks SQL: execute a query against a SQL Warehouse.

{
  "name": "dbsql_execute",
  "description": "Execute a read-only SQL statement against the configured SQL Warehouse and return rows as JSON. Use fully-qualified names (catalog.schema.table). Statements that mutate data are rejected. Prefer LIMIT clauses for exploratory queries.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "statement": {
        "type": "string",
        "description": "A single SQL SELECT statement. No DDL/DML."
      },
      "warehouse_id": {
        "type": "string",
        "description": "Databricks SQL Warehouse ID. Defaults to DATABRICKS_SQL_WAREHOUSE_ID when omitted."
      },
      "row_limit": {
        "type": "integer",
        "default": 1000,
        "minimum": 1,
        "maximum": 100000,
        "description": "Maximum rows to return. The server truncates larger result sets."
      },
      "wait_timeout_seconds": {
        "type": "integer",
        "default": 30,
        "description": "How long to synchronously wait before returning a statement handle for async polling."
      }
    },
    "required": ["statement"],
    "additionalProperties": false
  }
}
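On the server side, the tool arguments have to be translated into a Statement Execution API request. The sketch below shows one possible mapping. The `build_statement_request` helper is hypothetical, and it assumes the API expects `wait_timeout` as a string such as "30s" within a 5 to 50 second window, so check the current API reference before relying on those bounds.

```python
def build_statement_request(args: dict, default_warehouse_id: str) -> dict:
    """Map dbsql_execute tool arguments onto a statement-execution payload."""
    # Clamp to the limits advertised in the tool schema.
    row_limit = min(max(int(args.get("row_limit", 1000)), 1), 100_000)
    # Assumed API window for synchronous waits: 5 to 50 seconds.
    wait_seconds = min(max(int(args.get("wait_timeout_seconds", 30)), 5), 50)
    return {
        "statement": args["statement"],
        "warehouse_id": args.get("warehouse_id") or default_warehouse_id,
        "row_limit": row_limit,
        "wait_timeout": f"{wait_seconds}s",
        # Keep the statement running and return a handle instead of cancelling.
        "on_wait_timeout": "CONTINUE",
    }


request = build_statement_request(
    {"statement": "SELECT 1", "wait_timeout_seconds": 120},
    default_warehouse_id="abc123def456",
)
```

Clamping on the server keeps the behavior predictable even when the model ignores the schema's minimum and maximum.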

Jobs: submit a one-shot job run.

{
  "name": "jobs_submit_run",
  "description": "Submit a one-shot Databricks Job run. Use this to trigger long-running notebook, JAR, Python, or SQL tasks. Returns a run_id immediately. Poll jobs_get_run for status - do not block the agent.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "run_name": { "type": "string", "description": "Human-readable run label." },
      "tasks": {
        "type": "array",
        "minItems": 1,
        "items": {
          "type": "object",
          "properties": {
            "task_key":       { "type": "string" },
            "notebook_task":  { "type": "object", "properties": {
              "notebook_path":    { "type": "string" },
              "base_parameters":  { "type": "object", "additionalProperties": { "type": "string" } }
            }, "required": ["notebook_path"] },
            "existing_cluster_id": { "type": "string" },
            "timeout_seconds":     { "type": "integer", "default": 3600 }
          },
          "required": ["task_key"]
        }
      },
      "idempotency_token": {
        "type": "string",
        "description": "Guarantees a single run for retries; reuse the same token to deduplicate."
      }
    },
    "required": ["run_name", "tasks"],
    "additionalProperties": false
  }
}

For long-running jobs, always pair jobs_submit_run with a companion jobs_get_run tool that returns the run's life_cycle_state and result_state. The agent's control loop looks like:

# Simplified agent-side polling loop for a long-running Databricks Job.
# call_tool and session_id are placeholders supplied by your agent framework.
import time
 
run = call_tool("jobs_submit_run", {
    "run_name": "nightly-etl-manual",
    "tasks": [{
        "task_key": "etl",
        "notebook_task": {"notebook_path": "/Repos/data/etl_main"},
        "existing_cluster_id": "0715-xxxx-yyyy"
    }],
    "idempotency_token": f"agent-{session_id}"
})
run_id = run["run_id"]
 
while True:
    status = call_tool("jobs_get_run", {"run_id": run_id})
    state = status["state"]["life_cycle_state"]
    if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(15)  # 15s polling is a reasonable default for ETL jobs
 
if status["state"].get("result_state") != "SUCCESS":
    raise RuntimeError(f"Job failed: {status['state']}")

Critically: do not run a while True polling loop inside a single tool call. That blocks the MCP session and burns tokens on the wait. Return the run_id immediately and let the agent's orchestration loop own the polling.
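On the orchestration side, it helps to bound the wait so that a stuck run cannot hang the agent indefinitely. The sketch below is one way to do that. `poll_run` and its parameters are hypothetical, and `get_run` stands in for whatever function invokes the jobs_get_run tool in your framework.

```python
import time

TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def poll_run(get_run, run_id, interval_s=15.0, max_wait_s=3600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll until the run reaches a terminal state or the deadline passes.

    get_run(run_id) must return a dict shaped like the jobs_get_run result.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + max_wait_s
    while True:
        status = get_run(run_id)
        if status["state"]["life_cycle_state"] in TERMINAL_STATES:
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"run {run_id} not finished after {max_wait_s}s")
        sleep(interval_s)
```

Each iteration is a short, separate tool call, so the MCP session is never held open while the job runs.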

If you are using Databricks managed MCP endpoints, the URL patterns are /api/2.0/mcp/sql, /api/2.0/mcp/vector-search/{catalog}/{schema}/{index_name}, /api/2.0/mcp/genie/{genie_space_id}, and /api/2.0/mcp/functions/{catalog}/{schema}/{function_name}. The tool surface is fixed by Databricks; your agent code stays identical.

Multi-Tenant OAuth Lifecycle for Databricks MCP

If you are a SaaS vendor exposing a Databricks-backed agent to your customers, you can't ship a single PAT. Each tenant connects their own Databricks workspace, and your MCP server needs to route the right OAuth token to the right upstream call.

The pieces:

App registration in the customer's account console (or automated via SCIM/service principal APIs).
Authorization code flow to get a refresh token, stored encrypted per tenant.
Token exchange at request time to mint short-lived access tokens.
Refresh well before expiry, not on 401. Databricks access tokens typically live for about one hour.

Values you'll store per tenant: workspace_url, warehouse_id, client_id, client_secret_encrypted, refresh_token_encrypted, access_token_encrypted, access_token_expires_at, scopes.

Sample multi-tenant token manager:

import time
import httpx
from dataclasses import dataclass
from typing import Optional
 
@dataclass
class TenantCreds:
    tenant_id: str
    workspace_url: str          # https://acme.cloud.databricks.com
    warehouse_id: str           # SQL Warehouse used for this tenant's queries
    client_id: str
    client_secret: str          # already decrypted in-memory
    refresh_token: Optional[str]
    access_token: Optional[str]
    expires_at: float           # unix seconds
 
class DatabricksTokenManager:
    REFRESH_SKEW_SECONDS = 120   # refresh 2 minutes before expiry
 
    def __init__(self, store):
        self.store = store       # encrypted per-tenant credential store
 
    async def get_access_token(self, tenant_id: str) -> str:
        creds = await self.store.load(tenant_id)
        if creds.access_token and creds.expires_at - time.time() > self.REFRESH_SKEW_SECONDS:
            return creds.access_token
        return await self._refresh(creds)
 
    async def _refresh(self, creds: TenantCreds) -> str:
        # Service principal (M2M) - client credentials grant
        # For U2M with a refresh token, swap grant_type + payload accordingly.
        token_url = f"{creds.workspace_url}/oidc/v1/token"
        auth = (creds.client_id, creds.client_secret)
        data = {"grant_type": "client_credentials", "scope": "all-apis"}
 
        async with httpx.AsyncClient(timeout=10) as client:
            resp = await client.post(token_url, data=data, auth=auth)
            resp.raise_for_status()
            body = resp.json()
 
        access = body["access_token"]
        expires_at = time.time() + int(body.get("expires_in", 3600))
        await self.store.update_tokens(
            creds.tenant_id, access_token=access, expires_at=expires_at
        )
        return access
 
    async def on_401(self, tenant_id: str) -> str:
        """Called when an upstream call returns 401 despite a fresh-looking token.
        Bypasses the cache check and forces a re-mint."""
        creds = await self.store.load(tenant_id)
        # Call _refresh directly: get_access_token would reload the stored
        # credentials and hand back the same cached token that was just rejected.
        return await self._refresh(creds)
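For the U2M case mentioned in the comment inside _refresh, the token request changes only in its form fields. The sketch below follows the standard OAuth 2.0 refresh grant. The `build_token_request` helper is hypothetical, and it assumes a confidential client that authenticates with HTTP Basic; a public client would typically send client_id in the body instead.

```python
from typing import Optional


def build_token_request(client_id: str, client_secret: str,
                        refresh_token: Optional[str]) -> dict:
    """Return the form data and Basic-auth pair for the /oidc/v1/token call."""
    if refresh_token:
        # U2M: exchange the stored refresh token for a new access token.
        data = {"grant_type": "refresh_token", "refresh_token": refresh_token}
    else:
        # M2M: service principal using the client credentials grant.
        data = {"grant_type": "client_credentials", "scope": "all-apis"}
    return {"data": data, "auth": (client_id, client_secret)}


u2m = build_token_request("client-id", "client-secret", "stored-refresh-token")
m2m = build_token_request("client-id", "client-secret", None)
```

If the token response includes a new refresh_token, persist it alongside the access token, since some authorization servers rotate refresh tokens on use.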

The MCP server itself pulls the tenant id from the incoming session (via a header or a subpath in the URL) and asks the token manager for a valid token per call:

async def call_databricks_sql(tenant_id: str, statement: str):
    token = await token_manager.get_access_token(tenant_id)
    creds = await store.load(tenant_id)
 
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(
            f"{creds.workspace_url}/api/2.0/sql/statements",
            headers={"Authorization": f"Bearer {token}"},
            json={"statement": statement, "warehouse_id": creds.warehouse_id},
        )
        if resp.status_code == 401:
            token = await token_manager.on_401(tenant_id)
            resp = await client.post(
                f"{creds.workspace_url}/api/2.0/sql/statements",
                headers={"Authorization": f"Bearer {token}"},
                json={"statement": statement, "warehouse_id": creds.warehouse_id},
            )
        resp.raise_for_status()
        return resp.json()

For customers who prefer not to expose refresh tokens to your service at all, Databricks provides a managed-OAuth path: external MCP servers use Unity Catalog connections with managed OAuth to securely handle authentication without exposing credentials to end users. Your product connects to the Databricks-managed proxy, and Databricks handles the token dance on the customer's behalf.

This is exactly the same lifecycle pattern Truto uses for its 100+ integrations - Truto refreshes credentials well before expiry, encrypts everything at rest, and re-encrypts under a new key on rotation without downtime. If you don't want to build a per-tenant Databricks OAuth manager from scratch, wiring Databricks as an integration behind a unified API is the shortest path to the same behavior.

Agent Integration Examples (AutoGen, CrewAI, LangGraph)

Once the MCP server is up and serving tools, plugging it into an agent framework is mostly a matter of transport plumbing. All three of the popular frameworks below have first-class MCP client support in 2026.

AutoGen (Microsoft). AutoGen agents consume MCP tools through the autogen_ext.tools.mcp adapters:

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.tools.mcp import SseServerParams, mcp_server_tools
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def build_databricks_agent():
    params = SseServerParams(
        url="https://mcp-databricks.internal.example.com/sse",
        headers={"X-Tenant-Id": "acme-prod"},   # your multi-tenant routing header
    )
    # mcp_server_tools lists the server's tools and returns one adapter
    # (an SseMcpToolAdapter for SSE servers) per tool.
    tools = await mcp_server_tools(params)
 
    return AssistantAgent(
        name="data_analyst",
        model_client=OpenAIChatCompletionClient(model="gpt-4.1"),
        tools=tools,
        system_message=(
            "You are a data analyst with access to Databricks Unity Catalog and "
            "SQL Warehouses via MCP tools. Always list schemas before querying, "
            "use fully-qualified table names, and add LIMIT to exploratory queries."
        ),
    )

CrewAI. CrewAI wraps MCP servers as MCPServerAdapter and hands the resulting tools to each Agent:

from crewai import Agent, Crew, Task
from crewai_tools import MCPServerAdapter
 
mcp_config = {
    "url": "https://mcp-databricks.internal.example.com/sse",
    "transport": "sse",
    "headers": {"X-Tenant-Id": "acme-prod"},
}
 
with MCPServerAdapter(mcp_config) as databricks_tools:
    analyst = Agent(
        role="Databricks Data Analyst",
        goal="Answer revenue questions grounded in the analytics_prod catalog.",
        backstory="You know Unity Catalog cold and prefer explicit joins over CTEs.",
        tools=databricks_tools,
        allow_delegation=False,
    )
 
    task = Task(
        description="What was total net revenue by product category last quarter?",
        expected_output="A markdown table with category, revenue, and QoQ delta.",
        agent=analyst,
    )
 
    result = Crew(agents=[analyst], tasks=[task]).kickoff()
    print(result)

LangGraph. LangGraph pairs cleanly with the official langchain-mcp-adapters package for turning MCP tool sets into LangChain tools:

import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

async def main():
    client = MultiServerMCPClient({
        "databricks": {
            "url": "https://mcp-databricks.internal.example.com/sse",
            "transport": "sse",
            "headers": {"X-Tenant-Id": "acme-prod"},
        },
        # Add more MCP servers here; the agent can pick tools across all of them.
    })

    tools = await client.get_tools()

    agent = create_react_agent(
        ChatOpenAI(model="gpt-4.1", temperature=0),
        tools,
        prompt=(
            "You are a Databricks data analyst. Use uc_list_* tools to discover "
            "schema before writing SQL. Never write UPDATE or DELETE."
        ),
    )

    result = await agent.ainvoke({
        "messages": [("user", "How many active users did we have last week?")],
    })
    print(result["messages"][-1].content)

# await only works inside a coroutine, so run the example through asyncio.
asyncio.run(main())

A few production notes that apply to all three:

Cap tool exposure. Load only the modules the agent actually needs (unity_catalog, sql, jobs) - large tool surfaces measurably degrade tool-selection quality.
Force fully-qualified names in the system prompt. LLMs love to drop catalog.schema prefixes and then hallucinate ambiguous table names.
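Wrap SQL execution with a read-only guard at the MCP layer, not just in the prompt. Prompt guards fail; a server-side check that rejects INSERT|UPDATE|DELETE|MERGE|DROP|TRUNCATE is far harder to talk around. Treat the regex as a coarse filter and back it with a service principal that holds only SELECT privileges.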
Log the tool call payloads with hashed queries so you can audit what the agent actually ran without leaking sensitive filters.
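The last two notes can be sketched together: a fail-closed read-only check plus a query fingerprint for audit logs. Everything here is an assumption for illustration (the function names, the keyword lists, the 16-character hash prefix), and a keyword filter is a coarse defense that complements warehouse-level permissions rather than replacing them.

```python
import hashlib
import re

# One pass over literals, quoted identifiers, and comments so that keywords
# hidden inside them neither trigger nor bypass the check.
_NOISE = re.compile(
    r"'(?:[^'\\]|\\.)*'"      # single-quoted string literal
    r'|"(?:[^"\\]|\\.)*"'     # double-quoted string literal
    r"|`[^`]*`"               # backtick-quoted identifier
    r"|--[^\n]*"              # line comment
    r"|/\*.*?\*/",            # block comment
    re.DOTALL,
)
_ALLOWED_START = re.compile(r"(SELECT|WITH|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
_FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|TRUNCATE|ALTER|CREATE|GRANT|REVOKE|COPY)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Fail closed: accept only a single statement that looks read-only."""
    cleaned = _NOISE.sub(" ", sql).strip().rstrip(";").strip()
    if not cleaned or ";" in cleaned:
        return False  # empty input or more than one statement
    if not _ALLOWED_START.match(cleaned):
        return False
    return _FORBIDDEN.search(cleaned) is None


def query_fingerprint(sql: str) -> str:
    """Short stable hash of a whitespace-normalized query, safe to log."""
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
```

The guard will occasionally reject a legitimate query, for example one with an unquoted column named `update`. That is the intended trade-off for a check that fails closed.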

That gives you an end-to-end pattern: a Databricks MCP server exposing Unity Catalog discovery, lineage analysis, and SQL execution, deployed with Docker or Helm, secured with per-tenant OAuth, and consumed by AutoGen, CrewAI, or LangGraph agents.

Next Steps: Turning API Docs into a Growth Lever

API documentation is a primary product surface. It requires the same rigorous product management, user testing, and iteration as your core application UI. If you are a PM or DevRel leader auditing your API examples this quarter, here is your prioritized list:

Measure your TTFC today: Pull the median time from signup to first successful authenticated API call. Sit down with a developer who has never seen your API. Watch where they get stuck. This is your baseline.
Audit your top 10 examples: Can each one be copy-pasted into a terminal, with only environment variables changed, and return real data? If not, fix those first.
Standardize rate-limit and error-response handling: Pick the IETF draft headers and commit to them across every endpoint. Provide the complete, copy-pasteable fetch wrapper that handles the backoff logic.
Separate auth setup from per-request examples: Create one canonical OAuth quickstart, then assume the token exists in everything else. Handle token refreshes server-side.
Reduce the integration surface: If you are maintaining 50 nearly identical tutorials, that's a structural problem. Move to a declarative, data-driven architecture that allows you to publish generic code examples that work universally across a category.
Test examples in CI: Run your published code against the real API on every release. AI assistants will amplify any drift.

Enterprise deals die when evaluating engineers cannot get your API to work in five minutes. By focusing relentlessly on runnable, implementation-focused code examples, you remove the friction that kills adoption and empower developers to champion your product.

FAQ

What is an MCP server for Databricks data access in 2026?: An MCP (Model Context Protocol) server for Databricks exposes Unity Catalog metadata, SQL Warehouse execution, and Jobs APIs as tools that AI agents can discover and call over a standard JSON-RPC/SSE transport. In 2026 there are three paths: Databricks managed MCP servers (hosted by Databricks, governed by Unity Catalog, with endpoints like /api/2.0/mcp/sql and /api/2.0/mcp/genie/{space_id}), custom servers hosted as Databricks Apps with built-in OAuth, and self-hosted open-source projects like databricks-mcp-server that you run in Docker or Kubernetes.
How do I deploy databricks-mcp-server with Docker or Helm?: For Docker, pull or build the image, then run it with DATABRICKS_HOST, DATABRICKS_CLIENT_ID/SECRET (or DATABRICKS_TOKEN for dev), DATABRICKS_SQL_WAREHOUSE_ID, and DATABRICKS_MCP_TOOLS_INCLUDE set as env vars - use stdio for local clients and --transport sse --port 8080 for remote agents. For Helm, mount OAuth credentials from a Kubernetes Secret, increase ingress proxy_read/send_timeout to accommodate long-lived SSE streams, and run at least three replicas behind an HPA. Load only the tool modules your agent needs (e.g. unity_catalog,sql,jobs) to keep tool-selection accuracy high.
How does multi-tenant OAuth work for a Databricks MCP server?: Each tenant registers a Databricks service principal (or does a user OAuth flow) in their own workspace, and your service stores the workspace URL, client ID, encrypted client secret, and refresh token per tenant. At request time, a token manager mints a short-lived access token via /oidc/v1/token, caches it, and refreshes 60-180 seconds before expiry. The MCP server pulls the tenant id from a routing header or subpath and asks the token manager for a valid token before calling Databricks. If you'd rather not hold refresh tokens directly, Databricks-managed external MCP connections use Unity Catalog managed OAuth so the platform handles the exchange for you.
Which MCP tools should a Databricks agent expose for SQL, Unity Catalog, and Jobs?: At minimum: uc_list_catalogs, uc_list_schemas, uc_list_tables, and uc_describe_table for discovery; dbsql_execute for SQL against a Warehouse (with row_limit and wait_timeout_seconds parameters); jobs_submit_run plus a jobs_get_run polling companion for long-running notebooks and pipelines. Keep tool descriptions specific - vague names produce vague selections. For long-running jobs, always return run_id immediately and let the agent orchestrator poll; never block a single tool call while a job runs.
How do I connect AutoGen, CrewAI, or LangGraph to a Databricks MCP server?: All three frameworks have first-class MCP client support. AutoGen loads tools through autogen_ext.tools.mcp (mcp_server_tools with SseServerParams, which yields SseMcpToolAdapter instances), CrewAI wraps a server via crewai_tools.MCPServerAdapter, and LangGraph uses langchain-mcp-adapters' MultiServerMCPClient. Point each at your MCP server's SSE URL, pass an X-Tenant-Id header for multi-tenant routing, and hand the returned tools to your Agent/create_react_agent. Cap the tool surface, force fully-qualified table names in the system prompt, and enforce a read-only regex on SQL at the server layer rather than trusting the model.

Updates

Jul 4, 2026 Added a major new section on deploying and configuring MCP servers for Databricks data access in 2026, covering prerequisites and architecture, Docker/Helm deployment of databricks-mcp-server, example JSON tool schemas for Unity Catalog/SQL/Jobs, multi-tenant OAuth lifecycle with sample code, and agent integration patterns for AutoGen, CrewAI, and LangGraph.
Jun 15, 2026 Added new section 'Building HIPAA-Compliant Healthcare API Integrations' with five subsections covering stateless proxy patterns, AES-256-GCM token encryption, metadata-only audit logging with payload hashing, idempotent FHIR write pipelines without PHI persistence, and SMART on FHIR Backend Services proxy authentication with code examples and a sequence diagram.