
How to Connect AI Agents to Read and Write Data in Salesforce and HubSpot

Learn how to give AI agents read/write access to Salesforce and HubSpot. Compare custom LangChain tools, vendor MCP servers, and unified APIs for production CRM integration.

Nachi Raman · 13 min read

To connect an AI agent to Salesforce and HubSpot so it can read and write CRM data, do not hand the model two raw vendor APIs and hope prompt engineering saves you. Put a unified execution layer between the agent and the CRMs. Expose a small set of business-level tools — find_contact, upsert_contact, list_open_deals, create_note — and let that layer absorb SOQL, HubSpot filterGroups, OAuth refresh, pagination, retries, and field mapping. For a one-off prototype against a single CRM, custom tools are fine. For a customer-facing SaaS product, direct connectors turn into permanent connector tax.

The pressure to do this is real, not theoretical. 88% of B2B organizations are adopting or planning to adopt AI agents, according to Forrester's State of Customer Obsession Survey, 2025. But that demand is colliding with a painful reality: over 40% of agentic AI projects will be canceled by the end of 2027, due to escalating costs, unclear business value, or inadequate risk controls, according to Gartner. A huge chunk of those costs comes from integration work — not the LLM reasoning, not the prompt engineering, but the plumbing between your agent and the systems it needs to act on. The same firm predicts that by 2029, agentic AI will autonomously resolve 80% of common customer service issues without human intervention, leading to a 30% reduction in operational costs.

If your agent can't reliably pull a deal from Salesforce or create a contact in HubSpot, it's a chatbot with extra steps. Read-only assistants are useful. Read/write assistants are where the business case gets serious.

Warning

If an agent can write into CRM, treat it like production infrastructure from day one. Narrow tool scopes, account-scoped auth, idempotency keys, audit logs, and approval gates for risky actions are not optional.

Why Salesforce and HubSpot APIs Break AI Agents

The fundamental issue is that Salesforce and HubSpot model the same business objects in completely incompatible ways. Building a single AI tool that works across both requires handling every divergence at the schema, query, authentication, and write-semantics layer.

Read Operations: Two Completely Different Query Languages

When you ask an AI agent to "find all contacts at Acme Corp updated in the last 30 days," the API calls required are wildly different.

Salesforce requires constructing SOQL (Salesforce Object Query Language) with exact schema knowledge of the customer's highly customized instance:

SELECT Id, FirstName, LastName, Email 
FROM Contact 
WHERE Account.Name LIKE '%Acme%' 
AND LastModifiedDate >= LAST_N_DAYS:30

Field names are flat PascalCase (FirstName, LastModifiedDate). Custom fields use the __c suffix. Relational traversals use dot-notation syntax that LLMs frequently hallucinate. Phone numbers are spread across six separate fields: Phone, Fax, MobilePhone, HomePhone, OtherPhone, AssistantPhone. The query layer supports relationship queries and nested parent-child traversals up to five levels deep — powerful, but it couples your agent to Salesforce syntax unless you isolate it below the tool boundary.
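One way to keep that coupling below the tool boundary is to build the SOQL string from structured arguments instead of asking the model to write it — a minimal sketch (the object and field names match the standard Contact schema shown above; a real tool should also validate fields against the org's describe metadata):

```python
def build_contact_soql(account_name: str, modified_within_days: int) -> str:
    """Build a SOQL query for contacts at a matching account.

    Escapes backslashes and single quotes so agent-supplied text
    cannot break out of the string literal (SOQL injection).
    """
    safe_name = account_name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT Id, FirstName, LastName, Email "
        "FROM Contact "
        f"WHERE Account.Name LIKE '%{safe_name}%' "
        f"AND LastModifiedDate >= LAST_N_DAYS:{int(modified_within_days)}"
    )
```

The agent only ever supplies `account_name` and a day count; the syntax, escaping, and field names stay in deterministic code.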

HubSpot uses a completely different paradigm — a JSON payload with filterGroups:

{
  "filterGroups": [
    {
      "filters": [
        {
          "propertyName": "company",
          "operator": "CONTAINS_TOKEN",
          "value": "Acme"
        },
        {
          "propertyName": "lastmodifieddate",
          "operator": "GTE",
          "value": "1710000000000"
        }
      ]
    }
  ]
}

Data lives inside a nested properties object. Multiple filters within a group use AND logic; multiple groups use OR logic. Additional emails are stored as a semicolon-delimited string in hs_additional_emails. The search endpoint has hard caps that matter for agent design: five requests per second per account, 200 objects per page, and 10,000 total results per query.
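The same "contacts at Acme updated recently" intent, expressed for HubSpot, means constructing the nested payload programmatically — a sketch using the default contact properties from the example above (note that `lastmodifieddate` is compared as epoch milliseconds):

```python
from datetime import datetime, timedelta, timezone

def build_hubspot_search(company_token: str, modified_within_days: int) -> dict:
    """Build a HubSpot CRM search payload.

    Filters inside a single filterGroup are ANDed together;
    lastmodifieddate values are epoch milliseconds.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=modified_within_days)
    return {
        "filterGroups": [{
            "filters": [
                {"propertyName": "company", "operator": "CONTAINS_TOKEN",
                 "value": company_token},
                {"propertyName": "lastmodifieddate", "operator": "GTE",
                 "value": str(int(cutoff.timestamp() * 1000))},
            ]
        }],
        "limit": 200,  # the search endpoint caps pages at 200 objects
    }
```

Two builders for one intent is exactly the divergence the rest of this article is about.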

Here is the divergence side by side:

| Aspect | Salesforce | HubSpot |
| --- | --- | --- |
| Name fields | FirstName, LastName | properties.firstname, properties.lastname |
| Email | Single Email field | properties.email + hs_additional_emails (semicolon-delimited) |
| Phone numbers | 6 separate fields | 3 fields, including hs_whatsapp_phone_number |
| Filtering | SOQL WHERE clause | filterGroups JSON arrays |
| Custom fields | __c suffix convention | Keys outside the default property set |
| Pagination | Cursor-based (own format) | Cursor-based (after parameter) |

Write Operations: Same Intent, Different Contracts

Writes diverge just as sharply. Salesforce creates a record with a POST to /services/data/{apiVersion}/sobjects/{ObjectName} and updates with a PATCH to the same path plus the record ID. Field names in the request body must exactly match the object's API field names.

HubSpot expects a properties object and can include an associations array in the same request. Updates can target a contact by record ID or by email, but vendor-specific rules hide in the fine print: lifecycle stage can only move forward, batch upsert supports email or a custom unique identifier, and partial upserts are not supported when email is the identifier.

There is also a gotcha that explains a lot of duplicate-contact bugs in naive agent loops: HubSpot's documentation explicitly warns that newly created or updated CRM objects may take a few moments to appear in search results. An agent that creates a contact and immediately searches for it to verify will easily create duplicates.
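The defense is to make the write path idempotent rather than verifying creates via search. A sketch of the idea, using a local idempotency cache keyed on email — `create_fn` is a stand-in for your real HubSpot client call, and a production version would use a durable store instead of a dict:

```python
import hashlib

# Idempotency cache: key -> created record id.
# Illustrative only; use a durable store (e.g. a database table) in production.
_created: dict[str, str] = {}

def idempotent_create_contact(email: str, properties: dict, create_fn) -> str:
    """Create a contact at most once per email, even across retries.

    Never verify a create by searching for it: newly written records
    may lag in HubSpot search results, which is how duplicates happen.
    """
    key = hashlib.sha256(email.lower().encode()).hexdigest()
    if key in _created:
        return _created[key]
    record_id = create_fn({"email": email, **properties})
    _created[key] = record_id
    return record_id
```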

The Event-Driven Layer

Read and write operations are only half the story. If your agent needs to react to a deal closing in real time, Salesforce relies on Outbound Messages or Change Data Capture (CDC) streams, while HubSpot relies on standard webhooks. Normalizing these event streams so your agent can react uniformly is an entirely separate engineering challenge.
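A sketch of what that normalization can look like: a HubSpot webhook notification and a Salesforce CDC event mapped into one internal event shape (the unified field names are illustrative; the provider payload shapes follow each vendor's documented event structure):

```python
def normalize_event(provider: str, payload: dict) -> dict:
    """Map provider-specific change events to one internal shape."""
    if provider == "hubspot":
        # HubSpot webhooks carry an objectId and a subscriptionType
        # like "contact.creation" or "deal.propertyChange".
        object_type, _, action = payload["subscriptionType"].partition(".")
        return {
            "object_type": object_type,
            "action": action,
            "record_id": str(payload["objectId"]),
        }
    if provider == "salesforce":
        # CDC events carry a ChangeEventHeader with entityName,
        # changeType (CREATE/UPDATE/DELETE), and affected recordIds.
        header = payload["ChangeEventHeader"]
        return {
            "object_type": header["entityName"].lower(),
            "action": header["changeType"].lower(),
            "record_id": header["recordIds"][0],
        }
    raise ValueError(f"unknown provider: {provider}")
```

Downstream, the agent reacts to one event schema regardless of which CRM emitted it.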

Expecting a generalized LLM to reliably switch context between constructing SOQL strings for one customer and nested filterGroups arrays for another is a recipe for token bloat, high latency, and constant tool-call failures.

The Current Solutions: LangChain Tools, MCP Servers, and Zapier

Engineering teams typically attempt to solve this with one of three approaches. All three have serious limitations for production B2B SaaS.

Custom LangChain Tools

The langchain-salesforce package integrates LangChain with Salesforce, letting you query data, manage records, and explore object schemas. Register it as an agent tool and you get a working demo fast.

The problem? You now need to build an equivalent tool for HubSpot. The two tools expose completely different interfaces to the LLM. Your agent needs to know which CRM a given customer uses before it can even construct the right tool call. An AI agent is inherently stateless, but external SaaS APIs are highly stateful — your application must sit in the middle managing OAuth tokens, refresh cycles, and pagination state. If a customer has 50,000 contacts, the LLM cannot ingest them all at once; you need a pagination state machine inside your tool logic.

Scaling this to 10 or 20 CRMs — because your customers don't all use Salesforce — means 10 or 20 separate tool implementations, each with its own auth, pagination, and error-handling quirks. The integration complexity is easy to underestimate: connecting reliably to Salesforce, Gong, BigQuery, and other systems takes substantial engineering effort for authentication, rate limiting, error handling, and data consistency management.
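That pagination state machine belongs below the tool boundary, never in the LLM's context — a sketch of a cursor-driven generator (`fetch_page` is a stand-in for the provider call; the page shape assumes a `result`/`next_cursor` envelope):

```python
from typing import Callable, Iterator, Optional

def paginate(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield records across all pages; the cursor never reaches the LLM.

    fetch_page(cursor) must return {"result": [...], "next_cursor": ...},
    with next_cursor None (or absent) on the final page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["result"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

A tool can then summarize or filter the stream and return only what the model needs — never 50,000 raw contacts.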

Warning

The Maintenance Trap: Third-party APIs change constantly. If you hardcode API logic into your LangChain tools, your AI engineers will spend 80% of their time maintaining brittle API connectors instead of improving the agent's reasoning capabilities.

Vendor MCP Servers

The Model Context Protocol (MCP) is an emerging open standard that lets AI models interact securely with external systems. HubSpot was the first major CRM to ship a production-grade MCP integration — a remote MCP server at mcp.hubspot.com that requires OAuth with PKCE authentication. However, it currently supports only read-only access to CRM objects (contacts, companies, deals, and tickets) and does not allow access to custom Sensitive Data Properties.

Salesforce announced Salesforce Hosted MCP Servers, currently in beta with general availability targeted for February 2026. But Salesforce's adoption of MCP, while promising, remains tightly scoped and proprietary in its implementation. Agentforce shows what's possible when AI agents operate across tools, but the broader Salesforce platform remains closed to external builders. Since OpenAI released the Apps SDK, developers have been building independent MCP servers that expose Salesforce data to frontier models — with or without official blessing. These DIY connectors bypass Agentforce, bypass the Trust Layer, and bypass Salesforce's consumption metering.

So you are choosing between an official MCP server that is locked to a specific ecosystem, or a DIY connector that bypasses security controls. As we explore in our build vs. buy analysis for MCP, neither gives you a portable read/write abstraction across both CRMs.

Zapier and Workflow Automation

Some teams try to bypass custom code by routing the LLM's output to a Zapier webhook. But Zapier is designed for linear trigger-action workflows, not the bi-directional, synchronous data retrieval an LLM needs to think, observe, and act — and one MCP tool call consumes 2 tasks from Zapier's plan quota. You have zero direct control over the API schemas, making error handling and retry logic nearly impossible to manage programmatically. When your agent needs to read a contact, check associated deals, then update a field based on reasoning over that data, Zapier's per-action pricing and stateless execution model become both expensive and fragile.

The Fundamental Problem

All three approaches share the same structural flaw: they couple your AI agent's reasoning logic to a specific CRM's data model. Every time you add a new CRM, you are adding new tool definitions, new schemas for the LLM to learn, and new edge cases to test.

flowchart TD
    A[AI Agent] --> B{Which CRM?}
    B -->|Salesforce| C[SOQL Queries<br>PascalCase Fields<br>OAuth + Security Token]
    B -->|HubSpot| D[filterGroups Arrays<br>Nested Properties<br>OAuth + PKCE]
    B -->|Pipedrive| E[REST + API Key<br>Different Schema<br>Different Pagination]
    B -->|Zoho| F[Yet Another Schema<br>Yet Another Auth<br>Yet Another Pagination]
    
    style A fill:#e8f4f8,stroke:#2196F3
    style B fill:#fff3e0,stroke:#FF9800
    style C fill:#fce4ec,stroke:#f44336
    style D fill:#fce4ec,stroke:#f44336
    style E fill:#fce4ec,stroke:#f44336
    style F fill:#fce4ec,stroke:#f44336

This is the integration bottleneck that kills most agentic AI projects. The LLM reasoning is the easy part. The plumbing is where teams burn months.

Architecting a Unified AI Tool for Salesforce and HubSpot

The architectural fix is to decouple the LLM's tool interface from each CRM's native API. Instead of teaching your agent to speak SOQL and HubSpot filterGroups, you expose a single, normalized tool interface and handle the translation at a separate layer.

Your agent sees one tool: search_crm_contacts(email="john@example.com"). The translation layer figures out that for a Salesforce account, it needs to build a SOQL WHERE clause, and for a HubSpot account, it needs to construct a filterGroups array with a CONTAINS_TOKEN operator. The agent never needs to know.

Design Principles

  • One tool definition per operation, regardless of how many CRMs you support
  • Schema normalization happens at the translation layer, not in the LLM's context
  • Auth, pagination, and rate limiting are handled below the tool interface
  • The LLM context stays clean — no CRM-specific field names leaking into prompts
  • Preserve raw vendor payloads as an escape hatch for custom fields and edge cases

This is the same pattern behind ORMs and API gateways. The difference is that for AI agents, getting this wrong doesn't just mean ugly code — it means hallucinated field names, failed writes, and agents that confidently report success on operations that silently failed.

The Execution Pipeline

When a prompt instructs the agent to search for a contact, the agent generates a generic request. A middleware layer intercepts this and resolves the configuration for the specific integrated account.

sequenceDiagram
    participant LLM as AI Agent<br>(LangGraph/MCP)
    participant Middleware as Unified API<br>Middleware
    participant Config as Configuration<br>Store
    participant CRM as Salesforce /<br>HubSpot

    LLM->>Middleware: GET /unified/crm/contacts?email=john@example.com
    Middleware->>Config: Fetch Integration Mapping
    Config-->>Middleware: Return Provider Mapping
    Note over Middleware: Translate to Native Format
    Middleware->>CRM: Execute Native Request<br>(SOQL or filterGroups)
    CRM-->>Middleware: Return Native Response<br>(PascalCase or Nested)
    Note over Middleware: Normalize Response
    Middleware-->>LLM: Return Standardized JSON

  1. Resolve Configuration: Look up the integration configuration — base URL, auth scheme, and declarative mapping expressions for the specific CRM.
  2. Transform the Request: A generic engine evaluates mapping expressions against the unified request. For Salesforce, it outputs a SOQL WHERE clause. For HubSpot, it outputs a filterGroups array.
  3. Execute the API Call: The proxy layer handles the HTTP request, injecting the correct OAuth bearer token and managing rate limit headers.
  4. Transform the Response: The vendor response is mapped back into the unified schema. The LLM receives a clean, standardized JSON array, completely unaware of whether the data came from Salesforce or HubSpot.

The critical insight: branching belongs in declarative mapping configurations, not in your product code and definitely not in your prompt. Every CRM difference is handled by data, not conditional branches.
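A toy version of that idea — the provider differences live in a mapping data structure, and one generic function walks it (the mapping format here is deliberately simpler than real JSONata expressions; field paths are illustrative):

```python
# Per-provider mapping config: unified field -> native field path.
MAPPINGS = {
    "salesforce": {
        "first_name": "FirstName",
        "last_name": "LastName",
        "email": "Email",
    },
    "hubspot": {
        "first_name": "properties.firstname",
        "last_name": "properties.lastname",
        "email": "properties.email",
    },
}

def normalize(provider: str, record: dict) -> dict:
    """Map a native CRM record into the unified schema using config only.

    No if-salesforce/if-hubspot branches: adding a CRM means adding
    a mapping entry, not new code.
    """
    out = {}
    for unified_field, path in MAPPINGS[provider].items():
        value = record
        for part in path.split("."):  # walk dotted paths like properties.email
            value = value.get(part, {}) if isinstance(value, dict) else None
        out[unified_field] = None if value == {} else value
    return out
```

Both vendor payloads collapse to the same unified record, which is all the LLM ever sees.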

Safety for Write Paths

For writes, the bar is higher than reads. Do not expose a generic execute_soql or write_properties tool to an autonomous production agent. Give it narrow, business-scoped tools like upsert_contact or create_note_for_contact. A good production setup also needs:

  • Idempotency keys to prevent duplicate creates from retries
  • Retry with backoff and vendor-specific conflict handling
  • Approval gates for destructive or high-risk actions (ownership reassignment, deal deletion)
  • Audit logs capturing every tool call with input, connected account, model reasoning, and vendor response

Models are much better at choosing business intents than they are at constructing safe vendor-specific payloads. Keep the dangerous details below the tool boundary.
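A sketch of the approval gate: the agent selects a business intent, and anything on a high-risk list is parked for a human instead of executed (the intent names and the `execute_fn`/`queue_for_approval_fn` callables are illustrative stand-ins for your real execution and review plumbing):

```python
HIGH_RISK_INTENTS = {"delete_deal", "reassign_owner"}

def execute_intent(intent: str, args: dict, execute_fn, queue_for_approval_fn) -> dict:
    """Run safe intents directly; park risky ones for human approval.

    Every call should also be written to an audit log with the input,
    connected account, and vendor response (omitted here for brevity).
    """
    if intent in HIGH_RISK_INTENTS:
        queue_for_approval_fn(intent, args)
        return {"status": "pending_approval", "intent": intent}
    return {"status": "done", "result": execute_fn(intent, args)}
```

The model chooses `upsert_contact` or `delete_deal`; deterministic code decides whether that choice executes immediately.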

Using Truto to Give AI Agents Instant Read/Write CRM Access

Building this unified execution layer from scratch requires a dedicated integration engineering team. Truto implements the architecture described above — every integration is defined as configuration: JSONata mapping expressions, declarative auth and pagination definitions, and zero integration-specific application code. The same generic pipeline handles Salesforce's SOQL, HubSpot's filterGroups, and every other CRM's native quirks.

One Unified Request, Every CRM

Your AI agent makes a single API call:

curl -X GET "https://api.truto.one/unified/crm/contacts?email_addresses=john@example.com" \
  -H "Authorization: Bearer YOUR_TRUTO_TOKEN" \
  -H "X-Integrated-Account-ID: customer_abc123"

The response is normalized regardless of which CRM backs customer_abc123:

{
  "result": [{
    "id": "123",
    "first_name": "John",
    "last_name": "Doe",
    "email_addresses": [{"email": "john@example.com", "is_primary": true}],
    "phone_numbers": [{"number": "+1-555-0123", "type": "phone"}],
    "account": {"id": "456"},
    "custom_fields": {},
    "remote_data": {}
  }],
  "next_cursor": "cD0yMDI0LTAxLTE1VDEwOjMwOjAwWg=="
}

Under the hood, if the account is Salesforce, Truto built a SOQL query with WHERE Email = 'john@example.com' and mapped FirstName to first_name. If it's HubSpot, it constructed a filterGroups search with CONTAINS_TOKEN and extracted values from nested properties. The calling code — and by extension, your AI agent — never needed to know.

That remote_data field matters. A serious unified layer normalizes the common case and still preserves source payloads for the day an enterprise customer asks about a custom Salesforce field or a HubSpot association label you didn't model on day one. The right design is normalized by default, source-specific when needed.

Registering as a LangChain Tool

Wiring this into your agent takes a few lines:

from langchain_core.tools import tool
import httpx
 
TRUTO_BASE = "https://api.truto.one"
TRUTO_TOKEN = "your-truto-token"
 
@tool
def search_crm_contacts(
    integrated_account_id: str,
    email: str | None = None,
    name: str | None = None
) -> dict:
    """Search CRM contacts by email or name. Works across any connected CRM."""
    params = {"integrated_account_id": integrated_account_id}
    if email:
        params["email_addresses"] = email
    if name:
        params["name"] = name
    
    resp = httpx.get(
        f"{TRUTO_BASE}/unified/crm/contacts",
        params=params,
        headers={"Authorization": f"Bearer {TRUTO_TOKEN}"}
    )
    resp.raise_for_status()  # surface auth and rate-limit errors instead of returning them as data
    return resp.json()

One tool. Every CRM. The LLM sees a clean, predictable schema, and Truto handles OAuth token refreshes, cursor-based pagination, rate limit backoff, and the full translation between unified and native formats.

MCP Server Access

If you're building with MCP, Truto provides managed MCP servers out of the box, giving your agent access to 100+ SaaS integrations through a single server configuration. Your LangChain agent can load tools directly:

from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain.agents import create_agent
 
async def build_agent():
    client = MultiServerMCPClient({
        "crm": {
            "transport": "http",
            "url": "https://your-mcp-endpoint.example.com/mcp",
            "headers": {"Authorization": "Bearer <token>"}
        }
    })
    tools = await client.get_tools()
    return create_agent("openai:gpt-4.1", tools)

Every API connected through Truto instantly becomes an AI-ready tool definition complete with descriptions and schemas that MCP clients and LangChain agents can consume directly.

Truto handles the full operational burden:

  • Managed Authentication: OAuth token lifecycles, refresh logic, and secure credential storage.
  • Declarative Pagination: Whether the provider uses cursor-based, offset-limit, or link headers, Truto normalizes it into a standard next_cursor format.
  • Error Normalization: Provider-specific errors mapped to standard HTTP status codes for reliable retry logic.

Warning

Honest trade-off: A unified API adds a hop between your agent and the CRM. You're trading direct API access for normalized schemas and reduced maintenance. For most B2B SaaS products this is the right call, but if you need sub-50ms latency on individual CRM calls or deep access to vendor-specific features like Salesforce Flows or HubSpot Workflows, go direct for those specific operations and use the unified layer for standard CRUD.

Choosing the Right Approach

| Factor | Custom LangChain Tools | Vendor MCP Servers | Unified API (Truto) |
| --- | --- | --- | --- |
| Time to first integration | 2-4 weeks per CRM | Days (if available) | Hours |
| CRM coverage | One at a time | One per vendor | 20+ CRMs, single interface |
| Write support | Full (you build it) | Limited (HubSpot: read-only) | Full read/write |
| Auth management | You own it | Vendor-specific | Managed OAuth lifecycle |
| LLM context overhead | High (CRM-specific schemas) | Medium | Low (normalized schema) |
| Maintenance burden | High (per CRM) | Low (vendor maintains) | Low (Truto maintains) |

The right answer depends on your constraints:

  • One internal assistant, one CRM, read-only — a native MCP server or hand-rolled tool is enough.
  • Customer-facing product, multi-tenant auth, read/write workflows — put a unified execution layer under the agent now.
  • Roadmap includes more CRMs later — start with a configuration-driven mapping layer, or you will rebuild the same connector logic repeatedly.

What to Do Next

If you're an engineering leader evaluating how to give your AI agents CRM access:

  1. Audit your customer base. How many different CRMs do your customers actually use? If it's more than two, per-CRM tooling won't scale.
  2. Separate reasoning from plumbing. Your LLM orchestration layer should never contain CRM-specific logic. That is a design smell.
  3. Start with reads, then writes. Get your agent reliably reading contacts and deals before letting it create or update records. Validate against real CRM data — sandbox environments don't surface the edge cases.
  4. Test with real customer accounts. Every CRM instance is customized. Custom fields, required fields, validation rules — these vary wildly between customers even on the same platform.
  5. Evaluate the unified API path. If your goal is shipping AI features, not becoming a CRM integrations company, a managed layer like Truto lets your team focus on the agent logic that actually differentiates your product.

The mistake is not choosing the wrong framework. LangChain, LangGraph, MCP, and direct HTTP tools can all work. The mistake is letting your agent couple itself to vendor syntax and vendor payloads. That feels fast for a sprint and terrible six months later.

FAQ

Can AI agents write data back to Salesforce and HubSpot?
Yes. LangChain's Salesforce tool supports full CRUD operations via SOQL. HubSpot's official MCP server currently supports read-only access, so writes require the REST API or a unified API layer like Truto that handles both reads and writes across CRMs.
Does Salesforce have an MCP server for external AI agents?
Salesforce launched Hosted MCP Servers in beta (October 2025) with general availability targeted for February 2026. However, the implementation remains tightly scoped to the Agentforce ecosystem, limiting access for external AI agent frameworks. External agents must use custom middleware or a unified API.
How do I handle different CRM schemas in one AI agent?
Use a schema normalization layer that maps each CRM's native fields to a unified data model. Your agent calls one tool like search_contacts(email=X), and the translation layer handles SOQL for Salesforce or filterGroups for HubSpot automatically.
How do I avoid duplicate contacts when an agent writes to HubSpot?
Use stable identifiers and upsert operations where possible. HubSpot documents that new or updated records may take time to appear in search results, and its batch upsert supports email or a custom unique identifier. Add idempotency keys to your write tools to prevent duplicates from retries.
Why do agentic AI projects fail at CRM integration?
Gartner predicts over 40% of agentic AI projects will be canceled by 2027. Integration complexity — authentication, rate limits, schema differences, pagination, write semantics — consumes engineering time that should go toward agent reasoning and business logic.
