
Connect AssemblyAI to Claude: Process Speech and Generate Subtitles

Learn how to connect AssemblyAI to Claude using a managed MCP server. Automate asynchronous transcription, subtitle generation, and speech analysis.

Uday Gajavalli · 9 min read

If you need to connect AssemblyAI to Claude to process raw audio, generate subtitles, or run speech intelligence tasks, you need a Model Context Protocol (MCP) server. This server acts as the translation layer between Claude's tool calling capabilities and AssemblyAI's REST APIs. You can either build, host, and maintain this infrastructure yourself, or use a managed integration platform like Truto to dynamically generate a secure, authenticated MCP server URL. If your team uses ChatGPT, check out our guide on connecting AssemblyAI to ChatGPT or explore our broader architectural overview on connecting AssemblyAI to AI Agents.

Giving a Large Language Model (LLM) the ability to orchestrate complex speech-to-text pipelines is an engineering challenge. You are not just dealing with standard JSON payloads - you have to handle binary file uploads, long-running asynchronous polling loops, and strict content schema requirements. Every time AssemblyAI updates a model version or introduces a new intelligence endpoint, you have to update your server code, redeploy, and test the integration.

This guide breaks down exactly how to use Truto to generate a secure, managed MCP server for AssemblyAI, connect it natively to Claude, and execute sophisticated speech processing workflows using natural language.

The Engineering Reality of the AssemblyAI API

A custom MCP server is a self-hosted integration layer. The open MCP standard gives models a predictable way to discover tools over JSON-RPC, but implementing it against AssemblyAI's API means handling several architectural constraints that are specific to speech processing.

If you decide to build a custom MCP server for AssemblyAI, you own the entire API lifecycle. Here are the specific challenges you will face:

Asynchronous Processing and LLM Timeouts

Claude expects synchronous tool execution. When it calls a tool, it expects an immediate response. However, AssemblyAI's transcription architecture is inherently asynchronous. Submitting an audio file returns an immediate 200 OK with a status: "queued" and a job ID. The actual processing might take seconds or minutes. If you expose raw endpoints to Claude, the model will assume the transcription failed when it doesn't get the text back immediately. You must expose separate tools for initiation and polling, and strictly prompt the model to wait and retrieve the final payload.
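The initiate-then-poll pattern can be sketched in a few lines of Python. This is a minimal illustration, not Truto's or AssemblyAI's implementation: `fetch` is a hypothetical callable standing in for whatever retrieves the transcript object, and the interval and timeout values are arbitrary assumptions.

```python
import time
from typing import Callable, Dict

# Statuses after which there is nothing left to wait for.
TERMINAL_STATUSES = {"completed", "error"}


def poll_transcript(
    fetch: Callable[[str], Dict],
    transcript_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch(transcript_id) until the job reaches a terminal status.

    `fetch` is a hypothetical stand-in for the polling tool; it must return
    a dict that contains at least a "status" key.
    """
    waited = 0.0
    while True:
        transcript = fetch(transcript_id)
        if transcript.get("status") in TERMINAL_STATUSES:
            return transcript
        if waited >= timeout_s:
            raise TimeoutError(
                f"transcript {transcript_id} still {transcript.get('status')!r}"
            )
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop easy to test and lets an agent runtime substitute its own scheduler instead of blocking a thread.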

Multi-Step Binary Uploads

AssemblyAI allows transcription via public URLs, but if you have local or protected files, you must first use the upload endpoint to post raw binary data. Sending raw binary through an LLM tool call is not natively supported. A managed MCP server abstracts this, allowing the LLM to trigger a file pipeline where the server handles the multipart or binary transfer stream, returning a secure media URL to the LLM for the subsequent transcription request.
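For reference, the two-step flow that the server has to perform looks roughly like this when written directly against AssemblyAI. The sketch only builds the HTTP requests and does not send them. The `/v2/upload` and `/v2/transcript` paths and the bare-key `Authorization` header reflect AssemblyAI's public REST API as commonly documented; treat them as assumptions and verify them against the current API reference.

```python
import json
import urllib.request

# Assumed base URL for AssemblyAI's REST API; confirm in the official docs.
API_BASE = "https://api.assemblyai.com/v2"


def build_upload_request(api_key: str, audio_bytes: bytes) -> urllib.request.Request:
    """Step 1: POST the raw bytes. The JSON response carries an upload_url."""
    return urllib.request.Request(
        f"{API_BASE}/upload",
        data=audio_bytes,
        headers={
            "Authorization": api_key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Step 2: submit the URL from step 1 (or any public URL) for transcription."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Each request can be sent with `urllib.request.urlopen(request)`. The second call should use the `upload_url` value parsed from the first response.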

Rate Limits and Concurrency Ceilings

AssemblyAI enforces strict concurrency limits on transcription jobs. If an over-eager AI agent decides to batch-process 50 historical audio files at once, AssemblyAI will return an HTTP 429 Too Many Requests. Truto does not retry, throttle, or absorb these rate limits. Instead, Truto passes the 429 error directly back to Claude while normalizing the upstream rate limit information into standardized IETF headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset). The caller (your agent or Claude Desktop) is responsible for reading these headers and implementing its own retry and backoff logic.
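A caller-side retry loop might look like the sketch below. It assumes that ratelimit-reset carries the number of seconds until the window resets, and that `call` is a hypothetical function returning a `(status_code, headers, body)` tuple. Neither detail comes from Truto's documentation, so adapt both to your HTTP client.

```python
import time
from typing import Any, Callable, Mapping, Tuple


def wait_seconds(headers: Mapping[str, str], default: float = 5.0) -> float:
    """Read ratelimit-reset (assumed to be delta seconds), else use the default."""
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return max(0.0, float(lowered["ratelimit-reset"]))
    except (KeyError, ValueError):
        return default


def call_with_backoff(
    call: Callable[[], Tuple[int, Mapping[str, str], Any]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Retry `call` while it returns 429, waiting as the headers suggest."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = call()
        if status != 429:
            return status, body
        if attempt < max_attempts:
            sleep(wait_seconds(headers))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Header names are lowercased before lookup because HTTP header names are case-insensitive and clients differ in how they report them.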

Instead of building this infrastructure from scratch, you can use Truto. Truto derives tool definitions directly from AssemblyAI's documentation records, exposing normalized endpoints as ready-to-use MCP tools.

How to Generate an AssemblyAI MCP Server with Truto

Truto's MCP architecture does not rely on hand-coded tool definitions. Instead, it dynamically generates tools based on the API endpoints defined in the integration's configuration and documentation schema. If an endpoint is documented, it becomes an AI tool.

Each MCP server is scoped to a single integrated account (a connected AssemblyAI instance for a specific tenant). The resulting server URL contains a cryptographically hashed token that encodes the account, environment, and filtering rules. You can create this server through the Truto UI or via the API.

Method 1: Via the Truto UI

For internal workflows or one-off agent deployments, generating the server via the UI is the fastest path.

  1. Navigate to the integrated account page for your AssemblyAI connection.
  2. Click the MCP Servers tab.
  3. Click Create MCP Server.
  4. Select your desired configuration (name, allowed methods like read or write, tags, and expiration).
  5. Click Save and copy the generated MCP server URL.

Method 2: Via the Truto API

For production applications provisioning AI agents for end-users, you will generate these servers programmatically. Make a POST request to the integrated account endpoint with your configuration payload.

curl -X POST https://api.truto.one/integrated-account/{integrated_account_id}/mcp \
  -H "Authorization: Bearer YOUR_TRUTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AssemblyAI Transcript Processor",
    "config": {
      "methods": ["read", "write", "custom"],
      "tags": ["transcription", "intelligence"]
    },
    "expires_at": null
  }'

The API returns a fully qualified JSON-RPC 2.0 endpoint:

{
  "id": "mcp_token_abc123",
  "name": "AssemblyAI Transcript Processor",
  "url": "https://api.truto.one/mcp/a1b2c3d4e5f6..."
}

This URL is completely self-contained. It handles authentication to AssemblyAI, dynamic schema generation, and pagination normalization.
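If you provision servers from application code rather than the shell, the same call can be sketched in Python. The endpoint, payload fields, and the `url` response field mirror the curl example above. The function names are illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import List, Optional


def build_create_mcp_request(
    api_key: str,
    integrated_account_id: str,
    name: str,
    methods: List[str],
    tags: List[str],
    expires_at: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST request that creates an MCP server for one account."""
    payload = {
        "name": name,
        "config": {"methods": methods, "tags": tags},
        "expires_at": expires_at,
    }
    return urllib.request.Request(
        f"https://api.truto.one/integrated-account/{integrated_account_id}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_server_url(response_body: bytes) -> str:
    """Pull the MCP server URL out of the JSON response body."""
    return json.loads(response_body)["url"]
```

Treat the returned URL like a credential: store it per tenant and avoid writing it to logs.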

Connecting the MCP Server to Claude

Once you have the Truto MCP server URL, you must register it with your Claude client. Anthropic supports remote server-sent events (SSE) connections for custom endpoints.

Method A: Via the Claude UI

If you are using Claude Desktop or the Claude Enterprise web interface:

  1. Open Settings.
  2. Navigate to Integrations (or Connectors depending on your tier).
  3. Click Add MCP Server or Add custom connector.
  4. Paste the Truto MCP server URL and provide a name (e.g., "AssemblyAI").
  5. Click Add.

Claude will immediately send an initialize request to the Truto server, validate the protocol version, and populate the model's context window with the available AssemblyAI tools.
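To make the handshake concrete, here is a rough sketch of the JSON-RPC messages an MCP client sends when it connects. The message shapes follow the MCP specification. The protocol version string is one published revision used as an example, and the client name and version are placeholders.

```python
import itertools
from typing import Any, Dict, Optional

# Each JSON-RPC request needs an id that is unique within the session.
_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request(
    client_name: str,
    client_version: str,
    protocol_version: str = "2025-03-26",  # example revision; clients negotiate this
) -> Dict[str, Any]:
    """First message of the session: announce the client and protocol version."""
    return jsonrpc_request(
        "initialize",
        {
            "protocolVersion": protocol_version,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    )


def list_tools_request() -> Dict[str, Any]:
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")
```

Claude performs this exchange for you. The sketch is mainly useful when you debug a server with your own client or inspect traffic.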

Method B: Via Manual Config File

For developers running custom environments or local Claude Desktop instances, you can wire the server directly into the configuration JSON. Since Truto provides an HTTP endpoint, you use the standard SSE transport wrapper.

Locate your claude_desktop_config.json file:

  • Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Update the configuration to include the Truto endpoint:

{
  "mcpServers": {
    "assemblyai-truto": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-sse",
        "--url",
        "https://api.truto.one/mcp/a1b2c3d4e5f6..."
      ]
    }
  }
}

Restart Claude. The integration is now live.

Hero Tools for AssemblyAI

Truto automatically translates AssemblyAI's endpoints into descriptive, snake_case tools with JSON schemas attached. Here are the highest-leverage tools your agents will use to orchestrate speech workflows.

create_a_assembly_ai_upload

Use this tool to upload a local media file as raw binary data to AssemblyAI's secure servers. This is the required first step if your audio file is not already hosted on a publicly accessible URL.

Usage note: The tool returns an uploaded file object containing an upload_url. This URL is temporary and strictly scoped for use in the transcription request.

"I have a local file named 'interview_raw.mp3'. Upload this binary payload to AssemblyAI and capture the upload URL for processing."

create_a_assembly_ai_transcript

This tool initiates the asynchronous transcription process. You provide an audio_url (either public or generated via the upload tool) and optional parameters for intelligence models like speaker diarization, profanity filtering, or sentiment analysis.

Usage note: This tool does not return the text. It returns a transcript object with an id and a status (typically "queued" or "processing"). The LLM must capture this ID for the polling step.
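If you drive the server from your own agent code instead of Claude, capturing the ID might look like the sketch below. The `tools/call` envelope and the `content` and `isError` result fields come from the MCP specification. The assumption that the tool returns its JSON as a text content item, and the argument names in the usage note, are illustrative and should be checked against the tool schema the server reports.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def parse_tool_result(response: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the first text content item of a tools/call response as JSON.

    Assumes the server serializes the API response as JSON text.
    """
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError(f"tool call failed: {result.get('content')}")
    text = next(
        (item["text"] for item in result.get("content", []) if item.get("type") == "text"),
        None,
    )
    if text is None:
        raise ValueError("tool result contained no text content")
    return json.loads(text)
```

For example, `build_tool_call(1, "create_a_assembly_ai_transcript", {"audio_url": "https://example.com/audio.mp3"})` produces the request, and `parse_tool_result(response)["id"]` yields the transcript ID to poll.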

"Start a transcription job for the audio at this URL: https://example.com/audio.mp3. Enable speaker diarization so we know who is talking. Give me the transcript ID when the job is queued."

get_single_assembly_ai_transcript_by_id

This is the polling tool. The agent uses this to check the status of a specific transcription job. Once the status string changes to "completed", the response payload will include the full text, speaker labels, and any requested intelligence data.

Usage note: If you hit rate limits while polling, Truto passes the 429 back with ratelimit-reset headers. Prompt your agent to wait the specified seconds before trying again.

"Check the status of transcript ID 550e8400. If it is still processing, wait 10 seconds and check again. Once completed, summarize the main topics discussed."

list_all_assembly_ai_sentences

Instead of a massive block of raw text, this tool returns the transcript segmented into grammatically correct sentences. This is highly optimized for LLMs that need to extract quotes or build structured summaries.

Usage note: You must provide a valid transcript_id of a completed job.

"Get the sentence breakdown for transcript ID 550e8400. Find the specific sentence where the speaker mentions 'quarterly revenue' and tell me the exact timestamp."

get_single_assembly_ai_subtitle_by_id

This tool instantly converts a completed transcription into formatted subtitle files. You pass the transcript ID and the subtitle_format (either srt or vtt).

Usage note: The returned payload is the raw text of the subtitle file, including all sequence numbers and timestamp boundaries. Claude can easily write this to a local file or code block.
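Because the payload is plain text, a quick sanity check before saving it is cheap. The helper below is a small sketch that guesses the format from the `WEBVTT` header and counts cue timing lines. The function names are illustrative.

```python
import re

# Matches cue timing lines such as "00:01.000 --> 00:04.000" (VTT)
# or "00:00:01,000 --> 00:00:04,000" (SRT).
_CUE_LINE = re.compile(
    r"(\d{2}:)?\d{2}:\d{2}[.,]\d{3} --> (\d{2}:)?\d{2}:\d{2}[.,]\d{3}"
)


def detect_format(subtitle_text: str) -> str:
    """Return "vtt" if the text starts with the WEBVTT header, else "srt"."""
    return "vtt" if subtitle_text.lstrip().startswith("WEBVTT") else "srt"


def count_cues(subtitle_text: str) -> int:
    """Count the cue timing lines in an SRT or VTT document."""
    return sum(
        1 for line in subtitle_text.splitlines() if _CUE_LINE.match(line.strip())
    )


def output_filename(stem: str, subtitle_text: str) -> str:
    """Choose a file name whose extension matches the detected format."""
    return f"{stem}.{detect_format(subtitle_text)}"
```

A cue count of zero on a supposedly completed transcript is a useful signal that the wrong transcript ID or format was requested.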

"Generate VTT subtitles for transcript ID 550e8400. Output the VTT format directly into a code block so I can copy it into my video editor."

list_all_assembly_ai_word_searches

This tool performs server-side keyword searching across the entire transcript. You provide the transcript_id and a comma-separated list of words.

Usage note: This is vastly more efficient than asking the LLM to read a 40,000-word transcript just to find specific terms. Let AssemblyAI's search endpoint do the heavy lifting.
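Once the matches come back, turning them into readable timestamps is a few lines of post-processing. The sketch assumes a response shaped like `{"matches": [{"text": ..., "timestamps": [[start_ms, end_ms], ...]}]}` with times in milliseconds. That shape is an assumption based on AssemblyAI's word search response, so confirm the field names against the payload you actually receive.

```python
from typing import Any, Dict, List


def format_ms(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def summarize_matches(search_result: Dict[str, Any]) -> List[str]:
    """Return one line per occurrence, e.g. 'lawsuit at 01:05.000'."""
    lines = []
    for match in search_result.get("matches", []):
        for start_ms, _end_ms in match.get("timestamps", []):
            lines.append(f"{match['text']} at {format_ms(start_ms)}")
    return lines
```

Keeping this formatting outside the model means Claude only has to read a short list of hits rather than the full transcript.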

"Search transcript ID 550e8400 for occurrences of 'lawsuit', 'litigation', and 'settlement'. Return the exact matches and their timestamps."

To view the complete inventory of AssemblyAI endpoints - including speech understanding, streaming tokens, and redaction features - visit the AssemblyAI integration page.

Workflows in Action

By connecting these tools through an MCP server, Claude goes from a text-only assistant to one that can orchestrate speech processing end to end. Here are two examples of how different users might apply this setup.

Scenario 1: The Podcaster Subtitle Pipeline

A content creator needs to turn an hour-long podcast interview into accurate, timestamped VTT subtitles for YouTube.

"I have an audio file hosted at https://audio.com/ep42.mp3. I need you to transcribe it, wait for the job to finish, and then generate the VTT subtitles for the entire episode."

How Claude executes this:

  1. Calls create_a_assembly_ai_transcript with the provided audio_url.
  2. Receives a queued job response and extracts the id.
  3. Enters a loop, calling get_single_assembly_ai_transcript_by_id periodically.
  4. Once the status reads completed, it calls get_single_assembly_ai_subtitle_by_id with subtitle_format: "vtt".
  5. Returns the fully formatted VTT content in a code block for the user.
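The same sequence can be written out as code, which is handy if you want to run the pipeline from a script or test it without a model in the loop. In this sketch, `call_tool` is a hypothetical function that invokes an MCP tool by name and returns its decoded result. The tool names come from this guide, while the `id` argument name and the polling limits are assumptions.

```python
import time
from typing import Any, Callable, Dict


def subtitle_pipeline(
    call_tool: Callable[[str, Dict[str, Any]], Any],
    audio_url: str,
    subtitle_format: str = "vtt",
    interval_s: float = 10.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Transcribe audio_url, wait for completion, and return the subtitles."""
    job = call_tool("create_a_assembly_ai_transcript", {"audio_url": audio_url})
    transcript_id = job["id"]

    for _ in range(max_polls):
        transcript = call_tool(
            "get_single_assembly_ai_transcript_by_id", {"id": transcript_id}
        )
        if transcript["status"] == "completed":
            return call_tool(
                "get_single_assembly_ai_subtitle_by_id",
                {"id": transcript_id, "subtitle_format": subtitle_format},
            )
        if transcript["status"] == "error":
            raise RuntimeError(transcript.get("error", "transcription failed"))
        sleep(interval_s)

    raise TimeoutError(f"transcript {transcript_id} did not finish in time")
```

Bounding the loop with `max_polls` prevents a stuck job from holding the workflow open indefinitely.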

Scenario 2: Compliance Officer Keyword Auditing

A financial compliance officer needs to audit recorded sales calls for prohibited guarantees or high-risk language.

"Take the call recording hosted at https://internal.company.com/call-104.wav. Transcribe it with speaker labels. Once done, search the audio for the phrases 'guaranteed return', 'risk free', and 'off the record'. Flag any occurrences."

How Claude executes this:

  1. Calls create_a_assembly_ai_transcript passing the URL and enabling speaker diarization.
  2. Polls get_single_assembly_ai_transcript_by_id until completion.
  3. Calls list_all_assembly_ai_word_searches passing the transcript ID and the requested keyword phrases.
  4. If matches are found, it queries list_all_assembly_ai_sentences to pull the exact context and speaker label for the flagged timestamps.
  5. Presents a formatted compliance risk report to the user.
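Step 4, matching flagged timestamps back to sentences, is the part most worth sketching. The code assumes each sentence dict has `text`, `start`, and `end` (milliseconds) plus an optional `speaker`, and that search matches carry `[start_ms, end_ms]` pairs. These field names are assumptions about the response payloads, so verify them against real output.

```python
from typing import Any, Dict, Iterable, List


def sentences_for_hits(
    sentences: Iterable[Dict[str, Any]], hits_ms: Iterable[int]
) -> List[Dict[str, Any]]:
    """Return the sentences whose time span contains any hit timestamp."""
    hits = list(hits_ms)
    return [
        sentence
        for sentence in sentences
        if any(sentence["start"] <= hit <= sentence["end"] for hit in hits)
    ]


def risk_report(
    sentences: List[Dict[str, Any]], search_result: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Build one report row per flagged phrase and containing sentence."""
    report = []
    for match in search_result.get("matches", []):
        starts = [start for start, _end in match.get("timestamps", [])]
        for sentence in sentences_for_hits(sentences, starts):
            report.append(
                {
                    "phrase": match["text"],
                    "speaker": sentence.get("speaker"),
                    "start_ms": sentence["start"],
                    "sentence": sentence["text"],
                }
            )
    return report
```

Doing the join in code keeps the report deterministic, and Claude can then focus on summarizing the flagged rows.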

Security and Access Control

When connecting an enterprise workspace to an AI model, security cannot be an afterthought. Exposing administrative API endpoints to an autonomous agent is dangerous without proper guardrails. Truto MCP servers include built-in authorization primitives at the token level:

  • Method Filtering: Limit the server to specific operation types. By passing methods: ["read"] during server creation, you ensure Claude can only poll and fetch transcripts, preventing it from deleting historical data.
  • Tag Filtering: Group specific resources using integration-level tags. You can restrict the server to only expose tools tagged with intelligence, hiding billing or account-level endpoints.
  • Mandatory API Auth (require_api_token_auth): For shared environments, possession of the MCP URL isn't enough. Enabling this flag forces the client to also pass a valid Truto session or API token, authenticating both the server and the individual user executing the prompt.
  • Automatic Expiration (expires_at): Generate short-lived MCP servers for contractors or temporary AI workflows. Setting an ISO datetime ensures the token is purged from distributed edge storage the moment it expires, neutralizing the URL permanently.
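Combined, these controls might translate into a creation payload like the one sketched below: a read-only, tag-filtered server that expires after a fixed number of hours. The `methods`, `tags`, and `expires_at` fields follow the creation request shown earlier in this guide. Placing `require_api_token_auth` inside `config` is an assumption, so check Truto's API reference for where that flag belongs.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def restricted_server_payload(
    name: str, ttl_hours: int, now: Optional[datetime] = None
) -> Dict[str, Any]:
    """Build a payload for a short-lived, read-only MCP server."""
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(hours=ttl_hours)
    return {
        "name": name,
        "config": {
            "methods": ["read"],       # poll and fetch only
            "tags": ["intelligence"],  # hide everything not tagged this way
            # ASSUMPTION: the flag's location in the payload is not confirmed.
            "require_api_token_auth": True,
        },
        "expires_at": expires_at.isoformat(),
    }
```

Computing the expiry in UTC avoids ambiguity when the server that enforces it runs in a different time zone.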

Automate Speech Data the Right Way

Connecting AssemblyAI to Claude gives your LLMs the power to hear. But hand-writing tool definitions, mapping complex JSON schemas, and managing authentication is a massive drain on engineering resources. A managed MCP layer abstracts that boilerplate and serves dynamic, documentation-driven tools directly to your models. Polling and 429 backoff remain the caller's responsibility, but they run against consistent tools and standardized rate limit headers.

By leveraging Truto, you can build autonomous workflows that upload media, generate transcriptions, create subtitles, and search speech data without writing a single line of integration code.

FAQ

How does Claude handle AssemblyAI's asynchronous transcriptions?
Claude uses a two-step tool calling process. First, it calls the creation tool to start the transcription and receives a transcript ID. Then, it uses the get transcript tool to poll that ID until the status returns as 'completed'.

Does Truto automatically retry when AssemblyAI rate limits are hit?
No. Truto passes upstream HTTP 429 errors directly back to the caller while normalizing the rate limit information into standard IETF headers (ratelimit-limit, ratelimit-remaining, ratelimit-reset). The client (or agent) is responsible for implementing retry and backoff logic.

Can I restrict which AssemblyAI operations Claude can perform?
Yes. When generating the MCP server URL in Truto, you can pass a configuration object that filters available tools by specific methods (like 'read' or 'list') or specific tool tags, preventing unauthorized data modification.
