---
title: "Connect PagerDuty to AI Agents: Orchestrate Events and Team Access"
slug: connect-pagerduty-to-ai-agents-orchestrate-events-and-team-access
date: 2026-06-09
author: Uday Gajavalli
categories: ["AI & Agents"]
excerpt: "Learn how to connect PagerDuty to AI agents using Truto's tool-calling API. Orchestrate incidents, schedules, and team access across your tech stack."
tldr: "Connecting AI agents to PagerDuty requires handling complex nested schemas, rigid time window constraints, and strict rate limits. This guide shows how to fetch native AI tools for PagerDuty and orchestrate them using LangChain or any modern agent framework."
canonical: https://truto.one/blog/connect-pagerduty-to-ai-agents-orchestrate-events-and-team-access/
---

# Connect PagerDuty to AI Agents: Orchestrate Events and Team Access


If your team uses standard chat interfaces, check out our guide on [connecting PagerDuty to ChatGPT](https://truto.one/connect-pagerduty-to-chatgpt-manage-incidents-and-on-call-schedules/) or [connecting PagerDuty to Claude](https://truto.one/connect-pagerduty-to-claude-automate-workflows-and-service-health/). But if you are building autonomous, multi-step AI agents - systems designed to detect system failures, identify on-call engineers, and manage the entire incident lifecycle without human intervention - you need a programmatic architecture.

Giving a Large Language Model (LLM) read and write access to your PagerDuty instance is an engineering bottleneck. You either spend weeks building, hosting, and maintaining a custom connector, dealing with authentication lifecycles and schema updates, or you leverage an integration infrastructure layer designed for AI tool calling. As discussed in our guide on [Architecting AI Agents](https://truto.one/architecting-ai-agents-langgraph-langchain-and-the-saas-integration-bottleneck/), bypassing the integration build phase is critical for shipping agentic features to production.

This guide details exactly how to use Truto's `/tools` endpoint and Proxy API architecture to fetch AI-ready tools for PagerDuty, bind them to an LLM using frameworks like LangChain, LangGraph, or the Vercel AI SDK, and build highly reliable incident response workflows.

## The Engineering Reality of PagerDuty's API

Building a basic Python script to pull PagerDuty incidents is straightforward. Building an agentic loop that dynamically interacts with the PagerDuty API based on unpredictable LLM reasoning is a different discipline. 

Every integration on Truto represents how an underlying product's API behaves. Integrations have `Resources` that map to specific endpoints, and `Methods` defined on them (List, Get, Create, Update, Delete). Truto provides Proxy APIs for these methods, handling pagination, authentication, and query parameter processing. When an AI agent needs to act, it relies on these Proxy APIs formatted as tools. 

However, the agent still has to navigate the strict business logic imposed by PagerDuty. Here are the specific engineering constraints you must account for.

### The Object Reference Requirement

LLMs are inherently string-based systems. When instructed to "Create an incident for the Database Service", an LLM will naturally attempt to pass the string "Database Service" into a `create_incident` tool payload. 

PagerDuty's API strictly rejects this. Almost all resources in PagerDuty require nested object references containing the specific resource `id` and `type`. To create an incident, the agent cannot simply provide a service name. It must provide an object shaped like `{"id": "P12345", "type": "service_reference"}`. 

This requires your AI agent to execute a two-step retrieval pattern: first searching or listing services to obtain the correct `id`, and then injecting that `id` into the incident creation tool. If your tool definitions do not enforce this nested schema requirement strictly, the LLM will hallucinate flat JSON payloads that return HTTP 400 Bad Request errors.
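The two-step retrieval pattern can be sketched as a pair of small helpers. This is a minimal illustration, not Truto's or PagerDuty's implementation: `ServiceSummary`, `findServiceId`, and `buildIncidentPayload` are hypothetical names, and the outer `incident` envelope is assumed from PagerDuty's REST request body. Treat the tool schema returned by the `/tools` endpoint as the authority on the exact shape.

```typescript
// Hypothetical shape of one entry returned by a service listing call.
interface ServiceSummary {
  id: string;
  name: string;
}

interface ServiceReference {
  id: string;
  type: "service_reference";
}

interface IncidentPayload {
  incident: {
    type: "incident";
    title: string;
    service: ServiceReference;
  };
}

// Step 1: resolve a human-readable service name to its PagerDuty ID.
function findServiceId(services: ServiceSummary[], name: string): string {
  const wanted = name.trim().toLowerCase();
  const match = services.find((s) => s.name.trim().toLowerCase() === wanted);
  if (!match) {
    throw new Error(`No PagerDuty service named "${name}"`);
  }
  return match.id;
}

// Step 2: wrap the ID in the nested reference object instead of passing a flat string.
function buildIncidentPayload(title: string, serviceId: string): IncidentPayload {
  return {
    incident: {
      type: "incident",
      title,
      service: { id: serviceId, type: "service_reference" },
    },
  };
}
```

Failing loudly when no service matches is deliberate: an agent that receives a clear lookup error can retry the search, whereas a guessed ID produces a confusing 400 further down the chain.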

### Strict Time Windowing

When querying historical data - such as pulling log entries, active on-call shifts, or past notifications - PagerDuty enforces rigid time window constraints. The `since` and `until` query parameters are almost always required for list endpoints. 

More importantly, PagerDuty often restricts this window to a maximum span (frequently 3 months, or sometimes just 1 month depending on the endpoint and the account tier). If an AI agent attempts to query "all incidents from the past year" in a single tool call without chunking the date ranges, the API will fail. Your agent must be instructed on how to calculate date boundaries and iterate over time chunks if it needs deep historical context.
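One way to handle this is to split the requested range into windows before the agent issues any tool calls. The sketch below is a generic date-chunking helper. `chunkTimeRange` and `TimeWindow` are hypothetical names, and the 90-day figure in the usage note is only an example value, since the real maximum depends on the endpoint.

```typescript
interface TimeWindow {
  since: string; // ISO 8601 timestamp
  until: string; // ISO 8601 timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Splits [since, until) into consecutive windows no longer than maxDays each.
function chunkTimeRange(since: Date, until: Date, maxDays: number): TimeWindow[] {
  if (maxDays <= 0) {
    throw new Error("maxDays must be positive");
  }
  if (until.getTime() <= since.getTime()) {
    throw new Error("until must be after since");
  }

  const windows: TimeWindow[] = [];
  const end = until.getTime();
  let cursor = since.getTime();

  while (cursor < end) {
    // Never let a window run past the requested end of the range.
    const next = Math.min(cursor + maxDays * DAY_MS, end);
    windows.push({
      since: new Date(cursor).toISOString(),
      until: new Date(next).toISOString(),
    });
    cursor = next;
  }
  return windows;
}
```

Calling `chunkTimeRange(new Date("2025-01-01T00:00:00Z"), new Date("2026-01-01T00:00:00Z"), 90)` yields five windows, one list call each. Adjacent windows share a boundary timestamp, so it is worth de-duplicating results by ID when merging them.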

### Hard Rate Limits and Backoff Engineering

PagerDuty enforces aggressive rate limiting to protect platform stability during major incidents. The REST API restricts concurrent requests and enforces strict limits on operations per minute. 

A critical architectural note: Truto does not retry, throttle, or apply backoff on rate limit errors. When PagerDuty returns an `HTTP 429 Too Many Requests` error, Truto passes that exact error to the caller. However, Truto normalizes the upstream rate limit information into standardized IETF headers: `ratelimit-limit`, `ratelimit-remaining`, and `ratelimit-reset`. 

It is entirely the responsibility of your agent execution framework to catch the 429 error, read the `ratelimit-reset` header, and follow [best practices for handling API rate limits and retries](https://truto.one/best-practices-for-handling-api-rate-limits-and-retries-across-multiple-third-party-apis/) to pause the execution thread and retry the tool call. If your framework assumes tool calls are instantaneous and reliable, your agent will crash mid-workflow during a high-volume incident storm.
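A framework-agnostic version of that retry behavior might look like the sketch below. It assumes the HTTP client surfaces `status` and lowercase `headers` on the thrown error, and that `ratelimit-reset` carries the number of seconds until the window resets. `HttpLikeError`, `resetDelayMs`, and `withRateLimitRetry` are hypothetical names, not part of any SDK.

```typescript
// Assumed error shape: adapt this to whatever your HTTP client actually throws.
interface HttpLikeError {
  status?: number;
  headers?: Record<string, string | undefined>;
}

type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Reads ratelimit-reset as seconds; falls back when the header is missing or invalid.
function resetDelayMs(error: HttpLikeError, fallbackSeconds = 5): number {
  const raw = error.headers?.["ratelimit-reset"];
  const seconds = raw === undefined ? NaN : Number.parseInt(raw, 10);
  const usable = Number.isFinite(seconds) && seconds >= 0;
  return (usable ? seconds : fallbackSeconds) * 1000;
}

// Retries only on HTTP 429, up to maxRetries times; every other error is rethrown as-is.
async function withRateLimitRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  sleep: Sleep = defaultSleep
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const error = (err ?? {}) as HttpLikeError;
      if (error.status !== 429 || attempt >= maxRetries) {
        throw err;
      }
      await sleep(resetDelayMs(error));
    }
  }
}
```

Wrapping individual tool calls rather than the whole agent run keeps a retry from repeating writes that already succeeded. The injectable `sleep` parameter exists so the behavior can be tested without real delays.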

## The PagerDuty AI Agent Tools

Truto provides a set of tools for LLM frameworks by offering a description and schema for all the `Methods` defined on the `Resources` for an integration. By calling the `/integrated-account/:id/tools` endpoint, you instantly equip your LLM with PagerDuty capabilities, similar to the process used when [building MCP servers for AI agents](https://truto.one/the-hands-on-guide-to-building-mcp-servers-for-ai-agents-2026/).
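If you are not using one of the SDKs, the same tool definitions can be fetched over plain HTTP. The sketch below is illustrative only: the base URL and bearer-token header are assumptions to confirm against Truto's API reference, and `buildToolsRequest` and `fetchTools` are hypothetical helper names. Only the `/integrated-account/:id/tools` path comes from this guide.

```typescript
// Assumption: base URL and auth scheme. Verify both before relying on this.
const TRUTO_BASE_URL = "https://api.truto.one";

interface ToolsRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds the request for the tools endpoint of one integrated account.
function buildToolsRequest(integratedAccountId: string, apiKey: string): ToolsRequest {
  if (!integratedAccountId) {
    throw new Error("integratedAccountId is required");
  }
  const accountId = encodeURIComponent(integratedAccountId);
  return {
    url: `${TRUTO_BASE_URL}/integrated-account/${accountId}/tools`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      Accept: "application/json",
    },
  };
}

// Fetches the tool definitions; the response is left untyped because its schema is not assumed here.
async function fetchTools(integratedAccountId: string, apiKey: string): Promise<unknown> {
  const { url, headers } = buildToolsRequest(integratedAccountId, apiKey);
  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`Tool discovery failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Keeping request construction separate from the network call makes the URL logic easy to unit test and swap out if your deployment uses a different host.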

Here are the most useful tools for PagerDuty workflows.

### 1. list_all_pager_duty_on_calls

Before an AI agent can escalate an issue, it must determine who is currently responsible. This tool lists all active on-call entries, returning the user details, associated escalation policy, and schedule data for the exact moment the query is executed.

**Contextual Usage:** Agents should use this tool when a user asks for routing information or when the agent needs to identify a target for a direct notification. Pass `escalation_policy_ids` or `schedule_ids` to narrow down the search.

> "Query the active on-call schedule for the Primary Database escalation policy and tell me the name and contact method of the engineer currently on shift."

### 2. create_a_pager_duty_incident

This is the core write tool for automated response. It generates a new incident representing a problem that requires human resolution. 

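**Contextual Usage:** The agent must assemble a payload containing the `title`, `type` (always `incident`), `service` reference, and `priority`. As noted earlier, the `service` must be a reference object containing the service ID. PagerDuty models `priority` as a reference object too, so a label such as "P1" must first be resolved to its priority ID.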

> "The Datadog monitor just triggered a CPU alert on the payment gateway. Create a high urgency incident in PagerDuty under the Payments service, and assign the priority as P1."

### 3. get_single_pager_duty_incident_by_id

Agents frequently need to check the state of an existing ticket before acting. This tool retrieves the complete, detailed schema of a specific incident, including its status (triggered, acknowledged, resolved), who it is currently assigned to, and its urgency.

**Contextual Usage:** Use this to verify if someone has already acknowledged an alert before deciding to page a secondary team.

> "Fetch the details for incident ID #PT48XYZ. Has anyone acknowledged this yet, or is it still in the triggered state?"

### 4. update_a_pager_duty_incident_by_id

An agentic workflow is incomplete if it cannot close the loop. This tool allows the agent to update incident states, reassign incidents to different users, or escalate the priority.

**Contextual Usage:** Common actions include changing the `status` to `acknowledged` when the agent begins investigation, or `resolved` when an automated remediation script succeeds.

> "The automated rollback script completed successfully and the health checks are passing. Update incident #PT48XYZ to resolved."

### 5. create_a_pager_duty_override

Schedule management is a highly requested automation feature. This tool creates one or more schedule overrides, effectively replacing the regularly scheduled on-call user with a different user for a specified block of time.

**Contextual Usage:** The agent must provide the `start` time, `end` time, and the target `user` reference object. This is exceptionally useful for handling sick leave or shift swaps via natural language.

> "Sarah is feeling unwell and needs to drop off the on-call rotation for the next 8 hours. Create an override on the Infrastructure schedule, replacing her with Marcus starting immediately."
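The payload for a request like this can be assembled deterministically once the user ID is known. The sketch below is a minimal example under stated assumptions: the `overrides` envelope is taken from PagerDuty's REST request body for creating schedule overrides, and `buildOverridePayload` is a hypothetical helper. Check the tool schema for the exact fields it expects.

```typescript
interface UserReference {
  id: string;
  type: "user_reference";
}

interface OverridePayload {
  overrides: Array<{
    start: string; // ISO 8601 timestamp
    end: string; // ISO 8601 timestamp
    user: UserReference;
  }>;
}

const HOUR_MS = 60 * 60 * 1000;

// Builds a single override that puts userId on call for durationHours, beginning at start.
function buildOverridePayload(
  userId: string,
  start: Date,
  durationHours: number
): OverridePayload {
  if (durationHours <= 0) {
    throw new Error("durationHours must be positive");
  }
  const end = new Date(start.getTime() + durationHours * HOUR_MS);
  return {
    overrides: [
      {
        start: start.toISOString(),
        end: end.toISOString(),
        user: { id: userId, type: "user_reference" },
      },
    ],
  };
}
```

Computing the timestamps in code, rather than asking the LLM to produce them, removes one common source of off-by-a-day and time zone mistakes.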

### 6. list_all_pager_duty_log_entries

Incident post-mortems require data. This tool extracts the complete timeline of an incident, including when it triggered, who was notified, when they acknowledged it, and any notes added to the timeline.

**Contextual Usage:** Agents use this tool to compile timeline summaries or perform root-cause analysis documentation after an incident is resolved.

> "Pull the complete log entries for the database outage incident yesterday, and generate a chronological timeline of when alerts were sent and when engineers acknowledged them."

### 7. list_all_pager_duty_services

Since incident creation requires a specific service ID, agents need a discovery mechanism. This tool returns the registry of all services in the account, including their status, associated escalation policies, and routing information.

**Contextual Usage:** This is the agent's internal lookup table. It should call this tool when it knows the human-readable name of a service but needs the service ID to execute a subsequent API call.

> "I need to trigger an alert for the frontend caching layer. List the available services, find the ID for the 'Frontend Cache' service, and then draft an incident payload."

For the complete inventory of available PagerDuty tools, schemas, and resource definitions, visit the [PagerDuty integration page](https://truto.one/integrations/detail/pagerduty).

## Workflows in Action

Tools are isolated functions. Workflows are the business value derived from chaining those tools together intelligently. Here is how specific engineering personas utilize these AI tools in production environments.

### 1. Automated Incident Triage & Assignment

**Persona:** Site Reliability Engineer (SRE)

An observability platform fires a generic alert webhook into an internal channel. The SRE wants the AI agent to interpret the alert, figure out the impacted system, and officially page the correct team in PagerDuty.

> "An alert just fired for 'High latency on API Gateway us-east-1'. Find out who is currently on call for the API Gateway service and create a high urgency incident assigned directly to them."

**Agent Execution Steps:**
1. The agent calls `list_all_pager_duty_services` searching for "API Gateway" to retrieve the Service ID and its associated Escalation Policy ID.
2. The agent calls `list_all_pager_duty_on_calls` filtering by the retrieved Escalation Policy ID to identify the specific user currently on shift.
3. The agent calls `create_a_pager_duty_incident`, injecting the Service ID and assigning the incident directly to the identified user.

**Outcome:** The LLM translates an unstructured text alert into a properly routed, high-priority page to the exact engineer on duty, saving minutes of manual lookup time.
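For teams that prefer a fixed pipeline over free-form agent reasoning for this path, the same three steps can be written as ordinary code. The sketch below is hypothetical throughout: `ToolRunner` stands in for whatever executes a named tool in your framework, and the argument names (`query`, `escalation_policy_ids`) and response shapes are assumptions modeled on PagerDuty's REST API rather than confirmed Truto schemas.

```typescript
// Assumed response shapes; adjust to what the tools actually return.
interface Service {
  id: string;
  name: string;
  escalation_policy: { id: string };
}

interface OnCall {
  user: { id: string };
  escalation_level: number;
}

// Hypothetical: runs a tool by name and resolves with its parsed result.
type ToolRunner = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

async function triageAlert(
  run: ToolRunner,
  serviceName: string,
  title: string
): Promise<unknown> {
  // Step 1: resolve the service name to an ID and its escalation policy.
  const services = (await run("list_all_pager_duty_services", {
    query: serviceName,
  })) as Service[];
  const service = services.find(
    (s) => s.name.toLowerCase() === serviceName.toLowerCase()
  );
  if (!service) {
    throw new Error(`Service "${serviceName}" not found`);
  }

  // Step 2: find who is on call; the lowest escalation level responds first.
  const onCalls = (await run("list_all_pager_duty_on_calls", {
    escalation_policy_ids: [service.escalation_policy.id],
  })) as OnCall[];
  const primary = [...onCalls].sort(
    (a, b) => a.escalation_level - b.escalation_level
  )[0];
  if (!primary) {
    throw new Error(`Nobody is on call for "${serviceName}"`);
  }

  // Step 3: create the incident using nested reference objects.
  return run("create_a_pager_duty_incident", {
    incident: {
      type: "incident",
      title,
      urgency: "high",
      service: { id: service.id, type: "service_reference" },
      assignments: [{ assignee: { id: primary.user.id, type: "user_reference" } }],
    },
  });
}
```

Injecting the runner keeps the function testable with a stub and independent of any particular agent framework.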

### 2. Natural Language Schedule Management

**Persona:** Engineering Manager

Managing schedule overrides in the PagerDuty UI requires navigating calendars and selecting precise timestamps. Managers want to handle this via a chat interface.

> "David has a family emergency and cannot finish his on-call shift for the Core Backend team. It ends tomorrow at 9 AM. Find the schedule and swap him out for Elena starting right now."

**Agent Execution Steps:**
1. The agent calls `list_all_pager_duty_schedules` to find the schedule ID for "Core Backend".
2. The agent calls `list_all_pager_duty_users` to retrieve the exact User ID for "Elena".
3. The agent calculates the ISO 8601 timestamps for "right now" and "tomorrow at 9 AM".
4. The agent calls `create_a_pager_duty_override` using the schedule ID, Elena's User ID, and the calculated start and end times.

**Outcome:** The schedule is seamlessly updated without human navigation of the PagerDuty web console, ensuring the escalation path remains unbroken.

### 3. Post-Mortem Data Aggregation

**Persona:** Incident Commander

After a major incident, compiling the exact timeline of events for the post-mortem document is a tedious manual task.

> "We just wrapped up incident #PT99ABC. I need to write the post-mortem. Pull all the event logs for this incident and write a strict chronological timeline showing exactly how many minutes passed between the trigger, the first human acknowledgement, and the final resolution."

**Agent Execution Steps:**
1. The agent calls `get_single_pager_duty_incident_by_id` to confirm the incident is resolved and fetch baseline metadata.
2. The agent calls `list_all_pager_duty_log_entries` for the specific incident ID to retrieve the raw event timeline.
3. The LLM processes the returned JSON array, extracts the timestamps for the `trigger`, `acknowledge`, and `resolve` log entry types, performs the time delta math, and formats a markdown summary.

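**Outcome:** The Incident Commander receives a chronological timeline of the response metrics within seconds. Because LLM arithmetic can be unreliable, spot-check the computed deltas against the raw timestamps before publishing.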
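The time delta math in step 3 is a good candidate for plain code instead of LLM reasoning. The sketch below assumes each log entry exposes `type` and `created_at`, with type values such as `trigger_log_entry`, `acknowledge_log_entry`, and `resolve_log_entry` as in PagerDuty's REST API. `computeResponseMetrics` is a hypothetical helper name.

```typescript
// Assumed minimal shape of a log entry; real entries carry many more fields.
interface LogEntry {
  type: string;
  created_at: string; // ISO 8601 timestamp
}

interface ResponseMetrics {
  minutesToAcknowledge: number | null;
  minutesToResolve: number | null;
}

// Returns the parsed timestamps for one entry type, oldest first.
function timestamps(entries: LogEntry[], type: string): number[] {
  return entries
    .filter((e) => e.type === type)
    .map((e) => Date.parse(e.created_at))
    .filter((t) => !Number.isNaN(t))
    .sort((a, b) => a - b);
}

// Measures from the first trigger to the first acknowledgement and the last resolution.
function computeResponseMetrics(entries: LogEntry[]): ResponseMetrics {
  const triggered = timestamps(entries, "trigger_log_entry")[0];
  if (triggered === undefined) {
    throw new Error("No trigger entry found");
  }
  const acks = timestamps(entries, "acknowledge_log_entry");
  const resolves = timestamps(entries, "resolve_log_entry");

  const minutesSinceTrigger = (t: number | undefined): number | null =>
    t === undefined ? null : Math.round((t - triggered) / 60000);

  return {
    minutesToAcknowledge: minutesSinceTrigger(acks[0]),
    minutesToResolve: minutesSinceTrigger(resolves[resolves.length - 1]),
  };
}
```

Exposing a helper like this as its own tool lets the LLM focus on narrating the timeline while the numbers come from deterministic code.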

## Building Multi-Step Workflows

Implementing this in code requires an execution framework capable of managing state, tool execution, and error handling. We strongly recommend using modern frameworks like LangGraph or LangChain, alongside the Truto SDK.

Our LLM SDKs use the `/tools` endpoint to register tools dynamically. Below is a conceptual implementation demonstrating how to bind Truto's PagerDuty tools to an agent, while explicitly handling the HTTP 429 rate limit errors passed through by Truto.

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { TrutoToolManager } from "truto-langchainjs-toolset";

async function runPagerDutyWorkflow(integratedAccountId: string, userPrompt: string) {
  // 1. Initialize the LLM
  const model = new ChatOpenAI({
    modelName: "gpt-4o",
    temperature: 0,
  });

  // 2. Fetch PagerDuty Proxy API Tools from Truto
  // This automatically translates PagerDuty Resources/Methods into LangChain tools
  const toolManager = new TrutoToolManager({
    apiKey: process.env.TRUTO_API_KEY,
  });
  
  const tools = await toolManager.getTools(integratedAccountId);

  // 3. Define the Prompt Template
  const prompt = ChatPromptTemplate.fromMessages([
    ["system", "You are a senior DevOps automation agent. You manage PagerDuty incidents and schedules. You must use the provided tools to interact with PagerDuty. Always ensure you look up necessary IDs (like user IDs or service IDs) before attempting to create resources."],
    ["human", "{input}"],
    ["placeholder", "{agent_scratchpad}"],
  ]);

  // 4. Bind tools and create the agent execution loop
  const agent = await createOpenAIToolsAgent({
    llm: model,
    tools,
    prompt,
  });

  const agentExecutor = new AgentExecutor({
    agent,
    tools,
    maxIterations: 10,
  });

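  // 5. Execute with standard Rate Limit Retry Logic
  // Truto passes the 429 error from PagerDuty and normalizes the headers.
  // Caveat: each retry re-runs the whole agent, so tool calls that already
  // succeeded (such as incident creation) may execute again.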
  const maxRetries = 3;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await agentExecutor.invoke({ input: userPrompt });
      console.log("Workflow Complete:", result.output);
      return result;
    } catch (error: any) {
      if (error.status === 429) {
        // Read the standardized IETF header returned by Truto
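        const resetTimeHeader = error.headers?.['ratelimit-reset'];
        // Fall back to 5 seconds when the header is missing or not a number
        const parsedReset = parseInt(resetTimeHeader ?? "", 10);
        const resetSeconds = Number.isNaN(parsedReset) ? 5 : parsedReset;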
        
        console.warn(`[Rate Limited] PagerDuty API saturated. Retrying in ${resetSeconds} seconds... (Attempt ${attempt} of ${maxRetries})`);
        await new Promise(resolve => setTimeout(resolve, resetSeconds * 1000));
      } else {
        console.error("Workflow Failed:", error.message);
        throw error;
      }
    }
  }
  throw new Error("Workflow failed after maximum rate limit retries.");
}

// Example execution
runPagerDutyWorkflow(
  "acct_pd_12345xyz", 
  "An alert just fired for the Database service. Find out who is on call for the Database escalation policy, and create a high urgency incident assigned to them."
);
```

### Architectural Considerations for Production

When deploying AI agents connected to incident management systems, reliability is paramount. 

First, never hardcode schemas. Truto's integration UI lets you customize tool definitions directly. If you need to restrict the agent from using specific query parameters, or want to improve a tool description to guide the LLM's behavior, make those changes in the Truto interface. The `/tools` endpoint reflects the new definitions immediately, so the LangChain SDK picks them up the next time it fetches tools, with no code deploy required.

Second, understand the abstraction level. Truto's Unified APIs are excellent for programmatic data normalization across broad categories (like syncing users across 10 different HRIS platforms). However, when solving problems agentically with PagerDuty, Proxy APIs are the superior choice. The LLM acts as the normalization engine. By providing the raw Proxy API tools, the LLM can interpret PagerDuty's unique nested objects and business logic directly, preserving the full fidelity of the underlying platform.

Finally, implement strict human-in-the-loop workflows for destructive actions. While fetching logs and listing users are safe, deleting an escalation policy or mass-resolving incidents should require an approval checkpoint. Use frameworks like LangGraph to interrupt the state graph before executing a `delete` or `update` tool, ping an administrator via Slack or email, and resume execution only upon confirmation.
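The approval checkpoint does not have to be tied to one framework. As a minimal sketch, and not a LangGraph interrupt, the gate below wraps tool execution and asks an approver callback before running anything whose name starts with `delete_` or `update_`. `ToolRunner`, `Approver`, and `runWithApproval` are hypothetical names, and the prefix rule assumes tool names follow the naming pattern shown earlier in this guide.

```typescript
// Hypothetical: executes a tool by name and resolves with its result.
type ToolRunner = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical: asks a human (Slack, email, a UI) and resolves true on approval.
type Approver = (tool: string, args: Record<string, unknown>) => Promise<boolean>;

const GATED_PREFIXES = ["delete_", "update_"];

// Decides by tool name whether a call needs human sign-off.
function requiresApproval(toolName: string): boolean {
  return GATED_PREFIXES.some((prefix) => toolName.startsWith(prefix));
}

// Runs read-only tools directly; gated tools run only after the approver says yes.
async function runWithApproval(
  run: ToolRunner,
  approve: Approver,
  toolName: string,
  args: Record<string, unknown>
): Promise<unknown> {
  if (requiresApproval(toolName)) {
    const approved = await approve(toolName, args);
    if (!approved) {
      throw new Error(`Tool call "${toolName}" was rejected by the approver`);
    }
  }
  return run(toolName, args);
}
```

Throwing on rejection gives the agent an explicit error to reason about, rather than silently skipping the action and leaving it to assume the write succeeded.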

> Stop spending engineering cycles building and maintaining custom integration infrastructure for your AI agents. Let Truto handle the authentication, pagination, and tool generation layers so your team can focus on agentic business logic.
>
> [Talk to us](https://cal.com/truto/partner-with-truto)

## Orchestrating Reliability

The industry is moving beyond read-only chatbots. Enterprise customers expect AI features that can autonomously execute complex operations across their SaaS stack. PagerDuty is the nervous system of IT operations, making it a critical integration target for agentic workflows.

By leveraging Truto's tool-calling endpoint, you bypass the massive technical debt of building custom API connectors. You equip your LLM with dynamically updated, schema-accurate tools that respect the underlying engineering realities of the PagerDuty platform. Combine this with rigorous error handling and execution frameworks, and you transform a basic LLM script into a robust, production-grade automated incident commander.
