AI for Operations Teams: Triage Tickets, Page Owners, Keep Status Pages Honest
Key Takeaways
- Operations teams are still paid to route. The job is not fixing the incident; it is getting the incident to the right human within minutes. That routing work is what an AI coworker handles well.
- Three signals define operations work. A ticket comes in, an alert fires, or a status page needs updating. Every operations hour reduces to one of those three.
- An AI coworker reads the alert, finds the owner, opens the incident channel. Viktor pulls context from PagerDuty, Datadog, Linear, and Slack, proposes a routing, and waits for a human to confirm before paging.
- Status pages fail because humans forget to update them. An AI coworker that writes the first draft of the public-facing update, then waits for an SRE to approve, fixes the real failure mode.
- Review-first is non-negotiable in operations. You do not want an AI agent auto-paging a VP at 3 AM. Viktor drafts the page, the on-call confirms it, the page goes out.
Why operations still runs on human routing
Every growing company rebuilds the same operations layer. A ticket lands in a shared inbox or a Slack channel. Someone has to decide if it is a bug, a billing issue, an infrastructure incident, or a feature request. That someone then has to find the owner, open the right channel, tag the right service, and make sure the customer sees a response.
The routing is the whole job. The fix rarely takes longer than 20 minutes once the right human is in the room. The problem is the 40 minutes it took to get the right human into the room.
Gartner's 2024 IT operations survey found that more than 60% of Mean Time To Resolve is spent on triage and notification, not repair. The tools did not solve this. PagerDuty, Opsgenie, Linear, Jira, Statuspage, and Datadog all ship with sophisticated routing rules, and the rules still break the first time an alert does not fit the schema.
The tax is not the alert. The tax is the 20 tabs the on-call opens to figure out what the alert means.
What operations teams actually do every day
Before talking about what an AI coworker can do, it helps to name the four artifacts operations teams move every day. Each has a different entry point and a different approver.
| Artifact | Entry point | Approver | Where it breaks |
|---|---|---|---|
| Incident triage | PagerDuty alert, Slack ping, Statuspage check | On-call engineer | Alerts that do not match a runbook |
| Ticket routing | Zendesk, Linear, shared inbox, @ops in Slack | Ops lead | Tickets that span two teams |
| Status page updates | Customer complaints, internal incident | SRE or ops lead | Human forgets to post the update |
| Runbook execution | Scheduled maintenance, known issue | On-call engineer | Runbook is three months out of date |
Every row in that table is a routing problem. An operations team is not an engineering team with fewer commits. It is a routing layer with a shared keyboard.
How an AI coworker handles the routing layer
An AI coworker like Viktor does not replace your on-call. It replaces the 30 minutes your on-call spends opening Datadog, PagerDuty, Linear, and Slack side by side to figure out which service is on fire and which human owns it.
Nadia, our ops lead, drops this in our incidents channel when a new Datadog alert fires:
@Viktor triage the Datadog alert for checkout-service p95 latency.
Pull the last 30 min of checkout-service traces, grep for anything over
2 seconds, and find the owner in the Linear service catalog. Open an
incident channel, invite the owner, and draft a Statuspage entry
(investigating, no commitment on ETA). Wait for me to approve the
status page update before posting.
Viktor connects to Datadog, Linear, Slack, and Statuspage through the OAuth your ops team already uses. It pulls 4,200 traces from the last half hour, finds 38 that breached the 2-second threshold, identifies a hot path through the payment provider call, and looks up the checkout service in the Linear catalog to find the on-call owner. It opens a new Slack channel called inc-checkout-latency-2026-05-08, invites the owner and the ops lead, and drafts the Statuspage entry. Nadia edits two words and approves. The status page goes live 90 seconds after the alert fired, not 12 minutes later.
The work that used to take 12 minutes of context gathering now takes one Slack message and a glance.
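The triage steps above can be sketched in a few lines. This is a minimal illustration, not Viktor's implementation: the `Trace` type, the `SERVICE_OWNERS` dictionary standing in for a service catalog, and the `draft_triage` function are all hypothetical names, and no real Datadog, Linear, Slack, or Statuspage API is called.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Trace:
    trace_id: str
    service: str
    duration_ms: float


# Hypothetical stand-in for a service catalog: service name -> on-call owner.
SERVICE_OWNERS = {"checkout-service": "owner-on-call"}


def slow_traces(traces, threshold_ms=2000.0):
    """Return the traces whose duration exceeds the threshold."""
    return [t for t in traces if t.duration_ms > threshold_ms]


def draft_triage(service, slug, traces, day):
    """Build a triage draft. Nothing is posted; a human still has to approve."""
    breaches = slow_traces([t for t in traces if t.service == service])
    return {
        "service": service,
        "owner": SERVICE_OWNERS.get(service),  # None means no owner on record
        "breach_count": len(breaches),
        "channel": f"inc-{slug}-{day.isoformat()}",
        "status_draft": f"Investigating elevated latency in {service}. No ETA yet.",
        "approved": False,  # stays False until the ops lead signs off
    }
```

The draft is plain data on purpose: the on-call can read it, edit the status text, and only then flip the approval.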
A comparison: three ways to run operations
Most operations teams have already tried to automate routing. Some wrote PagerDuty escalation policies. Some built a Zendesk ruleset that looks like a Rube Goldberg machine. Some still rely on a senior ops person with a lot of muscle memory. The table below is where each approach actually fits.
| Workflow | PagerDuty rules only | Zendesk automations | AI coworker (Viktor) |
|---|---|---|---|
| Route a Datadog alert to the right on-call | Works if the service tag matches | Does not touch infra alerts | Reads the trace, finds owner in Linear catalog |
| Decide if a Zendesk ticket is a bug or billing | Does not see ticket text | Keyword rules, brittle | Reads the ticket, checks Stripe for the customer, proposes a team |
| Open an incident channel with the right invitees | Manual after page | Not applicable | Opens the channel, invites from the service catalog |
| Draft a customer-visible Statuspage update | Manual | Manual | Drafts from alert context, waits for human approval |
| Identify a stale runbook during the incident | Not applicable | Not applicable | Pulls the runbook from Notion, flags last-updated date |
The gap is not the rule engine. PagerDuty and Zendesk are very good at matching once the rule is written. The gap is every alert that does not match a rule, every cross-team ticket, every incident where the runbook is stale. That is where an AI coworker earns its keep, because it can read the alert text, the runbook, and the service catalog, and propose a specific answer rather than a generic escalation.
How to trust the routing when production is on fire
Operations is the function where a confidently wrong action causes the most damage. So the trust model matters more here than anywhere else.
Viktor runs review-first by default. It drafts the Statuspage update, proposes the channel invitees, and waits for the on-call to confirm before anything goes out to customers. Your on-call sees:
- The exact alert that triggered the routing
- Why Viktor thinks this is the right owner (from the Linear service catalog, the last commit author on the failing service, or the on-call schedule in PagerDuty)
- A confidence flag when the routing is ambiguous (for example, two services touched the same request)
- A link back to the raw traces so the on-call can spot-check
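One way to picture what the on-call sees is a small proposal record that carries its own evidence. The sketch below is an assumption about shape only: `OwnerCandidate`, `RoutingProposal`, and the example URL are hypothetical and do not describe Viktor's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnerCandidate:
    owner: str
    service: str
    source: str  # e.g. "service catalog", "last commit author", "on-call schedule"


@dataclass
class RoutingProposal:
    alert_id: str
    candidates: list
    traces_url: str

    @property
    def ambiguous(self) -> bool:
        # Anything other than exactly one distinct owner needs a human decision.
        return len({c.owner for c in self.candidates}) != 1

    def render(self) -> str:
        """Format the proposal as the text an on-call would review."""
        lines = [f"Alert: {self.alert_id}"]
        for c in self.candidates:
            lines.append(f"Suggested owner: {c.owner} ({c.service}, via {c.source})")
        if self.ambiguous:
            lines.append("Confidence: LOW, routing is ambiguous, please pick an owner")
        lines.append(f"Raw traces: {self.traces_url}")
        return "\n".join(lines)
```

Treating zero candidates as ambiguous is a deliberate choice in this sketch: a proposal with no owner should ask for help rather than look confident.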
Every action Viktor takes in an operations workflow is logged. If Viktor pages someone, the page record shows Viktor's proposal and the human approver who confirmed it. Your postmortem template does not change. The same person signs off on the same actions, just with 10 fewer minutes of context gathering before they do.
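The pairing of proposal and approver can be sketched as a gate that refuses to act without a named human and logs the outcome either way. Everything here is hypothetical (`AuditRecord`, `AUDIT_LOG`, `execute_with_approval`); it illustrates the pattern, not how Viktor stores its records.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AuditRecord:
    action: str
    source_alert: str
    proposed_by: str
    approved_by: Optional[str]
    executed: bool
    timestamp: str


# In-memory log for illustration; a real system would persist this.
AUDIT_LOG: list = []


def execute_with_approval(
    action: str,
    source_alert: str,
    approver: Optional[str],
    run: Callable[[], None],
) -> AuditRecord:
    """Call run() only when a human approver is named; record the result either way."""
    executed = approver is not None
    if executed:
        run()
    record = AuditRecord(
        action=action,
        source_alert=source_alert,
        proposed_by="viktor",
        approved_by=approver,
        executed=executed,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    AUDIT_LOG.append(record)
    return record
```

Logging the unapproved proposals as well as the approved ones keeps the postmortem honest about what was suggested and declined.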
This is the opposite of an auto-remediation bot. We have written about why an AI agent that acts without asking is a liability, and operations is the function where that argument is loudest. Your AI coworker should never auto-page a VP, never post to Statuspage without approval, never run a destructive runbook step on its own.
Where this still breaks
An AI coworker is not a replacement for a senior on-call, and there are parts of operations where you should keep Viktor out of the loop on purpose.
Anything that touches a security incident: if an alert looks like a breach, credential leak, or data exfiltration, route it through your security lead first. Viktor can help triage the noise, but the call on whether to escalate to legal belongs to a human.
Anything that touches a customer refund: operations work often spills into billing decisions. Viktor can draft the Stripe refund request, but a human approver has to confirm before the money moves. We wrote about this pattern in our review-first approach to AI agents.
Anything that changes a runbook: Viktor can flag that a runbook is stale, pull the failing step, and propose an edit. A senior SRE still owns the merge.
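Flagging a stale runbook comes down to a date comparison. The sketch below assumes a 90-day cutoff, which is an illustrative threshold rather than anything prescribed here, and `runbook_status` is a hypothetical helper. It only reports; it never edits the runbook.

```python
from datetime import date, timedelta

# Assumed threshold for illustration; pick whatever your team considers stale.
STALE_AFTER = timedelta(days=90)


def runbook_status(last_updated: date, today: date, stale_after: timedelta = STALE_AFTER) -> str:
    """Return a human-readable freshness flag for a runbook."""
    age = today - last_updated
    if age > stale_after:
        return f"STALE: last updated {age.days} days ago, needs SRE review"
    return f"OK: last updated {age.days} days ago"
```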
The Stanford 2024 AI Index reported a 32% year-over-year jump in publicly reported AI incidents, and most of them clustered around agents that were given too much autonomy too fast. Operations is the function where that risk is highest. Start with triage. Earn the trust of your on-call rotation. Then expand.
What an operations team looks like after 60 days
The shape of the job changes faster than most functions. The on-call stops opening 20 tabs to diagnose a single alert. The ops lead stops being the human router for every cross-team ticket. The SRE stops writing Statuspage updates at 3 AM.
What you keep is the judgment work: deciding whether an incident deserves a customer-visible post, whether to escalate to the VP of Engineering, and whether the runbook is right. That work still belongs to your senior ops people, and it is the work that was getting crowded out by the routing tax.
If you want to start with one workflow, start with incident triage in Slack. It is the cleanest fit for an AI coworker, the highest-volume routing work, and the easiest to audit, because every routing decision shows up as a channel invite.
For teams still deciding whether an AI coworker is the right fit for ops, our 8-question checklist before you buy an AI agent covers the security, audit, and approval questions ops leaders need answered before rollout.
Frequently Asked Questions
What is AI for operations teams, in one sentence? AI for operations teams is software that reads alerts and tickets, finds the right owner from your service catalog, opens the right channel, and drafts the customer-facing update, then waits for a human to approve each action before it goes out.
How is this different from PagerDuty or Opsgenie? PagerDuty and Opsgenie are rule-based pagers. They route alerts when the tags match. An AI coworker like Viktor reads the alert text, the failing traces, and the service catalog, and proposes a specific routing when the rules do not cover the case. Many teams run both.
Does Viktor auto-page people? No. Viktor proposes the page, shows the evidence, and waits for the on-call or ops lead to confirm before the page goes out. The human's name is on the final escalation, not the AI's.
Which ops tools does Viktor connect to? PagerDuty, Opsgenie, Datadog, Statuspage, Linear, Jira, Zendesk, Slack, Microsoft Teams, and Notion for runbooks. Viktor is one install inside Slack or Microsoft Teams and connects to 3,000+ integrations from there.
What happens to the incident postmortem? Every Viktor action is logged with a timestamp, the source alert, and the human approver. The postmortem reads the same as one run by a senior ops engineer, with Viktor's proposal visible alongside the final action.
Can Viktor run a runbook step on its own? Not by default, and we recommend leaving it that way. Viktor can execute read-only runbook steps (pull logs, check a health endpoint, dump a query plan). Any write step, including a restart, a deploy, or a config flip, needs a human approver.
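The read-versus-write split can be expressed as a simple gate over runbook steps. This is a sketch under assumptions: `StepKind`, `RUNBOOK`, and `plan_steps` are hypothetical names, and the step list is a made-up example built from the step types mentioned above.

```python
from enum import Enum


class StepKind(Enum):
    READ = "read"
    WRITE = "write"


# Example runbook for illustration only.
RUNBOOK = [
    ("pull logs", StepKind.READ),
    ("check health endpoint", StepKind.READ),
    ("restart service", StepKind.WRITE),
    ("flip config flag", StepKind.WRITE),
]


def plan_steps(steps, approved_writes=frozenset()):
    """Split steps into those that may run now and those waiting on a human."""
    runnable, blocked = [], []
    for name, kind in steps:
        if kind is StepKind.READ or name in approved_writes:
            runnable.append(name)
        else:
            blocked.append(name)
    return runnable, blocked
```

Approval is per step in this sketch, so signing off on a restart does not silently unlock the config flip.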
Where should I start if I want to try this? Start with incident triage inside one Slack channel. Pick a service your team is already tired of manually routing alerts for. Let Viktor draft the routing for 20 alerts, watch how often your on-call accepts the proposal, and expand from there.
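Tracking how often the on-call accepts a proposal is simple arithmetic. The sketch below uses a hypothetical `acceptance_rate` helper and made-up sample data (17 accepted out of 20) purely to show the calculation.

```python
def acceptance_rate(decisions):
    """decisions: list of booleans, True when the on-call accepted the proposal."""
    if not decisions:
        return 0.0
    return sum(decisions) / len(decisions)


# Illustrative sample, not real results: 17 accepted, 3 rejected.
pilot = [True] * 17 + [False] * 3
print(f"{acceptance_rate(pilot):.0%}")  # 85%
```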
Viktor is an AI coworker that lives in Slack, connects to 3,000+ integrations, and does real work for your operations team. Add Viktor to your workspace.