What AI SRE does

AI SRE investigates production incidents by accessing your code repositories, infrastructure, and telemetry. It does the heavy lifting of querying systems, correlating data, building evidence chains, and identifying root causes, work that typically takes hours by hand.

Incident investigation: Reviews code changes, determines scope, queries multiple systems, and builds evidence chains to identify root causes.

Change correlation: Links recent deployments, commits, and configuration changes to incidents.

Impact assessment: Evaluates blast radius, quantifies business impact, and identifies affected services and users.

How it works

  1. Connect your stack — Link your code repositories, telemetry, infrastructure, knowledge base, and alerting systems
  2. Ask questions — Describe incidents in natural language
  3. AI SRE investigates — Accesses code repositories, queries logs and metrics, correlates changes, and builds evidence chains across your systems
  4. Review findings — Get evidence-based conclusions with confidence levels and suggested actions
  5. Resolve faster — Use insights to fix incidents in minutes instead of hours
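
To make the "correlates changes" part of step 3 more concrete, here is a minimal sketch of time-window change correlation. It is an illustration only and not AI SRE's actual implementation or API: the `Change` type, the `correlate_changes` function, the two-hour lookback, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """Hypothetical record of one change event (not an AI SRE type)."""
    kind: str      # "deployment", "commit", or "config"
    service: str   # service the change was applied to
    at: datetime   # when the change landed


def correlate_changes(changes, incident_start, affected_services,
                      lookback=timedelta(hours=2)):
    """Return changes to affected services that landed within `lookback`
    before the incident started, most recent first."""
    window_start = incident_start - lookback
    candidates = [
        c for c in changes
        if c.service in affected_services
        and window_start <= c.at <= incident_start
    ]
    # Changes closest to the incident start are usually inspected first.
    return sorted(candidates, key=lambda c: c.at, reverse=True)


# Made-up sample data for illustration.
incident_start = datetime(2024, 1, 1, 12, 0)
changes = [
    Change("deployment", "checkout", datetime(2024, 1, 1, 11, 45)),
    Change("config", "checkout", datetime(2024, 1, 1, 9, 0)),    # outside the window
    Change("commit", "search", datetime(2024, 1, 1, 11, 50)),    # unaffected service
    Change("config", "payments", datetime(2024, 1, 1, 11, 10)),
]
suspects = correlate_changes(changes, incident_start, {"checkout", "payments"})
for change in suspects:
    print(change.kind, change.service, change.at.isoformat())
```

A fixed lookback window is the simplest possible heuristic. A real investigation would also weigh evidence from logs, metrics, and code diffs before naming a change as the root cause.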