# On-Call Engineers

> AI SRE workflow for On-Call Engineers

On-Call Engineers respond to alerts and incidents during their shifts. AI SRE helps them understand noisy alerts, investigate with limited context, and respond effectively.

## Workflow stages

### Alert intake & triage

**Challenge:** Receives noisy alerts with limited context; scans logs and dashboards manually

**How AI SRE helps:**

* Provides context for noisy alerts
* Identifies important signals in alert noise
* Rapidly explains what alerts mean
* Helps prioritize alerts based on evidence

**Example:**

```
[Receives noisy alert]
You: "What's this alert about? Is it critical?"
AI SRE: [Analyzes alert, provides context, assesses severity]
```

### Scope & impact assessment

**Challenge:** Infers blast radius from service metrics; guesses severity

**How AI SRE helps:**

* Estimates blast radius from evidence
* Identifies affected services and dependencies
* Quantifies severity with supporting evidence instead of guesswork

### Root cause investigation

**Challenge:** Greps logs; discovers missing fields; asks for a redeploy with more logging

**How AI SRE helps:**

* Analyzes logs automatically
* Identifies what data is missing
* Gathers evidence from available sources
* Reduces manual log scanning

### Fix design

**Challenge:** Reviews fixes for urgency, not correctness

**How AI SRE helps:**

* Reviews fixes for correctness, not just urgency
* Suggests fixes based on evidence
* Assesses fix risks
* Validates fix approach

### Deployment & verification

**Challenge:** Watches dashboards and error rates post-deploy

**How AI SRE helps:**

* Monitors system health automatically
* Verifies fixes are working
* Assesses impact reduction
* Flags when a fix is not working

### Post-incident learning

**Challenge:** Moves on once alerts stop

**How AI SRE helps:**

* Documents investigation automatically
* Provides root cause summary
* Captures learnings from incident
* Retains investigation knowledge

## Key workflows

### Alert triage

1. Receive an alert with limited context
2. Ask AI SRE to analyze the alert
3. Get context and severity assessment
4. Prioritize based on evidence
5. Take appropriate action

### Quick investigation

1. Receive an alert or incident report
2. Ask AI SRE to investigate
3. Get evidence-based findings
4. Understand root cause
5. Take action

## Best practices

* Use AI SRE as soon as an alert comes in
* Get context before acting
* Verify fixes with AI SRE
* Document learnings before moving on instead of stopping as soon as alerts clear

## Next steps

<CardGroup cols={2}>
  <Card title="SREs" icon="user-cog" href="/workflows/sres" />

  <Card title="Working with AI SRE" icon="wrench" href="/working-with-ai-sre/overview/detection" />
</CardGroup>
