Root cause analysis builds evidence chains to identify which recent change most likely caused an incident.

Process

  1. Establish timeline — Incident onset time and correlation window (typically T-1h)
  2. Identify changes — Recent deployments, commits, config changes in the window
  3. Build evidence chain — Temporal + code + signal correlation
  4. Present findings — Root cause with confidence level
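Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the `Change` record, the `changes_in_window` helper, the calendar date, and the second change (`PR #1802`) are all hypothetical. Only the times and `PR #1847` come from the example on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Change:
    """Hypothetical record for a deployment, commit, or config change."""
    ref: str
    deployed_at: datetime


def changes_in_window(changes, onset, window=timedelta(hours=1)):
    """Return changes deployed in [onset - window, onset], most recent first."""
    start = onset - window
    hits = [c for c in changes if start <= c.deployed_at <= onset]
    return sorted(hits, key=lambda c: c.deployed_at, reverse=True)


# The date is an assumption; the example on this page gives only times of day.
onset = datetime(2024, 1, 1, 14, 23, tzinfo=timezone.utc)
changes = [
    Change("PR #1847", datetime(2024, 1, 1, 14, 11, tzinfo=timezone.utc)),
    Change("PR #1802", datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)),  # hypothetical, outside window
]

candidates = changes_in_window(changes, onset)
for c in candidates:
    minutes = int((onset - c.deployed_at).total_seconds() // 60)
    print(f"{c.ref} deployed {minutes} minutes before onset")
```

Sorting most recent first reflects a common heuristic that changes closest to onset deserve the first look. It is a starting order for investigation, not evidence on its own.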

Example

Question: “What caused checkout failures at 14:23 UTC?”
Response:
Timeline: Incident onset 14:23 UTC, correlation window 13:23-14:23 UTC (T-1h)
Changes: PR #1847 deployed 14:11 UTC (12 minutes before incident)
Evidence:
- Temporal: Change deployed 12 minutes before incident
- Code: Modified OrderRepository.findPendingOrders()
- Signal: Connection pool errors in same method
Root cause: PR #1847 introduced a connection leak in error handling
Confidence: High

Correlation dimensions

  • Temporal — Change timing vs. incident onset
  • Code — What changed vs. what’s failing
  • Signal — Logs/metrics matching changed code
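One way to make the three dimensions concrete is to evaluate each as a separate yes/no check for a candidate change. The sketch below assumes that changed code is available as a set of symbol names, that failing symbols come from something like stack traces, and that signals are plain log lines. The `Change` record, the `correlate` function, the date, and the sample log line are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Change:
    """Hypothetical record of a deployed change and the symbols it modified."""
    ref: str
    deployed_at: datetime
    touched_symbols: frozenset


def correlate(change, onset, failing_symbols, log_lines, window=timedelta(hours=1)):
    """Evaluate one candidate change against the three correlation dimensions."""
    lead = onset - change.deployed_at
    return {
        # Temporal: deployed before onset and inside the correlation window.
        "temporal": timedelta(0) <= lead <= window,
        # Code: what changed overlaps with what is failing.
        "code": bool(change.touched_symbols & set(failing_symbols)),
        # Signal: at least one log line mentions a changed symbol.
        "signal": any(sym in line for sym in change.touched_symbols for line in log_lines),
    }


change = Change(
    ref="PR #1847",
    deployed_at=datetime(2024, 1, 1, 14, 11, tzinfo=timezone.utc),  # date is assumed
    touched_symbols=frozenset({"OrderRepository.findPendingOrders"}),
)
evidence = correlate(
    change,
    onset=datetime(2024, 1, 1, 14, 23, tzinfo=timezone.utc),
    failing_symbols={"OrderRepository.findPendingOrders"},
    # Hypothetical log line; the page only says connection pool errors appear in this method.
    log_lines=["ERROR connection pool exhausted in OrderRepository.findPendingOrders"],
)
print(evidence)
```

Keeping the dimensions as independent checks makes gaps visible: a change can match on timing alone, which is much weaker than a match on all three.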

Confidence levels

  • High — Strong correlation across all dimensions
  • Medium — Good correlation, some gaps
  • Low — Limited evidence, multiple possibilities
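The levels above are qualitative, and this page does not define numeric thresholds. As an illustration only, the sketch below assumes a simple rule: all dimensions matched means High, two matched means Medium, and anything less means Low. The `confidence` function and its thresholds are hypothetical.

```python
def confidence(evidence: dict[str, bool]) -> str:
    """Map per-dimension matches to a confidence level (assumed thresholds)."""
    matched = sum(evidence.values())
    if evidence and matched == len(evidence):
        return "High"    # strong correlation across all dimensions
    if matched >= 2:
        return "Medium"  # good correlation, with a gap in one dimension
    return "Low"         # limited evidence


print(confidence({"temporal": True, "code": True, "signal": True}))    # High
print(confidence({"temporal": True, "code": True, "signal": False}))   # Medium
print(confidence({"temporal": True, "code": False, "signal": False}))  # Low
```

A real assessment would likely also account for how many candidate changes remain plausible, since several equally good candidates lower confidence in any single one.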