44.4 Incident handling for AI mistakes
Incident Types
| Severity | Example | Response Time |
|---|---|---|
| P0 | PII leaked, harmful content | 15 min |
| P1 | Wrong financial advice, refund promises | 1 hour |
| P2 | Incorrect but not harmful responses | 24 hours |
| P3 | Style issues, minor inaccuracies | No fixed target; triage into the sprint backlog |
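The severity table can also be encoded so that tooling computes a response deadline when an incident is filed. The following is a minimal sketch, not a definitive implementation: the signal flags (`leaked_pii`, `harmful_content`, `business_impact`, `factually_wrong`) and the function names `classify` and `respond_by` are hypothetical and chosen for illustration only.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Severity(Enum):
    P0 = "P0"
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"


# Response targets copied from the table above.
# P3 has no clock: it is handled through the sprint backlog.
RESPONSE_TARGETS = {
    Severity.P0: timedelta(minutes=15),
    Severity.P1: timedelta(hours=1),
    Severity.P2: timedelta(hours=24),
    Severity.P3: None,
}


def classify(
    leaked_pii: bool = False,
    harmful_content: bool = False,
    business_impact: bool = False,
    factually_wrong: bool = False,
) -> Severity:
    """Map hypothetical triage signals to a severity, worst case first."""
    if leaked_pii or harmful_content:
        return Severity.P0
    if business_impact:  # e.g. wrong financial advice, refund promises
        return Severity.P1
    if factually_wrong:  # incorrect but not harmful
        return Severity.P2
    return Severity.P3  # style issues, minor inaccuracies


def respond_by(severity: Severity, reported_at: datetime) -> Optional[datetime]:
    """Return the response deadline, or None when there is no fixed target."""
    target = RESPONSE_TARGETS[severity]
    return None if target is None else reported_at + target


if __name__ == "__main__":
    sev = classify(business_impact=True)
    print(sev.value, respond_by(sev, datetime(2024, 1, 15, 9, 0)))
```

Checking the most severe signals first means an incident that is both harmful and merely incorrect is always filed at the higher severity.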
Runbook
## AI Incident Response Runbook
### P0: Critical (Safety/Legal)
1. Immediately disable the AI feature
2. Page on-call + engineering lead + legal
3. Collect evidence (logs, user report)
4. Assess blast radius (how many users affected?)
5. Prepare customer communication
6. Fix and validate before re-enabling
### P1: High (Business Impact)
1. Assess if rollback is needed
2. Page on-call + product owner
3. Document affected interactions
4. Deploy fix or rollback
5. Follow up with affected users
### P2-P3: Standard Flow
1. Create ticket with evidence
2. Triage and prioritize in the backlog (P2 within 24 hours)
3. Fix in next sprint
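The first P0 step, disabling the AI feature, only works under pressure if a kill switch already exists. The sketch below shows one way the runbook could be wired to such a switch. It assumes a hypothetical in-memory `FeatureFlags` class standing in for whatever flag service you actually use, and a hypothetical feature name `ai_assistant`; the action strings simply mirror the runbook steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureFlags:
    """In-memory stand-in for a real feature-flag service (hypothetical)."""

    enabled: Dict[str, bool] = field(
        default_factory=lambda: {"ai_assistant": True}
    )

    def disable(self, name: str) -> None:
        self.enabled[name] = False

    def is_enabled(self, name: str) -> bool:
        return self.enabled.get(name, False)


# Ordered follow-up actions per severity, mirroring the runbook above.
RUNBOOK: Dict[str, List[str]] = {
    "P0": [
        "disable_feature",
        "page: on-call, engineering lead, legal",
        "collect evidence",
        "assess blast radius",
        "prepare customer communication",
        "fix and validate before re-enabling",
    ],
    "P1": [
        "assess rollback",
        "page: on-call, product owner",
        "document affected interactions",
        "deploy fix or rollback",
        "follow up with affected users",
    ],
    "P2": ["create ticket", "prioritize in backlog", "fix in next sprint"],
    "P3": ["create ticket", "prioritize in backlog", "fix in next sprint"],
}


def handle_incident(
    severity: str, flags: FeatureFlags, feature: str = "ai_assistant"
) -> List[str]:
    """Apply the kill switch for P0 and return the ordered runbook actions."""
    if severity not in RUNBOOK:
        raise ValueError(f"unknown severity: {severity}")
    if severity == "P0":
        # Stop the harm first; investigation happens with the feature off.
        flags.disable(feature)
    return list(RUNBOOK[severity])


if __name__ == "__main__":
    flags = FeatureFlags()
    for step in handle_incident("P0", flags):
        print(step)
    print("feature enabled:", flags.is_enabled("ai_assistant"))
```

Only P0 flips the switch automatically here; for P1 the runbook asks a human to assess whether a rollback is needed, so the sketch leaves that decision out of the code.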
Post-Mortem
## Post-Mortem Template
**Incident:** AI promised unauthorized refund
**Date:** 2024-01-15
**Duration:** 2 hours
**Users affected:** 47
### What happened?
A prompt change removed the "$50 refund limit" rule.
### Root cause
The prompt-change PR was merged without running the eval suite.
### What we'll fix
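1. Require a passing eval suite run before any prompt change can merge
2. Add an eval test case that covers the refund limit
3. Add an automated check that safety rules such as the refund limit are still present in the prompt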
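The fixes in this example post-mortem could be turned into regression checks along these lines. This is a rough sketch under stated assumptions: the rule phrase in `REQUIRED_RULES` is taken from the post-mortem wording and may not match your actual prompt, and the helper names `missing_rules` and `exceeds_refund_limit` are hypothetical. The reply check is deliberately crude, since it flags any dollar amount above the limit, whether or not it refers to a refund.

```python
import re
from typing import List

REFUND_LIMIT_USD = 50.0  # limit named in the post-mortem above

# Phrases that must survive every prompt edit (wording assumed).
REQUIRED_RULES = ["$50 refund limit"]


def missing_rules(prompt: str, required: List[str] = REQUIRED_RULES) -> List[str]:
    """Return the required rule phrases that are absent from the prompt."""
    lowered = prompt.lower()
    return [rule for rule in required if rule.lower() not in lowered]


def exceeds_refund_limit(reply: str, limit: float = REFUND_LIMIT_USD) -> bool:
    """True if the reply mentions any dollar amount above the limit."""
    raw_amounts = re.findall(r"\$(\d[\d,]*(?:\.\d+)?)", reply)
    amounts = [float(raw.replace(",", "")) for raw in raw_amounts]
    return any(amount > limit for amount in amounts)


if __name__ == "__main__":
    prompt = "You are a support agent. Respect the $50 refund limit."
    print(missing_rules(prompt))  # no rules missing for this prompt
    print(exceeds_refund_limit("I've issued a $200 refund."))
```

Running `missing_rules` in CI against every prompt change would have caught the removed rule at review time, while `exceeds_refund_limit` can be applied to model replies in the eval suite.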