19.4 Postmortems: writing a useful incident report

Goal: convert an incident into learning

A good postmortem is not paperwork. It’s how you prevent recurrence and improve systems over time.

The goal is to capture:

what happened (timeline),
impact,
root cause and contributing factors,
what worked and what didn’t,
follow-ups that reduce future risk.

Postmortems are a reliability feature

Teams that write good postmortems get faster and calmer over time. Teams that skip them keep hitting the same incidents under different symptoms.

Postmortem principles (blameless, specific, actionable)

Blameless: focus on system causes, not individual fault.
Specific: concrete times, metrics, diffs, and outcomes.
Actionable: follow-ups have owners and deadlines.
Truthful: include uncertainty where it exists; don’t invent a narrative.

Incident report template

Incident title:

Summary (3–6 sentences):

Impact:
- Who was affected:
- What was affected:
- Duration:
- Severity:

Timeline (UTC):
- T0: detection
- ...
- Resolution

Root cause:

Contributing factors:
- ...

Detection:
- How did we notice?
- Which alerts/logs/metrics?

Resolution:
- What changed?
- Verification steps:

What went well:
- ...

What went poorly:
- ...

Action items:
- [ ] Action (owner, due date)
- [ ] ...
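The Duration field is easier to trust when it is computed from the timeline rather than estimated from memory. The sketch below is one way to do that, assuming timeline entries are recorded as "YYYY-MM-DD HH:MM" strings in UTC. The helper names (`parse_utc`, `incident_duration_minutes`) and the sample entries are hypothetical, for illustration only.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM' (assumed format) as a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


def incident_duration_minutes(timeline: list[tuple[str, str]]) -> int:
    """Return whole minutes between the first and last timeline entries."""
    if not timeline:
        raise ValueError("timeline is empty")
    times = [parse_utc(stamp) for stamp, _ in timeline]
    # An out-of-order timeline usually means a transcription error in the notes.
    if times != sorted(times):
        raise ValueError("timeline entries are out of order")
    return int((times[-1] - times[0]).total_seconds() // 60)


# Made-up sample entries, not taken from a real incident.
timeline = [
    ("2024-01-01 10:02", "T0: detection (alert fired)"),
    ("2024-01-01 10:15", "mitigation started"),
    ("2024-01-01 10:47", "resolution verified"),
]
print(incident_duration_minutes(timeline))  # 45
```

Rejecting out-of-order entries is a deliberate choice here: it surfaces mistakes in the notes before they end up in the report.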

How to use the model to draft (safely)

LLMs are useful for drafting postmortems because they can organize messy notes quickly. Use them safely:

Paste only redacted notes (no secrets, no PII).
Require the model to flag uncertainty explicitly (“unknown”).
Require a clear separation between facts and hypotheses.
Review the draft carefully: the model may invent a clean narrative.
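A first redaction pass can be automated before a human checks the notes. The following is a minimal sketch, assuming the sensitive values look like email addresses, IPv4 addresses, or `key=value` style credentials. The patterns and the `redact` helper are illustrative assumptions, not a complete PII or secret scanner, so manual review is still needed.

```python
import re

# Illustrative patterns only; tune them to the data your notes actually contain.
PATTERNS = [
    # Email addresses.
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    # IPv4-looking addresses.
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    # Credentials written as "name=value" or "name: value".
    (
        re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\s*[:=]\s*\S+"),
        r"\1=[REDACTED]",
    ),
]


def redact(text: str) -> str:
    """Replace each match of the patterns above with a placeholder."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("alice@example.com hit 10.0.0.12 with token=abc123"))
# [EMAIL] hit [IP] with token=[REDACTED]
```

Placeholders such as `[EMAIL]` keep the notes readable, so the draft can still say that a user or host was involved without exposing which one.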

Copy-paste prompt

Draft a postmortem using the template below.

Rules:
- Use only the facts I provide.
- If something is unknown, mark it as unknown.
- Do not invent details to make a nicer story.

Facts/notes (redacted):
...

Template:
(paste template)
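If you draft postmortems often, the prompt above can be assembled in code so the rules are never dropped by accident. This is a small sketch under that assumption; `build_prompt` and `RULES` are hypothetical names, and the function only builds the text, it does not call any model API.

```python
# The three rules from the copy-paste prompt, kept in one place.
RULES = (
    "- Use only the facts I provide.",
    "- If something is unknown, mark it as unknown.",
    "- Do not invent details to make a nicer story.",
)


def build_prompt(notes: str, template: str) -> str:
    """Combine the fixed rules, the redacted notes, and the report template."""
    if not notes.strip():
        raise ValueError("notes are empty; there are no facts to draft from")
    return "\n".join(
        [
            "Draft a postmortem using the template below.",
            "",
            "Rules:",
            *RULES,
            "",
            "Facts/notes (redacted):",
            notes.strip(),
            "",
            "Template:",
            template.strip(),
        ]
    )


prompt = build_prompt("10:02 alert fired\n10:47 fix deployed", "Incident title:\n\nSummary:")
print(prompt)
```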

Models like tidy stories

Incidents are messy. A model may invent missing steps to “complete” the story. Force it to mark unknowns instead.
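One crude way to spot invented steps is to compare the timestamps in the draft against the timestamps in your notes. The sketch below assumes times are written as HH:MM. `unsupported_times` is a hypothetical helper and only a heuristic: it can catch a fabricated timeline entry, but not fabricated prose, so it supplements human review rather than replacing it.

```python
import re

# Matches times written like 9:05 or 10:47 (assumed notation).
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}\b")


def unsupported_times(draft: str, notes: str) -> list[str]:
    """Return times that appear in the draft but nowhere in the notes."""
    known = set(TIME_RE.findall(notes))
    flagged: list[str] = []
    for stamp in TIME_RE.findall(draft):
        if stamp not in known and stamp not in flagged:
            flagged.append(stamp)
    return flagged


# Made-up example text for illustration.
notes = "10:02 alert fired. 10:47 fix deployed."
draft = "10:02 alert fired. 10:20 engineer identified the bad config. 10:47 fix deployed."
print(unsupported_times(draft, notes))  # ['10:20']
```

Any flagged time is a prompt to ask where that detail came from, and to replace it with "unknown" if nobody can say.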

Follow-up actions that actually prevent repeats

High-leverage action categories:

Tests: regression tests and characterization tests
Validation: input validation and schema validation
Observability: add missing logs/metrics and alert thresholds
Rate limiting and caching: reduce retry storms and overload
Runbooks: document “how to diagnose this class of failure”

Each action item should name a concrete change, a single owner, and a due date.
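A small check can catch action items that are missing an owner or a deadline before the report is published. This is a sketch only: the `ActionItem` fields mirror the "Action (owner, due date)" line in the template, while the `problems` helper and its four-word threshold for vagueness are arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None


def problems(item: ActionItem) -> list[str]:
    """List the reasons an action item is not yet concrete and owned."""
    issues = []
    if not item.owner:
        issues.append("missing owner")
    if item.due is None:
        issues.append("missing due date")
    # Assumed heuristic: very short descriptions rarely name a concrete change.
    if len(item.description.split()) < 4:
        issues.append("description too vague")
    return issues


# Made-up items for illustration.
vague = ActionItem("Improve monitoring")
concrete = ActionItem(
    "Add alert on queue depth above threshold",
    owner="payments-oncall",
    due=date(2024, 2, 1),
)
print(problems(vague))     # ['missing owner', 'missing due date', 'description too vague']
print(problems(concrete))  # []
```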
