Post-Mortem: {incident_title}
Incident ID: {incident_id} Date of Incident: {incident_date} Date of Post-Mortem: {postmortem_date} Severity: {P0 | P1} Duration: {duration} (from detection to resolution) Incident Commander: {name} Post-Mortem Author: {name} Status: Draft | Reviewed | Published Classification: Confidential — Internal Operations
§1 — Executive Summary
One-paragraph summary of what happened, who was affected, and the business impact. Write this for a non-technical audience (Founder, legal, users).
{executive_summary}
§2 — Incident Timeline
All times in UTC. Reconstruct the exact sequence of events from logs, alerts, and team communications.
| Time (UTC) | Event | Source |
|---|---|---|
| HH:MM | {event_description} | {log_file, alert, person} |
| HH:MM | ||
| HH:MM | ||
| HH:MM | ||
| HH:MM | DETECTION — {how was the incident first detected?} | |
| HH:MM | ACKNOWLEDGMENT — {who acknowledged? when?} | |
| HH:MM | CONTAINMENT START — {what initial containment actions?} | |
| HH:MM | CONTAINMENT COMPLETE — {all affected systems isolated?} | |
| HH:MM | ROOT CAUSE IDENTIFIED — {what was the root cause?} | |
| HH:MM | FIX DEPLOYED — {what was deployed?} | |
| HH:MM | VERIFIED — {how was fix confirmed?} | |
| HH:MM | RESOLVED — {incident officially closed} |
§3 — Detection
How was the incident detected?
{Describe the detection method: automated alert, manual observation, external report, user report, etc.}
Detection Source
| Attribute | Detail |
|---|---|
| Alert system | {AI Firewall / PII spike / outbound anomaly / GCP Monitoring / external report / manual} |
| Alert time | {timestamp} |
| Alert recipient | {Founder / AIRO / Hermes / AINA} |
| Alert latency | {time from incident start to alert firing} |
Could we have detected it sooner?
{Analysis of detection gaps. Are there monitoring improvements that would reduce time-to-detect?}
§4 — Root Cause Analysis
Technical Root Cause
{Detailed technical explanation of what caused the incident. Include code paths, configuration, infrastructure, or process failures.}
Contributing Factors
| Factor | Category | Description |
|---|---|---|
| {factor_1} | {code / config / process / dependency / human_error} | {description} |
| {factor_2} | ||
| {factor_3} |
Five Whys
- Why did {immediate_cause} happen? → {answer}
- Why {answer_1}? → {answer}
- Why {answer_2}? → {answer}
- Why {answer_3}? → {answer}
- Why {answer_4}? → ROOT CAUSE: {root_cause}
§5 — Impact Assessment
User Impact
| Metric | Value |
|---|---|
| Users affected | {count} |
| API requests failed | {count or percentage} |
| Data exposed | {PII types and count, or "None"} |
| Services degraded | {list of affected services} |
| Duration of impact | {duration} |
Business Impact
| Metric | Value |
|---|---|
| Revenue lost | {estimated amount in CAD} |
| Provider costs incurred | {cost of failed/retry requests} |
| Support tickets generated | {count} |
| Reputational impact | {assessment} |
| Compliance impact | {PIPEDA breach report required? OPC notified? Users notified?} |
PIPEDA Breach Assessment
- Did the incident involve personal information? → {Yes/No}
- Does it pose a real risk of significant harm (RROSH)? → {Yes/No}
- OPC notified? → {Yes (date) / No / Not applicable}
- Affected users notified? → {Yes (date, count) / No / Not applicable}
- Breach record stored? → {breach_record_id in breach_records table}
§6 — Response Actions
Containment Actions
| Action | Performed By | Time | Effective? |
|---|---|---|---|
| {action_1} | {person/ai} | {timestamp} | {Yes/No/Partial} |
| {action_2} |
Remediation Actions
| Action | Performed By | Time | Permanent? |
|---|---|---|---|
| {fix_1} | {person/ai} | {timestamp} | {Yes/No — follow-up needed} |
| {fix_2} |
Communication Actions
| Communication | Channel | Audience | Time | Template Used |
|---|---|---|---|---|
| {comm_1} | {Telegram/Email/Status page} | {Founder/Users/Public} | {timestamp} | {template_name} |
| {comm_2} |
§7 — What Went Well
Identify things that worked correctly during the incident response. Celebrate successes.
- {positive_1} — {explanation}
- {positive_2} — {explanation}
- {positive_3} — {explanation}
§8 — What Went Wrong
Identify areas for improvement. Be specific and blameless — focus on systems, not individuals.
- {issue_1} — {explanation and impact}
- {issue_2} — {explanation and impact}
- {issue_3} — {explanation and impact}
§9 — Lessons Learned
- {lesson_1} — {what we learned and why it matters}
- {lesson_2} — {what we learned and why it matters}
- {lesson_3} — {what we learned and why it matters}
§10 — Action Items
Tracked in the development plan. Each action item must have an owner and a deadline.
| # | Action Item | Priority | Owner | Deadline | Status |
|---|---|---|---|---|---|
| AI-1 | {action_description} | P0 | {owner} | {date} | ☐ Open |
| AI-2 | {action_description} | P1 | {owner} | {date} | ☐ Open |
| AI-3 | {action_description} | P2 | {owner} | {date} | ☐ Open |
§11 — Appendices
A. Relevant Logs
Attach or reference relevant log entries. Sanitize PII and sensitive data.
{paste relevant log excerpts or reference Cloud Logging query}
B. Communication Records
- Founder Telegram alert: {reference or screenshot}
- User notification email: {reference or copy}
- Status page update: {link or copy}
- OPC breach report: {reference or copy}
C. Code Changes
| Commit | Description | PR Link |
|---|---|---|
| {commit_hash} | {commit_message} | {pr_link} |
D. External References
- {reference_1}
- {reference_2}
Post-Mortem Sign-Off:
- Incident Commander review: {name} — {date}
- Technical review (if AIRO involved): {name} — {date}
- All action items assigned with deadlines
- Filed in
ops/postmortems/YYYY-MM-DD-{incident_id}-{slug}.md
Template Version: 1.0.0 Related:
docs/incident-response-plan.md§7 — Post-Mortem Process