Kubernetes Crash Loop Postmortem
P1This Kubernetes CrashLoopBackOff postmortem documents a production incident where checkout-api pods entered a crash loop after deploying v2.15.0. The root cause was a nil pointer dereference in PaymentService.Process() when Redis session cache was unavailable. The incident lasted 18 minutes and affected approximately 8,000 users. Timeline: 09:00:02 PagerDuty alert, 09:00:20 checkout-api connection refused, 09:00:35 rollback initiated, 09:01:15 all pods healthy. Detection → Response → Resolution completed in under 60 seconds. This postmortem includes 5 Whys analysis, prevention checklist, and log evidence.