AWS Outage Postmortems

Greplin: http://tech.blog.greplin.com/aws-best-practices-and-benchmarks
CloudHarmony: http://cloudharmony.com/b/2011/04/unofficial-ec2-outage-postmortem-sky-is.html
Joyent: http://joyeur.com/2011/04/24/magical-block-store-when-abstractions-fail-us/
O'Reilly: http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html

April 29, 2011

Amazon’s EBS system caused a days-long outage last week that affected almost everyone in the us-east-1 region. I love reading a good postmortem, so I’m collecting here the useful writeups I’ve found (mostly on Hacker News) that explain what happened and how to be better prepared next time.

Postmortems