Amazon S3 - When in Doubt Reboot

cperciva · on July 22, 2008

I disagree with the author. Well-designed systems have permanent state (in S3's case, user data stored on disks) and transient state (data in caches, gossip protocol state, etc.), and are designed so that they can do a "clean reboot": throw away all of the transient state and rebuild it from the permanent state.

Obviously this isn't something you want to be doing on a regular basis, but the very fact that Amazon was able to say "things are too screwed up, let's just do a clean reboot" says that they did something right. Yes, downtime sucks -- but seeing that S3 can be cleanly rebooted tells me that, as far as data loss is concerned, I have far less to worry about, since I now know that if worst comes to worst, Amazon has a way of getting S3 back online without losing data.
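To make the permanent/transient split concrete, here is a minimal Python sketch of the idea. It is not how S3 is actually built: `StorageNode`, `put`, `get`, and `clean_reboot` are hypothetical names, and "disk" is modelled as an in-memory dict that is simply assumed to survive a reboot. The point it illustrates is that a clean reboot only discards state that can be rebuilt, so no acknowledged data is lost.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StorageNode:
    """Hypothetical storage node, illustrating permanent vs transient state."""

    # Permanent state: object data "on disk". Assumed to survive reboots;
    # a real system would use durable, replicated storage here.
    disk: Dict[str, bytes] = field(default_factory=dict)
    # Transient state: safe to lose at any time and rebuilt on demand.
    cache: Dict[str, bytes] = field(default_factory=dict)
    gossip_view: Dict[str, str] = field(default_factory=dict)

    def put(self, key: str, value: bytes) -> None:
        # Write permanent state first, so an acknowledged write survives a reboot.
        self.disk[key] = value
        self.cache[key] = value

    def get(self, key: str) -> Optional[bytes]:
        if key in self.cache:
            return self.cache[key]
        value = self.disk.get(key)
        if value is not None:
            # Lazily repopulate the cache from permanent state.
            self.cache[key] = value
        return value

    def clean_reboot(self) -> None:
        # Discard all transient state (caches, gossip membership view);
        # permanent state is left untouched.
        self.cache.clear()
        self.gossip_view.clear()
```

Usage: even if the gossip view has been corrupted, a clean reboot wipes it while the stored object is still readable afterwards.

```python
node = StorageNode()
node.put("photo.jpg", b"\x89PNG")
node.gossip_view["node-b"] = "dead"  # bad gossip state we want to get rid of
node.clean_reboot()
assert node.gossip_view == {}
assert node.get("photo.jpg") == b"\x89PNG"
```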