Massive S3 outage (amazonwebservices.com)
42 points by ptm on Feb 15, 2008 | 29 comments



Interestingly, my desktop application did not go down this morning. It was right there waiting for me as I sipped my coffee.

It also didn't go down all the times the 37Signals web apps I pay for or the hosted FogBugz installation my company uses have gone down.

Desktop apps rock! :)


My PC crashed while I was reading your comment. :)


Of course, I have lost massive amounts of valuable data because hard drives crashed and, like 95+% of PC users, I don't back things up.


Bingo. Lesson learned: Data in one place is going to die. Data in two places is probably going to live. Defense in depth.

Lesson NOT learned: S3/my hard drive/Carbonites/etc rulz!


Note: we had no data loss or corruption with the S3 outage


Yeah, My PC crashed while I wa


So that is 6 hours of complete outage in around 22 months since the beta opened. The lifetime downtime is roughly 6/(30 * 22 * 24) = 0.00038 = 0.038%! I think it is a pretty impressive achievement to build a system with 99.962% uptime. Especially for the poor engineers who woke up at 2am in Seattle and started figuring out what went wrong and how to get it back online. I think it is pretty cool.

Compare that with the case when our PCs/Macs crash. Even if I could rush to a Circuit City/JR store to get a replacement hard drive, I would probably spend the same amount of time just reviving my system, and that assumes I have a good habit of backing up the system. If that is not the case, I will need to reinstall the operating system and applications. I guess the downtime may be 24 to 48 hours.

So for a person without good backup habits, the uptime will be about 99.85% if it takes 24 hours to get the system back once in 22 months.
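
For anyone who wants to check the arithmetic above, here is a rough Python sketch. It assumes 30-day months, as the comment does, and the function name `uptime_percent` is made up for illustration.

```python
HOURS_PER_MONTH = 30 * 24  # assumption taken from the comment: 30-day months


def uptime_percent(downtime_hours, months):
    """Return uptime as a percentage over `months` 30-day months."""
    total_hours = months * HOURS_PER_MONTH
    return 100.0 * (1.0 - downtime_hours / total_hours)


if __name__ == "__main__":
    # S3: about 6 hours down in about 22 months
    print(f"S3:       {uptime_percent(6, 22):.3f}%")
    # A PC that takes 24 or 48 hours to rebuild once in the same period
    print(f"PC (24h): {uptime_percent(24, 22):.3f}%")
    print(f"PC (48h): {uptime_percent(48, 22):.3f}%")
```

With these inputs S3 comes out at roughly 99.96% and the 24-hour PC rebuild at roughly 99.85%, so both are "3 9's" only if you round generously.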


3 9's uptime as "pretty impressive"? You must be a programmer.


Yes. As a lousy programmer and lousy administrator myself, I am pretty impressed that a new system supporting HTTP DELETE and PUT under heavy load can reach 3 9's. I am too ignorant to know of any WebDAV-based system that is as reliable under the same load. I am also amazed that the uptime of my PCs/Macs is far inferior to that of an immature S3! And I wonder when those systems will be as reliable as telecom's 5 9's (about 5 minutes/year).


We use them commercially too, and I am very happy. For a system that is under development (and presumably iterating internally), support has been nothing short of fantastic.


I think the internet needs an S3 clone, offered by another company. Both companies would be better off because of each other.

S3 is still more reliable than a couple of dedicated servers, though :)


Well, there's Nirvanix (http://www.nirvanix.com), which competes with S3.


"S3 is still more reliable than a couple of dedicated servers, though "

Maybe. I've had colo or dedicated servers since 2000, and the last time I had one fail in any way was in 2001. I move servers every 2-4 years to newer, faster hardware, but even so, my current uptime is longer than S3 has existed.


I was worried people were going to reply with a bunch of anecdotal evidence. Thank you for resisting the urge.


I'm not quite sure if you really think I resisted the urge (presumably because I used "maybe"), or if you were being sarcastic. :)


Phew, back up. Although the fact that it was possible to have the entire network go down is quite worrying.

S3 actually has an SLA; http://aws.amazon.com/s3-sla If I'm reading that right, if S3 is completely down for more than about 40 mins in Feb (which it was - about 90 mins by my count) then we should get a 10% discount for this month. Is that right?
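
A quick sanity check of the "about 40 mins" figure, as a Python sketch. The 99.9% monthly threshold is an assumption inferred from that figure, not quoted from the SLA page, and the SLA may well measure availability differently than raw minutes of downtime. February 2008 had 29 days.

```python
import calendar

# Assumption: inferred from the "about 40 mins" figure, not quoted from the SLA.
ASSUMED_SLA_UPTIME = 0.999


def minutes_in_month(year, month):
    """Total minutes in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 60


def allowed_downtime_minutes(year, month, sla=ASSUMED_SLA_UPTIME):
    """Downtime budget in minutes before uptime drops below `sla`."""
    return minutes_in_month(year, month) * (1.0 - sla)


def monthly_uptime(year, month, downtime_minutes):
    """Uptime fraction for the month given total minutes of downtime."""
    return 1.0 - downtime_minutes / minutes_in_month(year, month)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(2008, 2)
    observed = monthly_uptime(2008, 2, 90)  # 90 minutes is the commenter's own estimate
    print(f"downtime budget: {budget:.1f} min")
    print(f"uptime with 90 min down: {observed:.4%}")
    print("below assumed threshold:", observed < ASSUMED_SLA_UPTIME)
```

Under that assumption the budget for February is just under 42 minutes, so 90 minutes of full downtime would fall below the threshold.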


I'm still getting spotty S3 service


Yeah actually only HTTP object GET requests seem to be working for me (although they're working 100% of the time).


It seems to be back on. I wonder if the outage's length had something to do with it being so early on the west coast.


Kathrin of the Amazon Web Services Team has posted some more specific details on the failure here. In summary, it seems their authentication service was overloaded.

http://developer.amazonwebservices.com/connect/message.jspa?...


Bummer. Would anyone like backup virtual hosting on our physical servers?


Our site seems to be running fine (EC2/S3). We actually have all files currently on EC2 and backed up to S3 (we haven't checked yet to see if the backup is still working).


From the S3 logs on our EC2 instance (which saw no interruption): the first failure we had was at 4:25 this morning, no success until 7:08, then mixed results, now full success since 8:55.


Fudge - ec2 is working for me, but s3 reads and writes are not. Guess it's time to get to work on some kind of failover...
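
One possible shape for that kind of failover, as a minimal Python sketch. Everything here is hypothetical: `primary` and `secondary` stand in for whatever fetch functions wrap S3 and a second copy of the data, and nothing below uses a real S3 client API.

```python
def read_with_failover(key, stores):
    """Try each store in order and return (store_name, data) from the first that works.

    `stores` is a list of (name, fetch) pairs, where fetch(key) returns the
    data or raises an exception. Both are hypothetical stand-ins.
    """
    errors = []
    for name, fetch in stores:
        try:
            return name, fetch(key)
        except Exception as exc:  # broad on purpose: any failure moves to the next store
            errors.append((name, exc))
    raise RuntimeError(f"all stores failed for {key!r}: {errors}")


if __name__ == "__main__":
    local_copy = {"logo.png": b"fake image bytes"}

    def primary(key):
        # Simulates the outage: every request to the primary store fails.
        raise ConnectionError("primary store unavailable")

    def secondary(key):
        return local_copy[key]

    source, data = read_with_failover(
        "logo.png", [("primary", primary), ("secondary", secondary)]
    )
    print(source, len(data))
```

This only covers reads. Writes are harder, since the two copies have to be brought back in sync once the primary returns.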


People seem to have trouble with ec2 as well. http://developer.amazonwebservices.com/connect/thread.jspa?m...


I think it's fair to remove the 'Possibly' from the post's title at this point.


Ok - we are back in business


Our EC2 instance is still running the same process since we booted it 1.5 months ago - no downtime yet.


Today was the day I was going to tackle the S3 component of our application... and I wake up to see this.



