

Massive S3 outage - ptm
http://developer.amazonwebservices.com/connect/thread.jspa?threadID=19714&tstart=0
Massive S3 outage. Seems to affect other AWS services (SDB, SQS) as well.

Other AWS services are down too ...
EC2 http://developer.amazonwebservices.com/connect/thread.jspa?threadID=19715&tstart=0
SQS http://developer.amazonwebservices.com/connect/thread.jspa?threadID=19713&tstart=0
======
henning
Interestingly, my desktop applications did not go down this morning. They were
right there waiting for me as I sipped my coffee.

They also didn't go down all the times the 37Signals web apps I pay for or the
hosted FogBugz installation my company uses have gone down.

Desktop apps rock! :)

~~~
bfioca
My PC crashed while I was reading your comment. :)

~~~
henning
Of course, I have lost massive amounts of valuable data because hard drives
crashed and, like 95+% of PC users, I don't back things up.

~~~
pchristensen
Bingo. Lesson learned: Data in one place is going to die. Data in two places
is probably going to live. Defense in depth.

Lesson NOT learned: S3/my hard drive/Carbonites/etc rulz!

~~~
gibsonf1
Note: we had no data loss or corruption with the S3 outage

------
eugenejen
So that is 6 hours of complete outage in around 22 months since its beta
opened. The lifetime outage is roughly 6/(30 * 22 * 24) = 0.00038 = 0.038%! I
think it is a pretty impressive achievement to build a system with 99.962%
uptime. Especially for the poor engineers who were woken up at 2am in Seattle
and had to figure out what went wrong and get it back online. I think it is
pretty cool.

Compare that with a PC/Mac crash. Even if I could rush to a Circuit City/JR
store to get a replacement hard drive, I would probably spend about the same
amount of time just reviving my system, assuming I have a good habit of backing
it up. If not, I would need to reinstall the operating system and
applications, and I guess the downtime would be 24 to 48 hours.

So for a person without good backup habits, the uptime would be 99.848% if it
takes 24 hours to get the system back once in 22 months.
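
For anyone who wants to redo the arithmetic, here is a minimal sketch. It assumes 30-day months, as the comment does, and the function name `uptime_percent` is made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # same simplification as in the comment: every month has 30 days


def uptime_percent(downtime_hours: float, months: float) -> float:
    """Return uptime as a percentage of the total hours in the given period."""
    total_hours = months * DAYS_PER_MONTH * HOURS_PER_DAY
    return 100.0 * (1.0 - downtime_hours / total_hours)


# S3 as described in the comment: 6 hours down in 22 months
print(f"{uptime_percent(6, 22):.3f}%")   # 99.962%
# A PC that takes 24 hours to restore once in 22 months
print(f"{uptime_percent(24, 22):.3f}%")  # 99.848%
```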

~~~
boredguy8
3 9's uptime as "pretty impressive"? You must be a programmer.

~~~
eugenejen
Yes. As a lousy programmer and lousy administrator myself, I am pretty
impressed that a new system that supports HTTP DELETE and PUT under heavy
load can reach 3 9's. I am too ignorant to know of any WebDAV-based system
that is that reliable under the same load. I am also amazed that the downtime
record of my PCs/Macs is far worse than that of an immature S3! And I wonder
when these systems will be as reliable as telecom's 5 9's (about 5
minutes/year).
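
To put the "nines" in perspective, a rough sketch of how a count of nines maps to allowed downtime per year. It assumes a 365-day year, and `downtime_minutes_per_year` is just an illustrative name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability with the given number of nines.

    3 nines means 99.9% availability, 5 nines means 99.999%, and so on.
    """
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Three nines works out to roughly 8.8 hours a year, and five nines to a little over 5 minutes.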

~~~
trevelyan
We use them commercially too, and I am very happy. For a system that is under
development (and presumably iterating internally), support has been nothing
short of fantastic.

------
johnrob
I think the internet needs an S3 clone, offered by another company. Both
companies would be better off because of each other.

S3 is still more reliable than a couple of dedicated servers, though :)

~~~
randallsquared
"S3 is still more reliable than a couple of dedicated servers, though"

Maybe. I've had colo or dedicated servers since 2000, and the last time I had
one fail in any way was in 2001. I move servers every 2-4 years to newer,
faster hardware, but even so, my current uptime is longer than S3 has existed.

~~~
foonamefoo
I was worried people were going to reply with a bunch of anecdotal evidence.
Thank you for resisting the urge.

~~~
randallsquared
I'm not quite sure if you really think I resisted the urge (presumably because
I used "maybe"), or if you were being sarcastic. :)

------
zemaj
Phew, back up. Although the fact that it was possible to have the entire
network go down is quite worrying.

S3 actually has an SLA: <http://aws.amazon.com/s3-sla>. If I'm reading that
right, if S3 is completely down for more than about 40 mins in Feb (which it
was - about 90 mins by my count) then we should get a 10% discount for this
month. Is that right?
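
A back-of-the-envelope check of the "about 40 mins" figure. This assumes the threshold is 99.9% monthly uptime, which is inferred from the 40-minute number and not quoted from the SLA text, and it treats uptime as simple wall-clock minutes; the SLA itself may define it differently. The function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(days_in_month: int, uptime_target_percent: float) -> float:
    """Minutes of downtime a month can absorb before dropping below the target.

    The 99.9% target used below is an assumption, not a value read from the SLA.
    """
    total_minutes = days_in_month * MINUTES_PER_DAY
    return total_minutes * (1 - uptime_target_percent / 100)


print(f"{allowed_downtime_minutes(28, 99.9):.2f}")  # 40.32 (28-day February)
print(f"{allowed_downtime_minutes(29, 99.9):.2f}")  # 41.76 (leap-year February)

# The ~90 minutes of downtime counted above would exceed either budget.
print(90 > allowed_downtime_minutes(29, 99.9))  # True
```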

~~~
goodgoblin
I'm still getting spotty S3 service

~~~
zemaj
Yeah actually only HTTP object GET requests seem to be working for me
(although they're working 100% of the time).

~~~
goodgoblin
It seems to be back on. I wonder if the outage's length had something to do
with it being so early on the west coast.

------
bayareaguy
Kathrin of the Amazon Web Services Team has posted some more specific
details on the failure here. In summary it seems their Authentication service
was overloaded.

[http://developer.amazonwebservices.com/connect/message.jspa?...](http://developer.amazonwebservices.com/connect/message.jspa?messageID=79982#79982)

------
xirium
Bummer. Would anyone like backup virtual hosting on our physical servers?

------
gibsonf1
Our site seems to be running fine (EC2/S3). We actually have all files
currently on EC2 and backed up to S3 (We haven't checked to see if the backup
is still working yet)

~~~
gibsonf1
From our S3 Logs from our EC2 instance (which saw no interruption): The first
failure we had was at 4:25 this morning, no success until 7:08, then mixed
results, now full success since 8:55

------
goodgoblin
Fudge - ec2 is working for me, but s3 reads and writes are not. Guess it's time
to get to work on some kind of failover...
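
One possible shape for read failover, as a minimal sketch only. It uses no real S3 client: each "store" is a plain callable, and the names `read_with_failover`, `flaky_primary`, and `local_mirror` are invented for illustration. Real code would catch narrower errors and would also need a plan for writes.

```python
from typing import Callable, Sequence


class AllStoresFailed(Exception):
    """Raised when every configured store failed to return the object."""


def read_with_failover(key: str, stores: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each store in order and return the first successful read."""
    errors = []
    for fetch in stores:
        try:
            return fetch(key)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(exc)
    raise AllStoresFailed(f"no store could serve {key!r}: {errors}")


# Stand-ins for a primary store that is down and a second copy that is up.
def flaky_primary(key: str) -> bytes:
    raise ConnectionError("primary store unavailable")


_MIRROR = {"report.txt": b"hello"}


def local_mirror(key: str) -> bytes:
    return _MIRROR[key]  # raises KeyError if the mirror lacks the object


print(read_with_failover("report.txt", [flaky_primary, local_mirror]))  # b'hello'
```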

~~~
ptm
People seem to have trouble with ec2 as well.
[http://developer.amazonwebservices.com/connect/thread.jspa?m...](http://developer.amazonwebservices.com/connect/thread.jspa?messageID=79806&tstart=0)

~~~
goodgoblin
I think it's fair to remove the 'Possibly' from the post's title at this point

~~~
goodgoblin
Ok - we are back in business

------
tlrobinson
Today was the day I was going to tackle the S3 component of our application...
and I wake up to see this.

