
Amazon S3+SQS are down, bringing down Scribd, Docstoc, Twitter, SmugMug, JungleDisk - alexwg
http://search.twitter.com/search?q=s3
======
SwellJoe
This has been the problem with AWS all along: Aggregate downtime at good
hosting providers is measured in minutes, or even seconds, per year. Downtime
at AWS has historically been measurable in _days_ per year. This level of
reliability puts it well into the bottom ranks of hosting providers. We're
talking about the dregs of the industry here...the hosts who have a single
cheap Cogent pipe running into a single cage of machines with no power backup
and no backup pipe or infrastructure redundancy. This is the sole reason we
don't recommend AWS to our customers, and why we don't use it ourselves for
any vital services. We want to like it, and recommend it, and we have quite a
bit of software that works with it, that we enjoy selling to people. But, the
reliability just isn't there, and it has been a recurring problem since the
service launched.

~~~
modoc
Absolutely. I'm pretty shocked by the amount of downtime they've been having.

I've been hosting with various providers for almost 10 years. I'm now with
SoftLayer and VERY happy. In those 10 years, I think I've had less combined
downtime than AWS has had in the last 6 months.

The big advantage to running your apps in a nebulous "cloud", aside from the
scaling up-down flexibility, is that in theory the difficulty in running a
stable data center (or ideally a set of load-balanced, geographically diverse
data centers) is taken care of for you. If the reality is it's a trade-off
between getting easy scaling, and losing decent uptime numbers, I'll take the
"hassle" of adding/dropping servers at SoftLayer which are actually UP, and
which I have good visibility into, any day.

Hopefully they'll get to that point eventually, but for now, I'm staying far
far away.

------
nickb
9:05 AM PDT We are currently experiencing elevated error rates with S3. We are
investigating.

9:26 AM PDT We're investigating an issue affecting requests. We'll continue to
post updates here.

9:48 AM PDT Just wanted to provide an update that we are currently pursuing
several paths of corrective action.

10:12 AM PDT We are continuing to pursue corrective action.

10:32 AM PDT A quick update that we believe this is an issue with the
communication between several Amazon S3 internal components. We do not have an
ETA at this time but will continue to keep you updated.

11:01 AM PDT We're currently in the process of testing a potential solution.

11:22 AM PDT Testing is still in progress. We're working very hard to restore
service to our customers.

------
babul
I've been working on a private cloud using <http://eucalyptus.cs.ucsb.edu/>
that maps to Amazon in case of issues like this (though _Amazon_ was supposed
to be the backup).

Has anyone else been doing the same? What have you been using?

------
staunch
Amazon should give their sysadmins bonuses tied directly to uptime.

~~~
gaius
Yes and no. There are two sides to uptime - there's that the infrastructure is
up, and there's that the application is running. It sucks to be a sysadmin
bonused on _service_ uptime when the network is up, the servers are up, the
database is up, but the application won't stay up and as a sysadmin there's
actually nothing you can do about it; all you can do is wait for the
developers to patch it.

~~~
tlrobinson
Well clearly the bonuses should be tied to the uptime of the particular system
the employees/managers are responsible for.

------
tx
Bezos likes the analogy about Amazon services being "electricity" for other
businesses, i.e. you don't have to own a generator if you operate a
restaurant (as they used to back in the day) - just "hook up to the grid" and
you're all set.

Funny analogy, since all data centers DO have their own generators: they're
not restaurants.

~~~
demandred
data center = electricity for startups. why is this analogy funny?

Instead of running their own generator (server and storage), for those
startups who don't need to, they can use Amazon's power, AWS.

am I missing something here?

~~~
gscott
Yes, you are missing the part where the electricity cuts off, you don't know
why, you can't do anything about it, and because you relied so heavily on your
electricity provider you didn't set up a plan to make your own.

If your site is your personal blog or something not important, then downtime
might not be a big deal. If you don't have the money for a backup electricity
provider then you have to take your chances also.

~~~
emmett
Yes, that's exactly so. Remember when there was a whole day of downtime for
everyone in the colo in downtown SF?

Running things yourself is no guarantee of uptime.

The only real fix is to maintain fully redundant systems, which is extremely
expensive. Otherwise, put up with downtime sometimes, because no other system
will fully protect you.
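
As a rough illustration of why redundancy is the fix, here is a minimal sketch of the availability of N redundant copies. The 99.9% per-copy figure is an assumed number for illustration, and the formula assumes the copies fail independently, which real outages (shared power, shared network, shared software bugs) often violate.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Chance that at least one of `copies` independent replicas is up.

    `single` is the availability of one replica (0.999 means 99.9%).
    Assumes failures are independent, which is optimistic in practice.
    """
    return 1.0 - (1.0 - single) ** copies


# Assumed per-copy availability of 99.9%, purely for illustration.
for copies in (1, 2, 3):
    print(copies, f"{redundant_availability(0.999, copies):.7%}")
```

Under those assumptions each extra copy multiplies the expected downtime by the failure probability of one copy, which is why full redundancy buys so much and also why it costs roughly N times as much.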

------
danw
At least twitter is smart and only uses S3 for profile images. The service can
survive without S3. Tumblr images and audio posts are also affected by the
downtime.

------
tlrobinson
SQS too. <http://status.aws.amazon.com/>

~~~
alexwg
Thanks, added that!

~~~
alexwg
SmugMug too, apparently.

~~~
tlrobinson
Interesting, since they like to talk a lot about how they expect S3 to go down
once in a while and supposedly can handle it:

<http://blogs.smugmug.com/don/category/amazon/>

------
vaksel
I like how TechCrunch has yet to make a post about this

~~~
fallentimes
They were too busy covering female bloggers featured in Playboy.

------
akd
I run my own server for a hobby project and it goes down as frequently as S3
for various reasons.

------
tom_rath
It's back for us (~7:15 pm Eastern).

Any estimates on total time of the outage?

~~~
tlrobinson
S3 and SQS seemed to first go down around 9:00 am PDT, and it's now 4:45 pm
PDT, so about 7-8 hours... not too good. At least it happened to be a Sunday.

------
sh1mmer
It's an interesting question for web services. If you depend on Amazon for your
file storage, Bigtable for your database, Yahoo geo, etc., then your uptime
figure is a product of the uptimes of those services.

This means that using 4 services that each have a 99.9% SLA actually gives you
an approx uptime of 99.6%. It doesn't sound like much, but as soon as you
include something like Twitter you can really see the whole uptime graph skew.
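
The arithmetic above can be checked with a short sketch. It assumes every dependency must be up at the same time and that failures are independent; the function names are made up for this example.

```python
from math import prod
from typing import Iterable


def composite_availability(availabilities: Iterable[float]) -> float:
    """Availability of a system that needs every dependency up at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(availabilities)


def downtime_hours_per_year(availability: float, hours_per_year: int = 24 * 365) -> float:
    """Expected hours of downtime per (non-leap) year at a given availability."""
    return (1.0 - availability) * hours_per_year


# Four dependencies, each at the 99.9% SLA mentioned above.
four_services = composite_availability([0.999] * 4)
print(f"combined availability: {four_services:.2%}")
print(f"expected downtime: {downtime_hours_per_year(four_services):.1f} hours/year")
```

Four services at 99.9% multiply out to roughly 99.6%, which is about 35 hours of expected downtime per year compared with under 9 hours for a single 99.9% service.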

------
mechanical_fish
Oh, _this_ is why Jungledisk is down.

------
cpinto
and this is the reason why you let other people pay to test the infrastructure
"cloud". it's hard to justify not being able to do _anything_ when any AWS
service goes down, as smugmug must be figuring out by now.

here's a piece of advice: start by leasing a couple of $75 USD per month
servers. if you can, buy instead of lease. if you go bust, you can sell the
hardware on ebay, whereas with AWS you can't do any of that: it's just money
you're throwing away for 0 assets. AWS still needs to be managed, you still
need sysadmins available 24/7 so you won't save any money there. the only
thing AWS has going for it is provisioning. be smart and take advantage of
that (eg. have your own physical infrastructure and be able to send some of
the load the way of AWS if and when you need to).

~~~
bpedro
Or... use multiple cloud infrastructure solutions, creating a fail-over in case
one of them goes down. Think of this as a "Cloud Balancer".
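
A minimal sketch of what that fail-over could look like on the read path. Everything here is hypothetical: `primary_store` and `backup_store` are stand-ins for real storage clients, not any provider's actual API, and the sketch assumes the same object has already been written to both places.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is a (name, fetch) pair; fetch takes a key and returns bytes or raises.
Backend = Tuple[str, Callable[[str], bytes]]


class AllBackendsFailed(Exception):
    """Raised when every configured backend failed for a key."""


def fetch_with_failover(key: str, backends: Sequence[Backend]) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend_name, data) from the first that works."""
    errors: List[str] = []
    for name, fetch in backends:
        try:
            return name, fetch(key)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


# Hypothetical stand-ins for real storage clients.
def primary_store(key: str) -> bytes:
    raise ConnectionError("elevated error rates")


def backup_store(key: str) -> bytes:
    return b"contents of " + key.encode()


served_by, data = fetch_with_failover(
    "avatars/42.png", [("primary", primary_store), ("backup", backup_store)]
)
print(served_by, data)
```

The read side is the easy part. The cost is in keeping writes replicated to every provider so that the backup actually has the data when the primary goes down.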

~~~
trezor
And here I thought the whole cloud infrastructure was supposed to provide its
own redundancy.

If you need to first set up your site to work with a cloud, and then need to
add a cloud balancer to guarantee uptime, maybe a regular network load
balancer and old-fashioned solutions might be a better option.

At least then you have a tried and tested solution, not to mention you've got it
all under your control so things can actually be fixed.

------
andyking
I did wonder why the Panoramio site they use for Google Maps photos was
playing up. That explains it.

------
hello_moto
Dropbox went down as well. I like Dropbox a lot because it's so simple to use.
Unfortunately, they're what some people call an "Amazon S3 re-seller".
Dropbox's heartbeat depends on Amazon.

~~~
pchristensen
Fortunately, with Dropbox you still have the latest version (pre-AWS crash) on
your machine. It's much more usable in a crash than web-only services.

------
zandorg
Boingboing.net went down for me, though I had no problems with Amazon!

