
The AWS Outage: The Cloud's Shining Moment - turoczy
http://broadcast.oreilly.com/2011/04/the-aws-outage-the-clouds-shining-moment.html
======
strmpnk
This guy is pretty dense if he thinks his argument is any different for the
cloud than for any other system. You control your SLA in traditional
datacenters by using multiple locations or maybe tier 4 isolation.

Now why would we need to worry about the same things as everyone else who
isn't in the cloud? That's right: the cloud doesn't solve these problems. It's
not a shining moment. It's just that in these cases you'd have to be an idiot
to think the cloud somehow frees you from this problem.

Now, what the cloud does give you can be valuable, though sometimes marginal in
comparison to problems like cascading failures outside your control because
everyone happens to be sharing the same network, and so on. I'm not knocking
the cloud as much as I'm knocking people who pretend it's actually elevating
computing away from traditional concerns.

------
ollysb
This week's outage raises an interesting question about hosting providers
such as Heroku with their "rock-solid ruby platform". I've always taken part
of their offering to be that I don't have to think about infrastructure beyond
the abstraction that they provide. Perhaps incorrectly, I've been under the
impression that paying for their service means they will worry about things
like redundancy across availability zones or regions, leaving me free to just
build the app.

I'm a huge fan of Heroku, so I'm really just trying to clarify: are these
services offering to take care of my infrastructure requirements, or do I need
to think about adding redundancy/failover etc. myself?

edit: <http://www.heroku.com/how/architecture#routing-mesh> is what set my
expectations.

~~~
cgranade
I think at least part of the point is that the Law of Leaky Abstractions means
you always need to think at least somewhat about the infrastructure, just not
as much as they do.

------
mark_l_watson
I don't disagree with the high level points of the article. Amazon itself was
humming along nicely during the problems in the eastern AWS region.

Decouple using SQS, SNS. Use ELB to split traffic across regions. Rely on S3
and SimpleDB for robust storage.

I am curious about how much of Amazon's internal systems rely on EBS.

~~~
justincormack
ELB will not balance across regions. SimpleDB and SQS are both single-region
only, replicated just across AZs. Doing stuff across regions involves a whole
bunch of latency and bandwidth tradeoffs, so most Amazon services are tied to
a region.

I suspect Amazon doesn't use EBS much, both to avoid a point of failure and
because it is a newer service.

------
phektus
While one should always anticipate failure and design around it, it makes no
sense to blame the software designer for such a failure. Downtime should be
the failure of the one providing the network resource.
<http://en.wikipedia.org/wiki/Chewbacca_defense>

------
AndyNemmity
It's still down for many people... Reddit came up, but that's not everyone.
Assembla, the SVN/ticketing system, is still down.

