The AWS Outage: The Cloud's Shining Moment (oreilly.com)
44 points by turoczy on April 23, 2011 | 8 comments

This guy is pretty dense if he thinks his argument is specific to the cloud; it applies to any system. You control your SLA in traditional datacenters by using multiple locations or maybe tier 4 isolation.

So why would we need to worry about the same things as everyone who isn't in the cloud? That's right: the cloud doesn't solve these problems. It's not a shining moment. It's just that in these cases you'd have to be an idiot to think the cloud somehow elevates you above this problem.

Now, what the cloud does give you can be valuable, though sometimes marginal compared to problems like cascading failures outside your control because everyone happens to be sharing the same network, and so on. I'm not knocking the cloud as much as I'm knocking people who pretend it's actually elevating computing away from traditional concerns.

This week's outage raises an interesting question about hosting providers such as Heroku with their "rock-solid ruby platform". I've always taken part of their offering to be that I don't have to think about infrastructure beyond the abstraction that they provide. Perhaps incorrectly, I've been under the impression that paying for their service means they will worry about things like redundancy across availability zones or regions, leaving me free to just build the app.

I'm a huge fan of Heroku, so I'm really just trying to clarify: are these services offering to take care of my infrastructure requirements, or do I need to think about adding redundancy, failover, etc. myself?

edit: http://www.heroku.com/how/architecture#routing-mesh is what set my expectations.

I think at least part of the point is that the Law of Leaky Abstractions means you always need to think at least somewhat about the infrastructure, just not as much as they do.

I agree, and I imagine AWS will meet this expectation eventually. Keep in mind that the cloud idea is still relatively new. Just look at the history of Amazon's services[1] and you can see that this is the direction they are going. I think this marks a turning point for EBS. I said this same thing on reddit and was mocked, but can you really imagine Amazon letting this happen again?

[1] http://en.wikipedia.org/wiki/Amazon.com#Amazon_Web_Services

I don't disagree with the high-level points of the article. Amazon itself was humming along nicely during the problems in the eastern AWS region.

Decouple using SQS, SNS. Use ELB to split traffic across regions. Rely on S3 and SimpleDB for robust storage.

I am curious about how much of Amazon's internal systems rely on EBS.

ELB will not balance across regions. SimpleDB and SQS are both one region only, just across AZs. Doing stuff across regions has a whole bunch of latency and bandwidth tradeoffs, and so most of the Amazon services are tied to a region.
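
To make the cross-region point concrete: since ELB only spreads traffic across AZs inside one region, failover between regions has to live somewhere else, typically in DNS or in the client. Here is a minimal sketch of the client-side version, assuming two regional deployments of the same app. The region names, hostnames, and the `is_healthy` callback are all hypothetical placeholders, not anything AWS provides.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical regional endpoints in order of preference; not real hostnames.
REGION_ENDPOINTS = [
    ("us-east-1", "https://app.us-east.example.com"),
    ("us-west-1", "https://app.us-west.example.com"),
]


def pick_endpoint(
    endpoints: Sequence[Tuple[str, str]],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the URL of the first endpoint whose health check passes, else None."""
    for _region, url in endpoints:
        try:
            if is_healthy(url):
                return url
        except Exception:
            # A health check that raises is treated as "region unavailable".
            continue
    return None
```

In practice `is_healthy` would be something like a short-timeout HTTP GET against a status URL. Note that this only moves request routing; any data the app needs still has to be replicated to the second region, which is where the latency and bandwidth tradeoffs show up.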

I suspect Amazon doesn't use EBS much, both to avoid a point of failure and because it is a newer service.

While one should always anticipate failure and design around it, it makes no sense to blame the software designer for such a failure. Downtime should be the failure of the one providing the network resource. http://en.wikipedia.org/wiki/Chewbacca_defense

It's still down for many people... Reddit came back up, but that's not everyone. Assembla, the SVN/ticketing system, is still down.
