
The reason it is extremely hard to engineer robust large-scale AWS cloud apps can be summarized under the umbrella of performance variance:

  - machine latency varies more, and you can't control it

  - network latency varies more

  - storage latency varies more (S3, Redshift, etc.)

  - machine outages are more frequent

where "more" can mean an order of magnitude more variation than on bare-metal deployments. I am not saying the performance is that much worse, only that it will vary unpredictably for a given instance. The interference is non-Gaussian and can come in bursts, as opposed to easy-to-model-and-anticipate white noise.
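The per-instance variance compounds with fan-out: a request that touches many machines waits on the slowest one. A quick back-of-the-envelope sketch (the numbers are purely illustrative):

```python
# If each server lands in its latency tail independently with
# probability p, a request fanning out to n servers is stuck waiting
# on the slowest, so it is slow with probability 1 - (1 - p)**n.
def prob_request_slow(p, n):
    return 1 - (1 - p) ** n

# A 1-in-100 hiccup per machine becomes the common case at scale:
for n in (1, 10, 100):
    print(n, round(prob_request_slow(0.01, n), 2))
```

With p = 1% and a fan-out of 100, roughly two thirds of requests hit at least one slow server, which is why the per-node burstiness matters so much more in the cloud.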

It's a lot harder to engineer cloud-scale software so that it scales robustly and does not degrade in latency when running on a large number of nodes. For example, see [1]
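One of the tail-tolerant techniques described in [1] is the "backup request": if the primary replica hasn't answered within some small budget, fire the same request at a second replica and take whichever answers first. A minimal sketch, assuming hypothetical `call_replica` RPCs with a simulated tail:

```python
import concurrent.futures
import random
import time

def call_replica(replica_id):
    # Simulated RPC: occasionally a replica stalls (the latency tail).
    delay = 0.5 if random.random() < 0.1 else 0.01
    time.sleep(delay)
    return f"reply-from-{replica_id}"

def hedged_request(replicas, hedge_after=0.05):
    # Send to the primary; if no reply within hedge_after seconds,
    # send a backup request to a second replica and return whichever
    # response arrives first.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(call_replica, replicas[0])
        try:
            return first.result(timeout=hedge_after)
        except concurrent.futures.TimeoutError:
            second = pool.submit(call_replica, replicas[1])
            done, _ = concurrent.futures.wait(
                [first, second],
                return_when=concurrent.futures.FIRST_COMPLETED)
            return done.pop().result()

print(hedged_request(["a", "b"]))
```

This trades a small amount of extra load (only the requests that miss the budget are duplicated) for a much shorter tail, since both replicas stalling at once is far less likely than either one alone.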

Most open-source cloud software does not come with these algorithms batteries-included, and it is not trivial to retrofit this kind of logic. Being smart about load balancing alone won't cut it when, at any given moment, one of your nodes can become 10x slower than the others even though your code is sound and does not itself slow down like that.
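To make the retrofitting point concrete, here is a toy sketch of the kind of adaptive logic you'd have to bolt on: routing proportionally to the inverse of an exponentially weighted moving average (EWMA) of each node's observed latency, so a node that suddenly turns 10x slower is organically demoted. All names and parameters are hypothetical:

```python
import random

class LatencyAwareBalancer:
    # Weight each node by 1 / EWMA(latency); slow nodes get less traffic.
    def __init__(self, nodes, alpha=0.2):
        self.alpha = alpha
        self.ewma = {n: 0.01 for n in nodes}  # optimistic initial estimate

    def pick(self):
        nodes = list(self.ewma)
        weights = [1.0 / self.ewma[n] for n in nodes]
        return random.choices(nodes, weights=weights)[0]

    def record(self, node, latency):
        # Blend the new observation into the running latency estimate.
        self.ewma[node] = (1 - self.alpha) * self.ewma[node] + self.alpha * latency

lb = LatencyAwareBalancer(["n1", "n2", "n3"])
for _ in range(200):
    n = lb.pick()
    lb.record(n, 0.1 if n == "n2" else 0.01)  # n2 has gone 10x slower
# After enough feedback, n2's share of traffic drops well below 1/3.
```

Note even this only reacts after the fact; it does nothing for the requests already in flight when a node degrades, which is why techniques like the backup requests in [1] are needed on top.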

Conversely, what you lose in AWS convenience and "free" maintenance, you gain in simpler RPC/messaging/fault-tolerance/storage infrastructure that can sometimes accommodate an order of magnitude more traffic or users per machine than if deployed in AWS.

[1] http://research.google.com/people/jeff/latency.html

Great comment! Thanks a lot for sharing this.
