Has Amazon EC2 become over subscribed? (alan.blog-city.com)
97 points by blasdel on Jan 12, 2010 | 15 comments

It really bugs me that Amazon rarely admits the many faults that go on within their cloud. The status page just shows all green with no notes, even when you see multiple major sites drop off the web, report problems on Twitter, etc.

Numerous times I've had EC2 and EBS drop out of contact (or simply show huge network latency, which amounts to the same thing). Since my instances run off RAID arrays of EBS volumes, essentially everything dies until EBS reappears.

If true, does this sound like Amazon is violating the terms of use? I thought they guarantee to provide a certain level of service, such as CPU percentage, regardless of your neighbors.

I am not a virtualization wizard, but I always thought Xen and other hypervisors have CPU usage caps. If you're not getting built-in protection from noisy neighbors, then a VPS becomes just a crappier version of good old shared hosting. True? False?

P.S. I've been a Slicehost user for 3+ years, and every instance I've had there has performed pretty much as expected.

I thought they guarantee to provide a certain level of service, such as CPU percentage, regardless of your neighbors.

If they guarantee X performance but all along they were providing 2X and now they're only providing X, customers will complain. There are different ways to configure Xen so it's not clear exactly what Amazon is doing.

[...] so it's not clear exactly what Amazon is doing.

Shouldn't Amazon be providing customers with reports to this effect? It can't be that difficult to say that within each 10-minute interval i you got X_i% of a CPU, received N_i packets, etc.
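For what it's worth, a Linux guest can approximate part of such a report on its own. Here is a minimal sketch, assuming the kernel exposes steal time as the eighth counter on the aggregate `cpu` line of `/proc/stat`. The function names are made up for illustration, and none of this says anything about how Amazon actually configures Xen.

```python
def parse_cpu_line(line):
    """Return the first 8 counters (user, nice, system, idle, iowait,
    irq, softirq, steal) from an aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(x) for x in fields[1:9]]
    # Assumption: very old kernels report fewer columns; pad with zeros.
    counters += [0] * (8 - len(counters))
    return counters


def steal_percent(before, after):
    """Percentage of CPU time between two samples that was 'stolen',
    i.e. the guest was runnable but the hypervisor ran something else."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()
```

To get the per-interval figure, call `read_cpu_line()` once, sleep for the interval (say 600 seconds), call it again, and pass both lines to `steal_percent`. A steadily high value would at least suggest contention from neighbors, though it would not prove it.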

This is the sort of data Amazon should collect themselves to analyze the systems within the cloud, so the customer never (OK, rarely) sees a problem.

The author mentions the "trick" of shutting down a poorly performing instance and re-instantiating it in the hope of landing away from a bad neighbor.

This sounds reasonable, but the method could be improved by instantiating a new instance first and then removing the old one - this ensures you don't instantiate in the same location as before.

I don't know the EC2 architecture well enough to say, but there may even be ways of telling whether your new instance is located on the same problematic host as the original one (perhaps by tracerouting to the problem instance and finding it to be very nearby). If that happens, presumably you can instantiate a third and repeat until you find a suitable instance, at which point you kill the others.
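The launch-first, terminate-later loop described above could be sketched roughly like this. It is only an illustration: `launch`, `terminate` and `score` are hypothetical callables you would have to supply yourself (they are not real EC2 API calls), and `threshold` is whatever benchmark result you consider acceptable.

```python
def replace_instance(old, launch, terminate, score, threshold, max_attempts=3):
    """Keep `old` running while launching candidates; return the instance
    to keep and terminate all the others.

    launch()        -> starts and returns a new instance (hypothetical)
    terminate(inst) -> shuts an instance down (hypothetical)
    score(inst)     -> benchmark result, higher is better (hypothetical)
    """
    scored = [(score(old), old)]  # the old instance stays up for now
    chosen = None
    for _ in range(max_attempts):
        candidate = launch()
        result = score(candidate)
        scored.append((result, candidate))
        if result >= threshold:
            chosen = candidate
            break
    if chosen is None:
        # Nothing met the threshold: fall back to the best one seen,
        # which may well be the original instance.
        chosen = max(scored, key=lambda pair: pair[0])[1]
    for _, inst in scored:
        if inst is not chosen:
            terminate(inst)
    return chosen
```

Because the old instance is only terminated at the end, a fresh launch cannot simply reuse the slot it just vacated. Whether the candidates land on different hardware at all is outside this sketch's control.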

Actually, "Hey You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds" (http://people.csail.mit.edu/tromer/papers/cloudsec.pdf) documents "the tendency for EC2 to assign fresh instances to the same small set of machines", and exploits this tendency in order to co-locate a malicious instance with a victim instance. Rapid re-instantiation seems like a bad way to improve performance.

Interesting but I don't see how he can conclude "[Amazon EC2 has] deep rooted scalabilty problems at their end". One of the downsides of cloud computing is that you don't really know what's going on with the lower layers.

Incidentally, not knowing what's going on with the lower levels is also the primary benefit of cloud computing.

There's a difference between not wanting to know (lack of interest and/or lack of education) and not being able to know.

See the link cross-posted by timf.

The author makes no attempt to measure the difference between now and then, and just goes by the "feeling" that it is becoming slower. While human feelings are valid grounds for personally disliking a product, I don't see how this "article" can even be linked here.

Our "feelings" are different. Since the author does not provide a single number, and small EC2 instances still perform the way they always did, I can only conclude that their software or web app is becoming bloated, or that they don't know how to measure.

We use EC2 for our site, and our system monitoring, which checks TCP connectivity on our Elastic IP every few minutes, now reports outages almost every day. We have tried the forums, but no one seems to know how to troubleshoot this.
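For anyone who wants numbers of their own, a check of this kind can be sketched in a few lines. This is a minimal illustration and not the monitoring setup described above: the function name `check_tcp` and the one-second default timeout are assumptions.

```python
import socket
import time


def check_tcp(host, port, timeout=1.0):
    """Try to open a TCP connection to (host, port).

    Returns (ok, elapsed_seconds). ok is False when the connection is
    refused, times out, or fails for any other OS-level reason.
    """
    start = time.monotonic()
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False, time.monotonic() - start
    conn.close()
    return True, time.monotonic() - start
```

Running `check_tcp` against the Elastic IP from a machine outside EC2 every few minutes, and logging both the result and the elapsed time, would separate outright outages from latency spikes.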

I have a feeling this is why reddit feels slower nowadays.
