Has Amazon EC2 become over subscribed? (2010) (alan.blog-city.com)
141 points by vishal0123 on April 22, 2013 | 72 comments

I was trying to figure out how this guy could be so behind the times - just discovered internal network latency at AWS?

Then I saw:

Published: 9:01 AM GMT, Tuesday, 12 January 2010

Yes, the year of the blog should definitely be present in the HN title.

That said, what is the deal with the internal network latency and the other points the OP touched upon today? Did Amazon get around to resolving things, or have things worsened?

My non-rigorous impression is that latency is roughly the same now as it was in 2010, i.e. not great, but at least it hasn't been getting worse.

Have there been any improvements since then?

No, it's still common to have in-house "find a good node" code that fires up and takes down machines until one that will be reliable is found.
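
A minimal sketch of what that in-house code tends to look like. Everything here is hypothetical: `launch`, `terminate`, and `benchmark` are placeholders for whatever cloud API calls and workload test a given shop actually uses, not any real EC2 client.

```python
import time

def cpu_benchmark_score():
    """Toy stand-in for a real benchmark: measures integer ops/sec."""
    start = time.perf_counter()
    total = 0
    for i in range(200_000):
        total += i * i
    return 200_000 / (time.perf_counter() - start)

def find_good_node(launch, terminate, benchmark, threshold, max_tries=10):
    """Keep launching instances until one benchmarks above `threshold`,
    terminating the underperformers along the way."""
    for _ in range(max_tries):
        node = launch()
        if benchmark(node) >= threshold:
            return node       # good neighbor found; keep it
        terminate(node)       # suspected noisy host; throw it back
    return None               # gave up after max_tries launches
```

In practice `benchmark` would run something representative of the real workload (disk seeks, a CPU loop like `cpu_benchmark_score`) and `threshold` would be calibrated against a known-good instance.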

Can you enlighten us? Why is it so slow? Are some people abusing all of that free bandwidth? Is their LAN simply not built to handle much traffic?

I don't have any inside information, but I would guess it's because their SAN is so sketchy that everyone rolls their own DFS and other chatty replication strategies, leaving data and disk traffic running on the same network topology. These days a lot of that traffic will take 4 hops within an AZ.

I'm loath to perpetuate a 3-year-old article, but...

One of the key contributing factors to this kind of network degradation of AWS (/or other cloud vendor) is the abundance of the "bad neighbor test" - where a client performs tests to see if they can achieve a 'preferred' amount of CPU/IO on the host of their new instance.

Resource-sharing rules at the host level mean that even if everyone tries to max out their instance, you still get the equal share you are entitled to and guaranteed with your instance. So what the bad-neighbor test really measures is whether you can eat into your neighbor's CPU allocation thanks to their under-use.
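
The sharing rule being described can be sketched as a toy work-conserving allocator: every guest gets at least min(guarantee, demand), and only capacity a neighbor leaves idle gets redistributed. (Purely illustrative; this is not how Xen's actual credit scheduler is tuned on EC2.)

```python
def allocate(capacity, guaranteed, demand):
    """Split `capacity` among guests: each gets its guaranteed share
    first, then spare capacity goes pro rata to guests wanting more."""
    base = [min(g, d) for g, d in zip(guaranteed, demand)]
    spare = capacity - sum(base)
    want = [d - b for d, b in zip(demand, base)]
    if spare <= 0 or sum(want) == 0:
        return base
    return [b + spare * w / sum(want) for b, w in zip(base, want)]

# If all four guests on an 8-core host max out, each gets exactly
# its 2-core guarantee; a "bonus" only appears when a neighbor idles.
```

With `allocate(8, [2, 2, 2, 2], [8, 8, 8, 8])` everyone gets exactly 2; with one idle guest, `allocate(8, [2, 2, 2, 2], [8, 8, 8, 0])` gives the other three about 2.67 cores each.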

Well, if everyone does that then the system degrades, since someone has to be the party using less than their allocation, and the number of instance slots that pass the 'bad neighbor test' becomes vanishingly small.

The overall health of the entire network would actually be better if folks dropped this practice: spread the load evenly across the instances that enjoy additional resources, and stick it out with the percentage of instances that only achieve their guaranteed minimum and no more.

My company uses another "cloud-like" vendor, and although we don't perform 'bad neighbor' tests on new instances, it is fair to say our application benefits from the fact that the majority of the instances on their network are under-utilized, so we can push into the max CPU of the host beyond the limits we pay for. Where instances do land next to 'bad neighbors' (i.e. we can only get what we paid for and no more; boo hoo, etc.) we still keep the instance but simply route around it, distributing less load to it than to other nodes in our network.

That may not be the most cost-effective approach, but the "savings" of the 'bad neighbor test' are probably negligible, and ironically, by not doing it we become the "good neighbors".

Actually, if the system is well-isolated and correctly subscribed, at worst you'll get exactly your reservation. It is not necessarily the case that someone has to lose.

Right. The problem is that a 'bad neighbor' test is often defined as failing whenever one can only get that guaranteed minimum amount of resource.

That's the part that doesn't scale.

The Amazon documentation doesn't let you pin down exactly what that guaranteed minimum is, as half the measures are things like 'I/O Performance: Moderate' or 'Compute Units: 4', or aren't specified at all (like EBS performance).

Makes sense from Amazon's perspective, of course: fewer promises to keep, more flexibility.

Can't blame people for measuring the performance empirically, in the absence of hard guarantees. It's just that this produces results that happen to be wrong.

Compute units are actually quantitatively defined. One compute unit is, to the best of my recollection as I'm on mobile, the equivalent work of a specific class of 1.7GHz CPU.

Other than the "I/O Performance", I think all of their specs are pretty well defined if you're willing to dig up the appropriate docs.

I suppose it depends on whether your application depends on ephemeral disk random or sequential I/O, EBS I/O, I/O to the internet at large, cpu cache, ram bandwidth, support for AVX instructions and so on.

To be fair it's understandable why Amazon doesn't promise these features will or won't be present - it would make their already-complicated product offering even more complicated. And for a great many applications, customers won't be sensitive to details like CPU cache and disk performance.

Seems like the simplest solution is simply to pay for what you need. How many businesses are really compute-cost-bound these days?

Since you are well-versed with this issue -- I know that Amazon offers tenancy options while creating virtual machines. Is this option not utilized because the price of single-tenancy is higher than the headache of "bad neighbor test"?

So as I mentioned we don't use EC2, we use another cloud service, but the economics and technical issues are the same.

But with EC2 Dedicated Instances (i.e. single tenancy), my guess is that if your application (or business model) relies on each of your nodes being able to utilize more than its equal share on the host, then in fact you would NOT want more than one of your instances to exist on the same physical host, in order to maximize the chances that each instance can grab all of the resources on its given host.

If this is your model, having all your instances on the same physical host would be disastrous. In fact, there's an (economic) argument for Amazon offering customers the complete opposite: pay to guarantee that no two of your instances are ever instantiated on the same physical host.

Is there any scale (like xlarge) where you are effectively the only VM on a particular physical machine (thereby obviating the issue with the small instances)?

...and here's part of why I created Uptano.


The flexibility of usage-based billing and instant provisioning is awesome, but it's really not worth giving up dedicated performance IMHO.

I upvoted because I love HN comments that offer solutions. That said... how can you charge half as much as Amazon for the same service and make a profit? I assume their margins are not that fat, so what are you cutting that they offer? Or else what secret have you discovered that no one knows? I guess since this is your business there's a chance that you won't answer, but I'm curious.

AWS uses high-end, expensive enterprise-grade parts (I believe), while Uptano is likely using standard off-the-shelf parts, likely with bulk account discounts. Each of those servers could likely be put together for $500; at $100/month, he should be making very good margins after a year or so. I have no idea about rent/electricity/etc, but with enough 1U servers the total cost per server may be only $10/month or so.

The big costs you pay for on AWS are the engineering, networking and UI development. Dedicated servers should be easier to provision and manage and he probably has a much smaller team.

So theoretically, it is possible that his prices are half as much as Amazon and he still makes a decent profit.

Amazon reserved instances cost 2 to 10 times the price of renting a private dedicated server at your random dedicated server hosting service.

Compare yourself: https://www.ovh.com/us/dedicated-servers/

That service looks absolutely awesome. Just signed up and had a look - but I miss a major thing: API.

The system is built on an internal API. We just need to expose a version!

I've been looking for this for...ages...

Thank you...so much for posting this.

This still uses virtualization though, correct? So you don't get full dedicated performance?

We're using OpenVZ for this reason. It's very close to bare metal performance, with the advantages of virtualization.

You can pay a (hefty) fee for dedicated tenancy, where your EC2 instances only share servers with other instances in your account.

The largest instances (the quad- and octuple-extra-large) are likely on their own servers, but I've never seen that explicitly confirmed anywhere.

You can pay extra for Dedicated Instances: http://aws.amazon.com/dedicated-instances/

> Dedicated Instances are Amazon EC2 instances launched within your Amazon Virtual Private Cloud (Amazon VPC) that run hardware dedicated to a single customer.

$10/hour per region? Wow.

To be fair, though:

"An additional fee is charged once per hour in which at least one Dedicated Instance of any type is running in a Region."

So it's not like they are charging $10/hr/instance; the fee amortizes over all of your dedicated instances.

Still, that's 100k a year, not including the cost of the actual instances. Not an option for smaller services unless your stuff simply can't be made to work without dedicated instances.

I doubt it would ever make sense to use this over dedicated servers, except at a large scale.

c1.xlarges are thought to be effectively the only VM on a machine. I can't dig up the article that proposed the methodology for determining this offhand.

c1.xlarges still perform far below what I'd expect.

Are VMs ever really performant?

Compared to the 26 ECU m3.2xl, the 20 ECU c1.xl underperforms it by 3x (basic web, JSON struct) in my tests.
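
Taking those figures at face value (they're the parent's measurements, not mine), the ECU ratings alone predict a far smaller gap, so most of the difference has to come from something ECUs don't capture, such as memory bandwidth or microarchitecture:

```python
m3_2xl_ecu, c1_xl_ecu = 26, 20
expected_ratio = m3_2xl_ecu / c1_xl_ecu        # 1.3x predicted from ECUs alone
observed_ratio = 3.0                           # parent's web/JSON benchmark
unexplained = observed_ratio / expected_ratio  # ~2.3x not explained by ECUs
```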

Modern hypervisors have an overhead as low as 2%, so yes, they absolutely are. It is often worth virtualizing simply for the flexibility/management that it brings, even if it's a single virtual machine on a very large box.

Somewhere between uptano and EC2 is internap (formerly Voxel.net) agileservers: http://www.internap.com/agile/

Is there a cloud provider that doesn't (yet) have these issues?

Is it possible to run any cloud service at Amazon's scale without these issues?

(genuine questions speaking from a point of ignorance)

Some of the OpenStack providers run their own storage networks using conventional SAN tech. Super expensive but more consistently performant.

My experience with SANs is that they are anything but consistent. Local storage is a better idea: fewer moving pieces to go wrong, fewer moving pieces to understand and debug, fewer possible sources of contention, and the latency is low.

SANs in a cloud environment optimize for the wrong thing. Servers by and large have a high uptime -- since their falling over is comparatively rare, this is simply a problem I've never had difficulty with. What I have had in spades, before I learned better, were database problems due to wild fluctuations in latency to the SAN.

It doesn't help that when SANs kick the bucket, they tend to affect a lot of things.

The context where SANs make sense, IMO, is when you've got a few servers which need to share stuff (VMs, or whatever). So, essentially everything can fit on one $10k 10GE switch. I've personally never screwed with anything >800TB, either.

Rather than "strictly local storage", I'd say "keep storage as local as possible", but there are absolutely times where keeping it in-chassis isn't optimal.

There are some using Ceph for the volume service who shouldn't be terribly expensive. Dreamhost for example.

Which providers are they? Do you have any experience of using them?

One thing that I think may be oversubscribed at EC2 is the API layer for controlling things like instances, ELBs, and autoscaling. This seems to be most obvious in Virginia. During the lightning storm in Virginia last summer, it seemed like API access fell off a cliff. I'm guessing that was because everyone was trying to move their services from the impacted availability zone.

I've had these issues with the micro instances before. One deployment will be so sluggish it's almost unusable, but starting up another one results in one that is just fine (micro wise).

While I understand computer scientists are not always the best writers, phrases like "Amazon do have a breaking point" make it hard to continue reading this article.

In British English, a company is plural, not singular. So this construct is correct in some dialects. And he also uses 'armour', which is again a British construct, so I'd guess he's just not using American English.

(I wonder if British law considers a corporation to be a person to the degree that US law does, or if this plural view of a corporation is pervasive in law as well as in grammar.)

This is oft-repeated, but not actually true. Most people in the UK who care about these things (for instance, sub editors in the printed press) hold that companies are singular entities. The confusion comes with sports teams, which are commonly referred to in the plural, so "Microsoft is" but "Manchester United are".

Edit: For example, here's what the Guardian's style guide says on the matter:


Of course, with something as flexible and constantly evolving (and as used and abused) as the English language, it is usually possible to find examples and counter-examples for just about anything. There are also edge cases; the Guardian refers to police forces as plural entities, I believe. Suffice to say, when I was running a back bench, singular was the order of the day when it came to company names.

I don't think sub-editors are a compelling example: people in those positions are likely to ignore the language they learned growing up and stick instead to some made up rules they believe are 'more proper'. I have definitely heard British people say 'my bank are' and 'ASDA are'.

Yes, of course you have heard people commit these errors. Others say "haitch" for the letter 'H' and still others talk about "nu-cu-lar power". We ain't all educated proper, that's for sure.

I'm intrigued that you are ready to rule out the contributions of a class of people who manipulate the written word for a living (and debate usage among themselves to the point of distraction!)

ASDA? Well, I'd use "Asda" since it is pronounced as a word, not four initials. But the company itself still seems to be struggling for consistency on that...

I'm British, and I always think of groups of people as groups of people.

But then I never won any awards for grammar.

Are you trying to say you disagree but you think you're probably wrong?

I don't think it's a case of right or wrong. It's a writing style.

But for me, when speaking and writing, I consider companies to be groups of people, and so I use the plural. And I think that is common over here in Britain.

I'm British ...

That should be explanation enough.

If it's the plural conjugation you're complaining about, know that that's correct in British English: http://english.stackexchange.com/a/1339/50.

Thanks for the link. It sounded extremely weird to me. I didn't know corporations were often plural nouns in British English.

For some more fun with American English, try http://fine.me.uk/Emonds/

Interesting link.

I'm not a linguist but parsing these sentences as an American reader, I feel a bit like this page's examples are playing games with word order and omission. The cited "prestige" grammar sounds less intuitive to me not because of the pronoun used, but because of word order and words that are left out.

Example cited as correct prestige grammar:

> They didn't give anyone that worked less than she a raise.

That sounds a little weird to my American ears, but "worked less than she did" sounds totally correct.

"Worked less than her" (cited as correct non-prestige) sounds a bit casual and informal, not sounding too jarring but not what I'd expect in decent writing. Similar to the other example of "us commuters". If I'm talking to someone I wouldn't blink if they said this, but I wouldn't see it in the New York Times. (Though this also reminds me of phrases like "me too" or "it's me", which despite being inconsistent with distinctions between subject and object in other phrases, you'd hear a lot more than "I as well" or "it is I".)

> Mary and him are late.

Sounds very wrong to me.

Thinking back to my childhood it was pretty common for kids to be a bit "confused" about using pronouns this way before 10 years old or so, so maybe there is something to the author's statement that kids learn the non-prestige form and then the educated ones are "corrected" later.

> Mary and he are late.

This still sounds weird. I'd say "he and Mary are late".

> her and us

> she and we

These sound pretty clumsy regardless of which is supposed to be used.

Your arrogance is astounding. This is why people hate Americans.

An English person, writing English in an English way, and you say you're not going to continue reading. Do you require him to write like an American? Why should an English person, writing their own language, have to follow your conventions?

"I didn't know"

Well then don't start shouting your mouth off! If you don't know, keep quiet.

I had a hard time reading that phrase, and I thought it was because he didn't spend enough time editing the article. I like to think I am a cultured person, but I truly didn't know it was grammatically correct in British English. Don't judge all Americans just because I am naive.

Your misunderstanding and dramatic response is astounding. It's ignorance, not arrogance. Say what you want about his reaction to the perceived poor grammar but it's nothing more than that.

If he didn't understand why it was written like that, how about trying to find out why, before posting a snarky message?

For the same reason you didn't bother to figure out whether it was ignorance or malice on his part before jumping to your own convenient conclusion, I would imagine.

Again, sorry I offended you. I won't make this mistake in the future.

Wow you sound extremely angry. There must be a deeper issue at play here because this is way out of proportion to the comment you are responding to.

It's common in the UK to use "do" when referring to a collective. The author is from the UK.

I'm curious what you find wrong with that sentence.

We're from America. We speak Murican.

Might have been using it like "The cattle do graze", or "Black Sabbath do have a new album coming out".

Come on. Quit downmodding people for a common misunderstanding. To any American "Amazon do" really does sound off.
