

Has Amazon EC2 become over subscribed? (2010) - vishal0123
http://alan.blog-city.com/has_amazon_ec2_become_over_subscribed.htm

======
dotBen
_I'm loath to perpetuate a 3 year old article, but..._

One of the key contributing factors to this kind of network degradation at
AWS (or other cloud vendors) is the prevalence of the "bad neighbor test" -
where a client runs benchmarks to see whether they can achieve a
_'preferred'_ amount of CPU/IO on the host of their new instance.

Resource sharing rules at the host level actually mean that if everyone is
trying to max out their instance, you would still get the equal share you are
entitled to and guaranteed with your instance. So what the bad neighbor test
really measures is whether you can dip into your neighbor's CPU allocation
due to their under-use.
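
Concretely, the test usually amounts to something like the sketch below
(Python; the calibration threshold is invented, not anyone's real tooling):

    import time

    def cpu_benchmark(seconds=5):
        # Count loop iterations completed in a fixed wall-clock window.
        # On a contended host the same window yields fewer iterations, so
        # the score is a rough proxy for the CPU share we're getting.
        count, deadline = 0, time.time() + seconds
        while time.time() < deadline:
            count += 1
        return count

    # Hypothetical: the score this instance type gets when it receives
    # only its guaranteed share (calibrated beforehand, not real data).
    GUARANTEED_SHARE_SCORE = 25000000

    score = cpu_benchmark()
    if score <= GUARANTEED_SHARE_SCORE:
        print("bad neighbors: only our guaranteed share; discard instance")
    else:
        print("quiet host: %.1fx guaranteed share" % (score / GUARANTEED_SHARE_SCORE))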

Well, if _everyone_ does that then the system degrades, as _someone_ has to
be the party using less than their allocation, and the number of instance
slots that pass the 'bad neighbor test' becomes nonexistent.

The overall health of the entire network would actually be better if folks
dropped this practice and instead simply evened out their use: enjoy the
instances that happen to get additional resources, and stick it out with the
percentage of instances that only achieve their guaranteed minimum resource
use and no more.

My company uses another "cloud-like" vendor and although we don't perform
'bad neighbor' tests on new instances, it is fair to say our application
benefits from the fact that the majority of the instances on their network
are under-utilized, so we can push into the max CPU of the host beyond the
limits we pay for. Where our instances do have 'bad neighbors' (ie we can
only get what we paid for and no more - boo hoo, etc) we still keep the
instance but simply route around it, distributing less load to it than to
the other nodes in our network.
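
In practice that routing is just a weight per node based on measured
capacity; a minimal sketch (the throughput numbers are invented):

    import random

    # Hypothetical measured throughput per node, in requests/sec; the node
    # stuck at its guaranteed minimum simply gets a lower weight.
    nodes = {"node-a": 900, "node-b": 850, "node-c": 300}  # node-c: bad neighbors

    def pick_node():
        # Weighted random choice: slower nodes still serve, just less often.
        total = sum(nodes.values())
        r = random.uniform(0, total)
        for node, weight in nodes.items():
            r -= weight
            if r <= 0:
                return node
        return node  # floating-point edge case: fall back to the last node

    # Roughly 44% / 41% / 15% of requests land on node-a/b/c respectively.
    print(pick_node())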

That isn't the most cost-effective mechanism, but the "savings" of the _'bad
neighbor test'_ are probably negligible, and ironically, by not doing it we
become the "good neighbors".

~~~
jamesaguilar
Actually, if the system is well-isolated and correctly subscribed, at worst
you'll get exactly your reservation. It is not _necessarily_ the case that
someone has to lose.

~~~
dotBen
Right. The problem is that 'bad neighbor' is often defined as any case where
one can _only_ get that guaranteed minimum amount of resource.

That's the part that doesn't scale.

~~~
michaelt
The Amazon documentation doesn't let you pin down exactly what that
guaranteed minimum is, as half the measures are things like 'IO Performance:
Moderate' or 'Compute Units: 4', or aren't specified at all (like EBS
performance).

Makes sense from Amazon's perspective, of course - fewer promises to keep,
more flexibility.

Can't blame people for measuring the performance empirically in the absence
of hard guarantees. It's just that doing so produces results that happen to
be wrong - they measure today's neighbors, not the guarantee.
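
For example, turning 'IO Performance: Moderate' into a number usually
amounts to timing a write like this (a sketch - the path and sizes are
assumptions, and the result reflects today's neighbors, not any guarantee):

    import os
    import time

    # Hypothetical target: a file on the instance's ephemeral disk.
    PATH, BLOCK, BLOCKS = "/mnt/io_probe.bin", 1024 * 1024, 256  # ~256 MB

    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(PATH, "wb") as f:
        for _ in range(BLOCKS):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # force to disk so we time I/O, not page cache
    elapsed = time.perf_counter() - start
    os.remove(PATH)

    print("sequential write: %.1f MB/s" % (BLOCKS * BLOCK / elapsed / 1e6))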

~~~
nucleardog
Compute units actually are quantitative. One compute unit is, to the best of
my recollection (I'm on mobile), the equivalent work of a specific class of
1.7GHz CPU.

Other than the "I/O Performance", I think all of their specs are pretty well
defined if you're willing to dig up the appropriate docs.

~~~
michaelt
I suppose it depends on whether your application relies on ephemeral disk
random or sequential I/O, EBS I/O, I/O to the internet at large, CPU cache,
RAM bandwidth, support for AVX instructions, and so on.

To be fair it's understandable why Amazon doesn't promise these features will
or won't be present - it would make their already-complicated product offering
even more complicated. And for a great many applications, customers won't be
sensitive to details like CPU cache and disk performance.
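
(For what it's worth, some of those details are cheap to probe from a Linux
guest; a sketch for the AVX case, not a robust feature check:)

    # Linux-only: does this guest's CPU expose the AVX flag?
    def has_avx():
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return "avx" in line.split()
        return False

    print("AVX available:", has_avx())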

------
trotsky
I was trying to figure out how this guy could be so behind the times - just
discovered internal network latency at AWS?

Then I saw:

Published: 9:01 AM GMT, Tuesday, 12 January 2010

~~~
ultimoo
Yes, the year of the blog should definitely be present in the HN title.

That said, what is the deal today with the internal network latency and the
other points the OP touched upon? Did Amazon get around to resolving things,
or have they worsened?

~~~
seldo
My non-rigorous impression is that latency is roughly the same now as it was
in 2010, i.e. not great, but at least it hasn't been getting worse.
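
If you want more than an impression, measuring it yourself is cheap: time
TCP handshakes between two instances in the same region. A sketch - the
address is a placeholder:

    import socket
    import statistics
    import time

    # Placeholder: private address of a second instance in the same AZ,
    # listening on any open TCP port (sshd on 22 works).
    HOST, PORT, SAMPLES = "10.0.0.2", 22, 50

    rtts = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        s = socket.create_connection((HOST, PORT), timeout=2)  # handshake ~ 1 RTT
        rtts.append((time.perf_counter() - start) * 1000)
        s.close()
        time.sleep(0.1)

    print("median RTT: %.2f ms, p95: %.2f ms" %
          (statistics.median(rtts), sorted(rtts)[int(0.95 * len(rtts))]))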

------
niggler
Is there any scale (like xlarge) where you are effectively the only VM on a
particular physical machine (thereby obviating the issue with the small
instances)?

~~~
staunch
...and here's part of why I created Uptano.

<https://uptano.com>

The flexibility of usage-based billing and instant provisioning is awesome,
but it's really not worth giving up dedicated performance IMHO.

~~~
jamesaguilar
I upvoted because I love HN comments that offer solutions. That said . . .
how can you charge half as much as Amazon for the same service and make a
profit? I assume their margins are not that fat, so what are you cutting that
they offer? Or what secret have you discovered that no one else knows? I
guess since this is your business there's a chance you won't answer, but I'm
curious.

~~~
RyanZAG
AWS uses high-end, expensive enterprise-grade parts (I believe), whereas
Uptano is likely using standard off-the-shelf parts, probably with bulk
discounts. Each of those servers could be put together for $500; at
$100/month, he should be making very good margins after a year or so. I have
no idea about rent/electricity/etc, but with enough 1U servers the overhead
per server may be only $10/month or so.

The big costs you pay for on AWS are the engineering, networking, and UI
development. Dedicated servers should be easier to provision and manage, and
he probably has a much smaller team.

So theoretically, it is possible that his prices are half as much as Amazon
and he still makes a decent profit.
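
Back-of-the-envelope with the numbers above (every figure is a guess from
this thread, not real data):

    hardware = 500   # one-time cost per server, USD (guess)
    price = 100      # revenue per server per month, USD (guess)
    overhead = 10    # rent/power/network per server per month, USD (guess)

    months = 12
    profit = price * months - (hardware + overhead * months)
    print("profit per server after %d months: $%d" % (months, profit))  # $580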

------
rattray
Is there a cloud provider that doesn't (yet) have these issues?

Is it possible to run any cloud service at Amazon's scale without these
issues?

(genuine questions speaking from a point of ignorance)

~~~
rdl
Some of the OpenStack providers run their own storage networks using
conventional SAN tech. Super expensive but more consistently performant.

~~~
dmpk2k
My experience with SANs is that they are anything but consistent. Local
storage is a better idea: fewer moving pieces to go wrong, fewer moving pieces
to understand and debug, fewer possible sources of contention, and the latency
is low.

SANs in a cloud environment optimize for the wrong thing. Servers by and
large have high uptime -- since their falling over is comparatively rare,
it's a problem I've simply never had much difficulty with. What I _have_ had
in spades, before I learned better, were database problems due to wild
fluctuations in latency to the SAN.

It doesn't help that when SANs kick the bucket, they tend to affect a lot of
things.

~~~
rdl
The context where SANs make sense, IMO, is when you've got a few servers which
need to share stuff (VMs, or whatever). So, essentially everything can fit on
one $10k 10GE switch. I've personally never screwed with anything >800TB,
either.

Rather than "strictly local storage", I'd say "keep storage as local as
possible", but there are absolutely times where keeping it in-chassis isn't
optimal.

------
rbc
One thing that I think may be oversubscribed at EC2 is the API layer for
controlling things like instances, ELBs, and autoscaling. This seems most
obvious in Virginia: during the lightning storm there last summer, API
access seemed to fall off a cliff. I'm guessing that was because everyone
was trying to move their services out of the impacted availability zone.
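
When the control plane saturates like that, about all a client can do is
back off and retry; a minimal sketch (which exceptions are retryable depends
on your client library, so this catches broadly):

    import random
    import time

    def call_with_backoff(api_call, max_tries=6):
        # Retry a throttled/flaky control-plane call with exponential
        # backoff plus jitter, so callers don't stampede in lockstep.
        for attempt in range(max_tries):
            try:
                return api_call()
            except Exception:
                if attempt == max_tries - 1:
                    raise
                time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ...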

------
JoshGlazebrook
I've had these issues with the micro instances before. One deployment will
be so sluggish it's almost unusable, but starting up another results in one
that is just fine (by micro standards).

------
rabbidruster
While I understand computer scientists are not always the best writers,
phrases like "Amazon do have a breaking point" make it hard to continue
reading this article.

~~~
pcl
In British English, a company is commonly treated as plural, not singular,
so this construct is correct in some dialects. And he also uses 'armour',
which is again a British spelling, so I'd guess he's just not using American
English.

(I wonder if British law considers a corporation to be a person to the degree
that US law does, or if this plural view of a corporation is pervasive in law
as well as in grammar.)

~~~
mpclark
This is oft-repeated, but not actually true. Most people in the UK who care
about these things (for instance, sub editors in the printed press) hold that
companies are singular entities. The confusion comes with sports teams, which
are commonly referred to in the plural, so "Microsoft is" but "Manchester
United are".

Edit: For example, here's what the Guardian's style guide says on the matter:

<http://www.guardian.co.uk/styleguide/c#id-3022716>

Of course, with something as flexible and constantly evolving (and as used and
abused) as the English language, it is usually possible to find examples and
counter-examples for just about anything. There are also edge cases; the
Guardian refers to police forces as plural entities, I believe. Suffice to
say, when I was running a back bench, singular was the order of the day when
it came to company names.

~~~
diroussel
I'm British, and I always think of groups of people as groups of people.

But then I never won any awards for grammar.

~~~
mpclark
Are you trying to say you disagree but you think you're probably wrong?

~~~
diroussel
I don't think it's a case of right or wrong. It's a writing style.

But for me, when speaking and writing, I consider companies to be groups of
people, and so I use the plural. And I think that is common over here in
Britain.

