

30% of Servers Are Sitting “Comatose” - r721
http://anthesisgroup.com/30-of-servers-are-sitting-comatose/

======
ChuckMcM
Nice ad, not.

It is interesting to look at server utilization. Google had a much-publicized
plan for power-driven computing (basically you switched on more machines as
you needed them and turned them off when they were idle), which kind of works
and kind of doesn't when you consider the complete picture of computational
dynamics. Of the many resources required (memory, compute, storage, and
networking), only compute is easily "turned off" and "turned on" again. Next
best is memory, with many non-server memory systems allowing for a low-power
standby mode; disks and networks both take a massive latency hit when powered
down.
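
A minimal sketch of that control loop, assuming a hypothetical cluster object
with power hooks and a hysteresis band so nodes don't flap on and off (nothing
here is Google's actual system; only stateless compute nodes are good
candidates, for the reasons above):

    # Hypothetical power-driven scaling loop; the cluster object
    # and its methods are illustrative, not a real API.
    import time

    TARGET_UTIL = 0.6    # aim for 60% average utilization
    DEAD_BAND = 0.15     # hysteresis so nodes don't flap on/off

    def power_control_loop(cluster):
        while True:
            util = cluster.average_utilization()  # 0.0 to 1.0
            if util > TARGET_UTIL + DEAD_BAND:
                cluster.power_on(1)    # wake a parked compute node
            elif util < TARGET_UTIL - DEAD_BAND and cluster.active_count() > 1:
                cluster.power_off(1)   # drain, then park a node
            time.sleep(60)             # re-evaluate once a minute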

And oddly, latency is the fidelity of internet interaction. We think of things
with low latency and rapid response as "good" and things with high latency and
slow response as "bad", just as an amplifier with poor frequency response is
'muddy' and one with good high-frequency response is 'crisp'. At scale, your
idle machines are essentially latency capacitors, waiting for that sudden
burst of packets that happens during the commercial breaks.

It is an area that is ripe for improvement but it needs some pretty deep
thinking to go along with that.

~~~
noir_lord
> disks and networks are both pretty much a massive latency hit when powered
> down.

Is that still true with SSDs? I can't imagine they take anything like as long
to come up as spinning rust, but I've no idea, so now I'm curious :).

~~~
thrownaway2424
Many SSDs have tremendously long startup consistency checks. An unclean
restart of a FusionIO card used to take an insane amount of time (I haven't
tried it in years; it could be better now).

~~~
Ardren
What is 'long' in this context? (milliseconds, seconds, or minutes?)

------
rythie
If you read the findings, it's basically a 3-page "case study" (i.e. an
advert), the last page of which is a screenshot of "TSO Logic", which claims
to solve this. One reference is from 2008 and another is TSO Logic's own data.

There probably is wastage, though it would be nice to see a proper, modern,
scientific study on this.

~~~
r721
Here is NRDC's study from 2014, which looks more solid:
http://www.nrdc.org/energy/files/data-center-efficiency-assessment-IP.pdf

Originally I submitted the Computerworld story [1], which is a summary of a
few studies (including NRDC's), but that submission's URL got changed to the
PDF, and today I got a mod email saying it's "a good story to repost", so I
posted the announcement this time.

[1] http://www.computerworld.com/article/2937408/data-center/1-in-3-data-center-servers-is-a-zombie.html

------
shiftpgdn
Once upon a time I worked at a major hosted-server provider. Our accounting
and billing systems were so dysfunctional that we had no method of closing out
unused servers to let them be reused for new customers.

In short, we were building new servers every time a new order came in, and
when a customer cancelled or stopped paying, their server would continue to
sit idle. An audit was eventually run, and something like 25% of our tens of
thousands of servers were simply sitting there waiting for a new client.

I think our billing developer was fired at that point, but this was so long
ago that the details have become fuzzy.

~~~
philtar
I'm surprised that this was allowed to happen and that the company hadn't gone
bankrupt.

~~~
toomuchtodo
Depending on the hardware you're buying, and your margins, dedicated servers
were/are highly profitable. More margin = bad business practices can survive
longer.

------
dopeboy
Though not directly related, I wonder how much web hosts like DO and Dreamhost
take idle VPSes into account when thinking about their business. I imagine
they're counting on some percentage of their users with hobby plans to just
have one lying around "in case". The idle stats also probably help them claim
"unlimited bandwidth & storage" (in the case of Dreamhost, anyway).

~~~
toomuchtodo
I've worked for hosting companies, and I was also a founder of one ~10 years
ago. You most certainly oversell your equipment on shared plans, as it's the
only way to make money at scale ("we'll make it up on volume!"). It's
difficult with bandwidth, but your typical brochureware/static/Joomla/Drupal
site is going to sit idle most of the time. Also, very few sites are going to
consume a significant amount of storage space, which is easily mitigated with
shared filesystems (NFS back in the day, more likely Gluster or Ceph now,
depending on the use case).

Today this lives on, of course: you'll notice that with AWS's EC2 pricing you
can get memory and CPU-time commitments, but Amazon doesn't give you any sort
of bandwidth guarantee for your instance (I don't mean transfer in the
aggregate sense; I mean how many MB/s your instance can sustain).

~~~
jldugger
For the love of god, do not deploy your webapp on gluster. Such regret.

~~~
toomuchtodo
I keep hearing that, unfortunately. Distributed POSIX filesystems are _hard_,
although in my opinion NFSv4 has come a long way (which is why Amazon is using
it for its new Elastic File System offering).

------
andmarios
I would like more information on how they reached this conclusion. “Comatose
servers are those that have not delivered information or computing services in
six months or more” doesn't tell me enough.

Are these servers unsold resources? Are they hot spares? How many bytes of
traffic and how many CPU cycles count as computing services? How much do they
cost versus the cost of the whole datacenter including employees? Looking at
other industries, is a 70% yield really that low?

~~~
jldugger
The paper makes it clear that they're not counting servers waiting to be
provisioned, but they're not so clear on how the tool makes that distinction,
or how they know to exclude hot spares.

------
SloopJon
I'd be surprised if it made a difference, but the definition of a dead or
comatose server would seem to include failover cluster members.

~~~
jpollock
Most organisations I've worked for were scaled N+1 for day-to-day reliability,
with 2N for disaster recovery. This meant that the total number of servers
required was 2(N+1). You get even more idle servers when you consider that
they were scaled for New Year's Eve traffic, which is typically 10x regular
peak traffic (phone system).
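
To make the arithmetic concrete (the numbers are made up): if N = 10 servers
carry regular peak load, you run 11 per site and 22 in total, so more than
half the fleet is idle even at peak, before the 10x holiday sizing is
considered:

    # Hypothetical N+1 plus 2N disaster-recovery capacity math
    N = 10                  # servers needed for regular peak load
    per_site = N + 1        # N+1 for day-to-day reliability
    total = 2 * per_site    # duplicate site for DR: 2(N+1) = 22
    idle = total - N        # 12 servers doing no useful work
    print(100.0 * idle / total)  # ~54.5% idle at regular peak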

30% idle is easily achievable if you're doing proper disaster recovery
(recovery when the entire office is gone due to earthquake, flood, fire, etc).

------
gukov
This is what being a "green" hosting company should mean: not having one third
of your servers sitting idle.

~~~
bhauer
Agreed. Plus encouraging developers to use higher-performance platforms and
frameworks in order to meet load requirements with fewer servers.

------
Terr_
In any complex system you'll start seeing "idleness" and "waste"... which are
sometimes a "just in case" capacity and maintenance-cost tradeoff that isn't
immediately obvious.

P.S.: I bet some biological analogies are apropos. How long would you live if
everything was redlined?

------
fragsworth
By price? Or by number? This is an important distinction. I have several tiny
VPS servers sitting idle, but they cost less than $10/month each. I do not
have any giant AWS machines sitting idle, though.

------
kaolinite
I wish DigitalOcean would keep one or two instances of each type pre-built
(perhaps only for the most popular distros, to avoid having too many unused
instances sitting around) so that creating a new VPS is instant. Of course,
some of the time - when multiple people are setting up instances - you'd have
to wait the normal 60 seconds. But sometimes it would be instant. That would
be great.

~~~
bbcbasic
Your idea is good, but it's not really relevant to the posted article, which
is about physical server utilization.

You could probably implement it just for yourself with a script, though.
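
Something like this, say: a rough sketch against DigitalOcean's v2 API that
keeps one pre-built droplet warm and renames it when you want a "new"
instance (the token, droplet name, region, size, and image slugs are all
placeholders):

    # Rough sketch only; replace YOUR_TOKEN and the slugs below.
    import requests

    API = "https://api.digitalocean.com/v2"
    HEADERS = {"Authorization": "Bearer YOUR_TOKEN"}

    def ensure_warm_spare():
        # Build a spare droplet if one isn't already waiting.
        droplets = requests.get(API + "/droplets",
                                headers=HEADERS).json()["droplets"]
        if not any(d["name"] == "warm-spare" for d in droplets):
            requests.post(API + "/droplets", headers=HEADERS, json={
                "name": "warm-spare", "region": "nyc3",
                "size": "s-1vcpu-1gb", "image": "ubuntu-20-04-x64"})

    def claim_spare(new_name):
        # "Create" an instance instantly by renaming the spare,
        # then start building its replacement for next time.
        droplets = requests.get(API + "/droplets",
                                headers=HEADERS).json()["droplets"]
        spare = next(d for d in droplets if d["name"] == "warm-spare")
        requests.post("%s/droplets/%s/actions" % (API, spare["id"]),
                      headers=HEADERS,
                      json={"type": "rename", "name": new_name})
        ensure_warm_spare()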

~~~
kaolinite
You're right - I should have pointed out that it was only a tangentially
related thought. It's something I think about whenever I boot up an instance.

