
A Rare Tour of Microsoft’s Hyperscale Datacenters
http://www.nextplatform.com/2016/09/26/rare-tour-microsofts-hyperscale-datacenters/
======
ChuckMcM
_" What came out of this was a realization that we were really building large
air conditioners, that we were not in the IT business but in the industrial
air conditioning business when you build a facility like this."_

This was one of the big secrets that Google learned early on. Every bit of air
you cool that isn't going into a computer is wasted. The issue is that co-
location facilities have to be ready for any kind of user equipment, but if
you own the entire data center and all the equipment inside you can design it
differently and _much_ more efficiently. It stops being computers in a
building and starts being a building sized computer.

------
chirau
Funny story about datacenters: I worked on optimizing algorithmic performance
for 4 years at one of the big 4, but I was never allowed in a datacenter because
I didn't have a green card. So my team would go on and explore whilst I had
beers in the lobby.

~~~
avar
Can you elaborate on that? Wasn't it a private facility? Why is your green
card status relevant to that? Were they hosting government assets and that was
a requirement on the government's part? In that case odd that it's "green
card" and not "citizenship".

~~~
optimuspaul
maybe by "green card" they mean a green light didn't light up when he scanned
his access card. <badum-ching>

~~~
user5994461
He likely means "green card" as in the actual green card, the US permanent resident card.

Datacenters are sensitive facilities. It's not surprising that they block
entrance to many people.

From what I've seen in Europe, I can remember a couple of places (military,
datacenters, national research centers) where "European Only" was clearly
written on the job postings.

~~~
mynameislegion
What's the green card?

~~~
jodrellblank
[https://www.usa.gov/green-cards](https://www.usa.gov/green-cards)

"A Green Card (Permanent Resident Card): Gives you official immigration status
in the United States. Entitles you to certain rights and responsibilities. Is
required if you wish to naturalize as a U.S. Citizen"

------
danohu
It will be interesting to see how these improvements trickle down.

At the moment, a few providers have the scale and skills to run datacenters
much more efficiently. But I'm guessing that within a few years there will be
some generic datacenter-in-a-container available, with efficiency not much
inferior to the big four.

At that point, we go back to the hosting market of 15 years ago. Everybody can
offer a datacenter without deep technical knowledge, and sell compute cycles
on an open cloud market.

It's like all the tiny hosting providers, except that it requires more
capital. So it becomes financialised: if you can get cheap power, low
temperatures, and good connectivity, and borrow a few million dollars cheaply,
then you're in business. But margins collapse precisely because anybody can do
it.

In the end, once we get over the transition period of this move to cloud
everything, datacenters end up like utilities.

~~~
unoti
One issue with datacenters-in-a-box and scaling is physical security. It's
possible to pack these datacenters into containers, but physical access
controls and site security are areas where there are cost benefits to
grouping lots of them together. There are also big benefits to locating these
things next to cheap power. So with today's technologies it ends up being more
economical, for a variety of reasons, to make these things a bit less tiny and
less geographically distributed.

------
eigenvalue
What really struck me from this is the complete move over to software based
networking. If all the big cloud players do this, and if you think a lot of
infrastructure will be moving over to the cloud (because of cost pressures)
over the next few years, what does that mean for Cisco and other big sellers
of hardware based networking gear? Is that whole business going to go away?

~~~
brazzledazzle
I guess they'll still have their office gear like SMB, firewall, WAN, etc. to
lean on for a while.

~~~
wmf
People are already starting to talk about whiteboxing the branch office by
running all the "value" as VMs (VNFs) on a generic x86 box.

~~~
brazzledazzle
Even in that scenario they still have a bit of an edge because they can still
take advantage of their lead on software by turning their stuff into virtual
appliances.

~~~
jodrellblank
Which they already do offer:

[http://www.cisco.com/c/en/us/products/security/virtual-
adapt...](http://www.cisco.com/c/en/us/products/security/virtual-adaptive-
security-appliance-firewall/index.html)

[http://www.juniper.net/uk/en/products-
services/security/srx-...](http://www.juniper.net/uk/en/products-
services/security/srx-series/vsrx/)

[https://www.sonicwall.com/products/sra-virtual-
appliance/](https://www.sonicwall.com/products/sra-virtual-appliance/)

[http://www.watchguard.com/products/xtmv/overview.asp](http://www.watchguard.com/products/xtmv/overview.asp)

------
DINKDINK
1.02 Winter PUE is very impressive.

I took a class on design for low-PUE implementation, and heard comments from
government data center technicians who said that, in order to comply with the
federally mandated PUE requirements, people were leaving on or turning on
zombie boxes to up their IT load.

~~~
sithadmin
>I took a class on design for low-PUE implementation, and heard comments from
government data center technicians who said that, in order to comply with the
federally mandated PUE requirements, people were leaving on or turning on
zombie boxes to up their IT load.

AFAIK, the DCOI sets a target PUE of 1.5 _or less_, so running unnecessary
workloads to meet the target PUE doesn't make much sense. I would bet there
was some other kind of tomfoolery going on (hiding the fact that the DC
overspent on efficiency when building out/upgrading the facility, or something
along those lines).

~~~
jdcarter
I took the parent's comment to mean: the datacenter had a relatively fixed
amount of power going toward the infrastructure, so they'd turn on additional
servers to add more power to the compute side of the ratio. They'd be wasting
power but the ratio of power spent on servers to power spent on infrastructure
would look better.
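To make the gaming concrete (all numbers below are invented for illustration): PUE is total facility power divided by IT power, so padding the denominator with zombie load pulls the ratio toward 1.

```python
def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_load_kw

# Assume the facility burns a roughly fixed 500 kW of overhead
# (cooling, lighting, power conversion) regardless of server activity.
overhead_kw = 500.0

honest_it_kw = 900.0
print(round(pue(overhead_kw + honest_it_kw, honest_it_kw), 2))  # 1.56, misses a 1.5 target

# Turn on 200 kW of zombie boxes: more total power burned, "better" PUE.
padded_it_kw = honest_it_kw + 200.0
print(round(pue(overhead_kw + padded_it_kw, padded_it_kw), 2))  # 1.45, now under 1.5
```

The facility wastes 200 kW more power than before, yet the reported ratio improves.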

------
gjolund
relevant xkcd [https://xkcd.com/1737/](https://xkcd.com/1737/)

~~~
mey
I immediately thought of that xkcd with this quote:

“My OpEx with the new datacenters is that I have to change the filters, and
that is really the only maintenance I have. And we have moved to a resiliency
configuration where I put more servers in each box than I need and if one
breaks, I just turn it off and wait for the next refresh cycle. The whole OpEx
changes with the delivery model of the white box. So we learned quite a bit
there, but now we have got to really scale.”

------
api
What do they use for SDN or is it proprietary? My impression is that this is a
major part of each large cloud provider's secret sauce.

~~~
doubt_me
[https://azure.microsoft.com/en-us/blog/ocp-2016-building-
on-...](https://azure.microsoft.com/en-us/blog/ocp-2016-building-on-community-
driven-innovation/)

[https://www.opendaylight.org/](https://www.opendaylight.org/)

~~~
kijiki
Microsoft doesn't use OpenDaylight.

I've never heard of any serious non-research use of it, and I've spent a good
bit of time looking. Every time I've heard a rumor, it has turned out to be
false.

------
dx034
>Microsoft shifted from outside air cooling to adiabatic cooling, where air is
blown over screens soaked with water in the walls of the datacenter to create
cool air through evaporation of the water.

I was wondering about that sentence: isn't humidifying air a bit problematic
in a datacenter? Computers and water usually don't get along that well...
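For a rough sense of the physics (the constants below are standard textbook values, not from the article): evaporating water absorbs its latent heat from the airstream, which is where the temperature drop comes from.

```python
# Approximate constants for air near room temperature (assumed values):
LATENT_HEAT_J_PER_KG = 2.26e6  # latent heat of vaporization of water
CP_AIR_J_PER_KG_K = 1005.0     # specific heat capacity of dry air

def temp_drop_k(water_kg_per_kg_air):
    """Idealized temperature drop when this much water evaporates into 1 kg of air."""
    return water_kg_per_kg_air * LATENT_HEAT_J_PER_KG / CP_AIR_J_PER_KG_K

# Evaporating just 3 grams of water into each kilogram of air:
print(round(temp_drop_k(0.003), 1))  # 6.7 (kelvin of cooling)
```

The trade-off is exactly the parent's worry: that water ends up in the supply air as humidity, so such designs have to keep the air within safe humidity limits for the hardware.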

------
daveguy
Is this what gives Microsoft that infinite scalability they are advertising
now? I'm still not sure how that works.

~~~
plandis
Nothing is infinitely scalable. Someone is lying if they tell you otherwise.

------
xorgar831
It would be interesting to know what MSFT's cost to run apps per user is.
That would be a better measurement, one that captures total cost rather than
just the PUE.

~~~
dpark
That metric primarily captures efficiency and activity of the app, rather than
efficiency of the DC. A hello world app would look amazingly efficient by this
measure, because it doesn't do anything, so you're basically just capped by
how many connections a box can support. A video processing app would look
amazingly inefficient just because it's doing so much work.

An infrequently used app also looks better by this metric than a frequently
used one. The service behind the mobile weather app you use might look more
efficient than Facebook, just because you use it once a day instead of a
dozen times.

Disclosure: Microsoft employee, not involved in our data center designs.

------
eonw
interesting, i live near Quincy and know a number of people that are helping
build their new DCs and a couple that work there. seen many pictures of the
inside and how the layout works, cool stuff. what amazes me the most is the
crap hardware they run on.

------
z3t4
I find it funny that we can now have a super-computer in our pockets, but we
still build huge expensive data-centers. I wonder if computation is like
roads: The more we can do, the more we need.

~~~
javajosh
It's a different thing reacting to one person versus reacting to a million
people, especially when those million people's actions are tightly coupled
with each other in a social graph. But if you could fit a million people's
state into memory AND also handle a very fast flow of messages on one physical
box, well, that's probably good enough for government work! But we love
stateless, wasteful architectures (wasting server CPU time, memory, and
network bandwidth), so that means you need 1,000 machines to support 1 million
people.

(That's not entirely fair, since _locality_ also requires some overhead. One
machine that supported a million users might be great if all of those people
were in one city. But usually that's not the case, so at the very least you
need a box in each of the top 100 cities, by whatever measure, the simplest
being population.)

~~~
tajen
Interesting remark. I did the same calculation and also find that there are
currently about 1,000 servers per million inhabitants on Earth (between
Amazon, Rackspace, MS, and DO). In other words, one server serves 1,000
inhabitants. But given that not everyone has access to technology, let alone
pays for cloud services, I've estimated that we've provisioned 1 server to
serve all online services (bank, electricity, govt, OSS, Netflix, Volkswagen
software, etc.) for every 100 citizens in developed areas.

It's an extremely bad, resource-intensive architecture ;)

The number of servers in private companies might be around 1 per citizen, and
the number of processors per human around 100x that (incl. mobile phones,
TVs, smart lamps). Given that a proc has 5m transistors and humans have 100m
neurons... our architecture is so bad that we're already outnumbered by
machines by a factor of 20 at least.
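Restating the parent's back-of-envelope in code (all figures are the parent's rough estimates; the 10% online fraction is my assumption, chosen to make their 1-per-1,000 and 1-per-100 numbers consistent):

```python
# Parent's rough figure: ~1,000 cloud servers per million inhabitants.
cloud_servers_per_million = 1000

people_per_server = 1_000_000 / cloud_servers_per_million
print(people_per_server)  # 1000.0 -- one cloud server per 1,000 inhabitants

# Assume only ~10% of inhabitants actually consume online services;
# then each server effectively serves 100 connected citizens.
online_fraction = 0.10
online_people_per_server = people_per_server * online_fraction
print(online_people_per_server)  # 100.0
```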

