
The Future Google Rackspace Power9 System - jonbaer
http://www.nextplatform.com/2016/04/06/inside-future-google-rackspace-power9-system/
======
nl
_TheNextPlatform_ is a pretty bad site. They rehash - badly - information
available elsewhere, and add a hyperactive spin on it all.

Here's the truth:

Google uses lots of compute power (insightful!)

Google isn't shifting to Power.

Google does have an active R&D program looking at Power.

 _TheNextPlatform_ misses the whole point here: that Zaius board has 32 DDR4
slots (commercially available servers from e.g. Dell max out at 24) and it has
_2 NVLINK slots_! (!!)

Those NVLINK slots are what Intel should be worried about, because that's
where Google is prepared to pay money. They are building computers that lock
them into NVidia, and they are doing it gladly.

Intel had better find a way to compete with NVidia on deep learning.

~~~
exhilaration
For anyone else wondering what NVLINK is, some links to save you a Google
search:

[https://blogs.nvidia.com/blog/2014/11/14/what-is-nvlink/](https://blogs.nvidia.com/blog/2014/11/14/what-is-nvlink/)

[http://www.tomshardware.com/news/nvidia-nvlink-boosts-perfor...](http://www.tomshardware.com/news/nvidia-nvlink-boosts-performance,28989.html)

I'd be curious to hear what Intel is developing to compete with this.

~~~
oneweekwonder
To add to the above, from Wikipedia[0]:

NVLink is a communications protocol developed by Nvidia. NVLink specifies a
point-to-point connection between a CPU and a GPU and also between a GPU and
another GPU. NVLink products introduced to date focus on the high-performance
application space.

and [1]:

NVLink – a power-efficient high-speed bus between the CPU and GPU, and between
multiple GPUs. Allows much higher transfer speeds than those achievable by
using PCI Express; estimated to provide between 80 and 200 GB/s.

[0]:
[https://en.wikipedia.org/wiki/NVLink](https://en.wikipedia.org/wiki/NVLink)

[1]:
[https://en.wikipedia.org/wiki/GeForce#PASCAL](https://en.wikipedia.org/wiki/GeForce#PASCAL)
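
To put the quoted 80-200 GB/s in perspective, here's a back-of-the-envelope
comparison of host-to-GPU transfer times. The bandwidth figures are rough
assumptions (PCIe 3.0 x16 at roughly 16 GB/s usable, NVLink at the 80 GB/s low
end of the quoted range), not benchmarks:

```python
def transfer_time_ms(size_gb: float, bandwidth_gb_s: float) -> float:
    """Time in milliseconds to move `size_gb` gigabytes at `bandwidth_gb_s` GB/s."""
    return size_gb / bandwidth_gb_s * 1000.0

batch_gb = 2.0  # e.g. a 2 GB batch of training data

# Assumed ballpark bandwidths, not measured numbers.
pcie_ms = transfer_time_ms(batch_gb, 16.0)    # PCIe 3.0 x16, ~16 GB/s
nvlink_ms = transfer_time_ms(batch_gb, 80.0)  # NVLink, low end of 80-200 GB/s

print(f"PCIe 3.0 x16: {pcie_ms:.0f} ms")  # 125 ms
print(f"NVLink:       {nvlink_ms:.0f} ms")  # 25 ms
```

At the top of the quoted range the gap would be wider still, which is why the
interconnect matters so much for multi-GPU training workloads.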

------
voltagex_
What's the current state of Power development in the Linux kernel like? I
thought it was only IBM holding the fort (via ozLabs) but this could be a big
boost.

------
cpeterso
Why does Facebook's Open Rack use a nonstandard rack size? That seems like an
obvious barrier for adoption of hardware that was designed to be a commodity.

~~~
DiabloD3
Racks were designed to fit telecom hardware originally. The Open Rack size is
designed around common computer hardware sizes.

It isn't a barrier for adoption because swapping racks out of a datacenter is
easy, and they fit on standard datacenter floor tiles.

What _is_ a barrier is that damned 48V.

Disclaimer: I run a hosting company.

~~~
rdtsc
> What is a barrier is that damned 48V.

Interesting. What is the issue with 48V? The equipment for it seemed to be
overpriced. I remember pricing out some gear, and as soon as the 48V power
option came into play, the price rose quite a bit.

Or is it that the voltage is not high enough to be efficient for a large data
center?

~~~
DiabloD3
The voltage is high enough. The way Facebook's solution works, you have
triplets of racks: the left and right racks hold computers, and the middle
rack holds networking, power distribution, and the UPS.

The computers have extremely simple power supplies that basically can't fail,
and just DC->DC convert from 48V to 12, 5, and 3.3V. The large-scale power
supplies that convert the datacenter's three-phase 240V (or whatever you're
supplying it with) down to 48V are much more efficient than the ones that
would have been in each server (which you would usually have fed something
like single-phase 208V).

Redundancy is supplied by just hooking multiple transformers in the middle
rack to the + and - terminals on each PSU, instead of a convoluted multi-
module redundant PSU (which always uses a single backplane, and backplanes in
redundant PSUs fail surprisingly frequently).

The total round-trip efficiency of this system is about as high as you can
realistically get. 80Plus Titanium is 90-95% efficient (depending on load),
but there are efficiency losses in rack-level distribution, which 48V tries to
correct.
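
The efficiency argument can be sketched with stage efficiencies in series.
The numbers below are illustrative assumptions, not Facebook's published
figures:

```python
def chain_efficiency(*stages: float) -> float:
    """Overall efficiency of conversion stages in series (each in 0..1)."""
    eff = 1.0
    for stage in stages:
        eff *= stage
    return eff

# Conventional: double-conversion AC UPS, then an 80Plus Titanium PSU
# in every server (both efficiencies are assumed for illustration).
conventional = chain_efficiency(0.92, 0.94)

# Open Rack style: one large high-efficiency AC->48V rectifier per rack
# triplet, then a simple 48V->12/5/3.3V DC-DC stage on each board.
open_rack = chain_efficiency(0.97, 0.97)

print(f"conventional: {conventional:.1%}")  # 86.5%
print(f"open rack:    {open_rack:.1%}")     # 94.1%
```

Even a few percent matters at datacenter scale: the losses turn into heat you
then pay again to remove with cooling.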

However, 48V DC can be very dangerous to work with, and a lot of tech workers
refuse to work with it. Whether or not you believe it is dangerous (I've seen
arguments that it is no more dangerous than single-phase 208V) is immaterial;
this is the opinion of a lot of workers.

48V gear is expensive if you're not in a datacenter already set up to handle
it. Facebook obviously doesn't have this problem because they build entire
datacenters from scratch.

I personally don't believe in it, because it doesn't buy me anything that
single-phase 208V doesn't give me; I don't pay enough for electricity to
justify the overhead of dealing with it.

~~~
raverbashing
The danger of 48V is debatable, but it might be a good idea to start adding
RCD protection to the power sources. Yes, this would bring your rack down (or
just one output), but that's better than having people stuck to live wires, or
tools causing short circuits.

~~~
jermy
Is this possible? Fault currents are much more obvious at higher voltages than
lower ones, and I thought most RCDs would not be made sensitive enough to
catch what might be a dangerous DC leakage.

~~~
raverbashing
Yes (though I'm not sure if there are existing products), because they work
based on the difference between the current leaving and the current returning.
Between 48V and 110V there isn't an insurmountable difference in the detection
sensitivity needed.

Theoretically you should have zero leakage, and you should also trip on larger
imbalances (though a rack wouldn't draw more than 10A, maybe?)
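
The residual-current principle described above is simple enough to sketch:
trip when outgoing and returning current no longer balance, because the
difference is leaking somewhere (possibly through a person). The 30 mA
threshold here is borrowed from common AC RCD human-protection ratings as an
assumption; a real 48V DC device might be rated differently:

```python
TRIP_THRESHOLD_A = 0.030  # assumed 30 mA residual-current trip point

def should_trip(outgoing_a: float, returning_a: float) -> bool:
    """Trip when the imbalance between outgoing and returning current
    exceeds the threshold, i.e. current is leaking outside the circuit."""
    residual = abs(outgoing_a - returning_a)
    return residual > TRIP_THRESHOLD_A

print(should_trip(8.000, 8.000))  # False: all current returns
print(should_trip(8.000, 7.950))  # True: 50 mA leaking somewhere
```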

------
ksec
They are going up against the coming Xeon E5 Broadwell + FPGA. POWER9 does
offer more memory per rack, but I don't see why Intel can't adapt with a
better memory controller.

To put it simply, what is the incentive to switch over to the POWER9 platform?

~~~
petra
So this raises all sorts of questions: Can Intel be fast enough in integrating
Altera (software + hardware + corporate...)? Which is the better FPGA
development environment, with more developer share, etc.? FPGAs could
cannibalize Intel's business - will they have an incentive problem? Do some
companies (say, in China) prefer an open processor like POWER, and will this
create an ecosystem advantage? Are there any advantageous startups to buy,
like Kandou Bus (faster interconnects), and who will buy them?

So it's not certain Intel will win.

~~~
homero
FPGAs are trash compared to ASICs

~~~
Symmetry
Only to the extent that you can afford to spend $1 million+ and a year every
time you change your algorithm. For bitcoin mining, encryption, or decoding
popular video formats, then yes, ASICs are absolutely the way to go. But there
are many cases where the algorithms you're using aren't so fixed, or where
you're not willing to put up with such long lead times.
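
The trade-off is essentially a break-even calculation on the non-recurring
engineering (NRE) cost. Using the ~$1M figure from above, and made-up
illustrative unit prices (the $20 and $500 below are assumptions, not real
quotes):

```python
def asic_breakeven_units(asic_nre: float, asic_unit: float,
                         fpga_unit: float) -> float:
    """Volume at which total ASIC cost drops below total FPGA cost:
    asic_nre + asic_unit * n < fpga_unit * n
    =>  n > asic_nre / (fpga_unit - asic_unit)"""
    return asic_nre / (fpga_unit - asic_unit)

# Assumed: $1M NRE, $20 per ASIC, $500 per FPGA board.
n = asic_breakeven_units(1_000_000, 20, 500)
print(f"ASIC pays off above ~{n:,.0f} units")  # ~2,083 units
```

And crucially, every algorithm change resets the clock: you pay the NRE and
the roughly year-long lead time again, while the FPGA just gets a new
bitstream.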

------
virtuallynathan
I wonder if the inclusion of NVLink in POWER8+ will cause Power to excel in
ML applications. It could well be quite a bit faster than x86 just due to the
memory/interconnect bandwidth.

~~~
PeCaN
NVLink and CAPI[1] both have _huge_ potential for machine learning. However, a
lot of the benefits of NVLink for ML come from GPU-to-GPU NVLink, which
doesn't require CPU support.

1\. CAPI doesn't seem to get mentioned too much around here, but imagine an
FPGA directly accessing some shared system memory. It's neat.

~~~
mikehollinger
Yeah, it's neat. (I work on stuff that exploits this). We open-sourced the
software side of our first flash IO accelerator last year. [1]

You can do some pretty cool things from a HW designer's perspective inside the
accelerator, and in the main application. Since the accelerator is cache-
coherent and able to map the same virtual addresses as a given process (and
attach to multiple processes' address spaces), the device can do "simple"
things like follow pointers, which used to require building a command / data
packet, DMA'ing it to the device, and then waiting for a response packet.
This effectively frees up the main CPU to do other things rather than wrangle
data. It also means that bottlenecks move.

[1] [https://github.com/open-power/capiflash](https://github.com/open-power/capiflash)
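
The pointer-following difference can be sketched conceptually. This is plain
Python with hypothetical names (`Node`, `coherent_accelerator_sum`,
`dma_style_sum`), not a real CAPI API; the point is only what work the host
CPU has to do in each model:

```python
class Node:
    """A node in a host-memory linked list."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def coherent_accelerator_sum(head):
    """CAPI-style: the device shares the process's address space,
    so it can walk the linked list directly, following pointers."""
    total = 0
    while head is not None:
        total += head.value
        head = head.next
    return total

def dma_style_sum(head):
    """Classic DMA device: the host CPU must first flatten the data
    into a contiguous packet; the device only sees the flat buffer."""
    packet = []
    while head is not None:  # host-side data wrangling
        packet.append(head.value)
        head = head.next
    return sum(packet)       # what the device actually computes

lst = Node(1, Node(2, Node(3)))
print(coherent_accelerator_sum(lst))  # 6
print(dma_style_sum(lst))             # 6
```

Same answer either way; the difference is that in the coherent model the
serialization step (and the CPU time it burns) disappears.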

~~~
bogomipz
So the idea is to present NAND as a memory device rather than a block device?

------
bluedino
IBM reps love to throw around the "Google is switching to IBM" line. Can they
possibly compete with IBM on price? Why isn't AMD trying to reach this market?

~~~
rdtsc
AMD would still have an AMD64 architecture, though? Or are you thinking they
should come up with a new competing architecture?

~~~
derefr
These POWER chips are an open design; thus, anyone who wants Google as a
customer and has a chip fab ready to go (like, say, AMD) _could, in theory_
just fab some up to sell to Google.

~~~
WoodenChair
AMD doesn't have a chip fab - they are fabless now.

~~~
monksy
Oh wow. Is there a good reason for this?

~~~
snuxoll
Maintaining a fab when you can't justify enough chip orders to keep it running
around the clock is expensive. As part of the spinoff they incurred penalties
because they weren't purchasing enough wafers from GlobalFoundries anyway, but
it still resulted in less bleeding than owning and managing the fabs as a
subsidiary.

------
xiaopingguo
Wasn't Google at one point all about commodity/consumer-level hardware for
their servers? This seems like a huge turnaround.

~~~
sargun
This is still largely at commodity prices / performance points. It's been
quite some time since any of their hardware has looked consumer-oriented, but
comparing this to what enterprises buy, it's apples and oranges.

[1]
[http://shop.oreilly.com/product/0636920041528.do](http://shop.oreilly.com/product/0636920041528.do)

[2]
[http://research.google.com/pubs/pub35290.html](http://research.google.com/pubs/pub35290.html)

------
transfire
I am surprised. I thought 64-bit ARM was the newness headed to the server
farms.

~~~
DannyBee
It will be there eventually. It is definitely not there now, despite what some
may have you think :)

~~~
StreamBright
Do you have some data to back this prediction? What is the biggest advantage
over x86 server processors?

~~~
DannyBee
Which, that it isn't there now? Or that it will be there eventually?

The x86 server processors have too much legacy they can't get rid of, and that
limits how far they can push it.

------
mozumder
Kinda amazing that they can fit 2 POWER9s as well as 2 FHFL PCIe x16 slots,
along with 15 drives and 2TB of memory, in 1 rack unit.

------
nickpeterson
Hey Google, sell these to other companies :)

~~~
chronid
Well, these are not POWER9, but in theory... :P

[http://www.penguincomputing.com/products/rackmount-servers/o...](http://www.penguincomputing.com/products/rackmount-servers/openpower-servers/)

~~~
nullc
Anyone have any idea on the rough prices of these systems?

Or is it "if you have to ask, it's too much for you," as seems to be the case
with the IBM POWER systems?

~~~
loeg
Talos plan to, if demand allows, sell you a bare bones POWER8 (CPU, heatsink,
and mainboard) for $3,700 USD.
[https://raptorengineeringinc.com/TALOS/prerelease.php](https://raptorengineeringinc.com/TALOS/prerelease.php)

~~~
nullc
Sadly that's $1,000 more than they were originally talking about; still low
compared to IBM's prices, but much harder to justify for most applications.

~~~
loeg
Yes, very hard to justify when a high-end 4-core Xeon outperforms it on some
benchmarks[0] and costs half as much or less. Not to mention vastly more
mature open-source floating point, compiler, etc., support, as well as
existing programs (invalid, but still) that assume x86isms.

[0]: [https://www.phoronix.com/scan.php?page=article&item=talos-wo...](https://www.phoronix.com/scan.php?page=article&item=talos-workstation&num=3)

