
Google's POWER8 server motherboard - self
https://plus.google.com/u/0/111282580643669107165/posts/Uwh9W3XiZTQ
======
nkurz
In case it helps, the larger context of this story is that IBM has spent a
couple billion dollars developing a new server CPU (POWER8) that is just about
to come on the market: [http://www.forbes.com/sites/alexkonrad/2014/04/23/ibm-
debuts...](http://www.forbes.com/sites/alexkonrad/2014/04/23/ibm-debuts-new-
power-servers-and-new-open-platform-partnership-with-google/)

They've also formed a consortium to promote this processor, of which Google is
a flagship member
([http://openpowerfoundation.org/](http://openpowerfoundation.org/)). The
expectation (or hope, or fear, depending on your point of view) is that Google
may be designing their future server infrastructure around this chip. This
motherboard is some of the first concrete evidence of this.

The chip is exciting to a lot of people not just because it offers competition
to Intel, but because it's the first potentially strong competitor to x86/x64
to appear in the server market for quite a while. By the specs, it's really
quite a powerhouse: [http://www.extremetech.com/computing/181102-ibm-
power8-openp...](http://www.extremetech.com/computing/181102-ibm-
power8-openpower-x86-server-monopoly)

~~~
raverbashing
However, the "Google model" of computation involves a huge number of cheap
"light" servers, instead of a few "big" servers (which is what the Power model
was based on).

Well, the Power architecture had some success in Apple products, but that
ended with IBM's inability to scale production and produce parts that
consumed less power

~~~
justincormack
Google's servers are not that light, and this is a dual-socket board, so 20-32
cores or so, rather than a huge 16-socket Power board (those are the real
scale-up machines), so it is not that much more scale-up. And you get more I/O
bandwidth out of Power than Intel.

~~~
rincebrain
[citation needed]? I've not seen any useful IO benchmarks of POWER in a long
time, if ever, and they've never been remotely comparable to more commodity
systems, since POWER almost always gets used in the huge systems you
mention...

~~~
valarauca1
Citation Given: [http://www.extremetech.com/computing/181102-ibm-
power8-openp...](http://www.extremetech.com/computing/181102-ibm-
power8-openpower-x86-server-monopoly)

POWER8 has 230GB/s of bandwidth to RAM compared to a Xeon's 85GB/s. That's
nearly triple (about 270%) a Xeon's memory bandwidth.

~~~
dekhn
ummm... I see STREAM copy benchmarks for Xeon reporting at least double the
number you cite.

Further, a benchmark like this is complicated. How is RAM divided between
sockets? What's the bandwidth between a CPU and memory in another socket? Etc.,
etc.

~~~
valarauca1
Unable to respond, citation not given.

But I'll respond anyway. It looks like you have GB and Gb per second
confused. One is 8x the other.

admin-magazine: claims 120Gb/s [1]

Intel claims: 246Gb/s [2]

Independent claims on intel forums range from 120-175Gb/s [3]

Intel's cut sheet for their own latest generation xeon states it only supports
25GB/s (200Gb/s) memory bandwidth [4]

[1] [http://www.admin-magazine.com/HPC/Articles/Finding-Memory-
Bo...](http://www.admin-magazine.com/HPC/Articles/Finding-Memory-Bottlenecks-
with-Stream)

[2]
[http://www.intel.com/content/www/us/en/benchmarks/server/xeo...](http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e7-v2/xeon-e7-v2-4s-stream.html)

[3] [https://software.intel.com/en-
us/forums/topic/383121](https://software.intel.com/en-us/forums/topic/383121)

[4] [http://ark.intel.com/products/75465/Intel-Xeon-
Processor-E3-...](http://ark.intel.com/products/75465/Intel-Xeon-
Processor-E3-1285-v3-8M-Cache-3_60-GHzz)

~~~
dekhn
Reporting memory bandwidth in Gb is misleading.

The page you cite here:
[http://www.intel.com/content/www/us/en/benchmarks/server/xeo...](http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e7-v2/xeon-e7-v2-4s-stream.html)

shows a triad bandwidth (STREAM) of 246,313.60 MB (megabytes) per second,
which is about 240 GB (gigaBYTES)/sec.

If I'm making a mistake I'm sure we can work it out.

~~~
justincormack
That 240GB/s is across 4 sockets, so it corresponds roughly to the 85GB/s per
socket cited above. The POWER7 it was compared to was about half the Intel
figure, so about 50GB/s per socket, while POWER8 is allegedly at 240GB/s per
socket.

~~~
dekhn
Thanks for the clarification. HN took ~30-40 minutes before it would show a
reply option for this post, so I went ahead and acknowledged the performance
difference in a reply to my original reply (ugh). Anyway, that's great to see
such high memory bandwidth per socket, rather than summed over the whole
machine.

~~~
justincormack
If you click the "link" button you can reply sooner...

------
ksec
So presumably Google will manufacture their own POWER8 CPUs. But who would
make them? TSMC? GloFo? Not IBM, since IBM will be exiting the fab business in
the near future.

I am going to guess this dual-CPU variant is aimed at the Intel Xeon E5 v2
series. The 10-12 core versions cost anywhere between $1200 and $2600,
although Google does get a huge discount for buying directly from Intel at
their volume.

Assuming the cost to make each 12-core POWER8 is $200, that is a potential
saving of $1000 per CPU, and $2000 per server.

The last estimates were around 1-1.5 million servers at Google in 2012 and
2M+ in 2013; maybe they are approaching 3M in 2014/15, even if most of those
use low-power CPUs for storage or other needs. One million CPUs made in-house
could mean savings of up to a billion dollars.

Could this kick-start the server and enterprise industry to buy POWER8 CPUs
at a much cheaper price? And once there is enough momentum and software
optimization (JVM), it could filter down to the web hosting industry as well.

In the best case scenario, this means big trouble for Intel.

~~~
hershel
In the link here they estimate the price of a single POWER8 CPU at $5000,
based on real server prices.

[http://www.extremetech.com/computing/181102-ibm-
power8-openp...](http://www.extremetech.com/computing/181102-ibm-
power8-openpower-x86-server-monopoly/2)

~~~
wmf
That may be list price, and keep in mind that a similar Xeon costs around
$4,600.

------
listic
I wonder if POWER8-based servers will be available to the mass market. I'm
not sure whether Google is interested in commoditizing POWER8 servers or just
participates in the OpenPOWER foundation to ensure that POWER-based servers
will suit their needs. The fact that Google is open about their new
motherboard hints at the former, but it's not much to go on.

I wonder how a non-Google-scale developer could even potentially get to use
POWER-based servers. Will they be available from the regular dedicated server
hosting companies? What OS could they run? RHEL does support the POWER
platform, but for a hefty price:
[https://www.redhat.com/apps/store/server/](https://www.redhat.com/apps/store/server/)
CentOS doesn't, presumably because all the POWER hardware CentOS developers
could get is either very expensive or esoteric. That likely means I don't have
to consider using POWER-based servers for at least 3 years, right?

~~~
jcastro
> What OS could they run?

Since POWER8 is little-endian now, it's pretty easy to get things running on
it. We had Ubuntu ported in one cycle and 14.04 runs sweet on POWER8. All the
compilers and the entire toolchain are ready to go. Everything in the archive
works; just apt-get install.

All of your Linux workloads will probably just work on a POWER8 server. I
started working on this server about a month ago, and had never used POWER-
anything before. I just ssh'ed in, did my work, and unless I did a uname or
noticed the URLs with the arch in them when upgrading, it acts just like my
Ubuntu x86 machines.

Yesterday at IBM Impact we deployed SugarCRM w/ MariaDB and Memcached, a
WebSphere petstore, and Hadoop (using IBM's Java), all at once, from zero to
fully deployed and serving in _173 seconds_. These machines are _fast_.

Disclaimer: I work at Canonical and helped run the demo backstage during the
POWER announcement.

~~~
listic
Wow, thanks! It's great to hear the information first-hand from someone "in
the trenches". So, I guess, Ubuntu 14.04 is running, but is it due to some
kind of last-minute special porting effort by IBM, not an official version
from Canonical? Or else, why isn't POWER support mentioned anywhere on
Ubuntu's website?
[http://www.ubuntu.com/download/server](http://www.ubuntu.com/download/server)

~~~
jcastro
Not last minute - we've been working with IBM on this as part of 14.04. It's
officially a supported platform for 5 years; here are the ISOs, and you'll be
able to get support for the entire thing from IBM and Canonical:

[http://cdimage.ubuntu.com/releases/trusty/release/](http://cdimage.ubuntu.com/releases/trusty/release/)

It's not mentioned on the website because the hardware is not publicly
available yet. We have announced it on the blog, though:

[http://insights.ubuntu.com/2014/04/28/the-ubuntu-scale-
out-a...](http://insights.ubuntu.com/2014/04/28/the-ubuntu-scale-out-and-
cloud-partner-ecosystem-expands-with-ibm-power8/)

When the machines start shipping in real life (I think they said June?) it'll
be more obvious on the main site.

All the surrounding ecosystem bits around Ubuntu will also get POWER8 support,
so PPAs will start building POWER8 binaries, and all of the deployable
services available on jujucharms.com will be available as well.

------
bhouston
Can someone explain the benefits of POWER8 as compared to Intel? I thought
the low volume of POWER8 chips (as compared to the exceedingly high-volume
Intel and ARM chips) would mean that innovation in that area would be low as
well.

~~~
rwmj
POWER is _fast_. I have remote access to a 64-way POWER7 server through work
and it really rocks.

~~~
AnthonyMouse
That's the interesting thing about POWER. Uses 250 watts? No problem. Costs
$5000? Whatever. They only seem to have one design criterion: it has to be
fast.

------
cdi
Large photo of this motherboard:
[https://www.flickr.com/photos/ibmevents/14051347355/sizes/o/](https://www.flickr.com/photos/ibmevents/14051347355/sizes/o/)

They've masked all the chips with something black. Are they hiding chips they
are using, or is this something for thermal dissipation?

~~~
wmf
Looks like typical Google "secret transparency". You can look but you won't
learn anything.

------
mark_l_watson
Two things. First, slightly off topic: is there any way this could be a
negotiating position with Intel on price?

Second: while many CPU cores (with enough I/O) are great for large Borg
map-reduce jobs, I am curious to see if Google will develop/use better
software technology for running general-purpose jobs more efficiently on many
cores. Properly written Java and Haskell (which I think Google uses a bit in
house) help, but the area seems ripe for improvement.

~~~
sp332
Google is a flagship partner in the POWER8 consortium, so I doubt it's just
for leverage against Intel.

~~~
fludlight
Google's position as a major backer of the competition gives them credibility
at the negotiating table. Intel won't give them better terms unless Google can
demonstrate a viable alternative.

------
mrweasel
Funny layout. I would like to know why the PCI slots are spread out like that.

I know Google doesn't have a standard rack setup, but still, it would make
sense to have all the expansion ports at the end of the board... No?

~~~
nuriaion
Maybe these connectors are for daughterboards with Centaur chips + RAM. (A
POWER8 can connect to 8 Centaur chips, which is where the RAM is attached.)

~~~
justincormack
They must be, as there are otherwise no RAM chips on the board...

------
z3phyr
It would be great if somebody could list modern computers for personal use
which are still based on the Power architecture.

~~~
msiebuhr
Both the Xbox 360 and PS3 use modified POWER designs.

Edit: Also, it's notable that both the Xbox One and PS4 switched to x64.

~~~
riffraff
The Wii and Wii U also have POWER-based designs, IIRC.

~~~
carey
Both are PowerPC 750-based according to Wikipedia, which is part of why the
Wii U runs games released for the Wii.

Note that PowerPC, as used in the Wii, Wii U and old Macs, is not exactly the
same as POWER, as in this announcement. The POWER architecture is used by IBM
AIX and AS/400 servers, and by the PS3 in its Cell variant.

~~~
DCKing
There is contradictory evidence about whether the Cell's PPE was a 'PowerPC'
or just a 'Power' core. In any case, it could run PowerPC software.
Incidentally, three of those exact same cores are used as the CPU of the Xbox
360.

For reference, the PowerPC 750 derivatives in the GameCube/Wii/Wii U are in
the same family as the PowerPC G3 used in Macs around the turn of the century.
It is also related to the CPU running Curiosity on Mars. So yeah, although
Nintendo is a big customer of the Power architecture at the moment, they're
not really breaking new ground.

~~~
DiabloD3
The Cell is the world's first in-order-execution PowerPC. It's similar in
design to the G3 family but has a very high clock speed. They stripped a lot
out of the chip design (such as the out-of-order execution pipeline, a lot of
the cache brains, etc.) to get the core as small and as low-power as possible
while relying on modern compilers to make the magic happen.

I'm not entirely sure they succeeded in their goals, but with how well SPEs
are used in PS3 games, I'm not sure it matters.

~~~
bodyfour
When you think about it, it makes sense for a CPU in a game console not to
include things like out-of-order execution. 100% of the software it's running
is compiled for that _exact_ machine. Therefore you can just tell developers
to use a particular set of compiler flags and get acceptable instruction
scheduling.

Contrast this with the software a PC runs -- mostly compiled to be optimized
for a "generic" x86 CPU. In fact, it may have been compiled many years before
the CPU was even designed. There is a lot more scope for runtime re-ordering
to improve execution unit utilization.

If the whole world ran Gentoo, commodity CPUs probably would be in-order too.

~~~
DiabloD3
Well, if you look at modern IBM System/360 descendants (z/Arch, etc.), this is
almost what they do. Programs are compiled to an IL bytecode, and then
recompiled during install to produce a CPU-specific binary. It's largely the
same concept.

------
fh973
This is significant. With POWER back in the game, and ARM server CPUs
arriving, Intel will again have competition.

~~~
ithkuil
I think currently there are few big customers that can afford the overhead of
porting their code and dependencies to a different architecture.

For example, I don't expect cloud providers to have a huge market soon for
non-x86 architectures. Well, there are JVM or other VM users who in theory
needn't care, as long as they don't need some native library.

In the past the battle with Intel had to be fought by providing an alternative
implementation of the x86 instruction set, for precisely the same reason:
legacy.

The mobile market proved you can achieve good performance with ARM, and
especially better performance per watt. I really can't wait to see some more
fights in this arena.

~~~
TillE
Writing architecture-agnostic C++ or C code is mostly just a matter of
avoiding some silly tricks which you shouldn't be doing anyway, and using the
correct types and sizeofs.

Porting an operating system is hard. Porting a compiler is hard. But most
applications on top of those are fairly straightforward to port, if not
completely trivial.

~~~
thrownaway2424
It's possible but not trivial. If a programmer has never been exposed to any
memory model other than the Intel PC model, they are in for big surprises when
they start using POWER. When will a write from thread A be visible to thread
B? Can this atomic load pass that other load? For multithreaded C++ programs
these are tricky questions. For other languages that always communicate
between threads using messages, maybe not, but also maybe your favorite
language runtime doesn't exist on POWER.
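To make the visibility question concrete, here is a hedged sketch (C11 atomics and pthreads, not from any particular codebase) of the classic publish/consume pattern. With acquire/release ordering it is correct on both x86 and POWER; with relaxed ordering instead, POWER's weaker memory model may legally show the stale payload, while x86's stronger ordering often hides the bug:

```c
#include <stddef.h>
#include <stdatomic.h>
#include <pthread.h>

static int payload;          /* plain data being published */
static atomic_int ready;     /* flag guarding it */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    /* Release: everything written before this store is visible to
     * a thread that acquire-loads `ready` and observes 1. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *out) {
    /* Acquire pairs with the release above. With memory_order_relaxed
     * here, reading payload == 0 after ready == 1 would be legal on
     * POWER, even though x86 usually happens to get it "right". */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin */
    *(int *)out = payload;
    return NULL;
}

/* Runs one producer/consumer pair; returns the value the consumer saw. */
static int demo(void) {
    int result = 0;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, &result);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return result;   /* always 42 with acquire/release ordering */
}
```

The same pairing exists in C++ as `std::atomic` with `std::memory_order_acquire`/`release`; the point is that the ordering must be stated explicitly rather than inherited from x86 habits.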

~~~
mikeash
They're only tricky if you're communicating data between threads without using
locks. Since this has always been a "here be dragons" area even with x86's
convenient memory model, code that gets it right for x86 but not other
architectures is pretty rare.

------
foxhill
250W TDP in a package that size... as the article correctly states, it's about
how many FLOPs you can get inside a rackmount case. that TDP alone is going to
mean that you won't be able to put that many in a single case.

a dual socket board, 500W on CPUs, 600W with everything else... the power
supply would have to be something special, but the biggest challenge there
would be getting the energy (as heat) back out of the box.

GPUs have similar TDPs and issues - that's why the HSFs on top of them are so
massive (and hence GPUs have a bit of an advantage here - they have the entire
PCIe board to fit their cooling hardware on).

finally, 4.5GHz? what the hell? in one clock cycle, a beam of light wouldn't
even get halfway across the board (EDIT: not chip). branch/cache/TLB misses
may literally kill any reasonable performance you might hope to get out of it.
intel get around this by having years of market-leading research in branch
predictors, caching models, etc. and it's going to be no mean feat to match
that.

i know IBM aren't exactly new to this game. but AFAIK x86 has always been
faster, clock for clock, than POWER.

that said, i hope my concerns are misplaced. i'm hoping intel get some
competition in the server room. it will be of benefit to everyone.

~~~
yaakov34
Light travels about 66 millimetres in 0.22 nanoseconds, and the chip is about
25 millimetres on a side, so a beam of light could still cross the chip a
couple of times in one clock cycle. Maybe you wanted to say across the
motherboard?

I don't think 4.5 GHz is somehow ridiculous when 3 GHz is routine (and POWER7
was 4.2 GHz). Hundreds of cycles of latency when accessing anything off the
chip is now routine - that's the world we live in now. I think that the
biggest problem is that IBM is not able to make the investments (especially in
semiconductor manufacturing) to match Intel's rate of bringing technology to
market. The current POWER7 is a 45-nm device if I remember correctly, and this
22-nm POWER8 is not yet on the market. Intel has been selling 22-nm Haswells
for how long now? And of course the POWER7 chips have been up against next-
generation semiconductors for most of their life.

EDIT: I see that IBM started selling POWER8 systems a few days ago. That's
close to a year later than Haswell, and what's more, this chip is likely to
compete against 14-nm processors for most of its lifetime.

~~~
sbierwagen

      Light would travel about 66 millimetres in 0.22 nanoseconds
    

Important note: the wave propagation speed in copper can be as bad as 0.42c
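Running the numbers for a 4.5 GHz clock (the 0.42c trace-speed figure is taken from the comment above):

```c
/* Distance a signal covers in one 4.5 GHz clock cycle,
 * at a given fraction of the speed of light. */
static double mm_per_cycle(double fraction_of_c) {
    const double c_mm_per_s = 2.998e11;  /* speed of light, mm/s */
    const double clock_hz = 4.5e9;
    return fraction_of_c * c_mm_per_s / clock_hz;
}
/* mm_per_cycle(1.0)  -> ~66.6 mm (vacuum)
   mm_per_cycle(0.42) -> ~28 mm  (copper trace) */
```

So even in vacuum a signal crosses a 25 mm die only a couple of times per cycle, and in a copper trace roughly once.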

------
cliveowen
Between people shifting from PCs to ARM-powered phones and major data-center
users doing their best to cut costs, this is shaping up to be a tough decade
for Intel.

~~~
fidotron
In all seriousness, I would not want to be leading Intel right now as I can't
imagine what they could actually do to escape this.

Hindsight makes Itanium look like even more of a disaster, when that energy in
that era should have gone into evolving the x86 platform for the future.
Without AMD doing what they did (x86-64) I wonder where Intel would actually
stand in the server market today.

~~~
orbifold
From what I heard, modern Intel chips basically keep up the x86 instruction
set only as a facade, and the architecture beneath is different (a much larger
number of registers, etc.). Wouldn't it potentially be a good idea to do a
clean redesign of the "frontend" and eliminate all the legacy support?

~~~
zhemao
Yes, modern Intel CPUs use a RISC-like architecture underneath. The CPU
contains a decoder unit which converts x86 instructions to RISC-like micro-
ops. Getting rid of x86 support would not be a good idea. It's their "legacy
support" which has allowed them to dominate the desktop and server market.
Porting your software to a new architecture can be a real pain.

~~~
orbifold
Couldn't the decode unit be turned into software that converts legacy software
on the fly? It seems to me that the outdated instruction set is detrimental to
innovation and power efficiency. Also, how high is the overhead in terms of
chip space and power that the decoder incurs, versus one that has to decode a
simpler instruction set? My limited understanding is that the number of
instructions issued per cycle depends heavily on the instruction set and
decode speed.

~~~
dazam
Transmeta tried to do exactly what you are proposing, i.e., have a software
layer (Code Morphing) translate the x86 instruction stream to their native
VLIW instruction set. They didn't go very far.

~~~
orbifold
In the case of Intel the underlying architecture is really fast, whereas my
understanding is that Transmeta failed because the VLIW architecture did not
really work out.

If Intel CPUs use a RISC-like architecture, nothing would prevent static
translation to it, which was not feasible for Transmeta CPUs. The x86 software
layer would then only be there to preserve backwards compatibility.

------
zurn
So they're saying it's easier to use a brand new, incompatible little-endian
Linux personality, with associated new toolchains and new ports of low-level
stuff, etc., compared to the standard Linux PPC64 stuff...

Sounds kind of surprising even if IBM did some of the bring-up work ahead of
time, but maybe they've got little-endian assumptions baked into many internal
protocols/apps.

~~~
rbanffy
Linux has supported POWER for ages. Is endianness such a big issue? Why?

~~~
sparkie
Endianness is an issue because programmers ignore it - they think "undefined
behavior" is a synonym for "not yet standardized", and the mentality of "works
on my machine" typically trumps concerns of portability.

This isn't a concern for low-level developers, such as the kernel developers -
they understand the concerns and take care to implement code in portable ways.

The issue is with user-space developers who think C and C++ are a good choice
of language, and who have no qualms about using bitfields, unguarded compiler
pragmas, and violations of the strict aliasing rule, or about failing to
specify the endianness their protocols use in the protocol itself (BOMs are
not universally used) - there is also often a failure to provide the necessary
endianness conversions in implementations of such protocols. Not to mention
the complete lack of a standard way to test the endianness of the current
machine, which typically requires violating the strict aliasing rule to check.
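One way to sidestep most of this: define the protocol's byte order explicitly and marshal with shifts, which is endian-agnostic, and reinterpret bytes with memcpy rather than pointer casts. A generic sketch (not from any particular codebase):

```c
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit little-endian field from a byte buffer.
 * Shifts operate on values, not on memory layout, so this compiles
 * to the right thing on both big- and little-endian hosts. */
static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Checking host endianness without type punning: memcpy is the
 * strict-aliasing-safe way to reinterpret object representation. */
static int host_is_little_endian(void) {
    uint32_t x = 1;
    uint8_t b;
    memcpy(&b, &x, 1);
    return b == 1;
}
```

Code written this way never needs to know the host's byte order at all; the endianness check is only shown because the comment above mentions how it is usually (mis)done.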

~~~
rbanffy
I believe it's safe to assume more endian diversity is a good thing, then.
Bugs in software and protocols will be exposed and eventually corrected.

Since most Linux distros fully support a very diverse set of machines,
endianness is usually not a problem with most of the software that's already
part of a Linux distro.

As for software developed inside Google, they hire smart people. They'll
manage.

------
sp332
Does that say "little-endian _support_ "? Like you just set a flag and all
your math switches from big-endian to little-endian?

~~~
termain
I believe Power has long been a bi-endian architecture. I gather it's a switch
thrown (in either software or hardware) at startup.

~~~
klodolph
The old PowerPC processors did this by flipping the low bits of memory
addresses when in little-endian mode, but the data lanes had to be reversed to
make this work. So back in the day, it only meant that you could use the same
chip for a little-endian design but not the same motherboard. I don't think
that's how newer POWER processors work, though.
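As I remember the old scheme (a from-memory sketch, not vendor documentation): within each aligned 8-byte doubleword, little-endian mode XORed the address by (8 − access size), so a byte load at offset 0 actually hit offset 7 of the big-endian-wired word, which is exactly the least significant byte:

```c
#include <stdint.h>
#include <stddef.h>

/* Old PowerPC "LE mode" address munging within an aligned 8-byte
 * doubleword: a naturally aligned load of `size` bytes at `addr`
 * really accesses addr XOR (8 - size). (Sketch from memory.) */
static uintptr_t le_mode_addr(uintptr_t addr, size_t size) {
    return addr ^ (8 - size);
}

/* With the memory array wired big-endian, the munged byte address 0
 * lands on the least significant byte, so data *appears* little-endian
 * to software without reversing any data lanes per access. */
static uint8_t le_mode_load_u8(const uint8_t dword_be[8], size_t offset) {
    return dword_be[le_mode_addr(offset, 1)];
}
```

For example, with the 64-bit value 0x0011223344556677 stored big-endian, a byte load at offset 0 in LE mode returns 0x77, the low byte, as a little-endian machine would.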

~~~
KMag
Correction: a modified memory controller wasn't necessary as long as all of
your memory accesses were naturally aligned. As I remember, in little-endian
mode, unaligned accesses would trap to the kernel, so kernel authors could
include code to fix things up, at a huge performance penalty for unaligned
access.

Most architectures that support unaligned access have a small penalty for it
anyway, and some architectures don't support it at all (does anyone remember
Netscape Navigator on Solaris SPARC crashing with SIGBUS much more often than
the same Navigator release crashing on x86? At least Solaris 6/7 didn't
include kernel code to emulate unaligned memory access on SPARC), so it's
best to avoid unaligned memory access in C code.

I don't recall the JVM specification forcing a particular object layout on an
implementation, and I believe most JVM implementations naturally align all
object fields rather than packing them for minimum space usage. I believe an
implementation could reorder the fields in order to optimally pack them while
avoiding unaligned accesses, at the cost of breaking any hand optimization of
locality of reference made by the programmer. However, I think the space
savings for almost all programs would be very meager.

------
teepo
Would these be too pricey as hypervisors for cloud compute? They seem ideal
for CPU- and thread-intensive applications like databases and on-demand
transcoding.

What are some use cases for a server like this for Google? I'd love to see
these available in the IBM Cloud (SoftLayer) but I think they will be too
pricey and reserved for enterprise.

~~~
huslage
You can also logically partition these beasts into multiple real servers. Who
needs a hypervisor when you can have 96 "real" servers sharing the same
hardware?

~~~
sp332
Memory bandwidth would be a nightmare, not to mention every other kind of I/O.

~~~
mzs
There are some tricks for IO:
[http://www.redbooks.ibm.com/redpieces/abstracts/redp5065.htm...](http://www.redbooks.ibm.com/redpieces/abstracts/redp5065.html)

------
Corrado
I think it's interesting that they didn't include the "traditional"
mouse/keyboard/VGA ports. Not particularly surprising since this is a server
motherboard, but still interesting. I think I do see an HDMI connector in the
lower right next to a tall silver port (possibly a USB connector).

------
jmnicolas
It's a bit short on details, IMO: where are the specs, the benchmarks, etc.?

~~~
jacquesm
It's actually quite impressive that Google would open up this much of their
secret sauce; a lot can be gleaned from looking at this board. You can bet
that this is not exactly revision one (and you can bet as well that this is
likely not their latest and greatest - no need to show off more than you have
to; competitive edges are pretty thin).

When I see stuff like this it is painfully clear that, from a technological
perspective, a company like DuckDuckGo has a huge moat to cross before it can
begin to be a serious contender. Think about it for a second: the company
you're trying to compete with is operating at such economies of scale that it
can afford to have its own custom motherboards and non-standard expansion
boards made.

------
peterfisher
I love it when Google announces something through Google+.

