

A look beyond x86: OpenPOWER and AArch64 - fcambus
http://lvalsan.web.cern.ch/lvalsan/processor_benchmarking/presentation/

======
AnonNo15
\- Intel Xeon provides the best performance.

\- Power consumption of the Intel systems in the off, idle, and full-load
states is lower than that of the ARM64 and POWER8 ones

\- ARM 64-bit and POWER8 currently provide significantly lower performance
per watt than Intel

Damn.

~~~
ksec
I think POWER8 and the ARM parts are still on an older-generation node. Xeon
is on 22nm, with the latest Xeons on 14nm Broadwell.

So Intel wins on fab tech, and that won't change for another 5 years.

~~~
lvalsan
The Intel chips (Xeon E3-1285L v3 - Haswell and Atom C2750 - Avoton) are both
built on 22 nm, and the POWER8 processor uses the same 22 nm process size as
the Intel parts. The ARM 64-bit Applied Micro X-Gene 1 SoC tested is on 40
nm; X-Gene 2 is on 28 nm, and X-Gene 3 will be built using a 16 nm process.
X-Gene 2 is apparently sampling now, but we haven't had access to it yet, nor
have we seen any publicly available platform using it. Once we get access
(which should happen soon) to Xeon-D (Broadwell-DE), built on a 14 nm process,
the results will be added to the repository at
[http://lvalsan.web.cern.ch/lvalsan/processor_benchmarking/](http://lvalsan.web.cern.ch/lvalsan/processor_benchmarking/)

------
kev009
A little disappointed by the POWER8 data. I guess it would start paying
bigger dividends in scale-up systems, but the power and thermals are a little
shocking - I think some tuning may be needed, and AFAICT Linux may not be
running on bare metal.

Same for ARM, but I guess the sweet spot there is scale out.

Intel has been putting out amazing chips since Haswell.

We need the others to do well to keep them honest and innovative.

~~~
m_mueller
The one big advantage of the POWER architecture is memory bandwidth. POWER8
sustains 230GB/s on a 12-core chip, while a 12-core Xeon Haswell-EP gets
102GB/s. This means that for memory-bandwidth-limited problems (which >50% of
HPC applications are), you need more parallelism (= harder to program) and
more sockets to get the same speed on Xeon as on POWER8. POWER8 has, I
believe, ~50% higher TDP than Xeon, so even in performance per watt you are
still ahead for memory-bandwidth-limited problems, especially considering all
the components you need to power besides the socket.

TL;DR: these things are for a specific segment of HPC, albeit the one that
doesn't care too much about Linpack performance.

[1] [http://en.wikipedia.org/wiki/POWER8](http://en.wikipedia.org/wiki/POWER8)
[2] [http://www.pugetsystems.com/blog/2014/09/08/Memory-Performance-for-Intel-Xeon-Haswell-EP-DDR4-596/](http://www.pugetsystems.com/blog/2014/09/08/Memory-Performance-for-Intel-Xeon-Haswell-EP-DDR4-596/)

~~~
StillBored
You should be careful with the memory bandwidth comparison. The more recent
Xeons (LGA 2011-v3, DDR4-2133) are rated at 68 GB/s per socket; that's 136
GB/s in a dual-socket system.

But what I really came here to say is that POWER8 has nearly twice the
single-thread memory bandwidth available. In some of my tests a single thread
could reach ~40 GB/s vs. ~20 GB/s on the Intel. But this doesn't translate
into clear-cut wins even on codes which at first pass appear to be bandwidth
limited, mostly because the processor cores themselves are about 1/2 the
speed in tight loopy code (think nbench) running out of cache.

So in our in-house benchmarks the POWER ran anywhere from 75% as fast to
roughly as fast. Only on a few tests did it actually manage to best the E5
Xeon we were testing it against, and then never by more than 150%.

Plus, the dynamic thread controls are cool, but beyond 4 threads/core I could
never actually improve performance, and past 2 threads/core it was major
diminishing returns.

~~~
userbinator
_the processor cores themselves are about 1/2 the speed in tight loopy code
(think nbench) running out of cache._

I wonder where the bottleneck is there, since on paper the POWER8 specs beat
Haswell's: it's a 10-issue core with a 224-entry reorder buffer, vs. a
4-issue core with a 192-entry reorder buffer for Haswell. But if it takes
several POWER instructions to do the work of one x86 instruction, or if
icache bandwidth is insufficient, the advantage goes the other way.

------
ajdlinux
I'll shortly be starting at a new company where I get to do things with
OpenPOWER. I've never even touched a POWER architecture system apart from the
occasional PowerPC Mac, so this will be an interesting learning experience for
me.

~~~
e12e
That sounds like fun! Any more details? Why? How? What?

~~~
ajdlinux
I won't go into too much detail as it's still a while before I start - systems
programming at a firm that is quite heavily invested in POWER, and does a fair
chunk of work out of its Australian offices, of all places.

~~~
e12e
Ah, I see. Thanks for the update -- I must have missed the _at_ part. I
thought you were "starting a company", not starting at a company :-)

It'd be interesting to hear which company it is -- I don't really know who
uses POWER these days... email in profile in case you're more comfortable
sharing that privately. I of course understand if you feel you can't do that.

------
akuma73
X-Gene 1 is on an ancient 40nm process. Haswell is two generations ahead on
22nm FinFET technology.

The gap should close as the process technologies get closer.

~~~
tw04
How do you see the process technologies getting closer? Intel has an insane
R&D budget that it spends on its process technology. Nobody has managed to
close the gap yet; what makes you think that's going to change?

~~~
Sanddancer
Samsung is making 14nm chips in volume, has licensed that tech to
GlobalFoundries, and has demoed 10nm. TSMC has 16nm and is planning to start
low-volume production of 10nm later this year. So yeah, process tech is
pretty similar among the various big chip makers.

------
WildUtah
Page is unreadable on Mac Chrome. Content comes up and then turns into a plain
title page with no additional content. Page does not scroll or respond to
clicks anywhere.

~~~
Sanddancer
They use a (IMO silly) scroll-less setup to move through frames. Hit the up
and down arrow keys to see the other pages.

