
Momentum is Building for ARM in HPC - Katydid
https://www.nextplatform.com/2017/06/30/momentum-building-arm-hpc/
======
deepnotderp
Is it really though?

I'm a strong believer that the future of supercomputing (well, if anyone
adopts it; HPC can be notoriously un-innovative, borderline corrupt even in
some countries...) lies in manycore processors like GPUs or their MIMD
counterparts like the Rex Neo and the Adapteva Epiphany.

I have very few kind words for the Xeon Phi. It's relatively poorly executed,
not competitive with GPUs, etc.

I'm not really sure why people think ARM can do much better though. If you're
going to go for a new ISA because supercomputing can recompile often, then you
might as well start with a clean slate ISA.

~~~
marmaduke
I agree about Xeon Phi, but how is ARM not manycore enough?

Back of the envelope: a GTX 1080 has 2560 cores, TDP 300W, 1.6 GHz, so ~12
compute-thread GHz/W. Dual 48-core ARM boards, 4x32 SIMD, TDP 100W, 2.5 GHz,
so ~10 compute-thread GHz/W. (This is assuming IPC and price are similar.)
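That arithmetic can be sanity-checked in a few lines (taking the quoted core counts, clocks, and TDPs at face value; with these inputs the GPU figure actually comes out closer to ~14 than ~12, but the ballpark conclusion is the same):

```python
# Back-of-envelope "compute-thread GHz per watt", using the figures quoted above.

def threads_ghz_per_watt(threads: int, ghz: float, tdp_w: float) -> float:
    """Aggregate parallel clock throughput per watt of TDP."""
    return threads * ghz / tdp_w

# GTX 1080 as quoted: 2560 CUDA cores at 1.6 GHz, 300 W TDP
gpu = threads_ghz_per_watt(2560, 1.6, 300)

# Dual 48-core ARM board as quoted: 96 cores x 4 SIMD lanes at 2.5 GHz, 100 W TDP
arm = threads_ghz_per_watt(96 * 4, 2.5, 100)

print(f"GPU: {gpu:.1f}  ARM: {arm:.1f}")  # GPU: 13.7  ARM: 9.6
```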

So, for SIMD-enabled workloads, there's potential to be in the same ballpark.
For non-SIMD or mixed workloads, GPU isn't even an option. For HPC in general
(not deep learning specifically), ARM may be a better long-run bet than GPU.

~~~
deepnotderp
There's a variety of even more efficient MIMD processors, e.g. the Rex Neo,
the Adapteva Epiphany, etc.

The reason I don't see ARM being very competitive is that ARM processors have
no discernible advantage compared to a clean slate ISA design. What exactly
does ARM offer that the Rex Neo doesn't?

By contrast, the Neo doesn't waste energy on OoO execution and has no caches,
making it far more efficient (especially in terms of data movement, which is a
big deal), and it has the advantage of a simpler decoder.

~~~
marmaduke
I haven't heard of that stuff to be honest, so in my case it's more likely I'd
get stuff into an FPGA than consider a new ISA...

------
silotis
Note that this "article" is an advertisement written by two ARM employees.

~~~
Katydid
It isn't an advertisement. From the bottom of the article: "Editor's Note:
This non-sponsored article was written by two participants from ARM who were
present at the workshop at the request of The Next Platform since editors were
otherwise engaged during the ISC event."

~~~
jcbeard
I can confirm... I didn't make a dime :( Just reporting on the goingarm.com
workshop.

------
marmaduke
I tried a Scaleway ARMv8 server and performance is OK. 35€/mo for 16 cores at
500 MFLOPS each on SciMark2, whereas my MBP i7 does 2200 on one core and 1700
per core with all four busy, so on such a parallel workload the ARM server
outdoes a quad-core i7. EC2 pricing for similar performance would be higher,
I'd expect.
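The aggregate comparison works out roughly like this (a sketch assuming the 500 MFLOPS figure is per core, which is the reading that makes the totals line up):

```python
# Aggregate SciMark2 throughput from the per-core figures quoted above.

arm_cores, arm_mflops_per_core = 16, 500
i7_cores, i7_mflops_per_core = 4, 1700  # ~1700/core with all four cores busy

arm_total = arm_cores * arm_mflops_per_core  # 8000 MFLOPS
i7_total = i7_cores * i7_mflops_per_core     # 6800 MFLOPS

print(arm_total, i7_total)  # 8000 6800
```

So the 16 ARM cores edge out the quad-core i7 in aggregate, despite the large single-core gap.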

That said, single core performance is far lower than x86. With high core
counts it makes sense for HPC work.

I'd really like to see a good OpenCL implementation for arm.

~~~
moonbug22
You will never see a good OpenCL implementation. Period.

------
baq
2017 is the year of ARM in servers just as it is the year of Linux desktops,
as the meme goes.

------
jabl
Seems like ARMv8 + SVE could provide some decent competition for the Xeon Phi.
In a way, both are heirs to the IBM Blue Gene line (lots of low-power cores)
and "old-school" long-vector systems like the old Crays or NEC SX.

Interesting times.

~~~
arcanus
Anything with a pulse can provide decent competition for Xeon Phi. It is well
known that these 'accelerators' barely provide speed-ups compared to existing
CPU solutions.

While Phis are easy to port to, they aren't close to GPUs for vectorization-
friendly problems. This is likely why the Aurora supercomputer has been
delayed/canceled.

~~~
jabl
I wonder why no one (?) has tried to bolt a GPU-style SIMT extension onto a
general-purpose CPU architecture. It's a more flexible programming model with
a decently(?) low additional cost compared to vector-style SIMD.

IIRC Andy Glew had some slide deck where he advocated that. _Edit_ :
https://drive.google.com/file/d/0B5qTWL2s3LcQNGE3NWI4NzQtNTBhNS00YjgyLTljZGMtNTA0YjJmMGIzNDEw/view?ddrp=1&hl=en

~~~
Arelius
My understanding is the instruction set isn't as far removed as you seem to
imply, and the huge differences are in the programming model/compiler.

AVX-512 has many of the primitives required, and ISPC ports the GPU
programming model to CPU SIMD.

~~~
jabl
Yes, AVX-512 IIRC adds scatter/gather and masking/predication, bringing it
substantially closer to the classical supercomputer long-vector ISAs than the
short-vector ISAs mainstream CPUs have been equipped with for the past decade
or so. Still, even with these improvements, as explained in Andy Glew's slide
deck I linked to, it's less flexible than a SIMT-style approach.
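As a rough illustration of what masking/predication buys, here's a sketch (numpy standing in for the vector unit; the example and names are mine, not from the slide deck) of how a data-dependent branch becomes a masked vector operation, the same trick that AVX-512 mask registers and the SIMT execution mask both rely on:

```python
import numpy as np

# Per-element branch over a vector:
#   y[i] = sqrt(x[i]) if x[i] >= 0 else 0.0
# A scalar loop branches per element; with predication (AVX-512 mask
# registers, or the SIMT execution mask on a GPU) every lane runs the
# operation and the mask controls which lanes commit their result.

x = np.array([4.0, -1.0, 9.0, -16.0, 25.0])

mask = x >= 0               # predicate: one bit per lane
y = np.zeros_like(x)        # masked-off lanes keep the fill value
y[mask] = np.sqrt(x[mask])  # only the active lanes execute/commit

print(y)  # [2. 0. 3. 0. 5.]
```

The SIMT model goes further by letting each "lane" follow arbitrary control flow, with the hardware tracking the mask for you, which is the extra flexibility Glew's deck argues for.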

------
calafrax
Every real-world performance/watt benchmark I have seen shows Intel winning,
so I am not sure how ARM will break through.

The only competitive opportunity that Intel leaves open is by artificially
limiting its CPUs to jack up its margins and it seems that AMD Epyc has been
designed from the ground up to exploit this opportunity.

Between Intel and AMD I don't see any niche left for an ARM solution to get
off the ground.

------
hpcjoe
Honestly, I've been down this road a number of times before, with ARM
specifically and the Calxeda experiment. Here is what I wrote in 2013 [1]:

"2013 was not the year of ARM. It was the year of ARM hype. But for reasons
having nothing to do with ARM itself. Basically ARM was the anti-Intel.
Everyone was looking for an anti-Intel, in order to maintain pressure on
Intel. I can’t tell you how many times I’d heard “Intel is done” or “Intel is
in trouble” in meetings with customers, or partners. This wasn’t true, though
there was a great deal of wishing it were true on the part of some. We hedged
a bit by trying to work with Calxeda. Unfortunately, as we rapidly discovered,
the claims about ARM as a viable replacement for Intel were simply not true
(specifically talking about Calxeda’s chips) at this time. Calxeda were
positioned in customers and partners minds as being “just like Intel’s maybe a
little slower and much less power”. Neither of these statements were true. The
chips were badly underpowered, and when you aggregated enough of them to do
interesting work, they ran … HOT … ."

What we found was a strong misperception in end users' minds, one I will say
was ... willfully ... placed there by interested parties: that 1 ARM core ==
1 Intel core. This is most definitely untrue (then and, as far as I can tell,
now).

I know that people really want a viable alternative to Intel, and I understand
why. But, honestly, ARM isn't it, and doesn't look like it will ever be it.

There are many reasons for this, but it boils down to actual real world
application performance, and toolchain issues. The former is the main reason
why I am betting against ARM now. The latter is made even more difficult by a
fractured ARM market. Which ARM ABI should you write for, and do you have good
optimizing compilers for?

So I am very skeptical at this moment. I've seen the song and dance before.
Then I did the testing, and it was embarrassingly bad: 200 ARM cores [2] were
not as fast as 16 Intel cores, all while consuming the same if not greater
power and offering a smaller memory address space.

I'd be happy to be proven wrong on this. Though I wouldn't bet on this
outcome.

[1] https://scalability.org/2013/12/the-evolving-market-for-hpc-part-1-recent-past/comment-page-1/

[2] https://scalability.org/2013/02/a-lightly-armed-jackrabbit-60-bay-unit/

------
geogra4
RISC-V?

~~~
DeepYogurt
Years away at best. RISC-V is where ARM was in the early 2000s.

