Momentum is Building for ARM in HPC (nextplatform.com)
55 points by Katydid on June 30, 2017 | 31 comments

Is it really though?

I'm a strong believer that the future of supercomputing (well, if anyone adopts it, HPC can be notoriously un-innovative, borderline corrupt even in some countries...) lies in manycore processors like GPUs or their MIMD counterparts like the Rex Neo and the Adapteva Epiphany.

I have very few kind words for the Xeon Phi. It's relatively poorly executed, is not competitive with GPUs, etc.

I'm not really sure why people think ARM can do much better though. If you're going to go for a new ISA because supercomputing can recompile often, then you might as well start with a clean slate ISA.

I agree about Xeon Phi, but how is ARM not manycore enough?

Back of the envelope: a GTX 1080 has 2560 cores, TDP 300W, 1.6 GHz, so ~12 compute thread GHz/W. Dual 48-core ARM boards, 4x32 SIMD, TDP 100W, 2.5 GHz, so ~10 compute thread GHz/W. (This assumes IPC and price are similar.)

So, for SIMD-enabled workloads, there's potential to be in the same ballpark. For non-SIMD or mixed workflows, GPU isn't even an option. For HPC in general (not deep learning specifically), ARM may be a better long-run bet than GPU.
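The back-of-the-envelope figures above can be re-derived with a few lines of Python. This is only a sketch using the numbers as stated in the comment (core counts, clocks and TDPs are taken at face value, not checked against datasheets), and it assumes "4x32 SIMD" means four 32-bit lanes per ARM core. The function name is made up for illustration.

```python
def thread_ghz_per_watt(lanes: int, clock_ghz: float, tdp_watts: float) -> float:
    """Crude efficiency metric: (parallel lanes * clock) / power budget."""
    return lanes * clock_ghz / tdp_watts

# GPU figures as quoted in the comment.
gpu = thread_ghz_per_watt(lanes=2560, clock_ghz=1.6, tdp_watts=300)

# Dual-socket ARM board: 2 sockets * 48 cores * 4 SIMD lanes (assumed reading of "4x32").
arm_lanes = 2 * 48 * 4
arm = thread_ghz_per_watt(lanes=arm_lanes, clock_ghz=2.5, tdp_watts=100)

print(f"GPU: {gpu:.1f} thread-GHz/W")  # GPU: 13.7 thread-GHz/W
print(f"ARM: {arm:.1f} thread-GHz/W")  # ARM: 9.6 thread-GHz/W
```

With these inputs the GPU comes out slightly above the ~12 quoted, and the ARM board lands at ~10, so the "same ballpark" conclusion still holds under the comment's own assumptions.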

There's a variety of even more efficient MIMD processors, e.g. the Rex Neo, the Adapteva Epiphany, etc.

The reason I don't see ARM being very competitive is that ARM processors have no discernible advantage compared to a clean slate ISA design. What exactly does ARM offer that the Rex Neo doesn't?

By contrast, the Neo doesn't waste energy on OoO and has no caches, which makes it far more efficient (especially in terms of data movement, which is a big deal), and it has the advantage of a simpler decoder.

I haven't heard of that stuff to be honest, so in my case I'm more likely to get stuff into an FPGA than to consider a new ISA...

Note that this "article" is an advertisement written by two ARM employees.

It isn't an advertisement. From the bottom of the article: Editors Note: This non-sponsored article was written by two participants from ARM who were present at the workshop at the request of The Next Platform since editors were otherwise engaged during the ISC event.

I can confirm... I didn't make a dime :( just reporting on the goingarm.com workshop.

I tried a Scaleway ARMv8 server and performance is OK. 35€/mo for 16 cores at about 500 Mflops per core on scimark2, whereas my MBP i7 does 2200 on one core and 1700 per core with all four busy, so on such a parallel workload the ARM server outdoes a quad-core i7. EC2 pricing for similar performance would be higher, I'd expect.

That said, single core performance is far lower than x86. With high core counts it makes sense for HPC work.
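To make the comparison above concrete, here is a small sketch of the arithmetic. It assumes the scimark2 scores are per-core figures (500 Mflops per ARM core; 2200 Mflops for the i7 on one core and 1700 per core with all four loaded), which is the reading under which the ARM server comes out ahead. The variable names are only illustrative.

```python
# Figures as quoted in the comment (scimark2 scores, Mflops).
ARM_CORES = 16
ARM_MFLOPS_PER_CORE = 500
ARM_EUR_PER_MONTH = 35

I7_CORES = 4
I7_MFLOPS_SINGLE = 2200           # one core busy
I7_MFLOPS_PER_CORE_LOADED = 1700  # per core with all four busy

# Aggregate throughput on an embarrassingly parallel workload.
arm_total = ARM_CORES * ARM_MFLOPS_PER_CORE
i7_total = I7_CORES * I7_MFLOPS_PER_CORE_LOADED

throughput_ratio = arm_total / i7_total                       # ARM server vs. i7, all cores
single_core_ratio = ARM_MFLOPS_PER_CORE / I7_MFLOPS_SINGLE    # one ARM core vs. one i7 core
eur_per_gflops_month = ARM_EUR_PER_MONTH / (arm_total / 1000)

print(f"ARM total: {arm_total} Mflops, i7 total: {i7_total} Mflops")
print(f"Aggregate ratio (ARM / i7): {throughput_ratio:.2f}")
print(f"Single-core ratio (ARM / i7): {single_core_ratio:.2f}")
print(f"Cost: {eur_per_gflops_month:.3f} EUR per Gflops-month")
```

Under these assumptions the ARM box has roughly 18% more aggregate throughput while each core delivers under a quarter of the i7's single-core score, which matches both halves of the comment.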

I'd really like to see a good OpenCL implementation for arm.

You will never see a good OpenCL implementation. Period.

2017 is the year of ARM in servers just as it is the year of Linux desktops, as the meme goes.

Seems like ARMv8 + SVE could provide some decent competition for the Xeon Phi. In a way both are heirs to the IBM Blue Gene line (lots of low-power cores) and "old-school" long vector systems like old Crays or NEC SX.

Interesting times.

Anything with a pulse can provide decent competition for Xeon Phi. It is well known that these 'accelerators' barely provide speed-ups compared to existing CPU solutions.

While Phi is easy to port to, it isn't close to GPUs for vectorization-friendly problems. This is likely why the Aurora supercomputer has been delayed/canceled.

I wonder why no one (?) has tried to bolt a GPU-style SIMT extension onto a general-purpose CPU architecture. It's a more flexible programming model with a decently(?) low additional cost compared to vector-style SIMD.

IIRC Andy Glew had some slide deck where he advocated that. Edit: https://drive.google.com/file/d/0B5qTWL2s3LcQNGE3NWI4NzQtNTB...

My understanding is the instruction set isn't as far removed as you seem to imply, and the huge differences are in programming model/compiler.

AVX-512 has many of the primitives required, and ISPC ports the GPU programming model to CPU SIMD.

Yes, AVX-512 IIRC adds scatter/gather and masking/predication, bringing it substantially closer to classical supercomputer long-vector ISAs than the short-vector ISAs mainstream CPUs have been equipped with for the past decade or so. Still, even with these improvements, as explained in Andy Glew's slide deck I linked to, it's less flexible than a SIMT-style approach.
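For readers unfamiliar with the terms, here is a toy Python model of what masking/predication and gather buy you. It is not AVX-512 intrinsics or ISPC code, just a sketch of the idea, and all function names are invented for illustration: a per-thread branch (the way a SIMT programmer writes it) is turned into straight-line vector code where both sides are computed for every lane and a mask picks the result.

```python
def scalar_kernel(x: int) -> int:
    """Per-thread code with a data-dependent branch, as written in a SIMT model."""
    if x % 2 == 0:
        return x // 2
    return 3 * x + 1

def masked_vector_kernel(xs: list[int]) -> list[int]:
    """Same computation in predicated SIMD style over a whole vector of lanes."""
    mask = [x % 2 == 0 for x in xs]        # one predicate bit per lane
    then_vals = [x // 2 for x in xs]       # "then" side, evaluated for all lanes
    else_vals = [3 * x + 1 for x in xs]    # "else" side, evaluated for all lanes
    # Blend: each lane keeps the value selected by its mask bit.
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

def gather(table: list[int], indices: list[int]) -> list[int]:
    """Gather: each lane loads from its own index instead of a contiguous run."""
    return [table[i] for i in indices]

lanes = [1, 2, 3, 4, 5, 6, 7, 8]
assert masked_vector_kernel(lanes) == [scalar_kernel(x) for x in lanes]
print(masked_vector_kernel(lanes))
print(gather([10, 20, 30, 40], [3, 0, 2]))
```

The masked version does the work of both branches on every lane, which is part of why the programming model and compiler matter as much as the instruction set itself.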

Where do you get the information that Aurora is canceled?

Officially I think the only thing that can be said has been said by Paul Messina: "I believe that the Aurora system contract is being reviewed for potential changes that would result in a subsequent system in a different time frame from the original Aurora system. But since that's just early negotiations, I don't think we can be anymore specific on that." Source: https://insidehpc.com/2017/06/told-aurora-morphing-novel-arc...

The big question is how I, as a developer, get access to one of these.

If you're interested in seeing more about what people are currently doing with ARM, check out the slides at http://www.goingarm.com/#2017schedule

For buying a machine, it depends on what you want. Desktops are coming out now, for example: https://softiron.com/products/overdrive-1000/

For server/HPC, there are quite a few coming online. The Cavium presenter at goingarm listed Packet (https://www.packet.net) on their slides, which might be decent for getting a cloud instance to try out. There are many others. A good place to start finding them is: http://arm-hpc.gitlab.io.

A nice thing about ARM is that you get lots of different micro-architectures under a single ISA. Once the ball gets rolling there will be many processors to choose from, which I think is a great thing (however, in full disclosure, I'm one of the authors, and I work for ARM Research on our HPC program, although all statements above are my own opinion and not those of ARM).

Thanks for pointing SoftIron out. This might be the only shipping AMD ARM solution that I have seen.

At ISC17 they had these at the Gigabyte booth... with multi-GPU configs (see GPGPU on top).


Indeed. Heck, even if we forget about SVE, where can I get my hands on an armv8 box that I could use as a desktop so I could dogfood while developing, without paying an arm and a leg for some server hardware?

Every real-world performance/watt benchmark I have seen shows Intel winning, so I am not sure how ARM will break through.

The only competitive opening Intel leaves is the one it creates by artificially limiting its CPUs to jack up its margins, and it seems that AMD Epyc has been designed from the ground up to exploit that opening.

Between Intel and AMD I don't see any niche left for an ARM solution to get off the ground.

Honestly, I've been down this road a number of times before, with ARM specifically and the Calxeda experiment. Here is what I wrote in 2013 [1]:

"2013 was not the year of ARM. It was the year of ARM hype. But for reasons having nothing to do with ARM itself. Basically ARM was the anti-Intel. Everyone was looking for an anti-Intel, in order to maintain pressure on Intel. I can’t tell you how many times I’d heard “Intel is done” or “Intel is in trouble” in meetings with customers, or partners. This wasn’t true, though there was a great deal of wishing it were true on the part of some. We hedged a bit by trying to work with Calxeda. Unfortunately, as we rapidly discovered, the claims about ARM as a viable replacement for Intel were simply not true (specifically talking about Calxeda’s chips) at this time. Calxeda were positioned in customers and partners minds as being “just like Intel’s maybe a little slower and much less power”. Neither of these statements were true. The chips were badly underpowered, and when you aggregated enough of them to do interesting work, they ran … HOT … ."

What we found was a strong misperception in end users' minds, what I will say ... willfully ... placed there by interested parties, that 1 ARM core == 1 Intel core. This is most definitely untrue (then, and as far as I can tell, now).

I know that people really want a viable alternative to Intel, and I understand why. But, honestly, ARM isn't it, and doesn't look like it will ever be it.

There are many reasons for this, but it boils down to actual real-world application performance and toolchain issues. The former is the main reason why I am betting against ARM now. The latter is made even more difficult by a fractured ARM market. Which ARM ABI should you write for, and do you have good optimizing compilers for it?

So I am very skeptical at this moment. I've seen the song and dance before. Then I did the testing. And it was embarrassingly bad. 200 ARM cores [2] were not as fast as 16 Intel cores. All while the 200 ARM cores consumed the same if not greater power, had a smaller memory address space, and were slower.

I'd be happy to be proven wrong on this. Though I wouldn't bet on this outcome.

[1] https://scalability.org/2013/12/the-evolving-market-for-hpc-...

[2] https://scalability.org/2013/02/a-lightly-armed-jackrabbit-6...


Years away at best. RISC-V is where ARM was in the early 2000s.


ARM has some distinct problems for consumer use:

1. Historical tight coupling of components (necessary for SoCs in embedded products such as smartphones) means there is no standard for building an ARM PC. This means vendor lock-in, difficulties upgrading, and even more proprietary blobs than you get with PCs.

2. No desire by vendors to change #1. PCs being customizable and upgradable piecemeal is something that vendors would like to go away. Samsung and Apple would love it if ARM desktops took over. They make their own ARM chips, so they would sell you a computer with 16GB of soldered-on memory and 256GB of built-in storage, and if you wanted to change that you'd have to buy a new machine. Want a new graphics card? New machine. Imagine all the nightmares of laptops, except that every computer is like that. For corporate, professional, and enthusiast users, this is very bad.

A lot of the scenarios you talk about are actually possible today; indeed, demos of those scenarios have been floating around for ages. The CPU architecture is not the hard part: both Google's and Microsoft's app stores have packages that run on both x86/x64 and ARM just fine. The real problems are things like real-time syncing and collaboration, real-time migration of entire app state from one device to another, and creating a good user experience around this.

Making a cool demo where one app transitions from a smartphone to a PC screen is not that hard, and you can see many tech demos of that uploaded to YouTube. Making that work across all apps, in general, and making an app UI that works just as well on Mobile as on Desktop and maybe also on Tablet, that gets harder.

With Apple the RAM and storage are soldered in already. That being said, if I am buying an Apple I know what I'm getting myself into: a very well built computer with excellent vertical integration with my iPhone and Apple Watch.

Well, since you mentioned Minecraft, all that already exists with UWP, even the hologram part.

Full disclosure: I work for Microsoft.

I've got my own dreams of this being done through process portability enabled by CRIU.
