
Can Vector Supercomputing Be Revived? - jcbeard
https://www.nextplatform.com/2017/10/26/can-vector-supercomputing-revived/
======
jcbeard
Better programmable gather/scatter (as described here:
[https://www.nextplatform.com/2017/09/14/shedding-light-dark-
bandwidth/](https://www.nextplatform.com/2017/09/14/shedding-light-dark-
bandwidth/)) can definitely open up a wider range of applications to
vectorization.

~~~
CyberDildonics
What do you mean by programmable gather/scatter? GPUs already do efficient
gather and scatter operations. I think Knights Landing AVX-512 even has
efficient gather and scatter.

~~~
jabl
Read the linked article, and the paper linked from there. Basically the idea
is that gather/scatter can be very inefficient from a cache and bandwidth
perspective: in the worst case you're using only a single element per cache
line. So the idea is to "move" the scatter/gather engine to the memory
controller, and pack the vectors already in the cache rather than in the
register file.

Will it work in reality? No idea, but it's an interesting idea certainly worth
exploring.

------
jabl
More generally, I'd really like "real" vector ISAs to become mainstream [1].
We've suffered from the scourge of packed-SIMD long enough, thank you very
much.

ARM SVE, RISC-V V extension, and indeed to an extent AVX-512 look pretty good.

[1] Not saying that every chip must dedicate a huge portion of its die area
to a monster vector unit; I'd just like the ISA to be there so programmers
and compilers can target it.

------
payne92
It’s very, very tough to compete with the economies of scale and ecosystem
of modern GPU computing.

Vector supercomputing doesn’t need to be “revived”; it is already here.

~~~
jcbeard
Depends. Right now they're in two different spaces: one still hanging on to
graphics/gaming, the other coming from the dense-compute space. We'll see how
long it takes them to converge.

~~~
payne92
For all intents and purposes, they've already converged: the underlying GPU
microarchitectures have been fairly general purpose SIMD-ish for a long time
now.

And (as one example), CUDA happily runs on any gamer "consumer" GPU.

------
davidad_
Those mining memory-bandwidth-hard cryptocurrencies, like zcash, may consider
evaluating these. According to the article, they’ll have 1200 GB/s, vs 900
GB/s for the top Nvidia Volta card. (Of course, it’s quite likely that this
increase in memory bandwidth isn’t worth it for reasons of cost, ISA
suitability for the particularities of Equihash, etc., but it's hard to say
without a lot of thinking-through.)

------
jdboyd
I found it fascinating that NEC's new vector machine is now a vector
accelerator on PCIe cards. First, this reminds me of how early vector
processors were add-ons to existing processors. I wonder whether that changes
the programming model (compared to the SX-9/ACE or Cray J90s), and how.

~~~
wohlergehen
And judging by Intel's Xeon Phi story, the next iteration from NEC will again
be a full processor...

------
mmarx
Vector engines really suffer from numerical stability problems, so for certain
kinds of problems, you'll get wrong answers (but at least you'll get them
fast).

~~~
justincormack
Citation? They are generally IEEE compliant, though there can be some
differences in denormal handling.

~~~
mmarx
IEEE compliance only gives you guarantees on individual arithmetic operations.
Vector machines take away control over the order in which operations are
performed, but floating-point addition is not associative. Changing the order
in which the terms of a sum are added may lead to vastly different results,
due to, e.g., truncation.

