
New GPU-Accelerated Supercomputers Change the Balance of Power on the TOP500 - conductor
https://www.top500.org/news/new-gpu-accelerated-supercomputers-change-the-balance-of-power-on-the-top500/
======
improbable22
What's the definition of "one supercomputer" for the purposes of TOP500?

For example, why doesn't one of Google's warehouses qualify? Or the whole of
Google, for that matter. A bit of googling didn't find me anything very
satisfactory.

~~~
bbatha
One thing Google et al. are missing compared to a typical supercomputer is
InfiniBand-style interconnects. These provide integration with parallel
programming libraries like MPI, offer "3D" network topologies that take the
physical distance between nodes into account, and can do single-rack mesh
networking to avoid the overhead of switching. Despite having lots of compute
power, Google probably can't leverage it the way the LINPACK benchmark needs.
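
Rough sketch of the kind of communication pattern that interconnect exists
for, assuming mpi4py and an MPI runtime are installed (the sizes and rank
layout here are made up):

    # Nearest-neighbor "halo exchange" between MPI ranks -- the latency-bound
    # pattern a fabric like InfiniBand is designed around.
    # Run with e.g.:  mpirun -n 4 python halo.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each rank owns a slab of a 1-D domain plus one ghost cell on each side.
    local = np.full(12, float(rank))
    left, right = (rank - 1) % size, (rank + 1) % size

    # Swap boundary cells with both neighbors; the MPI layer maps these small
    # messages onto the machine's interconnect topology.
    comm.Sendrecv(sendbuf=local[1:2], dest=left, recvbuf=local[-1:], source=right)
    comm.Sendrecv(sendbuf=local[-2:-1], dest=right, recvbuf=local[0:1], source=left)

    print(f"rank {rank}: ghost cells = {local[0]}, {local[-1]}")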

~~~
improbable22
Thanks, this is interesting. It would be somehow satisfying if their LINPACK
numbers really couldn't be beaten by Google et al. (And their real workloads
too.)

But how tightly can you really connect 27000 GPUs? Would be curious if anyone
has a more technical article handy about what's different.

~~~
bbatha
> But how tightly can you really connect 27000 GPUs?

Not all that well currently. Nvidia and others are working on GPU-specific
interconnects[0], but those don't have anywhere near the scale of traditional
interconnects, which were already supporting hundreds of thousands of nodes by
the late 90s. One of the big challenges in modern supercomputer programming is
in fact keeping the GPUs hot, which can often mean offloading work that needs
a lot of memory to the CPUs.

Unfortunately my knowledge here is a little dated: I interned at Los Alamos
National Lab from 2008 to 2012, when they were doing a lot of rearchitecting of
old codes for Roadrunner, the first petascale computer. It used Cell chips in
accelerator cards and foreshadowed a lot of the challenges of GPU programming,
but didn't expose all of them. For instance, we didn't have CUDA!

If I had to guess, the first exascale computer is going to be the one that
solves the GPU interconnect problem at scale.

0: [https://www.nvidia.com/en-us/data-center/nvlink/](https://www.nvidia.com/en-us/data-center/nvlink/)

~~~
shaklee3
Exactly. First there was NVLink, which dramatically increased the bandwidth
over PCIe. But that didn't scale to a large number of GPUs. Then there was
NVSwitch, which solved the scaling problem inside a node. I wouldn't be
surprised if the next leap is something like NVLink cables between nodes that
don't need traditional routing capabilities.

------
mrb
The number one supercomputer on the TOP500 list, Summit, is able to majority-
attack about 95% of cryptocurrencies that are GPU-mined:
[https://twitter.com/zorinaq/status/1007005472505978880](https://twitter.com/zorinaq/status/1007005472505978880)
That's one advantage that ASIC-mined currencies have over GPU-mined ones:
specialized chips raise the security bar so high that the pre-existing
installed base of GPUs can't attack them.
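
Back-of-the-envelope, the claim comes down to comparing the machine's
aggregate hashrate with a coin's network hashrate; the per-GPU and network
rates below are placeholders, not real measurements:

    # Is a 51% attack possible? Compare the machine's hashrate to the network's.
    # All rates below are hypothetical placeholders, not figures for Summit or
    # for any real coin.
    gpus_in_machine = 27_000        # roughly Summit's GPU count
    hashes_per_gpu = 40e6           # hashes/sec per GPU (placeholder)
    network_hashrate = 0.8e12       # coin's current network hashes/sec (placeholder)

    machine_hashrate = gpus_in_machine * hashes_per_gpu
    share_after_joining = machine_hashrate / (machine_hashrate + network_hashrate)

    print(f"machine: {machine_hashrate:.2e} H/s, share: {share_after_joining:.1%}")
    print("majority attack possible:", machine_hashrate > network_hashrate)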

~~~
Arbalest
That sounds good in theory, but I can't help wondering if this has actually
contributed to the huge power draw. Sure, ASICs are more efficient, but their
limited availability encourages the big players to consolidate, knowing the
barriers to entry are very high for would-be competitors. The result is a
technological arms race among the biggest players, who are confident that no
new competitors will come out of nowhere. As they add to their holdings they
also necessarily increase their power consumption, turning that otherwise
cheap power into an opportunity cost for other uses.

~~~
mrb
The trend of power consumption is no different between GPUs and ASICs. Either
way, miners will always be competing to add more and more capacity.

~~~
Arbalest
Well, my point was that the confidence of not facing serious new competition
encourages even further investment than the alternative would, because future
returns are more predictable.

------
bmer
At least for Intel processors, one can run the LINPACK benchmark using
pre-built binaries provided by Intel:
[https://software.intel.com/en-us/articles/intel-linpack-benc...](https://software.intel.com/en-us/articles/intel-linpack-benchmark-download-license-agreement/)

There must be something similar for AMD processors too, but I can't find it
with some quick DuckDuckGoing. Perhaps someone else can link it?

Just a silly thing to compare your PC with the big dogs.
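
If you just want a quick number without Intel's binaries, a crude numpy
stand-in for the LINPACK kernel works too: time a dense solve of Ax = b and
count flops. This is not the official HPL benchmark, just a toy:

    # Toy LINPACK-style run: LU-solve a dense system and report GFLOP/s.
    import time
    import numpy as np

    n = 4000
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)                  # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2    # standard HPL operation count
    print(f"n={n}: {elapsed:.2f} s, {flops / elapsed / 1e9:.1f} GFLOP/s")
    print("residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))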

------
67_45
I wish you could buy a simple computer where all the processing is integrated.
The cores would form a pyramid, with a few really fast ones on top and tons
and tons of slow ones below. Everything would be exposed through a very
low-level raw API, with no speculation in the hardware; abstractions like
speculation would be layers on top, à la Vulkan.

------
zackmorris
This might be a good time to ask: my main reservation about TensorFlow is that
it's a subset of general-purpose computing, so it will always be limited to
niches like AI, physics simulations, or protein folding. If we look at
something like MATLAB (or GNU Octave) as general-purpose vector computing,
then we need some kind of bridge between the two worlds. I couldn't find much
other than this:

[https://www.quora.com/How-can-I-connect-Matlab-to-TensorFlow](https://www.quora.com/How-can-I-connect-Matlab-to-TensorFlow)

Does anyone have any ideas for moving towards something more general?
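
To be concrete about what I mean by general-purpose vector computing:
TensorFlow can already be driven like Octave for plain linear algebra, which
is one crude bridge. A minimal sketch, assuming TensorFlow 2.x:

    # Using TensorFlow 2.x for Octave/MATLAB-style vector computing rather than
    # a neural network: an ordinary least-squares solve (A \ y in Octave),
    # running on CPU or GPU unchanged.
    import tensorflow as tf

    A = tf.random.normal((500, 3))
    true_w = tf.constant([[2.0], [-1.0], [0.5]])
    y = A @ true_w + tf.random.normal((500, 1), stddev=0.01)

    w = tf.linalg.lstsq(A, y)        # least-squares solution, should recover true_w
    print(w.numpy().ravel())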

~~~
joe_the_user
I don't know about TensorFlow in particular, but there are little-known
methods of running "general purpose" parallel programs on GPUs. Specifically,
H. Dietz's MOG ("MIMD On GPU"). It's a shame the project hasn't gotten more
attention, imo.

[http://aggregate.org/MOG/](http://aggregate.org/MOG/)

See:
[https://en.wikipedia.org/wiki/Flynn%27s_taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
for explanations of terms.

~~~
zackmorris
Sorry for my late reply; I just wanted to thank you, because your comment is
an exceptionally good example of what I was trying to get at with my
long-winded explanations. Compiling MIMD to SIMD is the future of programming,
although it seems that companies will try every other course of action before
they realize that.
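
For anyone curious what "compiling MIMD to SIMD" looks like in miniature: the
usual trick is to turn per-element branches into masks and evaluate both
sides. A toy illustration in numpy (not MOG itself, just the underlying idea):

    # MIMD-style code: each element takes its own branch.
    import numpy as np

    def scalar_f(v):
        if v < 0:
            return -v * v
        return float(np.sqrt(v))

    x = np.linspace(-3, 3, 8)

    # SIMD-style version: evaluate both branches for every lane, select by mask.
    # This is roughly how divergent control flow gets mapped onto wide hardware.
    mask = x < 0
    simd_result = np.where(mask, -x * x, np.sqrt(np.maximum(x, 0.0)))

    assert np.allclose(simd_result, [scalar_f(v) for v in x])
    print(simd_result)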

------
davrosthedalek
Since TFA talks about deep learning so much, I wonder how many of the
applications running on these machines actually are deep learning, or can make
use of the tensor cores in some other way.

~~~
godelski
A lot of people are using GPUs for things other than ML. The big advantage is
the number of cores, and people who run on supercomputers write algorithms
that are highly parallelized (otherwise what's the point). GPUs are getting
fast enough that their sheer core count gives them an edge. Also, the memory
on them is MUCH faster than CPU memory, but the cost is that you have less of
it (around 20 GB compared to 256 GB).

As for the TPUs, one big advantage for ML is that they work in float16/float32
(normal being f32/f64), and in ML you care very little about precision; they
are also optimized for tensor calculations. For anything where you don't need
that resolution and you're doing tensor-heavy work (lots of math/physics is
tensor-heavy), they will give you an advantage. (I'm not aware of anyone using
them for things other than ML, but I wouldn't be surprised if people did.) For
other things you need more precision, and those won't use the TPUs (AFAIK).
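
A tiny numpy demonstration of the precision trade-off being described (the
sample value is arbitrary):

    # float16 keeps roughly 3 decimal digits, float32 about 6, float64 about 15.
    # ML workloads usually tolerate the f16 error; many physics codes don't.
    import numpy as np

    for dt in (np.float16, np.float32, np.float64):
        info = np.finfo(dt)
        print(f"{np.dtype(dt).name:>8}: ~{info.precision} decimal digits, "
              f"eps={info.eps:.1e}, max={info.max:.1e}")

    # The same constant stored at each precision:
    print(np.float16(3.14159), np.float32(3.14159), np.float64(3.14159))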

~~~
garmaine
All modern GPUs support f16.

~~~
shaklee3
This, and lots of things besides ML don't need high precision.

