
Running TensorFlow at Petascale and Beyond - rbanffy
https://www.nextplatform.com/2019/02/04/running-tensorflow-at-petascale-and-beyond/
======
fizixer
Petascale was a buzzword 10 years ago. I think it's been outdated for years
now.

Case in point: a gaming GPU from a few years ago, the GTX 1080 Ti, easily does
more than 10 TeraFlops. So you only need 100 such GPUs (about $70k worth in
January 2019) to reach a PetaFlop of computation. That doesn't even include
high-end GPUs built specifically for deep-learning work. Furthermore, those DL
GPUs are dwarfed by what ASICs like Google's TPU and Nvidia's DL ASIC can do
(more than 100 TeraFlops per board, I think, though arguably those ASICs are
mainly for inference, not training).
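
Rough back-of-envelope in Python, using the numbers above (the per-card price
is just the quoted $70k spread over 100 cards, so treat it as an assumption):

    # Gaming GPUs needed to reach 1 PetaFlop (single precision, peak numbers)
    gpu_tflops = 10.0        # GTX 1080 Ti, roughly, FP32
    gpu_price_usd = 700.0    # assumed early-2019 street price ($70k / 100)
    target_tflops = 1000.0   # 1 PetaFlop

    cards = target_tflops / gpu_tflops   # 100 cards
    cost = cards * gpu_price_usd         # ~$70,000
    print(cards, cost)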

DeepMind routinely utilizes a few PetaFlops for its cutting-edge DRL systems
(AlphaGo, AlphaStar, etc.).

IMO, in the next 5 years or so, and somewhere between 0.1 and 1 ExaFlop, we'll
probably hit human-level AI.

~~~
KenoFischer
Traditionally the HPC community means double precision when they talk about
FLOPS, which consumer GPUs are not super great at (somewhere around 400 GFLOPS
for the GTX 1080 Ti). The V100 is better at 7 or so TFLOPS, but also an order
of magnitude more expensive. The smaller of the two machines in the article
(Cori) is a ~20 PF (Float64) scale machine. However, even then it is quite
challenging to get anything close to peak performance out of these systems (a
bit easier if you're doing DL, which is more amenable to hand-optimized vendor
libraries).
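
To put the FP64 gap in perspective, a rough calculation from those peak
numbers (peak only, so it understates how many cards you'd really need):

    # Peak-FLOPS comparison only; sustained performance would be much lower.
    v100_fp64_tflops = 7.0     # per card, roughly
    cori_fp64_pflops = 20.0    # Cori, Float64
    cards = cori_fp64_pflops * 1000 / v100_fp64_tflops   # ~2,860 V100s at peak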

Of course for DL, people routinely use reduced precision, as you alluded to.
There, we're looking at exascale at the moment. The second machine mentioned
in the article (Summit) has ~3 EF of Float16 performance (of which their
application reached ~1 EF). For comparison (and assuming about 50% MXU
utilization, which is about the maximum I've seen), you'd need about 200 TPUv2
pods (at $384/hour/pod for the general public) to reach the same scale on this
problem. Now Google does have that scale, but it's certainly non-trivial.
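
For what it's worth, the pod estimate works out roughly like this (the ~11.5
PF bfloat16 peak for a full 256-chip TPUv2 pod is a commonly quoted figure,
but treat it as my assumption):

    # Sanity check on the TPUv2 pod count
    achieved_ef = 1.0        # ~1 EF reached on Summit for this application
    pod_peak_pf = 11.5       # assumed bfloat16 peak of a full TPUv2 pod
    mxu_utilization = 0.5    # ~50%, about the best I've seen
    pods = achieved_ef * 1000 / (pod_peak_pf * mxu_utilization)   # ~175 pods
    cost_per_hour = pods * 384                                    # ~$67k/hour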

Half-precision petascale is fairly routine these days. You can get that on
the cloud for < $100/hr, and I routinely spin up such systems for ML training.
However, petascale fp32/fp64 and exascale (b)f16 as discussed in the article
are still fairly rare and usually preceded by months of planning to make sure
things go right and the compute power is used usefully.
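
For the routine half-precision case, a minimal sketch using TensorFlow's Keras
mixed-precision API (illustrative only, not what the article's authors used):

    import tensorflow as tf

    # Run most ops in float16 while keeping variables in float32 for stability.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(10, dtype="float32"),  # final layer in float32
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )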

Disclaimer: not involved in the work, but I work closely with the folks who
are, and I've used both of the mentioned systems myself.

~~~
fizixer
Thanks for the additional info.

And I don't disagree with anything you said. However, for anything involving
computation between 10 and 800 PetaFlops, I would call it "super"-petascale
(with below 10 being Petascale and above 800 being Exascale).

Summit is clearly an Exascale machine. I looked at Cori and was surprised to
find no mention of GPUs in the specs, despite the project starting as late as
2015. I personally think trying to scale up CPU-only supercomputers in a
super-Petascale era is a losing battle, but that's just my 0.02.

~~~
dekhn
Cori uses Knights Landing Xeon Phi accelerators, not GPUs.

------
pedro_hab
I got interested in what parameters they were computing; I'd guess the
cosmological constant or something space-expansion related.

Anyway, cool stuff. I wish they had included a peek at the results.

------
collyw
Ok, so they are running huge amounts of data through TensorFlow.

I am more interested in what the data is and what can be gained by using such
a large amount. I was under the impression that the benefits of more training
data plateau after a certain amount.

~~~
dagw
Sure, for a fixed dataset, more training plateaus after a while. However, with
more power we can work on different types of data. For example, instead of
using 2D CNNs on images, we can now use 3D CNNs on high-resolution point-cloud
data. Also, as the speed of data acquisition grows, being able to train a
given model on a given type and size of dataset to a fixed 'level' as fast as
possible becomes even more important.
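
To make the 2D-vs-3D point concrete, an illustrative sketch (shapes and layer
sizes are made up):

    import tensorflow as tf

    # 2D CNN over an RGB image
    image_model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(256, 256, 3)),       # H x W x C
        tf.keras.layers.GlobalAveragePooling2D(),
    ])

    # 3D CNN over a voxelized point cloud; the extra spatial dimension
    # multiplies both memory and FLOPs, hence the appetite for more compute.
    voxel_model = tf.keras.Sequential([
        tf.keras.layers.Conv3D(32, 3, activation="relu",
                               input_shape=(128, 128, 128, 1)),  # D x H x W x C
        tf.keras.layers.GlobalAveragePooling3D(),
    ])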

