
Nvidia will build 700-petaflop supercomputer for University of Florida - elorant
https://venturebeat.com/2020/07/21/nvidia-collaborates-with-the-university-of-florida-to-build-700-petaflop-ai-supercomputer/
======
OffensiveTomato
One of Nvidia's co-founders is actually a UF alumnus. Good way to give back!

------
boulos
Sigh, I wish people wouldn’t say “peta _flop_ ” for these.

[https://www.nvidia.com/en-us/data-center/a100/](https://www.nvidia.com/en-us/data-center/a100/) is the most official reference. If you scroll to the bottom, you'll see that an A100 part can do ~20 teraflops (either FP32, or FP64 in little-matrix aka tensor mode). When they say "each A100 can do 5 petaflops", they really mean each DGX, which has 8 such cards, so that's 600-ish something-ops per card. The generous assumption is that those are FP16 or bfloat16 sparse ops, and therefore "flops".

The reality is that when someone says "supercomputer" in the general sense, they mean scientific computing, and so mean a double-precision LINPACK benchmark. The 1120 A100 parts (8 x 140) doing 20 "real" teraflops each have an absolute peak of about 22 petaflops (and older code without tensor mode would be half that on FP64).
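
Concretely, here's the unit math as a quick Python sketch (the per-card numbers are from the datasheet linked above; the 140 nodes x 8 cards is from the article):

```python
# Back-of-the-envelope peak math for 140 DGX A100 nodes.
FP64_TENSOR_TFLOPS = 19.5  # per A100: "real" double-precision peak (tensor mode)
FP16_SPARSE_TOPS = 624     # per A100: sparse FP16/bfloat16 tensor ops (marketing unit)

gpus = 140 * 8  # 1120 A100s

headline_peta_ops = gpus * FP16_SPARSE_TOPS / 1000  # ~699 "petaflops"
fp64_petaflops = gpus * FP64_TENSOR_TFLOPS / 1000   # ~21.8 petaflops

print(f"{gpus} GPUs")
print(f"sparse FP16 peak: ~{headline_peta_ops:.0f} peta-ops (the headline number)")
print(f"FP64 tensor peak: ~{fp64_petaflops:.1f} petaflops (the LINPACK-relevant number)")
```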

tl;dr: ML isn’t scientific computing, and those are different flops, but “10
petaflops” just doesn’t sound as impressive.

~~~
fizixer
I agree with you, and with the response to your comment, that both Nvidia and Google make misleading claims about their accelerators. But I'd like to add two points:

- I don't think they can get away with it when it comes to Top-500 rankings. Those rankings are based on LINPACK scores, so this supercomputer would simply not score high enough and would be placed in the right spot on the list. So it's not a big concern to me.

\- "ML isn't scientific computing" goes both ways. Sure tensor-tera-ops are
not teraflops, they're specific to operations involving artificial neural
networks (ANNs). But when you take the claims from the past about how much
computing power it'll take for rivaling that of a human brain, and folks came
up with 100 petaflops, or 1 exaflop. Well those should not be called "f"lops
either. Because when it comes to brain inspired computing, ANN tensor-tera-ops
are a more reliable number than FP32 or FP64. And if it's 100 peta-ops of ANN
compute, well we're already past that, and can easily create a supercomputer
with 1 tensor-exa-ops. And that means we have reached the hardware capacity
required to emulate human-level intelligence in a machine (i.e., the only
thing missing is the right set of algorithms).

Conclusion: tensor-tera-ops are not FP ops and should not be used for placement on the Top-500 list. But tensor-tera-ops are significant enough to warrant a new Top-500-Tensor list, ranking tensor supercomputers separately.

~~~
fluffy87
Fully agree. TL;DR: pick the right benchmark.

If you are doing AI, the Top500 is meaningless, and if you are doing classical HPC, these exaflop int8 supercomputers are meaningless.

------
fluffything
I'm a bit baffled.

How high is tuition at the University of Florida?

This supercomputer is more powerful than many at the national labs and must cost a fortune (multiple hundreds of millions of dollars).

~~~
Blammar
It's $50 million. That's all. See [https://blogs.nvidia.com/blog/2020/07/21/university-of-flori...](https://blogs.nvidia.com/blog/2020/07/21/university-of-florida-nvidia-ai-supercomputer/) for the numbers. It's probable the hardware is being offered at a deep discount.

The building and infrastructure will cost an additional $20m, covered by the
University.

~~~
mhh__
[https://lambdalabs.com/blog/demystifying-gpt-3/](https://lambdalabs.com/blog/demystifying-gpt-3/)

According to ^, that's about 11 GPT-3 trainings' worth in the cloud.
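
For reference, the 11 is just division, assuming the linked post's ~$4.6M cloud-cost estimate for one GPT-3 training run:

```python
# Rough sketch: $50M of hardware vs. the estimated cloud cost of one
# GPT-3 training run (~$4.6M per the Lambda Labs post above).
system_cost_usd = 50_000_000
gpt3_training_usd = 4_600_000
print(f"{system_cost_usd / gpt3_training_usd:.1f}")  # ~10.9, i.e. "about 11"
```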

~~~
nl
No one who's training GPT-3 pays those prices.

------
evancox100
Great name for the computer: "HiPerGator"

~~~
svalto
That's an awesome name. Just like hipermercado or KluckKluckBell.

------
ja27
Just baffling that they were on the verge of closing their CS department 8
years ago.

[https://www.forbes.com/sites/stevensalzberg/2012/04/22/unive...](https://www.forbes.com/sites/stevensalzberg/2012/04/22/university-of-florida-eliminates-computer-science-department-increases-athletic-budgets-hmm/#78eb901b56a1)

~~~
bubblethink
I think this was a bit of a clickbait headline, and the real story was a bit more nuanced. They were trying to do some reorgs (kind of like merging/splitting the CS, ECE, Information Systems, etc. departments).

------
abainbridge
Does anyone know why everyone is still buying Nvidia instead of custom AI
accelerators from other vendors? For example, on paper the new Graphcore
machines look like an easy win, or at least a risk worth taking. (I see this
particular supercomputer was funded by Nvidia but my question is about the
general trend).

~~~
Cthulhu_
Risk mitigation, maintenance, resell value (?), support, reliability. The
custom AI accelerators you mentioned, how long have those been in business
for? How many units have they moved? How many generations of hardware have
they produced? Will they still be around to replace or upgrade units in 5-10
years? How flexible are they in their workload?

That's a lot of factors to keep in mind when you're spending millions on a
supercomputer. I'd go for an established hardware provider as well. I'm not
knocking the custom AI chips, but I wouldn't try and max out my budget with
those - keep them for smaller applications for now.

~~~
sillysaurusx
TPUs are perfectly positioned to capture this market. There are an endless
number of reasons why, but to keep it short: Wanna see a magic trick?
[https://twitter.com/theshawwn/status/1286426454171975680](https://twitter.com/theshawwn/status/1286426454171975680)

GPT-2 117M training at 1 _million_ tokens/sec.

Now, I don't have experience with DGX clusters, so I'm not going to make a
firm statement. What I will say is that I, as an outsider, managed to achieve
a performance level that is ~unheard of for GPT-2 training. And you can too;
TPUs are pervasive.

A TPUv2-512 isn't even as far as the gas pedal goes, either. A v3-512 can train all of ImageNet to 75.9% accuracy in 4 minutes:
[https://twitter.com/theshawwn/status/1223395022814339073](https://twitter.com/theshawwn/status/1223395022814339073)

v3-1024 can do it in 2 minutes:
[https://twitter.com/theshawwn/status/1234654848114520065](https://twitter.com/theshawwn/status/1234654848114520065)

I once attached a debugger to a training run during startup, after the infeed loop began (meaning it was feeding inputs to the TPU, but no training was happening yet; it was "winding up"), and was shocked to discover that when I hit c to continue, it trained on all of ImageNet in like 54 seconds. That blows the lid off every perf result here (under "image classification"): [https://mlperf.org/training-results-0-6](https://mlperf.org/training-results-0-6)

(It's not a fair comparison, but it was quite astonishing to see the raw
horsepower in action.)

So, Nvidia has some catching up to do. And I don't know whether they'll be able to. The TPU ecosystem may be clunky at the moment, but boy is it effective. Your options are to invest your time in this ecosystem, which will likely be around in ten years, or in DGX-cluster-type knowledge, which ... might be less pervasive in 10 years.

The distinguishing feature of a TPU is that it has a CPU on board. In fact, it
has a CPU with 300GB of memory _for every 8 cores_. Friggin' love these
things.

~~~
xxxtentachyon
Last time I worked on TPUs, a lot of very pervasive LA operations (e.g.
Cholesky decomposition) were unoptimized and slow compared to the NN-style
operations. I’m sure that can be fixed, but for the time being, it seems
inappropriate for anything beyond the obvious NN operations.
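
If anyone wants to try reproducing this, here's a minimal sketch (assuming a JAX install with a TPU or other accelerator backend; the matrix size and dtype are arbitrary choices): time a big matmul against a Cholesky factorization of the same matrix.

```python
# Minimal sketch: compare matmul vs. Cholesky throughput on the default
# JAX backend (TPU/GPU/CPU).
import time
import jax
import jax.numpy as jnp

n = 4096
key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (n, n), dtype=jnp.float32)
spd = a @ a.T + n * jnp.eye(n)  # symmetric positive definite input

matmul = jax.jit(lambda x: x @ x)
chol = jax.jit(jnp.linalg.cholesky)

# Warm up (trigger compilation), then time each op.
matmul(spd).block_until_ready()
chol(spd).block_until_ready()

t0 = time.perf_counter(); matmul(spd).block_until_ready()
t1 = time.perf_counter(); chol(spd).block_until_ready()
t2 = time.perf_counter()
print(f"matmul:   {t1 - t0:.4f} s")
print(f"cholesky: {t2 - t1:.4f} s")
```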

------
etaioinshrdlu
It seems like supercomputers are still generally many times more powerful than
what AI researchers at top universities or orgs are using.

Has any well-known AI research been done on supercomputers?

It seems that literally just throwing money at the problem is a solid strategy nowadays.

~~~
ydau
GPT-3 was another bucket's worth of evidence in favor of the scaling hypothesis. Performance kept improving (and the cost to train kept increasing) as more parameters were added. Even at 175 billion parameters, performance had not yet plateaued. One take-away is that throwing a lot of compute at the problem helps tremendously :).

You can read more about GPT-3 here:
[https://lambdalabs.com/blog/gpt-3/](https://lambdalabs.com/blog/gpt-3/)
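
The quantitative version of that claim comes from Kaplan et al. (2020), who fit test loss to a power law in parameter count, roughly L(N) = (Nc/N)^0.076 with Nc ≈ 8.8e13. A toy sketch of the shape (constants are from that paper, not from the GPT-3 post):

```python
# Toy sketch of the Kaplan et al. (2020) parameter scaling law:
# test loss L(N) ~ (N_c / N) ** alpha_N  (alpha_N ≈ 0.076, N_c ≈ 8.8e13)
ALPHA_N = 0.076
N_C = 8.8e13  # non-embedding parameter count where the fit crosses L = 1

def predicted_loss(n_params: float) -> float:
    return (N_C / n_params) ** ALPHA_N

for n in (1.17e8, 1.5e9, 1.75e11):  # GPT-2 small, GPT-2 XL, GPT-3 sizes
    print(f"N = {n:.2e}: predicted loss ~ {predicted_loss(n):.2f}")
```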

~~~
sitkack
Are you alluding to "The Bitter Lesson" [1] by Rich Sutton [2]?

[1]
[http://incompleteideas.net/IncIdeas/BitterLesson.html](http://incompleteideas.net/IncIdeas/BitterLesson.html)

[2] [http://incompleteideas.net/](http://incompleteideas.net/)

------
codecamper
I predict this supercomputer will be a flop.

~~~
unnouinceput
Yup, all almost-11 of them :)

------
shmerl
_> It will also benefit from Nvidia’s suite of AI application frameworks_

Sounds like lock-in.

~~~
HeWhoLurksLate
Is it lock-in when it's the only thing or the best thing on the market?

~~~
shmerl
It is lock-in when it makes moving to anything else very hard, even if the alternative is better. That's Nvidia's whole point: they don't play fair. They give this away "for free", with hard strings attached to Nvidia.

------
VadimPR
How bad is this for the environment?

~~~
abainbridge
Seems like it has 700/5 = 140 DGX A100 nodes, i.e. about 1120 GPUs (the "5 petaflops" figure is per 8-GPU node). They are about 0.4 kW each. Let's say it is 50% utilized for a year. That is 1120 * 0.4 * 0.5 * 365 * 24 = 1,962,240 kWh.

Let's say electricity generation creates 0.5 kg CO2e per kWh. So about 981,000 kg CO2e per year.

For comparison, driving a car creates about 0.2 kg CO2 per km, or 0.32 kg per mile. So about the same as driving 981,120 / 0.32 = 3,066,000 miles per year. Call it 300 cars at 10,000 miles each.

Edit: Corrected the above CO2 per km figure to fix a factor of 100 error.
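
The same estimate as a script, with every assumption as a named constant:

```python
# Rough CO2 estimate; every constant here is a guess (see caveats below).
GPUS = 140 * 8             # 140 DGX A100 nodes x 8 GPUs each
KW_PER_GPU = 0.4           # draw under load
UTILIZATION = 0.5          # fraction of the year under load
KG_CO2E_PER_KWH = 0.5      # grid carbon intensity
KG_CO2_PER_MILE = 0.32     # typical car
MILES_PER_CAR_YEAR = 10_000

kwh_per_year = GPUS * KW_PER_GPU * UTILIZATION * 365 * 24  # ~1.96M kWh
kg_co2e = kwh_per_year * KG_CO2E_PER_KWH                   # ~981,000 kg
cars = kg_co2e / KG_CO2_PER_MILE / MILES_PER_CAR_YEAR      # ~300 cars
print(f"{kwh_per_year:,.0f} kWh/yr = {kg_co2e:,.0f} kg CO2e/yr = ~{cars:.0f} cars")
```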

Obviously there are a million things wrong with this analysis. For example, we
don't know if they're going to use the machine to do climate modelling that
will lead to a headline in the media that causes the green party to be elected
;-)

~~~
lorenzhs
HPC system utilization is typically upwards of 90%; these things are not idle. There's almost always a queue of jobs waiting to run, at least in my experience.

~~~
rubatuga
Definitely. It also makes you hate those who hog the resources. A good scheduler will take past usage into account, however.
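
For the curious, the usual mechanism is fair-share scheduling. A toy sketch of the idea, loosely modeled on the classic 2^(-usage/share) factor used by schedulers like Slurm (the half-life here is an arbitrary choice):

```python
# Toy fair-share priority: heavy recent usage exponentially lowers priority.
def fairshare_factor(share: float, usage: float,
                     days_ago: float = 0.0, half_life_days: float = 7.0) -> float:
    """share: fraction of the machine a user is entitled to.
    usage: fraction of the machine the user consumed, `days_ago` days back."""
    decayed = usage * 0.5 ** (days_ago / half_life_days)  # old usage matters less
    return 2.0 ** (-decayed / share)  # 1.0 when idle, -> 0.0 for hogs

print(fairshare_factor(share=0.1, usage=0.0))                 # idle user: 1.0
print(fairshare_factor(share=0.1, usage=0.3))                 # hog today: 0.125
print(fairshare_factor(share=0.1, usage=0.3, days_ago=14.0))  # same hog, 2 weeks later: ~0.59
```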

------
swiley
All these big companies give free stuff to university students to hook them on their proprietary technology. I remember some of my friends graduating and realizing Matlab wasn't free.

~~~
juancampa
So you're suggesting they should charge them instead?

------
fortran77
Maybe they can use it to simulate load and stress on pedestrian bridges for some of the other Florida universities:

[https://en.wikipedia.org/wiki/Florida_International_Universi...](https://en.wikipedia.org/wiki/Florida_International_University_pedestrian_bridge_collapse)

------
m0zg
This being NVIDIA, the University of Florida would be wise to hire an independent company to benchmark that "700 petaflops" claim.

------
xvilka
They should build a proper Linux driver first, without enormous signed blobs. One of the worst companies for the FOSS community.

