
Summit Supercomputer Up and Running, Claims First Exascale Application - rbanffy
https://www.top500.org/news/summit-up-and-running-at-oak-ridge-claims-first-exascale-application/
======
fermienrico
This is a poorly written article.

Here is a better source: [https://www.top500.org/news/summit-up-and-running-at-oak-rid...](https://www.top500.org/news/summit-up-and-running-at-oak-ridge-claims-first-exascale-application/)

Interesting bit about the Nvidia Tesla V100 GPUs:

 _Assuming all those nodes are fully equipped, the GPUs alone will provide 215
peak petaflops at double precision. Also, since each V100 also delivers 125
teraflops of mixed-precision Tensor Core operations, the system’s peak rating
for deep learning performance is something on the order of 3.3 exaflops.

Those exaflops are not just theoretical either. According to ORNL director
Thomas Zacharia, even before the machine was fully built, researchers had run
a comparative genomics code at 1.88 exaflops using the Tensor Core capability
of the GPUs. The application was rummaging through genomes looking for
patterns indicative of certain conditions. “This is the first time anyone has
broken the exascale barrier,” noted Zacharia._
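
For a rough sanity check on those numbers (assuming 4,608 nodes with 6 GPUs
each, i.e. the 27,648 GPUs mentioned elsewhere, and the V100's published
per-GPU peaks of 7.8 teraflops FP64 and 125 teraflops Tensor Core):

    /* Back-of-the-envelope check of the quoted figures. The article's
       "3.3 exaflops" is a bit below the raw product, presumably derated. */
    #include <stdio.h>

    int main(void) {
        const double gpus = 4608.0 * 6.0;                     /* 27,648 GPUs */
        printf("FP64 peak:   %.0f PF\n", gpus * 7.8 / 1e3);   /* ~216 PF */
        printf("Tensor peak: %.2f EF\n", gpus * 125.0 / 1e6); /* ~3.46 EF */
        return 0;
    }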

~~~
yread
I wonder what the application was. There is not that much deep learning in
genomics.

~~~
alfalfasprout
The tensor cores are simply matrix multiplication + accumulation. They don't
need to be used purely for deep learning.
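
For the curious, this is roughly what using them looks like at the CUDA level.
A minimal sketch with the WMMA API (available since CUDA 9, sm_70+): one warp
computes D = A*B + C on a 16x16 tile, FP16 inputs with FP32 accumulation, and
nothing about it is specific to neural networks. The kernel and names here are
illustrative, not from the article.

    // Minimal Tensor Core sketch via CUDA's WMMA API.
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    __global__ void tile_mma(const half *a, const half *b, float *d) {
        // Fragments: one warp cooperatively owns a 16x16x16 tile operation.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> fb;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

        wmma::fill_fragment(acc, 0.0f);     // start with C = 0
        wmma::load_matrix_sync(fa, a, 16);  // leading dimension 16
        wmma::load_matrix_sync(fb, b, 16);
        wmma::mma_sync(acc, fa, fb, acc);   // acc = A*B + acc, FP32 accumulate
        wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
    }

Launched with a single warp, e.g. tile_mma<<<1, 32>>>(a, b, d), it's a
primitive a genomics code can build on just as easily as a neural net can.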

------
tempdeadbeef
What’s not inside the Summit Supercomputer speaks volumes: Intel.

Knights Landing/Hill/Mill is simply not compelling; Omni-Path was created as
an InfiniBand knockoff that doesn’t beat Mellanox. The Cray Gemini/Aries
interconnects can be found all over the top of the list (and Intel acquired
those interconnects back in 2012), but you don’t see Omni-Path replacing
anything.

Meanwhile, Nvidia comes out with NVLink and begins to build small clusters of
GPUs connected by larger networks built from IBM and Mellanox hardware. A
vacuum was created, and IBM and Mellanox moved (back) in.

~~~
davidmr
The DOE (and DOD, but with whom I’m less familiar) tends to spread out these
purchases over multiple vendors to keep multiple US-based providers able to
build and support these machines (and I imagine to keep costs competitive).

The last few acquisitions by ORNL and LANL have been Crays, while ANL and LLNL
were buying IBM Blue Genes. With this generation, it looks like things have
switched. As another poster mentioned, it certainly seems like ANL’s next one
will be Cray/Intel. It was going to be based on Knights Hill, but Intel
cancelling that sort of put the architecture up for grabs.

~~~
rbanffy
I would love to see Intel tweaking the Phi line with asymmetric cores the way
some ARM designs do. Having a couple of brawny, proper Xeon cores and a bunch
of smaller 4-thread cores, all coupled with local HBM (and maybe some HBM
dedicated to each core), would make for a very versatile part that, with some
tuning of core counts, cache sizes, HBM size, etc., could cover everything
from low-end servers all the way to supercomputing.

I don't think there is much doubt that core counts will increase across all
segments, and the asymmetric-core tech currently used in ARM chips is pretty
cool.

~~~
gnufx
Do HPC-oriented ARMs do that?

I don't see the advantage of mixing Phi and SKX cores. Just use an appropriate
balance of different nodes (maybe not all Intel).

~~~
rbanffy
It makes sense for multi-node machines, much like we already do some tasks
mostly on CPUs and others on GPUs within a single node. But a processor like
this makes much more sense on desktops and general-purpose servers, as most of
the time my Xeon cores are doing things an Atom would be perfectly capable of
doing at a fraction of the power. That extra power translates into more heat
and more cooling. If you consider that a Xeon Phi uses 300 watts for 256
threads, that works out to roughly 1.2 W per thread, which is well within what
I would expect from a very puny Atom core. Being able to power down most of my
computer while, say, I write this would be a very nice feature.

------
taliesinb
> 27,648 Nvidia Tesla V100 GPU modules

holy shit

~~~
dogma1138
I always wondered what the bulk discount on these is, since it’s north of
$200,000,000 for the GPUs alone.
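
For scale: list price on a V100 was on the order of $8,000 to $10,000 at the
time (a guess on my part, not a figure from the article), so:

    /* Hypothetical list-price math, assuming $8-10k per V100. */
    #include <stdio.h>

    int main(void) {
        const double gpus = 27648.0;
        printf("low:  $%.0fM\n", gpus *  8000.0 / 1e6);  /* ~$221M */
        printf("high: $%.0fM\n", gpus * 10000.0 / 1e6);  /* ~$276M */
        return 0;
    }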

~~~
tempdeadbeef
Bulk discount when currency miners will pay full price?

~~~
quadruplebond
Nvidia was probably under contract for this before mining blew up to where it
is today. Provisioning for these machines starts 5-10 years before they ever
come online.

------
kristianp
Why do they need expensive Power 9s when the bulk of the processing is done by
GPUs?

~~~
bryanlarsen
Because POWER9 CPUs have on-chip NVLink ports.

~~~
JohannFlobuster
This. CPUs are hungry.

------
code_duck
I drive by a site with large signs declaring it to be an Exascale computing
construction project at Los Alamos, but I'm not sure what they're building.
I'm sure they've announced it somewhere.

------
agumonkey
I think I still have my NV3 graphics card. Fun to think of how Nvidia grew.

------
newnewpdro
"Competition remains stuff."

------
newnewpdro
Is there anything preventing them from mining crypto as a burn-in to help
subsidize the costs?

~~~
qop
Yeah, common sense.

Also, actual science work that they have to do.

~~~
newnewpdro
> Yeah, common sense

Could you explain?

> Also, actual science work that they have to do.

Presumably these clusters aren't 100% utilized 100% of the time. They
certainly weren't at the national lab I had access to ages ago...

My question was more about whether there's any red tape preventing the
laboratory from paying the bills with mining, if it could do so profitably.
Let's just assume there's idle time where this could hypothetically occur.

I'm not even suggesting that they should do it; it's just an interesting,
relatively new possibility, and these things are quite expensive.

~~~
mabbo
The cost of the energy to do so would not be worth the bitcoins mined. Maybe
some alternative cryptocurrencies would break even, but it wouldn't be easy.

Sitting idle, the machine is not drawing anywhere near the power it does when
running full-tilt.

~~~
jakeogh
I'd be surprised if it sits idle at all. Someone correct me, but it's gotta
have a job queue, with a priority for each job. There are plenty of problems
(every single one submitted) where its energy consumption is irrelevant.

~~~
ianai
[https://slurm.schedmd.com/](https://slurm.schedmd.com/)

It still depends on scientists to produce code that can run during off hours.
(And how well are scientists known to code?)
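
For anyone who hasn't seen it: work on a Slurm-managed cluster is submitted as
batch scripts into exactly such a priority queue. A minimal sketch (the
partition, QOS, and program names here are made up for illustration):

    #!/bin/bash
    #SBATCH --job-name=genomics-run
    #SBATCH --partition=batch   # hypothetical queue name
    #SBATCH --nodes=16
    #SBATCH --gres=gpu:6        # e.g. 6 V100s per node
    #SBATCH --time=02:00:00
    #SBATCH --qos=standby       # hypothetical low-priority, preemptible QOS

    srun ./my_app               # hypothetical application

The scheduler backfills lower-priority jobs into whatever gaps the big jobs
leave, which is why machines like this rarely sit truly idle.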

