
Frontier: ORNL's 2021 exascale supercomputer will run on AMD CPUs and GPUs - espeed
https://www.olcf.ornl.gov/frontier/
======
hydroreadsstuff
Here is a recent training from AMD for Frontier:
Link: [https://www.exascaleproject.org/event/amd-gpuprogramming-hip...](https://www.exascaleproject.org/event/amd-gpuprogramming-hip/)
Video: [https://youtu.be/3ZXbRJVvgJs](https://youtu.be/3ZXbRJVvgJs)
Slides: [https://www.exascaleproject.org/wp-content/uploads/2017/05/O...](https://www.exascaleproject.org/wp-content/uploads/2017/05/ORNL_HIP_webinar_20190606_final.pdf)

It will be very interesting to see what AMD came up with to convince ORNL to
switch from 7 years (or more?) of NVIDIA to something else. I don't think it's
just a lower price. Perhaps AMD is doing a closer tie-in between CPU and GPU,
since they own both.

~~~
simonbyrne
I seem to recall that part of the DoE's selection criteria is to ensure a
competitive market for future contracts (since they will always be buying more
supercomputers). Given the recent dominance of Nvidia in the market (see
[https://www.top500.org/lists/2019/06/](https://www.top500.org/lists/2019/06/)),
it probably made sense for them to ensure some contracts went to other
suppliers (AMD for Frontier, Intel for Aurora) to ensure that Nvidia doesn't
establish a monopoly on future tech.

Additionally, these sorts of supercomputers are also a way for governments to
implicitly subsidise their tech industries: when viewed through that lens,
spreading these contracts around makes a lot more sense.

------
xvilka
Anything that is bad for NVIDIA is good for FOSS. This company is the
enemy of open source, with its stance against nouveau, its monopoly on GPU
computation with CUDA, etc.

~~~
gnufx
Indeed. For instance, you can't even package applications with GPGPU support
for anything other than OpenCL for free software distributions like Fedora,
because the libraries are proprietary. At least not unless someone knows of
dummy libraries that you could link against and substitute at run time (I
assume that takes more than what GCC does?). That seems the most important
issue for free software HPC.

------
gigatexal
If I recall correctly, the AMD hardware is competitive with or better than its
Nvidia counterparts in raw compute, but CUDA makes developing software so
easy, so I’m curious what ORNL’s take on that is.

~~~
petschge
I don't work for ORNL and don't know what their take on this is, but a lot of
codes that the DoE uses are being ported to the Kokkos framework out of the
Sandia National Lab. The application developer at that point basically doesn't
have to care if the code runs on KNL, on Nvidia cards using CUDA or on AMD
cards using ROCm. The work of tuning Kokkos well on AMD cards is cheap
compared to getting stuck on a single vendor.
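
To illustrate the idea of "the application developer doesn't have to care", here is a minimal sketch of a backend-dispatching `parallel_for`. This is not the Kokkos API: the names `toy::SerialSpace`, `toy::GridEmulationSpace` and `axpy` are made up for illustration, and both backends run on the host (the second one only emulates a GPU-style block/thread index decomposition). The point is just that the kernel is written once and the backend is picked by a type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace toy {

// Hypothetical backend tags. A real framework would provide GPU backends too.
struct SerialSpace {};
struct GridEmulationSpace {
  std::size_t block_size;
  explicit GridEmulationSpace(std::size_t b = 256)
      : block_size(b == 0 ? 1 : b) {}
};

// Serial backend: a plain loop over [0, n).
template <class Kernel>
void parallel_for(SerialSpace, std::size_t n, Kernel kernel) {
  for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Host emulation of a grid launch: iterate over blocks, then over the
// threads of each block, and guard against running past the end.
template <class Kernel>
void parallel_for(GridEmulationSpace space, std::size_t n, Kernel kernel) {
  const std::size_t blocks = (n + space.block_size - 1) / space.block_size;
  for (std::size_t block = 0; block < blocks; ++block) {
    for (std::size_t thread = 0; thread < space.block_size; ++thread) {
      const std::size_t i = block * space.block_size + thread;
      if (i < n) kernel(i);
    }
  }
}

}  // namespace toy

// "Application" code: written once, the backend is a template parameter.
template <class Space>
std::vector<double> axpy(Space space, double a, const std::vector<double>& x,
                         const std::vector<double>& y) {
  assert(x.size() == y.size());
  std::vector<double> out(x.size());
  const double* xp = x.data();
  const double* yp = y.data();
  double* op = out.data();
  toy::parallel_for(space, x.size(),
                    [=](std::size_t i) { op[i] = a * xp[i] + yp[i]; });
  return out;
}
```

Calling `axpy(toy::SerialSpace{}, ...)` and `axpy(toy::GridEmulationSpace{128}, ...)` gives the same result. In a real portability layer the per-backend tuning lives inside the equivalent of `parallel_for`, not in the application kernel.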

~~~
gigatexal
Is it open source? It’d be huge to have something compete with CUDA so that
the best hardware may win.

~~~
petschge
Kokkos (as in the C++ template framework and a lot of tools to get insights
into bottlenecks) is open source. See
[https://github.com/kokkos/kokkos](https://github.com/kokkos/kokkos) and
[https://github.com/kokkos/kokkos-tools](https://github.com/kokkos/kokkos-tools).
They take pull requests and react to bug reports. They even plan to
upstream some things from Kokkos (such as multi-dimensional arrays) into
future C++ standards.

The mission-related codes based on Kokkos are very much not open source.

~~~
desertrider12
There are some non-weapons codes like Albany and NaluCFD based on Kokkos and
Trilinos that are open source.

[1]
[https://github.com/SNLComputation/Albany](https://github.com/SNLComputation/Albany)

[2] [https://github.com/NaluCFD/Nalu](https://github.com/NaluCFD/Nalu)

------
boulos
For those discussing it, ORNL explicitly calls out the need to rewrite and
retune in the CUDA => HIP transition on Page 3 of the spec sheet [1].

Edit: I assume getting folks to test on Summit is a big part of the de-risking
plan.

> The OLCF plans to make HIP available on Summit so that users can begin using
> it prior to its availability on Frontier. HIP is a C++ runtime API that
> allows developers to write portable code to run on AMD and NVIDIA GPUs. It
> is essentially a wrapper that uses the underlying CUDA or ROCm platform that
> is installed on a system. The API is very similar to CUDA so transitioning
> existing codes from CUDA to HIP should be fairly straightforward in most
> cases. In addition, HIP provides porting tools which can be used to help
> port CUDA codes to the HIP layer, with no loss of performance as compared to
> the original CUDA application. HIP is not intended to be a drop-in
> replacement for CUDA, and developers should expect to do some manual coding
> and performance tuning work to complete the port.

[1] [https://www.olcf.ornl.gov/wp-content/uploads/2019/05/frontie...](https://www.olcf.ornl.gov/wp-content/uploads/2019/05/frontier_specsheet.pdf)
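
As a rough picture of what "essentially a wrapper that uses the underlying CUDA or ROCm platform" can mean, here is a minimal sketch of a compile-time forwarding layer. It is not how HIP itself is implemented: the `toy::` functions and the `TOY_USE_CUDA` / `TOY_USE_HIP` macros are hypothetical, and only memory management is covered (no kernel launches). The CUDA and HIP runtime calls named in the first two branches (`cudaMalloc`/`hipMalloc`, `cudaMemcpy`/`hipMemcpy`, `cudaFree`/`hipFree`) are real, but those branches are untested here. With neither macro defined, a host stand-in backend is used so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

#if defined(TOY_USE_CUDA)
  #include <cuda_runtime.h>
  #define TOY_BACKEND(name) cuda##name
#elif defined(TOY_USE_HIP)
  #include <hip/hip_runtime.h>
  #define TOY_BACKEND(name) hip##name
#else
  // Host stand-ins that mimic the shape of the runtime calls (assumption:
  // only the small subset used below is modeled).
  enum hostError_t { hostSuccess = 0, hostErrorMemoryAllocation = 2 };
  enum hostMemcpyKind { hostMemcpyHostToDevice, hostMemcpyDeviceToHost };
  inline hostError_t hostMalloc(void** p, std::size_t n) {
    *p = std::malloc(n);
    return *p ? hostSuccess : hostErrorMemoryAllocation;
  }
  inline hostError_t hostMemcpy(void* dst, const void* src, std::size_t n,
                                hostMemcpyKind) {
    std::memcpy(dst, src, n);
    return hostSuccess;
  }
  inline hostError_t hostFree(void* p) {
    std::free(p);
    return hostSuccess;
  }
  #define TOY_BACKEND(name) host##name
#endif

namespace toy {

// One set of names for the application; each call forwards to the backend.
inline bool device_alloc(void** p, std::size_t bytes) {
  return TOY_BACKEND(Malloc)(p, bytes) == TOY_BACKEND(Success);
}
inline bool upload(void* dst, const void* src, std::size_t bytes) {
  return TOY_BACKEND(Memcpy)(dst, src, bytes,
                             TOY_BACKEND(MemcpyHostToDevice)) ==
         TOY_BACKEND(Success);
}
inline bool download(void* dst, const void* src, std::size_t bytes) {
  return TOY_BACKEND(Memcpy)(dst, src, bytes,
                             TOY_BACKEND(MemcpyDeviceToHost)) ==
         TOY_BACKEND(Success);
}
inline bool device_free(void* p) {
  return TOY_BACKEND(Free)(p) == TOY_BACKEND(Success);
}

}  // namespace toy

// Copy `count` floats to the "device" and back; returns false on any error.
inline bool round_trip(const float* in, float* out, std::size_t count) {
  void* dev = nullptr;
  const std::size_t bytes = count * sizeof(float);
  if (!toy::device_alloc(&dev, bytes)) return false;
  const bool ok = toy::upload(dev, in, bytes) && toy::download(out, dev, bytes);
  return toy::device_free(dev) && ok;
}
```

Because the API shapes line up this closely, the mechanical part of a port is mostly renaming, which matches the quote's point that the manual work is in coding details and performance tuning rather than in the basic translation.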

