
Supercomputers for Weather Forecasting Have Come a Long Way - jonbaer
https://www.top500.org/news/top500-meanderings-supercomputers-for-weather-forecasting-have-come-a-long-way/
======
gbrown_

        Currently the most powerful weather forecasting computer is the UK Met Office
        machine, a Cray XC40 that delivers 6.8 petaflops and sits at number 11
        on the TOP500.
    

But this machine does not run the production forecast; that is done on one of
two smaller machines, each capable of just over 3 petaflops.

Source: I work for Cray at the UK Met Office.

~~~
popobobo
Is it running Fortran? Which version?

~~~
batbomb
It runs Cray Linux (based on SUSE) and assuredly somebody is running Fortran
on it in some form.

Many people think these large machines are intended for a small handful of
users running extremely large jobs. That's typically not the case; they
usually have tens of users per day, up to hundreds per week, depending on the
institution. With diverse users comes diverse software, and the challenges
that follow from that.

------
__mp
A reason for the slow adoption of accelerators is that these code bases are
pretty old and mostly Fortran based. I don't think we are going to see a
shift to accelerator-based weather models any time soon. Most scientists
working on these models either don't have the experience with, or don't want
to work in, a programming methodology that suits accelerators (at least
that's my experience). Also, in a lot of cases people just like to work with
an already existing code base, because they are just doing a PhD and don't
want to translate everything to a new language.

The COSMO model, which I'm working on, is 18 years old. We ported the
compute-intensive part of the model to C++ using a stencil library and
integrated OpenACC pragmas (think OpenMP, but for GPUs) into the rest of the
code base.
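
For readers unfamiliar with OpenACC, here's a minimal sketch of what
annotating an existing Fortran loop nest looks like (the array names are
illustrative, not COSMO's):

    ! A 2D smoothing stencil offloaded with OpenACC.
    ! copyin/copyout move the arrays to and from the GPU; the boundary
    ! rows/columns of t_new are left untouched in this sketch.
    subroutine smooth(t, t_new, ni, nj)
      integer, intent(in) :: ni, nj
      real, intent(in)    :: t(ni, nj)
      real, intent(out)   :: t_new(ni, nj)
      integer :: i, j

      !$acc parallel loop collapse(2) copyin(t) copyout(t_new)
      do j = 2, nj - 1
        do i = 2, ni - 1
          t_new(i, j) = 0.25 * (t(i-1, j) + t(i+1, j) &
                              + t(i, j-1) + t(i, j+1))
        end do
      end do
    end subroutine smooth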

I'm not a big fan of OpenACC, because it requires the user to make
assumptions about the underlying hardware, and it takes quite a bit of
thinking to get high-performance code. It is quite time consuming to
integrate the pragmas so that the code performs. OpenACC-capable compilers
used to be very unstable; we regularly got compiler breakages and
regressions. It has gotten a bit better since we started sending the code to
the vendors, but we still see regressions from time to time. All (usable)
OpenACC implementations are proprietary, so we have a vendor dependency. The
PGI OpenACC binaries used to be pretty slow, but they are now almost on par
with the Cray binaries.
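
To make the hardware-assumption point concrete: the loop from the sketch
above typically only performs well once you add explicit sizing clauses, and
the numbers have to be re-derived for every GPU generation (the values here
are purely illustrative):

    ! The same stencil, now with hardware-specific tuning clauses.
    ! num_gangs / vector_length must be re-tuned per target GPU --
    ! this is exactly the portability problem with OpenACC.
    !$acc parallel loop collapse(2) gang vector num_gangs(1024) &
    !$acc   vector_length(128) copyin(t) copyout(t_new)
    do j = 2, nj - 1
      do i = 2, ni - 1
        t_new(i, j) = 0.25 * (t(i-1, j) + t(i+1, j) &
                            + t(i, j-1) + t(i, j+1))
      end do
    end do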

The successor model to COSMO, ICON, developed at DWD and MPI, is also written
in Fortran using OpenACC pragmas (and I think OpenMP pragmas, because they
plan to use it on Xeon Phi). The code is interesting in that it uses an
icosahedral grid, in contrast to the square grid COSMO uses: data accesses
are not straightforward. Keep in mind that Fortran does not have easy
abstractions for data accesses outside of square grids, so you have to use a
couple of macros/functions/etc. to get to the fields you need. Disclaimer: I
have not seen the code myself, but the DWD certainly knows how to write
high-performance code, so I'm certain most of the code is optimized.
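
To make the indexing contrast concrete, a hedged sketch (the names are
hypothetical, not ICON's actual data structures): on a square grid the
neighbours are implicit in the indices, e.g. t2d(i+1, j) - t2d(i, j), while
on an unstructured icosahedral grid the cells form a 1D list and the
neighbours come from an explicit connectivity table:

    ! Edge differences for one triangular cell on an unstructured
    ! grid. neighbor_idx is an illustrative connectivity table.
    subroutine edge_diffs(t, neighbor_idx, cell, ncells, d)
      integer, intent(in) :: ncells, cell
      integer, intent(in) :: neighbor_idx(ncells, 3)
      real, intent(in)    :: t(ncells)
      real, intent(out)   :: d(3)
      integer :: k

      do k = 1, 3
        d(k) = t(neighbor_idx(cell, k)) - t(cell)
      end do
    end subroutine edge_diffs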

~~~
m_mueller
Hi! I'm actually working on an OSS transpiler project made to convert
CPU-only, array-based Fortran applications so that they can run on both CPU
and GPU. It's been in the works since 2012, and I'm just now gearing up for a
1500x1300 2km run on a Japanese supercomputer; results will be submitted
soon.

[https://github.com/muellermichel/Hybrid-Fortran](https://github.com/muellermichel/Hybrid-Fortran)

~~~
__mp
Oh, that's pretty neat. A colleague of mine is working on a similar project,
applying directives to allow loop reorderings for OpenACC and OpenMP code
(the CLAW project). He's using the Omni compiler though.

~~~
m_mueller
You mean VC? Yup, he's visited me in Tokyo, we've talked about the two
projects ;-)

~~~
__mp
Haha, yes exactly I meant him. The world is small :)

------
njharman
Can we take a moment to reflect on how awesome and revolutionary (in changing
the world, not in its tech) the Linux Kernel (and the *nix ecosystem around
it) is.

I remember when the TOP500 was being taken over by Beowulf clusters
[https://en.wikipedia.org/wiki/Beowulf_cluster](https://en.wikipedia.org/wiki/Beowulf_cluster).
Linux made that happen. At the time (and today) it ran on everything from
small embedded systems to some of the world's fastest supercomputers, and
practically everything in between. Incredible compared to the alternative
contemporary OSs.

~~~
clappski
For all its warts, in my opinion Linux (and the *BSDs) are some of the
greatest software engineering efforts of our time. I think we'll be lucky to
ever see another FOSS kernel reach the same level of ubiquity as Linux in our
lifetime.

------
lucaspiller
I've always wondered how meteorology works in terms of computer models. As
someone who knows nothing about it, what's some good material to start
learning how it works?

Now that we have powerful computers and lots of freely available data (well,
maybe less now thanks to Trump), I've always dreamed of running an old
computer model, say one from the 1980s, at home.

~~~
aeroman
You can do better than the 1980s - the WRF (Weather Research and Forecasting)
model [1] is up to date and will run on a PC (albeit not at a super high
resolution). You can provide boundary conditions with freely available data
from NOAA [2].
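
If you do try WRF, it's configured through Fortran namelists. A minimal,
purely illustrative namelist.input fragment (the parameter names are real WRF
options, but the values are placeholders - see the WRF users' guide for
working configurations):

    &domains
     time_step = 60,     ! integration time step (seconds)
     max_dom   = 1,      ! number of (nested) domains
     e_we      = 100,    ! grid points, west-east
     e_sn      = 100,    ! grid points, south-north
     e_vert    = 35,     ! vertical levels
     dx        = 10000,  ! grid spacing (metres)
     dy        = 10000,
    /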

However, if you want to run the kind of forecasts that a weather center would
do, your big issue would be getting input data. Some of the biggest advances
in forecasting have come from increasing the amount of satellite data being
'assimilated' into weather forecasts - look how much better the southern
hemisphere became compared to the northern hemisphere during the early
satellite era (1980-2000) [3]. Getting this data in a timely fashion is more
of a challenge to do at home.

[1] -
[http://www2.mmm.ucar.edu/wrf/users/](http://www2.mmm.ucar.edu/wrf/users/)

[2] - [https://nomads.ncdc.noaa.gov/](https://nomads.ncdc.noaa.gov/)

[3] - Page 3 in
[http://www.ecmwf.int/sites/default/files/elibrary/2012/14553...](http://www.ecmwf.int/sites/default/files/elibrary/2012/14553-background-and-history.pdf)

~~~
semi-extrinsic
Also, I believe most weather forecasting makes good use of ensemble runs:
take your initial data set from observations, introduce N sets of
appropriately sized random variations in the initial data, and run N
simulations. Then you look, e.g., at the spread of results, which gives you
information about the certainty of your prediction.
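
A minimal sketch of the idea in Fortran (everything here is a stand-in: the
"analysis" is random data, step_model() is a placeholder for a real model,
and 0.1 is an assumed initial-condition uncertainty):

    ! Ensemble sketch: perturb the analysis with Gaussian noise sized
    ! to the IC uncertainty, integrate N members, and use the spread
    ! as an uncertainty estimate.
    program ensemble
      implicit none
      integer, parameter :: nmem = 50, npts = 1000
      real, parameter :: ic_sigma = 0.1      ! assumed IC uncertainty
      real :: x0(npts), x(npts, nmem), mean(npts), sigma(npts)
      real :: u1(npts), u2(npts)
      integer :: m

      call random_number(x0)               ! stand-in for the analysis
      do m = 1, nmem
        call random_number(u1)             ! Box-Muller transform:
        call random_number(u2)             ! two uniforms -> one normal
        x(:, m) = x0 + ic_sigma * sqrt(-2.0 * log(1.0 - u1)) &
                                * cos(6.2831853 * u2)
        ! call step_model(x(:, m))         ! integrate member m forward
      end do

      mean  = sum(x, dim=2) / nmem
      sigma = sqrt(sum((x - spread(mean, 2, nmem))**2, dim=2) / nmem)
      print *, 'mean ensemble spread:', sum(sigma) / npts
    end program ensemble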

~~~
amelius
I suppose these perturbations would need to be extremely small, because of the
butterfly effect.

~~~
semi-extrinsic
I'm guessing they use something like Gaussian noise with a width approx. equal
to the uncertainty in the initial conditions.

~~~
mturmon
"Bred vectors" are an example:
[https://en.wikipedia.org/wiki/Bred_vector](https://en.wikipedia.org/wiki/Bred_vector)

Note that these vectors are not generated by simply sampling the uncertainty
in the initial conditions; the space of IC perturbations is too
high-dimensional to cover. In the method above, the choice of perturbation
direction is based on the adjustments implied when new data is synced into
the model.
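
A hedged sketch of one breeding cycle (step_model() is again a placeholder
for a real model):

    ! Bred-vector sketch: run a control and a perturbed forecast and
    ! periodically rescale their difference back to a fixed size; the
    ! rescaled difference converges toward the fastest-growing
    ! perturbation directions.
    subroutine breed(x_ctrl, bv, npts, ncycles, bv_size)
      integer, intent(in) :: npts, ncycles
      real, intent(inout) :: x_ctrl(npts), bv(npts)
      real, intent(in)    :: bv_size        ! fixed perturbation norm
      real :: x_pert(npts)
      integer :: c

      do c = 1, ncycles
        x_pert = x_ctrl + bv
        ! call step_model(x_ctrl)           ! integrate both forward
        ! call step_model(x_pert)
        bv = x_pert - x_ctrl                ! grown perturbation
        bv = bv * bv_size / sqrt(sum(bv**2))  ! rescale to fixed norm
      end do
    end subroutine breed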

There are other techniques.

------
robert_tweed
Kinda disappointed by the lack of historical data about the sorts of
supercomputers used for weather modelling over the years. Mainly because I
want to know which era of weather forecasting could conceivably run on my PC,
my phone, or the latest Raspberry Pi.

~~~
ghaff
As I recall from when I followed the supercomputer space in more detail,
weather was one of the areas that tended to use specialized supercomputers
like IBM BlueGene rather than big clusters because the weather models were
harder to parallelize. Go back in time and I'm sure you'll see quite a few
high-end IBM pSeries. Go further back and there will be things like Cray
vector machines.

~~~
angry_octet
No, it is an inherently well-parallelizable problem; it's just that the code
was originally written for the original vector supercomputer CPUs (Cray, NEC
SX) and had to be rewritten for MPI-style clusters.

It's ironic, because now we have to rewrite everything to use SIMD/vector
accelerators (GPUs).

------
olegkikin
I do windsurfing, and wind prediction is absolutely horrible, especially in
areas with complex terrain. I understand why it's hard: it's a 3-dimensional
problem, pretty much all weather stations sit on the surface, and the wind
ones are quite sparse.

We need 10x-100x more stations/sensors for the data to be good. I've been
thinking of ways to create a cheap solution for a simple weather station that
would ping its data once a minute to a central server over something like
GSM, but I'm more of a software guy. Determining wind strength/direction is
not that hard, but making it all weatherproof is.

~~~
brandmeyer
There is a startup that has designed an ultrasonic wind field sensor that can
make many measurements of the wind field above the instrument. The instrument
is on a smallish trailer, towed by a pickup to the test site. Once there, it
can take measurements of the wind field up to 200m or so above the trailer.
The idea is to make a cheaper replacement for the met towers that wind-turbine
site surveys use today.

It's much more expensive than a ground-level mechanical anemometer, but also
much less expensive than a tower with an anemometer.

~~~
Helmet
Which start-up?

~~~
brandmeyer
My information was out of date. The startup _was_ Second Wind. They have since
been bought by Vaisala, and their product currently ships under the Triton
name.

------
Odenwaelder
Here's a great (paywalled) review of numerical weather forecasting:
[http://www.nature.com/nature/journal/v525/n7567/full/nature1...](http://www.nature.com/nature/journal/v525/n7567/full/nature14956.html)

------
Retric
It seems like weather forecasting has actually gotten really good over the
last 30 years, except for snow. Are there any good statistics for relative
accuracy out to, say, 1-3 days?

~~~
Afforess
Weather forecast accuracy is a statistic anyone can measure and generate: put
the forecasts and the actual weather outcomes together in a spreadsheet and
examine the results each day for the 1-, 3-, and 5-day forecasts. NOAA and
the UK Met Office both do this for all of their forecasts - it's called
"verification" - and they measure their forecasting bias. In general, the
statistics I've seen suggest the 1-day forecasts are 95% accurate, 3-day
forecasts are ~80% accurate, and 5-day forecasts are ~70% accurate. This may
vary if you live in a hard-to-forecast area.

For example, surface temperature forecast verification for Jan 2017:
[http://www.mdl.nws.noaa.gov/~verification/ndfd/index.php?mo_...](http://www.mdl.nws.noaa.gov/~verification/ndfd/index.php?mo_id=201701&ty_id=pt&rr_id=ptreray&re_id=co&wx_id=tt&pr_id=003&md_id=nd&mr_id=Array&cy_id=00&pd_id=ma)
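
If you want to roll your own verification, the arithmetic is trivial. A
hedged sketch (the 2-degree "hit" tolerance and the random placeholder data
are made up):

    ! Verification sketch: bias, mean absolute error, and hit rate of
    ! a temperature forecast against observations.
    program verify
      implicit none
      integer, parameter :: ndays = 365
      real :: fc(ndays), ob(ndays), bias, mae, hits

      call random_number(fc)   ! placeholders: use your own forecast
      call random_number(ob)   ! and observation logs here

      bias = sum(fc - ob) / ndays
      mae  = sum(abs(fc - ob)) / ndays
      hits = count(abs(fc - ob) <= 2.0) / real(ndays)
      print *, 'bias:', bias, ' MAE:', mae, ' hit rate:', hits
    end program verify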

~~~
philippnagel
What constitutes a hard to forecast area? Any examples?

~~~
ghaff
A lot of variability in weather. Multiple weather systems tending to come
together. It's a lot harder to forecast Boston than Las Vegas. It also
depends on what you're trying to forecast. If you live in a very dry area,
you're going to be right about precipitation the vast bulk of the time.

------
tmsldd
Could someone please give some references/publications for the algorithms,
models, and implementation strategies currently running weather forecast
predictions on these machines?

~~~
brandmeyer
[https://mpas-dev.github.io/](https://mpas-dev.github.io/)

This is a model that didn't quite make it as the replacement for the GFS in
the US. It had broad backing from the research community, and if you follow
their references back you'll find a wealth of research.

------
deepnotderp
Apparently perfect simulation will require zettaflops!

We can't even achieve exascale right now! (But we'll get there)

~~~
dekhn
There is no such thing as perfect simulation.

And it's more likely that when people realize how incredibly hard it is to
build exascale machines, and how unproductive they will be, they'll just end
up training deep learning models that approximate the perfect simulations
well enough, and cost-effectively enough, on commodity GPUs that nobody will
buy or build supercomputers any more.

~~~
deepnotderp
Baidu estimates it takes ~20 ExaFLOPs to train a deep net. Consider that it's
not uncommon to churn through ~100 hyperparameter combinations (let alone
architectures), especially for something as high-end as this - that's on the
order of 2,000 ExaFLOPs of total compute - and you realize that deep learning
is no escape from the compute problem!

There's a reason why the big DL players hire lots of HPC guys for deep
learning. Also, parallelizing SGD is a highly non-trivial task: until very
recently, you had to linearly lower the learning rate as you added new
workers. And keep in mind that bandwidth is far more expensive (economically,
temporally, and energy-wise) than compute.

That being said, many people are already exploring deep learning for weather
simulation (such as Yandex, I believe) and it has worked very well - it is
near, or beating, SOTA iirc - so I definitely think there's a future there.

For the record, I think exascale is achievable, but not with the current
architecture (admittedly I'm biased, as a deep learning chip and
supercomputing chip startup founder). I think the objective evidence is
pretty strong that with a new architecture, exascale is possible. On the
other hand, zettascale may be the end of supercomputer scaling. Moore's law,
Dennard scaling, et al. are no saviors either, since communication and
scheduling logic now dominates the cost of computation.

~~~
dekhn
You mean "20 ExaFLOP". "20 ExaFLOPs" is a per-second measure.

FWIW, I work at Google, have a background in parallelizing simulations, and
built an execution system that runs huge numbers of hyperparameter
combinations. We called it "Exacycle" because it exceeded 1 exaflop / second
(no communication between tasks) using only idle cycles.

I think you probably missed my point: it's now pretty well established that
for any physical simulation process, you can train a net using far less energy
and get an equivalent quality result. The training itself doesn't require lots
of communication unless the model is enormous.

~~~
deepnotderp
Yes, exaflop, I'm so used to writing "FLOPs" that I wrote it xD

You may have a background in parallelizing simulations, but I beg to differ
about parallelizing deep learning training. There is a lot of communication
involved: for one, your model is very likely to be enormous (perhaps even
necessitating model parallelism), and second, there is a LOT of communication
even with plain data parallelism - see the sketch below.
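
A hedged sketch of the standard synchronous data-parallel pattern, in Fortran
with MPI (the parameter count is illustrative): every worker averages its
gradients with everyone else's on every step, which is where the bandwidth
goes.

    ! Data-parallel SGD sketch: each worker computes gradients on its
    ! own data shard, then all workers average them with an
    ! MPI_Allreduce over the full parameter count -- every step.
    program data_parallel_sgd
      use mpi
      implicit none
      integer, parameter :: nparams = 10000000   ! illustrative size
      real, allocatable :: grad(:), grad_avg(:)
      integer :: ierr, nworkers

      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nworkers, ierr)
      allocate(grad(nparams), grad_avg(nparams))

      call random_number(grad)        ! stand-in for a backward pass
      call MPI_Allreduce(grad, grad_avg, nparams, MPI_REAL, MPI_SUM, &
                         MPI_COMM_WORLD, ierr)
      grad_avg = grad_avg / nworkers  ! averaged gradient
      ! ... apply the update and repeat for the next batch ...

      call MPI_Finalize(ierr)
    end program data_parallel_sgd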

~~~
dekhn
I guarantee that the communication is small compared to what the weather
simulators require. Deep learning systems typically work fine on
high-bandwidth, medium-latency systems. They are highly tolerant of async
updates.

Most parallelism in DL uses coarse-grained exchange, although I agree with
respect to large models. That's similarly true for every supercomputer
application (it's the only reason people use supercomputers now; it's much
easier and faster if your problem fits on a single system).

