
New Weather Service supercomputer faces chaos - luu
http://www.washingtonpost.com/blogs/capital-weather-gang/wp/2013/06/25/new-weather-service-supercomputer-faces-chaos/
======
ahelwer
Tracking down numerical errors is brutal. I worked at a company re-
implementing some MATLAB research code into optimized C++ (with
parallelization where possible), where output had to match exactly bug-for-
bug[1]. The algorithms were obviously very simple compared to weather
forecasting on supercomputers, but it was still incredibly tedious dumping &
comparing MATLAB debug values at dozens of points across multiple scenarios,
each of which took hours to run. We even calculated that one of the datasets
they gave us would take 86 years(!!!) to process in the original MATLAB code.
Sympathy for the team working on this. Fond memories of getting enormous
arrays of NaNs after a fix.

[1] Bug-for-bug _means_ bug-for-bug. There was a clear attempt to sort an
array in the research code, which was implemented as a single pass through the
data just deleting elements that were smaller than their direct predecessor.
Research code is weird. It's one of the few places in software where "hack it
till it works" actually produces valuable results.
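
That pseudo-sort is easy to reproduce. A minimal sketch of it (my
reconstruction from the description above, reading "direct predecessor" as the
last element kept; not the actual research code):

```python
def pseudo_sort(xs):
    # Single pass: keep an element only if it is >= the last element kept.
    # The output is monotonically non-decreasing, but this is NOT a sort:
    # "out of order" elements are silently thrown away.
    out = []
    for x in xs:
        if not out or x >= out[-1]:
            out.append(x)
    return out

print(pseudo_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [3, 4, 5, 9] -- four elements lost
```

Matching that bug-for-bug means faithfully losing the same elements the
original lost.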

------
khm
This post is years old, and those systems went into production in July of
2013.

------
e0m
As explained well in Nate Silver's "The Signal and the Noise", when the
weather forecast model says a 50% chance of rain, that doesn't mean it "might"
rain; it means that in 50.001293% of millions of simulations, precipitation
above some threshold developed in the spot in question.
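
A toy sketch of how an ensemble becomes a probability of precipitation (the
distribution and threshold here are invented for illustration; real ensembles
perturb initial conditions and physics, not a random number generator):

```python
import random

random.seed(42)  # deterministic for the example

def run_member():
    # Stand-in for one perturbed model run: simulated precipitation (mm)
    # at the spot in question. The N(0.5, 0.5) distribution is made up.
    return max(0.0, random.gauss(0.5, 0.5))

THRESHOLD_MM = 0.25  # assumed cutoff for "measurable precipitation"
members = [run_member() for _ in range(100_000)]
p_rain = sum(m > THRESHOLD_MM for m in members) / len(members)
print(f"chance of rain: {p_rain:.1%}")  # roughly 69% for this made-up setup
```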

A news station can't get away with saying 50%, though. The public assumes
that's a cop-out answer. As such they will say 60% even if the real number is
50%. They will never say 40%. Better to seem over-prepared than to have it
rain on you when it wasn't supposed to.

Even if we had the perfect computer, at the end of the day it's got to get to
your average person through a local news weather reporter.

~~~
ianlevesque
> Even if we had the perfect computer, at the end of the day it's got to get
> to your average person through a local news weather reporter.

The average person maybe, but not if you use Dark Sky for iOS or its web
equivalent [http://forecast.io/](http://forecast.io/)

I'm simplifying but it essentially plugs your GPS coordinates into a weather
model and gives you the output.

~~~
batbomb
I have to say I was using a feature like this on the NOAA site well before
there was an App Store or .io domains.

------
DanielBMarkham
I do not understand why this is news. All CA (cellular automata) simulations
tend to diverge in various ways as they run forward. In fact, the unusual
situation is where they have some sort of stability. With the degree to which
CA is being used, why hasn't this become common knowledge? All these folks
going on and on about "mathematical models" of this or that, and they don't
even understand the nature of the math? Huh?

More counter-intuitiveness follows. The more complicated you make the model,
oddly enough, the less likely it is to be stable. (I'm looking at you, weather
model)

 _I suspect that differences are random so that on average they are performing
statistically about the same._

Oddly enough, nope! Nice try, though. The rate of divergence is unpredictable
and not guaranteed to average out to anything. In fact, that's the definition
of chaotic. Unless you've discovered an attractor, you just have unpredictable
noise. Add that with CA and you have systems that diverge in various ways at
various rates. Sure, if your sample size was _huge_, it'd probably all work
out. But 2-3 or even a few dozen? Not so much.

~~~
zackangelo
Discretizing your domain into a grid doesn't necessarily imply you're solving
the problem in question using cellular automata. I'm not familiar with the
specifics of GFS, but it's common to model physical phenomena using the finite
difference method.
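
For example, a 1-D heat equation advanced by an explicit finite-difference
scheme looks like this (a generic textbook sketch, nothing to do with GFS
internals):

```python
def fd_step(u, alpha, dx, dt):
    # One explicit finite-difference step for du/dt = alpha * d2u/dx2,
    # with fixed (Dirichlet) boundary values. Stable when
    # alpha * dt / dx**2 <= 0.5.
    r = alpha * dt / dx ** 2
    return [u[0]] + [
        u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]

u = [0.0] * 21
u[10] = 1.0                      # unit heat pulse in the middle
for _ in range(100):
    u = fd_step(u, alpha=1.0, dx=1.0, dt=0.2)
print(f"peak temperature after 100 steps: {max(u):.4f}")
```

No cellular automaton rules involved: each grid point evolves by a discretized
differential operator, not a lookup table of neighbor states.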

It's newsworthy and interesting because they're running the same exact model
on two different computers and it's yielding wildly divergent results. It'd be
as if you took the same CA rules with the same initial state and it ran
differently on two different computers.

It's tricky because these discrepancies are probably due to subtle differences
like the way the new CPUs are handling the floating point calculations.
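
A minimal illustration of that: floating point addition is not associative, so
a compiler or CPU that regroups the same computation (e.g. when vectorizing)
can legitimately change the result:

```python
# Different (all legal) groupings of the same sum give different last bits:
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b, a, b)   # False 0.6000000000000001 0.6

# With mixed magnitudes the regrouping can change the answer entirely:
s1 = (1.0 + 1e100) - 1e100   # the 1.0 is absorbed into 1e100 and lost
s2 = 1.0 + (1e100 - 1e100)   # the 1.0 survives
print(s1, s2)                # 0.0 1.0
```

In a chaotic model, last-bit differences like these are enough to make two
runs diverge.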

------
spitfire
So, assuming the two programs are in fact exactly the same, it seems like
there's an error in one of the systems' floating point implementations.

Either the old computer (and systems that came before it) were producing the
wrong results, and the new one is more accurate.

Or the old system (and systems that came before it) produced the right
results, and the new system has a bug.

These things do happen, but it's very expensive to fix hardware.

~~~
kfcm
Or both systems are wrong, but just wrong in different ways.

~~~
ithkuil
Or both are right, and the software is hitting some undefined behaviour that
the two compilers handle differently. The article does in fact mention that a
different compiler version was used.

With C this is quite easy to have happen; I don't know about Fortran, though.

~~~
SolarNet
I would call an error introduced by the compiler's failure to exactly follow
the standard a problem in the system.

We are talking totality of systems here, the compiler that produces the
programs _is part of the system_ that uses the program.

~~~
ithkuil
I wasn't referring to failures of the compiler to follow the standards, but
rather to failures of the _software_ to follow the standard.

"So assuming the two programs running are in fact the exact same,"

The source code for the programs may well be the same, but the actual (binary)
program is not necessarily the same.

Or to put it in other words, both systems might be perfectly correct, but
there is a class of software bugs that only gets revealed when the program is
compiled in one of several perfectly legitimate ways. The space of possible
alternative compilations can be quite large if your language offers many areas
of undefined behaviour.

Does anybody know if Fortran is plagued by this as much as C?

------
kfcm
M&S (modeling and simulation) is one of the areas I work in, and there are
quite a few things to examine. Hardware floating point implementations are one
area. Compiler (internals and/or flags) and assembly language differences (as
the author alluded to) are others. Heck, even a thread race condition might be
showing up now for whatever reason.

------
jpistrue
I think this merely points out that both prediction systems were basically
B.S. to begin with. Ian Malcolm would agree.

~~~
SolarNet
I don't see how you got there. Numerical analysis is a whole branch of
computer science which regularly deals with slight differences in computation.
The usage of a constant (like pi) that is one bit different could cascade into
massive differences in the results between the two programs.
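
A sketch of that cascade: nudge pi by a single ulp and feed it into a chaotic
iteration, and the outputs separate by far more than one bit. The doubling map
here is just an illustrative stand-in (and `math.nextafter` needs Python
3.9+):

```python
import math

PI_GOOD = math.pi
PI_BAD = math.nextafter(math.pi, 4.0)   # "miscoded" pi, one ulp too large

def orbit(pi_val, steps=60):
    # Chaotic angle-doubling map on the circle; the wrap-around uses pi,
    # so a one-ulp error in pi gets amplified every iteration.
    theta, traj = 1.0, []
    for _ in range(steps):
        theta = (2.0 * theta) % (2.0 * pi_val)
        traj.append(theta)
    return traj

worst = max(abs(x - y) for x, y in zip(orbit(PI_GOOD), orbit(PI_BAD)))
print(f"pi off by {PI_BAD - PI_GOOD:.2e}, orbits differ by up to {worst:.2f}")
```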

Neither is completely wrong (their results are similar); they just aren't
exactly the same, and when dealing with 7-day forecasts this isn't entirely
unexpected. We are trying to approximate massive systems we don't fully
understand. A physics simulator is decent at getting a broad sense of things,
but it is not precise enough to get everything exactly right, even though we
use such simulators to fly airplanes. Same thing here.

~~~
mturmon
"The usage of a constant (like pi) that is one bit different..."

That brings back memories of debugging a Fast Fourier Transform code that had
a slightly miscoded version of pi. Someone typed it in from memory, to a dozen
or so places (single precision), and got it slightly wrong.

Talk about weird and hard to track down. Like being transported to the
parallel universe where the cars ride on hexagons.

~~~
Terr_
We could do that here... More like a universe where you can tile a floor with
pentagons.

------
acd
Mathematicians have shown, via the Lorenz equations, that you can only do
weather predictions up to about 3 days out; adding more computing power won't
help, since it's a chaotic system. Did you know that the weather forecast for
the next day is about 45% accurate?
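
The Lorenz system itself makes this easy to demonstrate: two runs whose
initial conditions differ by one part in a billion reach a macroscopic
separation within a few dozen model-time units. A toy forward-Euler
integration with the standard sigma=10, rho=28, beta=8/3 parameters:

```python
def lorenz_step(x, y, z, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz equations (crude but fine here).
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-9, 1.0, 1.0)       # identical except for one part in a billion
max_sep = 0.0
for _ in range(40_000):          # 40 model-time units at dt = 0.001
    a, b = lorenz_step(*a), lorenz_step(*b)
    sep = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    max_sep = max(max_sep, sep)
print(f"initial offset 1e-9, max separation reached: {max_sep:.2f}")
```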

~~~
mbq
It is more or less true globally; locally it depends on the weather itself --
sometimes you can forecast up to a few weeks out and be confident about it
from ensemble self-consistency (think of a desert in a dry period), sometimes
it blows up a few hours into the future. Also, there is no way of defining a
single "weather prediction accuracy"; the one you cite is probably some
meaningless mixture of various factors (;

