
Accuracy of three major weather forecasting services - rhiever
http://www.randalolson.com/2014/06/21/accuracy-of-three-major-weather-forecasting-services/
======
joelthelion
This curve is not enough to evaluate the value of a weather forecasting
service: if it rains, say 30% of the days in a specific area, you could
forecast a 30% chance of rain every day and have good "accuracy". And yet that
would be of no practical value.

I think a better metric would probably be something from information theory
like mutual information, but I'm not sure which one exactly.
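To make the point concrete, here's a quick simulation (the 30% base rate, the sample size, and the variable names are all just for illustration): a forecaster who says "30%" every day comes out perfectly calibrated, yet the forecast is independent of the outcome and carries no information about any particular day.

```python
import random

random.seed(0)
DAYS = 100_000
BASE_RATE = 0.3  # it rains on about 30% of days in this hypothetical area

# Simulated outcomes.
rained = [random.random() < BASE_RATE for _ in range(DAYS)]

# A "climatology" forecaster: predict a 30% chance of rain every single day.
forecasts = [BASE_RATE] * DAYS

# Calibration check: on days with a 30% forecast, how often did it rain?
n = len(forecasts)
observed_freq = sum(rained) / n
print(f"forecast {BASE_RATE:.2f} -> observed {observed_freq:.3f}")

# The observed frequency lands near 0.30, so the forecaster is essentially
# perfectly calibrated -- yet the forecast is independent of the outcome,
# so it tells you nothing about any particular day.
```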

~~~
bo1024
Right. The curve demonstrates "calibration". A separate question is daily
accuracy, or what might be called "sharpness".

One way to measure accuracy is to use a "scoring rule" each day. The scoring
rule takes the prediction (a probability of rain), e.g. p=0.3, and the actual
outcome, e.g. "rain" or "no rain", and returns a numerical score, where higher
is better. You'd expect a reasonable scoring rule to have properties like, the
highest score is for predicting p=1 and it rains (or p=0 and it doesn't), the
lowest score is for p=1 and it doesn't (or p=0 and it does), and so on.

The _log scoring rule_ is a popular choice: If it rains, your score is log(p);
if it doesn't, your score is log(1-p). (Don't let it bother you that these are
negative numbers for p in (0,1). Closer to zero is still better.) Notice that,
if the true chance of rain is p, your expected score for predicting p is
precisely p log(p) + (1-p) log(1-p),
which is the negative of the entropy of the distribution, that is,
-H(Bernoulli(p)).

Now, what we could do to measure accuracy is to average, over all the days,
the log scoring rule applied to your prediction and the outcome. A perfect
score would be if you predicted 1 every time it rained and 0 every time it
didn't, which would give you a score of zero; if you're not perfect, you'll
have a negative score, with more negative being worse.
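A minimal sketch of that averaging in Python (the forecasts and outcomes are made up; the `eps` clamp is an implementation detail to avoid log(0)):

```python
import math

def log_score(p, rained, eps=1e-12):
    """Log scoring rule: log(p) if it rained, log(1 - p) if it didn't.

    Scores are <= 0 and closer to zero is better. The eps clamp is just a
    guard against log(0) for a forecaster who says exactly 0 or 1.
    """
    p = min(max(p, eps), 1 - eps)
    return math.log(p) if rained else math.log(1 - p)

# Made-up forecasts (probability of rain) and what actually happened.
forecasts = [0.9, 0.3, 0.3, 0.0, 0.7]
outcomes = [True, False, True, False, True]

avg = sum(log_score(p, r) for p, r in zip(forecasts, outcomes)) / len(forecasts)
print(f"average log score: {avg:.3f}")  # 0.0 would be a perfect record
```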

So the average score will be (the negative of) the average entropy of your
predictions, which makes sense as entropy corresponds to "uncertainty". (This
is assuming your predictions are calibrated. If you are uncalibrated, i.e. you
make predictions with little uncertainty but they are _wrong_, then you will
pay a penalty in proportion.)

We can connect this to your idea of mutual information, if we had a more
detailed setup. For instance, suppose that each day, nature picks a true
probability of rain p', then you observe some information related to p' and
make a prediction p; then it rains with probability p'. Here, the best you
could possibly do is to observe all the information and predict p'. Then your
average score will be the negative of the average entropy, -H(Bernoulli(p')),
and your improvement over the predict-the-base-rate forecaster is exactly the
mutual information between your information and the outcome. But if you are
making poor predictions, the theory of proper scoring rules tells us that
your average score falls below the -H(Bernoulli(p')) benchmark by the average
KL-divergence between Bernoulli(p') and Bernoulli(p). (If we are using some
scoring rule other than the log scoring rule, this KL-divergence is replaced
with a "Bregman divergence".)

Unfortunately, we don't know this true p' each day, so we can't actually use
mutual information to evaluate forecasts, but we can still use the entropy, or
log scoring rule.

For a survey on proper scoring rules (not very fun for the layperson I'm
afraid), see
[http://www.eecs.harvard.edu/cs286r/courses/fall12/papers/Gne...](http://www.eecs.harvard.edu/cs286r/courses/fall12/papers/Gneiting07.pdf)

------
emkemp
The plot in the article is an example of a "reliability diagram" frequently
used in weather forecast verification. See, e.g.,
[http://www.bom.gov.au/wmo/lrfvs/reliability.shtml](http://www.bom.gov.au/wmo/lrfvs/reliability.shtml).
Reliability is considered separate from accuracy in meteorology -- the former
evaluates success conditioned on what was forecasted, while the latter is an
unconditional evaluation of success or failure.
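As a sketch, the table behind a reliability diagram can be built by binning forecast probabilities and comparing each bin's mean forecast with the observed event frequency (the data below is invented for illustration):

```python
from collections import defaultdict

def reliability_table(forecasts, outcomes, n_bins=10):
    """Bin forecast probabilities and compare each bin's mean forecast
    with the observed frequency of the event (rain) in that bin."""
    bins = defaultdict(lambda: [0, 0, 0.0])  # bin index -> [count, events, forecast sum]
    for p, occurred in zip(forecasts, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # clamp p = 1.0 into the top bin
        bins[b][0] += 1
        bins[b][1] += occurred
        bins[b][2] += p
    # For each bin: (mean forecast, observed frequency). A well-calibrated
    # forecaster has these two numbers close together in every bin.
    return {b: (s / n, k / n) for b, (n, k, s) in sorted(bins.items())}

table = reliability_table(
    forecasts=[0.1] * 5 + [0.8] * 5,
    outcomes=[False, False, False, False, True,   # one rainy day out of 5
              True, True, True, False, True],     # four rainy days out of 5
)
for b, (mean_f, obs) in table.items():
    print(f"bin {b}: mean forecast {mean_f:.2f}, observed frequency {obs:.2f}")
```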

Other facets of forecast "goodness" exist and are often considered in
meteorology. A seminal paper on the subject was penned by Allan Murphy, who
identified three types of "goodness" (consistency, quality, and value) and ten
subsets of quality (including reliability and accuracy). See
[http://www.glerl.noaa.gov/seagrant/ClimateChangeWhiteboard/R...](http://www.glerl.noaa.gov/seagrant/ClimateChangeWhiteboard/Resources/Uncertainty/Mac1/murphy93PR.pdf).
[PDF warning]

A popular companion to the reliability diagram is the Relative Operating
Characteristics (ROC) curve. Here different forecast probability thresholds
are tested to calculate likelihood of success if the event occurred, and
likelihood of error if the event did not occur. This evaluates what Murphy
calls discrimination (forecast quality conditioned on what was observed),
which complements reliability. See, e.g.,
[http://www.bom.gov.au/wmo/lrfvs/roc.shtml](http://www.bom.gov.au/wmo/lrfvs/roc.shtml).
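A rough sketch of those ROC calculations, using POD (probability of detection) and POFD (probability of false detection) as the hit and false-alarm rates; the thresholds and data are illustrative:

```python
def roc_points(forecasts, outcomes, thresholds=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """For each threshold, treat 'forecast probability >= threshold' as a
    yes/no forecast and compute:
      POD  (hit rate):         fraction of events that were forecast
      POFD (false-alarm rate): fraction of non-events that were forecast
    Plotting POD against POFD across thresholds traces the ROC curve."""
    events = sum(1 for o in outcomes if o)
    non_events = len(outcomes) - events
    points = []
    for t in thresholds:
        hits = sum(1 for p, o in zip(forecasts, outcomes) if p >= t and o)
        false_alarms = sum(1 for p, o in zip(forecasts, outcomes) if p >= t and not o)
        points.append((false_alarms / non_events, hits / events))  # (POFD, POD)
    return points

points = roc_points(
    forecasts=[0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1],
    outcomes=[True, True, False, True, False, False, False, False],
)
for (pofd, pod), t in zip(points, (0.1, 0.3, 0.5, 0.7, 0.9)):
    print(f"threshold {t:.1f}: POFD {pofd:.2f}, POD {pod:.2f}")
```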

Curiously, accuracy tends to take a back seat in forecast verification to
other aspects of quality, particularly in rare-event situations. This trend
began in the mid-1880s with the "Finley Affair", a series of published
articles debating how to evaluate tornado forecasts issued by the US Army
Signal Corps. Murphy published a fascinating literature review on the subject
and showed that many of the skill scores and debates born during the Finley
Affair are still active today. See
[http://www.nssl.noaa.gov/users/brooks/public_html/feda/paper...](http://www.nssl.noaa.gov/users/brooks/public_html/feda/papers/murphy96.pdf).
[PDF warning]
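The rare-event problem is easy to demonstrate with the 2x2 contingency table commonly quoted for Finley's 1884 tornado forecasts (treat the exact counts as reported figures rather than gospel): raw "percent correct" rewards a forecaster who simply never predicts a tornado.

```python
# Verification counts commonly quoted for Finley's 1884 tornado forecasts.
HITS, FALSE_ALARMS, MISSES, CORRECT_NEGATIVES = 28, 72, 23, 2680

def percent_correct(hits, false_alarms, misses, correct_negatives):
    """Fraction of all forecasts (yes or no) that verified."""
    total = hits + false_alarms + misses + correct_negatives
    return (hits + correct_negatives) / total

finley = percent_correct(HITS, FALSE_ALARMS, MISSES, CORRECT_NEGATIVES)

# A "forecaster" who never predicts a tornado: every event becomes a miss,
# every non-event a correct negative.
never = percent_correct(0, 0, HITS + MISSES, FALSE_ALARMS + CORRECT_NEGATIVES)

print(f"Finley: {finley:.1%}, never-forecast-a-tornado: {never:.1%}")
# The do-nothing forecaster scores higher, which is why rare-event
# verification leans on skill scores instead of raw accuracy.
```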

------
bcl
If you're interested in the science (and some of the problems we have in the
US) read UW Professor Cliff Mass' blog -
[http://cliffmass.blogspot.com/](http://cliffmass.blogspot.com/)

------
jloughry
It would be interesting to repeat the comparison for many specific locations,
then plot the measured variance geographically as a "heat map" or landscape of
error [1]. Are there patterns visible that could be attributed to local
geography, population density, or other factors?

[1] It could be done, for weather.gov [2], using only data available from the
web site [3].

[2] I don't really care about the other forecast sources.

[3] To trust rainfall observations obtained from weather.gov in order to
compare them to predictions made by weather.gov seems vaguely wrong but there
is no other comparable source of observations. They are physical measurements,
after all.

[4] Some geographic areas probably have a coarser net of observation points.
In some places, e.g., San Diego, the weather is inherently easier to predict.
Some local forecast offices may be more skilled than others.

~~~
Shivetya
How wide of an area is used to test against the predictions? I notice he calls
out local weathermen; well, a thirty percent chance of rain given out by the TV
guys for the metro Atlanta area is either dead accurate or dead wrong
depending on what part of metro Atlanta I am in.

It would be really cool to see weather forecasts done up like topographic maps,
with +/- symbols denoting changes in precip and the like.

------
kator
I have enjoyed [http://darkskyapp.com/](http://darkskyapp.com/)

In my totally unscientific opinion, weather.com has slowly gotten worse over
the past several years. I think it's become a lot harder to monetize weather,
and the weather service has really stepped up its game on providing public
outlets that are digestible by the general public.

My guess is that in the early days what weather.com and The Weather Channel were
really doing was translating the difficult-to-understand NWS messages and
helping the general public quickly answer: "Do I need a raincoat
today?". That said, over time, as the NWS has stepped up its public interfaces,
that value-add is sliding backwards and getting harder to maintain.

That's my $0.02CPM worth.. :-)

~~~
MBCook
I liked Dark Sky too (I actually use Weather Line [1], which is powered by
them; I switched a while ago for the better UI), but I've seen it take a
noticeable dip in accuracy over the last 2 years or so.

On the PC I use their site, Forecast.io. I could never use Weather.com or any
of those other sites; too many ads and other junk making them hard to use. I
want to know tomorrow's weather, not the pollen count forecast for this summer.

[1] [http://weatherlineapp.com](http://weatherlineapp.com)

~~~
kator
Looks interesting I'll check it out!

------
Intermernet
"The further you get from the government’s original data, and the more
consumer facing the forecasts, the worse this bias becomes. Forecasts “add
value” by subtracting accuracy."

As interesting as this is, isn't it obvious that the further you get from the
primary source, and the more consumer facing a report is, the less accurate
it's going to be? This is the only opportunity for interested parties in the
info-chain to massage the data to suit their own ends (Not necessarily
nefarious, just selfish). If it's going to happen, this is where it will
happen.

After all, we take the news we see on TV with a cellar of salt, so why would
we believe the weather?

~~~
angry_octet
Actually, a meteorologist isn't just massaging the data. Models often have known
biases (e.g. underpredicting rain when there is an offshore wind, or failing to
model complex local topography and getting mixing wrong). Forecasters use
their local knowledge to tweak the forecast and achieve significant accuracy
gains. But people are expensive, so they are affordable only for high-impact
cases, like aviation weather.

------
simmons
For a while, I've been thinking about doing exactly this sort of analysis.
Thanks for putting in the effort!

Does anyone have experience getting a feed of the raw NWS forecast data for
many points in a large region (e.g. a state or the whole country)? I was
thinking the other day that it would be great to have a web site that showed
the forecasted chance of precipitation across a region, to answer questions
like "Where in the Colorado high country should I go camping this weekend?"

~~~
ByronT
[http://www.hpc.ncep.noaa.gov/pqpf/conus_hpc_pqpf.php](http://www.hpc.ncep.noaa.gov/pqpf/conus_hpc_pqpf.php)

------
Theodores
This is a quite cynical take on how weather forecasting works, written by
someone who clearly does not know a single weather forecaster.

First of all, there are only two agencies on the planet that do the number
crunching to work out some reasonable forecast data that encompasses the whole
globe. These are the NWS and the UK Met Office. As well as a lot of big
computers, these agencies also need source data. This data -
observations - comes from airports and plenty of other places where things
like wind speed, precipitation, temperature and so on are actually measured. At
times the observations are wrong - imagine the baking tarmac of that big
airport and how it differs from the tranquil yet noisy houses close to a
nearby river.

The NWS differs from the Met Office in that it doesn't charge for the GRIB
data. The taxpayer in the USA has paid for it already, so they don't have to
pay for it again. Hence the proliferation of things like The Weather Channel
that use NWS rather than Met Office data.

One thing that outsiders to weather forecasting do not realise is what it is
that weather forecasters actually do. They imagine them to be very scientific
- which they are - but they don't realise that they are essentially in the
'betting shop' business. To take an automotive example, if you had perfect
knowledge of every car that is entering tomorrow's F1 race and you had perfect
knowledge of the well-being of every single driver, mechanic and tea lady
involved in the event, could you actually predict which of the 22 drivers is
going to win? Will it be the guy on pole? The guy who has won most of the
races so far? The guy who consistently comes second? Or some random outsider?

The GRIB data is far from perfect knowledge, it is a forecast of what is going
to happen and the accuracy depends on the time window going into the future.
The data is fully 3 dimensional, think of it as lots of onion layers going
around the whole planet. Data points are on a grid - what happens if your town
is next to some huge mountain with 'your' data point on that grid being
several thousand feet higher than where your town is? The GRIB data for your
town is not actually for your town, it is for the mountain. A meteorologist
will have rules of thumb plus the science to arrive at a more accurate guess
than the GRIB gives - this is interpretation of the data, not some sixth
sense; nonetheless, it is still a gamble/guess.
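As a hypothetical example of such a rule of thumb, a forecaster might shift the grid point's temperature to the town's elevation using a standard-atmosphere lapse rate. The function name and numbers below are illustrative, not any agency's actual method:

```python
# A hypothetical rule of thumb: if the nearest grid point sits on a mountain
# well above the town, shift its temperature to the town's elevation using
# a standard-atmosphere lapse rate.

LAPSE_RATE_C_PER_M = 6.5 / 1000.0  # ~6.5 degC cooler per 1000 m of altitude

def adjust_temp(grid_temp_c, grid_elev_m, town_elev_m):
    """Move the grid-point temperature down (or up) to the town's elevation."""
    return grid_temp_c + (grid_elev_m - town_elev_m) * LAPSE_RATE_C_PER_M

# Grid point on the mountain at 2500 m reads 5 degC; the town sits at 500 m.
print(adjust_temp(5.0, grid_elev_m=2500, town_elev_m=500))  # about 18 degC
```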

As well as the GRIB data there are things like satellite images - from lots of
different flavours of satellite - plus there is radar data. This can all be
layered up on top of GRIB data and pretty maps to create an interpreted
forecast. The 'wet bias' is more likely to be a rookie meteorology mistake
than a devious ploy to get viewers watching. Look at any satellite image
and see the low level haze from things like jet plane 'contrails' plus coastal
fog etc. There is an awful lot of it on satellite images and it is very easy
to permanently be predicting rain from seeing such cloudy greyness. Hence at
the local weather station this is more likely to happen. On the Weather
Channel where they have excellent interpretation tools for their forecasters
this is less likely to happen, not so much because of the tools but because of
the forecasters - they are more experienced gamblers.

The other thing to remember with weather forecasting is that today's
predictions can be checked against tomorrow's observations. Things can be
consistently wrong for a given town/area due to the way the GRIB data works
(i.e. it does not factor in local topography), and it can take a while before this
error in the model is discovered and fixed. There may not be observation data
available for smaller towns so some errors might never be fixed.

The weather prediction industry is fairly ripe for disruption. The tools that
meteorologists use have traditionally required big workstations to run;
nowadays a Google Earth type of app would suffice, if someone could be
bothered to write it.

Amongst themselves meteorologists know a lot more about the current factors
influencing the big picture of the weather. For instance, the storms that
start off on the west coast of Africa, cross the Atlantic and 'bounce back' to
the UK, losing energy on the way to end up as mere rain. Clearly such weather
patterns take weeks to do their thing; however, for a gardener in the UK it
would be good to know if rain was on its way over the next few weeks. Yet the
demands of the forecast format mean that the forecaster has to tie that down to
'rain expected teatime next Tuesday' (or whenever). Returning to the 'app'
idea, it would be great for everyone if they could explore the raw data and
have these bigger events pointed out by an expert, so that the raw data can be
interpreted in a meaningful way. Instead we have banal 'insights' such as this
article (which probably did not intend to be banal or naive, but that is the
way things sometimes happen despite trying hard).

~~~
shoyer
You missed the biggest player in numerical weather forecasting -- the European
Centre for Medium-Range Weather Forecasts (ECMWF). Everyone agrees their
models are the world's best. Regrettably, the American models are pretty far
behind: [http://cliffmass.blogspot.com/2014/04/the-us-slips-to-
fourth...](http://cliffmass.blogspot.com/2014/04/the-us-slips-to-fourth-place-in-global.html)

The ECMWF forecasts are indeed quite expensive, but all the major players in
weather forecasting, including The Weather Channel and the National Weather
Service, buy them.

~~~
Theodores
Very interesting! It has been a while since I worked in weather and as far as
I was concerned there was the U.S. 'free' data and the British 'paid for'
data. ECMWF does ring a bell though, I think we did have selected access to
some of their forecast products but it has been a while. The centre is very
much based in the UK, more specifically Reading, which is the place where
anyone with a meteorology degree studies.

I think that a history of weather forecasting would be quite fascinating as it
is tied in to the growth of the British Empire, the needs of Britain 'ruling
the waves', the development of the telegraph, the development of aviation and,
of course, satellites and supercomputing. For many decades meteorology has
been in complete denial about climate change; this too would make an
interesting chapter in the story.

~~~
angry_octet
I say old chap, I'm afraid you are being rather anglo-centric and what not. I
have it on good authority they have weather in the heathen lands to the Far
East. Why the Japanese built an 'Earth Simulator' with 10TB of RAM back in the
last century.

And apparently the Indians launch their own weather satellites into space.
Gosh.

------
samirmenon
I'd love to see this done with temperatures, other kinds of precipitation,
etc. I think I'll have to make it a weekend project...

------
jloughry
Anecdotally, I can corroborate the observation: weather.gov slightly
underpredicts rain; a "30% chance" is associated with rain more than half the
time.

