If anyone is interested in doing your own thing with weather data, check out MADIS [0]. There are various levels of access, some of which require NOAA approval. But if you're serious about making weather predictions, it's a good thread to pull on. I once set up a MADIS node, and our server was shut down very quickly by Amazon for "suspicious traffic", so beware of that - there's a lot of data that gets pushed through the system. If I remember correctly, it was kind of a pain in the ass to get set up/configured, but it was pretty cool.
For those interested in more background on NOAA and making money from it, I highly recommend reading The Fifth Risk[0] by Michael Lewis[1], the Moneyball author. It details how a couple of private companies make a lot of money using NOAA data in interesting ways (e.g. crop insurance, a business since acquired by Monsanto). Another of those companies is AccuWeather, whose CEO was nominated to head NOAA by Trump[2].
P.S.: Anyone notice that monitoring "Climate" was absent in the government announcement?
I honestly don't remember any more. At the time, we were working with NOAA, and I remember a problem that was solved by talking to an admin at NOAA (our IP needed to be on some official whitelist or something), but that may have been for a restricted data set. We didn't end up using it for long because the client decided against it.
But I dug around for some information to maybe get you started.
When I was working on this stuff, I found that a DFS (depth-first search) through various government subdomains (like MADIS) was the best way to find information. It was tedious, but it worked.
It's also helpful to put on your Fortran hat. For example, I once attended a Haskell meetup where someone wrote a parser to deal with parsing binary files from NOAA. I was also in a meeting (with some NOAA folks) once where I was asked if I "would prefer an ASCII file, or a binary one". This is not a world that operates on JSON or XML. Expect binary blobs with flags (bits) that change the meaning of other flags in fun and exotic ways. The binary nature of the data can help with data throughput limits, but boy is it a pain to deal with.
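Just to give a flavor of the pattern in Python (this is a made-up format for illustration, not any actual NOAA product):

    import struct

    # Hypothetical record: the first byte is a flag byte, and its bits change
    # how the rest of the record is interpreted (value type and endianness here).
    def parse_record(blob: bytes):
        flags = blob[0]
        is_float = bool(flags & 0x01)          # bit 0: value is float vs. int16
        endian = ">" if flags & 0x02 else "<"  # bit 1: big vs. little endian
        fmt = endian + ("f" if is_float else "h")
        (value,) = struct.unpack_from(fmt, blob, offset=1)
        return {"flags": flags, "value": value}

    # flag byte 0x01 -> a little-endian float follows
    print(parse_record(b"\x01" + struct.pack("<f", 21.5)))

Real formats stack many more of these interdependent flags on top of each other, which is where the fun starts.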
> This is not a world that operates on JSON or XML. Expect binary blobs with flags (bits) that change the meaning of other flags in fun and exotic ways.
That brings back memories... As a government contractor I've had to work with sensor data (seismic, radar, etc.) in various formats that were developed well before the rise of XML and JSON :(
My favorite was a mixed ASCII and binary format, where each data record in a file had an ASCII header that described the format of the following block of binary, and pretty much anything could be different between records, even within the same data file (Time units? Integers? Floats? 16 bit integers? 64 bit? Big/Little Endian?).
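A minimal sketch of what reading such a record might look like (the header fields here are invented for illustration, not the real format):

    import io
    import struct

    # Invented header layout: b"<count> <dtype> <endianness>\n" followed by that
    # many binary values, e.g. b"3 int16 big\n" then three big-endian int16s.
    DTYPES = {"int16": "h", "int32": "i", "int64": "q", "float32": "f", "float64": "d"}

    def read_record(f):
        count, dtype, endian = f.readline().decode("ascii").split()
        fmt = (">" if endian == "big" else "<") + DTYPES[dtype] * int(count)
        return struct.unpack(fmt, f.read(struct.calcsize(fmt)))

    buf = io.BytesIO(b"3 int16 big\n" + struct.pack(">hhh", 1, 2, 3))
    print(read_record(buf))   # (1, 2, 3)

The real files let all of those knobs (units, widths, endianness) vary per record, so the reader had to re-derive the format string every single time.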
The most "fun" I've ever had was decoding command and telemetry data from a piece of equipment for a ground station. The box would spit out this massive frame of data. It was a very long ASCII string that you would turn into binary and break into 6-bit BCD values (no clue why they didn't use 4-bit...). There were random flags of odd bit lengths (sometimes just a single bit, sometimes 5 bits) thrown in between numbers for arbitrary reasons rather than just having all the binary flags up front. My Python script was this ugly mess of slicing up the frame to turn it all into a very nice struct I could pass to the rest of the system. The manual for this piece of hardware was some old scan that must have been xeroxed a million times over, so some portions of the document were just unreadable and you had to guess what those bits did. Other parts of the frame were just undocumented. Commands were sent one by one as a single letter followed by the ASCII representation of the numerical command parameter.
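A rough sketch of that slicing approach (the field widths and layout here are invented, not the real frame definition):

    # Expand an ASCII-encoded frame (assumed hex here) into a bit string,
    # then consume it field by field: lone flags, 6-bit BCD digits, etc.
    def parse_frame(frame_hex: str) -> dict:
        bits = bin(int(frame_hex, 16))[2:].zfill(len(frame_hex) * 4)
        pos = 0
        def take(n):                  # consume the next n bits as an integer
            nonlocal pos
            value = int(bits[pos:pos + n], 2)
            pos += n
            return value
        return {
            "status": take(1),                  # a lone 1-bit flag wedged in front
            "azimuth": take(6) * 10 + take(6),  # two 6-bit BCD digits
            "mode": take(5),                    # a 5-bit flag field
        }

    print(parse_frame("862A4"))   # {'status': 1, 'azimuth': 35, 'mode': 9}

The real frames had dozens of these fields, which is how the script ended up as an ugly mess.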
When I started the project, I looked online to see if anyone had done any previous work on this thing. A vendor was selling a GUI for the thing for $2000; I scoffed at the price and started working on it myself. By the time I was done, it had probably cost my employer more than that, but at least we had our own code that could connect to whatever you wanted rather than a GUI with no API.
The Europeans are putting big money into weather forecasting, "The goal is to be able to provide, by 2025, reliable forecasts up to two weeks in advance."
Judging by last week's forecasts for my region: pretty far off. Sudden rain in the region of 10 mm and a lot of clouds when sunny weather was predicted. The exact opposite happened as well. Temperatures were off by 5-8°C, too.
All of this for same day forecasts, not even 2 days in advance.
Rain is pretty hard to predict, though. I guess what matters the most is whether you can reliably predict major events like heavy storms rather than whether you will get a bit wet later today.
You might be onto something... met.no has been more accurate than the usual domestic weather forecasts for as long as I can remember. Literally, it can be hailing outside, and thundering, and NOAA says like 20% chance of rain and sunny, where met.no might say 70% chance of storms, cloudy, etc...
I use ECMWF but never even knew what was different - definitely more reliable where I am for cloud/sun. Haven't paid attention enough to see if its wind predictions are better.
I once had lunch with a senior meteorologist at the WMO, who was (5 years ago) convinced that the European model was 5 to 10 years ahead of the US model and that it would be hard for them to catch up. Not sure if that's still the case.
Somewhat related, what are some good weather sites for storm monitoring? I've been using ventusky[0] for rain forecasts and mrms[1] for storm and hail conditions. Is there anything better?
How interested are you in getting this done? Below is a link to a phased array demonstrator (SPY-1A) that was dismantled and replaced with a newer version in 2016. Might find out where SPY-1A is sitting (the phased array may have been returned to the US Navy), and since it'll perform both weather and aircraft surveillance, might be easier to sell to stakeholders for the coverage gap.
Alternatively, Roberts Field appears to be a major commercial air hub in central Oregon. You might argue from a safety perspective to your Congressional representatives (perhaps in concert with local air carriers and AOPA) that the airport needs a TDWR station (cost will be ~$4MM-8MM), which could also provide NOAA with the necessary weather surveillance data. Thunderstorms aren’t common on the West Coast though, hence the lack of TDWR stations in West Coast states. If you pursue this route, you'd want to get funds for this into some sort of federal transportation bill, as part of enhancing the safety of the air transportation system.
I've mentioned it to our representative, Greg Walden, in the past, but he doesn't seem interested, which is a pity, because this is the kind of non-partisan stuff that they ought to be getting done for their constituents.
Some meteorologists are not in love with the new model. My local forecaster/weather blogger suggests that the v3 model tends to overestimate cold snaps and move storms too fast in the mid-latitudes:
https://blogs.mprnews.org/updraft/2019/06/milder-with-spotty...
I was researching weather prediction not long ago. From my naive perspective, it seems that despite all the increased GPU computational power and advances in machine learning, there have not been any great advances in weather prediction. Is this true?
Edit: Downvotes for simply asking a question. sigh.
I worked as a forecaster for a bit but never made it to the research world (I studied theoretical PDEs instead of computational)... However, at the time huge gains had been made in data assimilation. One fact that has stuck with me was that ~1/3 of the computation time for the UK Met global model run was consumed by data assimilation. I don't remember the statistics anymore, but data assimilation schemes were a big driver of improved forecast skill.
I also recall the ECMWF had surprisingly accurate long range forecasts based on ensembles. It could predict 500mb heights out two weeks, no sweat.
Re: your comments... My guess is that a GPU isn't suited for use in an operational model due to data access patterns (and possibly not even helpful with the solver). But again, I'm not a computational PDE guy. Also, perhaps machine learning would be useful, but that would be post-processing or perhaps parameterizing sub-grid phenomena. There's already a process called model output statistics (MOS) for adjusting raw fields from a weather model.
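MOS is essentially a statistical correction fitted to raw model output; a minimal sketch of the idea (the numbers are made up, and operational MOS uses many more predictors):

    import numpy as np

    # Fit a linear map from raw model 2 m temperatures to co-located observations.
    model_t = np.array([21.0, 25.0, 18.0, 30.0, 15.0])   # raw model output (made up)
    obs_t   = np.array([19.5, 23.0, 17.0, 27.5, 14.0])   # station observations (made up)

    a, b = np.polyfit(model_t, obs_t, 1)                  # obs ≈ a * model + b
    print(f"corrected 28.0 C forecast -> {a * 28.0 + b:.1f} C")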
The physics is pretty well known at this point, and there's only so much you can gain by increasing from second to third order approximation. The errors in the initial conditions are just larger. Most of the action has been on data assimilation and better parameterizations because of that.
I've been out of the field for ten years now, but it's really nice to see improvements to the core physics to this degree.
I'm still skeptical of your supposed two week 500 heights forecast from the ECMWF model. I live near the western Pacific (i.e. the data hole) and it's really easy to find crazy model solutions after 7 days. And I'm pretty sure you weren't looking at the Southern Hemisphere.
> I'm still skeptical of your supposed two week 500 heights forecast from the ECMWF model.
You're probably right to be skeptical. For the record, I was only a forecaster for a short period of time over ten years ago... I didn't even serve my full four-year commitment, as I volunteered to get out under the Air Force "force shaping" at the time. I was stationed near Ramstein and we created forecasts for Europe. I was referring to the ECMWF ensemble products, specifically.
I've done computational physics at the grad level. There, PDEs are converted to finite-difference form, which basically leads to giant sparse linear systems. These are solved using SOR or even more advanced numerical techniques. These techniques tend to be quite GPU friendly.
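For concreteness, here's a minimal SOR sweep for a toy 2-D Laplace problem (nothing weather-specific, just the kind of grid-based solve finite differences produce):

    import numpy as np

    # Solve Laplace's equation on a square grid with fixed boundary values
    # using successive over-relaxation (SOR). Convergence checking omitted.
    n, omega = 50, 1.8
    u = np.zeros((n, n))
    u[0, :] = 100.0                  # hot top boundary, other boundaries held at 0

    for _ in range(500):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                gs = 0.25 * (u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1])
                u[i, j] += omega * (gs - u[i, j])

In production codes the inner loops are vectorized or handed off to libraries like PETSc rather than written out like this.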
Well, if you're just doing a standard finite difference method, and you have to keep shuffling your matrices between CPU and GPU because other operations don't work well on GPUs, you actually won't have any speedup.
Where GPUs shine for PDEs is if you have a lot of extra work for each node, for instance if you have complex chemical reactions or thermodynamics, or if you have a high-order method that requires lots of intermediate computations.
If you don't believe me, you can download the PETSc code and test the ViennaCL solvers versus the regular ones.
> A modern 5-day forecast is as accurate as a 1-day forecast was in 1980, and useful forecasts now reach 9 to 10 days into the future (1). Predictions have improved for a wide range of hazardous weather conditions, including hurricanes, blizzards, flash floods, hail, and tornadoes, with skill emerging in predictions of seasonal conditions.
> ... Data from the NOAA National Hurricane Center (NHC) (13) show that forecast errors for tropical storms and hurricanes in the Atlantic basin have fallen rapidly in recent decades.
I don't know where I picked up that particular factoid, but if you look at the trend lines in [1] you can see the improvement claimed.
In [2] there is a slightly different claim, "A modern 5-day forecast is as accurate as a 1-day forecast was in 1980, and useful forecasts now reach 9 to 10 days into the future."
Chart 3.2 of [3] shows this; by 2001 the 5th day forecast improved to be as good as the 3rd day of 1980, establishing the trend line.
Googling about yields a few other studies and articles in a similar vein.
It is important to note that forecast improvement is not linear in effort. It takes more complete and accurate sensor data and far more computation to extend the forecast on the out days due to the chaotic nature of the mechanisms modeled.
This was on here a while ago: https://news.ycombinator.com/item?id=19765700 and says that "[m]odern 72-hour predictions of hurricane tracks are more accurate than 24-hour forecasts were 40 years ago"
edit: I see that neuronexmachina found the same article. It's a good read if you want an overview of how weather prediction has changed.
I have heard the opposite - predictions have been significantly increasing in accuracy over time.
On the other hand, I suspect it might not be noticeable if the forecast you always read just says "40% chance rain, high 80, low 50". It might be more noticeable if you look at the hour-by-hour forecast for a specific location and see when the rain is predicted to start and end.
Forecasts have improved dramatically. A 7-day forecast is now somewhat useful when deciding to have your party indoors or outdoors, whereas in the 90s next-day forecasts could hardly compete with the naive assumption of "the weather will stay as it is".
But by mentioning machine learning, I'm guessing you are looking at a different timescale, i.e. "within the last two years". And any progress in the short term will be slow compared to what we have seen in other domains such as image recognition etc.
I'm no expert on weather forecasting, but I believe the explanation may be that forecasts have long been (among the) best financed "big data" problems out there. That means they incorporate lots and lots of domain-specific work. As a result, naive machine learning models currently still lag all the specialised work, which in turn isn't structured in a way to easily take advantage of progress in, say, GPUs.
For hurricane forecasts issued by the NHC, you can see the official error statistics here [0]. Note that 96 and 120-hour forecasts were so poor prior to year 2003 that they were not issued.
Note these error statistics do not represent true model error as the official track and intensity forecast -- while informed by model output -- are determined by human forecasters.
Since then, I spent hours trying to find this page again, unsuccessfully. All I have is this URL for San Jose. (Replacing "sjc" with other airport codes doesn't always work, since "mtr" is a region code?)
The data comes from weather stations all across the US, which I assume are managed or operated by NOAA. This project is at the World Meteorological Organization, an international organization of weather organizations - of which the NOAA is a member. Presumably this means it's official NOAA data.
If I remember correctly the data came from a CD-ROM with historical data. When we put it online the data was already 20 to 25 years old. It was nevertheless the most recent data that we had available, I don't remember the reason why we didn't have anything more recent. The PHP source is (or was) a CakePHP application, and honestly isn't that interesting. There was not more data in the PHP application than what is presented here.
EDIT: It was already a messy application when I got there, I cleaned it up as well as I could, but after I left it seems to have gone downhill again. Ah well. Not my problem anymore.
> Working with other scientists, Lin developed a model to represent how flowing air carries these substances. The new model divided the atmosphere into cells or boxes and used computer code based on the laws of physics to simulate how air and chemical substances move through each cell and around the globe.
> The model paid close attention to conserving energy, mass and momentum in the atmosphere in each box. This precision resulted in dramatic improvements in the accuracy and realism of the atmospheric chemistry.
Wouldn't this kind of problem be a perfect match for machine learning? It would have a huge dataset to learn from. Why isn't it happening or what prevents AI tech from forecasting the weather?
It is because there is an understanding, from first principles, of the dynamics that drive weather (e.g. conservation of mass, momentum and energy). The current models are built upon these principles to make predictions, and conform to expectations of how physics operates. The method that these models are based on (finite volume) is efficient and adaptable if modifications need to be made.
Using AI and ML to make predictions about weather will likely not account for the conservation principles and might lead to ridiculous results (in some sense). Creating an accurate AI/ML model of a complex and chaotic system might lead to wrong predictions under extreme circumstances (e.g. predicting the weather >5 days out for an extreme hurricane) or under conditions where some implicit assumption has changed. One can at least attempt to grapple with these issues when using finite volume. Under AI/ML you just have to hope your model is properly trained.
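As a toy illustration of the conservation property finite-volume schemes are built around (a 1-D advection sketch, nothing like a real atmospheric model):

    import numpy as np

    # 1-D finite-volume advection with an upwind flux and periodic boundaries.
    # Cell averages are updated from fluxes through cell faces, so the total
    # tracer mass is conserved by construction.
    n, dx, dt, c = 100, 1.0, 0.5, 1.0     # cells, cell width, time step, wind speed
    q = np.zeros(n)
    q[40:60] = 1.0                        # initial blob of tracer

    for _ in range(200):
        flux_right = c * q                # flux through each cell's right face (c > 0)
        flux_left = c * np.roll(q, 1)     # flux through each cell's left face
        q = q - dt / dx * (flux_right - flux_left)

    print(q.sum())                        # unchanged: whatever leaves one cell enters another

A generic neural network has no such structural guarantee built in.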
I could recommend this paper:
Schneider, Tapio, et al. "Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high‐resolution simulations." Geophysical Research Letters 44.24 (2017).
It is, and the research applying ML in this area is starting to ramp up. For example, last year I worked on a project using ML to identify tornado vortex signatures in Doppler weather radar scans. It also turned out that a couple of other groups published similar research at the same time. I would say to expect much more growth of ML in meteorology, and hopefully it will eventually all be applied in the field.
> The retiring version of the model will no longer be used in operations but will continue to run in parallel through September 2019 to provide model users with data access and additional time to compare performance.
As far as I understand, these are still hand-designed algorithms using a tiny fraction of the possible weather data. Impressive for old-school methods. It would be even more awesome to see how far ML could take the state of the art.
Weather data is a system where we have a really good understanding of the underlying physics but can’t do enough computation to simulate them in a way that’s detailed enough to make truly accurate predictions.
Machine learning is all about finding an unknown function that underlies known data. This is sort of the opposite issue: we know the underlying function but can’t compute it.
One fundamental problem we have with weather forecasts is that our input data for the starting point is fairly sparse. GFS calculates the forecast on a grid with 13 km horizontal resolution and 64 vertical layers. We don't have accurate weather information at that resolution from all over the globe, so the starting point for the forecasts is a combination of previous simulations and interpolated observational data.
So even if we had a forecast engine that would perfectly simulate everything given some start state, we wouldn't have enough input data to have an accurate start state.
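To make "interpolated observational data" a bit more concrete, here's a toy inverse-distance-weighting sketch (the stations and values are made up, and real data assimilation schemes such as 3D-Var/4D-Var or ensemble Kalman filters are far more sophisticated):

    import numpy as np

    # A few made-up station observations scattered over a 100 x 100 km toy domain.
    stations = np.array([[10.0, 20.0], [40.0, 70.0], [80.0, 30.0]])  # (x, y) in km
    obs = np.array([15.0, 9.0, 22.0])                                # observed temperatures

    # Regular ~13 km grid, loosely mirroring the GFS horizontal resolution.
    xs, ys = np.meshgrid(np.arange(0.0, 100.0, 13.0), np.arange(0.0, 100.0, 13.0))

    # Inverse-distance weights from every grid point to every station.
    d = np.sqrt((xs[..., None] - stations[:, 0]) ** 2 + (ys[..., None] - stations[:, 1]) ** 2)
    w = 1.0 / np.maximum(d, 1e-6) ** 2
    analysis = (w * obs).sum(axis=-1) / w.sum(axis=-1)   # gridded "first guess" field

However dense the grid, most grid points end up being filled in from very few nearby observations (or from the previous model run), which is the fundamental limit described above.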
[0] https://madis.noaa.gov/index.shtml