I wonder if GenCast's 15 day forecast is really the right indicator for forecasting? I could imagine that such long term ML forecasts tend to get closer to yearly averages, which are kind of "washed out", but of course look good for benchmarking and marketing reasons. But they are not so practical for the majority of weather forecast users. In short, it still smells a bit like AI snake oil to me. [1]
If that were the only thing it was trained on, that could be the case, but they're predicting everything and being graded on more than just their success 15 days out. I think that's just one of the flashier value-adds they can offer weather-dependent businesses, like wind production, vs traditional models.
Although it's great to see these advancements, I would like to see it integrated with the Google Weather results that show up in search and on Android devices before I get excited. Spinning up the model on my own hardware and feeding it data manually is a decent amount of work, and I'm too lazy to do that.
It's a medium-range global model, while Google Weather (which I don't have) is mostly about local short-range weather? But Google Weather is already based on an AI prediction on most cities: https://research.google/blog/metnet-3-a-state-of-the-art-neu...
Google says GenCast forecasts will later be available from them too.
Also, ECMWF runs a very similar diffusion model; it's not operational but is run a couple of times a day, with results available on their charts site (and as data files too, I guess): https://charts.ecmwf.int/
I'm not surprised. This is the sort of problem machine learning is really good at solving. There's a lot of quality training data, and the results are governed by physics.
That's not really 100% true. A lot of the data this is trained on is ERA5, which appears to be highly dense in both time and space, but is assimilation data inferred from much sparser observations. I wouldn't say it's inaccurate, but I see pretty large deviations between assimilation datasets and private weather observations (I work on this problem).
The results are governed by physics up to some level, but we can't simulate at that fine a level, so there's some inherent aleatoric uncertainty (i.e. noise). And I would generally say that physics-simulation-ML is not moving as fast as say, inference on images or text. For example, if you see a picture of a car, there's very little inherent uncertainty on what the answer is. If you see the world simulator state there's a lot of uncertainty on what happens next.
That all being said, I think this is basically the best model out there, and almost certainly the best open model. This is really the culmination of many years of effort getting data and software in place to run such a large-scale training job. Very impressive!
> For example, if you see a picture of a car, there's very little inherent uncertainty on what the answer is. If you see the world simulator state there's a lot of uncertainty on what happens next.
I've been thinking about this a lot. Many ML people work with what is essentially "closed-domain" data -- the data is essentially complete (image, sound, words, or any kind of embeddings) with no unmeasured variables, so the ML algorithm is essentially trying to learn a function that can predict it.
Unfortunately a lot of "open-domain" data has tons of unmeasured variables that are contextual. Suppose I were to try to predict how full a parking lot would be over the course of a week. You can gather lots and lots of data, but still never get to a near-perfect level of accuracy, because the covariates that drive how full a parking lot is (unexpected influencer effects on demand, competitive forces that happen to shift one day, power outages in the other part of town, other irreducible randomness = "aleatoric uncertainty" in technical parlance) aren't in the data (or at least not completely).
Fortunately this isn't a problem in real life because many effects cancel each other out, so we are able to arrive at a good-enough aggregate prediction. But "open-domain" ML problems will never achieve the kind of accuracy that "closed-domain" ML can achieve, even with tons of data. Closed domain ML can assume a degree of regularity that open domain ML can never assume.
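A toy sketch of that noise floor (the parking-lot setup and all numbers here are made up for illustration): even the ideal model, the conditional mean given everything observed, can't push its error below the variance contributed by the covariates it never sees.

```python
import random
import statistics

random.seed(0)

# Observed feature: day of week (0-6). Hidden covariate: the stuff the
# model never sees (outages, promotions, one-off events).
BASE = [0.4, 0.5, 0.5, 0.6, 0.8, 0.9, 0.7]  # hypothetical occupancy by day

def occupancy(day, hidden):
    return max(0.0, min(1.0, BASE[day] + hidden))

data = []
for _ in range(50_000):
    day = random.randrange(7)
    hidden = random.gauss(0, 0.15)  # unmeasured, hence irreducible
    data.append((day, occupancy(day, hidden)))

# The best possible model from the observed data alone is E[y | day].
by_day = {d: [] for d in range(7)}
for d, y in data:
    by_day[d].append(y)
pred = {d: statistics.mean(ys) for d, ys in by_day.items()}

# Its error is bounded below by the hidden factor's variance (~0.02),
# no matter how much data you collect.
mse = statistics.mean((y - pred[d]) ** 2 for d, y in data)
print(f"MSE of the ideal day-of-week model: {mse:.4f}")
```

More samples shrink the estimation error of `pred`, but the MSE stays pinned near the hidden factor's variance: that residual is the aleatoric part.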
The point is that the underlying physics of classical models should still hold true as the climate changes - and where they don't hold, we will have causal insight as to why. But this isn't necessarily true for an ML model: as climate changes and develops new patterns, there is no reason to think ML will adapt to it (and a ton of reasons to guess it won't).
Why not? What's so special about 7-14 days? I can see plenty of reasons one might want to predict accurately the weather even a year out, just one example: will the weather support an outdoor bbq event this late in fall next year?
> will the weather support an outdoor bbq event this late in fall next year?
Sure, but there could be a disease outbreak, or a pork meat recall that prevents it anyway. The weather, as a factor, is generally insignificant with respect to loss-of-life events, and to the extent it matters, we can see those events far enough in advance, at 7 to 14 days, to compensate.
A better example would be "can we launch a spacecraft from this launch site next year." Even then, we never pick a launch day; we establish a launch window, because we already know the weather changes fast enough that it's unlikely to remain identical for several days in a row. So conditions on a single day are effectively meaningless.
The Space Shuttle program went even further with this by feeding wind data from high-atmosphere probes back into the launch vehicle software, so it could plan its maneuvers around the wind and keep vehicle stresses within tolerable parameters. They went from a 20% launch probability to an 80% launch probability with this system.
I mean.. enjoy your BBQ either way just bring some popup tents.
Farmers definitely want long range weather predictions so they can better plan what crops to plant and avoid crop failures. Crop failures due to weather have caused devastating losses.
Never will? I wouldn't be surprised if predicting climate + weather 12 months out is a simpler problem than most medical problems at which AI is currently being pointed.
> wouldn't be surprised if predicting climate + weather 12 months out is a simpler problem than most medical problems at which AI is currently being pointed
Simple systems can be famously unpredictable [1]. Our bodies manage entropy; that should make them complex but predictable. The weather, on the other hand, has no governors or raison d'être.
The three body problem lacks a closed form solution. How does that mean it's unpredictable, though? I thought that numerical methods can be used to make n-body predictions to arbitrary precision. Are these simulations less accurate than I am thinking? How do engineers and scientists working on space probes plan their trajectories and such?
> numerical methods can be used to make n-body predictions to arbitrary precision
Arbitrary precision, not arbitrary length. Even "from [a] mathematical viewpoint, given an exact initial condition, we can gain mathematically reliable trajectories of chaotic dynamic systems" to only a "finite...interval" [1]. (This is due to "numerical noises, i.e. truncation and round-off error, where truncation error is determined by numerical algorithms and round-off error is due to the limited precision of numerical data, respectively.")
For a physical system like the weather, uncertainty "mainly comes from limited precision of measurement," though there is also the "inherently uncertain/random property of nature, caused by such as thermal fluctuation, wave-particle duality of de Broglie’s wave, and so on."
> Finite interval doesn’t mean it can’t be arbitrary
Skim the paper. Numerical noise means you cannot calculate the 3-body problem to an arbitrary length. There is a finite, mathematical limit even with perfect knowledge of initial conditions.
Isn’t the paper about the uncertainties that inherently exist with physical systems?
There isn’t any claim that mathematically exact starting values can’t be propagated with arbitrary precision to arbitrary length, and I would claim that this is possible (but not practical due to compute being limited, of course).
But there’s no hard limit of precision and length where a simulation can’t be made if the starting conditions are exact. The point of the paper is that starting conditions are never exact which limits the length you can propagate.
> Isn’t the paper about the uncertainties that inherently exist with physical systems?
It talks about that. Which is relevant when we're talking about the weather. But it opens by discussing the hard mathematical limits to numerical methods.
> there’s no hard limit of precision and length where a simulation can’t be made if the starting conditions are exact
Wrong.
Read. The. Paper. Numerical methods for chaotic systems are inherently, mathematically uncertain.
Beyond a certain number of steps, adding precision doesn't yield a more precise answer; it just produces a different one. At a certain point, the difference between the answers you get at different precisions covers the entire solution space.
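The precision effect is easy to reproduce with the logistic map x → 4x(1−x), a standard chaotic toy system (not the three-body problem itself, but with the same roughly-one-bit-of-accuracy-lost-per-step behavior): round-off error is amplified each iteration, so a fixed working precision only buys a finite number of trustworthy steps.

```python
from decimal import Decimal, getcontext

def logistic_orbit(x0: str, steps: int, digits: int) -> float:
    """Iterate x -> 4x(1-x) using `digits` decimal digits of precision."""
    getcontext().prec = digits
    x = Decimal(x0)
    for _ in range(steps):
        x = 4 * x * (1 - x)
    return float(x)

# Identical exact initial condition; only the working precision differs.
# Round-off error roughly doubles each step, so after 100 steps a
# 15-digit run has lost every digit it had, while the 60- and 120-digit
# runs still agree closely with each other.
for digits in (15, 30, 60, 120):
    print(digits, logistic_orbit("0.1", 100, digits))
```

To extend the trustworthy interval you have to keep adding digits in proportion to the number of steps, which is exactly the "finite interval" the paper is describing.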
You can solve to arbitrary precision but you can't measure and specify initial conditions to arbitrary precision, making the solution wrong outside of a small time interval.
Predicting that next December will be cold, sure. Predicting how cold or how much rainfall or snowfall there will be in next December would be difficult, but you could get in the right ballpark. Predicting in which week it will snow a year from now? Not a chance.
This is still on retrospective data. The machine learning graveyard is filled with models that worked well on retrospective data, but did not hold up in a live inference setting. Just ask Zillow. The real test is whether they can predict the weather 14 days out in 2025.
I am guessing they did not want to set up the data pipeline to run inference in a live setting. But that is what I would need to see to be a true believer.
ECMWF runs many such models at their site, run two or four times per day, and they have verification statistics too, so there's no need to doubt the accuracy.
The Google model is probably the best so far but ECMWF's own diffusion model was already on par with ENS and many point-forecast models (graph transformers, not diffusion) outperform state-of-the-art physical models.
What is missing is initialization directly from observations. All the best-performing models initialize from ERA5 or other reconstruction.
> One caveat is that GenCast tested itself against an older version of ENS, which now operates at a higher resolution. The peer-reviewed research compares GenCast predictions to ENS forecasts for 2019, seeing how close each model got to real-world conditions that year.
And GenCast was tested against an older model, which performs worse.
> The ENS system has improved significantly since 2019, according to ECMWF machine learning coordinator Matt Chantry. That makes it difficult to say how well GenCast might perform against ENS today.
And the testing makes it "difficult to say." The obvious conclusion is "run a new set of tests," but they'd rather pay The Verge to publish half-truths instead.
[1] more about this: https://press.princeton.edu/books/hardcover/9780691249131/ai...