Although it's obviously difficult to crack open an ML model, these models do perform enough computation to have potentially learned something like the dynamical equations for the atmosphere.
At the same time, some ML models are surprisingly parsimonious. Graphcast has about 37 million trainable parameters, but its output is a forecast (increment) of six variables on a 37-level, quarter-degree lat/lon grid. That's about 235 million outputs for a single forecast date, so it's safe to conclude that Graphcast cannot memorize its training set.
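Back-of-envelope, the 235 million figure follows from the published grid (0.25°, 721×1440 points, 37 pressure levels); I'm assuming the standard GraphCast split of six upper-air variables plus five surface-only variables, so treat the exact variable counts as illustrative:

```python
# Back-of-envelope check: GraphCast outputs per forecast step vs. trainable parameters.
# Grid follows the published 0.25-degree / 37-level configuration; the 6+5
# variable split is my assumption for illustration.

lat_points = 721          # 0.25-degree latitude grid, poles included
lon_points = 1440         # 0.25-degree longitude grid
levels = 37               # pressure levels
atmos_vars = 6            # upper-air variables predicted at every level
surface_vars = 5          # surface-only variables (assumed split)

outputs = (atmos_vars * levels + surface_vars) * lat_points * lon_points
params = 37_000_000

print(f"outputs per forecast step: {outputs:,}")          # ~235 million
print(f"outputs per parameter:     {outputs / params:.1f}")  # ~6x more outputs than weights
```

With several times more output values per forecast than weights, rote memorization of the training reanalysis simply doesn't fit in the model.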
Researchers are also explicitly interested in probing the out-of-sample behaviour of ML models for plausibility. A paper last year by Hakim and Masanam (https://arxiv.org/abs/2309.10867) put Pangu-Weather through some simplified but out-of-sample test cases and saw physically plausible outputs, so the ML models have at least not fallen at the first hurdle.
Meanwhile, it's also not quite correct to give traditional models an automatic pass for out-of-sample behaviour. The large-scale dynamical equations of the atmosphere are well understood, but much of the chaos comes from poorly-resolved, poorly-modeled, or poorly-constrained processes near and below the grid scale. The microstructure of clouds, for example, is completely invisible to models that must necessarily run at kilometer or tens-of-kilometer scales. Operational weather models rely on parameterizations to close the system and on statistical correlations to assimilate observational data.
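To make "parameterization" concrete, here's a minimal sketch in Python: a toy eddy-diffusivity closure standing in for unresolved turbulent mixing. The functional form and constants are purely illustrative, not any operational scheme:

```python
import numpy as np

def subgrid_mixing_tendency(theta, dz, K=10.0):
    """Toy closure: unresolved turbulent heat transport modeled as
    downgradient diffusion, d(theta)/dt = d/dz( K * d(theta)/dz ).
    theta: potential temperature on model levels [K]
    dz:    level spacing [m]
    K:     eddy diffusivity [m^2/s], a tuned constant standing in for
           physics the grid cannot resolve.
    """
    flux = -K * np.gradient(theta, dz)   # parameterized subgrid heat flux
    return -np.gradient(flux, dz)        # tendency handed to the resolved model

# A profile with curvature, so the mixing tendency is nonzero:
z = np.arange(37) * 250.0                 # 37 levels, 250 m apart
theta = 300.0 + 0.003 * z + 1e-7 * z**2   # stable, slightly curved profile
print(subgrid_mixing_tendency(theta, dz=250.0))
```

The resolved dynamics never see individual eddies or cloud droplets; they only see tendencies like this, with tuning constants fit to observations. In that sense the "physics-based" models already contain fitted statistical components.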
As far as I'm aware, all of the operational weather models missed the rapid intensification of Hurricane Otis last year, an out-of-sample event with deadly consequences.
There really isn't anything to crack open. The models are curves fit to data; the units of the weights are whatever the units of the data are, so, e.g., if fit to temperature data, then temperature.
If you draw a line through measurement data of one kind, you aren't getting a function of another kind: the fitted function is just a map within that space.
Why drawing a line around shadows is a good predictor of future shadows isn't very mysterious -- no more and no less so, regardless of the complexity of the object casting them. There isn't anything in the model that explains why this process works: it works because the light casts shadows in the future the same way it did in the past. If the objects changed, or the light did, the whole thing would fall over.
Likewise, "generalization" as used in the ML literature is pretty meaningless. It has never hitherto been important that a model 'generalizes' to the same distribution. In science it would be regarded as ridiculous that it could even fail to.
The scientific sense of generalization was concerned with whether the model generalizes across scenarios where the relevant essential properties of the target system generate novel distributions in the measurement domain. I.e., the purpose of generalization was explanation -- not some weird BS about models remembering data. It's a given that we can always replay-with-variation some measurement data. The point is to learn the DGP (data-generating process).
No explanatory model can "remember" data, since if it could, it would be unfalsifiable. I.e., any model built by fitting to historical cases can never fail to fit the data, and hence can never express a theory about its generation.
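To pin the distinction down, here's a minimal formalization (the notation is my own, for illustration, not from any particular paper): the ML sense of generalization only asks that risk transfers within one distribution, while the scientific sense asks that it transfers to new distributions produced by the same underlying process.

```latex
% Risk of a model f under a distribution D, for a loss \ell:
R_D(f) \;=\; \mathbb{E}_{(x,y)\sim D}\big[\ell(f(x),\,y)\big]

% ML-literature "generalization": the train/test gap under the SAME P is small:
R_P(f) - \widehat{R}_{\mathrm{train}}(f) \;\approx\; 0

% Scientific generalization: risk stays small under a NOVEL Q produced by the
% same data-generating process under changed conditions:
R_Q(f) \text{ small, for } Q \neq P \text{ generated by the same DGP}
```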
> weather models missed the rapid intensification of Hurricane Otis last year
Which happened because there was very little data to feed into the models, and AI isn't going to help with that. The Atlantic Ocean and Gulf of Mexico have tons of data-collecting buoys, and the Hurricane Hunter aircraft fly from the eastern US. Hurricane Hunters that go to the Pacific fly out of Mississippi, which adds quite a lot of latency to the data collection.
We should be adding more buoys to the Pacific, and we need to add a Hurricane Hunter crew in San Diego (or perhaps the government of Mexico would like to host and pay for one).
Then we can start seeing what the models and AI will do.
I'm not up to date on the latest literature re: the Otis miss. Is the conventional thought that the ocean was in fact warmer than the models supposed, either at the surface or with a warmer upper mixed layer?
If the problem was lack of constraint from data, this is still fixable in a probabilistic sense: we'd "just" (noting it's not that simple) need to assume more variability in ocean conditions in data-poor regions.
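As a sketch of what that could look like in an ensemble data-assimilation setting (the inflation rule, the density metric, and all constants here are invented for illustration, not from any operational system):

```python
import numpy as np

def inflate_sparse_regions(ensemble, obs_density, rho_max=1.5, density_ref=5.0):
    """Toy covariance inflation: widen ensemble spread about its mean where
    observations are sparse, so the analysis admits more ocean-state uncertainty.

    ensemble:    (n_members, n_gridpoints) array of SST states [C]
    obs_density: (n_gridpoints,) observations per grid cell (illustrative metric)
    rho_max:     inflation factor in totally unobserved regions (assumed value)
    density_ref: density above which no inflation is applied (assumed value)
    """
    # Inflation factor decays from rho_max (no data) to 1.0 (well observed).
    rho = 1.0 + (rho_max - 1.0) * np.clip(1.0 - obs_density / density_ref, 0.0, 1.0)
    mean = ensemble.mean(axis=0)
    return mean + rho * (ensemble - mean)   # spread scaled, ensemble mean preserved

# Example: 20-member ensemble over 1000 grid cells, first 300 cells data-poor.
rng = np.random.default_rng(0)
ens = 26.0 + rng.normal(0.0, 0.3, size=(20, 1000))      # SSTs near 26 C
density = np.where(np.arange(1000) < 300, 0.5, 8.0)     # sparse vs. well-observed
inflated = inflate_sparse_regions(ens, density)
print(inflated[:, :300].std(), inflated[:, 300:].std()) # more spread where data-poor
```

Widening the prior like this doesn't conjure up the missing observations, but it at least lets the forecast ensemble carry the possibility of a warmer-than-analyzed ocean instead of being confidently wrong.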