Although it's obviously difficult to crack open an ML model, these models do perform enough computation to have potentially learned something like the dynamical equations for the atmosphere.
At the same time, some ML models are surprisingly parsimonious. Graphcast has about 37 million trainable parameters, but its output is a forecast (increment) of six variables on a 37-level, quarter-degree lat/lon grid. That's about 235 million outputs for a single forecast date, so it's safe to conclude that Graphcast cannot memorize its training set.
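Back-of-envelope, the 235 million figure follows from the published grid (0.25°, 721×1440 points, 37 pressure levels); I'm assuming the standard GraphCast split of six upper-air variables plus five surface-only variables, so treat the exact variable counts as illustrative:

```python
# Back-of-envelope check: GraphCast outputs per forecast step vs. trainable parameters.
# Grid follows the published 0.25-degree / 37-level configuration; the 6+5
# variable split is my assumption for illustration.

lat_points = 721          # 0.25-degree latitude grid, poles included
lon_points = 1440         # 0.25-degree longitude grid
levels = 37               # pressure levels
atmos_vars = 6            # upper-air variables predicted at every level
surface_vars = 5          # surface-only variables (assumed split)

outputs = (atmos_vars * levels + surface_vars) * lat_points * lon_points
params = 37_000_000

print(f"outputs per forecast step: {outputs:,}")          # ~235 million
print(f"outputs per parameter:     {outputs / params:.1f}")  # ~6x more outputs than weights
```

With several times more output values per forecast than weights, rote memorization of the training reanalysis simply doesn't fit in the model.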
Researchers are also explicitly interested in probing the out-of-sample behaviour of ML models for plausibility. A paper last year by Hakim and Masanam (https://arxiv.org/abs/2309.10867) put Pangu-Weather through some simplified but out-of-sample test cases and saw physically plausible outputs, so the ML models have at least not fallen at the first hurdle.
Meanwhile, it's also not quite correct to give traditional models an automatic pass for out-of-sample behaviour. The large-scale dynamical equations of the atmosphere are well understood, but much of the chaos comes from poorly-resolved, poorly-modeled, or poorly-constrained processes near and below the grid scale. The microstructure of clouds, for example, is completely invisible to models that must necessarily run at kilometer or tens-of-kilometer scales. Operational weather models rely on parameterizations to close the system and on statistical correlations to assimilate observational data.
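To make "parameterization" concrete, here's a minimal sketch in Python: a toy eddy-diffusivity closure standing in for unresolved turbulent mixing. The functional form and constants are purely illustrative, not any operational scheme:

```python
import numpy as np

def subgrid_mixing_tendency(theta, dz, K=10.0):
    """Toy closure: unresolved turbulent heat transport modeled as
    downgradient diffusion, d(theta)/dt = d/dz( K * d(theta)/dz ).
    theta: potential temperature on model levels [K]
    dz:    level spacing [m]
    K:     eddy diffusivity [m^2/s], a tuned constant standing in for
           physics the grid cannot resolve.
    """
    flux = -K * np.gradient(theta, dz)   # parameterized subgrid heat flux
    return -np.gradient(flux, dz)        # tendency handed to the resolved model

# A profile with curvature, so the mixing tendency is nonzero:
z = np.arange(37) * 250.0                 # 37 levels, 250 m apart
theta = 300.0 + 0.003 * z + 1e-7 * z**2   # stable, slightly curved profile
print(subgrid_mixing_tendency(theta, dz=250.0))
```

The resolved dynamics never see individual eddies or cloud droplets; they only see tendencies like this, with tuning constants fit to observations. In that sense the "physics-based" models already contain fitted statistical components.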
As far as I'm aware, all of the operational weather models missed the rapid intensification of Hurricane Otis last year, an out-of-sample event with deadly consequences.
There really isn't anything to crack open. The models are curves fit to data; the units of the weights are whatever the units of the data are, so, e.g., if fit to temperature data, then temperature.
If you draw a line through measurement data of one kind, you aren't getting a function of another kind: the fitted function is just a map within that space.
Why drawing a line around shadows is a good predictor of future shadows isn't very mysterious -- no more and no less so, regardless of the complexity of the object casting them. There isn't anything in the model that explains why this process works: it works because the light casts shadows in the future the same way it did in the past. If the objects changed, or the light did, the whole thing would fall over.
Likewise, "generalization" as used in the ML literature is pretty meaningless. It has never hitherto been important that a model 'generalizes' to the same distribution. In science it would be regarded as ridiculous that it could even fail to.
The scientific sense of generalization was concerned with whether the model generalizes across scenarios where the relevant essential properties of the target system generate novel distributions in the measurement domain. I.e., the purpose of generalization was explanation -- not some weird BS about models remembering data. It's a given that we can always replay-with-variation some measurement data. The point is to learn the DGP (data-generating process).
No explanatory model can "remember" data, since if it could, it would be unfalsifiable. I.e., any model built by fitting to historical cases can never fail to fit the data, and hence can never express a theory about its generation.
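To pin the distinction down, here's a minimal formalization (the notation is my own, for illustration, not from any particular paper): the ML sense of generalization only asks that risk transfers within one distribution, while the scientific sense asks that it transfers to new distributions produced by the same underlying process.

```latex
% Risk of a model f under a distribution D, for a loss \ell:
R_D(f) \;=\; \mathbb{E}_{(x,y)\sim D}\big[\ell(f(x),\,y)\big]

% ML-literature "generalization": the train/test gap under the SAME P is small:
R_P(f) - \widehat{R}_{\mathrm{train}}(f) \;\approx\; 0

% Scientific generalization: risk stays small under a NOVEL Q produced by the
% same data-generating process under changed conditions:
R_Q(f) \text{ small, for } Q \neq P \text{ generated by the same DGP}
```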
> weather models missed the rapid intensification of Hurricane Otis last year
Which happened because there was very little data to feed into the models, and AI isn't going to help with that. The Atlantic Ocean and Gulf of Mexico have tons of data-collecting buoys, and the Hurricane Hunter aircraft fly from the eastern US. Hurricane Hunters that go to the Pacific fly out of Mississippi, which adds quite a lot of latency to the data collection.
We should be adding more buoys to the Pacific, and we need to add a Hurricane Hunter crew in San Diego (or perhaps the government of Mexico would like to host and pay for one).
Then we can start seeing what the models and AI will do.
I'm not up to date on the latest literature re: the Otis miss. Is the conventional thought that the ocean was in fact warmer than the models supposed, either at the surface or with a warmer upper mixed layer?
If the problem was lack of constraint from data, this is still fixable in a probabilistic sense: we'd "just" (noting it's not that simple) need to assume more variability in ocean conditions in data-poor regions.
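As a sketch of what that could look like in an ensemble data-assimilation setting (the inflation rule, the density metric, and all constants here are invented for illustration, not from any operational system):

```python
import numpy as np

def inflate_sparse_regions(ensemble, obs_density, rho_max=1.5, density_ref=5.0):
    """Toy covariance inflation: widen ensemble spread about its mean where
    observations are sparse, so the analysis admits more ocean-state uncertainty.

    ensemble:    (n_members, n_gridpoints) array of SST states [C]
    obs_density: (n_gridpoints,) observations per grid cell (illustrative metric)
    rho_max:     inflation factor in totally unobserved regions (assumed value)
    density_ref: density above which no inflation is applied (assumed value)
    """
    # Inflation factor decays from rho_max (no data) to 1.0 (well observed).
    rho = 1.0 + (rho_max - 1.0) * np.clip(1.0 - obs_density / density_ref, 0.0, 1.0)
    mean = ensemble.mean(axis=0)
    return mean + rho * (ensemble - mean)   # spread scaled, ensemble mean preserved

# Example: 20-member ensemble over 1000 grid cells, first 300 cells data-poor.
rng = np.random.default_rng(0)
ens = 26.0 + rng.normal(0.0, 0.3, size=(20, 1000))      # SSTs near 26 C
density = np.where(np.arange(1000) < 300, 0.5, 8.0)     # sparse vs. well-observed
inflated = inflate_sparse_regions(ens, density)
print(inflated[:, :300].std(), inflated[:, 300:].std()) # more spread where data-poor
```

Widening the prior like this doesn't conjure up the missing observations, but it at least lets the forecast ensemble carry the possibility of a warmer-than-analyzed ocean instead of being confidently wrong.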