While this is indeed clickbait, as others have mentioned, I am consistently shocked by how rarely cross-validation, the most common technique for checking that a model you trained works on unseen data, is actually used in the real world.
I had it drilled into my brain that I really shouldn't trust anything except the average validation score of a K-fold cross-validated model (preferably with a high K) when trying to get an idea of how well my ML algorithm performs on unseen data; see the sketch below. Apparently most people in my field (NLP) did not have this drilled into their heads. This is partly why NLP is filled with unreproducible scores: the magic score, if it was ever found at all, was only found on the seed #3690398 train/test split.
As far as I'm concerned, if you didn't cross-validate, the test set score is basically useless.
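For anyone who hasn't seen the workflow: a minimal sketch of the "average over K folds" idea with scikit-learn. The dataset and the logistic regression here are just placeholders for whatever model and data you actually have.

    # Minimal K-fold cross-validation sketch: report the mean score across folds,
    # not the score from whichever single split happened to look best.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    model = LogisticRegression(max_iter=1000)

    cv = KFold(n_splits=10, shuffle=True, random_state=0)  # reasonably high K
    scores = cross_val_score(model, X, y, cv=cv)

    print("per-fold scores:", np.round(scores, 3))
    print(f"mean +/- std: {scores.mean():.3f} +/- {scores.std():.3f}")

The mean and spread across folds is the number worth reporting; a single lucky split tells you very little.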
The point of the article is more that even if all of your testing and validation is rigorous and the performance looks great, trivial changes in the production data can break your model anyway.
My view is that all high-value production models should include out-of-distribution detection, uncertainty quantification, and other model-specific safeguards (like self-consistency) to confirm that the model is only being used to predict on data it is competent to handle.
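To make the OOD-detection piece concrete, here is a hedged sketch of one simple safeguard: flag inputs whose feature representation sits unusually far (in Mahalanobis distance) from the training data. The feature extractor, the 99th-percentile threshold, and the function names are illustrative assumptions, not a standard recipe.

    # Sketch of a simple out-of-distribution check on feature vectors.
    # `train_features` stands in for whatever embedding the production model uses.
    import numpy as np

    def fit_ood_detector(train_features, quantile=0.99):
        mean = train_features.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(train_features, rowvar=False))  # pinv handles ill-conditioned covariances

        def mahalanobis(x):
            d = np.atleast_2d(x) - mean
            return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

        threshold = np.quantile(mahalanobis(train_features), quantile)

        def is_ood(x):
            # True for rows that look unlike anything seen during training.
            return mahalanobis(x) > threshold

        return is_ood

In production the point would be to refuse to serve a prediction (or route it to a human) whenever the check fires.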
All that is only needed because incremental learning algorithms don't really work all that well. It's a dirty secret in the field that we still don't have good answers for catastrophic forgetting in neural networks (the best candidate incremental learner as of right now), and the other alternatives are far worse.
This is good to have but it doesn't really address the problem of predictive accuracy in the presence of nonstationarity. The safeguards just help us switch off the model at the right time. We're still stuck with no capability in the new environment.
I think knowing what you don't know is still a pretty big win. It can help people trust the models in cases where they do work, and it can serve as a diagnostic for why they fail in certain circumstances.
This sounds an awful lot like Gaussian processes, which are fairly common in research environments. I don't know how common it is to deploy Gaussian processes in the real world, but I see published papers integrating them into other models all the time. The gist is that instead of input -> prediction you get input -> prediction + sigma (every prediction is given as a Gaussian distribution).
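A rough illustration of that "prediction + sigma" behavior with scikit-learn's GaussianProcessRegressor; the kernel and the toy sine data are purely illustrative, and real deployments need much more careful kernel and noise modelling.

    # Toy GP regression: each prediction comes back as a mean plus a standard deviation.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X_train = rng.uniform(-3, 3, size=(50, 1))
    y_train = np.sin(X_train).ravel() + 0.1 * rng.normal(size=50)

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X_train, y_train)

    X_test = np.array([[0.5], [10.0]])  # one point in range, one far outside it
    mean, sigma = gp.predict(X_test, return_std=True)
    for x, m, s in zip(X_test.ravel(), mean, sigma):
        print(f"x={x:5.1f}  prediction={m:6.3f}  sigma={s:.3f}")

The sigma for the far-away point should revert towards the prior, which is exactly the kind of "I don't know" signal being discussed here.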
For common neural network models, the output probability has no meaning for out-of-distribution inputs, so you need an ensemble or some other method to get at the actual model confidence (a rough sketch of the ensemble approach is below). I don't know enough about Gaussian processes to know if they have any limitations like that.
But it's an interesting point: if a CNN works better on in-distribution data, while a Gaussian process is better at providing a confidence estimate for OOD points (if it is), a hybrid model is possible.
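For what it's worth, the cheapest version of the ensemble idea mentioned above is a deep ensemble: train several copies of the model with different seeds and treat their disagreement as an uncertainty signal. A hedged sketch, where `models` is a hypothetical list of already-trained classifiers exposing a predict_proba-style interface:

    # Average the members' predicted probabilities; use their variance as a crude
    # confidence signal (high variance means the members disagree).
    import numpy as np

    def ensemble_predict(models, X):
        probs = np.stack([m.predict_proba(X) for m in models])  # (n_models, n_samples, n_classes)
        mean_probs = probs.mean(axis=0)
        disagreement = probs.var(axis=0).mean(axis=-1)
        return mean_probs.argmax(axis=-1), mean_probs, disagreement

One might then refuse to act whenever disagreement exceeds a threshold calibrated on held-out, in-distribution data.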
No, this does not solve the problem that he describes in the article. You can have a great cross-validation score and still struggle on unseen data if that data is relatively dissimilar from your train set, like X-ray scans produced by a different machine. There are numerous other examples. Image CNNs, for example, are famously known to fall apart on images with added white noise (which look the same to a human).
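One cheap way to probe that particular failure mode before shipping is to re-run the evaluation on noise-corrupted copies of the test set and watch how fast accuracy falls off. A sketch, assuming a generic trained image classifier `model` and a test set with pixel values in [0, 1] (both placeholders):

    # Re-evaluate the same model under increasing Gaussian pixel noise.
    import numpy as np

    def noisy_accuracy(model, X_test, y_test, sigmas=(0.0, 0.05, 0.1, 0.2)):
        rng = np.random.default_rng(0)
        results = {}
        for sigma in sigmas:
            X_noisy = np.clip(X_test + rng.normal(0.0, sigma, X_test.shape), 0.0, 1.0)
            results[sigma] = float((model.predict(X_noisy) == y_test).mean())
        return results

If the curve collapses at noise levels a human barely notices, cross-validation on clean data was never going to catch it.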
I think cross-validation is a powerful tool, but it is not always necessary and is prone to abuse, such as overfitting to the test set.
A skilled modeller can reduce variance using domain-specific tricks that are more effective at variance reduction than cross-validation. But still, cross-validation is usually good to use as well.