> If you work with code that isn't easily deterministic (typically, but not always, ML models like speech-to-text, face recognition, or classification/recommendation systems, or network-performance-dependent applications like video conferencing), that wouldn't be that feasible
Nah, most ML systems (the ones actually doing something in the world) are mostly just ordinary code, which can be tested like any other code (assuming you put it into functions, etc.). The models themselves are pretty awkward, but you can normally freeze the model and use it to ensure that things stay working while you refactor, and then re-run a few (10+) times to check coverage and intervals and stuff.
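Here's roughly what that looks like in practice, as a minimal sketch in Python/pytest rather than anyone's real pipeline; the dummy model, golden values, and tolerances are placeholders you'd swap for your own pinned artifact and captured outputs:

```python
# test_frozen_model.py -- regression tests against a pinned ("frozen") model artifact.
# Names and values here are placeholders; the point is the pattern, not the specifics.
import statistics

import pytest


class _DummyModel:
    """Stand-in so the sketch runs; in practice load your pinned artifact
    (torch.load / joblib.load / onnxruntime / ...) from a fixed path."""
    def predict(self, features):
        return 0.1 * sum(features)


@pytest.fixture(scope="module")
def frozen_model():
    # e.g. return joblib.load("models/classifier-2024-01.bin")
    return _DummyModel()


# Inputs plus outputs captured from the frozen model before the refactor started.
GOLDEN_CASES = [
    {"input": [1.0, 2.0, 3.0], "expected": 0.6},
    {"input": [0.0, 0.0, 4.0], "expected": 0.4},
]


def test_outputs_match_golden(frozen_model):
    # The surrounding code (feature prep, post-processing) is what gets refactored;
    # freezing the model makes its behaviour a constant we can assert against.
    for case in GOLDEN_CASES:
        assert frozen_model.predict(case["input"]) == pytest.approx(case["expected"], abs=1e-6)


def test_repeated_runs_stay_in_interval(frozen_model):
    # Re-run the same input a handful of times (the "10+" above) to catch
    # nondeterminism creeping in and check the spread stays tight.
    scores = [frozen_model.predict(GOLDEN_CASES[0]["input"]) for _ in range(10)]
    assert statistics.pstdev(scores) < 1e-6
```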
It tends to be more difficult, as many DS/data people are not software engineering focused, but it's not impossible.
I haven't heard of any method to test the model apart from statistical analysis of reference/training data.
The model is what gets continually updated and is the critical path that needs coverage. Testing the interfaces is trivial, and at times not even critical if they've already been running in production for a while (you've probably already caught most or all of the issues and know what to test or watch out for in an interface rewrite).
It's not that it's impossible. Here's an example: say you're working on an English speech-to-text model, and the next version performs better on your set of benchmarks.
It could, for example, perform very poorly (compared to your previous model) on accented English or English mixed with other languages, for older speakers, in noisy environments like a car, or on specific subjects like medical/legal dictation, and since your benchmarks originally didn't cover those scenarios you wouldn't know one way or the other.
These were all real cases, added to speech-to-text models after user feedback was collected, enough demand was identified, and research effort was put in; training/benchmark data now includes them. There are plenty of scenarios not yet solved (mixing two languages is an active area of research), not included because user feedback didn't capture them, or not yet worth solving.
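To make the coverage-gap point concrete, here's a rough sketch (not any real benchmark harness) of scoring word error rate per scenario slice; the slice tags, benchmark layout, and the `transcribe` callable are hypothetical, and the whole problem is that a slice you never tagged simply doesn't appear in the report:

```python
# Hypothetical per-slice benchmark: aggregate word error rate (WER) by scenario tag.
# Any slice missing from the benchmark set (e.g. "in-car noise") is invisible here.
from collections import defaultdict


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / max(len(ref), 1)


def evaluate_by_slice(transcribe, benchmark):
    """transcribe: callable audio -> text (your STT model).
    benchmark: iterable of dicts like
    {"audio": ..., "reference": "...", "slice": "accented-english"}."""
    errors = defaultdict(list)
    for case in benchmark:
        errors[case["slice"]].append(wer(case["reference"], transcribe(case["audio"])))
    return {name: sum(vals) / len(vals) for name, vals in errors.items()}

# Usage idea: run this for the old and new model and compare slice by slice,
# instead of trusting a single headline number that can hide a regression on,
# say, accented speech or noisy audio.
```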
Neural network testing is hard because by design they have millions (and these days billions) of parameters, and you cannot feasibly test every possible outcome; you won't know everything you need to check until people start using your app in ways you never thought of.
NN/ML isn't a hard requirement for this; it's true of any complex system. Shazam-type fingerprinting, for example, is just spectrograms and Fourier transforms; NNs are just the newest tool devs use. Any complex system with thousands of parameters or more has the same problems.
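As a rough illustration of that kind of non-NN pipeline (a simplified sketch, not Shazam's actual algorithm): take a spectrogram via FFT, keep the strongest frequency per time frame, and hash nearby peaks into fingerprints. The constants are arbitrary, and the same point applies: no neural network anywhere, yet the parameter space is far too large to test exhaustively.

```python
# Simplified Shazam-style fingerprinting sketch: spectrogram -> peak per frame ->
# hash (peak, later peak, time gap) triples. Constants are arbitrary illustration values.
import numpy as np
from scipy.signal import spectrogram


def fingerprint(samples: np.ndarray, sample_rate: int, fan_out: int = 5):
    """Return a set of (hash, frame_index) pairs for an audio clip."""
    freqs, times, power = spectrogram(samples, fs=sample_rate, nperseg=1024)
    peak_bins = power.argmax(axis=0)          # strongest frequency bin in each time frame
    hashes = set()
    for i, f1 in enumerate(peak_bins):
        for j in range(1, fan_out + 1):       # pair each peak with a few later peaks
            if i + j >= len(peak_bins):
                break
            f2 = peak_bins[i + j]
            hashes.add((hash((int(f1), int(f2), j)), i))
    return hashes


# Matching is then (roughly) counting how many offset-aligned hashes a query clip
# shares with each stored track.
if __name__ == "__main__":
    rate = 8000
    t = np.arange(rate * 2) / rate
    tone = np.sin(2 * np.pi * 440 * t)        # toy input: a 2-second 440 Hz tone
    print(len(fingerprint(tone, rate)), "hashes")
```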