Training a machine learning model is not particularly special from a programming perspective. The code is not usually that complicated. Write tests when you can, manually validate when you can't.
There are also specific techniques for validating that your model-training procedure is directionally correct, such as generating a simulated data set from known parameters and training your model on that.
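A minimal sketch of that sanity check, assuming a linear-regression setting with NumPy and scikit-learn (`true_coef` and the tolerances are illustrative, not from the original):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(seed=0)

# Simulate data from a known ground truth: y = X @ true_coef + noise.
true_coef = np.array([2.0, -3.0, 0.5])
X = rng.normal(size=(10_000, 3))
y = X @ true_coef + rng.normal(scale=0.1, size=10_000)

# Run the training procedure on the simulated data.
model = LinearRegression().fit(X, y)

# If training is directionally correct, the recovered coefficients
# should land close to the ones we planted.
assert np.allclose(model.coef_, true_coef, atol=0.05), model.coef_
print("recovered:", model.coef_)
```

If the recovered parameters are nowhere near the ones you planted, the bug is in your training code, not your real data.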
Same as you would with your own code. You review it, ask GPT to write tests, and then tweak it.
The difference is that now you are more of a code reviewer and editor. You don't have to sit there figuring out the library interface and typing out every single line.
Tests can prove the presence of bugs, not their absence. "100% code coverage" is only 100% along the code dimension; it is usually almost no coverage along the data dimension. Generative testing can randomly probe the data dimension, hoping to find bugs there. But 100% coverage of both code and data is unrealistic.
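A hedged sketch of that generative probing, using the Hypothesis library as one possible tool (`min_max_scale` is a hypothetical helper invented for illustration):

```python
import numpy as np
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

def min_max_scale(x: np.ndarray) -> np.ndarray:
    # Naive implementation: divides by zero when all values are equal,
    # exactly the kind of data-dimension bug line coverage never hits.
    return (x - x.min()) / (x.max() - x.min())

# Hypothesis generates many random arrays (the "data dimension"),
# hunting for inputs that violate the stated property.
@given(arrays(np.float64, shape=st.integers(1, 50),
              elements=st.floats(-1e6, 1e6)))
def test_scaled_values_stay_in_unit_interval(x):
    scaled = min_max_scale(x)
    assert np.all((0.0 <= scaled) & (scaled <= 1.0))
```

Run under pytest, Hypothesis quickly finds a constant array (e.g. all values equal), where the division by zero produces NaN and the property fails. A single hand-written example test would likely never try that input.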