
Not exactly. Those overparametrized models first overfit to the training data in the classical sense and generalize worse to unseen data; then, if training continues past the interpolation threshold, they start generalizing better again. That's why the phenomenon is called double descent. Injecting noise via dropout or mixup obscures this learning dynamic.
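
As a minimal sketch (my own illustration, not something from this thread): the same peak-then-descent shape shows up in the model-wise version of the phenomenon, e.g. minimum-norm least squares on random ReLU features. The teacher function, noise level, and feature counts below are arbitrary assumptions; the point is just that test error spikes when the number of features crosses the number of training samples (the interpolation threshold) and then comes back down.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 20
    teacher_w = rng.normal(size=d)  # fixed "ground truth" used for both splits

    def make_data(n, noise=0.5):
        X = rng.normal(size=(n, d))
        y = X @ teacher_w + noise * rng.normal(size=n)
        return X, y

    def random_relu_features(X, W):
        # Fixed random first layer; only the linear readout is fit.
        return np.maximum(X @ W, 0.0)

    n_train = 100
    X_train, y_train = make_data(n_train)
    X_test, y_test = make_data(2000)

    for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
        W = rng.normal(size=(d, n_features)) / np.sqrt(d)
        Phi_train = random_relu_features(X_train, W)
        Phi_test = random_relu_features(X_test, W)
        # Minimum-norm least-squares readout via the pseudo-inverse:
        # once n_features >= n_train it interpolates the training set exactly.
        beta = np.linalg.pinv(Phi_train) @ y_train
        train_mse = np.mean((Phi_train @ beta - y_train) ** 2)
        test_mse = np.mean((Phi_test @ beta - y_test) ** 2)
        print(f"features={n_features:5d}  train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}")

Exact numbers vary with the seed and noise level, but the test MSE typically peaks around features ≈ 100 (the interpolation threshold for 100 training points) and then decreases as the model becomes more overparametrized, while train MSE goes to ~0.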

