
Not exactly. Those overparametrized models first overfit to the training data in the classical sense and generalize worse to unseen data; then, if training continues past the interpolation threshold, they start generalizing better again. That's why the phenomenon is called double descent. Injecting noise via dropout or mixup obscures this learning dynamic.
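
As a minimal sketch (my own illustration, not something from this thread): the same peak-then-descent shape shows up in the model-wise version of the phenomenon, e.g. minimum-norm least squares on random ReLU features. The teacher function, noise level, and feature counts below are arbitrary assumptions; the point is just that test error spikes when the number of features crosses the number of training samples (the interpolation threshold) and then comes back down.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 20
    teacher_w = rng.normal(size=d)  # fixed "ground truth" used for both splits

    def make_data(n, noise=0.5):
        X = rng.normal(size=(n, d))
        y = X @ teacher_w + noise * rng.normal(size=n)
        return X, y

    def random_relu_features(X, W):
        # Fixed random first layer; only the linear readout is fit.
        return np.maximum(X @ W, 0.0)

    n_train = 100
    X_train, y_train = make_data(n_train)
    X_test, y_test = make_data(2000)

    for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
        W = rng.normal(size=(d, n_features)) / np.sqrt(d)
        Phi_train = random_relu_features(X_train, W)
        Phi_test = random_relu_features(X_test, W)
        # Minimum-norm least-squares readout via the pseudo-inverse:
        # once n_features >= n_train it interpolates the training set exactly.
        beta = np.linalg.pinv(Phi_train) @ y_train
        train_mse = np.mean((Phi_train @ beta - y_train) ** 2)
        test_mse = np.mean((Phi_test @ beta - y_test) ** 2)
        print(f"features={n_features:5d}  train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}")

Exact numbers vary with the seed and noise level, but the test MSE typically peaks around features ≈ 100 (the interpolation threshold for 100 training points) and then decreases as the model becomes more overparametrized, while train MSE goes to ~0.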

