Not an expert so take this with a grain of salt; I could be misinterpreting the paper.
It seems that the currently accepted method is to train a network with distorted images as the input and the correct undistorted images as the targets. Then, after training, you can feed a new distorted image into the trained network and get the estimated "fixed" image.
However, this team actually uses the distorted image as both the input and the target of the net. So if they were to let training go on for too long, the network would produce an exact copy of the distorted input image. But for some reason, the structure of the network means that the estimated output learns realistic features first and only overfits to the noise afterwards. So if you stop the training early, you get an image that incorporates the realistic features of the distorted image but hasn't had time to "learn" the noisy features.
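Here's a toy NumPy analogue of that idea (not the paper's actual CNN setup; the smoothing-kernel parameterization is my own stand-in for the network's implicit bias): gradient descent fits the smooth structure of a noisy target much faster than the noise, so stopping early gives you a denoised signal even though the noisy signal itself is the target.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * t)                  # ground-truth signal (stands in for the clean image)
noisy = clean + 0.3 * rng.standard_normal(n)   # distorted signal, used as BOTH input and target

# Parameterize the reconstruction as x = K @ theta, where K is a Gaussian
# smoothing-kernel matrix. Directions of K with large singular values (smooth
# components) are fit quickly by gradient descent; high-frequency noise
# directions have tiny singular values and are fit very slowly.
sigma = 0.05
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * sigma**2))
K /= K.sum(axis=1, keepdims=True)

theta = np.zeros(n)
lr = 0.5
for step in range(200):                        # stop "early", before the noise is learned
    grad = K.T @ (K @ theta - noisy)           # gradient of 0.5 * ||K @ theta - noisy||^2
    theta -= lr * grad

x_early = K @ theta
mse_noisy = np.mean((noisy - clean) ** 2)      # error of the raw distorted signal
mse_early = np.mean((x_early - clean) ** 2)    # error of the early-stopped reconstruction
print(mse_early < mse_noisy)                   # early stopping recovers something closer to clean
```

Run to convergence, `K @ theta` would reproduce `noisy` exactly (same as the paper's observation); the denoising effect lives entirely in the early stopping.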
This is fascinating because I've been running into something similar with sequence-to-sequence models translating natural language into Python code. I got better results stopping "early" while the perplexity was still quite high; at the time I thought that was a little crazy.
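For what it's worth, the stopping rule I mean is the standard patience-based one, sketched below. All the names (`train_one_epoch`, `eval_perplexity`) are hypothetical placeholders, not from any particular library:

```python
def early_stop(train_one_epoch, eval_perplexity, max_epochs=50, patience=3):
    """Train until validation perplexity stops improving for `patience` epochs.

    Returns the epoch and perplexity of the best checkpoint seen. Note the
    stopping criterion is relative (no recent improvement), so training can
    halt while the absolute perplexity is still quite high.
    """
    best_ppl, best_epoch, stale = float("inf"), -1, 0
    for epoch in range(max_epochs):
        train_one_epoch()
        ppl = eval_perplexity()
        if ppl < best_ppl:
            best_ppl, best_epoch, stale = ppl, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop: no improvement for `patience` consecutive epochs
    return best_epoch, best_ppl
```

In practice you'd also save model weights at each new best epoch and restore them after the loop exits.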
Ah, you're totally right that it doesn't produce an exact copy; they even mention using different error functions for the different classes of distortion. My mistake.
But w.r.t. the regularization term, what I thought they meant by
"we replace the regularizer R(x) with the implicit prior captured by the neural network"
at the end of pg 2 was that they let the natural behavior of a neural network during optimization serve as the regularization, without the need for an explicit regularization term. Not entirely sure though.
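As I read it, the substitution is roughly this (my notation, paraphrasing the paper's formulation):

```latex
% Conventional regularized reconstruction: data term plus an explicit prior R(x)
x^{*} = \arg\min_{x} \; E(x; x_{0}) + R(x)

% Deep image prior: drop R(x), reparameterize x as the output of a network
% f_\theta on a fixed input z, and optimize the weights \theta instead of x
\theta^{*} = \arg\min_{\theta} \; E\bigl(f_{\theta}(z); x_{0}\bigr),
\qquad x^{*} = f_{\theta^{*}}(z)
```

So the "regularization" is whatever subset of images the network architecture (plus early stopping) can reach easily, rather than an explicit penalty term.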
> for some reason, the structure of the network means that the estimated output learns realistic features first, and then overfits to the noise afterwards.
Why does this happen? What characteristics does the network structure have that cause this effect?