
Not an expert so take this with a grain of salt; I could be misinterpreting the paper.

It seems that the currently accepted method is to train a network with distorted images as the input and the correct undistorted images as the targets. Then, after training, you can feed a new distorted image into the trained network and get the estimated "fixed" image.
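
Roughly, that conventional setup looks like this (a PyTorch-style sketch of my own; net and the training pairs are placeholders, not anything specific from the paper):

    import torch

    def train_supervised(net, pairs, epochs=10, lr=1e-3):
        # pairs: an iterable of (distorted, clean) image tensor batches
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(epochs):
            for distorted, clean in pairs:
                opt.zero_grad()
                loss = ((net(distorted) - clean) ** 2).mean()  # push output toward the clean target
                loss.backward()
                opt.step()
        return net  # afterwards, net(new_distorted) gives the estimated "fixed" image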

However, this team actually uses the distorted image as both the input and the target for the net. So if they were to let the training go on for too long, the network would produce an exact copy of the distorted input image. But for some reason, the structure of the network means that the estimated output learns realistic features first, and then overfits to the noise afterwards. So if you stop the training early, you get an image that incorporates realistic features from the distorted image, but hasn't had time to "learn" the noisy features.
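
If I'm reading it right, the training loop is roughly this (a PyTorch sketch of my own; the real network in the paper is a much deeper U-Net-style model, and strictly speaking it is fed a fixed random tensor rather than the distorted image itself, with the distorted image used only as the target):

    import torch
    import torch.nn as nn

    H, W = 128, 128
    x0 = torch.rand(1, 3, H, W)       # stand-in for the single distorted image (the only target)
    z = torch.randn(1, 32, H, W)      # fixed random input, kept constant throughout training

    net = nn.Sequential(              # randomly initialized, never pre-trained on anything
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
    )
    opt = torch.optim.Adam(net.parameters(), lr=0.01)

    for step in range(1800):          # "early stopping" is just a modest, fixed iteration count
        opt.zero_grad()
        loss = ((net(z) - x0) ** 2).mean()   # only a data term; no clean image anywhere
        loss.backward()
        opt.step()

    restored = net(z).detach()        # early on this looks natural; train much longer and it
                                      # reproduces the distortions in x0 as well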



This is fascinating because I've been running into something similar with sequence to sequence models translating natural language into Python code. I got better results stopping "early" when the perplexity was still quite high, which I thought was a little crazy.



Fascinating! Could you share more on the details of your experiment?


> So if they were to let the training go on for too long, the network would produce an exact copy of the distorted input image.

It won't. The objective they are trying to optimize (over output images x), min_x E(x, x0) + R(x), amounts to a combination of:

- The output image x should "look like" the input image x0, this is the error term E(x, x0);

- The output image x should be "regular", this is the regularization term R(x).

The latter term prevents overfitting: R should be chosen so that, for example, noisy images are considered irregular (high R(x)).
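
To make that concrete, a classical choice for R is total variation, which is large when neighboring pixels jump around a lot, i.e. for noisy images (my example, not one the paper commits to):

    import torch

    def total_variation(x):
        # sum of absolute differences between neighboring pixels; high for noisy images
        dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
        dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        return dh + dw

    def objective(x, x0, lam=0.1):
        data_term = ((x - x0) ** 2).mean()            # E(x, x0): stay close to the input
        return data_term + lam * total_variation(x)   # R(x): but stay "regular"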


Ah, you're totally right about it not producing an exact copy; they even mention that they used different error functions for the different classes of distortion. My mistake.

But w.r.t. the regularization term, what I thought they meant by "we replace the regularizer R(x) with the implicit prior captured by the neural network" at the end of pg 2 was that they let the natural behavior of a neural network during optimization serve as the regularization, without the need for an explicit regularization term. Not entirely sure though.
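
The way I picture it (a sketch of my own, so take it with a grain of salt): instead of optimizing directly over the pixels x with an explicit penalty R(x), they optimize over the weights of a randomly initialized network, and the network's structure plus early stopping play the role that R used to play.

    import torch

    # Classical:   minimize over pixels x        E(x, x0) + R(x)
    # This paper:  minimize over weights theta   E(f_theta(z), x0), with z fixed noise

    def fit_pixels(x0, R, lam=0.1, steps=500, lr=0.05):
        x = x0.clone().requires_grad_(True)          # the image itself is the variable
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((x - x0) ** 2).mean() + lam * R(x)
            loss.backward()
            opt.step()
        return x.detach()

    def fit_network(net, z, x0, steps=500, lr=0.01):
        opt = torch.optim.Adam(net.parameters(), lr=lr)   # the weights are the variable
        for _ in range(steps):
            opt.zero_grad()
            loss = ((net(z) - x0) ** 2).mean()       # no explicit R(x) at all
            loss.backward()
            opt.step()
        return net(z).detach()                       # stopping early is what regularizes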


> for some reason, the structure of the network means that the estimated output learns realistic features first, and then overfits to the noise afterwards.

Why does this happen? What characteristics does the network structure have that cause this effect?


I think the reason this paper is so interesting is that no one had any idea why this happens


Wow, thanks for the explanation. I wish they would just use what you wrote for the abstract.

At first, I thought the machine learning algorithm was able to just bridge the gap in images from nothing...

CSI time: Enhance ... enhance... enhance!



