Not an expert so take this with a grain of salt; I could be misinterpreting the paper.
It seems that the currently accepted method is to train a network with distorted images as the input and the correct undistorted images as the targets. Then, after training, you can feed a new distorted image into the trained network and get the estimated "fixed" image.
However, this team actually uses the distorted image as both the input and the target of the net. So if they were to let training go on for too long, the network would produce an exact copy of the distorted input image. But for some reason, the structure of the network means that the estimated output learns realistic features first and only overfits to the noise afterwards. So if you stop the training early, you get an image that incorporates the realistic features of the distorted image but hasn't had time to "learn" the noisy features.
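Here's a toy NumPy analogue of that idea (not the paper's actual CNN setup; the smoothing-kernel parameterization is my own stand-in for the network's implicit bias): gradient descent fits the smooth structure of a noisy target much faster than the noise, so stopping early gives you a denoised signal even though the noisy signal itself is the target.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * t)                  # ground-truth signal (stands in for the clean image)
noisy = clean + 0.3 * rng.standard_normal(n)   # distorted signal, used as BOTH input and target

# Parameterize the reconstruction as x = K @ theta, where K is a Gaussian
# smoothing-kernel matrix. Directions of K with large singular values (smooth
# components) are fit quickly by gradient descent; high-frequency noise
# directions have tiny singular values and are fit very slowly.
sigma = 0.05
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * sigma**2))
K /= K.sum(axis=1, keepdims=True)

theta = np.zeros(n)
lr = 0.5
for step in range(200):                        # stop "early", before the noise is learned
    grad = K.T @ (K @ theta - noisy)           # gradient of 0.5 * ||K @ theta - noisy||^2
    theta -= lr * grad

x_early = K @ theta
mse_noisy = np.mean((noisy - clean) ** 2)      # error of the raw distorted signal
mse_early = np.mean((x_early - clean) ** 2)    # error of the early-stopped reconstruction
print(mse_early < mse_noisy)                   # early stopping recovers something closer to clean
```

Run to convergence, `K @ theta` would reproduce `noisy` exactly (same as the paper's observation); the denoising effect lives entirely in the early stopping.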
This is fascinating because I've been running into something similar with sequence-to-sequence models translating natural language into Python code. I got better results stopping "early" while the perplexity was still quite high; at the time I thought that was a little crazy.
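For what it's worth, the stopping rule I mean is the standard patience-based one, sketched below. All the names (`train_one_epoch`, `eval_perplexity`) are hypothetical placeholders, not from any particular library:

```python
def early_stop(train_one_epoch, eval_perplexity, max_epochs=50, patience=3):
    """Train until validation perplexity stops improving for `patience` epochs.

    Returns the epoch and perplexity of the best checkpoint seen. Note the
    stopping criterion is relative (no recent improvement), so training can
    halt while the absolute perplexity is still quite high.
    """
    best_ppl, best_epoch, stale = float("inf"), -1, 0
    for epoch in range(max_epochs):
        train_one_epoch()
        ppl = eval_perplexity()
        if ppl < best_ppl:
            best_ppl, best_epoch, stale = ppl, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop: no improvement for `patience` consecutive epochs
    return best_epoch, best_ppl
```

In practice you'd also save model weights at each new best epoch and restore them after the loop exits.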
Ah, you're totally right that it doesn't produce an exact copy; they even mention using different error functions for the different classes of distortion. My mistake.
But w.r.t. the regularization term, what I thought they meant by
"we replace the regularizer R(x) with the implicit prior captured by the neural network"
at the end of pg 2 was that they let the natural behavior of a neural network during optimization serve as the regularization, without the need for an explicit regularization term. Not entirely sure though.
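As I read it, the substitution is roughly this (my notation, paraphrasing the paper's formulation):

```latex
% Conventional regularized reconstruction: data term plus an explicit prior R(x)
x^{*} = \arg\min_{x} \; E(x; x_{0}) + R(x)

% Deep image prior: drop R(x), reparameterize x as the output of a network
% f_\theta on a fixed input z, and optimize the weights \theta instead of x
\theta^{*} = \arg\min_{\theta} \; E\bigl(f_{\theta}(z); x_{0}\bigr),
\qquad x^{*} = f_{\theta^{*}}(z)
```

So the "regularization" is whatever subset of images the network architecture (plus early stopping) can reach easily, rather than an explicit penalty term.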
> for some reason, the structure of the network means that the estimated output learns realistic features first, and then overfits to the noise afterwards.
Why does this happen? What characteristics does the network structure have that cause this effect?