
Generalization in Deep Learning [pdf] - stablemap
https://arxiv.org/abs/1710.05468
======
beagle3
Very nice to see actual theoretical progress being made among the hundreds of
papers of the form "we applied this weird network configuration and it works;
we have some intuition but no real understanding".

------
sgt101
The generalization gap (the difference between test and training error) seems
to me to be quite unsatisfying as a definition, and an appeal to the idea that
theory and practice are different doesn't make me feel that anything has been
explained.

Being able to create new regularisation techniques is impressive though...
looks like something that will need a read!
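For concreteness, the quantity being debated here is simple to state. A minimal sketch (the function name and the example error values are illustrative, not from the paper):

```python
# Minimal sketch of the quantity under discussion: the generalization
# gap is simply the difference between a model's error on held-out
# (test) data and its error on the data it was trained on.
def generalization_gap(train_error, test_error):
    """Gap between test error and training error; large positive
    values indicate overfitting."""
    return test_error - train_error

# Hypothetical model with 1% training error and 8% test error:
print(generalization_gap(0.01, 0.08))  # a gap of about 0.07
```

The complaint above is that this number, by itself, describes overfitting without explaining it.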

~~~
Eridrus
I haven't read the paper yet, but when people say things like "we don't know
why neural networks work", what they mean is that we don't understand why they
generalize so well. By most prior theory they are quite over-parameterized, so
while they should be able to fit the data (and it has been shown that typical
architectures can fit random data), previous theory said the penalty for this
is that they will not generalize well to new data from the same distribution.

So generalization bounds on NNs are actually the key thing that people want
from theory.
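The classical intuition above can be reproduced in a toy setting. A minimal sketch (not from the paper, and using a linear model rather than a neural network): a model with far more parameters than training samples can fit completely random labels exactly, yet is no better than chance on fresh random labels from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_params = 50, 50, 500   # parameters >> samples

# Training set with purely random +/-1 labels.
X_train = rng.standard_normal((n_train, n_params))
y_train = rng.choice([-1.0, 1.0], size=n_train)

# Minimum-norm least-squares fit: with n_params > n_train this
# interpolates the random labels exactly.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_acc = np.mean(np.sign(X_train @ w) == y_train)

# Fresh data with fresh random labels: nothing to learn, so
# accuracy hovers around chance (0.5).
X_test = rng.standard_normal((n_test, n_params))
y_test = rng.choice([-1.0, 1.0], size=n_test)
test_acc = np.mean(np.sign(X_test @ w) == y_test)

print(train_acc)  # 1.0: the random labels are fit perfectly
print(test_acc)   # near chance on new random labels
```

The puzzle the paper addresses is that deep networks have this same capacity to memorize, yet on real data their generalization gap stays small.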

------
gfredtech
An aside: How do you read research papers and how long does it take you to
read one paper? Are you able to recall a large portion of the paper's content?

~~~
sgt101
Read the abstract, read the conclusion, and try to work out what the authors
are trying to say. Hunt for the payoff in the paper and see if I can
understand it; look at the results and check whether what they claim to show
actually supports that payoff; then start reading properly. The actual
reading can take up to six weeks for a big journal paper.

True.

~~~
seanmcdirmid
Skim, index, find context (e.g. citations to the work, if it's old enough).
Some papers take a few years to really understand, which is why mental
indexing is important: when you run into a problem solved by the paper, the
paper becomes useful and understandable at the same time.

~~~
sgt101
Agree about the years to understand. The important thing is to read it enough
that when you come across a relevant problem you remember the approach and can
exercise the option your investment in it gives you.

------
briga
Why are they using MNIST to demonstrate generalization? Isn't it basically a
toy problem at this point?

~~~
Iv
It is a problem that is well-known and easy to solve. Their point is to show
that they offer a boost in terms of training speed.

