
Deep stuff about deep learning?
https://blogs.princeton.edu/imabandit/2015/03/20/deep-stuff-about-deep-learning/
======
mellavora
You ask why dropout helps avoid overfitting. I'm immediately thinking of
Bowling's poker algorithm.
[http://www.sciencemag.org/content/347/6218/145](http://www.sciencemag.org/content/347/6218/145)
(sorry, paywall, and today I'm outside it; anyone want to Aaron Swartz it for
the rest of us?). Bowling's secret sauce is that he lets his algorithm re-visit
earlier optimizations: you can make a choice early in the game that is optimal
at the time, but the same pattern is no longer optimal later on.
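For a flavor of that secret sauce: the core update in Bowling et al.'s CFR+ is
regret matching with cumulative regrets floored at zero, which is what lets
actions that looked bad early come back into play. Here is a minimal sketch of
that update at a single decision point; the toy game and the fixed biased
opponent are my own illustration, not the paper's poker setup.

```python
import numpy as np

def regret_matching_plus(payoffs, opp_strategy, iterations=20_000, seed=0):
    """payoffs[a, b]: our payoff when we play action a and the opponent plays b."""
    rng = np.random.default_rng(seed)
    n_actions = payoffs.shape[0]
    regrets = np.zeros(n_actions)       # cumulative regret, kept non-negative
    strategy_sum = np.zeros(n_actions)  # accumulates the average strategy
    for _ in range(iterations):
        # Play in proportion to positive cumulative regret.
        total = regrets.sum()
        strategy = regrets / total if total > 0 else np.full(n_actions, 1 / n_actions)
        strategy_sum += strategy
        opp = rng.choice(payoffs.shape[1], p=opp_strategy)
        utility = payoffs[:, opp]
        expected = strategy @ utility
        # The "+" step: floor regrets at zero, so an action whose regret went
        # negative early is not buried forever and can be re-visited later.
        regrets = np.maximum(regrets + (utility - expected), 0.0)
    return strategy_sum / strategy_sum.sum()

# Rock-paper-scissors payoffs for us (rows) vs. an opponent who over-plays rock.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
print(regret_matching_plus(rps, opp_strategy=[0.5, 0.25, 0.25]))
# The average strategy drifts toward always playing paper (the best response),
# even if paper happened to accumulate negative regret in the first iterations.
```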

Back to deep learning. You are right about the non-convexity of the space,
especially the weird interaction between the choice of which weights W_k to
include and the shape of the loss manifold.

The guess I am exploring here is that continually re-optimizing against a
different manifold lets you erase decisions that are only optimal on certain
manifolds, and that this is somehow more robust than simulated annealing.
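To make that guess concrete, here is a minimal sketch in plain NumPy; the tiny
two-layer network, the toy data, and every hyperparameter are invented for
illustration. Each SGD step draws a fresh dropout mask, i.e. a different
sub-network and thus a different loss surface over the shared weights, and
takes one optimization step against it; units that are dropped receive no
update, so a decision stored in them can be overwritten under a later mask.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))            # toy inputs
y = (X[:, 0] > 0).astype(float)           # toy binary targets
W1 = rng.normal(size=(10, 32)) * 0.1      # shared weights, layer 1
W2 = rng.normal(size=(32, 1)) * 0.1       # shared weights, layer 2
lr, p_keep = 0.1, 0.5

for step in range(1000):
    # Re-sample which hidden units participate: a new "manifold" every step.
    mask = (rng.random(32) < p_keep) / p_keep        # inverted-dropout scaling
    h = np.maximum(X @ W1, 0.0) * mask               # masked ReLU layer
    logits = (h @ W2).ravel()
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Logistic-loss gradients for this step's sub-network only.
    d_logits = (probs - y)[:, None] / len(y)
    dW2 = h.T @ d_logits
    dh = d_logits @ W2.T * mask * (h > 0)            # dropped units get zero
    dW1 = X.T @ dh
    W1 -= lr * dW1
    W2 -= lr * dW2
    if step % 200 == 0:
        loss = -np.mean(y * np.log(probs + 1e-9) + (1 - y) * np.log(1 - probs + 1e-9))
        print(f"step {step}: loss {loss:.3f}")
```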

But I'm only guessing.

