
New Theory Cracks Open the Black Box of Deep Learning - smokielad
https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/
======
iolsixuutic
Interesting to read this. I was a little confused at first because the
information bottleneck paper wasn't new, and then as I read I realized they
acknowledged that. It's interesting to see followup and new research coming
out about it, because it struck me as promising when I first read about it.

The information bottleneck idea is very similar to, basically the same as, how
I've always thought about DL models, and statistical models more generally.
The hidden variables at each layer are basically digitized codes, and there's
a compression at each layer, which is equivalent to learning/inference in an
algorithmic complexity/MDL sense.
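To make the "digitized codes" framing concrete, here's a minimal sketch of the kind of measurement Tishby's group does: bin a layer's activations into coarse discrete codes and estimate how much information those codes retain about the input. The binning scheme, toy data, and `mutual_information` helper are all my assumptions for illustration, not anything from the paper.

```python
import numpy as np

def mutual_information(x_labels, t_codes):
    """Plug-in estimate of I(X; T) from paired discrete samples:
    I = H(X) + H(T) - H(X, T), with entropies in bits."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    joint = [f"{x}|{t}" for x, t in zip(x_labels, t_codes)]
    return entropy(x_labels) + entropy(joint := np.array(joint)) - entropy(t_codes) + entropy(t_codes) - entropy(joint) + entropy(t_codes)

rng = np.random.default_rng(0)
x = rng.integers(0, 8, size=5000)                    # 8 distinct inputs
activations = x + rng.normal(0, 0.5, size=x.shape)   # noisy "hidden layer"
codes = np.digitize(activations, np.linspace(-1, 8, 5))  # coarse digitized codes

print(mutual_information(x, codes))  # information the codes keep about X
```

Coarsening the bins compresses harder (lower I(X;T)); the IB claim is roughly that good layers compress while keeping the bits relevant to the label.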

What was surprising to me was the relationship with renormalization groups,
which I wasn't familiar with at all.

The quote from Lake was also interesting. I'd forgotten about that Bayesian
Program Learning paper. My guess is that BPL and DL are not really all that
different at some level.

------
daralthus
> certain very large deep neural networks don’t seem to need a drawn-out
> compression phase in order to generalize well. Instead, researchers program
> in something called early stopping, which cuts training short to prevent the
> network from encoding too many correlations in the first place.

This would explain the success of attention mechanisms quite well. If you are
good at selecting the right input, you don't need to discard a lot later.
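The early-stopping mechanism the quote describes is just a patience rule on held-out loss. A minimal sketch, with the `step`/`val_loss` callables as hypothetical interfaces supplied by the caller:

```python
def train_with_early_stopping(step, val_loss, patience=5, max_epochs=200):
    """Stop training once validation loss hasn't improved for
    `patience` consecutive epochs. `step` runs one epoch of training;
    `val_loss` evaluates loss on held-out data."""
    best, since_best, best_epoch = float("inf"), 0, 0
    for epoch in range(max_epochs):
        step()
        loss = val_loss()
        if loss < best - 1e-6:          # meaningful improvement
            best, since_best, best_epoch = loss, 0, epoch
        else:                            # stalled or overfitting
            since_best += 1
            if since_best >= patience:
                break
    return best_epoch, best

# Toy U-shaped validation curve: dips until epoch 30, then rises.
losses = iter([abs(e - 30) / 30 + 1.0 for e in range(200)])
epoch, best = train_with_early_stopping(lambda: None, lambda: next(losses))
print(epoch, best)  # stops tracking the minimum around epoch 30
```

In IB terms, cutting training at the validation minimum prevents the network from encoding the extra input correlations it would pick up by training to convergence.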

------
randcraw
Interesting premise (the info bottleneck) and a well-written (long) article.

I particularly enjoy attempts to understand basic black box mechanisms in the
space of learning or cognition.

