
A Probabilistic Theory of Deep Learning - Anon84
http://arxiv.org/abs/1504.00641
======
j2kun
I am well versed in the usual theories of learning (PAC, SQ-learning, learning
with membership and equivalence queries, etc.).

Can anyone comment on how this relates to those standard models? There does
not appear to be any mention in the paper of the standard learning models, and
as a result I'm inclined to think this paper is not worth reading.

~~~
noelwelsh
On a veeeery quick skim, I think it's a Bayesian generative model for deep
learning architectures. I thought Zoubin Ghahramani's group had already done
some similar work, but :shrug: it's not my field.

~~~
dustintran
To clarify, it has been studied in Zoubin Ghahramani's group [1] (and also
more recently in Ryan Adams's group [2]), and it's most widely known through
Radford Neal [3], who has won a lot of competitions using the Bayesian approach
to NNs.

[1] [http://mlg.eng.cam.ac.uk](http://mlg.eng.cam.ac.uk)

[2] [http://hips.seas.harvard.edu](http://hips.seas.harvard.edu)

[3] [http://www.cs.utoronto.ca/~radford/res-neural.html](http://www.cs.utoronto.ca/~radford/res-neural.html)

------
therobot24
56 pages! They should really reorganize this into 10-16 pages to get the basic
ideas and results across.

~~~
jsyedidia
What a strange attitude! There's plenty of zero-cost bits available to allow
for articles of all sorts of lengths. For example, the "Foundations and
Trends" journals (e.g. Foundations and Trends in Machine Learning, or
Foundations and Trends in Optimization) publish articles that are usually at
least 100 pages long. These articles tend to be well-respected and highly
cited.

Longer articles have many advantages in allowing for a more in-depth
explanation, and it is certainly not the case that every reader wants papers
shoe-horned into an artificial page limit.

~~~
therobot24
> There's plenty of zero-cost bits available to allow for articles of all
> sorts of lengths.

Straw man - I wasn't talking about cost, hard drive space, or anything close
to what you're referring to.

> ...articles that are usually at least 100 pages long. These articles tend to
> be well-respected and highly cited.

Another straw man - why does length (short or long) correlate with quality
again?

> Longer articles have many advantages in allowing for a more in-depth
> explanation

Ah the real comment. Ok. I definitely agree - longer usually means more space
to explain.

> and it is certainly not the case that every reader wants papers shoe-horned
> into an artificial page limit.

So be it. There's usually an appendix or supplementary materials that can
offer expanded derivations. Often the authors trim a lot of the fat for the
published paper and put a longer version in a book/thesis.

> What a strange attitude!

As a writer of publications I want more space and agree, but as a reader of
publications (I read far more than I write) there's just too much out there to
spend my time going through 50+ pages. I can put in the time for 10-20 pages,
and if I still want more I'll check out other publications, the appendix,
supplementary material, the thesis...whatever. It's an important and necessary
skill for academics to be able to present their work concisely - not just for
publications, but for grant applications, presentations, etc.

~~~
jsyedidia
If you don't like to read long articles, don't read them. I like to read them;
they tend to be much easier to understand than artificially shortened
articles. It seems to me that complaining that somebody wrote a long article
because it takes too long for you to read is exactly analogous to saying that,
since you only like to read short stories, nobody should write long novels
because they would take too much of your time to read.

------
matsiyatzy
Does anyone know if there is any code available for this paper? It would be
extremely interesting to see how the EM algorithm mentioned compares with
regular SGD in terms of speed and precision.

------
mrdrozdov
What is the hypothesis? How was the hypothesis tested?

~~~
disgruntledphd2
From my reading of the first twenty or so pages, it appears to be a theory of
how neural networks model images (the running example). The authors claim that
popular neural network architectures can be reduced to particular cases of
their model.

If this model can provide an explanation for the small amounts of noise that
degrade NN performance on images (from karpathy.github.io, posted to HN
earlier today), then that would be rocking.

Nonetheless, it does not appear to be an experimental paper; rather, it
provides a mathematical theory of some particular classification problems.

~~~
Nvn
> If this model can provide an explanation for the small amounts of noise that
> degrade NN performance on images (from karpathy.github.io, posted to HN
> earlier today), then that would be rocking

This paper [0] does a pretty good job of giving an explanation for that.

[0] [http://arxiv.org/abs/1412.6572](http://arxiv.org/abs/1412.6572)

~~~
disgruntledphd2
Honestly, that abstract makes me more upset. If these failures are due to NNs'
nature as linear classifiers, then we are all in trouble, given that almost
everything useful is based on linear models. Given the title of the paper,
though, I should probably be more hopeful :)

