
Towards universal neural nets: Gibbs machines and ACE - vonnik
http://arxiv.org/abs/1508.06585
======
murbard2
False positives do happen, but the style of the article raises some "crank"
red flags.

- Doesn't get to the point until about halfway through

- Repeatedly mentions Einstein

- Appeals to quantum mechanics out of nowhere

The style is particularly hard to parse (or I'm particularly dense, but I am
generally comfortable reading papers on variational inference, neural
networks, etc). At the same time, a lot of it rings true-ish... Does anyone
get where they're going with this?

~~~
kastnerkyle
Just to be abundantly clear - the ladder network [1] outscores this by a
moderate margin, and is also _actually_ SOTA without data augmentation. In my
mind at least, adding data rotations in the latent space is still different
than a fully connected model without data augmentation.

I could do with a few more recent citations on generative modeling. It seems
the author isn't 100% aware of some of the most recent generative modeling
work.

That said, the ideas presented are interesting and seem complementary to lots
of existing approaches - I will be looking into this paper further.

[1] [http://arxiv.org/abs/1507.02672](http://arxiv.org/abs/1507.02672)

~~~
themann9
Guy, the only thing abundantly clear is that you are full of dung. Your
shameless self-promotion may piss off the police chief here - murbard2 - so be
more careful. Ten digits better classified, out of 10000? With a structure
more complicated than human DNA vs two lines of code? Congratulations! Oh, and
how are your "stairways-to-heaven" even remotely universal? Can you show us
something these networks have generated, like the VAE or Gibbs/ACE papers do?
Perhaps you can show some density estimation results as in
[http://arxiv.org/abs/1502.04623](http://arxiv.org/abs/1502.04623) or
[http://arxiv.org/abs/1508.06585](http://arxiv.org/abs/1508.06585)? Oja and
Hyvarinen are great guys and have left their names in the pantheon of neural
nets. But it is time for you and the other 12 people who live there to shake
off the legacy of ICA and the obsession with orthogonality: Andrew Ng and
company showed years ago that it is not needed and is in fact detrimental
[http://ai.stanford.edu/~quocle/LeKarpenkoNgiamNg.pdf](http://ai.stanford.edu/~quocle/LeKarpenkoNgiamNg.pdf).
Read it! Also, too much dung smells bad in the Arctic summer; murbard2 here
prefers comics and won't be reading your installments of 25+ page spaghetti
any time soon. At least Oja and Hyvarinen know how to write.

------
jostmey
I just briefly skimmed through parts of it.

If I understand the paper correctly, the objective function is a combination
of a classifier and a generative model. It sounds a lot like pre-training a
neural network as a generative model before fine-tuning it as a classifier,
except this time the two steps are jammed together. I'm not sure what the
benefit would be...
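
If that reading is right, the objective would look something like the sketch
below - my own illustration in PyTorch, not the paper's code; the
`encode`/`decode`/`classify` methods and the weighting `alpha` are made-up
names:

```python
import torch
import torch.nn.functional as F

def joint_loss(model, x, y, alpha=1.0):
    # Hypothetical joint objective: a supervised classification term plus a
    # VAE-style generative bound, optimized together instead of in two stages.
    mu, logvar = model.encode(x)                              # q(z|x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterized sample
    x_hat = model.decode(z)                                   # reconstruction p(x|z)
    logits = model.classify(z)                                # classifier head on z

    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')      # -log p(x|z)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q(z|x) || N(0, I))
    clf = F.cross_entropy(logits, y, reduction='sum')              # classification term

    return recon + kl + alpha * clf   # one loss instead of pretrain-then-finetune
```

Whether jamming the two together helps presumably comes down to how the
generative and classification terms are weighted - hence the made-up alpha.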

~~~
themann9
The benefit seems to be shown on the right chart of Fig 8 (the bottom line is
the best descriptor of probability density). Combining classifiers and
generative nets is the natural next step; LeCun is working on it in the
context of ConvNets - see their SWWAE paper from a few weeks ago.

~~~
deepnet
Stacked What-Where Auto-encoders _Junbo Zhao, Michael Mathieu, Ross Goroshin,
Yann LeCun_

[http://arxiv.org/abs/1506.02351](http://arxiv.org/abs/1506.02351)

------
themann9
Shooting first and asking questions later ain't gonna make u no friends in
Compton, murbard2. Wouldn't hold my breath to hear any answers...

~~~
murbard2
Like I said, there are false positives, but when I hear "quantum mechanics" in
a non-quantum-mechanical context, I remove the safety from my gun...

~~~
themann9
Modern stochastic/time series analysis has borrowed most of its formalism from
quantum mechanics. It is ignorance and bigotry that held neural nets back for
30 years, until GPUs came to the rescue. Having gazillions of parameters,
which still nobody understands, did not help...

~~~
murbard2
What insight does quantum mechanics bring to this paper? What exactly do you
gain by calling a normal distribution "the equilibrium distribution of the
imaginary-time Schroedinger equation"?

It's quite possible that I'm a fool, unable to see the beauty of the argument
and the depth of the parallels being drawn, but to me it sounds like
[http://www.smbc-comics.com/?id=1957](http://www.smbc-comics.com/?id=1957)
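
For reference, as far as I can tell the identity being invoked is just the
textbook one (assuming a quadratic potential): the imaginary-time equation

$$\frac{\partial \psi}{\partial \tau} \;=\; \frac{\hbar}{2m}\,\frac{\partial^2 \psi}{\partial x^2} \;-\; \frac{m\omega^2 x^2}{2\hbar}\,\psi$$

relaxes as $\tau \to \infty$ to the ground state $\psi_0(x) \propto
\exp(-m\omega x^2 / 2\hbar)$, which is a plain Gaussian in $x$. Granting that,
my question stands: what does the renaming buy?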

~~~
themann9
Reading is not your thing, eh? It is too condensed and I have not finished it,
but sections 1.5 and 2.3 are very explicit about it...

~~~
murbard2
Maybe it isn't, but I have found the literature on the topic to be very clear
in general. I do not find sections 1.5 and 2.3 explicit at all.

Section 2.3 reads like a redefinition of the exponential family and its link
to the maximum entropy principle. I think the idea is to optimize the
sufficient statistics, but it's not clear.

Section 1.5... well, you talk about two-dimensional translational symmetries,
and then make a link with position and momentum, but where is that coming
from? This seems merely like an artifact of looking at two-dimensional
translational symmetries, and not at the general case. Again, what does the
quantum analogy bring to the table? I think - though it's not clear - that
what you're suggesting is drawing the latent noise from a distribution which
reflects the expected symmetry in the manifold.

So what is the takeaway? Are you attempting to parametrize a max-entropy
distribution using the symmetries as constraints?
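
To be concrete about what I mean (this is the standard result, nothing
paper-specific): maximizing the entropy $-\int p(x)\log p(x)\,dx$ subject to
normalization and moment constraints $\mathbb{E}_p[T_i(x)] = c_i$ gives, via
Lagrange multipliers,

$$p(x) \;=\; \frac{1}{Z(\lambda)}\,\exp\!\Big(\sum_i \lambda_i\, T_i(x)\Big),$$

an exponential-family density whose sufficient statistics are exactly the
constrained quantities $T_i$. So the question is whether the paper's
symmetries are meant to play the role of those $T_i$.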

~~~
themann9
Not me, dude, but I like underdogs and outsiders. What u are saying sounds
right: symmetries, like the ones computed in section 3.8, are used as
constraints a.k.a. "quantum numbers" describing the states in the latent
layer. That is vintage quantum mechanics. That part is theoretical and not in
the Theano code by the looks of it, so it is harder to comment on... The
quantum analogy also brings along the Laplacian form of the conditional
density in section 1.5, so there u go. On your comment, which "literature on
the topic" is clear? Have u read the original VAE paper
[http://arxiv.org/pdf/1401.4082.pdf](http://arxiv.org/pdf/1401.4082.pdf)? 95%
of the paper and most of its 21 equations are about gradients, Gaussian ones
too; they even invent a new term, stochastic back-propagation - why, who needs
that? All the math is done in the Gibbs/ACE paper in 4 lines: just compute
your cost bound and back-propagate. Having seen this earlier would have saved
me a lot of pondering. The Pythagorean theorem does sound too simple, I give u
that...
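
And if u want the "4 lines" spelled out, they amount to roughly this - a toy
sketch (PyTorch here, not the paper's Theano code; every name below is mine):

```python
import torch
import torch.nn.functional as F

x = torch.rand(64, 784)               # a batch of data scaled to [0, 1]
enc = torch.nn.Linear(784, 2 * 20)    # toy encoder: outputs mu and log-variance
dec = torch.nn.Linear(20, 784)        # toy decoder

mu, logvar = enc(x).chunk(2, dim=1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized latent sample

# the cost bound: reconstruction term plus KL to a unit-Gaussian prior
recon = F.binary_cross_entropy(torch.sigmoid(dec(z)), x, reduction='sum')
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
(recon + kl).backward()               # back-propagate the cost bound
```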

