

Renormalization: A Common Logic to Seeing Cats and Cosmos - patocka
http://www.quantamagazine.org/20141204-a-common-logic-to-seeing-cats-and-cosmos/

======
thisisdave
This could be an important result, since there's a _lot_ of theory and
expertise regarding renormalization group methods, but I think it's too early
to tell.

The write-up is a bit misleading, though: the model in their preprint[1] is a
stack of restricted Boltzmann machines (RBMs), which is very different from
the other examples of deep learning mentioned. The Google Cat Detector model,
for example, didn't describe a probability distribution over images, which is
the kind of task the preprint is about. And in almost all of the recent cases
where deep neural networks have made substantial progress over the previous
state of the art, the models have not been probabilistic, or trained
layer-by-layer, or unsupervised, like the RBM-based approach in the preprint.

I don't speak physics, but my reading is that the paper could be summarized
pretty accurately as follows:

1. Hinton et al. (2006)[2] showed that each layer in a stack of RBMs improves
a variational lower bound on p(x) (sketched below).

2. Variational methods for RG also iteratively improve a variational lower
bound on p(x).

3. The two methods would thus be equivalent, if we could fit them without
error (which we can't).

4. Here's a figure from a stack of RBMs that vaguely looks like RG results
(which are not shown).
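
For concreteness, here is my paraphrase of the bound from Hinton et al.
(2006)[2] that items 1 and 2 refer to, where Q(h|x) is the approximate
posterior defined by the first RBM:

$$\log p(x) \;\ge\; \sum_h Q(h \mid x)\,\big[\log p(h) + \log p(x \mid h)\big] \;+\; H\big(Q(h \mid x)\big)$$

Training the next RBM in the stack to model samples of h drawn from Q(h|x)
raises the log p(h) term, which is how each added layer tightens the bound.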

I don't see any comparison between the approximations that physicists normally
use and the contrastive divergence used to train RBM-based networks, or any
evidence that their results are more similar in practice than those of any
other technique.
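
(In case it helps: contrastive divergence only approximates the gradient of
log p(v), which is exactly the kind of fitting error that point 3 hinges on.
Here is a minimal sketch of a CD-1 update for a binary RBM; this is my own
illustration of the standard algorithm, not code from the preprint, and all
names are mine.)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.01, rng=np.random):
    """One CD-1 step for a binary RBM.

    W:  (n_visible, n_hidden) weight matrix
    b:  visible bias, shape (n_visible,)
    c:  hidden bias, shape (n_hidden,)
    v0: batch of binary visible vectors, shape (batch, n_visible)
    """
    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visibles and back up.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # CD-1 gradient estimate: data statistics minus reconstruction statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
```

The update uses a one-step reconstruction in place of samples from the model's
true equilibrium distribution, so whether it lands anywhere near the
approximations physicists use for RG seems like an empirical question.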

Am I missing something?

[1] [http://arxiv.org/abs/1410.3831](http://arxiv.org/abs/1410.3831)

[2] [https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf](https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf)

------
digital55
This is the paper in question:
[http://arxiv.org/abs/1410.3831](http://arxiv.org/abs/1410.3831)

------
MrQuincle
Very interesting. I was working on renormalization theory in the context of
self-organized criticality
([https://dobots.nl/2012/02/27/the-renormalization-group/](https://dobots.nl/2012/02/27/the-renormalization-group/))
and swarm robotics.

I didn't think of it to apply to deep networks, but it makes a lot of sense.

What is not so likely in the brain is that these topographic mappings are the
only ones. There are probably some mix-and-match schemes that allow
integration of information over longer distances, at scales different from
that of the coarsest layer. Just my two cents.

Oh yeah, and although physicists like fractal-like structures (the fixed point
of the renormalization flow can be seen as applying the same mapping between
every two successive scales of granularity), people with real-life machine
learning problems would probably rather study temporal and dynamic aspects.
Once time gets introduced, things become complicated.

------
lutorm
So about these deep neural networks: I seem to remember from some introductory
AI class many years ago that a multilayer neural network can't do anything a
single-layer one can't. So what's the deal here, then? Is it that the learning
algorithms can't be reduced to an equivalent single-layer network?

~~~
Houshalter
There isn't anything a large number of if-then statements can't do either. Why
do programmers use nested if-then statements or states?

~~~
cma
On some GPUs, all code paths run; there are no nested ifs, just flags that
turn unused paths into no-ops. All I'm trying to say is that even in your
example, the optimal approach depends on the problem at hand (graphics vs.
general purpose). Newer GPUs are different, but still, ridiculing his question
isn't appropriate; reasoning by analogy is error-prone, and the real answer to
his question is probably a lot more rigorous.

~~~
Houshalter
This is because GPUs are parallel and can't do nested ifs, not because
representing code that way is efficient.

Even GPUs do sequential operations. You could never program the vast majority
of algorithms like "if the input is exactly 100011110101... then output
100011001...".

If that seems silly, that is exactly how the proof that single-layer NNs are
perfectly general works. It proves that you can represent any series of
if-then statements like that in an NN. And people who don't understand the
proof are misled into thinking single-layer NNs are just as good as deep NNs.
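
To make the lookup-table flavor of that proof concrete, here is a toy sketch
(my own construction, not from any linked paper): a one-hidden-layer threshold
network that represents a boolean function by dedicating one hidden unit to
each input pattern. It works, but the width grows with the number of patterns,
i.e. exponentially in the number of inputs, which is exactly what depth can
avoid.

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

def lookup_net(patterns, outputs):
    """One-hidden-layer threshold net that memorizes a truth table:
    one hidden unit per input pattern."""
    patterns = np.asarray(patterns, dtype=float)
    # Hidden unit i fires only on an exact match with pattern i: for binary x,
    # w.x peaks at patterns[i].sum() exactly when x == patterns[i].
    W1 = 2 * patterns - 1              # +1 where the pattern bit is 1, -1 where 0
    b1 = patterns.sum(axis=1) - 0.5    # threshold just below the perfect-match score
    W2 = np.asarray(outputs, dtype=float)
    def f(x):
        h = step(np.asarray(x, dtype=float) @ W1.T - b1)  # one-hot pattern match
        return step(h @ W2 - 0.5)
    return f

# XOR, memorized row-by-row rather than computed compositionally:
xor = lookup_net([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1, 1, 0])
print([int(xor(v)) for v in [[0, 0], [0, 1], [1, 0], [1, 1]]])  # [0, 1, 1, 0]
```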

------
trhway
Our brain probably came up with renormalization in physics because that is how
it deals with complexity. Deep neural networks were built following what we
know about how our visual cortex and recognition work. Same brain, same
method; not surprising. What would be surprising is if it happened to be the
best/optimal/etc. method :) since it is subject to biological constraints that
computers (or whatever other "thinking entity" might appear) don't (and won't)
have.

