
A Theory of Deep Convolutional Neural Networks for Feature Extraction - sawwit
http://arxiv.org/abs/1512.06293
======
theunixbeard
The abstract is hard to parse for a layman. Anyone care to venture a
translation for the non-ML wonks?

~~~
yablak
Deep convolutional networks have enjoyed great success in image recognition
and understanding in the last 3+ years (see the ImageNet competition winners).
One of the main reasons people believe they are so powerful is the
translation, rotation, and scale invariance of the learned networks: the
top-layer coefficients are very similar (in terms of squared norm) regardless
of the target object's location, size, and rotation in the image.

This paper basically shows that the structure of deep convolutional networks
(specifically: shared convolutional kernels at each layer, subsampling /
pooling across layers) is the main reason behind this behavior. Furthermore,
the relative invariance increases with the depth (number of stacked
convolutional layers).
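A rough way to see the depth claim: stack conv + ReLU + 2x2 average-pooling
layers with random filters and watch how far apart the features of a shifted
copy and of the original are at each depth (a sketch, not the paper's
construction; the shrinking is typical on toy inputs, not something this
snippet proves):

    import numpy as np

    rng = np.random.default_rng(1)

    def layer(x, k):
        # circular convolution, ReLU, then 2x2 average pooling (downsample by 2)
        y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k, x.shape)))
        y = np.maximum(y, 0)
        h, w = y.shape
        return y.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    img = np.zeros((64, 64))
    img[20:28, 20:28] = 1.0
    x, x_shift = img, np.roll(img, shift=(3, 2), axis=(0, 1))

    for depth in range(1, 6):
        k = rng.standard_normal((5, 5))
        x, x_shift = layer(x, k), layer(x_shift, k)
        rel = np.linalg.norm(x - x_shift) / np.linalg.norm(x)
        print(depth, round(rel, 4))   # gap between shifted and unshifted features tends to shrink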

They show this by first proving that a wide variety of convolutional kernels
and pooling behaviors provide said invariance (the tools they use are from the
world of graduate-level calculus and analysis: wavelets and frames); they also
show stability with respect to nonlinear deformations (even if the original
image is swirled around / deformed a bit in addition to being shifted, rotated,
or scaled, the coefficients still don't change much). That's one major result
of this paper.
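The deformation claim is also easy to poke at numerically: warp the image with
a small, smooth displacement field and compare pooled features. A toy check
(scipy.ndimage.map_coordinates does the warping; the filter and the warp
amplitude are arbitrary choices of mine):

    import numpy as np
    from scipy.ndimage import map_coordinates

    rng = np.random.default_rng(2)

    def pooled_features(x, k):
        # one conv + ReLU layer followed by 4x4 average pooling
        y = np.maximum(np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k, x.shape))), 0)
        h, w = y.shape
        return y.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

    img = np.zeros((64, 64))
    img[20:28, 20:28] = 1.0

    # small, smooth displacement field tau(x): a gentle wave of about 1.5 pixels
    yy, xx = np.meshgrid(np.arange(64), np.arange(64), indexing='ij')
    warped = map_coordinates(img,
                             [yy + 1.5 * np.sin(2 * np.pi * xx / 64),
                              xx + 1.5 * np.cos(2 * np.pi * yy / 64)],
                             order=1, mode='wrap')

    k = rng.standard_normal((5, 5))
    f0, f1 = pooled_features(img, k), pooled_features(warped, k)
    print(np.linalg.norm(f0 - f1) / np.linalg.norm(f0))   # stays small for a gentle warp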

Then they show that the invariance property is not tied to the kernels per se:
you don't need the intermediate convolutions to be a wavelet, Fourier
transform, or whatever. This is important because neural networks don't use a
fixed convolutional kernel - they learn one on the fly from training data. The
theorem in this paper shows that the invariance is largely independent of the
specific coefficients of the kernel, and therefore general convolutional
neural networks share these same properties.
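You can see the "specific kernel doesn't matter much" point in the same toy
setup: a hand-built, wavelet-ish (Gabor-like) filter and a purely random
filter both give shift-insensitive pooled features (illustrative only; the
parameters below are made up):

    import numpy as np

    rng = np.random.default_rng(3)

    def shift_sensitivity(img, k):
        # relative change of conv + ReLU + 2x2-pooled features under a 3-pixel shift
        def feats(x):
            y = np.maximum(np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k, x.shape))), 0)
            h, w = y.shape
            return y.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        f0 = feats(img)
        f1 = feats(np.roll(img, shift=(3, 0), axis=(0, 1)))
        return np.linalg.norm(f0 - f1) / np.linalg.norm(f0)

    img = np.zeros((64, 64))
    img[20:28, 20:28] = 1.0

    u, v = np.meshgrid(np.arange(-7, 8), np.arange(-7, 8), indexing='ij')
    gabor = np.exp(-(u ** 2 + v ** 2) / 18.0) * np.cos(0.8 * v)   # hand-built, wavelet-like filter
    noise = rng.standard_normal((15, 15))                          # arbitrary filter

    print(shift_sensitivity(img, gabor), shift_sensitivity(img, noise))
    # both tend to come out modest; per the paper, the conv + pooling structure,
    # not the particular filter, is what provides the invariance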

That said, the learned kernels of the lowest layers of convolutional neural
networks do tend to look like wavelets / curvelets / shearlets / etc, while
the higher convolutional layers tend to look like higher level features. So
neural networks' kernels are already close enough to the theoretical family of
possible convolutional functions studied in this paper for the theorems to
more or less apply.

Since it's impossible to prove that a conv NN trained empirically on a dataset
of natural images learns exactly wavelets (or anything else amenable to
theoretical study), it's nice to have a paper that says the kernels themselves
aren't what matters most: the structure of pretty much any deep convolutional
NN is the important part.

Finally, the paper builds on a neat way to think about convolutional neural
networks (extending Mallat's initial work on the scattering transform): the
layers represent functions which are invariant to group operations like
rotation and translation, and stable with respect to deformation. This means
you can study these types of neural networks using group theory (abstract
algebra), an area of math where it's comparatively easy to gather insight
about how things work and to develop new algorithms that you can translate
back to the real world.
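For the curious, the flavor of those statements in symbols (my paraphrase, not
the paper's exact theorems), writing T_t for translation by t and F_tau for a
small smooth deformation f(x - tau(x)):

    \Phi(T_t f) \approx \Phi(f)
        \quad \text{(translation invariance, improving with depth)}

    \|\Phi(F_\tau f) - \Phi(f)\| \lesssim \|\tau\|_\infty \, \|f\|
        \quad \text{(deformation stability)}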

~~~
brianchu
CNNs are not rotation invariant.

