
Geometric Understanding of Deep Learning (2018) - yoquan
https://arxiv.org/abs/1805.10451
======
cs702
Wow. As far as I know, this is the first time anyone reputable[a] has claimed
to _show_ (!) that the "manifold hypothesis" is the fundamental principle that
makes deep learning work, as has long been believed:

    
    
      "In this work, we give a geometric view to
       understand deep learning: we show that the
       fundamental principle attributing to the
       success is the manifold structure in data,
       namely natural high dimensional data
       concentrates close to a low-dimensional
       manifold, deep learning learns the manifold
       and the probability distribution on it."
    
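To make that concrete for myself, here's a toy sketch (mine, not from the
paper): points generated on a 1-dimensional curve, linearly embedded into R^50
with a bit of noise. The singular-value spectrum collapses after two
components, i.e., the data concentrates near a low-dimensional structure even
though the ambient dimension is 50. The embedding and the noise scale are, of
course, made up.

    # Toy illustration of the manifold hypothesis (mine, not the paper's
    # construction): a 1-D manifold (a circle) embedded in R^50 plus noise.
    import numpy as np

    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, 2000)
    circle = np.c_[np.cos(t), np.sin(t)]             # 1-D manifold living in R^2
    embed = rng.normal(size=(2, 50))                  # arbitrary linear embedding into R^50
    X = circle @ embed + 0.01 * rng.normal(size=(2000, 50))

    sv = np.linalg.svd(X - X.mean(0), compute_uv=False)
    print(sv[:5] / sv.sum())                          # spectrum collapses after 2 components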

Moreover, the authors also claim to have come up with a way of measuring how
hard it is for any deep neural net (of fixed size) to learn a parametric
representation of a particular lower-dimensional manifold embedded in some
higher-dimensional space:

    
    
      "We further introduce the concepts of rectified
       linear complexity for deep neural network
       measuring its learning capability, rectified
       linear complexity of an embedding manifold
       describing the difficulty to be learned. Then
       we show for any deep neural network with fixed
       architecture, there exists a manifold that
       cannot be learned by the network."
    

Finally, the authors also propose a novel way to control the probability
distribution in the latent space. I'm curious to see how their method compares
and relates to recent work, e.g., with discrete and continuous normalizing
flows:

    
    
      "...we propose to apply optimal mass
       transportation theory to control the
       probability distribution in the latent space."
    
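For intuition on "controlling the probability distribution in the latent
space", here is a 1-D toy of my own (not the authors' construction): for
quadratic cost, the optimal transport map in one dimension is the monotone
rearrangement, i.e., quantile matching, so an arbitrary empirical latent
distribution can be pushed onto a chosen target law. The exponential source
and Gaussian target below are made up for illustration.

    # 1-D optimal transport as quantile matching (toy example; assumes
    # NumPy/SciPy; the source and target laws are made up).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    z = rng.exponential(size=1000)                    # stand-in "latent codes" with a skewed law

    ranks = (np.argsort(np.argsort(z)) + 0.5) / len(z)    # empirical quantile of each code
    z_mapped = norm.ppf(ranks)                        # monotone map onto a standard Gaussian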

This is _not_ going to be a light read...

--

[a] One of the authors, Shing-Tung Yau, is a Fields medalist:
[https://news.ycombinator.com/item?id=18987219](https://news.ycombinator.com/item?id=18987219)

~~~
siavosh
I've been out of academia for a long time, but isn't the notion that 'natural'
high dimensional data lies on low dimensional manifolds the basic premise of
all machine learning techniques?

~~~
cs702
Yes, of course, this has long been a widely held assumption. That's why we
call it the "manifold _hypothesis_."

As far as I know, no one has been able to _show_ -- with mathematical rigor
-- that this is _why_ deep learning works so well on so many challenging
perceptual ("cognitive") tasks.

 _That_ seems significant to me.

~~~
improbable22
But have they actually shown any such thing? There are some ambitious words,
then some very elementary definitions, and some "theorems"... and then it
ends?

------
lovelearning
[An OT question as somebody not familiar with the academic world] Two of the
authors are in a Chinese university. Two of them in different departments in a
US university. In general, how does this kind of intercontinental
collaboration start, and how do they progress? How are roles defined when
multiple people are involved in a theoretical paper like this? Are there some
tools that help with collaborative paper writing?

~~~
auntienomen
Yau is the connection. He's a professor at Harvard, but he's also been a major
force in the development of Chinese mathematical academic institutions. He's
the director of several of them.

~~~
randcraw
And Gu got his PhD in CS at Harvard in computational geometry. Yau teaches
differential geometry there.

------
jesuslop
Possibly less sophisticatedly, I think of them as a sandwich of affine maps and
nonlinear isotropies (like those that give irregular rings in tree trunks). The
affinities are represented nicely in GL(n+1) with a homogeneous-coordinates
trick related to neuron biases. A question would be whether there's something
interesting to say about the interactions of the affinities and isotropies in
group-theoretic terms (which I dunno).

~~~
MAXPOOL
ReLU is very simple in this regard. In its plain form it's just an affine
transformation followed by 'a viewport'. The mapping through multiple layers
alternates affine transformations and windows into the data. Learning is a
combination of squeezing and rotating the data so that it can be seen or unseen
through the window, and of rotating the window frames to do the same.

Any extra stuff, like batch normalization between the layers, can again
introduce more complex nonlinearity.
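
In code, that picture is just the following (a minimal NumPy sketch of my own;
the layer sizes and weights are made up, and it ignores extras like batch
norm):

    # A ReLU network as alternating affine maps and elementwise "viewports".
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))                       # 5 points in R^8

    def affine(x, W, b):
        return x @ W + b                              # squeeze/rotate/shift the data

    def viewport(x):
        return np.maximum(x, 0.0)                     # ReLU: keep only what the window shows

    W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
    W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

    h = viewport(affine(x, W1, b1))                   # affine map, then window
    y = affine(h, W2, b2)                             # final affine readout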

~~~
jesuslop
Really agree with the extra simplicity in the ReLU case. Liwen Zhang, Gregory
Naitzat and Lek-Heng Lim showed last year that "the family of such neural
networks is equivalent to the family of tropical rational maps", where rational
functions are quotients of polynomials and "tropical" is in the context of
Tropical Geometry: instead of the typical "plus, times" ring one uses a
"min, plus" semiring, which has somewhat unexpected applications. For instance,
with its module theory one can calculate minimum-cost paths in graphs just as
one computes reachability from the adjacency matrix of a graph over a Boolean
semiring. arXiv:1805.07091v1

------
yoquan
I'm reading this and just realized one author is the famous Fields medalist,
Shing-Tung Yau :-)

~~~
soVeryTired
Sorta like when David Mumford got interested in computer vision, I guess.

~~~
jesuslop
You're kiddin'?

~~~
soVeryTired
Nope:
[http://www.dam.brown.edu/people/mumford/vision/introvision.h...](http://www.dam.brown.edu/people/mumford/vision/introvision.html)

Tim Gowers is interested in automated theorem proving too!

~~~
jesuslop
Surprising. Voevodsky also.

~~~
jesuslop
I meant Voevodsky also positioned himself as not anti proof assistants.

------
throwawaymath
_> ...we show that the fundamental principle attributing to the success is the
> manifold structure in data..._

_> Then we show for any deep neural network with fixed architecture, there
> exists a manifold that cannot be learned by the network._

I'd venture a guess that you can extend this result to show that, for any deep
neural network with fixed architecture, there exists an adversarial manifold
it must be vulnerable to.

In other words not only is there a manifold the neural network _cannot_ learn,
but there is also a manifold it _will_ learn, but incorrectly.

------
crimsonalucard
Anybody know what I should study in order to understand this research paper?

~~~
dragqueen
Looks like you need to know a tiny little bit about manifolds, measure
theory/probability theory, and topology, and, more importantly, have the
requisite "math maturity".

For a very easy intro to manifolds and measure theory, you could probably take
a look at [0], A Visual Introduction to Differential Forms and Calculus on
Manifolds by Fortney, and [1], The Lebesgue Integral for Undergraduates by
Johnston.

[0] [https://www.amazon.com/Visual-Introduction-Differential-Calc...](https://www.amazon.com/Visual-Introduction-Differential-Calculus-Manifolds/dp/3319969919/ref=sr_1_1?s=books&ie=UTF8&qid=1548353414&sr=1-1&keywords=fortney+manifolds)

[1] [https://www.amazon.com/Lebesgue-Integral-Undergraduates-MAA-...](https://www.amazon.com/Lebesgue-Integral-Undergraduates-MAA-Textbooks/dp/1939512077/ref=sr_1_6?s=books&ie=UTF8&qid=1548353478&sr=1-6&keywords=lebesgue+integration)

~~~
mlevental
Yeah, despite the purported firepower of the authors, this is not a dense paper.

~~~
AnimalMuppet
Maybe that shows most clearly the firepower of the authors...

~~~
improbable22
I'd reject it. I'm honestly struggling to find anything at all to chew on
here...

------
ttflee
Another interesting paper on optimal transportation and GANs:
[https://arxiv.org/abs/1710.05488](https://arxiv.org/abs/1710.05488)

------
twic
I know next to nothing about deep learning. But this geometric interpretation
really reminds me of the way self-organising maps work. Is there a real
connection there, or is that superficial?

~~~
FakeComments
I may be wrong, having just heard of self-organizing maps...

But they _do_ seem related, in that the paper is arguing the data can always
be (mostly) accurately represented by a map projecting onto a lower-dimensional
manifold. The recent paper on topological covers, along with the
self-organizing maps wiki, seems to indicate that nothing should be harmed by
doing this in a discrete setting.

Essentially, a self-organizing map is projecting onto R^n slices and then
learning the shape of the manifold generated by that atlas of projections.
------
heinrichf
(uploaded to the arXiv in May 2018)

~~~
sctb
Thanks! We've updated the headline.

