
Neural Networks, Manifolds, and Topology (2014) - flancian
https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
======
flancian
Previously:

[https://news.ycombinator.com/item?id=7557964](https://news.ycombinator.com/item?id=7557964)
[https://news.ycombinator.com/item?id=9814114](https://news.ycombinator.com/item?id=9814114)

But not a lot of discussion over there.

The visualizations are great, and this basically blew my mind. I didn’t know
of the manifold hypothesis until now.

    
    
      The manifold hypothesis is that natural data forms lower-dimensional
      manifolds in its embedding space. There are both theoretical and
      experimental reasons to believe this to be true. If you believe this, then
      the task of a classification algorithm is fundamentally to separate a bunch
      of tangled manifolds.
    

My interpretation/rephrasing: if you want to build a neural network that
distinguishes cat and dog pictures, in the worst case that would seem to
require a huge network, with the number of nodes/layers growing with the size
of the image, rather than the rather small number that seems to work
reasonably well in practice (six, or some other low constant observed in
reality). So the number of dimensions over which the “images” are potentially
spread is huge, but it seems that in the real world the dog and cat images can
be rearranged into “shapes” that then allow relatively easy disentanglement by
the neural network; and these shapes can probably be realized in much lower
dimensions (in the example, six).

This could explain (for some definition of explain) the observed predictive
power of relatively small neural networks.
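To make that picture concrete, here's a toy sketch (my own illustration, not
from the article, and assuming scikit-learn is available): two classes that lie
on interleaved low-dimensional arcs can't be split by a linear classifier in
the raw coordinates, but a network with a single hidden layer of six units can
warp the space enough to separate them.

    # Toy illustration (mine, not from the article): two entangled
    # low-dimensional "shapes" that a linear model can't separate,
    # but a 6-unit hidden layer can untangle.
    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=1000, noise=0.1, random_state=0)

    # A linear decision boundary can't follow the interleaved arcs...
    linear = LogisticRegression().fit(X, y)
    print("linear accuracy:", linear.score(X, y))        # typically ~0.85-0.9

    # ...but a tiny network that first warps the space can.
    tiny_net = MLPClassifier(hidden_layer_sizes=(6,), max_iter=5000,
                             random_state=0).fit(X, y)
    print("6-unit net accuracy:", tiny_net.score(X, y))  # typically ~1.0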

~~~
scottlocklin
It seems profound, but it's not really saying anything different than
"compression is the same thing as forecasting."

FWIW, unsupervised learning and things like topological data analysis are
almost entirely about discovering the actual manifolds (or some hand-wavy
topology). It doesn't always work; the data often doesn't cooperate and live
on a metric space.

~~~
throwawaymath
That's a great way of saying it concisely. And to continue that generalization
- one of the consequences of information theory is that there exist trivially
incompressible strings. This plays nicely with the recent (2018)
Lei-Luo-Yau-Gu result that there exist manifolds which cannot be learned.

As I harp on every chance I can get, I have a pet hypothesis that there's a
very deep corollary here waiting to be proven rigorously. Namely, that we can
show there exist adversarial inputs that exploit neural networks because
they're incompressible. Furthermore, that these inputs are information
theoretically guaranteed to exploit the neural network (even if there are
practical complexity theoretic workarounds).

~~~
whatshisface
> _Furthermore, that these inputs are information theoretically guaranteed to
> exploit the neural network (even if there are practical complexity theoretic
> workarounds)._

I get the image of a technique that, when applied to humans, allows you to see
through political speeches and reveals the eldritch horrors scurrying around
us continually.

------
datascientist
Gunnar Carlsson will be teaching a related tutorial ("Using topological data
analysis to understand, build, and improve neural networks") on April 16th in
New York City: [https://conferences.oreilly.com/artificial-intelligence/ai-n...](https://conferences.oreilly.com/artificial-intelligence/ai-ny/public/schedule/detail/73123)

------
yantrams
This was the article that helped me get neural networks when I began studying
them a few years back. Interpreting them as a series of curvilinear coordinate
transformations really helped me understand them better.

PS: There is a great introductory article on entropy on the blog that is worth
checking out.

------
gdubs
This is beautiful, and surprisingly approachable. Also, feels relevant to this
recent conversation:
[https://news.ycombinator.com/item?id=18987211](https://news.ycombinator.com/item?id=18987211)

------
GlenTheMachine
If anyone could point me to literature on k-nn neural networks (or the
relationship, if any, between k-nn algorithms and basis function decomposition
and/or blind source separation) I’d be much obliged.

~~~
brookhaven_dude
He mentioned that he applied k-nn to handwritten digit recognition. A likely
approach would be as follows.

Given a new vector that needs to be classified (say, x), it is compared with
its nearest neighbors in the data set (let's call them x_1, x_2, ..., x_k).
A weighted average of the categories of the nearest neighbors is calculated to
classify x.

That is to say, the category y that x belongs to is given by some function of
the categories of the nearest neighbors (e.g. a weighted sum):

y = f(w_1*y_1 + ... + w_k*y_k)

where f() is a function that converts the continuous weighted sum to one of
the integers representing the categories, and y_1,...,y_k are categories that
x_1,...,x_k belong to, respectively.

The weights w_1,....,w_k can be determined by optimizing some error function.
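
A rough sketch of that scheme in Python/NumPy (my own illustration; I've used
inverse-distance weights and a per-category vote for f(), which is one common
choice rather than the only one):

    import numpy as np

    def knn_classify(x, X_train, y_train, k=5):
        # Distances from x to every training vector, then indices of x_1..x_k.
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]

        # Weights w_1..w_k: inverse distance here; they could also be
        # learned by minimizing an error function, as described above.
        w = 1.0 / (dists[nearest] + 1e-9)
        w /= w.sum()

        # f(): sum the weights per category and return the category with
        # the largest total (a weighted vote).
        categories = np.unique(y_train)
        scores = np.array([w[y_train[nearest] == c].sum() for c in categories])
        return categories[np.argmax(scores)]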

------
AlkurahCepheus
[https://www.youtube.com/watch?v=Yr1mOzC93xs](https://www.youtube.com/watch?v=Yr1mOzC93xs)

------
quenstionsasked
Haven't given it a lot of thought, but isn't his vector field idea similar in
approach to neural ordinary differential equations?

