
A visual proof that neural nets can compute any function - p1esk
http://neuralnetworksanddeeplearning.com/chap4.html
======
otabdeveloper1
> Every continuous function in the function space can be represented as a
> linear combination of basis functions

[https://en.wikipedia.org/wiki/Basis_function](https://en.wikipedia.org/wiki/Basis_function)

This is basic and obvious math. Does slapping the word 'neural' magically make
obvious results 1000% more interesting? Why? Because the word 'neural' carries
some of that artificial-intelligence-technology-of-the-future cachet?

~~~
gizmo686
As the article says:

" If you're a mathematician the argument is not difficult to follow, but it's
not so easy for most people. That's a pity, since the underlying reasons for
universality are simple and beautiful."

Indeed, as a mathematician, the universality of neural networks is obvious to
me from their definition. However, this article is explicitly not aimed at
mathematicians, and (as far as I can tell) does a good job of presenting the
argument without requiring unnecessary math knowledge, which is something that
we tend to be bad at doing.

Furthermore, the universality does not directly follow from the quote you
provide, as you would still need to show that the hidden layer neurons form a
basis. In fact, the article is largely about constructing such a basis out of
the neurons.
The only thing that bringing in your quote would serve to do is make the
article more confusing by unnecessarily introducing the concept of basis
functions.

~~~
dicroce
As a non-mathematician, this is not obvious to me.

Thinking about it a bit (I haven't finished reading the article yet)... Since
the size of the hidden layer isn't specified, I suppose you could have a
hidden layer node for every possible input... So, of course any function is
computable with a neural network. Really the magical thing here is finding the
smallest set of nodes that computes the function...

~~~
49531
I've been toying with a NN trying to get it to play 2048 based on game data I
recorded. I still have about a 60% error rate, but I found that with 16 inputs
(the tiles on the board) and 4 outputs (directions to move), it works best
like a funnel.

I currently have 2 hidden layers, of 12 and 8 neurons, and it's the best I've
gotten so far.
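
A minimal PyTorch sketch of that funnel shape (the layer sizes are the ones
above; the activations and everything else are just placeholders):

    
    
        import torch.nn as nn
        
        # hypothetical 2048 policy net: 16 tile values in, 4 move scores out
        model = nn.Sequential(
            nn.Linear(16, 12),   # first hidden layer, 12 units
            nn.ReLU(),
            nn.Linear(12, 8),    # second hidden layer, 8 units
            nn.ReLU(),
            nn.Linear(8, 4),     # one output per move direction
        )
    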

------
shoo
I am glad that the article mentions the Weierstrass approximation theorem
[1], an analogous result for polynomials from 1885 (later generalized as the
Stone-Weierstrass theorem).

More recently, there's "Chebfun" [2], an open-source MATLAB library for
working with highly accurate 1- and 2-dimensional polynomial approximations
to arbitrary continuous functions.

Approximation theory: pretty useful, pretty well-established, under-hyped.

[1]
[https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theo...](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem#Weierstrass_approximation_theorem)
[2] [http://www.chebfun.org/](http://www.chebfun.org/)
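
Not Chebfun itself, but for a taste of the same idea in Python, numpy's
Chebyshev routines can do a crude version (the function and degree here are
arbitrary):

    
    
        import numpy as np
        from numpy.polynomial import chebyshev as C
        
        f = lambda x: np.exp(np.sin(5 * x))       # any continuous function on [-1, 1]
        x = np.cos(np.pi * np.arange(201) / 200)  # Chebyshev points avoid Runge's phenomenon
        coeffs = C.chebfit(x, f(x), 50)           # degree-50 polynomial fit
        
        xs = np.linspace(-1, 1, 1000)
        print(np.max(np.abs(C.chebval(xs, coeffs) - f(xs))))   # max error is tiny for smooth f
    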

------
mostly_harmless
My favourite way of looking at it is to imagine neurons in a neural net as
analogue nand-gates.

The logic within a computer processor can be built entirely from NAND gates,
and a processor is able to compute any computable function. NAND gates have
'functional completeness': they can implement any other gate (AND, OR, NOT,
NOR), and any other high-level construct can be built from those gates.

Similarly, neurons in a neural net can output a 'NAND' function if set up
correctly.
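
For instance, a single sigmoid neuron with weights -2, -2 and a bias of 3
behaves as a NAND gate on 0/1 inputs (a toy sketch, not a trained network):

    
    
        import numpy as np
        
        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))
        
        def nand_neuron(x1, x2):
            # weights -2, -2 and bias 3: output is ~1 unless both inputs are 1
            return sigmoid(-2 * x1 - 2 * x2 + 3)
        
        for a in (0, 1):
            for b in (0, 1):
                print(a, b, round(nand_neuron(a, b)))   # prints the NAND truth table
    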

This gives an informal argument for the computational completeness of neural
nets.

------
no_gravity
I wonder how efficiently it can do that compared to other systems.

For example, a short iterative function like this:

    
    
        def in_mandelbrot(c, max_steps=1000, bound=2.0):
            z = 0
            steps = 0
            while abs(z) < bound and steps < max_steps:
                z = z * z + c
                steps += 1
            return steps == max_steps  # never escaped: likely in the set
    

can determine with extremely high accuracy whether a point in the complex
plane is in the Mandelbrot set or not. I would assume that a NN with the same
accuracy would be of enormous size. It would probably have way more neurons
than there are atoms in the universe.

~~~
3v3rt
I think one of the caveats was that the function should be relatively smooth
and continuous; the Mandelbrot set isn't very smooth, is it?

~~~
fnbr
Smoothness isn't necessary to prove the result, just continuity.

You do need smoothness to prove a bound on the rate of convergence of the
basis representation, and given that the boundary of the Mandelbrot set
doesn't have a closed-form representation (as far as I'm aware), I think the
convergence of the neural network representation would be extremely slow.

------
chestervonwinch
I tend to think that the real advantage of big nets is that they're simply
compositions of matrix-vector operations (with some component-wise non-
linearity tossed in), which allows them to scale more naturally to massive
problems on the GPU ... Don't get me wrong - the universal approximation
theorem is important - but I think this is just the _first_ property an
approximator must have. I would be interested to see if the network model
could be shown to be remarkable in some other way.
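
Concretely, a forward pass is just something like this (numpy sketch, layer
widths arbitrary):

    
    
        import numpy as np
        
        rng = np.random.default_rng(0)
        sizes = [784, 256, 64, 10]            # arbitrary layer widths
        params = [(0.01 * rng.standard_normal((m, n)), np.zeros(m))
                  for n, m in zip(sizes[:-1], sizes[1:])]
        
        def forward(x):
            # the whole net is repeated matrix-vector products plus a component-wise ReLU
            for W, b in params:
                x = np.maximum(W @ x + b, 0.0)
            return x
        
        print(forward(rng.standard_normal(784)).shape)   # (10,)
    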

------
solomatov
They can compute any function, but the same can be said about splines,
polynomials and many other ways of approximating functions. The problem is
finding an algorithm which will not lead to overfitting and can reasonably
approximate any function.

~~~
dragontamer
Wavelets are my favorite approximating functions to build things up with.

But yeah, the overfitting problem / approximation is the real issue. I do like
seeing the hybrid approaches (Genetic Algorithms searching random Neural Nets
with a little bit of backpropagation for good measure).

------
JumpCrisscross
> _No matter what the function, there is guaranteed to be a neural network so
> that for every possible input, x, the value f(x) (or some close
> approximation) is output from the network_

Love NNs, but accuracy is relative. At some resolution of "close
approximation," every function "computes" every other function.

~~~
throwaway39202
What the author means by "close approximation" is clarified later (under the
"Two caveats" header). The point is that you can get an _arbitrarily close_
approximation of any continuous function, just by choosing a sufficiently
large number of hidden units. That is, for any epsilon > 0, a neural network
can approximate any function within epsilon of the function's exact value at
all points, given enough hidden units. The smaller the epsilon (i.e., the
closer the approximation you want), the more hidden units you need.
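
In symbols, the claim is:

    
    
        for every eps > 0 there is a network output g (with enough hidden units) such that
            |f(x) - g(x)| < eps   for all inputs x in the (compact) domain
    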

~~~
TheOtherHobbes
What if you're not approximating a continuous function?

~~~
leni536
I think it's mostly accurate in the L2 norm and not point by point. So every
function in L2 can be approximated arbitrarily well in the L2 norm.

~~~
chestervonwinch
This is not true. The theorem gives uniform (hence point-wise) approximation
of continuous functions on a compact subset of R^n. See theorem 2, page 6:
[http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf](http://deeplearning.cs.cmu.edu/pdfs/Cybenko.pdf)

------
mwexler
Note that this is one chapter of a good intro "online book" to neural nets.
[http://neuralnetworksanddeeplearning.com/index.html](http://neuralnetworksanddeeplearning.com/index.html)
is a good place to start.

------
forkandwait
I think an important thing to remember about neural nets is that they are
basically a good way to overfit. So, yes, you can approximate any function
because you have lots and lots of variables, but you shouldn't fool yourself
that you are getting the same information as when you write a deterministic
equation with a few variables ("with four parameters I can fit an elephant,
with five I can make him wiggle his trunk"... von Neumann). You can catch the
baseball but you don't yet know the physics.

I think skilled biological actors use this overfitting to get really good at
things without knowing how things actually work.

NNs are still cool, though.

~~~
effie
> I think skilled biological actors use this overfitting to get really good at
> things without knowing how things actually work.

Perhaps that's the mechanism behind intuition?

------
GuiA
_> Consider the problem of naming a piece of music based on a short sample of
the piece. That can be thought of as computing a function. Or consider the
problem of translating a Chinese text into English. Again, that can be thought
of as computing a function. Or consider the problem of taking an mp4 movie
file and generating a description of the plot of the movie, and a discussion
of the quality of the acting. Again, that can be thought of as a kind of
function computation. Universality means that, in principle, neural networks
can do all these things and many more._

Eesh, that seems like poor technical writing, and potentially misleading. It's
far from clear that these are computable functions in the first place; Penrose
argues (see notably "Shadows of the Mind") that there are functions the human
brain can compute which are not computable in the Church-Turing sense.

 _" Just because we know a neural network exists that can (say) translate
Chinese text into English, that doesn't mean we have good techniques for
constructing or even recognizing such a network."_ seems very hand-wavy - we
don't _know_ that such a neural network exists without first going through a
big set of still controversial assumptions.

The rest of the article is good, but I'm not a big fan of that section. It's
not a huge part, but I think this could give some students the wrong idea
about what is a computable function, and what neural networks can compute.

~~~
cLeEOGPw
By function I think he meant mapping input points to output points in an
abstract plane.

So in that sense a piece of music, or a sentence in one language, is a point
of input, while the name of the music, or the sentence in the other language,
is a point of output.

Everything is a function as long as there is a way to turn that thing into
inputs that correspond to outputs.

~~~
GuiA
Ok, so by your reasoning let's have a function as a point of input, and
whether it halts or not as a point of output.

So now we have a function, I can't wait till "we have good techniques for
constructing such a network" that maps those inputs and outputs in an abstract
plane :)

~~~
cLeEOGPw
That is a good example. But you are forgetting that neural networks are
approximating the actual functions, so the function you described could be
built with some kind of confidence level in the answer. Just as you can have
some confidence from experience that certain code will not halt, a neural
network could also be built to do that. Not for all possible inputs though,
unless you have an infinitely large network with infinite computing power.

~~~
groar
Yeah, but it's still misleading, as we are talking about approximating
continuous functions here, not _any_ function. Those examples are not clearly
computable, or even just continuous.

------
bluesign
Isn't this similar to the infinite monkey theorem: "Given an infinite length of
time, a chimpanzee punching at random on a typewriter would almost surely type
out all of Shakespeare's plays."

Given that we can have an infinitely big neural network and train it for an
infinite length of time, it can compute any function.

~~~
tfgg
I don't think so, since the former is about the occurrence of sequences in
stochastic, ergodic processes, and the latter is about approximating
deterministic functions.

It's more like: given an infinite number of parameters, you have enough
degrees of freedom to describe the vector space a function lives in with some
set of basis functions, and a neural net supplies such a basis.

------
jarradhope
If you're interested in this you should check out Chris Eliasmith's lab - they
have a Neural Engineering Framework and NENGO, which in its current
implementation is like a Python-to-neural-network compiler.

[http://www.nengo.ca](http://www.nengo.ca)

------
yugoja
Is anybody reading this book? How good is it for a beginner?

~~~
IngoBlechschmid
The book is truly excellent. You need some knowledge of (partial) derivatives
and matrices, but other than that it doesn't require any more sophisticated
mathematics.

We used it to prepare a 10-day course on neural networks for high school
students. For anyone interested (and capable of reading German), see
[https://github.com/iblech/mathematik-der-vorhersagen](https://github.com/iblech/mathematik-der-vorhersagen)
for some notes, Python code, and videos.

------
bobm_kite9
I love the inline videos and graphs. How was this done?

~~~
p1esk
I believe you can see all the source code in your browser.

------
drdeca
I don't think this always works if the function in question has infinitely
many discontinuities...

~~~
p1esk
What would be an example of a practical task that needs to be modeled with
such a function?

~~~
gizmo686
Rounding

~~~
p1esk
Good example, but in this case it's actually very easy to find weights for the
neural network that would do rounding with any desired precision (look for the
"stack of towers" method in the article).

~~~
gizmo686
Yeah, I think I picked rounding because the article primed me to think about
it (the proof presented is essentially based on the ability to create steps,
and rounding is my go-to example of a step function). However, any attempt to
construct such a network would have a finite number of steps in it, while the
actual rounding function has infinitely many.

In practice, if we needed such a network, we would probably have a restricted
domain, so there would only be a finite number of discontinuities, and there
would be a sufficiently large network that could solve it to arbitrary
precision.

We would not be able to do this with more exotic functions that are densely
discontinuous, such as the function:

    
    
        f(x) = 0  iff x is rational
        f(x) = 1  iff x is irrational
    

There is no way a neural network can reasonably model that function.

------
grondilu
Even weirdos like the Weierstrass function?

[https://en.wikipedia.org/wiki/Weierstrass_function](https://en.wikipedia.org/wiki/Weierstrass_function)

~~~
gizmo686
One of the caveats mentioned in the article is that the neural networks do not
necessarily compute a given function precisely, but rather approximate it to
an arbitrary accuracy. In the case of the Weierstrass function, this removes
any weirdness. This is because, for any accuracy, you could approximate the
Weierstrass function as a polynomial. Note that the polynomial example is only
to show that the weirdness of the Weierstrass function is not relevant in this
context. Approximating it as a polynomial is not necessarily a good approach
to approximating it as a neural network.

------
bra-ket
this is really cool

~~~
kpil
Isn't it so that all continuous functions can be approximated with sums of sine
waves too?

That is also cool but without the AI connotations...
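
For instance, something like this quick numpy sketch (the function and number
of harmonics are arbitrary):

    
    
        import numpy as np
        
        # fit a short sum of sines and cosines to a continuous periodic function
        f = lambda x: np.exp(np.cos(x))               # arbitrary 2*pi-periodic function
        x = np.linspace(0.0, 2.0 * np.pi, 1000)
        n = 8                                         # number of harmonics to keep
        k = np.arange(1, n + 1)
        basis = np.hstack([np.ones((x.size, 1)), np.cos(np.outer(x, k)), np.sin(np.outer(x, k))])
        coeffs, *_ = np.linalg.lstsq(basis, f(x), rcond=None)   # least-squares coefficients
        
        print(np.max(np.abs(basis @ coeffs - f(x))))  # max error shrinks quickly as n grows
    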

~~~
titanomachy
I don't know, sine waves are pretty smart... I think it's only a matter of
time before they take over the world.

