
What does a neural network actually do? - zercool
http://moalquraishi.wordpress.com/2014/05/25/what-does-a-neural-network-actually-do/#
======
dlsym
"What does a neural network actually do?"

This is a fundamental question: can we really say and predict what a neural
network does?

Unlike an engineered / constructed algorithm, a neural network is
'trained'.

Whenever we present a 'known' input pattern, it will respond with a
'learned' response.

This however introduces interesting problems: How can we _debug_ a neural
network? How can we debug a correlation? Sure, we can tune its parameters, we
can train it some more until it again shows the desired response. But at that
point we have abandoned knowing how the intrinsic algorithm works in favor of
just focusing on the result.

Okay - now if we follow this argument, it would lead to: if we simulate the
whole brain by simulating its neural network, we won't gain any knowledge about
the intrinsic workings of the brain. We won't find any enlightenment about the
innermost algorithm _represented_ by the neural network we call our brain.

~~~
ShardPhoenix
I can't find an example on Google right now, but I've seen demonstrations that
it's possible to visualize the intermediate layers of a neural network - for
example you can see how an image recognition network is first breaking down an
image into horizontal and vertical lines, then combining those into more
complex shapes, etc.

~~~
joe_the_user
But visualizing is quite a ways from debugging.

To debug a program you actually verify that its logic is correct (at least in
the good kind of debugging).

Consider a spectrum:

1. Natural language - we humans combine fragments of natural language easily
and on an ad-hoc basis. We can get a fair amount of use from reusing
Shakespeare quotes and neologisms while spending rather little effort.

2. Trained programmers can reuse and combine general-purpose libraries - with
difficulty and often after considerable debugging.

3. AI algorithms like neural networks. These are just plopped in and tweaked;
no combining seems possible.

It seems like "intelligent behavior" should be moving more towards #1, but the
process of machine learning seems to move things more towards #3. The "learn
once, understand never" approach means that for each significant case, you'll
need to do a re-tweaking and re-learning. The potential to get harder rather
than easier over time might well be there.

~~~
judk
Can you debug a human brain? I can't.

Is a human brain intelligent? I believe so.

~~~
joe_the_user
Well,

Admittedly, all this is in a manner of speaking, but still, I would claim that
most if not all of the times you debug a program, you are also debugging your
mental concept of what the program does. By the fact that we can change our
concepts, our minds are very "debuggable."

------
csense
I've found that, in practice, traditional neural networks tend to be prone to
overfitting and are finicky about their parameters (in particular the topology
and number of nodes you choose).

I use the word "traditional" to describe the NN architecture discussed in the
article. Recent NN research has been promising [1], but this article strictly
discusses traditional NN's. I don't really have much experience with the newer
NN algorithms, so I'm not sure to what extent they suffer from the same
problems as traditional NN's.

[1]
[http://en.wikipedia.org/wiki/Neural_network#Recent_improveme...](http://en.wikipedia.org/wiki/Neural_network#Recent_improvements)

~~~
billderose
Hinton's DropOut [1] and Wan's DropConnect [2] have ameliorated some of the
overfitting issues present in traditional NN's. In fact, DropConnect in
conjunction with deep learning is responsible for new records being set on
classical datasets such as MNIST.

[1] [http://arxiv.org/pdf/1207.0580.pdf](http://arxiv.org/pdf/1207.0580.pdf)
[2] [http://cs.nyu.edu/~wanli/dropc/](http://cs.nyu.edu/~wanli/dropc/)

~~~
agibsonccc
Dropout is actually a knob you can turn on any neural network. It's used in
image recognition as well as in text and other areas.

The fuzzing creates an effect very similar to convolutional nets, where the
network can learn different poses of an image.
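
As a rough sketch (my own, not taken from the papers above), "inverted" dropout just zeroes each activation with some probability during training and rescales the survivors:

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    # Zero each activation with probability p; scale survivors by 1/(1-p)
    # ("inverted" dropout) so the expected activation value is unchanged.
    rng = rng or np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones(8)
print(dropout(h))  # each entry is either 0.0 or 2.0
```

At test time you just skip the call and use all activations as-is.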

------
nullc
Another limit they don't address is that the training normally used is purely
local— just a gradient descent. So even when the network can model your
function well, there is no guarantee that it will find the solution.

For me ANN's always seem to get stuck in not very helpful local minima—
they're not one of the first tools in my bag of tricks, by far.
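
To make the "stuck in local minima" point concrete, here's a toy sketch (the function is made up): plain gradient descent on a simple non-convex curve lands in a different valley depending on where it starts.

```python
def f(x):
    # Two valleys: near x ≈ -1.47 (deeper) and x ≈ +1.35 (shallower)
    return x**4 - 4 * x**2 + x

def grad(x):
    return 4 * x**3 - 8 * x + 1

def descend(x, lr=0.01, steps=2000):
    # Plain gradient descent: follow the local slope downhill
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Same algorithm, different starting points, different answers:
print(descend(-2.0))  # ≈ -1.47, the global minimum
print(descend(+2.0))  # ≈ +1.35, a worse local minimum
```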

Often I associate them with being the sort of thing that someone who doesn't
really know what they're talking about talks about. (Esp. if it's clear that in
their minds NNs have magical powers. :) maybe they'll also mention something
about "genetic algorithms")

~~~
nanidin
Well, you could use an EA to take a stab at finding better minima :)

And correct me if I'm wrong, but isn't the cost function for a feed-forward
neural network that uses a sigmoid activation function convex wrt the
parameters being trained, i.e. gradient descent is guaranteed to find the
global minimum when a small enough step size is used?

~~~
chestervonwinch
Mostly, no. Hidden units introduce non-convexity to the cost. How about a
simple counter-example?

Take a simple classifier network with one input, one hidden unit and one
output and no biases. To make things even simpler, tie the two weights, i.e.
make the first weight equal to the second. Now, mathematically the output of
the network can be written: z=f(w * f(w * x)) where f() is the sigmoid.

Next, consider a dataset with two items: [(x_1, y_1), (x_2, y_2)] where x_i is
the input and y_i is the class label, 0 or 1. Take as values: [(0.9, 1),
(0.1,0)]. The cost function (loglikelihood in this case) is:

L(w) = sum_i { y_i * log( f(w * f(w * x_i)) ) + (1-y_i) * log( 1-f(w * f(w *
x_i)) ) }

or

L(w) = log( f(w * f(w * 0.9)) ) + log( 1-f(w * f(w * 0.1)) )

Plot that last guy replacing f with the sigmoid, and you'll see the result is
non-convex - there's a kink near zero.
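
(If you'd rather not plot it, a quick numerical check of the same cost confirms it: the discrete second derivative changes sign, which a convex function's never does. Sketch only, numpy assumed.)

```python
import numpy as np

def f(a):
    return 1.0 / (1.0 + np.exp(-a))   # the sigmoid

def L(w):
    # log-likelihood for the two-point dataset [(0.9, 1), (0.1, 0)]
    return np.log(f(w * f(w * 0.9))) + np.log(1 - f(w * f(w * 0.1)))

w = np.arange(-3, 3, 0.1)
d2 = np.diff(L(w), 2)                      # discrete second derivative
print((d2 > 0).any() and (d2 < 0).any())   # True: curvature flips sign => non-convex
```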

------
oldspiceman
A less mathy explanation with some real examples:
[http://neuralnetworksanddeeplearning.com/chap1.html](http://neuralnetworksanddeeplearning.com/chap1.html)

Coding a digit recognizer using a neural network is an extremely rewarding
exercise and there's a lot of help on the web to get you started.

~~~
agibsonccc
This is a great example of a hello-world application. Keep in mind there are
several kinds of neural nets that allow you to do this, including
convolutional RBMs (which recognize parts of an image) and normal RBMs (which
learn everything at once).

------
robert_tweed
This is a pretty good article, but I'm seeing a lot of confusion in this
thread because the article is maybe one step ahead of the basic intuition
needed to understand why ANNs are not magical and are not artificial
intelligence (at least not feed-forward networks).

Perhaps a simpler way to look at it is to understand that a feed-forward ANN
is basically just a really fancy transformation matrix.

OK, so unless you know linear algebra, you're probably now asking: what's a
transformation matrix? Without the full explanation, the important thing to
understand is why they are so important in 3D graphics: they can perform
essentially arbitrary operations (translation, rotation, scaling) on
points/vectors. Once you have set up your matrix, it will dutifully perform
the same transformations on every point/vector you give it. In graphics
programming, we use 4x4 matrices to perform these transformations on 3D points
(vertices), but the same principle works in any number of dimensions - you just
need a matrix that is one bigger than the number of dimensions in your data*.

Edit: For NNs the matrices don't always have to be square. For instance you
might want your output space to have far fewer dimensions than your input. If
you want a simple yes/no decision then your output space is one-dimensional.
The only reason the matrices are square in 3D graphics is because the vertices
are always 3-dimensional.

What a neural network does is take a bunch of "points" (the input data) in
some arbitrary, high number of dimensions and performs the same transformation
on all of them, so as to distort that space. The reason it does this is so
that the points go from being some complex intertwining that might appear
random or intractable, into something where the points are linearly separable:
i.e., we can now draw a series of planes in between the data that segments it
into the classifications we care about.

The only difference between a transformation matrix and a neural network is
that a neural network has at least two layers. In other words, it is two (or
more) transformation matrices bolted together. For reasons that are a bit too
complex to get into here, this allows an NN to perform more complex
transformations than a single matrix can. In fact, it turns out that an
arbitrarily large NN can perform any polynomial-based transformation on the
data.
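
As a sketch of the "two matrices bolted together" idea (random weights, made-up sizes), a forward pass is literally just matrix products with a squashing function in between:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))   # 3-D input space -> 5-D hidden space
W2 = rng.normal(size=(1, 5))   # 5-D hidden space -> 1-D yes/no output

def forward(x):
    hidden = sigmoid(W1 @ x)     # first "transformation matrix", then squashing
    return sigmoid(W2 @ hidden)  # second one collapses everything to one number

print(forward(rng.normal(size=3)))  # a single value in (0, 1)
```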

The reason this is often seen as somewhat magic is that although you can tell
what transformations a neural network is doing in trivial cases, NNs are
generally used where the number of dimensions is so large that reasoning about
what it is doing is difficult. Different training methods can give wildly
different networks that seemingly give much the same results, or fairly
similar networks that give wildly different results. How easy it is to
understand the various convolutions that are taking place rather depends on
what the input data represents. In the case of computer vision it can be quite
easy to visualise the features that each neuron in the hidden layer is looking
for. In cases where the data is more arbitrary, it can be much harder to
reason about, so if your training algorithm isn't performing as you'd like, it
can be difficult to understand why it isn't working, even if you already
understand that the basic principle of a feed-forward network is just a bunch
of simple algebra.

~~~
sergiosgc
> The only difference between a transformation matrix and a neural network is
> that a neural network has at least two layers. In other words, it is two (or
> more) transformation matrices bolted together. For reasons that are a bit
> too complex to get into here, allows an NN to perform more complex
> transformations than a single matrix can. In fact, it turns out that an
> arbitrarily large NN can perform any polynomial-based transformation on the
> data.

Nice explanation. I need one clarification, though. Isn't matrix
multiplication associative? Isn't any transformation defined by two matrices
thus representable by a single matrix that is the product of the two?

I am probably misunderstanding how NNs bolt matrices together.

~~~
tfgg
You apply a non-linear function (usually some sigmoid) on the output vector
after each matrix product. Otherwise, you'd be correct and any multi-layer ANN
could be expressed as a single layer network.
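
A quick numerical check of both halves of that (random matrices, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Without the non-linearity the layers collapse, exactly as you'd expect:
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True

# With a sigmoid in between there is no single matrix doing the same job:
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
y = W2 @ sigmoid(W1 @ x)   # genuinely a two-layer computation
```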

~~~
sergiosgc
Thanks. It makes sense. The sigmoid is the activation function of the output
"neuron". Unfortunately, matrix algebra here is not as useful as in computer
graphics.

~~~
tfgg
No problem. Actually, I personally found that a pretty intuitive understanding
of linear algebra & vector calculus makes quite a lot of ML straightforward
to approach geometrically.

------
neuralnet
[http://www.dkriesel.com/en/science/neural_networks](http://www.dkriesel.com/en/science/neural_networks)
A bilingual ebook of about 200 pages on neural networks, containing lots of
illustrations, in case some of you want to read further. There's also a
nice Java framework for coders to try out.

------
thegeomaster
Looks like an interesting text, but to be honest I didn't understand a
substantial portion of it.

~~~
sillysaurus3
You're 17; get to work understanding it! The more you learn now, the more
you'll be able to do for the rest of your life. Plus, diving into random
topics and understanding them more deeply than anyone else is a ton of fun.

------
nsxwolf
Title made me think it would be something for beginners. Either it is not or I
am very dim.

------
ctdavies
They write comments on HN.

~~~
jacquesm
Those are Markov-chain based text generators. (yes, I got the joke).

~~~
Houshalter
Well, recently a lot of work has gone into using NNs for natural language
processing. Typically the network is trained to do something like predict the
next word or character in a sequence. Using that, you could possibly create a
far better generative model than Markov chains, and create more realistic
sentences. Perhaps you could even combine them (the NN gets the output of the
Markov chain to help make its prediction).
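
For reference, the Markov-chain baseline being compared against fits in a few lines (word-level bigrams; toy code, my own):

```python
import random
from collections import defaultdict

def train(text):
    # Bigram model: remember which words follow which
    chain = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain, start, length=10, seed=0):
    # Walk the chain, picking a random recorded follower at each step
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)

chain = train("the cat sat on the mat and the cat ran off")
print(generate(chain, "the"))
```

An NN language model replaces this lookup table with a learned function of the preceding context, which is what would let it generalize to sequences it has never seen.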

