

Introduction to Artificial Neural Networks – Part 1 - bananacurve
http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7/?

======
gamegoblin
I have been messing with ANNs since I learned how to program. For years,
understanding of ANNs eluded me. Here is the breakthrough I had some time ago
that finally let me understand ANNs:

Think about them like circuits. It is easy to see that a circuit full of NAND,
AND, OR, etc. gates can carry out computations. An AND gate requires all of its
inputs to be true before it goes high. An OR gate requires at least 1 of its
inputs to be true before it goes high.

Think of a neuron as being somewhere in between (I understand that in the
article he shows how to simulate gates, but doesn't quite describe neurons as
being fancy gates, which is where my breakthrough happened). It requires N
of its inputs to be true before it goes high. Then imagine that the "wires" in
our neural network don't carry information in the form of bits, but instead in
the form of real numbers. Each wire has a "weight" associated with it, such
that when the wire is turned on, it outputs that value, rather than a simple
binary 1.

Now imagine that the neurons take all of these real numbered inputs, and apply
some function to them to decide if the neuron wants to turn on or not. It
might simply sum them, multiply them, or something more complex, but based on
its inputs, it turns on. Its on signal then gets sent to all of the neurons
that it points to, etc. The same way you can extract answers from a circuit of
logic gates by seeing the output of the gates, you can extract answers from an
NN by examining the output of certain neurons.
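
To make the "neurons as fancy gates" picture concrete, here is a minimal
Python sketch. The weights and thresholds are illustrative choices of mine,
not anything from the article:

```python
# A threshold neuron generalizes a logic gate: weighted inputs, a threshold,
# and a step activation that decides whether the neuron "goes high".

def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def AND(a, b):   # both inputs must be on
    return neuron([a, b], [1.0, 1.0], threshold=2.0)

def OR(a, b):    # at least one input must be on
    return neuron([a, b], [1.0, 1.0], threshold=1.0)

def NAND(a, b):  # negative weights invert the gate
    return neuron([a, b], [-1.0, -1.0], threshold=-1.0)
```

Same neuron, different weights and threshold, different gate -- which is the
whole point of the analogy.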

This description is quite simplified and doesn't go into the architecture of
ANNs, but if you are really having a lot of trouble grasping how ANNs work,
this description should give you some intuition. The hardcore ML people will
probably dislike it, but you have to start somewhere. After understanding it
like this, I branched out quite a lot and now all of my academic research
involves machine learning. But it took that initial breakthrough!

~~~
pmelendez
Actually that's the traditional analogy, and if I recall correctly, it was in
the process of thinking of them as circuits that Marvin Minsky realized the
original perceptron is unable to mimic an XOR gate.
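
For the curious, the limitation is easy to demonstrate with threshold neurons:
no single neuron (one linear boundary) separates XOR's classes, but two layers
do. A small illustrative sketch, with weights chosen by hand:

```python
def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# XOR(a, b) = AND(OR(a, b), NAND(a, b)): a hidden layer of two gates,
# then one output gate -- something a single-layer perceptron cannot do.
def XOR(a, b):
    h1 = neuron([a, b], [1.0, 1.0], threshold=1.0)     # OR
    h2 = neuron([a, b], [-1.0, -1.0], threshold=-1.0)  # NAND
    return neuron([h1, h2], [1.0, 1.0], threshold=2.0) # AND
```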

~~~
gamegoblin
Really? In all of my perusing of various tutorials online, I never encountered
it explained in this intuitive fashion, that neurons are just a generalization
of logic gates.

~~~
jonsen
Threshold Logic - a kind of in-between.

------
11001
People! If you're interested in learning Neural Networks, please do yourself a
favour, go learn them from a good textbook (e.g.
[http://www.cs.toronto.edu/~mackay/itprnn/book.html](http://www.cs.toronto.edu/~mackay/itprnn/book.html)
is free online), not these "online tutorials". I know, math-y books don't
often make it sexy "for hackers", but it is really the only way to learn a
math subject.

------
maaaats
Does anyone here have any good guides/articles about the structure of the
artificial network? At university I solved simple problems in a course with
ANNs, but the structure of my network (hidden layers, how they were connected,
etc.) was basically just trial and error until I achieved OK results.

~~~
dave_sullivan
There aren't any I'm aware of that tie this question together nicely, but here
are the general structures in use right now. Google will lead you to more
information on each.

Standard feed forward deep net: like the ones you used at university but with
a few important features. You can stack layers on top of each other and then
train the whole network with back propagation. The nonlinearity you use can be
important (rectified linear units--Max(0,x)--are popular now).
Regularization can be important (with dropout being a popular method). Pre-
processing your data can also be important (e.g. scaling the inputs,
subtracting the mean, ...). How you initialize your weights is important.
Using tricks like momentum and learning rate decay is important too.
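
A minimal numpy sketch of the forward pass with a few of those ingredients
(ReLU units, He-style initialization, subtracting the mean and scaling the
inputs). Sizes and names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)  # the rectified linearity Max(0, x)

def init_layer(n_in, n_out):
    # Small random initialization; the scale matters in practice
    # (He initialization for ReLU uses std = sqrt(2 / n_in)).
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

def forward(x, layers):
    """Forward pass through stacked layers; ReLU on all but the last."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Pre-processing: subtract the mean and scale each input dimension.
X = rng.normal(5.0, 3.0, (100, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)

layers = [init_layer(10, 32), init_layer(32, 32), init_layer(32, 2)]
out = forward(X, layers)  # one prediction row per input row
```

Training this stack with back propagation (plus dropout, momentum, etc.) is
where the real work is; this only shows the architecture.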

Autoencoder: A standard feed forward deep net that tries to output an
"uncorrupted" sample of a "corrupted" input. So basically: take an image, add
noise, and ask it to output the denoised image. Why, you ask? Well, in doing
this, the hidden units (the layers in between the input and output) tend to
discover new features which can then be used elsewhere in machine learning
pipelines. The big win here is automated feature engineering.
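
A sketch of that setup (one hidden layer, random untrained weights, arbitrary
sizes): corrupt the input, reconstruct it, and score the reconstruction against
the clean original. Training would then minimize this loss:

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(x, noise_std=0.3):
    """Add Gaussian noise -- the 'corrupted' version of the input."""
    return x + rng.normal(0.0, noise_std, x.shape)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-hidden-layer autoencoder: encode to fewer units, decode back.
n_in, n_hidden = 20, 5
W_enc = rng.normal(0.0, 0.1, (n_in, n_hidden))
W_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))

def reconstruct(x_noisy):
    h = sigmoid(x_noisy @ W_enc)  # hidden units: the learned features
    return h @ W_dec

x = rng.normal(0.0, 1.0, (8, n_in))              # clean samples
loss = np.mean((reconstruct(corrupt(x)) - x) ** 2)  # denoising objective
```

After training, it is the hidden representation `h`, not the reconstruction,
that gets reused elsewhere in the pipeline.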

RBM: Restricted Boltzmann machine, sort of like an autoencoder but not. I'm not
convinced one is better than the other, but there are definitely differences.

Recurrent neural networks: Like standard feed forward deep nets but extended
to time series problems. The hidden units at each timestep feed in as
additional input to the next timestep (along with the new input). So
basically, at each step it gets this as input: [whatever came before,
described as hidden units] + [input at T]. These are my personal favorite and
currently hold the state of the art in speech recognition, at least
academically.
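
That recurrence can be sketched in a few lines of numpy (dimensions arbitrary,
weights random and untrained):

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden = 4, 8

W_xh = rng.normal(0.0, 0.1, (n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # hidden(t-1) -> hidden(t)

def run(sequence):
    """At each step the input is [previous hidden state] + [input at T]."""
    h = np.zeros(n_hidden)
    for x_t in sequence:
        h = np.tanh(x_t @ W_xh + h @ W_hh)
    return h  # a summary of the whole sequence so far

seq = rng.normal(0.0, 1.0, (10, n_in))  # 10 timesteps of 4-dim input
h_final = run(seq)
```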

Convolutional neural networks: State of the art in vision. This one is kind of
hard to explain quickly, but you can think of it like this: I take some image
in which I want to recognize objects. Then I train a bunch of neural nets that
represent different aspects of an image (one net might look for vertical
lines, another might look for horizontal lines; it figures out what to look
for automatically and distributes what it thinks is important across these
little networks--you can think of this as "feature detection"). You then take
a window on the image (for a 400x400 image, maybe a 20x20 window), slide it
across the image, and check each section for the presence of whatever the net
is looking for (vertical/horizontal lines, etc.). There are several layers of
this (specifically, a convolution operation followed by a pooling operation)
before the result is fed to a standard fully connected feed forward net, which
then outputs a prediction.
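
The sliding-window idea can be sketched directly. Here a hand-written 2x2
"vertical edge" kernel stands in for the feature detectors a real conv net
would learn on its own:

```python
import numpy as np

rng = np.random.default_rng(3)

def convolve2d(image, kernel):
    """Slide the kernel (the 'window') across the image, one dot product per position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Keep only the strongest response in each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

vertical_edge = np.array([[1.0, -1.0],
                          [1.0, -1.0]])  # hand-written feature detector
image = rng.normal(0.0, 1.0, (8, 8))
features = max_pool(convolve2d(image, vertical_edge))
```

A real conv net stacks several convolution+pooling layers of learned kernels
before the fully connected layers at the top.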

While lower layers look for low level aspects of an image, each layer
progressively looks for higher and higher level phenomena--for instance, when
you hear about "the cat neuron" it comes from probing a neuron in the higher
levels of a convolutional net and finding that pictures of cats happen to turn
this neuron "on" while pictures of anything else don't. What is "high level"
exactly? In this case, it really just means composed from "lower level"
components... Some also say "higher level = more abstract" but I don't know
this is quite the case--abstract means something different to me.

An important consideration is that the conv net architecture lets you cut
down on the total number of parameters (good when you've got something high
dimensional like image data), and it also takes advantage of the fact that you
can make certain assumptions about images and how objects move/appear in
space.

Conv nets are probably the most difficult architecture to grasp and my
explanation is extremely high level.

Ha, that was a bit longer than I had hoped, but I hope it's helpful to
someone...

~~~
derefr
> Conv nets are probably the most difficult architecture to grasp

I might be totally off-base, but is Numenta's HTM (Hierarchical Temporal
Memory) model a conv net? If so, there are some really good introductory/TED-
talk-like explanations out there for the idea.

~~~
dave_sullivan
Heh, I _still_ can't figure out what an HTM is exactly, but I think it's sort
of similar to a recurrent convolutional network. It handles learning
completely differently (allegedly in a more biologically plausible way).

Personally, I think there are two big issues with what numenta was doing: they
diverged too far from mainline neural net research (which, when they started
in ~2005 was just before things started getting interesting--it was also a
time when "mainline neural net research" was widely assumed to be at a dead
end) and they tried too hard to come up with something that was biologically
plausible rather than mathematically expedient. Sort of how airplanes need
wings but don't need feathers. As far as I am aware, HTMs really just don't
work well in practice (and by that, I mean they are not anywhere near
competitive with any of the architectures I listed above).

~~~
chmike
What do you mean by things getting interesting? Did I miss something after
2005? Do you have some references on the web?

~~~
dave_sullivan
2006 is commonly cited as the year when "deep learning" started becoming
practical. Pretty sure it was Hinton's group: they used greedy unsupervised
pre-training to get a good initialization of the weights, followed by
supervised fine-tuning of said weights. That result kicked off a lot of
renewed interest in NNs, which then led to using GPUs for a 40x speedup, which
then led to many more impressive results (and they just keep coming). It turns
out the unsupervised pre-training isn't even necessary, go figure...

~~~
chmike
Thank you:
[http://en.wikipedia.org/wiki/Deep_learning](http://en.wikipedia.org/wiki/Deep_learning)

------
TrainedMonkey
I've been messing with neural networks for years, including doing an AI
college project and later taking a class on them. What I learned: unless you
structure and query your neural networks in a very special way, all they do is
approximate higher-order functions. It is not trivial to correctly structure a
neural network and its inputs/outputs.

TL;DR: Neural networks are super powerful, but really hard to use properly for
anything even slightly harder than simple functions.

------
cjauvin
I have written a small series of posts which show (with some simple Python
code and math) the relationship between the training procedure of an ANN
(backpropagation) and some simpler, more basic machine learning algorithms
(like logistic regression):

[http://cjauvin.blogspot.ca/2013/10/neural-network-101.html](http://cjauvin.blogspot.ca/2013/10/neural-network-101.html)

------
rdlecler1
It becomes a lot easier to think about these circuits if you realize that many
of the connections are actually spurious. That is, if you remove them and
systematically test the function over a range, you'll find that many of these
weights carry no information. For instance, a w_i_j=0.01 may be totally
insignificant for any sigmoidal function and for all intents and purposes
should be w_i_j=0, in which case this complex web of neural connections is
trimmed, revealing the underlying circuitry.
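
A sketch of that trimming (the tolerance is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(4)

def prune(weights, tol=0.05):
    """Zero out weights whose magnitude is below tol; through a sigmoid these
    contribute almost nothing to the pre-activation sum, so for all intents
    and purposes w_i_j = 0."""
    pruned = weights.copy()
    pruned[np.abs(pruned) < tol] = 0.0
    return pruned

W = rng.normal(0.0, 0.5, (10, 10))  # a dense, partly spurious web of weights
W_sparse = prune(W)
n_trimmed = W.size - np.count_nonzero(W_sparse)  # connections removed
```

Systematically re-testing the network's outputs after pruning (as the comment
describes) is what confirms which connections were truly spurious.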

If you're interested I wrote a paper in Nature Systems Biology that shows this
for artificial gene networks (just another branch of ANNs): Survival of the
Sparsest: Robust Gene Networks are Parsimonious

------
howeman
If anyone is interested, I have a working neural net implementation
[https://github.com/btracey/nnet](https://github.com/btracey/nnet) . I don't
make any stability promises (especially on the training routines, that's a
mess at the moment), but sometimes it's nice to see code. The core of the net
is
[https://github.com/btracey/nnet/blob/master/nnet/functions.g...](https://github.com/btracey/nnet/blob/master/nnet/functions.go)
and
[https://github.com/btracey/nnet/blob/master/nnet/nnet.go](https://github.com/btracey/nnet/blob/master/nnet/nnet.go)

~~~
howeman
On stability: the point is that if you'd like to use it, you should fork it.

------
amazedsaint
Cool, here is another one I wrote a few years back:
[http://www.amazedsaint.com/2008/01/neural-networks-part-i-simple.html](http://www.amazedsaint.com/2008/01/neural-networks-part-i-simple.html)

------
funky_vodka
That guy did another nice tutorial on a simple evolutionary algorithm and how
to implement it to solve the travelling salesman problem. His explanations are
useful for beginners like me.

------
justplay
Please, please do not stop writing about this. I really want to know more.
Thanks.

~~~
Anon84
You should check out the Coursera ML class videos.

------
pilooch
A very good source of information, and tutorials with source code is
[http://deeplearning.net/tutorial/contents.html](http://deeplearning.net/tutorial/contents.html)

~~~
stillsut
Also, Coursera partnered with Geoffrey Hinton (the big name in Deep Learning
right now) to put together a pretty good course:

[https://www.coursera.org/course/neuralnets](https://www.coursera.org/course/neuralnets)

------
foobarqux
Anyone have information about understanding how inputs affect outputs in a
trained ANN? Specifically, to better understand the system: which inputs have
the most impact, and how do they interact with other inputs?

~~~
tlarkworthy
It's called the Jacobian: the differential of the outputs with respect to the
inputs. It tells you how each input affects the output vector at a specific
location in input space. You can calculate it manually, use built-in Matlab
functions, or finite-difference it.
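
A finite-difference sketch in Python (the two-input, two-output "net" here is
a toy stand-in, not a trained model):

```python
import numpy as np

def finite_difference_jacobian(f, x, eps=1e-6):
    """J[i, j] = d output_i / d input_j at the point x, by central differences."""
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

# A toy stand-in for a trained network's input -> output map.
def net(x):
    return np.array([np.tanh(x[0] + 2 * x[1]), x[0] * x[1]])

x0 = np.array([0.5, -0.3])  # the Jacobian is specific to this input point
J = finite_difference_jacobian(net, x0)
```

As the comment says, J is only valid at that one location in input space; move
x0 and the matrix changes.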

~~~
foobarqux
So what is all the talk about how ANNs yield solutions that don't have a
straightforward interpretation unlike regression?

~~~
tlarkworthy
Well, the Jacobian is a numerical matrix unique to each point in the input
space, so it's kinda hard to visualize the changing Jacobian matrix over the
input space.

People do interpret ANNs. Normally by visualising the weight matrices on the
input layer, which have a 1-1 mapping to input vector attributes (so you can
label them). It gets a bit harder on subsequent layers though.

Decision trees are an order of magnitude easier to interpret though; they are
very compact and consist of a number of binary decisions in the input space.
For a given classification you only need to look at log(d) decisions to work
out how it got to its conclusion. It's not usually that hard to see how it
arrived at the tree from training.

The relationship between the Jacobian and the training is fairly convoluted,
given back propagation and cross validation.

PS: back-propagation uses the outputs differentiated against the weights; the
Jacobian is the outputs differentiated against the inputs. You can do smart
things with Jacobian-aware neural architectures. See "Forward models:
Supervised learning with a distal teacher", which trains two networks in
parallel, one a forward model and one an inverse model, and uses the Jacobian
to circumnavigate the problematic non-convexity of the inverse model.

~~~
foobarqux
> Well, the Jacobian is a numerical matrix unique to each point in the input
> space, so it's kinda hard to visualize the changing Jacobian matrix over the
> input space.

Yeah it doesn't seem like it would be easy to gain much insight since the
partial derivatives are a function of the (other) inputs.

> People do interpret ANNs. Normally by visualising the weight matrices on the
> input layer, which have a 1-1 mapping to input vector attributes (so you can
> label them).

What do the first-layer weights tell me?

~~~
tlarkworthy
If the weight is positive that feature helps the neuron fire and vice versa.

If the neural network is processing images, the weights form an image too.
You can tell what a lot of the units are "looking" for by plotting the weights
as an image.
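
For example (shapes assumed for MNIST-style 28x28 digits; the weights here are
random, so these "filters" would only look like anything after training):

```python
import numpy as np

rng = np.random.default_rng(5)

# For 28x28 digit images flattened to 784 inputs, each hidden unit owns one
# 784-long column of the weight matrix; reshaping it to 28x28 gives a
# viewable "filter" image.
n_pixels, n_hidden = 28 * 28, 16
W = rng.normal(0.0, 0.1, (n_pixels, n_hidden))

filters = [W[:, i].reshape(28, 28) for i in range(n_hidden)]
# e.g. plt.imshow(filters[0], cmap="gray") would show what unit 0 looks for:
# white for positive weight, black for negative.
```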

page 14, Figure 3: jmlr.org/papers/volume11/erhan10a/erhan10a.pdf

You can clearly see various digit deformations in the weights. (white is
positive weight, black is negative weight typically)

Lots of papers do this; this one just happened to be the first one I managed
to google.

So the first layer is normally readable because its weights are in the same
space as the input features. The second layer is normally a jumble, as it's
randomly initialised before convergence. But in the paper's example you can
imagine that the 2nd-layer output unit representing a final classification of
a 4 probably sums up all the 4-deformation detectors in layer 1 and negatively
sums up everything else.

~~~
foobarqux
> If the weight is positive that feature helps the neuron fire and vice versa.

But that doesn't tell you much, does it? Its effect could be reversed or
minimized in subsequent layers.

Haven't read it yet but thanks for the paper.

~~~
tlarkworthy
Just look at the one figure in the paper and you will see.

Because your final layer is positively encoded, that polarity trickles down
through back propagation. I also think the positive-weights thing has to do
with the way features normally work: it makes sense to look for corners, not
anti-corners, for object detection.

