

A Gentle Introduction to Backpropagation - ttandon
http://numericinsight.blogspot.com/2014/07/a-gentle-introduction-to-backpropagation.html

======
callinyouin
I took a Machine Learning course this past winter, and this article would have
been really helpful, since I struggled most with this concept in particular
(and gradient descent in general). While most resources show you the mechanics
of neural networks, none I found were very good at explaining (to me) the
purpose and meaning behind them. Sure, I could follow along and eventually
figure out how to write my own neural network, and I did, but I honestly never
completely understood what was going on. The problem with most ML
texts/resources for people like me without a strong math background is that a
lot of high-level math is presented without an explanation of which
mathematical concepts are being used. I admit that the onus is on me, the math
dummy, to go out and learn the concepts involved, but it's difficult to look
at a confusing algorithm chock full of unfamiliar concepts and know where to
start. This article explains things nicely, and I hope to see more like this
in ML.

~~~
jbarrow
If you want a nice, visual explanation of _what_ a neural network is actually
doing, I found this blog post to be particularly helpful. [1] Although there
is some math, the rest of the post gives a good intuitive understanding of
what's going on without it.

[1] [http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)

------
taylorbuley
This is a very welcome read, but it hypes neural networks a bit. I've been
working with them in JavaScript (using IndexedDB), and while researching I was
disappointed to find that some smart people seem to think they are much more
limited than they're made out to be here and elsewhere:
[https://www.youtube.com/watch?v=AyzOUbkUf3M#t=242](https://www.youtube.com/watch?v=AyzOUbkUf3M#t=242)

To summarize, people generally abandoned backpropagation-trained neural
networks for Support Vector Machines because neural nets require labeled (and
therefore limited) datasets, and they train slowly, especially when dealing
with multiple layers, which is sort of the whole point.

In my work in JavaScript, I was only able to pull off a single-layer
perceptron, and it is neat but limited in what it can model.
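
To make that limit concrete, here's a rough sketch of a single-layer
perceptron (in Python rather than the JavaScript/IndexedDB code itself, with
made-up data): the classic update rule fits AND, which is linearly separable,
but can never fit XOR, which is exactly why multiple layers matter.

```python
# Sketch of a single-layer perceptron; data and names are illustrative.
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            w += lr * (yi - pred) * xi   # classic perceptron update rule
            b += lr * (yi - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_y = np.array([0, 0, 0, 1])   # linearly separable: learnable
xor_y = np.array([0, 1, 1, 0])   # not linearly separable: never learnable

for name, y in [("AND", and_y), ("XOR", xor_y)]:
    w, b = train_perceptron(X, y)
    preds = (X @ w + b > 0).astype(int)
    print(name, "predictions:", preds, "targets:", y)
```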

~~~
jbarrow
It's true that the beginning of the talk is about how people abandoned
backprop-trained neural networks because they often underperformed SVMs, but
the rest of the talk is about deep learning, which introduced a new generation
of neural nets (since 2006) that are the current state of the art for a lot of
problems.

In fact, deep neural networks are often trained in an unsupervised manner at
first, and then backpropagation is used to "fine tune" and improve the
results. Because they can take advantage of unlabeled data sets, and can
perform so well, research into neural networks has experienced a recent
resurgence.
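
Here's a minimal sketch of that pretrain-then-fine-tune pipeline (numpy, toy
data; every size and name in it is illustrative, not from the talk): learn the
hidden layer unsupervised as an autoencoder, then keep its weights and
fine-tune the whole stack with backpropagation on the labels.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                             # unlabeled inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)   # labels, used only later

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Unsupervised pretraining: train W1 to reconstruct the input.
n_hidden, lr = 5, 0.1
W1 = rng.normal(scale=0.1, size=(10, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, 10))
for _ in range(500):
    H = sigmoid(X @ W1)                  # encode
    err = H @ W_dec - X                  # linear decode, reconstruction error
    dH = err @ W_dec.T * H * (1 - H)     # backprop through the encoder
    W_dec -= lr * (H.T @ err / len(X))   # gradients of mean squared error
    W1 -= lr * (X.T @ dH / len(X))

# Supervised fine-tuning: keep W1, add an output layer, run backprop.
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
for _ in range(500):
    H = sigmoid(X @ W1)
    p = sigmoid(H @ W2)                  # predicted label probability
    d_out = p - y                        # gradient of cross-entropy loss
    dH = d_out @ W2.T * H * (1 - H)
    W2 -= lr * (H.T @ d_out / len(X))
    W1 -= lr * (X.T @ dH / len(X))       # the "fine tune" step
```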

By the way, any talk by Geoff Hinton is fantastic. If you are interested in
neural networks and their capabilities, and you haven't already seen it, his
Coursera course [1] builds from a simple linear perceptron to the current deep
learning methods.

[1]
[https://class.coursera.org/neuralnets-2012-001](https://class.coursera.org/neuralnets-2012-001)
(You'll have to sign in to see it)

~~~
agibsonccc
I want to add that there's a lot of work that doesn't require pretraining. I
recently implemented the more advanced Hessian-free optimization methods,
which don't need pretraining, in my deep learning framework. The results are
amazing. I'm hoping to demonstrate the tradeoffs of the different methods in a
more comprehensive manner here shortly. The method is an updated extension by
some of Hinton's students. The paper I implemented was:

[http://www.cs.toronto.edu/~rkiros/papers/shf13.pdf](http://www.cs.toronto.edu/~rkiros/papers/shf13.pdf)

~~~
taylorbuley
Thanks for the extra resources. Any more are very welcome. Neural nets
sometimes feel like quite the black box, despite their ease of implementation
and apparent power.

------
jkbyc
The best text I've read about backpropagation is in "Neural Networks: A
Systematic Introduction" by Raul Rojas. [1]

He uses a nice graphical approach that is easily understandable yet formal.
It's been many years since I read it, studying for a university exam, but I
remember it was an enjoyable read, and I wished I had more time to spend on
the book.

[1] [http://www.amazon.com/Neural-Networks-A-Systematic-Introduction/dp/3540605053/](http://www.amazon.com/Neural-Networks-A-Systematic-Introduction/dp/3540605053/)

~~~
jbarrow
Not having read Rojas' book, I think the best text I've ever read on the
subject is Michael Nielsen's (in-progress) "Neural Networks and Deep
Learning." [1] Chapter 2 covers backpropagation.

It has a similarly approachable yet formal style, and I have recommended it in
the past to people with no ML experience, who have found it very intuitive.

[1]
[http://neuralnetworksanddeeplearning.com](http://neuralnetworksanddeeplearning.com)

~~~
Thriptic
I agree; I am just getting into machine learning, and chapter 2 of this book
is what made backpropagation "click" for me.

------
jmount
The backprop idea is one instance of a general idea called reverse
accumulation (which you can use in other contexts):
[http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/](http://www.win-vector.com/blog/2010/07/gradients-via-reverse-accumulation/)
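
A tiny sketch of the idea (illustrative, not the post's own code): each node
remembers its inputs and the local derivative with respect to each, and
gradients are pushed backwards from the output via the chain rule. Backprop is
this procedure applied to a network's loss.

```python
# Minimal reverse accumulation (reverse-mode autodiff). A production
# implementation would walk nodes in reverse topological order rather
# than recursing, but this shows the core mechanism.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        self.grad += seed                     # accumulate df/d(this node)
        for parent, local in self.parents:
            parent.backward(seed * local)     # chain rule

# f(x, y) = (x + y) * y  at x = 3, y = 4:
x, y = Var(3.0), Var(4.0)
f = (x + y) * y
f.backward()
print(x.grad, y.grad)   # 4.0 11.0, i.e. df/dx = y, df/dy = x + 2y
```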

------
WhitneyLand
"Neural networks, a beautiful biologically-inspired programming paradigm which
enables a computer to learn from observational data"

In the context of CS, what's the difference between "learning" and optimizing?

~~~
agibsonccc
Optimization, in the context of machine learning, is a search over a parameter
space driven by repeated gradient calculations (i.e., compute the derivative
of the objective function, take a step, and re-check the objective) to arrive
at some optimal function.
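
As a toy illustration (the objective and numbers here are made up, not from
any particular ML system):

```python
# Gradient-based search: compute the gradient of the objective, step
# against it, repeat. Real ML objectives are losses over training data.
import numpy as np

def objective(theta):
    return np.sum((theta - 3.0) ** 2)   # minimized at theta = [3, 3]

def gradient(theta):
    return 2.0 * (theta - 3.0)          # analytic derivative of the objective

theta = np.zeros(2)                     # starting point in parameter space
lr = 0.1                                # learning rate (step size)
for _ in range(100):
    theta -= lr * gradient(theta)       # move against the gradient

print(theta, objective(theta))          # approaches [3, 3]; objective near 0
```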

Learning in this case generally means the creation of models. Optimization is
the process by which this happens.

A few examples:

Unsupervised learning - typically clustering and grouping things.

Supervised learning - example-based learning where you're trying to label
something. This could be sentiment, spam classification, or object recognition
over pixels, even extending to sequence labeling.

Prediction/regression - predicting values based on a learned function.

Learning is more about the goal you're trying to achieve; optimization is the
numerical process of solving for that goal relative to an objective function
(also called an error or loss function).

------
ff7c11
5-second flashing gifs :(

