

Gradient Descent with Backpropagation - todd8
http://outlace.com/Beginner-Tutorial-Backpropagation/

======
1024core
Everybody writes the basic gradient descent example and then walks away. There
are tons of such tutorials around.

Want to impress the gallery? Do it for RNNs and LSTMs. Watch as people handwave
and claim it's left as an exercise for the reader. That's not enough;
give these topics the same level of introductory treatment you gave BP.

------
musesum
This is great! I'm exactly the person that post was intended for, with a
limited ML background. I'm following both Andrew Ng's and Richard Socher's
lectures, and I got stuck around the chain rule.

Speaking of Socher, in his dissertation he mentions alternatives to the
sigmoid:

"Various other recent nonlinearities exist such as the hard tanh (which is
faster to compute) or rectified linear: f(x) = max(0,x) (which does not suffer
from the vanishing gradient problem as badly)."
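
Roughly, those two alternatives look like this in numpy, next to the usual
sigmoid (a sketch of my own, not code from the post or the dissertation):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_deriv(x):
        s = sigmoid(x)
        return s * (1.0 - s)

    def hard_tanh(x):
        # clip to [-1, 1]; cheaper than tanh because it avoids exponentials
        return np.clip(x, -1.0, 1.0)

    def hard_tanh_deriv(x):
        # gradient is 1 inside (-1, 1) and 0 outside
        return ((x > -1.0) & (x < 1.0)).astype(x.dtype)

    def relu(x):
        # rectified linear: f(x) = max(0, x)
        return np.maximum(0.0, x)

    def relu_deriv(x):
        # gradient is 1 for positive inputs, 0 otherwise
        return (x > 0.0).astype(x.dtype)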

I wonder if one of these alternatives would improve on the XOR example? I'm
hoping to implement some ANNs on a mobile GPU, using iOS + Metal, but 10,000
epochs for the XOR example sounds a bit scary - as in turning an iPhone into a
hand warmer.
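
One way to find out is to swap the hidden-layer activation in a tiny XOR
network and count epochs. A rough, self-contained numpy sketch (the layer
sizes, learning rate, and epoch count are my own guesses, not values from the
post):

    import numpy as np

    np.random.seed(0)

    # XOR inputs and targets
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # 2 -> 4 (ReLU) -> 1 (sigmoid); extra hidden units guard against dead ReLUs
    W1 = np.random.randn(2, 4) * 0.5
    b1 = np.zeros((1, 4))
    W2 = np.random.randn(4, 1) * 0.5
    b2 = np.zeros((1, 1))

    lr = 0.5
    for epoch in range(5000):
        # forward pass
        z1 = X @ W1 + b1
        h = np.maximum(0.0, z1)            # ReLU hidden layer
        z2 = h @ W2 + b2
        out = 1.0 / (1.0 + np.exp(-z2))    # sigmoid output

        # backward pass for a squared-error loss
        d_out = (out - y) * out * (1.0 - out)
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0, keepdims=True)
        d_h = (d_out @ W2.T) * (z1 > 0.0)
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0, keepdims=True)

        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(out.round(3))  # should end up near [[0], [1], [1], [0]] on most seeds

No claim that this converges in fewer epochs than the sigmoid version; it's
just the minimal experiment, and the cheap max/comparison gradient is the part
that should matter most on a GPU.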

