
Backpropagation algorithm visual explanation - amzans
https://google-developers.appspot.com/machine-learning/crash-course/backprop-scroll/
======
londons_explore
I find explanations like these are great for people who understand
mathematical notation, sigmoid functions, derivatives, etc. These people,
however, typically already understand what's going on from a simple text
description of the process.

For those without a math background, the notation is very opaque. A far better
explanation is to explain it numerically with simple examples.

For example, take two bits of training data:

input -> output

1 -> 0

0 -> 1

and a simple network with zero hidden nodes, and train it. By hand...

Then add another bit of training data:

0.5 -> 1.5

Notice that it is now impossible to fit the training data exactly, however
many training iterations we run. Now add a hidden layer with one or two nodes.
Now we can perfectly fit the data, but show that depending on the
initialization weights we might never get there through gradient descent.
Now's the time to mention different types of optimizers, momentum, etc.
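
A minimal sketch of that first exercise in plain Python (names and
hyperparameters made up for illustration): fit y = w*x + b with squared error
and plain gradient descent, then watch the third point break the exact fit.

    def train(data, steps=5000, lr=0.5):
        w, b = 0.0, 0.0                       # arbitrary starting weights
        for _ in range(steps):
            dw = db = 0.0
            for x, target in data:
                err = (w * x + b) - target    # forward pass, no hidden layer
                dw += 2 * err * x             # d(err^2)/dw
                db += 2 * err                 # d(err^2)/db
            w -= lr * dw / len(data)          # gradient descent step
            b -= lr * db / len(data)
        return w, b

    print(train([(1, 0), (0, 1)]))
    # converges to w = -1, b = 1: both points fit exactly

    print(train([(1, 0), (0, 1), (0.5, 1.5)]))
    # the three points are not collinear, so the loss never reaches
    # zero, no matter how many iterations we run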

~~~
nabla9
You don't need a 'math background' to understand the notation. This is high
school math. Maybe check some basics you have forgotten on Wikipedia.

I'm relatively sure that they teach basic derivatives, functions, etc. in
high school in every country.

~~~
swebs
In the US it varies by state and even city, but they did not teach calculus
in high school in Philadelphia, at least. Students could choose whether to
take statistics or pre-calculus, and pre-calc was basically just trigonometry.

~~~
nabla9
You learn something new every day.

Here in Finland the students are divided into "long" and "short" math tracks.
Both learn at least the basics of calculus and derivatives.

------
stared
Is it only me who gets frustrated by networks drawn upside-down (i.e. data
flowing from bottom to top)? IMHO it is a poor convention, mindlessly
repeated.

In English we read from top to bottom. Data flows (be it equations or flow
charts) typically follow the same convention, so we can read articles in a
coherent way. Even trees (both data structures and decision trees) grow from
their roots downwards (so, against their original biological metaphor). At
least most researchers draw neural networks from left to right, consistent
with English.

More on this point:
[https://www.reddit.com/r/MachineLearning/comments/6j28t9/d_w...](https://www.reddit.com/r/MachineLearning/comments/6j28t9/d_why_do_people_draw_neural_networks_upside_down/)

~~~
zerostar07
Forward and back usually refer to rightwards and leftwards in 2D, so it
should probably be drawn left-to-right.

~~~
stared
It's not clear what you mean.

- top->bottom is not compatible with left->right

- "back" propagation is "back" for a reason, so it should go against the
normal (forward) direction

~~~
zerostar07
E.g. in Feynman diagrams time flows forwards from left to right.

------
pastelsky
While we're at it, I found the explanations by 3Blue1Brown to be very
intuitive when it comes to neural networks, especially for folks who're new
and don't necessarily grasp concepts when explained primarily through math.

What is backpropagation really doing?
[https://youtu.be/Ilg3gGewQ5U](https://youtu.be/Ilg3gGewQ5U)

His other videos on this topic are just as good.

------
nsthorat
Unfortunately there is no attribution, but this tool was created by Daniel
Smilkov, who also built TensorFlow Playground and is a co-creator of
TensorFlow.js.

[https://twitter.com/dsmilkov](https://twitter.com/dsmilkov)

~~~
ehsankia
Also, going up one level in the URL leads you to the Google ML crash course,
which I guess this is part of:

[https://google-developers.appspot.com/machine-learning/crash-course/](https://google-developers.appspot.com/machine-learning/crash-course/)

Definitely worth checking out.

------
teekert
Sorry, this is not a "visual explanation", this is a "math-heavy explanation
with some drawings next to it".

------
jorgeleo
Love the animation and the math explanation.

Only one, hopefully constructive, criticism: too many formulas without
numbers. It would help the explanation to include concrete numbers and show
how the results are calculated. Not everybody is comfortable with using the
chain rule to distribute the error across the individual weights.
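
Something like this hedged sketch, for instance: one input, one sigmoid
neuron, squared error, with every value written out (all numbers made up for
illustration):

    import math

    x, w, target = 1.0, 0.5, 1.0
    z = w * x                     # z = 0.5
    y = 1 / (1 + math.exp(-z))    # sigmoid(0.5) ~= 0.622
    loss = (y - target) ** 2      # ~= 0.143

    # chain rule: dloss/dw = dloss/dy * dy/dz * dz/dw
    dloss_dy = 2 * (y - target)   # ~= -0.755
    dy_dz = y * (1 - y)           # sigmoid'(z) ~= 0.235
    dz_dw = x                     # = 1.0
    dloss_dw = dloss_dy * dy_dz * dz_dw   # ~= -0.178

    w -= 0.1 * dloss_dw           # one gradient descent step on w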

~~~
H1Supreme
Agreed. Although I've been putting effort into learning more calculus (former
art student!), linear algebra, and the like, a version of this with actual
numbers would go a long way.

------
Chilinot
Nice demonstration, but it fails to explain the bias value in the forward
propagation step. While quite important, this value is often left out when
demonstrating the propagation function, so having it in warrants a short
description in my opinion.

It also skips over the bias value in the back propagation step.
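
For reference, the forward step with the bias included looks something like
this (a minimal sketch; the names are made up):

    import math

    # one neuron's forward pass: weighted sum of inputs, plus bias,
    # through an activation f
    def neuron_forward(inputs, weights, bias, f):
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        return f(z)

    print(neuron_forward([1.0, 2.0], [0.4, -0.2], 0.7, math.tanh))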

~~~
ironSkillet
The bias is just another input node whose value happens to be constant
(granted, with full connectivity), right? So the motivating idea/derivation
of backpropagation doesn't change.
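
That equivalence is easy to check in code (a toy sketch, names made up):

    def weighted_sum(inputs, weights):
        return sum(w * x for w, x in zip(weights, inputs))

    weights, bias = [0.4, -0.2], 0.7
    inputs = [1.0, 2.0]

    out1 = weighted_sum(inputs, weights) + bias              # explicit bias term
    out2 = weighted_sum(inputs + [1.0], weights + [bias])    # bias as weight on a constant-1 input
    assert abs(out1 - out2) < 1e-12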

~~~
Chilinot
The bias is often updated/corrected in the back propagation step as well. The
purpose of the value is basically to shift the activation function along the
x-axis, while the weights define the slope of the activation function.
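
Concretely, since z = w*x + b we have dz/db = 1, so the bias receives the raw
error signal in the backward step. A hedged one-neuron sketch (all values
made up):

    import math

    x, w, b, target, lr = 1.0, 0.5, 0.2, 1.0, 0.1
    z = w * x + b
    y = 1 / (1 + math.exp(-z))               # sigmoid activation

    delta = 2 * (y - target) * y * (1 - y)   # dloss/dz via the chain rule
    w -= lr * delta * x                      # dz/dw = x
    b -= lr * delta                          # dz/db = 1: the bias update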

------
inputcoffee
Great viz, although my favorite is still the video of Andrej Karpathy's
Stanford lecture explaining it.

He goes through some other implications which, from an intuition-massaging
point of view, are great.

[https://www.youtube.com/watch?v=i94OvYb6noo](https://www.youtube.com/watch?v=i94OvYb6noo)

For those who are impressed by such things -- as I am -- he is now head of AI
or ML or something at Tesla.

------
apexalpha
What a nice website. No weird flashy banners, no external scripts, zoomable,
no ads, no tracking.

Just text accompanied by great visualisations.

Kudos!

------
keithnz
In Firefox, I found that scrolling down left a lot of things greyed out; I
had to highlight and unhighlight things to get them to show properly.

------
2bitencryption
question about this part:

> f(x) has to be a non-linear function, otherwise the neural network will only
> be able to learn linear models.

I thought one of the most common activation functions was ReLU, which is
linear (but cuts off to 0 for x values below 0).

[https://en.wikipedia.org/wiki/Rectifier_(neural_networks)](https://en.wikipedia.org/wiki/Rectifier_\(neural_networks\))

~~~
brchr
It's exactly the cutoff at zero that puts the "kink" into the function,
which makes ReLU nonlinear. :)
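
A quick way to see it: a linear map f must satisfy f(a + b) == f(a) + f(b),
and the kink breaks that (toy check, values made up):

    relu = lambda v: max(0.0, v)

    a, b = -1.0, 2.0
    print(relu(a + b))        # relu(1.0) -> 1.0
    print(relu(a) + relu(b))  # 0.0 + 2.0 -> 2.0, so ReLU is not linear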

------
michaelmior
Am I the only one who starts seeing some LaTeX creep in halfway through
instead of the rendered formulas?

------
sandGorgon
Does anyone know if there is a tool that helps to create this kind of
presentation?

~~~
pc86
This just uses [http://scrollerjs.com](http://scrollerjs.com) and
[https://d3js.org](https://d3js.org).

Any tool that combines these things is by necessity going to limit your
creativity with them. Using the tools themselves is your best bet.

------
jakemor
this is so freaking cool.

------
mehh
underwhelming

~~~
pvg
You don't need to sign your comments on HN.

