
Deep Learning from Scratch Part 3: Generalizing Gradient Descent - jeff_ridgeway4
https://jdridgeway.com/deep-learning-from-scratch-generalizing-gradient-descent/
======
blackbear_
Sorry to be pedantic, but you should not confuse gradient descent (the
procedure that optimizes the weights) with the way the gradients are
computed (e.g., via backpropagation).
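
For illustration, here is a minimal sketch of that separation (my own toy
example, assuming NumPy; the loss and step sizes are made up): the descent
loop is the same no matter where the gradient comes from.

    import numpy as np

    def gradient_descent(w, grad_fn, lr=0.1, steps=100):
        # The optimizer only needs *a* gradient; it does not care
        # whether grad_fn uses backprop, finite differences, or a
        # hand-derived formula.
        for _ in range(steps):
            w = w - lr * grad_fn(w)
        return w

    # Toy loss f(w) = ||w||^2 with two interchangeable gradient oracles.
    def analytic_grad(w):
        return 2 * w                       # exact gradient of ||w||^2

    def finite_diff_grad(w, eps=1e-6):     # numerical approximation
        f = lambda v: np.sum(v ** 2)
        g = np.zeros_like(w)
        for i in range(len(w)):
            e = np.zeros_like(w)
            e[i] = eps
            g[i] = (f(w + e) - f(w - e)) / (2 * eps)
        return g

    w0 = np.array([3.0, -4.0])
    print(gradient_descent(w0, analytic_grad))     # -> close to [0, 0]
    print(gradient_descent(w0, finite_diff_grad))  # -> close to [0, 0]

This split is essentially how deep learning frameworks divide the work:
backpropagation produces the gradients, and the optimizer consumes them.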

Generalized gradient descent is used when minimizing a function that is not
differentiable everywhere, and it has nothing specifically to do with deep
learning.
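
A concrete example (my sketch, not the commenter's): f(x) = |x| is not
differentiable at x = 0, yet the subgradient method, one such
generalization, still minimizes it by using a subgradient in place of the
gradient together with a diminishing step size.

    def subgradient_descent(x, steps=1000):
        # sign(x) is a subgradient of |x|; at x = 0 any value in
        # [-1, 1] is valid, and 0 is a convenient choice.
        for t in range(1, steps + 1):
            g = 0.0 if x == 0 else (1.0 if x > 0 else -1.0)
            x = x - (1.0 / t) * g      # step size 1/t shrinks over time
        return x

    print(subgradient_descent(5.0))    # approaches 0, the minimizer of |x|

A fixed step size would oscillate around 0 forever; the shrinking 1/t step
is what lets the iterates settle at the minimum.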

~~~
bonoboTP
I urge beginners not to learn these things from blog posts, even if they
seem more digestible. Take a reputable textbook and you will be better off.
Or take a MOOC, like Andrew Ng's, or courses from MIT and Stanford on
YouTube. It may look harder, but it has all been carefully structured
didactically. In the long run it's harder to learn from a non-expert.

There's so much confusion out there: blog posts conflating cross-entropy
with "softmax loss", gradient descent with backpropagation, and so on.

~~~
kevinskii
I prefer to combine the approaches. Read blogs to get a general sense of
what's worth studying in depth, then go deeper with a textbook or MOOC, and
finally return to blogs and videos to clarify the concepts you didn't quite
grasp. By that point your understanding will usually be solid enough to
judge the quality of the material pretty well.

------
russdpale
You should post links to the other parts at the top of your articles.

------
f6v
Grokking Deep Learning is similar in style and is very accessible if you
don't like formulas.

