

An Introduction to Gradient Descent and Linear Regression - CrocodileStreet
http://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/

======
throwaway283719
I get that this is supposed to be an introduction to gradient descent, and that
the author chose a simple example to work with to illustrate the method (and
explained it very well - kudos!). However, they should probably at least
mention that the linear regression problem has an analytical solution[0] which
is _much_ faster than using gradient descent.

The strength of gradient descent is that it can be used for more complicated
problems, where an analytical solution isn't known (for example - you can
train a neural network using gradient descent! The standard algorithm uses
back-propagation to compute the errors associated with each layer of the
network, and then the "learning" step shifts each parameter in the direction
of those errors - just like gradient descent).

[0] [http://en.wikipedia.org/wiki/Linear_regression#Least-squares...](http://en.wikipedia.org/wiki/Linear_regression#Least-squares_estimation_and_related_techniques)
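
For anyone curious, here is a minimal sketch of the comparison in NumPy (toy
data and step size are my own, not from the article):

    import numpy as np

    # Toy data: y = 2x + 1 plus noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 100)
    y = 2 * x + 1 + rng.normal(0, 0.5, 100)
    X = np.column_stack([np.ones_like(x), x])  # ones column = intercept

    # Closed-form least squares via the normal equations:
    # theta = (X^T X)^{-1} X^T y
    theta_closed = np.linalg.solve(X.T @ X, X.T @ y)

    # Gradient descent on the mean squared error, for comparison
    theta = np.zeros(2)
    lr = 0.01
    for _ in range(10_000):
        grad = 2 * X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad

    print(theta_closed, theta)  # both land near [1, 2]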

~~~
mattnedrich
Author here. These are all great points. I chose linear regression because
it's simple and easy to understand. There is obviously a simple closed form
solution to this problem. However, as already mentioned, gradient descent has
some nice advantages, such as how it scales. Further, a "close enough"
solution/approximation might be sufficient, depending on the application.
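
To illustrate the "close enough" point, a tiny sketch (tolerance and data are
made up): stop as soon as an iteration barely improves the error, rather than
running to full convergence.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 200)
    y = 3 * x - 2 + rng.normal(0, 1, 200)
    X = np.column_stack([np.ones_like(x), x])

    theta = np.zeros(2)
    lr, tol = 0.01, 1e-8
    prev_err = np.inf
    for step in range(100_000):
        err = np.mean((X @ theta - y) ** 2)
        if prev_err - err < tol:  # "close enough" -- stop early
            break
        prev_err = err
        theta -= lr * 2 * X.T @ (X @ theta - y) / len(y)

    print(step, theta)  # roughly [-2, 3], long before 100k iterations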

~~~
grayclhn
There are algorithms to add variables and observations to an estimated linear
regression model, so scaling is not an advantage of gradient descent. You
should mention in the post that there are better ways to estimate this model,
even if you're just presenting it as an example.
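
For example, recursive least squares folds in one new observation at a time
with a rank-one (Sherman-Morrison) update of (X^T X)^{-1} - each update costs
O(p^2) in the number of features, with no refit over the full data. A rough
sketch (function name and conventions are mine, not from any particular
library):

    import numpy as np

    def rls_update(theta, P, x, y):
        """Add one observation (x, y) to an existing least-squares fit.

        theta: current coefficients; P: current inverse of X^T X;
        x: new feature row (1-D array); y: new response (scalar).
        """
        Px = P @ x
        k = Px / (1.0 + x @ Px)              # gain vector
        theta = theta + k * (y - x @ theta)  # correct by prediction error
        P = P - np.outer(k, Px)              # Sherman-Morrison update
        return theta, P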

------
Jemaclus
This is explained very well and makes each step easy to understand. It's not
the best way to do it, but it does explain why gradient descent is awesome and
how the intuition works. Hopefully future posts go into greater detail about
doing gradient descent and linear regression more efficiently, but for an
intro post, this is most excellent.

Well done.

------
SandB0x
If you want a deeper introduction to this topic, and a surprisingly accessible
piece of mathematics, I recommend the classic paper:
[http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradi...](http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf)
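
To give a flavor of what the paper builds up to, here is a bare-bones
conjugate gradient solver for a symmetric positive-definite system Ax = b
(the test matrix is just an illustration):

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10):
        x = np.zeros_like(b)
        r = b - A @ x                  # residual
        p = r.copy()                   # search direction
        rs = r @ r
        for _ in range(len(b)):
            Ap = A @ p
            alpha = rs / (p @ Ap)      # exact step length along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p  # next direction, A-conjugate
            rs = rs_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))  # matches np.linalg.solve(A, b)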

~~~
natosaichek
This was my introduction - I used the techniques described to write an
interplanetary trajectory timing optimization tool (finding good launch
windows between various celestial bodies). It was an excellent read and
eminently transformable into functional code.

------
nullc
It really should mention the closed-form solution; it's quite silly to
compute a linear regression this way.

It should also give a nod to robustness: a pair of sufficiently crazy outliers
would leave the line nowhere near the points, seemingly contradicting the idea
that lines that "fit" better will have a lower error value. (They will, but
only in the circular sense where you've defined "fit" in terms of that
particular error function.)
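
A quick way to see the robustness issue (numbers invented for illustration):

    import numpy as np

    # Ten points exactly on y = 2x + 1, plus two extreme outliers
    x = np.arange(10, dtype=float)
    y = 2 * x + 1
    x = np.append(x, [0.0, 1.0])
    y = np.append(y, [500.0, 500.0])

    X = np.column_stack([np.ones_like(x), x])
    b, m = np.linalg.lstsq(X, y, rcond=None)[0]
    print(m, b)  # slope and intercept dragged far from (2, 1)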

------
shitgoose
"In our linear regression problem, there was only one minimum. This made our
error surface convex."

Shouldn't it be the other way around?

~~~
jmalicki
Correct... if there is only one minimum, it is unimodal, but not necessarily
convex. For instance, if there are inflection points, it will not be convex.
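
A concrete example of the distinction (mine, not from the thread): f(x) =
-exp(-x^2) has a single minimum at x = 0, yet its second derivative,
(2 - 4x^2) * exp(-x^2), goes negative for |x| > 1/sqrt(2), so it is unimodal
but not convex.

    import numpy as np

    x = np.linspace(-3, 3, 601)
    f2 = (2 - 4 * x**2) * np.exp(-x**2)  # second derivative of -exp(-x^2)
    print((f2 < 0).any())  # True: curvature changes sign, so not convex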

------
oliwary
There is also a great course running on coursera right now covering this:

[https://class.coursera.org/ml-006](https://class.coursera.org/ml-006)

------
santaclaus
Do machine learning cats use Nesterov methods? Add some inertia to gradient
descent, and boom: kick-butt theoretical benefits, and it's super fast in
practice.

~~~
ajtulloch
Yes - see e.g. "On the importance of initialization and momentum in deep
learning" from ICML 2013
([http://www.cs.toronto.edu/~fritz/absps/momentum.pdf](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf))
for an overview in the context of NNs.
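
For reference, a minimal sketch of the Nesterov-style update on a toy
quadratic (constants are illustrative): evaluate the gradient at the
looked-ahead point theta + mu * v before taking the step.

    import numpy as np

    A = np.diag([1.0, 100.0])   # badly conditioned quadratic bowl
    grad = lambda t: A @ t      # gradient of f(t) = 0.5 * t^T A t

    theta = np.array([1.0, 1.0])
    v = np.zeros_like(theta)
    lr, mu = 0.01, 0.9

    for _ in range(200):
        g = grad(theta + mu * v)  # look ahead, then step
        v = mu * v - lr * g
        theta = theta + v

    print(theta)  # close to the minimum at the origin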

