

Guide to Linear Regression - alexhwoods
http://alexhwoods.com/2015/07/19/guide-to-linear-regression/

======
gpcz
A related really cool thing I learned in grad school is that you can implement
Deming regression
([https://en.wikipedia.org/wiki/Deming_regression](https://en.wikipedia.org/wiki/Deming_regression))
by storing the moments (outer product sum and point sum) of the training
points and then finding the dominant eigenvector of the outer product sum
using a singular value decomposition. Since you can approximate the
directional part of a 2x2 SVD with atan2(), it effectively becomes an O(1)
operation to add or remove training points.

(specifics here:
[https://april.eecs.umich.edu/courses/eecs568_f12/linefitting...](https://april.eecs.umich.edu/courses/eecs568_f12/linefitting.pdf))

~~~
jaytaylor
That is very cool. I love how closely the formal math relates to the actual
code involved.

------
bluusteel
I thought closed-form expressions existed for linear regression. Why is
gradient descent needed?

~~~
srean
That closed form requires a matrix inversion, and that is almost always (but
not always) a bad idea. It is numerically unstable and more expensive than it
needs to be. Sometimes you also want a quick ballpark figure for the answer
and the ability to query the current answer at any time. Iterative algorithms
give you that (gradient descent is one such algorithm; conjugate gradient
would be an improvement on it). With the closed form you have to wait until it
finishes going through the motions. It's an all-or-nothing deal. OTOH you can
stop an iteration at any time and peek at the current estimate of the answer.
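
To illustrate the peek-anytime point, a small numpy sketch (my own toy
function, not from the article; the step size lr is assumed small enough for
the problem's curvature):

    import numpy as np

    def lstsq_gd(A, b, lr=0.1, steps=500):
        """Minimise ||Ax - b||^2 by gradient descent."""
        x = np.zeros(A.shape[1])
        for _ in range(steps):
            grad = 2.0 * A.T @ (A @ x - b) / len(b)  # gradient of the mean squared error
            x -= lr * grad
            # x is already a usable rough estimate here, at every step
        return x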

If this comment has even one takeaway, I would like it to be "don't invert,
unless you are very sure that is exactly what you need". Some scenarios do
require an explicit inverse, but solving linear equations is almost never one
of them.
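
The takeaway in numpy terms (toy data and variable names of my own making):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 3))
    b = A @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(100)

    x_bad = np.linalg.inv(A.T @ A) @ (A.T @ b)      # explicit inverse: avoid
    x_ok = np.linalg.solve(A.T @ A, A.T @ b)        # solves the system, no inverse formed
    x_best, *_ = np.linalg.lstsq(A, b, rcond=None)  # QR/SVD on A itself

solve() factorises the matrix instead of inverting it, and lstsq() never even
forms A^T A, which matters on ill-conditioned problems.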

~~~
santaclaus
Implicit in this discussion is the fact that the matrix is positive definite,
no?

~~~
shoo
yeah.

and helpfully, in this context, if you're starting from minimising the L2
error between the output of some linear function and a target, the resulting
linear system has the form $A^T A x = A^T b$. The matrix in question is
$A^T A$, so it'll always be positive semidefinite.
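
A quick numerical check of both claims, with a toy matrix of my own: $A^T A$
is symmetric positive semidefinite, and its 2-norm condition number is the
square of $A$'s, which is one more reason to hand the solver $A$ itself when
you can.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 4))
    M = A.T @ A

    print(np.linalg.eigvalsh(M).min() >= 0)  # True: M is positive semidefinite
    L = np.linalg.cholesky(M)                # succeeds: M is in fact positive definite here
    print(np.isclose(np.linalg.cond(M), np.linalg.cond(A) ** 2))  # conditioning squares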

------
cstuder
This is a nice guide. I've also enjoyed playing around with the IPython
notebook here:
[http://nbviewer.ipython.org/github/justmarkham/DAT4/blob/mas...](http://nbviewer.ipython.org/github/justmarkham/DAT4/blob/master/notebooks/08_linear_regression.ipynb)

One question remains: is there a more intuitive guide to deciding which
features to choose in order to get a good regression?

~~~
alexhwoods
Hey, one easy way to choose features is to use a correlation matrix. The
closer the correlation coefficient r is to ±1, the stronger the linear
relationship.

The code goes like this -

    install.packages('corrplot')
    library(corrplot)

    mcor <- cor(crime)  # if crime is your dataframe
    corrplot(mcor)

That's one easy way to start out. Perhaps I'll write a post on feature
engineering.

~~~
cstuder
Thank you, I will try this. And I would love to see such a post.

------
kastnerkyle
For anyone interested in a Python version with inferior plots, I have a
notebook here:

[http://kastnerkyle.github.io/posts/linear-regression/](http://kastnerkyle.github.io/posts/linear-regression/)

Closed form is at the bottom, basically invalidating the whole rest of my
post.

------
npx
Gonna use this as the title for my autobiography! :)

