
From Linear Models to Machine Learning (draft) [pdf] - sonabinu
http://heather.cs.ucdavis.edu/draftregclass.pdf
======
azuajef
Thanks for sharing. I also recommend "An Introduction to Statistical Learning
with Applications in R": [http://www-bcf.usc.edu/~gareth/ISL/](http://www-bcf.usc.edu/~gareth/ISL/)

~~~
apathy
A direct link to the PDF for ISL is here:

[https://web.stanford.edu/~hastie/local.ftp/Springer/ISLR_pri...](https://web.stanford.edu/~hastie/local.ftp/Springer/ISLR_print6.pdf)

The grownup version of that, ESL, is also available free:

[https://web.stanford.edu/~hastie/local.ftp/Springer/ESLII_pr...](https://web.stanford.edu/~hastie/local.ftp/Springer/ESLII_print10.pdf)

And for people who are genuinely curious about how this segues into graphical
models, NNs, and the autoencoder (maybe the most interesting part of modern
NNs), there's

[https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS...](https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS_corrected_1.4.16.pdf)

The more curious or research oriented may appreciate

[https://web.stanford.edu/~hastie/local.ftp/hastie_glmnet.pdf](https://web.stanford.edu/~hastie/local.ftp/hastie_glmnet.pdf)

I doubt Gareth or Daniela (the primary authors of ISL) would mind my pointing
you towards Hastie's archives since both of them were advised by Trevor Hastie
during their PhDs.

Matloff is a great guy. The chapters on shrinkage and dimension reduction
aren't yet written in his book, and since these are important topics, you
should consider reading the others. These things are mostly of interest for
people who want to draw inference about underlying processes that may be
generating observed outcomes. If all you care about is prediction, fit a
Random Forest or xgboost GBM or a DNN and be done with it. But if you're
actually curious about how complex descriptions of rare events can be
thoughtfully analyzed, this is the standard progression.

Matloff's book is a great introduction. I particularly like the example on
page 204. /ducks

------
RhodesianHunter
Has anyone come across something similar using Python?

~~~
danielmorozoff
Here you go: [https://github.com/JWarmenhoven/ISLR-python](https://github.com/JWarmenhoven/ISLR-python)

R is a popular language for statistics and learning-theory work in research
and academia. It is not as popular in production.

~~~
blahi
It's pretty popular.

~~~
rm999
I've put substantial amounts of R code into production - it's a nightmare,
both for development and operationally. I think 2-3 years ago R was still a
superior language for ML/data science dev work. But Python's library support
has really caught up and is now mature. The policy I put in place on my current
team is to minimize R in production, with Python and Scala preferred. R in
some cases still has the best machine learning libraries, which is really the
only reason I've found to use it in the production stack. Even then, I prefer
to just keep it at a few lines of R code (load the data, build the model,
handle errors, export the model).

For analyzing data I love R and almost always prefer it to Python.

~~~
eric_bullington
> R in some cases still has the best machine learning libraries

Would you mind naming some specific machine learning techniques that are still
better in R? I've been studying machine learning and linear algebra the past
few months, and I'd love to have a try at implementing one myself in Python,
as a learning exercise.

~~~
hcarvalhoalves
Glmnet and Cox proportional hazards regression (survival analysis) are two
recent examples I came across that lack Python implementations.

~~~
eric_bullington
Did you look in statsmodels? I appreciate the suggestions, and for a moment I
was hopeful about the need for survival analysis models, but it looks like
both that and GLM are well covered in the latest version of statsmodels. Don't
be misled by the old sourceforge site: there's been a huge flurry of recent
activity in statsmodels, with hundreds of new PRs merged. Look at Github and
the docs site linked from that repo:
[http://www.statsmodels.org/stable/](http://www.statsmodels.org/stable/).

There's even a Jupyter notebook comparing the R, Stata (that takes me back,
used Stata in survival analysis class 10 years ago), and Python versions of
proportional hazards regression:
[http://nbviewer.jupyter.org/urls/umich.box.com/shared/static...](http://nbviewer.jupyter.org/urls/umich.box.com/shared/static/epie6pcdk1rgb10zcd5v.ipynb)
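
For anyone who wants to see what a proportional hazards fit actually computes, here is a minimal pure-Python sketch. It is not the statsmodels code or the notebook's code. It assumes a single covariate and no tied event times, and the names `cox_score` and `fit_cox_1d` are made up for illustration. It finds the coefficient where the derivative of the Cox partial log-likelihood is zero, using bisection rather than the Newton steps a real library would likely use.

```python
import math


def cox_score(beta, times, events, x):
    """Derivative of the Cox partial log-likelihood with respect to beta.

    One covariate, no tied event times (both are simplifying assumptions).
    events[i] is 1 if subject i had the event, 0 if censored.
    """
    total = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects only contribute through risk sets
        s0 = 0.0  # sum of exp(beta * x_j) over the risk set
        s1 = 0.0  # sum of x_j * exp(beta * x_j) over the risk set
        for j in range(len(times)):
            if times[j] >= times[i]:  # subject j is still at risk at time t_i
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
        total += x[i] - s1 / s0
    return total


def fit_cox_1d(times, events, x, lo=-10.0, hi=10.0, iters=60):
    """Solve cox_score(beta) = 0 by bisection on [lo, hi].

    The partial log-likelihood is concave, so the score is non-increasing
    in beta and a sign change brackets the maximizer.
    """
    if cox_score(lo, times, events, x) <= 0 or cox_score(hi, times, events, x) >= 0:
        raise ValueError("no sign change on [lo, hi]; the estimate may not be finite")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if cox_score(mid, times, events, x) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy data (invented): larger x tends to fail earlier, so beta should be positive.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 1, 1, 1, 1]
x = [2.0, 1.0, 1.5, 0.0, 0.5]
beta_hat = fit_cox_1d(times, events, x)
print("estimated log hazard ratio:", beta_hat)
```

Handling ties, multiple covariates, and standard errors is where a real implementation earns its keep, so treat this only as a starting point for a learning exercise.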

~~~
rm999
Glmnet has quite a bit of functionality that is lacking in the Python elastic
net implementations. The most notable is the search over a sequence of
regularization parameters (alpha in statsmodels, lambda in glmnet), which works
remarkably well and can be orders of magnitude faster than a traditional grid
search.
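
To make the speed argument concrete, here is a rough pure-Python sketch of the general idea as I understand it: fit the lasso by coordinate descent along a decreasing lambda sequence, reusing each solution as the warm start for the next. This is not glmnet's code. The function names, the log-spaced grid, and the `eps` ratio are assumptions for illustration, and it assumes centered columns with no intercept.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return 0.0 if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_path(X, y, n_lambdas=20, eps=1e-3, n_sweeps=200, tol=1e-10):
    """Lasso solutions along a decreasing lambda sequence with warm starts.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 for each lam.
    X is a list of rows; columns are assumed centered (no intercept).
    Returns a list of (lam, coefficients) pairs.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    # Smallest lambda for which every coefficient is exactly zero.
    lam_max = max(abs(sum(c[i] * y[i] for i in range(n))) / n for c in cols)
    # Log-spaced grid from lam_max down to eps * lam_max.
    lambdas = [lam_max * eps ** (k / (n_lambdas - 1)) for k in range(n_lambdas)]

    beta = [0.0] * p   # carried across lambdas: this is the warm start
    resid = list(y)    # residual y - X beta, kept up to date incrementally
    path = []
    for lam in lambdas:
        for _ in range(n_sweeps):
            max_change = 0.0
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue  # constant column, nothing to fit
                old = beta[j]
                # Correlation of column j with the residual that excludes j.
                rho = sum(cols[j][i] * resid[i] for i in range(n)) / n + col_sq[j] * old
                new = soft_threshold(rho, lam) / col_sq[j]
                if new != old:
                    delta = new - old
                    for i in range(n):
                        resid[i] -= delta * cols[j][i]
                    beta[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break  # converged for this lambda
        path.append((lam, list(beta)))
    return path


# Toy data (invented): y depends only on the first column.
X = [[-1.5, 0.5], [-0.5, -1.0], [0.5, 1.0], [1.5, -0.5]]
y = [-3.0, -1.0, 1.0, 3.0]
for lam, coefs in lasso_path(X, y, n_lambdas=5):
    print(round(lam, 4), coefs)
```

Because each fit starts from the previous solution, most lambdas need only a sweep or two, which is where the advantage over refitting from scratch at every grid point comes from.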

------
FraaJad
Use of non-monospaced fonts for code fragments in LaTeX composed books must
stop.

