The grownup version of that, ESL, is also available free:
And for people who are genuinely curious about how this segues into graphical models, NNs, and the autoencoder (maybe the most interesting part of modern NNs), there's
The more curious or research-oriented may appreciate
I doubt Gareth or Daniela (the primary authors of ISL) would mind my pointing you towards Hastie's archives since both of them were advised by Trevor Hastie during their PhDs.
Matloff is a great guy. The chapters on shrinkage and dimension reduction aren't yet written in his book, and since these are important topics, you should consider reading the others. These things are mostly of interest for people who want to draw inference about underlying processes that may be generating observed outcomes. If all you care about is prediction, fit a Random Forest or xgboost GBM or a DNN and be done with it. But if you're actually curious about how complex descriptions of rare events can be thoughtfully analyzed, this is the standard progression.
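To make the "if all you care about is prediction, fit a forest and be done with it" point concrete, here's a minimal sketch, assuming scikit-learn is available; the synthetic data and variable names are mine, not from any of the books mentioned:

```python
# "Just predict" in a few lines: fit a random forest on synthetic data
# and stop there -- no inference about the generating process.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
train_r2 = model.score(X, y)  # in-sample R^2; a forest fits this easily
```

The contrast with the shrinkage/dimension-reduction material is exactly that nothing here tells you which coefficients matter or why.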
Matloff's book is a great introduction. I particularly like the example on page 204. /ducks
R is a popular language for stats and learning-theory work in research and academia. Production-wise, it's not as popular.
For analyzing data I love R and almost always prefer it to Python.
Would you mind naming some specific machine learning techniques that are still better in R? I've been studying machine learning and linear algebra the past few months, and I'd love to have a try at implementing one myself in Python, as a learning exercise.
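For a from-scratch learning exercise of the kind asked about here, a shrinkage method is a good start. This is a hedged sketch of ridge regression in plain NumPy via the closed-form normal equations; the data and function name are made up for illustration:

```python
# Ridge regression (a shrinkage method) from scratch:
# solve (X'X + lambda * I) beta = X'y.
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge estimate; lam=0 recovers ordinary least squares."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, -1.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

beta_ols = ridge_fit(X, y, lam=0.0)
beta_shrunk = ridge_fit(X, y, lam=100.0)
# the heavier penalty pulls every coefficient toward zero
```

Comparing your coefficients against R's `glmnet` (with its standardization turned off) is a useful sanity check once you have this working.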
There's even a Jupyter notebook comparing the R, Stata (that takes me back, used Stata in survival analysis class 10 years ago), and Python versions of proportional hazards regression: http://nbviewer.jupyter.org/urls/umich.box.com/shared/static...
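The quantity all three of those implementations maximize is the Cox partial likelihood. A hedged NumPy sketch of just the evaluation step (no ties, no fitting), with toy data I made up:

```python
# Cox partial log-likelihood for proportional-hazards regression:
# sum over events of  x_i.beta - log( sum_{j in risk set} exp(x_j.beta) ).
import numpy as np

def cox_partial_loglik(times, events, X, beta):
    eta = X @ beta
    ll = 0.0
    for i in range(len(times)):
        if events[i]:
            at_risk = times >= times[i]  # everyone still at risk at t_i
            ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return ll

times = np.array([2.0, 3.0, 5.0, 7.0])
events = np.array([True, True, False, True])
X = np.array([[0.0], [1.0], [1.0], [0.0]])

ll0 = cox_partial_loglik(times, events, X, np.array([0.0]))
# at beta = 0 this reduces to -(log 4 + log 3 + log 1)
```

A real fitter would Newton-iterate on beta; this is only the objective, but it's enough to check any of the R/Stata/Python outputs against.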
That said, the secret sauce in all of those is FORTRAN.
I'm not sure how well those compare to the R implementations, but they look well-built at first glance.
Any other ideas out there?
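The "secret sauce is FORTRAN" point in miniature: NumPy's linear algebra dispatches to LAPACK/BLAS (historically Fortran) instead of looping in the interpreter, which is why the surface language matters less than people think:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 50))
b = rng.normal(size=50)

# np.linalg.solve calls LAPACK's gesv routine under the hood;
# none of the O(n^3) work happens at the Python level.
x = np.linalg.solve(A, b)
residual = np.linalg.norm(A @ x - b)
```

R's base `solve()` bottoms out in the same LAPACK routines, which is part of why the two languages' numerical results tend to agree to machine precision.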
R-vs-Python is almost never the problem in production. Interpreted-vs-compiled is almost always the issue. (I'm aware of Numba and similar efforts. Last time I tried it, it sucked. And Theano is a rather specialized bit that most people don't actually need.)
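The interpreted-vs-compiled gap in one toy example (no Numba needed): the same sum of squares as a Python-level loop versus a single call into compiled code. The numbers here are arbitrary:

```python
import numpy as np

data = np.arange(100_000, dtype=np.float64)

def sumsq_loop(xs):
    total = 0.0
    for v in xs:          # every iteration pays interpreter overhead
        total += v * v
    return total

sumsq_vec = float(data @ data)  # one call into compiled BLAS/C
# identical answers; the loop is typically orders of magnitude slower
```

The same logic applies in R: vectorized calls and `.Call`-backed packages are fast, interpreter-level loops are not.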
JMHO. But I've never seen anyone dealing with truly huge data and inference problems who had the low-level bits in anything other than C++ or FORTRAN. I could imagine that Scala can do a pretty good job now, especially if you use Spark a lot. But R vs Python seems like a really stupid question. Use the one that has the libraries you need.
Also, Google is doing to R what they did to JavaScript with V8. Expect GA next year.
I think the right answer is both.