Introduction to Statistical Learning, with Applications in R (stanford.edu)
222 points by fitzwatermellow on Jan 11, 2016 | 46 comments

The lecturers here, Hastie and Tibshirani, are also co-authors of the classic textbook "An Introduction to Statistical Learning," probably the best introduction to machine/statistical learning I have ever read.[1]

I highly recommend the book and this online course, both of which are FREE.

Hastie and Tibshirani's other book, "The Elements of Statistical Learning," is also excellent but far more theoretical, and best for experienced practitioners who want to use it as a reference guide.[2]


[1] http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.p...

[2] http://statweb.stanford.edu/~tibs/ElemStatLearn/

Very helpful course, I highly recommend it. If nothing else, get a copy of the book:


Even though the book is introductory material, it explains everything with great clarity. I have yet to find a better resource for someone wanting to get into data science. The theoretical material side-by-side with coded examples is very powerful (although the labs could use some work). Even if you have been using data science for a while, this is a great refresher on theoretical concepts.

Latest printing (only difference is corrections, I believe):


Any idea if there is a list of changes?

If you've followed my posts on Hacker News, you may notice I'm a HUGE user of R. I actually started using R because the classes I took as a statistics minor in college required the use of R. And it sucked, mostly because trying to reimplement statistical algorithms can be very fussy (it's not as well suited to linear algebra as MATLAB and Octave).

I note that because the curriculum seems to focus on more machine-learning-esque algorithms like Random Forests and SVM, which are heavy in linear algebra, but the class itself states it is not math heavy.

In R, you typically implement these algorithms...by installing a package from CRAN that contains the algorithm. (I skimmed through the linked book and that appears to be the case.)

Nothing wrong with that, of course. Although a more proper intro to R would require teaching how to use the factor data type without going insane. (Processing data is half the battle, and something I wish I learned at college)

Or just avoid using factors altogether. Character vectors are usually more appropriate, and a modern ingest pipeline won't produce factors in the first place.

Thanks to readr never converting strings to factors by default (the equivalent of stringsAsFactors = FALSE), I've fortunately been avoiding unexpected issues due to factoring.

However, the default R functions still load data as factors and I have bad memories of that catching me by surprise, particularly when performing mathematical operations on vectors which are secretly factors and silently getting wrong results.
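For anyone who hasn't been bitten by this: a factor stores integer codes that point into a table of levels, and converting it straight to numeric (as.numeric(f) in R) hands back the codes rather than the labels. Below is a rough Python sketch of that mechanism, not R's actual implementation; `make_factor` is a hypothetical helper written only for illustration.

```python
# Hypothetical stand-in for an R factor: sorted unique labels ("levels")
# plus 1-based integer codes pointing into them. Illustration only.
def make_factor(values):
    levels = sorted(set(values))
    codes = [levels.index(v) + 1 for v in values]
    return levels, codes

raw = ["10", "20", "10", "30"]  # numbers that were read in as text
levels, codes = make_factor(raw)

# Summing the codes silently gives a plausible-looking but wrong number,
# analogous to sum(as.numeric(f)) in R.
wrong_sum = sum(codes)

# Decoding each code back to its label first gives the intended result,
# analogous to sum(as.numeric(as.character(f))) in R.
right_sum = sum(float(levels[c - 1]) for c in codes)

print("codes:", codes)
print("sum of codes:", wrong_sum)
print("sum of decoded values:", right_sum)
```

Nothing errors out in the wrong version, which is why it tends to go unnoticed until the numbers look off.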

I still need to use factors with ggplot2, though, for discrete orders (e.g., bar chart sorting).
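The ordering problem in a nutshell, as a small Python sketch rather than ggplot2 code: plain strings sort alphabetically, so getting bars in a meaningful order needs an explicit level order (which is what a factor's levels provide). The category names and counts below are made up for illustration.

```python
# Made-up category counts for a bar chart.
counts = {"low": 12, "medium": 30, "high": 7}

# Hypothetical explicit level order, playing the role of factor levels.
level_order = ["low", "medium", "high"]

# Plain strings sort alphabetically, which is rarely the order you want.
alphabetical = sorted(counts)

# Sorting by position in the level order gives the intended axis order.
by_levels = sorted(counts, key=level_order.index)

# Sorting by value gives the usual "tallest bar first" layout.
by_count = sorted(counts, key=counts.get, reverse=True)

print(alphabetical)
print(by_levels)
print(by_count)
```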

Yeah, I've been thinking about making a new class built on top of character vectors that allows custom ordering.

After much frustration with a project I'm working on, yesterday I stumbled across using data.frame instead of cbind.

Are you maybe Hadley Wickham? If yes, you probably already know this :-) My brother bought me "Advanced R" for Christmas.

edit: data.frame, not read.table

So glad that stringsAsFactors exists. Saves me a lot of time.

Couldn't agree more. New users I help often get stuck when they run into factors. Skip the whole class! Strings are the way to go.

readr doesn't create factors in the first place. It's an R data reader written by Hadley Wickham: https://cran.r-project.org/web/packages/readr/README.html

Did you notice the username you're replying to?

No LOL I thought I was replying back to minimaxir.

The difference between writing software (including machine learning, but a ton of useful software has to munge data of some sort or another) and learning programming reminds me of the difference between learning medicine in school and then learning how to actually practice medicine in residency.

To add to that, many of the practical data problems I run into almost never use most of the techniques you learn in classes. Either it is a small subset which can be more robust in data handling, or it is a technique or package not covered in classwork.

Daniela Witten, a coauthor on the course textbook, has a couple very interesting videos on statistical learning with graphical models of cancer genetics. A good one is here: https://www.youtube.com/watch?v=jmnJiXA5fm0

Thanks for sharing, great videos.

Thanks for this!

I took this course last year. I had a degree in statistics already. I just wanted to go through this GREAT book and watch these great professors. If you have any interest in learning actual statistics, this is a great introduction. Again, I can't repeat this enough but both the book and the professors are great!

Thank you @phillipamann. I am currently overwhelmed by the number of great MOOCs starting in Jan 2016 and wanted to know if this one would be worth it.

I have been waiting for this session to begin for a few months now (the course ran just once in 2015).

I'm planning to try the exercises in Python too after doing them in R; reference material for this exists already: https://github.com/sujitpal/statlearning-notebooks

Even if all you're interested in is deep learning, let me just say that the topics covered in this are absolutely vital if you're to be taken seriously as a machine learning practitioner.

This text misses two very important sub-areas of statistical learning, so much so that I can't believe they gave their textbook such a broad name ("statistical learning"; it's more like a subset of it). The whole book doesn't have a single mention of the word "Markov". Seriously disappointed.

- Graphical models

- Probabilistic logic, probabilistic programming, Bayesian data analysis, etc.

Can anyone point to good introductory texts?

I know for the first, Koller's text is highly recommended but I'm not sure if it's at the introductory level.

As someone in the field, I don't think there are any texts at below the level of Koller/Friedman. Universities rarely if ever teach graphical models to students below the level at which Koller/Friedman is appropriate.

Implementation of these graphical models in R is done really well with the gRain package.

I'm not sure graphical models should be the default tool people reach for.

There's a coursera course on Probabilistic Graphical Models: https://www.coursera.org/course/pgm

I would guess that's more approachable than a textbook, but who knows.

Kruschke - Doing Bayesian Data Analysis (2nd ed)

For someone interested in neural networks, deep learning and stuff like that, is this a useful course? Also, how valuable a skill-set is R?

Two courses I've found are very good for Neural Networks & Deep Learning are Karpathy's CS231n from Stanford[0] and Nando de Freitas's Deep Learning class from Oxford[1][2].

Been through parts of both and they seem really good. That said, I'd bet that both of these classes would be a lot more meaningful after either Hastie's class (listed here) or Andrew Ng's course.

[0] http://cs231n.github.io/

[1] https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPo...

[2] http://www.cs.ox.ac.uk/teaching/courses/2014-2015/ml/

There is not much here about deep learning or neural networks, but from a "how to measure and rate models" perspective, it is very helpful. I consider R to be a very useful skill-set. I have used it for solving many issues, both in and out of stats.

More broadly, are you interested in getting a job in machine learning? If yes, then this stuff is a requirement. You won't be taken seriously if all you know is deep learning.

> For someone interested in neural networks, deep learning and stuff like that, is this a useful course?

Not really (depending on exactly what "stuff like that" means).

> Also, how valuable a skill-set is R?


I usually hear from skilled people that the language is usually not a big deal and I tend to agree. I mean, if you know what you're doing, the framework is not that important. It's just a constant-time overhead to learn how things are done in R if you already know how to do them in, say, Matlab or the Python/NumPy/SciPy ecosystem.

Yes. If you know, say, Python + scikit-learn + NumPy, or Matlab, then R isn't that valuable.

If you don't know any statistical processing package/language/whatever, then learning one is valuable, and R isn't a bad one to know.

This course is more of an introduction to statistics.

No, it's an introduction to Machine Learning (without much neural network stuff).

This post has a critical side. Some readers will be offended. But most readers should be glad that they got some hype peeled back and learned a little more.

Sounds like an updated course in otherwise just mostly classic multivariate statistics. I have about four feet of such on my bookshelf.

Okay, have some more recent versions of regression and Breiman's work on classification and regression trees and random forests and some resampling. Right.

Okay. Fine. Maybe as good as multivariate statistics has been for decades, maybe a little better. Fine. They are using R instead of, say, SPSS, SAS, or the old IBM SSL? Okay.

Been doing that stuff since early in my career, have published in essentially multi-variate resampling, and am currently using some derivations I did for a modified, generalized version of some of that material. Fine.

All that said, where did this stuff about learning enter? This is nearly all old stuff; where'd this learning stuff come from?

Getting estimates of parameters? Sure. Estimates that are minimum variance and unbiased? If can, sure -- good stuff. Confidence intervals on the estimates? Of course. Statistical hypothesis tests of significance -- standard stuff. A variance-covariance matrix, symmetric positive definite? Sure. Polar decomposition, non-negative eigenvalues, a basis of orthogonal eigenvectors, singular value decomposition, matrix condition number, ..., definitely.

But learning? Where'd that come from?

Learning sounds like an attempt, like neural networks, artificial intelligence, deep learning to suggest that this work is somehow close to human thinking, which, Virginia, I have to guarantee you it definitely is not.

They didn't get learning from statistics. Okay, maybe they got learning from computer science. So sounds like statistics just stole learning from computer science or computer science stole learning from psychology and stole all the rest from statistics. Academic theft? Old wine in new bottles with a new label, "Learning"? Add mud to the waters, stir rapidly and confuse people? Hype? Sell books? Get students? Get consulting gigs? Get grants? Get publicity?

Doesn't look good.

Students trying to learn: For the classic multivariate statistics, there are stacks of highly polished texts. For more on closely related statistics, look at the work of Leo Breiman. For hypothesis testing, that is nearly all classic Statistics 101. For resampling, look at the work of P. Diaconis and B. Efron at Stanford. In that way, you will have mostly highly polished, classic sources. Econometrics has made a lot of use of this material. Also look at analysis of variance, which is closely related and quite powerful and heavily used in some fields of experimental science, e.g., agriculture. For "learning", mostly f'get about it.

Learning comes from Machine Learning. I hope you don't think neural nets are the only thing ever in machine learning or artificial intelligence.

SVMs and decision trees are still top performers in many ML domains, and the course covers both.

SVM (support vector machines) is an application of curve fitting related to discriminant analysis in classic multi-variate statistics. The connections with trees are likely much as in Breiman, et al., Classification and Regression Trees (CART), a classic in applied statistics. Breiman was a good mathematician, probabilist, and applied statistician.

Machine Learning is nothing like learning in humans or psychology.

The course looks like it stole the word learning from psychology and stole nearly all the rest from classic statistics and looks like stolen content relabeled.

To me, using the word learning for the content of the course and text, nearly all of which I have to conclude I am quite familiar with, is inappropriate hype.

I object to the hype.

Students be warned: As I wrote, essentially all of that material is from statistics and nearly all from classic statistics and has essentially nothing to do with learning in any reasonable meaning of that word.

For computer science to make what is now broad misuse of the word learning is a big step up in hype and a big step down in academic quality and responsibility.

Ah, a controversial view for HN!

It's no secret that machine learning is mostly statistics and Bayesian probability, typically with a focus on prediction rather than explanation, but the distinction is at best blurry.

You also find these things called "pattern recognition".

Moreover, saying an introductory course to anything is "stealing" is ridiculous.

> It's no secret that machine learning is mostly statistics and Bayesian probability, typically with a focus on prediction rather than explanation, but the distinction is at best blurry.

Well, apparently it's a "secret" to whomever selected the title of the book.

Congrats on seeing that they are just trying to predict and not explain. Breiman came to that position a long time ago and explained his position. So, they are like Ptolemy and his circles within circles fitting the astronomical data on the planets and, then, predicting the motions of the planets instead of Newton who, with his calculus, law of gravity, and second law of motion, both predicted and explained the motion of the planets. With a lot of assumptions, some commonly justified in practice only with a flexible imagination, can do explanation -- to be believed only after believing the assumptions. A little better is the approach of factor analysis since do have orthogonality where, thus, really can identify the unique contributions that sum to the whole prediction. It's just that then the factors are super tough to explain.

Since everyone sees that this applied math called computer science learning really is just some basic statistics, then why the heck is the HN community so eager to swallow taking some statistics, calling it computer science, and saying that it's about learning?

> Bayesian probability

I will give you some warm, wise, rock solid advice: Take "Bayesian probability" and drop it into the wet, round bowl and pull the chain. For anyone talking about it, do the same with them or at least their material. That's very much not the good stuff -- it's like farming by plowing with a wooden stick. Then learn about the central topic in probability, conditioning based on the Radon-Nikodym result, e.g., with a nice proof by von Neumann. Now you are up to conditional expectation, regular conditional probabilities, Markov processes, martingales, the strong Markov property, and the serious stuff. E.g., for random variables X, Y, E[Y|X] is a random variable and the best non-linear least squares approximation of Y from X. And, for some collection A of random variables, possibly uncountably infinite, E[Y|sigma(A)], where sigma(A) is the sigma algebra generated by the set A, is the best least squares approximation possible from all the random variables, used jointly, in A. Really, it's more powerful to condition on a sigma algebra than directly on the random variables that generate the sigma algebra. And there's much more, e.g., sufficient statistics. With Bayesian, you are crawling; with conditioning you are flying supersonic. If want to argue with me about theft of statistics and misuse of learning, then you will need another source on dumping Bayesian for conditioning. For that, read a text on, say, graduate probability by any of, say, Chung, Breiman, Neveu, Loeve and any of a few more.
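The claim that E[Y|X] is the best least squares approximation of Y from X is easy to check numerically. Here is a minimal simulation sketch, assuming a made-up setup that is not from any text mentioned here: X is uniform on five integer values and Y = X^2 plus Gaussian noise. It compares the mean squared error of the empirical conditional mean against the best straight-line fit.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the run is repeatable

# Assumed toy model (illustration only): Y = X^2 + noise.
xs = [random.choice([-2, -1, 0, 1, 2]) for _ in range(5000)]
ys = [x * x + random.gauss(0, 0.5) for x in xs]
n = len(xs)

# Empirical conditional expectation E[Y | X = x]: the mean of Y per value of X.
groups = defaultdict(list)
for x, y in zip(xs, ys):
    groups[x].append(y)
cond_mean = {x: sum(v) / len(v) for x, v in groups.items()}

# Best linear least squares fit y ~ intercept + slope * x, for comparison.
mx = sum(xs) / n
my = sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = my - slope * mx

def mse(predict):
    """Mean squared error of a predictor of Y that is a function of X."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / n

mse_cond = mse(lambda x: cond_mean[x])
mse_lin = mse(lambda x: intercept + slope * x)

print("MSE using E[Y|X]:  ", round(mse_cond, 3))
print("MSE using best line:", round(mse_lin, 3))
```

On the sample itself, the per-value means cannot do worse than any other function of X, the fitted line included; here the line does much worse because the relationship is symmetric in X.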

> Moreover, saying an introductory course to anything is "stealing" is ridiculous.

Sure, would be if didn't relabel it as learning and computer science. Call it Stat 101, 102, 201, 202 -- fine. Call it computer learning or some such -- theft and BS.

I'm surprised at your willingness to swallow and smile at that theft of Stat 101, etc., and that just hype use of the word learning, both of which are just wildly inappropriate.

Gee, wait until the computer science big data people discover sufficient statistics!

The credit goes to the field of statistics -- give credit where it is due. Call the material by its appropriate name -- statistics or multi-variate statistics or statistical hypothesis testing or resampling plans in statistics. The content is statistics and very definitely not computer science. E.g., a lot of the more recent content was directly from Breiman, and he was definitely not a computer scientist. When the social scientists, the biomedical scientists, and the agricultural scientists studied statistics, and they did study a lot of it, they called it, right, statistics. When the economists won Nobel prizes for applying linear programming, they called it linear programming. It is true that they called dual variables shadow prices. Similarly for quadratic programming (H. Markowitz). When the chemists used group representation theory in molecular spectroscopy, they called it, right, group representation theory. When the oil patch people were looking for oil by using the fast Fourier transform to do de-convolution of acoustic signals, they called it Fourier theory. Having computer science steal statistics and call it computer science learning is not the standard academic approach.

There's something going on with computer science that is not good and not clear.

This is really simple stuff. Somehow this simple stuff is controversial at HN. Not so good. Come on, guys -- there's a world outside the computer science department, and much of the best of the future of computing will come from that world and not anything within current computer science departments. E.g., for the last paper I published on a problem in computer science, the computer science chaired profs and journal Editors in Chief couldn't understand or review the math -- finally an EE prof could and did. BTW, when the high end EE guys work with stochastic integration, e.g., as in E. Wong's book, they call it stochastic integration. Amazing. Similarly for the high end finance guys.

I don't know anything about computer science or statistics, and I never once thought this course or machine learning in general ever had anything to do with actual learning in humans. The distinction between this and cognition is pretty clear, and it would be obvious to anyone 2 minutes into the first presentation.

Yup, and that's why adding the word learning to some material from the long well-established field of statistics is high hype, low academic standards, and inappropriate. So, I am objecting. As you note, can tell in two minutes that what computer science does with the word learning is inappropriate.

More generally computer science keeps misusing words to suggest that their work is somehow close to what humans do when thinking, and that suggestion is misuse of language, hype, or worse. They should stop it.

Yes, but this point, that as you correctly note can be seen in two minutes, is controversial here at HN.
