I highly recommend the book and this online course, both of which are FREE.
Hastie and Tibshirani's other book, "The Elements of Statistical Learning," is also excellent but far more theoretical, and best for experienced practitioners who want to use it as a reference guide.
Even though the book is introductory material, it explains everything with great clarity. I have yet to find a better resource for someone wanting to get into data science. The theoretical material side-by-side with coded examples is very powerful (although the labs could use some work). Even if you have been using data science for a while, this is a great refresher on theoretical concepts.
I note that because the curriculum seems to focus on more machine-learning-style algorithms like random forests and SVMs, which are heavy in linear algebra, yet the class itself states it is not math heavy.
In R, you typically implement these algorithms...by installing a package from CRAN that contains the algorithm. (I skimmed through the linked book and that appears to be the case.)
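For instance, a minimal sketch of that workflow, using the randomForest package as one illustration (my choice of package and data set, not necessarily what the book's labs use):

```r
# One-time install from CRAN; the algorithm itself lives in the package
install.packages("randomForest")
library(randomForest)

# Fit a random forest classifier on a built-in data set
fit <- randomForest(Species ~ ., data = iris)
print(fit)
```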
Nothing wrong with that, of course. Although a more proper intro to R would require teaching how to use the factor data type without going insane. (Processing data is half the battle, and something I wish I had learned in college.)
However, the default R functions still load data as factors, and I have bad memories of that catching me by surprise, particularly when performing mathematical operations on vectors that are secretly factors and silently getting wrong results.
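A minimal sketch of that trap in base R (assuming the old default behavior where strings were read in as factors):

```r
# A vector of numeric-looking strings, silently stored as a factor
x <- factor(c("10", "20", "30"))

# as.numeric() on a factor returns the internal level codes, not the values
as.numeric(x)                # 1 2 3 -- silently wrong

# The safe conversion goes through the character representation first
as.numeric(as.character(x))  # 10 20 30
```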
I still need to use factors with ggplot2 though for discrete ordering (e.g., bar chart sorting).
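The usual idiom there (a sketch, assuming ggplot2 is installed; the data is made up) is to set the factor levels explicitly so the bars come out in the order you want rather than alphabetically:

```r
library(ggplot2)

df <- data.frame(size  = c("small", "medium", "large"),
                 count = c(5, 12, 8))

# Without this, the x axis sorts alphabetically: large, medium, small
df$size <- factor(df$size, levels = c("small", "medium", "large"))

ggplot(df, aes(x = size, y = count)) + geom_col()
```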
Are you maybe Hadley Wickham? If yes, you probably already know this :-) My brother bought me "Advanced R" for Christmas.
edit: data.frame, not read.table
I'm planning to try the exercises in Python too after doing them in R; reference material for this already exists: https://github.com/sujitpal/statlearning-notebooks
- Graphical models
- Probabilistic logic, probabilistic programming, Bayesian data analysis, etc.
Can anyone point to good introductory texts?
I know that for the first, Koller's text is highly recommended, but I'm not sure if it's at the introductory level.
I would guess that's more approachable than a text book, but who knows.
Been through parts of both and they seem really good. That said, I'd bet that both of these classes would be a lot more meaningful after either Hastie's class (listed here) or Andrew Ng's course.
Not really (depending on exactly what "stuff like that" means).
Also, how valuable a skill-set is R?
If you don't know any statistical processing package/language/whatever, then learning one is valuable, and R isn't a bad one to know.
Sounds like an updated course in what is otherwise mostly just classic multivariate statistics. I have about four feet of such books on my bookshelf.
Okay, have some more recent versions of regression and Breiman's work on classification and regression trees and random forests and some resampling. Right.
Okay. Fine. Maybe as good as multivariate statistics has been for decades, maybe a little better. Fine. They are using R instead of, say, SPSS, SAS, or the old IBM SSL? Okay.
Been doing that stuff since early in my career, have published on essentially multivariate resampling, and am currently using some derivations I did for a modified, generalized version of some of that material. Fine.
All that said, where did this stuff about learning enter? This is nearly all old stuff; where'd this learning stuff come from?
Getting estimates of parameters? Sure. Estimates that are minimum variance and unbiased? If can, sure -- good stuff. Confidence intervals on the estimates? Of course. Statistical hypothesis tests of significance -- standard stuff. A variance-covariance matrix, symmetric positive definite? Sure. Polar decomposition, non-negative eigenvalues, a basis of orthogonal eigenvectors, singular value decomposition, matrix condition number, ..., definitely.
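Those linear-algebra facts are easy to check numerically; a sketch in base R (my example data, not from the course):

```r
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)  # 50 observations, 4 variables

S <- cov(X)   # sample variance-covariance matrix, symmetric by construction

# Positive semi-definite: non-negative eigenvalues,
# and an orthogonal basis of eigenvectors
e <- eigen(S)
all(e$values >= 0)               # TRUE
round(crossprod(e$vectors), 10)  # ~ identity matrix

# Matrix condition number from the singular value decomposition
s <- svd(S)
max(s$d) / min(s$d)
```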
But learning? Where'd that come from?
Learning sounds like an attempt, like neural networks, artificial intelligence, and deep learning, to suggest that this work is somehow close to human thinking, which, Virginia, I have to guarantee you it definitely is not.
They didn't get learning from statistics. Okay, maybe they got learning from computer science. So it sounds like statistics just stole learning from computer science, or computer science stole learning from psychology and stole all the rest from statistics. Academic theft? Old wine in new bottles with a new label, "Learning"? Add mud to the waters, stir rapidly, and confuse people? Hype? Sell books? Get students? Get consulting gigs? Get grants? Get publicity?
Doesn't look good.
Students trying to learn: For the classic multivariate statistics, there are stacks of highly polished texts. For more on closely related statistics, look at the work of Leo Breiman. For hypothesis testing, that is nearly all classic Statistics 101. For resampling, look at the work of P. Diaconis or B. Efron, both at Stanford. In that way, you will have mostly highly polished, classic sources. Econometrics has made a lot of use of this material. Also look at analysis of variance, which is closely related and quite powerful and heavily used in some fields of experimental science, e.g., agriculture. For "learning", mostly f'get about it.
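For a taste of the resampling idea, a sketch in base R of Efron's nonparametric bootstrap applied to the sample mean (made-up data, chosen for illustration):

```r
set.seed(42)
x <- rnorm(100, mean = 5)   # made-up sample of 100 observations

# Resample with replacement many times and look at the spread of the statistic
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))

# Percentile bootstrap confidence interval for the mean
quantile(boot_means, c(0.025, 0.975))
```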
SVMs and decision trees are still top performers in many ML domains, and the course covers both.
Machine Learning is nothing like learning in humans or psychology.
The course looks like it stole the word learning from psychology and stole nearly all the rest from classic statistics and looks like stolen content relabeled.
To me, the use of the word learning for the content of this course and text, nearly all of which I am quite familiar with, is inappropriate hype.
I object to the hype.
Students be warned: As I wrote, essentially all of that material is from statistics and nearly all from classic statistics and has essentially nothing to do with learning in any reasonable meaning of that word.
For computer science to make what is now broad misuse of the word learning is a big step up in hype and a big step down in academic quality and responsibility.
Ah, a controversial view for HN!
You also find these things called "pattern recognition".
Moreover, saying an introductory course to anything is "stealing" is ridiculous.
Well, apparently it's a "secret" to whomever selected the title of the book.
Congrats on seeing that they are just trying to predict and not explain. Breiman came to that position a long time ago and explained his position. So, they are like Ptolemy with his circles within circles, fitting the astronomical data on the planets and then predicting the motions of the planets, instead of Newton who, with his calculus, law of gravity, and second law of motion, both predicted and explained the motion of the planets.

With a lot of assumptions, some commonly justified in practice only with a flexible imagination, one can do explanation -- to be believed only after believing the assumptions. A little better is the approach of factor analysis, since there one does have orthogonality and thus really can identify the unique contributions that sum to the whole prediction. It's just that then the factors are super tough to explain.
Since everyone sees that this applied math called computer science learning really is just some basic statistics, why the heck is the HN community so eager to swallow taking some statistics, calling it computer science, and saying that it's about learning?
> Bayesian probability
I will give you some warm, wise, rock solid advice: Take "Bayesian probability" and drop it into the wet, round bowl and pull the chain. For anyone talking about it, do the same with them or at least their material. That's very much not the good stuff -- it's like farming by plowing with a wooden stick.

Then learn about the central topic in probability, conditioning based on the Radon-Nikodym result, e.g., with a nice proof by von Neumann. Now you are up to conditional expectation, regular conditional probabilities, Markov processes, martingales, the strong Markov property, and the serious stuff.

E.g., for random variables X, Y, E[Y|X] is a random variable and the best non-linear least squares approximation of Y from X. And, for some collection A of random variables, possibly uncountably infinite, E[Y|sigma(A)], where sigma(A) is the sigma algebra generated by the set A, is the best least squares approximation possible from all the random variables, used jointly, in A. Really, it's more powerful to condition on a sigma algebra than directly on the random variables that generate the sigma algebra. And there's much more, e.g., sufficient statistics.

With Bayesian, you are crawling; with conditioning you are flying supersonic. If you want to argue with me about theft of statistics and misuse of learning, then you will need another source on dumping Bayesian for conditioning. For that, read a text on, say, graduate probability by any of, say, Chung, Breiman, Neveu, Loeve, and any of a few more.
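The least-squares property claimed here, written out (the standard statement, assuming Y is square-integrable):

```latex
% E[Y | X] is the best least-squares predictor of Y among all
% (measurable) functions of X:
\mathbb{E}\!\left[ \bigl( Y - \mathbb{E}[Y \mid X] \bigr)^{2} \right]
  \;\le\;
\mathbb{E}\!\left[ \bigl( Y - g(X) \bigr)^{2} \right]
\quad \text{for every measurable } g \text{ with } \mathbb{E}\bigl[ g(X)^{2} \bigr] < \infty ,

% and, conditioning on the sigma algebra generated by a collection A
% of random variables:
\mathbb{E}\!\left[ \bigl( Y - \mathbb{E}[Y \mid \sigma(A)] \bigr)^{2} \right]
  \;\le\;
\mathbb{E}\!\left[ ( Y - Z )^{2} \right]
\quad \text{for every } \sigma(A)\text{-measurable } Z
  \text{ with } \mathbb{E}\bigl[ Z^{2} \bigr] < \infty .
```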
> Moreover, saying an introductory course to anything is "stealing" is ridiculous.
Sure, it would be if they didn't relabel it as learning and computer science. Call it Stat 101, 102, 201, 202 -- fine. Call it computer learning or some such -- theft and BS.
I'm surprised at your willingness to swallow and smile at that theft of Stat 101, etc., and that just hype use of the word learning, both of which are just wildly inappropriate.
Gee, wait until the computer science big data people discover sufficient statistics!
The credit goes to the field of statistics -- give credit where it is due. Call the material by its appropriate name -- statistics, or multivariate statistics, or statistical hypothesis testing, or resampling plans in statistics. The content is statistics and very definitely not computer science. E.g., a lot of the more recent content was directly from Breiman, and he was definitely not a computer scientist.

When the social scientists, the biomedical scientists, and the agricultural scientists studied statistics, and they did study a lot of it, they called it, right, statistics. When the economists won Nobel prizes for applying linear programming, they called it linear programming. It is true that they called dual variables shadow prices. Similarly for quadratic programming (H. Markowitz). When the chemists used group representation theory in molecular spectroscopy, they called it, right, group representation theory. When the oil patch people were looking for oil by using the fast Fourier transform to do de-convolution of acoustic signals, they called it Fourier theory. Having computer science steal statistics and call it computer science learning is not the standard academic approach.
There's something going on with computer science that is not good and not clear.
This is really simple stuff. Somehow this simple stuff is controversial at HN. Not so good. Come on, guys -- there's a world outside the computer science department, and much of the best of the future of computing will come from that world and not anything within current computer science departments. E.g., for the last paper I published on a problem in computer science, the computer science chaired profs and journal Editors in Chief couldn't understand or review the math -- finally a EE prof could and did. BTW, when the high end EE guys work with stochastic integration, e.g., as in E. Wong's book, they call it stochastic integration. Amazing. Similarly for the high end finance guys.
More generally computer science keeps misusing words to suggest that their work is somehow close to what humans do when thinking, and that suggestion is misuse of language, hype, or worse. They should stop it.
Yes, but this point, which as you correctly note can be seen in two minutes, is controversial here at HN.