In his textbook on multivariate statistics (http://www.amazon.com/gp/product/0387781889/), Izenman claims that stats & ML progressed in parallel. So traditional stats techniques like OLS, multiple regression, nonlinear regression, logistic regression, and GLMs are generally not covered in ML. Similarly, ML topics like k-means, SVMs, random forests etc. are not taught by the stats dept.
What is happening in this past decade is a convergence of stats & ML, primarily driven by data scientists working in the domain of big data. The stats folks are slowly incorporating ML techniques into stats & finding rigorous heuristics for when they should be employed. Similarly, the ML guys, mostly CS folk who have unfortunately taken only one undergraduate course in stats & probability, are discovering that you can do so much more by sampling intelligently & leveraging plain old statistics, without resorting to needless large-scale computation.
This schism between stats & ML can be leveraged very profitably during interviews :))
When I interview data science folks, I usually ask very simple questions from plain stats: how would you tell if a distribution is skewed? If you have an rv X with mean mu, and rv Y = X - mu, what is the mean of Y? If you have an rv A with mean 0 and variance 1, what are the chances of being 3 standard deviations away from the mean if you have no clue about the distribution of A? What if you knew A was unimodal? What if A is normally distributed?
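For concreteness, the three tail bounds behind that last sequence of questions can be sketched in Python: Chebyshev's inequality for the "no clue" case, the Vysochanskij-Petunin inequality for the unimodal case, and the exact tail for the normal case (the specific 11% / 5% / 0.27% figures below follow from those standard results, not from the comment itself):

```python
import math

# P(|A| >= k) for an rv A with mean 0, variance 1, at k = 3 standard deviations
k = 3

# No distributional assumptions: Chebyshev's inequality, P(|A| >= k) <= 1/k^2
chebyshev = 1 / k**2                   # 1/9, i.e. at most ~11.1%

# Unimodal: Vysochanskij-Petunin inequality, P(|A| >= k) <= 4/(9 k^2) for k > sqrt(8/3)
vysochanskij_petunin = 4 / (9 * k**2)  # 4/81, i.e. at most ~4.9%

# Normal: exact two-sided tail, 2 * (1 - Phi(k)), via the error function
normal_tail = 2 * (1 - 0.5 * (1 + math.erf(k / math.sqrt(2))))  # ~0.27%

print(chebyshev, vysochanskij_petunin, normal_tail)
```

The answers get sharper as you assume more: roughly 11%, then 5%, then 0.27%.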
Now if it's a stats guy, I ask very simple ML... what is a perceptron, have you heard of a neural network, etc.
surprisingly, the stats guys do much better on ML than the ML guys on stats!
Not surprising, really. ML is the shiny new thing, so the MLers don't tend to feel they missed anything while the statisticians need to keep up with the times.
I say this as an MLer still struggling to find out what R^2 is, among other things ..
It's just a stupid fraction.
Say you have a dataset, i.e. a sequence of (x,y) tuples. In OLS, you try to fit a line to the dataset. So your manager wants to know how well the line fits your dataset. If it does a bang-up job, you say 100%, aka an rsquare of 1. If it does a shoddy job, you say 0%, aka an rsquare of 0. Hopefully your rsq is much closer to 1 than to 0.
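A minimal sketch of that fraction in plain Python (the data here is made up for illustration; the standard formula is R^2 = 1 - SS_res/SS_tot):

```python
# R^2 for a simple OLS line fit y = a + b*x, computed from scratch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]   # roughly y = 2x, so rsq should be near 1

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# OLS slope and intercept
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx)**2 for x in xs)
a = my - b * mx

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = sum((y - (a + b * x))**2 for x, y in zip(xs, ys))
ss_tot = sum((y - my)**2 for y in ys)
rsq = 1 - ss_res / ss_tot
print(rsq)  # close to 1 for a good fit
```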
Respectfully, it's not a stupid fraction. It is a fundamental quantity arising from the linear algebraic interpretation of correlation.
Covariance induces an inner product on the set of zero-mean random variables. The regression coefficient is precisely the projection coefficient <x,y>/<y,y>, and R^2 is precisely the Cauchy-Schwarz ratio <x,y>^2 / (<x,x><y,y>), i.e. the product of the two projection coefficients between x and y.
It is a theoretically natural measure of linear quality-of-fit. It has the added bonus of being equal to the ratio of modeled variance to total variance (variance being the square-norm of a random variable in the norm induced by the covariance inner product).
It's also very very cheap to compute. Though there are more practically useful measures of "predictive power", like mutual information, R^2 does an admirable job for an O(1)-space and O(num data)-time predictiveness metric.
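The two views agree numerically. A quick sanity check, on made-up data, that the Cauchy-Schwarz ratio on centered (zero-mean) data equals the 1 - SS_res/SS_tot definition of R^2:

```python
# Sanity check: <x,y>^2 / (<x,x><y,y>) on centered data equals
# 1 - SS_res/SS_tot from the OLS regression of y on x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
cx = [x - sum(xs) / n for x in xs]   # center to zero mean
cy = [y - sum(ys) / n for y in ys]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

cauchy_schwarz_ratio = dot(cx, cy)**2 / (dot(cx, cx) * dot(cy, cy))

b = dot(cx, cy) / dot(cx, cx)                         # regression coefficient
ss_res = sum((y - b * x)**2 for x, y in zip(cx, cy))  # residuals on centered data
rsq = 1 - ss_res / dot(cy, cy)

print(cauchy_schwarz_ratio, rsq)  # the two values match
```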
I suspect that I learned more about R^2 by reading these comments in the order presented (informal, then formal) than I would have had they been reversed.