Machine Learning Crash Course: Part 1 | 337 points by rafaelc on Dec 29, 2016 | 16 comments

 As a counter-argument: linear regression is to ML what the "goto" statement is to programming. Linear regression looks great on paper since you can derive residuals and slopes, compare the individual "effects," etc. But that's unnecessary, and in some cases wrong, when the goal is mere prediction and not explanation. The big difference between ML and statistics is that the latter selects a "correct" linear model and then assumes a distribution for the "errors" due to pesky reality. The effects are used for explanations (538 Nate Silver-style wonk/punditry). Machine learning, on the other hand, tries to predict as close to observations as possible without imposing a model or caring about an explanation.

The simplest introductory machine learning approach should not be linear regression but rather a 1-nearest-neighbor model. E.g., rather than giving data about house prices and square footage, the questions should be "How do you predict the price of a house in a given location?", "What are the relevant features?" (location, location, location, school district, number of rooms, sq ft, etc.), and "How would you collect labels/data?" (Zillow, excluding prices older than 2-3 years). The simplest answer would be that the price is the same as that of the neighboring house (closest lat/long) with similar square footage, sold recently. This can then be implemented as a weighted distance metric and tested using leave-one-out cross-validation (I know, not the best metric). But consider how nearest neighbors allows us to incorporate location information in a natural manner. That is very important, and it cannot be incorporated in an elegant manner into a linear regression model.

A big part of ML is applying different sets of methods across several domains. Thus, for beginners, teaching ML should not be about teaching linear models or gradient descent, but rather about how you start thinking from an ML perspective.
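The 1-NN approach described above can be sketched in a few lines. This is a minimal illustration, not a production predictor: the sales records, coordinates, and the square-footage weight in the distance metric are all made-up assumptions.

```python
import math

# Hypothetical toy dataset of recent sales: (lat, lon, sqft, sale_price).
# All numbers are invented for illustration.
SALES = [
    (47.61, -122.33, 1800, 650_000),
    (47.62, -122.35, 2100, 720_000),
    (47.60, -122.30, 1500, 540_000),
    (47.66, -122.38, 2400, 810_000),
    (47.59, -122.31, 1600, 560_000),
]

def distance(a, b, sqft_weight=0.0005):
    """Weighted distance: location dominates, square footage contributes a little.
    The weight is an arbitrary choice; tuning it is part of the exercise."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) + sqft_weight * abs(a[2] - b[2])

def predict_1nn(query, data):
    """Predict price as the price of the single nearest recorded sale."""
    nearest = min(data, key=lambda row: distance(query, row))
    return nearest[3]

def loo_cv_error(data):
    """Leave-one-out cross-validation: mean absolute error of the 1-NN rule."""
    errors = []
    for i, row in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors.append(abs(predict_1nn(row, rest) - row[3]))
    return sum(errors) / len(errors)

# Query house: (lat, lon, sqft) -- nearest sale in this toy data is the first row.
print(predict_1nn((47.615, -122.34, 1900), SALES))  # 650000
```

Note how location enters the model directly through the distance metric, which is exactly the point made above: a plain linear regression on lat/long coefficients has no comparably natural way to express "near this house."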
 The whole "machine learning is just fancy statistics" discussion that happens on Hacker News endlessly is often pedantic semantics. However, in the case of linear regression, this is basic statistics that is an analysis life skill and has many practical applications outside of the hardcore TensorFlow blog posts. (case in point, I first learned linear regression during my undergrad in a "Statistics for Business" class)
 In practice it's a mix of mathematical fields; depending on the approach, you can have traditional Bayesian probability, regression, estimation, Euclidean geometry, classical logic, or some combination of the above. The math is important for understanding and tweaking. But ML is not just fancy statistics (or math in general). It's _data_. An understanding of the math means understanding what data pairs best with what approach. It also means understanding error analysis. It means cross-validation. These are far more vulnerable areas for a beginner in ML.

But the ultimate point is that ML is not magic. It's a framework - grounded in mathematics - that provides the building blocks for simulating understanding. You don't need to know the math to use it, but you do need it to use it well.
 Linear regression is dope. As a data scientist, I often walk new clients through a linear regression exercise to convey some key concepts about the engagement and demystify what I'll be doing for them. I'm often dealing with people who, much like you, haven't done much with stats since a college "Business Stats" course, so I get a lot of "oh yeah, I vaguely remember this" - but going through it again gives me a good foundation to relate back to as things get more complicated.
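The kind of walkthrough described here can fit on a whiteboard. Below is a minimal sketch of an ordinary-least-squares fit of y = slope*x + intercept; the data points are invented for illustration.

```python
# Made-up data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form OLS estimates: slope = cov(x, y) / var(x), line through the means.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 1.97 0.11
```

The closed-form view ("the best-fit line passes through the mean point, with slope covariance over variance") is the part people tend to half-remember from a business stats course, which makes it a good anchor before moving to anything more complicated.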
 Are there any resources you would suggest for someone in the same boat to get easily reacquainted and rebuild a good foundation?
 The Coursera Machine Learning course just started (I assume you could still join). I just finished the second week (I'm trying to keep a week ahead due to the somewhat unpredictable nature of my schedule lately), and have been enjoying it so far. The first couple weeks are all about univariate and multivariate linear regression (as well as an optional linear algebra refresher on matrix operations).
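For anyone curious what those first weeks amount to in code: a sketch, under made-up data, of batch gradient descent minimizing mean squared error for univariate linear regression (the learning rate and iteration count here are arbitrary choices, not the course's).

```python
# Toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = 0.0, 0.0  # intercept and slope, both initialized to zero
alpha = 0.1                # learning rate (hand-picked for this data)
m = len(xs)

for _ in range(5000):
    preds = [theta0 + theta1 * x for x in xs]
    # Gradients of the mean squared error with respect to each parameter.
    grad0 = sum(p - y for p, y in zip(preds, ys)) / m
    grad1 = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / m
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(round(theta0, 4), round(theta1, 4))  # converges to 1.0 2.0
```

The same update rule generalizes directly to the multivariate case by treating the parameters and gradients as vectors, which is where the course's matrix-operations refresher comes in.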
 I second this course; I took it when it was put on in 2011 in association with Stanford, and called the "ML Class" - its success was the catalyst for the creation of Coursera.
 I occasionally organize ML "orientation" sessions, and one of the first models I talk about is linear regression, for this very reason. It's an excellent bridge between what people are already familiar with and the larger "learn from data" philosophy.
 Brilliant. Would love to see you do a video and post to YouTube
 What's the best way for college freshmen to learn about ML? -- A.I. and ML aren't really topics talked about until upper-divs, which means a year or two out for me.
 You can certainly learn to implement the APIs that are available, but in terms of really understanding, I'd say wait a bit. Take classes in probability and linear algebra; from there you'll begin to have the level of mathematics needed to really dive in. You'll also have the computer science maturity to better understand the libraries in use, and in truth the field will have advanced a bit in the time it takes for you to get those foundations. There was just a good reddit[1] thread about which topics in linear algebra and probability you should be paying attention to, because those two subjects are largely the mathematical foundations of machine learning.
 Thanks for the info! I'm taking linear algebra this upcoming semester so this will be useful. :D