"The Elements of Statistical Learning" (https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLI...) is far and away the best book I've seen.
It took me hundreds of hours to get through it, but if you're looking to understand things at a pretty deep level, I'd say it's well worth it.
Even if you stop at chapter 3, you'll still know more than most people, and you'll have a great foundation.
Hope this helps!
ESL gives short shrift to the computational complexity of learning, whereas UML explicitly handles both statistical and computational complexity concerns. It doesn't matter how statistically pure your algorithm is if its running time scales exponentially with your data.
All of UML's chapters are conceptually unified, even when they discuss different ML algorithms, whereas ESL is more of a grab bag chapter by chapter.
Still, both high quality and free!
It feels like machine learning is taught almost exclusively by academia, but the majority of the audience is average developers who want to put it to practical use today.
In my opinion, physics students learn the best framework for thinking and get a very good mathematical intuition. For example, here's a problem from an introductory QM book that really threw me for a loop when I was studying:
A needle of length L is dropped at random onto a sheet of paper ruled with parallel lines a distance L apart. What is the probability that the needle will cross a line?
Draw a circle of radius L centered on one end of the needle, which lands some distance x from the nearest line, with 0 < x < L. One end of the needle always lands in some zone between two lines (the center), and the other end lands a distance L away, somewhere on the circle.
How much of the 2pi boundary is outside the zone?
When x -> 0, it's 50%, since one line becomes a tangent to the circle and the other passes through its center, so exactly half the boundary lies outside. But the fraction isn't constant in x: at x = L/2, each line is a chord at distance L/2 from the center of a radius-L circle, and such a chord cuts off a 120-degree arc, so 240/360 = 2/3 of the boundary is outside the zone there. Since the crossing probability varies with x, you have to average, which is what the careful derivation below does to arrive at 2/pi.
This is true. I can't really say why, but after my Discrete Mathematics class a lot of my Computer Science problems became a lot easier to reason about.
The thickness of the lines is needed, right? Otherwise P approaches 100% as the thickness approaches 0?
The "lines" are like a sample of a point from a uniform distribution U with width L, and h is an interval inside U. The probability a number sampled from a distribution of width L will fall within interval h is h/L. Substituting for h gives p(cross|x) = sin(x).
Then, assuming the needle is equally likely to drop at any angle, for any one angle x we get the probability density p(theta = x) = 1/(pi/2 - 0) = 2/pi.
The probability that the needle drops at angle x AND crosses a line is the product p(theta=x)*p(cross|x) = (2/pi)*sin(x). As mentioned, x can range between 0 and pi/2. To get the probability that the needle drops at angle x1 OR x2 OR x3, etc., and crosses, we sum all of these, i.e. take the integral of (2/pi)sin(x) from 0 to pi/2. Since the antiderivative of sin(x) is -cos(x), this evaluates to (2/pi)(cos(0) - cos(pi/2)) = 2/pi.
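If you want to sanity-check that 2/pi without the calculus, here's a quick Monte Carlo sketch (the setup and names are mine, not from the derivation above):

    import math
    import random

    def needle_crosses(L=1.0):
        """Drop one needle of length L onto lines spaced L apart."""
        y = random.uniform(0, L)            # distance from needle center to the line below
        theta = random.uniform(0, math.pi)  # needle angle relative to the lines
        half_span = (L / 2) * math.sin(theta)  # half the needle's extent across the lines
        return y < half_span or y > L - half_span  # crosses the line below or above

    n = 1_000_000
    hits = sum(needle_crosses() for _ in range(n))
    print(hits / n, "vs", 2 / math.pi)  # both come out around 0.6366

One run of a million drops typically lands within a few tenths of a percent of 2/pi.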
You will be surprised: you can make significant gains by including the zip code - I've seen it happen in a competitive setting. Where you live probably contains some signal about your creditworthiness.
Having said that, of course it doesn't make sense to simply feed the raw zip code to the tree. An appropriate encoding of the zip code (most people would use a one-hot encoding, though better ones exist) will be key to extracting the signal in a robust way.
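For concreteness, here's a sketch of what that one-hot step could look like (toy data, made-up zip codes; pandas is just one way to do it):

    import pandas as pd

    # Toy data -- the zip codes and incomes are made up for illustration.
    df = pd.DataFrame({
        "zip": ["94103", "10001", "94103", "60614"],
        "income": [72000, 85000, 64000, 59000],
    })

    # One binary column per zip code, so the tree sees categories
    # instead of one meaningless integer.
    encoded = pd.get_dummies(df, columns=["zip"])
    print(encoded)

With thousands of distinct zip codes this gets very wide, which is where the better encodings come in, e.g. target encoding, or grouping to a coarser geography like the first three digits.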
>> ... the tree will have serious overfitting issues
Isn't it almost standard practice now to use an ensemble of trees, such as a random forest? Decision trees have long been known to be prone to overfitting.
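You can see the gap for yourself in a few lines. A rough scikit-learn sketch (the dataset and hyperparameters are arbitrary choices of mine):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # A single fully grown tree tends to overfit...
    tree = DecisionTreeClassifier(random_state=0)
    # ...while averaging many randomized trees reins the variance in.
    forest = RandomForestClassifier(n_estimators=200, random_state=0)

    print("tree:  ", cross_val_score(tree, X, y, cv=5).mean())
    print("forest:", cross_val_score(forest, X, y, cv=5).mean())

On most splits the forest's cross-validated accuracy comes out a few points higher than the single tree's.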
I guess I should still brush up on math, though.
2. math prereqs at UCSC: http://people.ucsc.edu/~praman1/static/pub/math-for-ml.pdf
3. math prereqs at UMD: https://www.umiacs.umd.edu/~hal/courses/2013S_ML/math4ml.pdf
- 3 books: Barber, MacKay, and Rasmussen/Williams from this list: https://www.reddit.com/r/MachineLearning/comments/1jeawf/mac...
I can skip the math part - we don't all invent new algorithms - but what I really need is a large enough and gradual set of problems to solve (datasets + verification scripts). I mean, start from the simplest and teach people how to use the already available software. Machine learning should be assimilated practically; too much theory with too little application is useless. Most of us should focus on using existing software efficiently rather than on being able to implement backprop.
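To make that concrete, the whole loop I mean - pick a toy dataset, fit an off-the-shelf model, verify on held-out data - is only a few lines with existing software (the model and dataset here are arbitrary; just a sketch):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Dataset + verification script in miniature: hold out a test set,
    # fit a stock model, and check it against data it never saw.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))

Start there, then swap in harder datasets and other models; the verification step stays the same.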
Note: I'd strongly recommend the Anaconda Python distribution, as it has pretty much everything you need. Also, for immediate feedback on what you're doing with Python, I've fallen in love with Jupyter Notebooks (formerly IPython Notebooks), which you'll have as part of the Anaconda distribution, along with all the other popular Python packages for scientific work.
On the other hand, the math you absolutely need to follow along is pretty straightforward, so you can hopefully find tutorials that emphasize the applications and graphic examples of what is going on.
Ultimately, ML is a mathematical discipline. You can ask for a gentle approach that gets you to the foot of the mountain, but "if you want to learn about nature, to appreciate nature, it is necessary to understand the language that she speaks in." If you want to be more than an amateur, there's not much substitute for getting comfortable with math at the level of, say, Kevin Murphy's book.
The good news is that the required math is fairly elementary - calculus, linear algebra, probability and statistics, all freshmen or maybe sophomore-level topics - so it shouldn't be beyond reach of a motivated developer able to set aside some time to learn. MOOCs and organizing study groups with friends/co-workers can help a lot here as well.
I'm in the middle of that process right now. I only took Calc I in college and that was 20 years ago, so I have decided to work my way through a Calc sequence, Differential Equations, Linear Algebra, and Probability and Statistics through a combination of MOOCs, "X For Dummies" books, Youtube videos (hello, Gilbert Strang!), Schaum's Outlines books, Khan Academy, a mammoth stack of college maths texts that I've picked up at used book stores, and questions on stats.stackexchange.com, math.stackexchange.com, learnmath.reddit.com, etc.
I'm doing the Ohio State MOOC on Calc I now on Coursera, and accompanying that with the Gilbert Strang "Highlights of Calculus" video series. So far so good. I definitely think this stuff is learnable if one is willing to put in the time and work, even without going back to taking "on campus" classes at a university.
That said, I find a lot of the introductions to the theory behind ML techniques to be very poorly written. It's often worth giving a new student a conceptual simplification before introducing a rigorous definition.
Without linear algebra and basic probability/calculus, though, forget it. Luckily there are great sources to brush up on them.
If you're interested, here's a gentle book that should give you enough background:
That said, there are some graphical examples to help you understand how learning algorithms work in 2 dimensions.
1: Mathematics in ML - proofs etc.
2: Understanding the intuition behind ML algorithms without requiring higher-order math?
Personally, I feel that 2 can be tackled quite easily. The core issue is that most people who teach want to stay on a "higher dimension". ;)
The neural networks chapter is tiny (but that's ok - that's not the focus) and some of the questions are really hard - but overall I've really enjoyed it.
If you speak Hebrew, you can get two different professors' takes on how to teach the material in the book, as well as lecture notes from a total of 3 professors. That's pretty neat if there's a concept you're struggling with as a student!
EDIT: I guess the focus on the theory might help me.
I was forced to learn a massive amount in a short period of time for work, but I'd previously watched Andrew Ng's lectures, as well as majored in Math/CS. I can also generally recommend Hinton's NN lectures, Socher's Deep Learning for NLP, Andrew Ng's Machine Learning, and a few books.
Top comment here whines that math is hard.