In case you see the cheat sheet and think, "Wow, I'd love to understand that," there's an excellent (albeit challenging) complete course on machine learning in Stanford's "engineering everywhere" online repository. http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a...
Another option is "Programming Collective Intelligence," by Toby Segaran. I read through it recently on a long flight to Australia. It's one of the most straightforward AI books out there, presenting most of these algorithms in just a few pages with nice sample Python code and diagrams. A perfect intro/refresher, and it takes a web developer's perspective on these techniques.
Since reading it I've noticed how many friends have it on their bookshelves.
While it does a great job of explaining many AI concepts in an unintimidating fashion, the Python code in it is rather buggy. On balance, I'd still recommend it as an intro.
All of the algorithms that require training can be optimized using stochastic gradient descent, which is very effective for large data sets (see http://leon.bottou.org/research/stochastic).
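The core idea is to update the parameters after every training example instead of after a full pass over the data. Here's a rough sketch using logistic regression, in plain Python (toy code of my own, not from Bottou's page):

    import math, random

    def sgd_logistic(data, lr=0.1, epochs=10):
        """Minimal SGD for logistic regression: one (x, y) example per update."""
        dim = len(data[0][0])
        w = [0.0] * dim
        for _ in range(epochs):
            random.shuffle(data)                 # visit examples in random order
            for x, y in data:                    # y is 0 or 1
                z = sum(wi * xi for wi, xi in zip(w, x))
                p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
                for i in range(dim):
                    w[i] -= lr * (p - y) * x[i]  # log-loss gradient for one example
        return w

Each update touches only a single example, so memory use stays constant no matter how large the data set gets.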
Also, here are some additions for the online learning column: online SVMs and online mixture models.
The course unfortunately couldn't cover all material on all algorithms, so the cheat sheet basically reflects my own knowledge rather than everything that's possible. I've referenced the Online SVM and Online Mixture model, though. Thanks for those!
Also, I'll have to look into stochastic gradient descent!
KNN "no learning involved": one probaby wants to cross-validate K at the least, if not learn the metric.
For some methods the sheet says online learning isn't applicable. As pointed out elsewhere, the objectives for K-means and mixture models can be fitted with stochastic gradient descent. In general there is always an online option; for example, keep a restricted set of items and chuck out the ones that seem less useful as new ones come in.
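For K-means specifically, the online version boils down to nudging the nearest centroid toward each incoming point, with a step size that shrinks as that centroid sees more data. A toy sketch (my own code, MacQueen-style updates):

    def online_kmeans(points, k):
        """One pass over the data; each point nudges its nearest centroid."""
        centroids = [list(p) for p in points[:k]]   # seed with the first k points
        counts = [1] * k
        for x in points[k:]:
            # assign to the nearest centroid (squared Euclidean distance)
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))
            counts[j] += 1
            step = 1.0 / counts[j]   # shrinking per-centroid step size
            centroids[j] = [a + step * (b - a) for a, b in zip(centroids[j], x)]
        return centroids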
Good point about using cross-validation to learn K; I forgot about that. I've added it to the cheat sheet.
Also, regarding online learning methods, I was probably a bit quick to dismiss certain algorithms as not supporting online learning; in coursework we unfortunately didn't have time to delve into all aspects of all algorithms. I've rewritten the Online column as "To be added." for those algorithms whose online variants I'm not familiar with (yet). Someone else is, of course, free to fork it on GitHub: http://github.com/Emore/mlcheatsheet
Nice summary; I like the format as well. However, the title of the cheat sheet is misleading, since (a) many of the algorithms listed can be used for non-linear classification, and (b) some of them, such as naive Bayes and the perceptron, can be considered supervised learning, as they're trained with sample inputs and expected outputs (supervisory signals).
Otherwise, this is awesome. Hopefully you will add to it, and make it available in web form.
I've changed the title to "Algorithms for Supervised- and Unsupervised Learning", which is definitely more appropriate. Initially the cheat sheet only contained linear classifiers, hence the misleading title.
Fantastic work; I have an ML exam coming up, and this should really help. If I'm honest, it's one of the subjects I've struggled with the most. It seems that experts in the field, while incredibly intelligent, have a hard time breaking the material down into structured and easily digestible pieces of information.