

Show HN: Machine learning cheat sheet - Emore
http://eferm.com/machine-learning-cheat-sheet

======
teuobk
In case you see the cheat sheet and think, "Wow, I'd love to understand that,"
there's an excellent (albeit challenging) complete course on machine learning
in Stanford's "engineering everywhere" online repository.
[http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a...](http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1)

~~~
natfriedman
Another option is "Programming Collective Intelligence," by Toby Segaran. I
read through it recently on a long flight to Australia. It's one of the most
straightforward AI books out there, presenting most of these algorithms in
just a few pages with nice sample Python code and diagrams. A perfect
intro/refresher, and it takes a web developer perspective on these techniques.

Since reading it I've noticed how many friends have it on their bookshelves.

Here's a link: <http://oreilly.com/catalog/9780596529321>

~~~
klochner
I haven't read the COIN book, but if you want to get aggressive you can go for
"Elements of Statistical Learning".

Free PDF download, probably not a one-flight book:

<http://www-stat.stanford.edu/~tibs/ElemStatLearn/>

side note: Nat, did you intern at SGI in the late 90s, as the self-titled
"armchair programmer of the apocalypse"?

------
iskander
All of the algorithms that require training can be optimized using stochastic
gradient descent, which is very effective for large data sets (see
<http://leon.bottou.org/research/stochastic>).
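
A minimal sketch of the idea, assuming a plain squared-loss linear model (the
function and data here are illustrative, not from the linked paper): each step
updates the weights from a single randomly chosen example, which is what makes
it cheap on large data sets.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100, seed=0):
    """Stochastic gradient descent for least-squares linear regression."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # visit the training examples one at a time, in random order
        for i in rng.permutation(len(X)):
            # gradient of 0.5 * (w.x_i - y_i)^2 with respect to w
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# recover the weights of y = 2*x0 - 3*x1 from synthetic, noiseless data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])
w = sgd_linear_regression(X, y)
```

On this noiseless toy problem the estimate lands very close to the true
weights; with real data you would decay the learning rate over time.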

Also, here are some additions for the online learning column:

* Online SVM: <http://www.springerlink.com/index/Y8666K76P6R5L467.pdf>

* Online Gaussian mixture estimation: [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.1698&rep=rep1&type=pdf)

One more thing: why no random forests? Or decision tree ensembles of any sort?

~~~
Emore
Thanks for the comments!

The course unfortunately couldn't cover all material on all algorithms, so the
cheat sheet basically reflects my own knowledge rather than what's possible.
I've referenced the Online SVM and Online Mixture model though, thanks for
those.

Also, I'll have to look into stochastic gradient descent!

------
imurray
KNN _"no learning involved"_: one probably wants to cross-validate K at the
least, if not learn the metric.
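
A quick sketch of that point, assuming binary labels and Euclidean distance
(function names are illustrative): even "learning-free" k-NN has a
hyperparameter, K, and held-out accuracy is a standard way to choose it.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    # Euclidean distances from every test point to every training point
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # majority vote among the k nearest labels (labels assumed 0/1)
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def cross_validate_k(X, y, ks, folds=5, seed=0):
    """Pick K by average held-out accuracy over the folds."""
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(X)), folds)
    scores = {}
    for k in ks:
        accs = []
        for f in range(folds):
            test = splits[f]
            train = np.concatenate([splits[g] for g in range(folds) if g != f])
            pred = knn_predict(X[train], y[train], X[test], k)
            accs.append((pred == y[test]).mean())
        scores[k] = float(np.mean(accs))
    return max(scores, key=scores.get), scores

# two well-separated Gaussian blobs as toy data
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
best_k, scores = cross_validate_k(X, y, ks=[1, 3, 5, 7])
```

Learning the metric itself goes further still, but the same cross-validation
loop applies.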

Some methods say online learning isn't applicable. As pointed out elsewhere,
objectives for K-means and mixture models could be fitted with stochastic
gradient descent. In general there is always an online option. For example,
keep a restricted set of items and chuck out ones that seem less useful as
others come in.

(Aside: I have a _very_ introductory lecture to machine learning on the web:
<http://videolectures.net/bootcamp2010_murray_iml/> — not for anyone who
knows the methods on this cheat sheet!)

~~~
Emore
Thanks for the comments!

Good point about using cross-validation to learn K, I forgot about that. I
added this to the cheat sheet.

Also regarding online learning methods, I was probably a bit quick to dismiss
certain algorithms as not supporting online learning; in coursework we
unfortunately didn't have time to delve into all aspects of all algorithms.
I've rewritten the Online column as "To be added." for those online methods
I'm not familiar with (yet). Someone else is, of course, free to fork it on
GitHub: <http://github.com/Emore/mlcheatsheet>

------
cloudkj
Nice summary; I like the format as well. However, the title of the cheat sheet
is misleading since (a) many of the algorithms listed can be used for non-
linear classification and (b) some of them can be considered supervised
learning, such as naive Bayes and perceptron since they're trained with sample
inputs and expected outputs (supervisory signals).
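
Point (b) can be made concrete with the classic perceptron rule, sketched
below on illustrative toy data: every update needs the true label y_i (the
supervisory signal), not just the input x_i, which is exactly what makes it
supervised.

```python
import numpy as np

def perceptron(X, y, epochs=20):
    """Classic mistake-driven perceptron; labels y are in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # the update fires only when the *known* label disagrees
            # with the current prediction -- no label, no learning
            if yi * (xi @ w + b) <= 0:
                w += yi * xi
                b += yi
    return w, b

# linearly separable toy data
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
```

On separable data like this the rule is guaranteed to converge to a
separating hyperplane.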

Otherwise, this is awesome. Hopefully you will add to it, and make it
available in web form.

~~~
Emore
Thanks for the feedback!

I've changed the title to "Algorithms for Supervised- and Unsupervised
Learning", which is definitely more appropriate. Initially the cheat sheet
only contained linear classifiers, hence the misleading title.

------
ses
Fantastic work, I have an ML exam coming up and this should really help. If
I'm honest, it's one of the subjects I've struggled with the most. It seems
that experts in the field, while incredibly intelligent, have a hard time
breaking the material down into structured and easily digestible pieces of
information.

------
MatthewB
No idea what I'm looking at, but it definitely looks cool.

------
axxl
I'm taking this class next semester, so I downloaded it; hopefully I'll
understand it later and it will come in handy. Thanks!

