The only thing I'd probably add is that there's a pretty significant gap going from learning linear algebra to more advanced topics such as LDA.
For people who are just getting started with machine learning, it's probably best to begin by implementing some of the more "intuitive" algorithms such as decision trees, k-means, and naive Bayes before moving on to some of the more recent academic work.
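For example, k-means fits in a dozen lines of NumPy. Here's a rough sketch (the random initialization and convergence check are deliberately simplistic; it's meant to show the two alternating steps, not to be production code):

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        # Plain k-means: alternate the assignment and centroid-update steps.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Assignment: each point joins its nearest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update: each centroid moves to the mean of its members
            # (empty clusters keep their old centroid).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        return centroids, labels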
Other things that are pretty useful, but often forgotten, are feature selection, data normalization, and even data visualization. Algorithms are usually just one part of machine learning, and even the best algorithm won't get you anywhere without identifying the best features of your data first.
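To make that concrete, here's a minimal scikit-learn sketch of a normalize-then-select pipeline; the dataset and k=10 are arbitrary placeholders, not recommendations:

    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.naive_bayes import GaussianNB
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    pipe = Pipeline([
        ("scale", StandardScaler()),               # zero mean, unit variance
        ("select", SelectKBest(f_classif, k=10)),  # keep the 10 most informative features
        ("clf", GaussianNB()),
    ])
    print(cross_val_score(pipe, X, y, cv=5).mean())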
Still, it's a great list of more advanced topics, and definitely something I'll keep bookmarked for future reference.
To bridge the gap between naive Bayes and LDA, I would recommend going from k-means to EM and then from EM to variational Bayes. K-means to EM is covered in chapters 20 (pp. 286-), 22 (pp. 300-) and 33 (pp. 422-) of MacKay's ITILA [1] (excellent and free book, BTW). I also recommend learning about (i.e., applying to something) the junction tree algorithm, because you will have to brush up on graph theory. Also, do more convex optimization beforehand than I did, or you will have to catch up: take a full course or work through a full book on it.
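To see why k-means is a good on-ramp to EM: EM for a Gaussian mixture is essentially k-means with soft assignments. A rough NumPy/SciPy sketch for intuition (no log-space tricks, so don't expect numerical robustness):

    import numpy as np
    from scipy.stats import multivariate_normal

    def gmm_em(X, k, n_iters=50, seed=0):
        # EM for a Gaussian mixture: a "soft" k-means.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        means = X[rng.choice(n, size=k, replace=False)]
        covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
        weights = np.full(k, 1.0 / k)
        for _ in range(n_iters):
            # E-step: soft responsibilities replace k-means' hard labels.
            resp = np.column_stack([
                w * multivariate_normal.pdf(X, mean=m, cov=c)
                for w, m, c in zip(weights, means, covs)])
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: responsibility-weighted versions of the centroid update.
            nk = resp.sum(axis=0)
            weights = nk / n
            means = (resp.T @ X) / nk[:, None]
            for j in range(k):
                diff = X - means[j]
                covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        return weights, means, covs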
For LDA you'll need to understand Dirichlet processes; I find the introduction by Frigyik et al. [2] to be excellent for that. You may need to read A Measure Theory Tutorial (Measure Theory for Dummies) by Gupta [3] first. Finally, here are the two LDA papers that were most influential for me: [4] and then [5].
In case the measure theory put anyone off: you don't need Dirichlet processes for plain LDA, just the finite-dimensional Dirichlet distribution (http://en.wikipedia.org/wiki/Dirichlet_distribution), which isn't so bad and is a very useful tool in Bayesian stats as the conjugate prior for discrete observations.
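The conjugacy really is one line of arithmetic: with a Dirichlet(alpha) prior over category probabilities and observed multinomial counts, the posterior is just Dirichlet(alpha + counts). The toy numbers below are made up:

    import numpy as np

    alpha = np.array([1.0, 1.0, 1.0])   # symmetric Dirichlet prior over 3 categories
    counts = np.array([10, 2, 5])       # observed category counts
    posterior = alpha + counts          # conjugate update: just add the counts
    print(posterior / posterior.sum())  # posterior mean of the category probabilities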
For some of the non-parametric variants like hierarchical Dirichlet process LDA you do need DPs, but that stuff is pretty hardcore -- don't run before you can walk.
Another route to LDA (assumes some Bayesian stats basics):
* Learn a bit about Markov chains if you don't know them already
* Read up on sampling-based approximate inference methods and find a proof that a Gibbs sampler converges (or just take it on trust...)
* Read the classic Griffiths and Steyvers paper deriving a collapsed Gibbs sampler for LDA [1] (a sketch of such a sampler follows below)
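For reference, a minimal sketch of a collapsed Gibbs sampler in that spirit (symmetric alpha/beta hyperparameters picked arbitrarily; unoptimized, for intuition only):

    import numpy as np

    def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                  n_iters=200, seed=0):
        # docs: list of documents, each a list of word ids in [0, vocab_size).
        rng = np.random.default_rng(seed)
        z = [rng.integers(n_topics, size=len(doc)) for doc in docs]  # topic per token
        ndk = np.zeros((len(docs), n_topics))   # doc-topic counts
        nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
        nk = np.zeros(n_topics)                 # topic totals
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
        for _ in range(n_iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    t = z[d][i]
                    # Remove this token's current assignment from the counts.
                    ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
                    # Collapsed full conditional over topics (up to normalization).
                    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                    t = rng.choice(n_topics, p=p / p.sum())
                    z[d][i] = t
                    ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1
        # Posterior mean estimate of the topic-word distributions.
        phi = (nkw + beta) / (nk[:, None] + vocab_size * beta)
        return phi, ndk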
Thanks! Sure, I didn't mean to imply that once you learn linear algebra you are all done - just that you really need it, and it will make your life much easier once you know it.
Yeah, I didn't include a ton of stuff - nothing on EM, trees, boosting, etc. Probably the biggest thing missing is something explicit about regularization. Maybe I will add something there.
Please feel free to add stuff you think others might be interested in down in the comments section of the post.
Thanks again for the comments and for taking the time to read through it, and don't forget to sign up for a free account!
Sorry for the slightly off-topic comment, but you should avoid shortening 'latent Dirichlet allocation' to LDA without context, because it also means 'linear discriminant analysis'. Linear discriminant analysis has historically been an important topic in general machine learning.
I was wondering, could you list some of the more recent academic works? I've brushed up on the basics and feel pretty comfortable with them, so I want to try something a bit more advanced.
I've spent the last 1.5 years as a machine learning PhD student slowly discovering many of these resources and topics, and I wish I had had this list at the beginning - it contains most of the gems I've found. I'd add that the PGM course on Coursera clearly explains the fundamental topics in probabilistic graphical models.
It's important to understand individual algorithms, but in many ways it's more important to have a broad overview of the field and its more modern methods, so that given a problem it's possible to think about the best way to solve it, and to share a common language with others who may have ideas. Beyond this list and various online courses, I've found that talking to people about their work and having them explain the high-level concepts of every black-box classifier or similarity metric or whatever it is they use has been quite educational.
I don't think it measures up as much. It's basically a dumbed-down version of his actual Stanford course. If you have any experience in math and linear algebra, you should take a more serious course like his lectures on YouTube or Caltech's course on YouTube. The ideal audience of the Coursera course is non-math people who want to get an idea of what machine learning is.
I reached out to Yann LeCun and he emailed me a couple more recent links. I updated the deep learning section of the post to include them. Feel free to check them out.