(a) Velleman's "How to Prove It"
(b) Gries and Schneider's "A Logical Approach to Discrete Math"
(c) Calculus (best "lite" book - Calculus by Strang (free download), best "heavy" books - (d) Calculus by Spivak, (e) Principles of Mathematical Analysis a.k.a "Baby Rudin")
(f) Discrete Math (ALADM above) + (g) a good book on Algorithms (Cormen will do, though working through it comprehensively is ... hard!)
(h) Linear Algebra (First work through Strang's book, then (i) Axler's)
(j) Probability (see Bradford's very comprehensive recommendations) and
(k) Statistics (I would recommend Devore and Peck for the total beginner, but it is a damn expensive book. So hit a library or get a bootlegged copy to see if it suits you before buying a copy; see Bradford's list for advanced stuff.)
(l) Information Theory (MacKay's book is freely available online)
(m) AIMA 3rd Edition (I prefer this to Mitchell)
(n) "Pattern Recognition and Machine Learning" by Christopher Bishop,
(o) "Elements of Statistical Learning" (free download).
(p) Neural Network Design by Hagan, Demuth, and Beale,
(q) Neural Networks, A Comprehensive Foundation (2nd edition) - By Haykin (there is a newer edition out but I don't know anything about that, this is the one I used)
(r) Neural Networks for Pattern Recognition (Bishop).
At this point you are in good shape to read any papers in NN. My recommendations: anything by Yann LeCun and Geoffrey Hinton. Both do amazing research.
(s) Reinforcement Learning - An Introduction by Barto and Sutton (follow up with "Recent Advances in Hierarchical Reinforcement Learning" (PDF), which is an old paper but a GREAT introduction to hierarchical reinforcement learning)
(t) Neuro-Dynamic Programming by Bertsekas and Tsitsiklis
(u) Introductory Techniques for 3-D Computer Vision, by Emanuele Trucco and Alessandro Verri.
(v) An Invitation to 3-D Vision by Y. Ma, S. Soatto, J. Kosecka, S.S. Sastry. (warning TOUGH!!)
(w) Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine Learning) - not about robotics per se but useful to understand the next book
(x) Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) by Thrun, Burgard and Fox
PS: I own all these books (except AIMA 3, for which I only have pre-publication PDFs), and if any HN folks in Bangalore want to browse before they buy anything (friggin expensive when you add Amazon's postage), send me an email.
PPS: On languages, I think Bradford is on the money with regard to recommending functional languages. I would just say: also know C well. Saved my ass a few times.
Mike Jordan at Berkeley sent me his list of what people should learn for ML. The list is definitely on the more rigorous side (i.e., aimed more at researchers than practitioners), but going through these books (along with the requisite programming experience) is a useful, if painful, exercise.
I personally think that everyone in machine learning should be
(completely) familiar with essentially all of the material in the
following intermediate-level statistics book:
Casella, G. and Berger, R.L. (2001).
"Statistical Inference"
For a slightly more advanced book that's quite clear on mathematical
techniques, the following book is quite good:
Ferguson, T. (1996).
"A Course in Large Sample Theory"
Chapman & Hall/CRC.
You'll need to learn something about asymptotics at some point, and
a good starting place is:
Lehmann, E. (2004).
"Elements of Large-Sample Theory"
Those are all frequentist books. You should also read something
Bayesian:
"Bayesian Data Analysis"
Chapman & Hall/CRC.
and you should start to read about Bayesian computation:
Robert, C. and Casella, G. (2005).
"Monte Carlo Statistical Methods"
On the probability front, a good intermediate text is:
Grimmett, G. and Stirzaker, D. (2001).
"Probability and Random Processes"
At a more advanced level, a very good text is the following:
Pollard, D. (2001).
"A User's Guide to Measure Theoretic Probability"
The standard advanced textbook is:
Durrett, R. (2005).
"Probability: Theory and Examples"
Machine learning research also reposes on optimization theory.
A good starting book on linear optimization that will prepare
you for convex optimization:
Bertsimas, D. and Tsitsiklis, J. (1997).
"Introduction to Linear Optimization"
And then you can graduate to:
Boyd, S. and Vandenberghe, L. (2004).
"Convex Optimization"
Getting a full understanding of algorithmic linear algebra is
also important. At some point you should feel familiar with
most of the material in
Golub, G., and Van Loan, C. (1996).
"Matrix Computations"
It's good to know some information theory. The classic is:
Cover, T. and Thomas, J.
"Elements of Information Theory"
Finally, if you want to start to learn some more abstract math,
you might want to start to learn some functional analysis (if you
haven't already). Functional analysis is essentially linear algebra
in infinite dimensions, and it's necessary for kernel methods, for
nonparametric Bayesian methods, and for various other topics.
Here's a book that I find very readable:
Kreyszig, E. (1989).
"Introductory Functional Analysis with Applications"
Keep in mind that Mike Jordan is a superhuman math machine. I remember his undergraduate research assistants at Cal telling me that it would take grad students days to understand the five-minute proofs he would do on the fly.
My go-to book for Machine Learning is Christopher Bishop's Pattern Recognition and Machine Learning. I've read that book cover to cover; it's got an excellent foundation and covers all those other books in some capacity.
Can one fit this study list into a lifetime? Seriously, this has been a problem for me for a long time. Any one of the mentioned books would take me months to study. It'd take me a month just to read a textbook, without any toil. Am I too slow, or are there some study/reading techniques that I'm not aware of?
Thanks to liebke for posting this, and Bradford for writing it.
I have two points.
First, your point about programming is incredibly important. I've worked with people who had amazing insights about statistical problems, but went cross-eyed upon being asked about SVN and Git. This makes a CS homework assignment unpleasant, and a real-world research project impossible.
Second, this really begs for another post. It should be called "Learning how to read a textbook on your own." Successful self-learners don't just __read__ a textbook. They toil with it, try proving things on paper themselves, work through exercises, attempt to apply it to some real-world situation, and hunt down someone who's smarter than they are to explain something that seems unclear.
Not all textbooks (certainly not every one on your list) can be read with a great application in mind. A reader must mercilessly interrogate the book on analysis or the rigorous probability book mentioned in the post.
This seems intuitive to someone who can successfully learn on their own, but most people are not taught to do that. The difficulty of replicating a portion of the classroom learning experience is a major barrier to entry. This is why online intro lectures for programming, math, and certain CS topics like algorithms can steer a learner in the right direction. Stanford Engineering Everywhere and, of course, MIT's OCW, links to which have been posted on HN at least once a month, are great starts.
" It should be called "Learning how to read a textbook on your own." Successful self-learners don't just __read__ a textbook. They toil with it, try proving things on paper themselves, work through exercises, attempt to apply it to some real-world situation, and hunt down someone who's smarter than they are to explain something that seems unclear."
I wasted 3 years trying to avoid this bit. On the positive side, once you learn to do this you will never be afraid of any book or paper again.
For anyone doing the self-learning thing, right now there are 24 people who just started learning Stanford CS229 Machine Learning here: http://www.crunchcourse.com/class/stanford-cs229-machine-lea... (disclosure: Crunch Course is my website. I just thought it might be a good resource for people taking hamilton's advice.)
I like his language toolbelt for this kind of work.
At a minimum, I would recommend learning python (numpy/scipy), R, and at least one nice functional language (probably Haskell, Clojure, or OCaml).
This is effectively what I use as well. Python as a general-purpose data-munging language that's good for all of your dirty work whenever you need it. R for graphing, graphing, graphing, running statistical tests other people already wrote and foolproofed, database munging, and then more graphing. Haskell for prototyping and reasoning with types, and then for that occasional algorithm that screams for a functional implementation, or the not-so-occasional one that requires more speed than Python can provide.
I also write a few things in C/C++, though I try to avoid it. It's mostly there for standing on the backs of other people and that occasional need to blaze.
There is a website http://software-carpentry.org/ intended to teach scientists the basics of reliable programming. It's python based and covers version control, debugging, shells, testing, and the basics of databases and other stuff that would be useful for scientists without computer training. Since a lot of practical stuff like this isn't covered in any systematic way in most comp sci programs, it is useful even for people who have programming experience.
Interesting on the one hand, but is anybody seriously going to go through those books one by one now? Personally, I have trouble going through just one book (Pattern Recognition by Bishop atm), and even that might be useless without practical application. I managed to eventually read through MacKay (an enjoyable book, and available as a free PDF, too) and feel I have already forgotten most of it again :-(
Another way might be to just get going, and pick up knowledge on the way?
"but is anybody seriously going to go through those books one by one now? Personally I have troubles going through just one book (Pattern Recognition by Bishop atm), and even that might be useless without practical application"
Working through CLRS completely is a very time-consuming task; I think Bradford intended that book as a reference. But yes, you need to work through some of the stuff in order. For example, you need to be fairly conversant in linear algebra, probability, and proof technique before you can tackle Bishop, or else you won't make much progress. Once you get some basics under you (especially the underlying math), you'll be able to read through an ML book the way you can read through a moderately tough book on programming.
" I managed to eventually read through MacKay (enjoyable book and available as a free PDF, too) and feel I have already forgotten most of it again :-("
The best way to learn this stuff is to have an eventual project in mind. I ended up learning most of this stuff because I was working on a robotics project for the fine folks in the Indian Defence Depts and was very much "thrown in at the deep end." Nothing like it to accelerate learning, but I wouldn't wish to do it again. For the first few weeks I (literally) couldn't understand a single sentence in an hour-long meeting. Very humbling.
Depending on what exactly you wish to do, you may be able to avoid many of the books. If you think I can help you narrow it down to a smaller list, please ask here or send me an email (my email id is in my profile).
But yes, in the end Norvig's point applies here too (as Bradford points out). I have been working in ML for 8 years now, so still 2 years to go :-P
OTOH, I am just a programmer who got bored with enterprise software and has no formal training in math (or CS, for that matter), and if I can do it, anyone (certainly anyone on HN) can.
To get the spirit of many of these techniques, with practical examples in Python using NumPy/SciPy, check out the book "Machine Learning: An Algorithmic Perspective" by Stephen Marsland. It doesn't have the mathematical depth or proofs found in these other books, but the code is decent and will get you started doing some basic data analysis.
My advice is not to try to read through these books (like Elements or Bishop's book). If I were to learn the topic from scratch again,* I would:
1. Learn basic terminology (basically, skim the chapters and understand roughly what the topics are)
2. Work on a problem in depth. You are probably interested in a certain area or type of problem.
a. Read the relevant chapters in detail.
b. Pick up the necessary math along the way using additional references. This way you are motivated to learn it (whether it be calculus, probability, or linear algebra). E.g., it would be hard to approach McDiarmid's Inequality and be able to imagine its use. However, if you run across it in a book/paper you'll understand the context.
c. Lastly, check out recent NIPS, ICML, and JMLR papers on the topic (nips.cc and jmlr.org; ICML isn't centralized, but each year's conference can probably be found online).
* - I am a graduate student and have been studying statistical machine learning for the last 3 years.
I found this useful and interesting, because many social phenomena are not normally distributed (they do not fit a Gaussian bell-curve distribution, regardless of the size of the sample).
I am interested in learning more about non-parametric statistics, and statistics using alternative distributions (e.g. stable distributions, power laws, etc.). Does anyone have good references for this kind of statistics (preferably written in English as opposed to jargon)?
Why the second edition of Intro to Algorithms? Why not the third? Also, considering that Intro to Algorithms is one of the "Books Programmers Claim to Have Read" (http://www.billthelizard.com/2008/12/books-programmers-dont-...), those planning to read it should note that it is best learned in a classroom setting where you are forced to work through the problems.
Finally, it's interesting that he uses Amazon referral links for all these books; hopefully profit opportunities didn't taint the items on his list!
The 2nd edition can be had for less than $30 used. The cheapest you can get the 3rd edition for is ~$60. For $30 you can pick up a linear algebra textbook used. If somebody had only $60 to spend, it's a no-brainer to pick the 2nd edition + a linear algebra textbook.
Someone in the blog comments pointed out that the 3rd edition was out. I hadn't noticed, so thanks.
I agree that the CLR book is best learned by working through problems. For those who may not have retained as much from classroom work as they would have liked, I think needing to use algorithms and programming them yourself also works very well. In that vein, CLR is also a great reference text.
I don't make any money from Amazon. Believe me, my friend, it is purely the other way around. ;-)
The publishers have a contract with distributors in those poorer countries that prohibits those distributors from distributing the books here. But anybody who DOESN'T have a contract with the book publisher can buy from the distributor and sell them here, unless their local laws prohibit that. And nothing in OUR laws prohibits buying them; by now it's well established that you cannot really enforce laws against buying books.
Do you want the international edition? I have a few of them, and they are better than nothing. But, the paper is so thin that I get distracted by the words/figures on the other side of the page.