Mike Jordan at Berkeley sent me his list of what people should learn for ML. The list is definitely on the more rigorous side (i.e., aimed more at researchers than practitioners), but going through these books (along with the requisite programming experience) is a useful, if painful, exercise.
I personally think that everyone in machine learning should be
(completely) familiar with essentially all of the material in the
following intermediate-level statistics book:
Casella, G. and Berger, R.L. (2001).
"Statistical Inference"
For a slightly more advanced book that's quite clear on mathematical
techniques, the following book is quite good:
Ferguson, T. (1996).
"A Course in Large Sample Theory"
Chapman & Hall/CRC.
You'll need to learn something about asymptotics at some point, and
a good starting place is:
Lehmann, E. (2004).
"Elements of Large-Sample Theory"
Those are all frequentist books. You should also read something Bayesian:
Gelman, A. et al. (2003).
"Bayesian Data Analysis"
Chapman & Hall/CRC.
and you should start to read about Bayesian computation:
Robert, C. and Casella, G. (2005).
"Monte Carlo Statistical Methods"
On the probability front, a good intermediate text is:
Grimmett, G. and Stirzaker, D. (2001).
"Probability and Random Processes"
At a more advanced level, a very good text is the following:
Pollard, D. (2001).
"A User's Guide to Measure Theoretic Probability"
The standard advanced textbook is:
Durrett, R. (2005).
"Probability: Theory and Examples"
Machine learning research also rests on optimization theory.
A good starting book on linear optimization that will prepare
you for convex optimization:
Bertsimas, D. and Tsitsiklis, J. (1997).
"Introduction to Linear Optimization"
And then you can graduate to:
Boyd, S. and Vandenberghe, L. (2004).
"Convex Optimization"
Getting a full understanding of algorithmic linear algebra is
also important. At some point you should feel familiar with
most of the material in:
Golub, G. and Van Loan, C. (1996).
"Matrix Computations"
It's good to know some information theory. The classic is:
Cover, T. and Thomas, J.
"Elements of Information Theory"
Finally, if you want to start to learn some more abstract math,
you might want to start to learn some functional analysis (if you
haven't already). Functional analysis is essentially linear algebra
in infinite dimensions, and it's necessary for kernel methods, for
nonparametric Bayesian methods, and for various other topics.
Here's a book that I find very readable:
Kreyszig, E. (1989).
"Introductory Functional Analysis with Applications"
Can one fit this study list into a lifetime? Seriously, this has been a problem for me for a long time. Any one of these books would take me months to study. It takes me a month just to read a textbook, without any of the real toil. Am I too slow, or are there study/reading techniques that I'm not aware of?
Keep in mind that Mike Jordan is a superhuman math machine. I remember his undergraduate research assistants at Cal telling me that it would take grad students days to understand the 5-minute proofs he would do on the fly.
My go-to book for machine learning is Christopher Bishop's "Pattern Recognition and Machine Learning". I've read it cover to cover; it gives an excellent foundation and covers the material of all those other books to some extent.