
Learning About Statistical Learning - liebke
http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html
======
hamilton
Thanks to liebke for posting this, and Bradford for writing it.

I have two points.

First, your point about programming is incredibly important. I've worked with
people who had amazing insights about statistical problems, but went cross-
eyed upon being asked about SVN and Git. This makes a CS homework assignment
unpleasant, and a real world research project impossible.

Second, this really begs for another post. It should be called "Learning how to
read a textbook on your own." Successful self-learners don't just _read_ a
textbook. They toil with it, try proving things on paper themselves, work
through exercises, attempt to apply it to some real-world situation, and hunt
down someone who's smarter than they are to explain something that seems
unclear.

Not all textbooks - certainly not every one on your list - can be read with a
great application in mind. A reader must mercilessly interrogate the book on
analysis or the rigorous probability book mentioned in the post.

This seems intuitive to someone who can successfully learn on their own, but
most people are not taught to do that. The difficulty of replicating a
portion of the classroom learning experience is a major barrier to entry. This
is why online intro lectures for programming, math, and certain CS topics like
algorithms can steer a learner in the right direction. Stanford Engineering
Everywhere and of course MIT's OCW, links to which have been posted on HN at
least once a month, are great starts.

~~~
imp
For anyone doing the self-learning thing, right now there are 24 people who
just started learning Stanford CS229 Machine Learning here:
[http://www.crunchcourse.com/class/stanford-cs229-machine-
lea...](http://www.crunchcourse.com/class/stanford-cs229-machine-
learning/2010/jan/) (disclosure: Crunch Course is my website. I just thought
it might be a good resource for people taking hamilton's advice.)

~~~
vdm
I didn't know about the idea of 'social learning'. I appreciate the link (and
your full disclosure).

~~~
imp
Yeah, it's kind of an experiment. I think some social interaction and
accountability can help a lot with self-guided learning. We'll see though.

------
pskomoroch
Nice lists, I often recommend these for people who want an introduction to the
field:

"Mathematical Statistics and Data Analysis" by John A. Rice

"All of Statistics: A Concise Course in Statistics" by Larry Wasserman

"Pattern Recognition and Machine Learning" by Christopher M. Bishop

"The Elements of Statistical Learning" by T. Hastie et al <http://www-
stat.stanford.edu/~tibs/ElemStatLearn/>

"Information Theory, Inference, and Learning Algorithms", David MacKay
<http://www.inference.phy.cam.ac.uk/itprnn/book.html>

"Introduction to Information Retrieval" - Manning et al.
[http://nlp.stanford.edu/IR-book/information-retrieval-
book.h...](http://nlp.stanford.edu/IR-book/information-retrieval-book.html)

"The Algorithm Design Manual, 2nd Edition" - Steven Skiena
<http://www.algorist.com/>

------
tel
I like his language toolbelt for this kind of work.

 _At a minimum, I would recommend learning python (numpy/scipy), R, and at
least one nice functional language (probably Haskell, Clojure, or OCaml)._

This is effectively what I use as well. Python as a general purpose data
munging language that's good for all of your dirty work whenever you need it. R
for graphing, graphing, graphing, running statistical tests other people
already wrote and foolproofed, database munging, and then more graphing.
Haskell for prototyping and reasoning with types and then that occasional
algorithm that screams for functional implementation or the not so occasional
one that requires more speed than Python can provide.

I also write a few things in C/C++, though I try to avoid it. It's mostly
there for standing on the backs of other people and that occasional need to
blaze.

~~~
billswift
There is a website <http://software-carpentry.org/> intended to teach
scientists the basics of reliable programming. It's Python-based and covers
version control, debugging, shells, testing, and the basics of databases and
other stuff that would be useful for scientists without computer training.
Since a lot of practical stuff like this isn't covered in any systematic way
in most comp sci programs, it is useful even for people who have programming
experience.

------
Tichy
Interesting on the one hand, but is anybody seriously going to go through
those books one by one now? Personally I have troubles going through just one
book (Pattern Recognition by Bishop atm), and even that might be useless
without practical application. I managed to eventually read through MacKay
(enjoyable book and available as a free PDF, too) and feel I have already
forgotten most of it again :-(

Another way might be to just get going, and pick up knowledge on the way?

~~~
plinkplonk
"but is anybody seriously going to go through those books one by one now?
Personally I have troubles going through just one book (Pattern Recognition by
Bishop atm), and even that might be useless without practical application"

Working through CLRS completely is a _very_ time-consuming task. I think
Bradford intended that book as a reference, but yes, you need to work through
some of the stuff in order. For example, you need to be fairly conversant in
linear algebra, probability, and proof technique before you can tackle Bishop,
else you won't make much progress. Once you get some basics under you
(especially the underlying math stuff) you'll end up being able to read
through an ML book the way you can read through a moderately tough book in
programming.

"I managed to eventually read through MacKay (enjoyable book and available as
a free PDF, too) and feel I have already forgotten most of it again :-("

The best way to learn this stuff is to have an eventual project in mind. I
ended up learning most of this stuff because I was working on a Robotics
project for the fine folks in the Indian Defence Depts and was very much
"thrown in at the deep end" - nothing like it to accelerate learning but I
wouldn't wish to do it again. For the first few weeks I couldn't (literally)
understand a single sentence in an hour long meeting. Very humbling.

Depending on what exactly you wish to do, you may be able to avoid many of the
books. If you think I can help you narrow down to a smaller list, please ask
here or send me email (my email id is in my profile).

But yes, in the end Norvig's point applies here too (as Bradford points out). I
have been working in ML for 8 years now, so still 2 years to go :-P

OTOH I am just a programmer who got bored with enterprise software and have no
formal training in math (or CS for that matter) and if I can do it anyone
(certainly anyone on HN) can.

~~~
Tichy
I wish there was a more hands on introduction, somehow. Having read through
MacKay, I didn't feel as if I could just approach a company and offer to do
data analysis for them.

I suppose I should come up with my own projects, and I have some ideas, but
they always have a huge question mark at the beginning.

~~~
pskomoroch
To get the spirit of many of these techniques with practical examples in
python using numpy/scipy check out the book "Machine Learning: An Algorithmic
Perspective" by Stephen Marsland. It doesn't have the mathematical depth or
proofs found in these other books, but the code is decent and will get you
started doing some basic data analysis.

[http://www.amazon.com/Machine-Learning-Algorithmic-
Perspecti...](http://www.amazon.com/Machine-Learning-Algorithmic-Perspective-
Recognition/dp/1420067184)

Code here: <http://www-ist.massey.ac.nz/smarsland/MLbook.html>

------
justokay
Mike Jordan at Berkeley sent me his list on what people should learn for ML.
The list is definitely on the more rigorous side (i.e., aimed more at researchers
than practitioners), but going through these books (along with the requisite
programming experience) is a useful, if painful, exercise.

I personally think that everyone in machine learning should be (completely)
familiar with essentially all of the material in the following intermediate-
level statistics book:

1.) Casella, G. and Berger, R.L. (2001). "Statistical Inference" Duxbury
Press.

For a slightly more advanced book that's quite clear on mathematical
techniques, the following book is quite good:

2.) Ferguson, T. (1996). "A Course in Large Sample Theory" Chapman & Hall/CRC.

You'll need to learn something about asymptotics at some point, and a good
starting place is:

3.) Lehmann, E. (2004). "Elements of Large-Sample Theory" Springer.

Those are all frequentist books. You should also read something Bayesian:

4.) Gelman, A. et al. (2003). "Bayesian Data Analysis" Chapman & Hall/CRC.

and you should start to read about Bayesian computation:

5.) Robert, C. and Casella, G. (2005). "Monte Carlo Statistical Methods"
Springer.

On the probability front, a good intermediate text is:

6.) Grimmett, G. and Stirzaker, D. (2001). "Probability and Random Processes"
Oxford.

At a more advanced level, a very good text is the following:

7.) Pollard, D. (2001). "A User's Guide to Measure Theoretic Probability"
Cambridge.

The standard advanced textbook is Durrett, R. (2005). "Probability: Theory and
Examples" Duxbury.

Machine learning research also reposes on optimization theory. A good starting
book on linear optimization that will prepare you for convex optimization:

8.) Bertsimas, D. and Tsitsiklis, J. (1997). "Introduction to Linear
Optimization" Athena.

And then you can graduate to:

9.) Boyd, S. and Vandenberghe, L. (2004). "Convex Optimization" Cambridge.

Getting a full understanding of algorithmic linear algebra is also important.
At some point you should feel familiar with most of the material in

10.) Golub, G., and Van Loan, C. (1996). "Matrix Computations" Johns Hopkins.

It's good to know some information theory. The classic is:

11.) Cover, T. and Thomas, J. "Elements of Information Theory" Wiley.

Finally, if you want to start to learn some more abstract math, you might want
to start to learn some functional analysis (if you haven't already).
Functional analysis is essentially linear algebra in infinite dimensions, and
it's necessary for kernel methods, for nonparametric Bayesian methods, and for
various other topics. Here's a book that I find very readable:

12.) Kreyszig, E. (1989). "Introductory Functional Analysis with Applications"
Wiley.

~~~
ptuzla
Can one fit this study list in a lifetime? Seriously, this has been a problem
for me for a long time. Any one of the mentioned books would take me months to
study. It'd take me a month to just read a textbook, without any toil. Am I
too slow, or are there some study/reading techniques that I'm not aware of?

~~~
vdm
Just so you know, you're not the only one who feels this way. There's no way
you could carry that load AND a day job/family.

Maybe tools like incanter will bridge the gap and let application software
practitioners put this research to work.

------
Perceval
Just yesterday I read a book on non-parametric statistics:
[http://www.amazon.com/gp/product/047045461X/ref=oss_T15_prod...](http://www.amazon.com/gp/product/047045461X/ref=oss_T15_product)

I found this useful and interesting, because a great deal of social phenomena
are not normally distributed (do not fit a Gaussian bell-curve distribution,
regardless of the size of the sample).

I am interested in learning more about non-parametric statistics, and
statistics using alternative distributions (e.g. stable distributions, power
laws, etc.). Does anyone have good references for this kind of statistics
(preferably written in English as opposed to jargon)?
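
As a rough illustration of what "distribution-free" means in practice (my own
sketch, not taken from the book above): a percentile bootstrap puts an interval
around the median without assuming the data are Gaussian. Everything here is
an assumption for illustration: the function name `bootstrap_median_ci`, the
synthetic Pareto sample standing in for heavy-tailed data, the seeds, and the
resample count.

```python
import random
import statistics


def bootstrap_median_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median.

    Hypothetical helper for illustration only. It resamples the data with
    replacement and reads the interval off the sorted resampled medians,
    so no distributional form (Gaussian or otherwise) is assumed.
    """
    rng = random.Random(seed)
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]


# Synthetic heavy-tailed sample (Pareto, shape 1.5): the variance is infinite,
# so mean-and-standard-deviation summaries are unreliable here.
sample_rng = random.Random(42)
sample = [sample_rng.paretovariate(1.5) for _ in range(500)]

low, high = bootstrap_median_ci(sample)
print(f"sample median: {statistics.median(sample):.3f}")
print(f"95% bootstrap interval for the median: [{low:.3f}, {high:.3f}]")
```

The median is used instead of the mean because it stays stable even when the
tail is heavy enough that the variance does not exist.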

~~~
bradfordcross
[http://www.amazon.com/All-Nonparametric-Statistics-
Springer-...](http://www.amazon.com/All-Nonparametric-Statistics-Springer-
Texts/dp/0387251456)

------
kaddar
This is a good list.

That said,

Why second edition intro to algs? Why not third? Also, intro to algs is one of
the "Books Programmers Claim to Have Read"
[http://www.billthelizard.com/2008/12/books-programmers-
dont-...](http://www.billthelizard.com/2008/12/books-programmers-dont-really-
read.html), so those planning to read it should note that it is best learned
in a classroom setting where you are forced to work through the problems.

Finally, it's interesting that he uses Amazon referral links for all these
books; hopefully profit opportunities didn't taint the items on his list!

~~~
bradfordcross
Someone pointed out the 3rd ed was out in the blog comments. I hadn't noticed,
so thanks.

I agree that the CLR book is best learned by working through problems. For
those who may not have retained as much as they would have liked from
classroom work, I think needing to use algorithms and programming them
yourself also works very well. In that vein, CLR is also a great reference
text.

I don't make any money from Amazon. Believe me, my friend, it is purely the
other way around. ;-)

~~~
kaddar
Ok awesome, just wanted to check :)

------
plinkplonk
I wrote a supplementary blog post to Brad's (
[http://pindancing.blogspot.com/2010/01/learning-about-
machin...](http://pindancing.blogspot.com/2010/01/learning-about-machine-
learniing.html)) with the list of books I found useful (no amazon referral
links if anyone is worried) with brief descriptions of each. I work in
somewhat different domains and so have a different list of books.

I hope someone finds this useful.

For those who want a summary,

Proof Technique

(a)Velleman's "How to Prove It" (b)Gries and Schneider's "A Logical Approach
to Discrete Math"

Math

(c) Calculus (best "lite" book - Calculus by Strang (free download), best
"heavy" books - (d) Calculus by Spivak, (e) Principles of Mathematical
Analysis a.k.a "Baby Rudin")

(f) Discrete Math (ALADM above + (g) a good book on Algorithms, Cormen will do
- though working through it comprehensively is ... hard!)

(h) Linear Algebra (First work through Strang's book, then (i) Axler's)

(j) Probability (see Bradford's very comprehensive recommendations) and

(k) Statistics (I would recommend Devore and Peck for the total beginner but
it is a damn expensive book. So hit a library or get a bootlegged copy to see
if it suits you before buying a copy; see Brad's list for advanced stuff.)

(l) Information Theory (MacKay's book is freely available online)

Basic AI

(m) AIMA 3rd Edition (I prefer this to Mitchell)

Machine Learning

(n) "Pattern Recognition and Machine Learning" by Christopher Bishop,

(o)"Elements of Statistical Learning" (free download).

(p) Neural Network Design by Hagan, Demuth and Beale,

(q) Neural Networks, A Comprehensive Foundation (2nd edition) - By Haykin
(there is a newer edition out but I don't know anything about that, this is
the one I used)

(r) Neural Networks for Pattern Recognition ( Bishop).

At this point you are in good shape to read any papers in NN. My
recommendations - anything by Yann LeCun and Geoffrey Hinton. Both do amazing
research.

Reinforcement Learning

(s) Reinforcement Learning - An Introduction by Barto and Sutton (follow up
with "Recent Advances In reinforcement Learning" (PDF) which is an old paper
but a GREAT introduction to _Hierarchical_ Reinforcement learning)

(t) Neuro Dynamic Programming by Bertsekas

Computer Vision

(u) Introductory Techniques for 3-D Computer Vision, by Emanuele Trucco and
Alessandro Verri.

(v) An Invitation to 3-D Vision by Y. Ma, S. Soatto, J. Kosecka, S.S. Sastry.
(warning TOUGH!!)

Robotics.

(w) Probabilistic Graphical Models: Principles and Techniques (Adaptive
Computation and Machine Learning) - not about robotics per se but useful to
understand the next book

(x) Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) by
Thrun, Burgard and Fox

PS: I own all these books (except AIMA 3 for which I only have pre pub pdfs)
and if any HN folks in Bangalore want to browse before they buy anything
(friggin expensive when you add amazon's postage) send me email.

PPS: on languages, I think Bradford is on the money with regards to
recommending functional languages. I would just say, also know C well. Saved
my ass a few times.

~~~
bradfordcross
Great stuff. This and other feedback is helping to shape the revision to my
post only a couple hours after I made it. Second edition iterations are so
much faster on the web. :-)

------
FraaJad
Definitely a very good list. But also an expensive list!

HNers: any suggestions on where to find these books for cheap (outside
university libraries)?

~~~
mahmud
Local 4-year universities typically allow the public to access their libraries
for a fee; around $100/year.

~~~
silentbicycle
University alumni can often get an alumni card for much less. I can access
several CS academic journals online with mine.

------
jdlong
Great post full of references, and more importantly, brief explanations of why
each ref is useful.

------
melipone
I would argue that learning can be reduced to statistical inference.

