Hacker News
Ask YC: Best book to learn statistics?
80 points by smanek on May 28, 2008 | 17 comments
The limiting reagent in a few projects I'm working on (or would like to work on) seems to be my knowledge of statistics/probability.

Ideally, I'd like something along the lines of Spivak's Calculus on Manifolds that could teach me from the ground up.

I have a fairly strong math background in general, but I've just never learned much stats/probability beyond the basics.

I'd appreciate any suggestions - there just seems to be so much cruft out there that teaches just enough to regurgitate for exams. I'm all for abstraction, but I rather dislike fundamental theorems being presented fully formed with no explanation/justification.



I agree with Feller as the classical book on probability; it is beautifully written and full of insight and deserves to be read over and over again.

For theoretical statistics, justifying why you'd use one method versus another, the graduate-level classic is Casella and Berger, Statistical Inference. This teaches you to design your own tests, compare Bayesian vs. frequentist methods, and interpret the paradoxes - and is full of beautiful explanations - but it's not a bag of tricks.

Applied statistics books tend to be more specialized. There are plenty of cookbooks with statements like "and then you should do this regression and compare the p-value to 5%," but that's not what you want. Any of several books with titles like Applied Regression Analysis or Multivariate Statistical Analysis will be at a more substantial level. After that there are specialist topics - experiment design, survival analysis, heavy-tailed distributions, robust statistics, multi-level modeling, asymptotics. Statisticians get involved in a lot of areas, from quality control in manufacturing to clinical trial design to economic forecasting to genetics, so there's a lot of overlap with other fields in the academic literature.

Econometric Analysis by Greene is the economists' favorite stats book, and is very self-contained, covering everything from computational tricks for Monte Carlo simulations to some very modern multivariate methods.

As a computer scientist/hacker, your natural starting point is Bishop, Pattern Recognition and Machine Learning, or perhaps http://www.ai.mit.edu/courses/6.867-f03/lectures.html

I've heard machine learning defined as the application of statistical methods to engineering; people with statistics degrees grumble that CS/machine learning types learn to do what's computationally feasible without properly justifying their methods through the sort of analysis you'd see in Casella and Berger.

In terms of software, a lot of statistical computing in the open source world now gets done on the R platform. For modern computationally intensive methods like MCMC (Markov Chain Monte Carlo), the program BUGS/WinBUGS is a standard.

Also, http://videolectures.net/ has a whole lot of lectures (mostly from the Machine Learning perspective), starting with intro to probability and going right up to modern research.


The all-time best "tutorial" style book for learning introductory statistics is "Fundamentals of Applied Probability Theory" by Alvin Drake. Unfortunately it is out of print and used copies are hard to come by. It looks like you can get PDFs from MIT:

http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Compute...

I have a math degree but didn't study any statistics. I used this book to teach myself. There may be more comprehensive books but this one is the best for learning the basics on your own.


I would highly recommend Biostatistical Analysis, 4th Edition by Jerrold H. Zar (http://www.amazon.com/Biostatistical-Analysis-4th-Jerrold-Za...). Don't let the title mislead you. This book does an excellent job presenting the background, development and fundamentals of a wide variety of statistical methods. The treatment is very thorough; if you can work through the entire book, you will certainly have a good understanding of the topic. What I appreciate most is the comprehensive treatment of the limits of each method/test, allowing you to apply them (or not) with confidence.

Another classic is The Use and Abuse of Statistics by W.J. Reichmann. The treatment is not nearly as formal, but it is still a worthwhile read.


While they seem to be down at the moment (they were working this morning), Andrew Moore's introductory probability lectures are really good if you're involved in any kind of machine learning.

http://www.autonlab.org/tutorials/


Zed offers some classic recommendations at the end of his post "Programmers Need To Learn Statistics Or I Will Kill Them All":

http://www.zedshaw.com/rants/programmer_stats.html


The all-time classic at the level you're looking for is William Feller's two-volume set "An Introduction to Probability Theory and Its Applications" - if you can find a copy. A good university library should have one. Wikipedia claims that Persi Diaconis (of seven-shuffles theorem fame) was motivated to finish high school after dropping out so that he could understand the contents of this book. 'nuff said.

Karatzas and Shreve "Brownian Motion and Stochastic Calculus" is also a popular book among quants.


For probability, "Introduction to Probability" (http://www.athenasc.com/probbook.html) is the best I've come across. I am not sure if this might be a bit basic for you, since you say you have a strong background in math, but do take a look at the TOC.

I still haven't come across an equivalent book on statistics. If anyone knows a good one, I'd be very grateful if they'd post the details.


I strongly recommend "All of Statistics", by Larry Wasserman. It lives up to its title pretty well. It is very concise in its explanations of various topics. I like this, but you might want a traditional book if some topic gives you trouble.

http://www.stat.cmu.edu/~larry/all-of-statistics/index.html

Feller is a lot of fun, but a very strange suggestion for someone who apparently just wants to learn some statistics and get to work. Feller is for, like, reading by the fireplace with a glass of wine.


For statistics, maybe you can start from:

http://www.statsoft.com/textbook/stathome.html


http://www.amazon.com/Probability-its-engineering-uses-Thorn...

is an excellent introduction to the subject.



Don't know why this is getting downvoted -- I read and enjoyed these books a lot recently. I'm finding this to be a great thread for resources to satisfy the curiosity about probability and statistics that they left me with.


I don't doubt they're enjoyable books, but they're more about the social implications of statistics. They aren't good books to teach you statistics, which was the topic of this thread.



I rather enjoyed "The Cartoon Guide to Statistics", but it might not be quite what you're looking for.


Intro Stats is an excellent way to learn from the ground up: http://www.amazon.com/Intro-Stats-DeVeaux-Velleman-Bock/dp/0...




