Ask HN: What is your favorite book for learning statistics?
56 points by djklanac on Dec 29, 2021 | 20 comments
Daniel Kahneman’s assertion that we’re not great intuitive statisticians got me thinking about how poorly I understood statistics despite intro classes in college.

What is your favorite book that transformed the way you interpret and produce statistics?




Kahneman famously wrote in Thinking, Fast and Slow that his results can't possibly be statistical flukes, and then a number of the studies in the book turned out to be exactly that - https://slate.com/technology/2016/12/kahneman-and-tversky-re...

Anyway, about the books...

Blitzstein's Introduction to Probability and Harvard's Stat110 course are a good starting point if you've taken calculus. There are also good books like All of Statistics by Wasserman and Bayesian Data Analysis by Gelman, but they're an absolute slog to get through. They're definitely not my favorites, but they cover a lot of ground and are worth keeping around for reference.


At what level?

Intro: Freedman, Pisani, Purves. Very clear and accessible.

Intermediate/advanced: Casella and Berger

Advanced: Bickel and Doksum

Overview of ML and modern stat methods: Efron and Hastie

Frankly, you are spoiled for choice.

What’s the best textbook you have read about X? In general the answer is “the third one.” By that time things sink in and the third book seems super clear and understandable.


I second the Freedman, Pisani, Purves "Statistics" recommendation.

This book explains the concepts without using mathematics, but even people with PhDs in mathematics praise it as one of the best textbooks on statistics [1].

[1] https://stats.stackexchange.com/a/1666/18417


"Naked Statistics" by Charles Wheelan is a very approachable entry level book. As opposed to a textbook, it's something that can be read casually in a relatively short amount of time, while still building a solid intuition on the subject.


I was about to buy this book, but from the reviews it seems the author uses very US-centric sports (baseball, football) to explain things. So if you are from outside the US and don't know those sports, you may struggle.



This book had a big influence on me. I highly recommend it. I read both the first and second editions.

It is very clearly written, full of practical examples (with code), and doesn't assume heavy math knowledge.

This section from the preface sums up the intended audience:

"The principle audience is researchers in the natural and social sciences, whether new PhD students or seasoned professionals, who have had a basic course on regression but nevertheless remain uneasy about statistical modeling. This audience accepts that there is something vaguely wrong about typical statistical practice in the early twenty-first century, dominated as it is by p-values and a confusing menagerie of testing procedures. They see alternative methods in journals and books. But these people are not sure where to go to learn about these methods."


I recently decided to refresh my statistics knowledge after about 10 years of not touching it. My last experience with stats was during my (non-math/stats) undergraduate degree when I took some statistics coursework and ended up working fairly closely with a statistics professor during my thesis. Please keep in mind that the following recommendation comes from this amateur level of knowledge.

I read an HN comment that mentioned "Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions" by Jim Frost and picked up a copy. This book is part two of a three-book series by Frost, the first of which is an introductory statistics book and the last of which concerns linear regression.

I'm about 3/4 through the Hypothesis Testing book and am so impressed I just ordered the other two in the series. This book is the companion I wished I had during my undergraduate work. Reading it, I realized how much I partially digested or glossed over during my stats course. It also is a good way to reassess those statistical practices that I was taught during non-stats courses as "just how we do things." Frost is good at concisely explaining the appropriate uses of different kinds of statistical tests and their pitfalls.

What it lacks is implementation detail: the book contains very few equations and no code (although I believe there is downloadable example data available). For my purposes, this works great, as what I most needed was an overview of the different hypothesis tests and their suitability for different situations. I now have more confidence about choosing and understanding a test at a high level and can dive into specific implementation details in other sources as needed.

To be perfectly clear, I have no designs on becoming a professional statistician and I'm re-learning stats just to support a couple side projects and generally broaden my capabilities. Someone whose primary interest is the math and implementation of statistical calculations will be better served by another book.


When I studied economics in college, statistics was taught as a math course with virtually no straightforward explanation of the main concepts. Later as a graduate student in sociology, I helped teach statistics to undergraduates using an introductory textbook that featured an intuitive and conceptual presentation of statistics, and for the first time I really grasped the main ideas. By then it was no longer necessary to manually perform the advanced mathematical operations for statistical analysis, because the students were encouraged to use SPSS and other software tools. If you are looking for an outstanding introduction to statistics, I highly recommend the following textbook:

Healey, Joseph (2014) Statistics: A Tool for Social Research, 10th Edition

For those seeking a good conceptual knowledge of statistics based on real-world examples, including criticism of the limitations of statistical analysis, I would suggest these excellent studies:

Lewis, Michael (2010) The Big Short: Inside the Doomsday Machine

Paulos, John Allen (1988) Innumeracy: Mathematical Illiteracy and Its Consequences

Silver, Nate (2012) The Signal and the Noise: Why So Many Predictions Fail, but Some Don't

Taleb, Nassim Nicholas (2005/10) Fooled by Randomness, 2E; The Black Swan: The Impact of the Highly Improbable, 2E


Jaynes, Probability Theory: The Logic of Science (https://books.google.com/books/about/Probability_Theory.html...)

Grunwald, The Minimum Description Length Principle.


I second “Probability Theory.” This is simply the best book I’ve ever seen on probability and statistics. It’s the only book that made things actually make sense to me as a software developer.


“An Introduction to Statistical Learning,” even though its examples are in R and you may not care about R.

It’s also not aimed at complete beginners, but if you have a very solid grasp of high school math and a real appetite for the subject, you can totally do it.

There’s also a Coursera course, and the book is free as a PDF.


ISLR is a great introduction to machine learning, not necessarily to statistics. It doesn't cover stuff like hypothesis testing or simulation methods.


What should one read after ISLR?

Could probably go through it again but I’d also like more of a theoretical background.

Isn’t OLS a special case of MLE, for example?

Anyway, I’d like to understand machine learning and statistical inference more deeply.

I’d be happy to get the answers by reading a handful of books.
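
On the OLS/MLE question: yes, if you assume independent Gaussian errors with a fixed variance, maximizing the likelihood is the same as minimizing the sum of squared residuals. Here is a minimal sketch of that, not a proof. The data is simulated, and `ols_fit`, `gaussian_nll` and the fixed `sigma=1.0` are made-up names and assumptions for illustration only.

```python
import math
import random

def ols_fit(xs, ys):
    """Closed-form ordinary least squares for the line y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def gaussian_nll(a, b, xs, ys, sigma=1.0):
    """Negative log-likelihood of y ~ Normal(a + b*x, sigma^2), sigma assumed known."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Only the sse term depends on (a, b), so minimizing NLL means minimizing sse.
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)

# Simulated data (assumption: true line 2 + 0.5x with unit-variance Gaussian noise).
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 0.5 * x + random.gauss(0, 1) for x in xs]

a_hat, b_hat = ols_fit(xs, ys)
best = gaussian_nll(a_hat, b_hat, xs, ys)

# Nudge the OLS estimates in every direction; each nudge should raise the NLL.
nudges = [(da, db)
          for da in (-0.1, 0.0, 0.1)
          for db in (-0.1, 0.0, 0.1)
          if (da, db) != (0.0, 0.0)]
ols_is_mle = all(
    gaussian_nll(a_hat + da, b_hat + db, xs, ys) > best for da, db in nudges
)
print(ols_is_mle)  # True
```

The equivalence hinges on the Gaussian assumption. Swap in a different error distribution and the MLE changes; with Laplace errors, for instance, you get least absolute deviations instead of least squares.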


The Elements of Statistical Learning is from the same authors (mostly) as ISLR, but it's longer and goes a bit deeper on the theory. Also, Computer Age Statistical Inference which is newer and covers more material.

Other big titles - Pattern Recognition and Machine Learning (Bishop), Machine Learning: A Probabilistic Perspective (Murphy) - these last two weren't for me but different people respond to different approaches so check them out especially if the first two books don't click for you.


> Also, Computer Age Statistical Inference which is newer and covers more material.

Isn't that more of a history of modern statistics than an actual statistics textbook?



The only right answer is All of Statistics: A Concise Course in Statistical Inference

https://www.stat.cmu.edu/~larry/all-of-statistics/



Thank you all for your suggestions! I can’t wait to get started.



