Daniel Kahneman’s assertion that we’re not great intuitive statisticians got me thinking about how poorly I understood statistics despite intro classes in college.
What is your favorite book that transformed the way you interpret and produce statistics?
Kahneman famously wrote in Thinking, Fast and Slow that his results can't possibly be statistical flukes, and then a bunch of the studies in the book ended up being exactly that - https://slate.com/technology/2016/12/kahneman-and-tversky-re...
Anyway, about the books...
Blitzstein's Introduction to Probability and Harvard's Stat110 course are a good starting point if you've taken calculus. There are also good books like All of Statistics by Wasserman and Bayesian Data Analysis by Gelman, but they're an absolute slog to get through - definitely not my favorites in any way, but they cover a lot of stuff and you can keep a copy around for reference.
Intro: Freedman, Pisani, Purves. Very clear and accessible.
Intermediate/advanced: Casella and Berger
Advanced: Bickel and Doksum
Overview of ML and modern stat methods: Efron and Hastie
You are spoiled for good choices frankly.
What’s the best textbook you have read about X? In general the answer is “the third one.” By that time things sink in and the third book seems super clear and understandable.
I second the Freedman, Pisani, Purves "Statistics" recommendation.
This book explains the concepts without using mathematics, but even people with PhDs in mathematics praise it as one of the best textbooks on statistics [1].
"Naked Statistics" by Charles Wheelan is a very approachable entry level book. As opposed to a textbook, it's something that can be read casually in a relatively short amount of time, while still building a solid intuition on the subject.
I was about to buy this book, but from the reviews it seems the author uses very US-centric sports (baseball, football) to explain things. So if you are from outside the US and have no understanding of those sports, you will struggle.
This book had a big influence on me. I highly recommend it. I read both the first and second editions.
It is very clearly written, full of practical examples (with code), and doesn't assume heavy math knowledge.
This section from the preface sums up the intended audience:
"The principle audience is researchers in the natural and social sciences, whether new PhD students or seasoned professionals, who have had a basic course on regression but nevertheless remain uneasy about statistical modeling. This audience accepts that there is something vaguely wrong about typical statistical practice in the early twenty-first century, dominated as it is by p-values and a confusing menagerie of testing procedures. They see alternative methods in journals and books. But these people are not sure where to go to learn about these methods."
I recently decided to refresh my statistics knowledge after about 10 years of not touching it. My last experience with stats was during my (non-math/stats) undergraduate degree when I took some statistics coursework and ended up working fairly closely with a statistics professor during my thesis. Please keep in mind that the following recommendation comes from this amateur level of knowledge.
I read an HN comment that mentioned "Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions" by Jim Frost and picked up a copy. This book is part two of a three-book series by Frost, the first of which is an introductory statistics book and the last of which concerns linear regression.
I'm about 3/4 through the Hypothesis Testing book and am so impressed I just ordered the other two in the series. This book is the companion I wished I had during my undergraduate work. Reading it, I realized how much I partially digested or glossed over during my stats course. It also is a good way to reassess those statistical practices that I was taught during non-stats courses as "just how we do things." Frost is good at concisely explaining the appropriate uses of different kinds of statistical tests and their pitfalls.
What it lacks is implementation details: the book contains very few equations and no code (although I believe there is downloadable example data available). For my purposes, this works great, as I most needed an overview of the different hypothesis tests and their suitability for different situations. I now have more confidence about choosing and understanding a test at a high level and can dive into specific implementation details in other sources as needed.
To be perfectly clear, I have no designs on becoming a professional statistician and I'm re-learning stats just to support a couple side projects and generally broaden my capabilities. Someone whose primary interest is the math and implementation of statistical calculations will be better served by another book.
When I studied economics in college, statistics was taught as a math course with virtually no straightforward explanation of the main concepts. Later as a graduate student in sociology, I helped teach statistics to undergraduates using an introductory textbook that featured an intuitive and conceptual presentation of statistics, and for the first time I really grasped the main ideas. By then it was no longer necessary to manually perform the advanced mathematical operations for statistical analysis, because the students were encouraged to use SPSS and other software tools. If you are looking for an outstanding introduction to statistics, I highly recommend the following textbook:
Healey, Joseph (2014) Statistics: A Tool for Social Research, 10th Edition
For those seeking a good conceptual knowledge of statistics based on real-world examples, including criticism of the limitations of statistical analysis, I would suggest these excellent studies:
Lewis, Michael (2010) The Big Short: Inside the Doomsday Machine
Paulos, John Allen (1988) Innumeracy: Mathematical Illiteracy and Its Consequences
Silver, Nate (2012) The Signal and the Noise: Why So Many Predictions Fail, but Some Don't
Taleb, Nassim Nicholas (2005/10) Fooled by Randomness, 2E; The Black Swan: The Impact of the Highly Improbable, 2E
I second “Probability Theory.” This is simply the best book I’ve ever seen on probability and statistics. It’s the only book that made things actually make sense to me as a software developer.
The Elements of Statistical Learning is from the same authors (mostly) as ISLR, but it's longer and goes a bit deeper on the theory. There is also Computer Age Statistical Inference, which is newer and covers more material.
Other big titles - Pattern Recognition and Machine Learning (Bishop), Machine Learning: A Probabilistic Perspective (Murphy) - these last two weren't for me but different people respond to different approaches so check them out especially if the first two books don't click for you.