
Statistics Done Wrong - lawlorino
https://www.statisticsdonewrong.com
======
tomrandle
I really rate this book so it’s great to see it getting some publicity!

I think one misconception is that to do stats right you just need to do the math
right. But which analysis is done in the first place, the overall methodology,
and how the data is interpreted are more often where we go wrong.

At Geckoboard we’ve been trying to raise awareness of some of these issues.
Here's a poster we put together: [https://www.geckoboard.com/learn/data-literacy/statistical-f...](https://www.geckoboard.com/learn/data-literacy/statistical-fallacies/)

------
devit
Isn't most or all of this avoided by explicitly using Bayes' theorem along
with a correct formalization of the domain?

E.g. for the mammogram:

P(cancer) = 0.8%

P(~cancer) = 1 - P(cancer) = 99.2%

P(positive_mammogram | cancer) = 90%

P(positive_mammogram | ~cancer) = 7%

P(cancer | positive_mammogram) = P(positive_mammogram | cancer) P(cancer) /
(P(positive_mammogram | cancer) P(cancer) + P(positive_mammogram | ~cancer) (1
- P(cancer))) = 90% * 0.8% / (90% * 0.8% + 7% * 99.2%) = 9.39457%
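The arithmetic can be sanity-checked in a few lines of Python (same numbers as above, nothing assumed beyond them):

```python
# Mammogram example: posterior probability of cancer given a positive test
p_cancer = 0.008            # prior P(cancer)
p_pos_given_cancer = 0.90   # sensitivity P(positive | cancer)
p_pos_given_healthy = 0.07  # false positive rate P(positive | ~cancer)

# Law of total probability: P(positive)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes' theorem: P(cancer | positive)
posterior = p_pos_given_cancer * p_cancer / p_pos
print(f"{posterior:.5%}")  # 9.39457%
```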

~~~
stared
Yes, most of the problems (with conditional probability, statistical tests,
significance, etc.) disappear once you express them in a Bayesian way (it's not
only Bayes' formula - it's explicitly creating a Bayesian model).

Basically, most of them boil down to:

- mistakes that can be tackled if you write it down explicitly

- hidden assumptions that can be discovered (and made explicit or modified)

That said, there is some philosophical difference between frequentist and
Bayesian probability (and for some reason, I only know of people moving in one
direction).

"Frequentist probability is Bayesian probability, where priors are flat,
hidden, and considered taboo".

BTW: Frequentists vs. Bayesians
[https://xkcd.com/1132/](https://xkcd.com/1132/) (there is never too much of
xkcd!)

~~~
posix_me_less
That is fascinating. Which one is correct? Is the Bayesian using more
assumptions than the frequentist, namely that repeated queries (which haven't
been done yet) would show the machine answering NO most of the time?

~~~
dragonwriter
> Which one is correct?

The Bayesian is correct to offer the bet.

Who is correct about the sun exploding is actually irrelevant to that; only
the conditional probability of the bet being collectable if the sun has
exploded vs. if it has not is needed here. You would care about the
probability that the sun actually had exploded if one of those weren't zero,
but it is, so it doesn't matter.

------
danenania
This is a great book. I read it a couple years ago and I remember a couple
takeaways that apply well to AB testing:

1 - Monitoring tests on an ongoing basis and then calling them as soon as they
hit some confidence threshold (like 95%) will give you biased results. It's
important to determine your sample size up front and then let the test run all
the way through, or at least be aware that the results are less reliable if
you stop early.
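Point 1 is easy to demonstrate by simulation. This is a hypothetical sketch (both arms converting at exactly 10%, so any "win" is a false positive; a simple pooled z-test and the check interval are my assumptions, not from the book):

```python
import math
import random

random.seed(1)

def run_peeking_ab_test(n_max=2000, check_every=100, z_crit=1.96):
    """One simulated A/B test where BOTH arms convert at 10% (no true
    effect), peeking every `check_every` visitors per arm and stopping
    the moment |z| crosses the ~95% threshold. True means a false 'win'."""
    a_conv = b_conv = 0
    for n in range(1, n_max + 1):
        a_conv += random.random() < 0.10
        b_conv += random.random() < 0.10
        if n % check_every == 0:
            pooled = (a_conv + b_conv) / (2 * n)
            se = math.sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(a_conv - b_conv) / n / se > z_crit:
                return True  # stopped early on a spurious 'significant' result
    return False

trials = 1000
fpr = sum(run_peeking_ab_test() for _ in range(trials)) / trials
print(f"False positive rate with peeking: {fpr:.1%}")  # far above the nominal 5%
```

With ~20 peeks per test, the realized false positive rate lands well above the 5% you'd expect from a single fixed-sample look.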

2 - Testing for multiple metrics requires a much larger sample. If you run a
test and then compare conversion rate, purchase amount, pageviews per session,
retention, etc. etc., you'll have a much higher error rate since the more
things you measure, the more likely you are to get an outlier. You either need
to run a separate test for each metric or increase your sample size a lot to
account for this effect (iirc the math for exactly how much is in the book).
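The effect in point 2 is just compounding error rates. Assuming the metrics were independent and each tested at alpha = 0.05 (my simplification; the book's exact math may differ), the chance of at least one spurious "winner" is 1 - (1 - alpha)^k, and the standard conservative fix is the Bonferroni correction:

```python
alpha = 0.05  # per-metric significance level

# Chance of at least one false positive across k independent metrics
for k in (1, 5, 10, 20):
    family_wise = 1 - (1 - alpha) ** k
    print(f"{k:2d} metrics -> {family_wise:.0%} chance of a spurious 'win'")

# Bonferroni correction: test each metric at alpha / k to keep the
# family-wise error rate at (no more than) alpha
k = 10
print(f"Bonferroni per-metric threshold for {k} metrics: {alpha / k}")
```

At 10 metrics you're already around a 40% chance of at least one bogus "significant" result, which is why either separate tests or a corrected threshold (with its correspondingly larger sample) is needed.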

~~~
capnrefsmmat
Thanks, I'm glad you enjoyed the book! (Author here -- the website got its
first publicity here on HN.)

Regarding AB testing, you might be interested in this recent research, which
uses real data from Optimizely to estimate how often people get AB test false
positives because they stopped as soon as they hit significance:
[https://ssrn.com/abstract=3204791](https://ssrn.com/abstract=3204791)

> Specifically, about 73% of experimenters stop the experiment just when a
> positive effect reaches 90% confidence. Also, approximately 75% of the
> effects are truly null. Improper optional stopping increases the false
> discovery rate (FDR) from 33% to 40% among experiments p-hacked at 90%
> confidence

------
r4um
Another good curation
[https://web.ma.utexas.edu/users/mks/statmistakes/StatisticsM...](https://web.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html)

How to Lie with Statistics by Darrell Huff
[https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics](https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics)

~~~
hermitdev
I don't remember the attribution, but one of my favorite quotes goes something
like: "There are lies, damned lies, and statistics." I think I first saw it at
the start of a chapter in the book "Against the Gods: The Remarkable Story of
Risk" by Peter Bernstein.

~~~
yesenadam
It seems that no one knows exactly where it's from. It was already in use in the 1890s.

[https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statist...](https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics)

------
ASpring
Highly recommend this book. I would consider myself experienced with many
statistical methods but this book was still chock full of brilliant examples
that let me look at things with fresh eyes. It was helpful also in giving me
language to explain technical concepts to less technical folks.

~~~
HillaryBriss
yeah. the "base rate fallacy" example sheds some light on the pros/cons of
mammogram results. one interesting thing about that section of the book is
that it says medical doctors fall prey to the fallacy more often than not.

i always kind of wonder when i see a medical doctor explaining statistics to a
general audience: do they really have this right?

~~~
misiti3780
I also found this passage about mammographers from "Moonwalking with Einstein"
interesting:

For most mammographers, practicing medicine is not deliberate practice,
according to Ericsson. It’s more like putting into a tin cup than working with
a coach. That’s because mammographers usually only find out if they missed a
tumor months or years later, if at all, at which point they’ve probably
forgotten the details of the case and can no longer learn from their successes
and mistakes.

One field of medicine in which this is definitively not the case is surgery.
Unlike mammographers, surgeons tend to get better with time. What makes
surgeons different from mammographers, according to Ericsson, is that the
outcome of most surgeries is usually immediately apparent—the patient either
gets better or doesn’t—which means that surgeons are constantly receiving
feedback on their performance.

~~~
hinkley
Many of us in the software community know that if the feedback loop isn't fast
enough, it doesn't work.

I wonder: if you made mammographers spend a day a week analyzing mammograms
from 5 years ago and then showed them the outcomes, would they get more
accurate?

~~~
heavenlyblue
Well, that’s pretty much like training a neural network, except that the
integration of the matrix coefficients happens not from the immediate output
data, but from data which was output by the network millions of epochs ago.

------
jdietrich
Along similar lines, Dr Ben Goldacre's _Bad Science_ is an excellent
beginner's introduction to the scientific method. If you understand how quacks
bamboozle unwitting journalists, you gain a key insight into what good science
looks like.

[https://www.amazon.com/Bad-Science-Quacks-Pharma-Flacks/dp/0...](https://www.amazon.com/Bad-Science-Quacks-Pharma-Flacks/dp/0771035780)

------
wirrbel
I don't think I've ever worked in a developed field as error-prone as
statistics, both with intentional errors (fraud, p-value hacking, etc.) and
with honest mistakes.

~~~
RA_Fisher
Statistics might be seen as the process of rooting out errors and falsehoods.
To come into contact with them is the goal (to eliminate them). :)

~~~
dcl
Indeed. It's the good statisticians who use good statistical methodology and
theory to root out errors and falsehoods.

Honest mistakes occur because what looks like a simple problem cannot always
be analysed very easily (or in an obvious way).

Abuse occurs because it's quite easy to fool others that an analysis is sound
- most people aren't sophisticated enough to identify problems, even if they
are given the data!

------
stared
For practical statistics done well, I recommend "Sex by Numbers" by David
Spiegelhalter ([https://www.amazon.com/Sex-Numbers-Wellcome-David-Spiegelhal...](https://www.amazon.com/Sex-Numbers-Wellcome-David-Spiegelhalter/dp/1781253293/)).

I am reading Sex by Numbers and enjoying it a lot. It's a touchy subject with a
lot of data of varying quality. A naive approach would be to take all of it; a
dogmatic one - to set an arbitrary (and subjective!) threshold separating
"good" from "bad" data. I love the way it's done there, i.e. by grading
sources:

4: numbers that we can believe (e.g. births and deaths)

3: numbers that are reasonably accurate (e.g. well-designed & conducted
surveys, e.g. the Natsal report)

2: numbers that could be out by quite a long way (e.g. non-uniform sampling,
the Kinsey report)

1: numbers that are unreliable (e.g. surveys from newspapers, even with huge
sample sizes)

0: numbers that have just been made up (e.g. "men think of sex every 7
seconds")

Just have a peek at the first chapter, which is freely accessible, and is
exactly on data reporting, data reliability and dealing with subjective
questions. A lot of thought is given to the possible biases (e.g. people who
are less likely to respond, or who would like to downplay or exaggerate some
things) and to the consistency of measurements.

So, in short: I started reading it to get curious facts about sex and ended up
recommending it to my data science students and mentees. (The vast majority of
problems start with how you collect data, how you interpret it, and how well
you are aware of its shortcomings.)

------
oriettaxx
This has always been a great book
[https://en.m.wikipedia.org/wiki/How_to_Lie_with_Statistics](https://en.m.wikipedia.org/wiki/How_to_Lie_with_Statistics)

------
fiatjaf
I thought this was going to be a collection of statistics people really did
wrong.

------
edoo
I really loved xkcd 2059.

~~~
kondro
You make it sound like some type of ISO standard. :P

