
Statistics Done Wrong – The woefully complete guide - bowyakka
http://www.refsmmat.com/statistics
======
capnrefsmmat
Hey everyone, I'm the author of this guide. It's come full circle -- I posted
it a week ago in a "what are you working on?" Ask HN post, someone posted it
to Metafilter and reddit, and it made its way to Boing Boing and Daily Kos
before coming back here.

I'm currently working on expanding the guide to book length, and considering
options for publication (self-publishing, commercial publishers, etc.). It
seems like a broad spectrum of people find it useful. I'd appreciate any
suggestions from the HN crowd.

(A few folks have already emailed me with tips and suggestions. Thanks!)

(Also, I'm sure glad I added that email signup a couple weeks ago)

~~~
craigyk
As a scientist I think you are addressing a very important problem with this
book. I've taken two statistics classes, one at the graduate level, and even I
am plagued with doubt as to whether the statistics I've used have all been
applied and interpreted "correctly". That said, I think the recent spate of "a
majority of science publications are wrong" stories is incredible hyperbole.
Is it the raw data that is wrong (fabricated)? The main conclusions? One or
two minor side points? What if the broad strokes are right but the statistics
are sloppy?

People also need to realize that while the Discussion and Conclusion sections
of publications may often read like statements of truth, they're usually just
a huge lump of spinoff hypotheses in prose form. Despite my frequent
frustrations with the ways science could be better, the overall arrow of
progress points in the right direction. Science isn't a process where the goal
is to ensure that 100% of what gets published is correct, but one whereby
previous assertions can be refuted and corrected.

Edit:

To be more specific, I think the statement in your Introduction is overly
critical: "The problem isn’t fraud but poor statistical education – poor
enough that some scientists conclude that most published research findings are
probably false". I would change it to say: "conclude that most published
research findings contain (significant) errors", or something along those
lines.

~~~
capnrefsmmat
I based that statement off of John Ioannidis's famous paper, "Why Most
Published Research Findings are False." It's open-access:

[http://www.plosmedicine.org/article/info:doi/10.1371/journal...](http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124)

He's drawn some criticism for the paper, and perhaps things aren't as bad as
he makes it seem, but it _is_ true that _someone_ has suggested most findings
are false.

I may tone down the Introduction slightly.

~~~
Alex3917
If anything, Ioannidis's paper would hugely understate the problem, because he
was only looking at the percentage of papers that couldn't be replicated. But
just because a result can be replicated doesn't mean that the study is
actually correct. In fact, the vast majority of wrong papers are likely very
replicable, since most wrong papers are the result of bad methodology (or
other process-related issues) rather than fudged data.

------
Tycho
If I was a billionaire, I would set up some sort of screening lab for
scientific/academic/research papers. There would be a statistics division for
evaluating the application of statistical methods being used; a replication
division for checking that experiments do actually replicate; and a corruption
division for investigating suspicious influences on the research. It would be
tempting to then generate some sort of credibility rating for each institution
based on the papers they're publishing, but that would probably invite too
much trouble, so best just to publish the results and leave it at that.

Arguably this would be a greater benefit to humanity than all the millions
poured charitably into cancer research etc.

~~~
davmre
Something like that idea has actually already been the inspiration for at
least one startup: MetaMed
([http://en.wikipedia.org/wiki/MetaMed](http://en.wikipedia.org/wiki/MetaMed),
[http://nymag.com/health/bestdoctors/2013/metamed-personalize...](http://nymag.com/health/bestdoctors/2013/metamed-personalized-health-care/))
does meta-level analysis of the medical literature to determine
which treatments seem effective for rare conditions, taking into account the
sample size, statistical methodology, funding sources, etc. of each study.

Of course, medicine might be unique as a domain in which individuals are
willing to pay vast sums of money to obtain slightly more trustworthy research
conclusions, and the profit motive has obvious conflicts with "benefit to
humanity" (if someone pays you to research a treatment for their disease, do
you post the findings when done? Or hold them privately for the next person
with the same problem?). But maybe there are other domains in which the market
could support a (non-billionaire's) project for better-validated research.

~~~
gwern
MetaMed, as far as I know, does basically just customized literature reviews;
it isn't doing anything I'd recognize as 'meta-level analysis' like the work
done by the Cochrane Collaboration or using meta-analytic techniques to
directly estimate the reliability of existing medical treatments or beliefs.

------
daughart
As a graduate student in the life sciences, I was required to take a course on
ethical conduct of science. This gave me the tools to find ethical solutions
to complex issues like advisor relations, plagiarism, authorship, etc. We were
also taught to keep good notes and use ethical data management practices -
don't throw out data, use the proper tests, etc. Unfortunately, we weren't
really taught how to do statistics "the right way." It seems like this is
equally important to ethical conduct of science. Ignorance is no excuse for
using bad statistical practices - it's still unethical. By the way, this is at
(what is considered to be) one of the best academic institutions in the world.

~~~
dbaupp
> Unfortunately, we weren't really taught how to do statistics "the right
> way."

Learning the _right_ way takes a lot of work; there are many ways to analyse
things, each one right or wrong in different situations. (Even teaching
something as "simple" as the correct interpretation of a p-value is hard.)

~~~
daughart
I'm sure it does, but they don't have a problem assigning other required
classes, such as a one-hour-a-week communication of science class. One hour a
week for a year is enough to cover a lot of material.

------
jimmar
One of the many challenges in science is that there is no publication outlet
for experiments that just didn't pan out. If you do an experiment and don't
find statistical significance, there aren't many journals that want to publish
your work. That alone helps contribute to a bias toward publishing results
that might have been found by chance. If 20 independent researchers test the
same hypothesis, and there is no real effect, 1 might find statistical
significance. That 1 researcher will get published. The 19 just move on.
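
A back-of-the-envelope version of that arithmetic, assuming every lab tests at
the conventional 0.05 threshold (the comment doesn't name one, so that's an
assumption):

```python
# With no real effect, each lab has a 5% chance of a false positive at alpha = 0.05.
alpha, n_labs = 0.05, 20

expected_false_positives = n_labs * alpha       # 1.0 lab, on average
p_at_least_one = 1 - (1 - alpha) ** n_labs      # ~0.64

print(f"Expected false positives among {n_labs} labs: {expected_false_positives:.1f}")
print(f"Chance at least one lab finds 'significance': {p_at_least_one:.2f}")
```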

~~~
essayist
They are working on it. There was an effort at Oxford to track perinatal
trials -- started in the 1980s. It looks like it hasn't happened yet, but
various major players (PLOS, Center for Evidence-based Medicine) want to
expand the brief to cover all clinical trials:

[http://www.alltrials.net/about/](http://www.alltrials.net/about/)

[http://www.cochrane.org/about-us/history](http://www.cochrane.org/about-us/history)

~~~
capnrefsmmat
For many types of clinical trials, pre-registration and publication of results
through ClinicalTrials.gov is required by the FDA. I think it's been five or
ten years now. Unfortunately, compliance isn't great -- something like 80% of
studies registered on ClinicalTrials.gov never have results published there.
20% of registered studies never have results published _anywhere_.

------
jmatt
Norvig's "Warning Signs in Experimental Design and Interpretation" is also
worth reading and covers the higher level problem of bad research and results.
Including mentioning bad statistics.

[http://norvig.com/experiment-design.html](http://norvig.com/experiment-design.html)

------
Paradigma11
Quite a few years ago I devised an ambitious method to achieve significance
while sitting through another braindead thesis presentation (psychology):

If you are interested in the difference of a metrically scaled quantity
between two groups, do the following:

1.) Add 4-5 plausible control variables that you do not document in advance
(questionnaire, sex, age...).

2.) Write an R script that helps you do the following: whenever you have
tested a person, add that person's result to your dataset and run a:

t-test

u-test

ordinal logistic regression over some possible bucket combinations.

3.) Do this over all permutations of the control variables. Have the script
ring a loud bell when significance is achieved so data collection is stopped
immediately. An added bonus is that you will likely get a significant result
with a small n, which enables you to do a reversed power analysis.

Now you can report that your theoretical research implied a strong effect
size, so you chose an appropriately small n which, as expected, yielded a
significant result ;)
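
The comment describes an R script; purely as a hedged illustration, here is a
small Python simulation of just the "ring a bell and stop" part (the optional
stopping in steps 2-3), with no control variables and no real group
difference. Even this stripped-down version inflates the nominal 5%
false-positive rate severalfold:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def bell_rings(max_n=100, alpha=0.05):
    """Add one subject per group at a time; stop as soon as a t-test says p < alpha."""
    a, b = [], []
    for _ in range(max_n):
        a.append(rng.normal())   # both groups come from the same distribution,
        b.append(rng.normal())   # i.e. the null hypothesis is true
        if len(a) >= 5 and stats.ttest_ind(a, b).pvalue < alpha:
            return True          # "significance" reached, data collection stops
    return False

runs = 1000
rate = sum(bell_rings() for _ in range(runs)) / runs
print(f"False-positive rate with peeking after every subject: {rate:.2f}")  # well above 0.05
```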

~~~
chrislipa
XKCD did it first:

[http://xkcd.com/882/](http://xkcd.com/882/)

------
ultrasaurus
One thing that constantly saddens me about statistics is that a large amount
of energy is expended using it almost correctly to "prove" something that was
already the gut feeling. Even unbiased practitioners can be led astray [1],
but standards on how not to intentionally lie with statistics are very useful.

[1] [http://euri.ca/2012/youre-probably-polluting-your-statistics...](http://euri.ca/2012/youre-probably-polluting-your-statistics-more-than-you-think/index.html)

~~~
onion2k
There's no way to tell whether or not that "gut feel" is accurate without
proof. Often it's right, but occasionally it's very, very wrong (cancer risk
and Bayes' theorem provide a good illustration:
[http://betterexplained.com/articles/an-intuitive-and-short-e...](http://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/)).
Consequently it's still worthwhile proving things even when they're seemingly
obvious.
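
For onlookers, a worked base-rate example in the spirit of the linked article
(the numbers here are illustrative, not taken from it): a fairly accurate test
for a rare condition still yields mostly false alarms.

```python
# Illustrative numbers only: 1% prevalence, 90% sensitivity, 9% false-positive rate.
prevalence  = 0.01   # P(condition)
sensitivity = 0.90   # P(positive | condition)
false_pos   = 0.09   # P(positive | no condition)

p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)
p_condition_given_positive = sensitivity * prevalence / p_positive

# Gut feel says "the test is ~90% accurate, so a positive means ~90% chance of disease".
# Bayes' theorem says otherwise:
print(f"P(condition | positive) = {p_condition_given_positive:.2f}")  # about 0.09
```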

~~~
pessimizer
I think his point was that people seem to "prove" common sense statistically
all of the time - but when doing so make a lot of thoughtless assumptions
about representativeness, significance, definitions, etc. stemming from the
unspoken assumption of a particular outcome being inevitable.

Or maybe I'm projecting?

------
tokenadult
I see the author of this interesting site is active in this thread. You may
already know about this, but for onlookers I will mention that Uri Simonsohn
and his colleagues

[http://opim.wharton.upenn.edu/~uws/](http://opim.wharton.upenn.edu/~uws/)

have published a lot of interesting papers advising psychology researchers how
to avoid statistical errors (and also how to detect statistical errors, up to
and including fraud, by using statistical techniques on published data).

~~~
capnrefsmmat
Thanks. I had seen some of his work, but browsing his list of publications I
found a few more interesting papers. I've already worked one into my draft.

------
glutamate
One way to do statistics less wrong is to move from statistical testing to
statistical modelling. This is what we are trying to support with BayesHive at
[https://bayeshive.com](https://bayeshive.com)

Other ways of doing this include JAGS ([http://mcmc-jags.sourceforge.net/](http://mcmc-jags.sourceforge.net/))
and Stan ([http://mc-stan.org/](http://mc-stan.org/))

The advantage of statistical modelling is that it makes your assumptions very
explicit, and there is more of an emphasis on effect size estimation and less
on reaching arbitrary significance thresholds.
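
As a hedged sketch of that mindset (not BayesHive, Stan, or JAGS code, just a
self-contained toy): report the estimated effect size and its uncertainty
rather than a bare significance verdict. Under a normal model with known noise
and a flat prior, the posterior for the difference in group means is:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0                                    # assume the noise sd is known
group_a = rng.normal(0.0, sigma, size=30)      # simulated data, for illustration only
group_b = rng.normal(0.3, sigma, size=30)

diff = group_b.mean() - group_a.mean()
sd = sigma * np.sqrt(1 / len(group_a) + 1 / len(group_b))

lo, hi = diff - 1.96 * sd, diff + 1.96 * sd
print(f"Estimated effect: {diff:.2f} (95% interval {lo:.2f} to {hi:.2f})")
```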

~~~
thenomad
BayesHive is very interesting! I couldn't find any details on pricing, though?

~~~
glutamate
We're thinking about it. Everything is free for the moment and we will keep a
free tier for most data analysis needs.

------
mathattack
I like that he references Huff's "How to Lie with Statistics" in the first
sentence of the intro. That was the book that came to mind when I saw the
subject. Also reminds me of the Twain quote, "There are three types of lies:
Lies, Damned Lies, and Statistics."

But despite this, statistics done well are very powerful.

~~~
neuralk
With respect to that Twain/Disraeli quote, my friend who is a professor of
statistics tells me that he cannot go to a party and say what he does for a
living without someone repeating it smirkingly.

~~~
dragonwriter
Isn't that why the name "Data Science" was invented?

~~~
neuralk
Data science sure sounds sexier...ish.

The irony of people who use the "damned lies and statistics" quote snidely is
that the "statistics" part is not referring to the field Statistics but the
plural version of the noun statistics, which of course are easily abused. The
field of Statistics is all about NOT abusing statistics.

~~~
jessaustin
_...ish._

Yeah, it always makes me wonder when they have to put the "science" in the
name. "Computation" seems so much more timeless and elegant than "computer
science", for instance. It's almost like "Democratic Republic" for nations.

~~~
dragonwriter
You know, _lots_ of fields are named that way. It's just that some of them are
less obvious because both the part of the name identifying the domain and the
part saying it's the science (or "study", but with the same intended meaning)
are in Latin, instead of plain English.

------
WettowelReactor
What is puzzling to me is that many of the statistical errors showing up in
all the science literature are well understood. The problem is not all the
junk science that is being generated but that the current tools and culture
are not readily naming and shaming these awful studies. Just as we have basic
standards in other fields, such as GAAP in finance, why can't we have an
agreed-upon standard for data collection and analysis of scientific data?

------
Aloisius
If you want to see truly egregious uses of statistics, take a look at any
paper on diet or nutrition. Be prepared to be angry.

At this point, if someone published a study stating that we need to eat in
order not to die, I'd be skeptical of it.

~~~
capnrefsmmat
You might enjoy this:

Schoenfeld, J. D., & Ioannidis, J. P. A. (2013). Is everything we eat
associated with cancer? A systematic cookbook review. American Journal of
Clinical Nutrition, 97(1), 127–134. doi:10.3945/ajcn.112.047142

They did a review of cookbook ingredients and found that most of them had
studies showing they increased your risk of cancer, while _also_ having
studies showing they _decrease_ your risk of cancer.

I think bacon was a notable exception -- everyone agreed that it increases
your cancer risk.

------
ambiate
The greatest problem of statistical analysis is throwing out observations
which do not fit the bill. All analysis should be thoroughly documented with
postmortems.

------
VladRussian2
Whenever there is discussion about statistics' role in science (sometimes even
going as far as crossing into how science is statistics), I always remember
this:

[http://en.wikipedia.org/wiki/Oil_drop_experiment#Fraud_alleg...](http://en.wikipedia.org/wiki/Oil_drop_experiment#Fraud_allegations)

~~~
evacuationdrill
More revealing than the fraud allegations is the next section, which discusses
the way later results were pulled toward his and other earlier experiments'
values for years, delaying our arrival at a more precise measurement. It reads
as though it wasn't so much malice as self-doubt that led to the scientists'
actions.

------
knassy
That was an excellent read. Thank you. I'll admit I'm often reluctant to read
too much into the data I deal with daily (web analytics), as I'm unsure of how
to measure its significance accurately. I'm going to dive in and learn more
about this.

------
Anon84
"Statistical significance does not mean your result has any practical
significance."

