
We're so good at medical studies that most of them are wrong - alexandros
http://arstechnica.com/science/news/2010/03/were-so-good-at-medical-studies-that-most-of-them-are-wrong.ars?utm_source=rss&utm_medium=rss&utm_campaign=rss
======
a-priori
From the end of the article: "... such as compelling researchers to share both
data sets and the code for statistical models."

This would be a welcome change. It seems very strange to me that it is not the
accepted practice to make available any source code used in the analysis. It's
an essential part of their methodology, and any bugs in that code could
produce hard-to-detect flaws in their results (dropped data, rounding errors,
etc). Just look at the work jgrahamc did on climate change analysis:
<http://news.ycombinator.com/item?id=1128782>.

~~~
applicative
It is stunning that, in general, something can count as 'science' and a piece
of 'scientific research' even though the code employed is kept secret. In
fact, the use of proprietary software in science is at least as dubious as the
use of the .doc format for government documents.

------
yaroslavvb
A big problem is that there's often no objective way to pick the "right"
statistical test. This gives the experimenter the freedom to choose the
statistical procedure that favors a positive conclusion. Here's a classic
example where the statistician has the freedom to choose between a binomial
and a negative binomial test.

Two experimenters contrast treatments A and B. Both have A preferred to B in
the first 5 patients, and B preferred in the 6th. The first experimenter
planned to run 6 trials and count the number of successes, so they get a
P-value of 0.11 for the hypothesis that A is better. The second experimenter
planned to run comparisons until B was preferred, up to 6, and got a P-value
of 0.03. (See Appendix A of
<http://www.annals.org/content/130/12/995.full.pdf+html>.)
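
A minimal sketch of that computation in Python (using scipy; the p = 0.5 null
and the counts come from the example above):

    from scipy import stats

    # Null: A and B equally likely to be preferred (p = 0.5).
    # Data: A preferred in the first 5 patients, B in the 6th.

    # Design 1: fixed n = 6 trials, count A-preferences.
    # P-value = P(X >= 5) for X ~ Binomial(6, 0.5).
    p_fixed_n = stats.binom.sf(4, 6, 0.5)  # sf(4) = P(X > 4) = P(X >= 5)
    print(p_fixed_n)                       # ~0.109

    # Design 2: run until B is first preferred (negative binomial rule).
    # P-value = P(first B-preference on trial 6 or later)
    #         = P(first 5 trials all prefer A).
    p_stop_at_failure = 0.5 ** 5
    print(p_stop_at_failure)               # ~0.031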

A realistic example of this issue coming up: <http://www.jstor.org/pss/2336980>

Another example is choosing between a one-tailed and a two-tailed t-test. When
you ask for the probability of an effect as extreme as the observed x under
the null hypothesis, should you ask for P(effect > x) or P(|effect| > |x|)?
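
A quick sketch of how much that choice matters (hypothetical data; recent
scipy versions let you pick the alternative explicitly):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=0.4, scale=1.0, size=20)  # hypothetical data

    t, p_two_tailed = stats.ttest_1samp(sample, 0.0, alternative='two-sided')
    _, p_one_tailed = stats.ttest_1samp(sample, 0.0, alternative='greater')

    # When t > 0 the one-tailed p-value is exactly half the two-tailed one,
    # so the choice of tail alone can push a result across the 0.05 line.
    print(p_two_tailed, p_one_tailed)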

Eliezer Yudkowsky goes into some discussion of this:
[http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequ...](http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/)

The most illustrative example of the subjectivity of hypothesis testing is
probably the issue of testing strings for randomness. There are many tests for
whether a particular string of bits was generated by a Bernoulli process, and
not one of them has a legitimate claim to being "the right one".
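
To make that concrete, a sketch of two standard tests disagreeing about the
same bit string (the alternating string here is just an illustration):

    import math

    bits = [1, 0] * 50  # perfectly balanced, but suspiciously regular

    # Frequency (monobit) test: is the share of ones consistent with 0.5?
    n = len(bits)
    ones = sum(bits)
    z_freq = (2 * ones - n) / math.sqrt(n)  # ~N(0,1) under Bernoulli(0.5)
    print(z_freq)   # 0.0 -- this test sees nothing wrong

    # Wald-Wolfowitz runs test: number of runs vs. what independence predicts.
    runs = 1 + sum(a != b for a, b in zip(bits, bits[1:]))
    mu = 2 * ones * (n - ones) / n + 1
    var = (mu - 1) * (mu - 2) / (n - 1)
    z_runs = (runs - mu) / math.sqrt(var)
    print(z_runs)   # ~9.8 -- this test rejects emphatically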

One way to remove the bias is three-way triple-blind testing, i.e., measure
the effects of the existing, new, and placebo treatments, and have the
statistician analyze the datasets while blinded to their true labels.

------
carbocation
The first portion of the article focuses entirely on multiple testing. If you
look at your data multiple times without performing alpha spending or a
Bonferroni-type correction, you will - of course! - find spurious
associations. This is simply bad science and/or lazy refereeing on the part of
the journals.

In human genetics we have Mark Daly, David Altshuler, and Eric Lander to thank
for doing the thoughtful theoretical work beforehand to establish the alpha
for genome-wide significance in any study: 5E-8. To my knowledge, GWAS results
are much less commonly found to be invalid in a later study.

The message is this: everyone knows at least one method of correction for
multiple testing, and virtually everyone knows that they should be doing this.
No journal should publish papers that merely show nominal, instead of
corrected, P values.
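
As a sketch of the simplest such correction (Bonferroni; the 5E-8 threshold
above is essentially 0.05 spread over roughly a million independent tests):

    # Bonferroni: with m tests at family-wise error rate alpha,
    # require each nominal p-value to beat alpha / m.
    alpha = 0.05
    m = 1_000_000             # ~number of independent common variants
    threshold = alpha / m
    print(threshold)          # 5e-08, the genome-wide significance level

    # A nominal p = 0.001 looks impressive alone, but is expected by chance
    # about a thousand times across a million tests, so it doesn't survive.
    print(0.001 < threshold)  # False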

------
nkassis
Could it also be because statistics is hard and a lot of researchers don't
have a good grasp of it?

~~~
marshallp
Especially these bio/medical types; rigorous thinking isn't a requirement in
their qualification process.

~~~
applicative
Isn't statistics what you resort to when there isn't scope for rigorous
thinking?

~~~
marshallp
No, statistics requires rigorous thought to be applied properly. That's why
it's usually taught in the mathematics department.

~~~
tokenadult
Teaching a first course in statistics in the mathematics department (rather
than in a dedicated department of statistics) may be a mistake.

<http://statland.org/MAAFIXED.PDF>

<http://escholarship.org/uc/item/6hb3k0nz>

------
giardini
"Why Most Published Research Findings Are False" (2005) by John P. A.
Ioannidis

[http://www.plosmedicine.org/article/info:doi/10.1371/journal...](http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124)

------
medgeek
Most medical studies have a small sample size because it is difficult and
often prohibitively expensive to do studies with large numbers of subjects.
Small sample size increases the likelihood of spurious results.

The other problem is that mainstream journalism does an abysmal job of
reporting science. How many times have you read a newspaper article or seen a
segment on the evening news that tells you that X is good for your health or X
is bad for your health, as if it were an absolute truth, a law of nature on
par with V=IR or F=ma? Then, out of curiosity, you look up the actual journal
article published by the scientists and find their claims to be considerably
more modest.

------
JoeAltmaier
Three kids, same pediatrician. First kid: "Drink lots of whole milk". 2nd kid:
"drink only skim milk; kids are too fat". 3rd kid: "drink 2% milk in
moderation".

~~~
dkarl
I have a friend who's a doctor. A few years ago I read a few books about
nutrition for laypeople, and when I had questions, I tried asking him. He
couldn't tell me much. Virtually everything I asked him about, I knew more
about than he did, just from reading a few books. He obviously knew more
about enzymes, metabolic pathways, and cardiac arteries than I did, but when
it came to practical questions like how much protein I need to consume or how
well calcium is absorbed from broccoli and collard greens, he had learned
literally _nothing_ in medical school. The closest he came to studying
nutrition was when he took a special elective unit on diabetes in rural
Hispanic populations.

The way he sees it, nutrition is a separate specialty done by people who study
less than doctors and get paid less than doctors, and it isn't taught in
medical school, so he isn't embarrassed at all by his ignorance. He regularly
gives advice to people who have risk factors like high blood pressure or high
cholesterol, but the advice he gives them (eat in moderation, eat more
vegetables and less junk food, exercise a little) is pretty generic, and none
of them follow it anyway.

~~~
ericd
Doctors are typically educated in how to fix problems, not so much in how to
prevent them. This is unfortunate, since most people think doctors are experts
in everything, including nutrition, and treat them as such, but they're really
not.

Maybe if doctors were paid to keep their patient base healthy rather than just
fix them when they're sick, they'd look into more of that.

~~~
nradov
Capitation is one approach to solving that problem.
<http://en.wikipedia.org/wiki/Capitation_%28healthcare%29> Unfortunately the
time frames are still fairly short because patients often move around. So
even when doctors are compensated through a capitation system, there isn't
enough incentive to encourage patients to make healthy lifestyle choices that
will only pay off many years in the future.

------
gchpaco
One of the problems I've run into in my current work is that statistics deals
with problems a lot like the drunk man looking for his keys under the lamp. As
far as I can tell there is one man in the entire world who works on stable
distributions (like normal distributions, but with tunable skew and heavy-
tailedness parameters), which have been quite useful for me. If you have or
believe you have data that has subtle dependencies among the random variables,
you _could_ use a copula, but multivariate Archimedean copulas are hard to
compute with (as far as we've been able to tell, at least), copula fitting is a
research problem, and copula choice is black magic.
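
For what it's worth, scipy does expose stable distributions; a minimal sketch
of the tunable skew and tail-heaviness (parameter values picked arbitrarily
for illustration):

    from scipy.stats import levy_stable, norm

    # alpha < 2 gives heavy tails, beta != 0 gives skew; alpha = 2 recovers
    # the Gaussian family (up to a rescaling of the scale parameter).
    heavy = levy_stable(alpha=1.5, beta=0.5)  # heavy-tailed, right-skewed

    # Tail mass beyond 5 units: the stable law keeps far more of it.
    print(heavy.sf(5))  # on the order of 1e-2
    print(norm.sf(5))   # ~2.9e-7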

------
xiaoma
I'm not a fan of the common sense check. A lot of science is, or at least was,
completely contrary to common sense.

However, reproducibility is absolutely vital. If future researchers can't
replicate the results of a study, then statistical flukes are going to be a
real problem. If studies can be replicated, then statistical error can be
handled very efficiently. Each time a given piece of research is reproduced
(even if it's only reproduced to serve as a set-up control for a future
study), the odds of a statistical fluke surviving drop exponentially.
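
A back-of-the-envelope sketch of that decay (assuming independent
replications, each with a 5% false-positive rate):

    alpha = 0.05
    for k in range(1, 5):
        # Chance a spurious finding survives k independent replications.
        print(k, alpha ** k)  # 0.05, 0.0025, 0.000125, 6.25e-06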

------
mattchew
Some time back on--I think--Overcoming Bias, I read that less than half of
medical studies are verified when independent verification is attempted.

Does anyone have this reference? I can't find it again.

~~~
nazgulnarsil
<http://www.overcomingbias.com/2009/07/meds-to-cut.html>

<http://clinicalevidence.bmj.com/ceweb/about/knowledge.jsp>

~~~
mattchew
Thanks NN. I don't think this is quite the same thing as what I was thinking
of, but useful anyhow.

------
maurycy
Does it apply to other sciences? (I think it does, but I don't have access to
any papers confirming it.)

~~~
pw0ncakes
Most likely, not nearly as much. In medicine, tests are expensive and
ethically complicated, because you're experimenting on living beings whose
inner workings we barely understand. So the ability to make firm conclusions,
like you can in the hard sciences, is not there.

Gravity is proved. Evolution is rock solid. We still don't know if coffee is
good or bad for us.

~~~
a-priori
_Gravity is proved. Evolution is rock solid. We still don't know if coffee is
good or bad for us._

Gravity is not proven. Falling is easily observed, and general relativity is a
pretty good theory for it, but the jury is still out on how exactly gravity
works. Evolution is also easily observed (e.g., drug resistant pathogens); its
basis in natural selection and genetics is the part that is rock solid.

As for coffee, it's some combination of good and bad. Which part dominates in
which situations is up for debate.

(Apologies for being pedantic -- caffeine high just set in.)

~~~
pw0ncakes
Fair enough. I should have said that the Law of Universal Gravitation, not
"gravity", is effectively proven. We don't know how gravity works.

