

Most Science Studies Appear to Be Tainted By Sloppy Analysis - nickb
http://online.wsj.com/article/SB118972683557627104.html?mod=most_emailed_day

======
danteembermage
Worth pointing out: while the studies are almost certainly wrong, each
generation of them is less wrong, building on the more-wrong work that came
before.

Take the Bohr atom: horribly wrong, but immensely valuable in its time (and
still valuable pedagogically). Then you add sub-orbitals, orbital hybridization
theory, and the quantum mechanical model, all of them almost certainly wrong as
well.

Even in the social sciences you see results like "if we make charitable gifts
tax deductible, revenue will go down, but gifts will go up by more than the
lost revenue." Later studies came along that used people as their own controls,
following their behavior over time instead of over a single year, and found the
opposite: tax revenue goes down by more than giving goes up. So the original
study was "wrong", but to get the right answer you have to start with the
simple result and build up.

There are certainly areas so entangled with inter-relations and data
difficulties that empirical studies are worthless until someone comes up with
decent models to untangle it all, but I think those are the exception rather
than the rule.

~~~
mechanical_fish
I think I agree with your main point, but it's important to distinguish
between various kinds of "wrongness".

The Bohr atom is "wrong" in the way that a cartoon is wrong.

Newtonian mechanics (and non-relativistic quantum mechanics, for that matter)
are "wrong" because they are approximations: at certain length scales or time
scales they work perfectly, but at other scales they break down and need to be
replaced by something more precise.

The kind of "wrong" that this article is discussing is much different: biased
experimental design and lousy statistics. Unlike the other two kinds of
"wrong", this is actually a problem. I've seen thousands of hours wasted -
some of them by me - due to lack of knowledge of the principles of statistical
significance, or because someone found a way to fool himself into cherry-
picking the most interesting data points.
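
A quick illustration of the cherry-picking trap (a minimal sketch in Python;
the data is pure noise and the 20-comparison setup is hypothetical): run many
comparisons on nothing, report only the best-looking one, and "significance"
appears all by itself.

    # Pure-noise simulation: run 20 comparisons per "experiment", keep only the
    # smallest p-value, and see how often something looks significant anyway.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_sims, n_comparisons, n_per_group = 2000, 20, 30

    false_alarms = 0
    for _ in range(n_sims):
        p_values = [stats.ttest_ind(rng.normal(size=n_per_group),
                                    rng.normal(size=n_per_group)).pvalue
                    for _ in range(n_comparisons)]
        if min(p_values) < 0.05:      # cherry-pick the most interesting result
            false_alarms += 1

    print(f"'significant' in {false_alarms / n_sims:.0%} of experiments "
          f"(~{1 - 0.95 ** n_comparisons:.0%} expected) with zero real effects")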

Public service announcement: If you're still in school as you read this, take
a stats course!

~~~
apathy
> If you're still in school as you read this, take a stats course!

If you're not, you can consult with a decent statistician at most
institutions. Might save your career, in fact...

------
byrneseyeview
What we need is a massive meta-study. Here's how it would work: take 1000
randomly-selected studies, each of which reported a result at the 95%
confidence level. Repeat all of them. You should then get some idea of which
~50 of the 1000 studies saw noise and treated it as data. _Then_ compare the
media attention accorded to those 50 versus the other 950 -- that should tell
you what you probably already know: that since we're drawn to interesting
results, and implausible truths are interesting, we all pay disproportionate
attention to studies that happen to be wrong.
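
A back-of-the-envelope version of the arithmetic (a sketch assuming,
pessimistically, that every one of the 1000 studies is testing a true null at
the 5% level):

    # Under a true null hypothesis, p-values are uniform on [0, 1], so at
    # alpha = 0.05 roughly 5% of null studies will "find" something.
    import numpy as np

    rng = np.random.default_rng(0)
    n_studies, alpha = 1000, 0.05

    p_values = rng.uniform(size=n_studies)
    false_positives = int((p_values < alpha).sum())

    print(f"expected ~{int(n_studies * alpha)} spurious findings; "
          f"this draw gave {false_positives}")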

~~~
apathy
This is rather expensive, and for human studies it would require both extremely
fine control of bias (very, very difficult in meta-analyses due to publication
bias) and extremely well-established investigators who could get a hugely
negative study published in a highly visible journal.

Most academics know better than to challenge the norms. Corporate control of
scholarly journals (in terms of the actual distribution and management of the
content rights) has not served the actual practice of science very well.

In any event, this massive meta-replication study you speak of is done, very
slowly and piecemeal, every day in every reasonable field. As new studies fail
to replicate old studies, the conclusions of the irreproducible are
deprecated, while those that consistently bear fruit are entrenched. Given
complex models and expensive studies (eg. massive multi-year cohort studies of
genetic, environmental, and gene*environment interaction effects on cancer
risk), that's about all you can hope for.

Science is simultaneously practiced in a manner worse than anyone's fears (on
the level of individual studies) and better than anyone's hopes (on the level
of accretion of knowledge over time vs. budgetary constraints). The fraudsters
burn out quickly in most rigorous fields.

~~~
byrneseyeview
>In any event, this massive meta-replication study you speak of is done, very
slowly and piecemeal, every day in every reasonable field. As new studies fail
to replicate old studies, the conclusions of the irreproducible are
deprecated, while those that consistently bear fruit are entrenched. Given
complex models and expensive studies (eg. massive multi-year cohort studies of
genetic, environmental, and gene*environment interaction effects on cancer
risk), that's about all you can hope for.

Sure, but in a very ad hoc way. It's not a study if you're just sucking up
data when you find it -- there isn't a control group, and you haven't
standardized it.

~~~
apathy
> there isn't a control group, and you haven't standardized it.

Be very, very careful walking down this path. For example, case-control
studies are inherently non-representative of population rates and thus more
appropriate for rare events (the usual tests lean on a large-sample chi-squared
approximation to the exact hypergeometric distribution). So if you want to test
them against hypothetical nulls, it's better to permute existing data (provided
it is in fact good data) or to do your own replication study (focusing on the
main effects) than to attempt to standardize a model that resists
standardization. The question for many studies is not really 'how much' but
'whether': whether a risk or a process or a mechanism explains a significant
share of the observed variation.
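
To see the approximation point concretely, here's a minimal sketch (the 2x2
counts are made up): with sparse cells, the large-sample chi-squared p-value
and the exact hypergeometric (Fisher) p-value can disagree noticeably.

    # Hypothetical sparse case-control table: rows = exposed/unexposed,
    # columns = cases/controls. Compare the asymptotic and exact tests.
    from scipy import stats

    table = [[6, 1],
             [2, 8]]

    chi2, chi2_p, dof, expected = stats.chi2_contingency(table, correction=False)
    odds_ratio, exact_p = stats.fisher_exact(table)

    print(f"chi-squared p = {chi2_p:.3f}, Fisher exact p = {exact_p:.3f}")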

One of the worst fallacies in science is the widespread belief that large-
sample asymptotic results apply to smaller samples, which are often governed by
extreme-value distributions. You can't know what you don't know in some of
these studies, in other words. You can attempt to build a model of what could
have happened under the constraints of the null, and then permute the events in
a simulation -- but for a meta-analysis with replication, how would you pull
this off on a finite budget?
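
As a concrete sketch of "permute the events under the null" (the measurements
below are made up): shuffle the case/control labels to build the null
distribution from the data itself, rather than trusting a large-sample
asymptotic result.

    # Label-permutation test on hypothetical measurements: the null
    # distribution of the mean difference is built by reshuffling labels.
    import numpy as np

    rng = np.random.default_rng(0)
    cases    = np.array([4.1, 3.8, 5.2, 4.9, 4.4])
    controls = np.array([3.5, 3.9, 3.2, 4.0, 3.6, 3.4])

    observed = cases.mean() - controls.mean()
    pooled = np.concatenate([cases, controls])

    n_perm, extreme = 20000, 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # relabel under the null
        diff = pooled[:len(cases)].mean() - pooled[len(cases):].mean()
        if abs(diff) >= abs(observed):
            extreme += 1

    print(f"permutation p-value ~ {extreme / n_perm:.3f}")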

And it would _still_ be less expensive than your earlier suggestion of directly
replicating each of, say, 1000 studies -- and that's setting aside the intense
reluctance of researchers to perform purely replication-based studies, which is
amazing to behold.

So, while it is rather piecemeal and ad hoc, my contention is that the current
jury-rigged methodology is actually a less expensive way (albeit a much slower
one) to converge on the best approximation to _the truth_ in complex fields.

You can disprove that a transcription factor will bind to a kinase after a
functional mutation is introduced, because finding a control is easy -- you
split a colony of cells, transfect one, don't transfect the other, and run
everything in parallel. But when you are modeling genetic and environmental
effects over a span of 25 years among 15,000 men and women, how do you go back
in time and select an appropriate group of controls for meta-analysis?

The truth is that you don't. Barring craven and institutionalized
misinterpretation, such as is performed in some clinical trials analysis (when
we _all_ know better, and most biostatisticians can easily pick apart the
errors if the information is made public), the current process is a slow,
iterative, but useful approximation to the infinite-budget approach you
propose.

It might work for rinky-dink biochemistry or psychology experiments, or
perhaps for microarray studies with expression signatures for things like
tumor phenotypes. That's not too awful, and you could re-pilot the study for
replication (happens a lot already). But the monster cohort studies that spawn
sub-studies -- good luck with that! The big epidemiological studies are among
the most mistreated of all, because carefully parametrized statements of
summary results are then spun by talking heads to sound as dramatic as
possible. Sometimes the principal investigators will get in on the game, but
as often as not, calmly presented information ("we saw a 2.5x (95% ci:
1.4x-3.7x) increase in bladder cancer risk for GSTM1 null phenotypes exposed
to N-nitrosamines") will be re-spun as "KILLER CANCER GENE FOUND BY PIONEERING
RESEARCHERS AT THE UNIVERSITY OF SOUTH SCRANTON!!!1".
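
For readers wondering where a number like "2.5x (95% CI: ...)" comes from,
here's a minimal sketch with made-up counts (not the data from any real study):
an odds ratio from a 2x2 table, with a Wald interval computed on the log scale.

    # Hypothetical exposed/unexposed x cases/controls counts, chosen only
    # to illustrate the calculation -- not data from any actual study.
    import math

    a, b = 40, 100   # exposed:   cases, controls
    c, d = 20, 125   # unexposed: cases, controls

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    print(f"OR = {odds_ratio:.1f}x (95% CI: {lo:.1f}x-{hi:.1f}x)")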

Don't shoot the message, shoot the messenger, for those...

------
apathy
From the piece:

> "The correction isn't the ultimate truth either," Prof. Kevles said.

No kidding! Folks, this is an iterative process. If you find yourself getting
as excited as the P.I. about their findings, stop for a second, look at the
methods and materials, and ask whether a different analysis would still support
the same conclusions. In a 'hot'
field, look very carefully at the figures and tables, for these fields are the
most prone to shenanigans ( _cough_ stem cells _cough_ ). At the same time,
when you have a critical mass of smart, motivated people in a field that's
ripe for discovery, real advances can and do happen. What biologist would have
taken you seriously in 1990 if you told them that, not only would we have a
map of significant functional landmarks in the human genome by 2000, but by
2010, we'd have them broken down base-by-base into their patterns of variation
among sub-populations? And the same thing is happening for everything from
nematode worms to wine grapes.

Don't throw the baby out with the bathwater. But do expect more heavy-handed
anti-intellectual undertones from the WSJ as Murdoch begins to assert his
control over the editorial staff.

------
jey
Humans are not rational beings; their judgments are driven by certain inbuilt
heuristics and biases.

Excellent intro to the topic: <http://singinst.org/Biases.pdf>

Excellent blog: <http://www.overcomingbias.com>

------
dood
Here's the essay, _Why Most Published Research Findings Are False_ :
<http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pmed.0020124>

~~~
apathy
I recall having read this piece before, and it is well written; but for a non-
statistician, there's really just one thing you need to remember:

 _Extraordinary claims require extraordinary evidence._

If you're testing 500,000 hypotheses and you find that one of them is
significant, it had better be extremely significant, and it had better survive
independent replication. Otherwise... well, let's not belabor the point.
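
A one-minute version of the arithmetic (a sketch, not tied to any particular
study):

    # With 500,000 tests at a nominal alpha of 0.05, chance alone produces a
    # huge number of "hits"; a Bonferroni-style correction shows how small a
    # per-test p-value has to be before it means anything.
    n_tests, alpha = 500_000, 0.05

    expected_false_hits = n_tests * alpha     # ~25,000 by luck alone
    per_test_threshold = alpha / n_tests      # ~1e-7

    print(f"expect ~{expected_false_hits:,.0f} false hits at p < {alpha}")
    print(f"family-wise alpha {alpha} needs roughly p < {per_test_threshold:.0e} per test")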

Just keep in mind that a wild and wacky theory needs some heavy-duty
experimental evidence (i.e. replication) before you, or the PI, or the
referees for Ye Olde Journal have any reason to believe it. If everyone
involved kept this in mind, it would cut down hugely on publication bias and on
popular confirmation bias as well.

------
robg
In my field, I've watched the statistics being invented to deal with the data.
Papers from the early neuroimaging days may not have the best, or even correct,
stats, but we know whether those studies were on to something by whether the
findings were replicated. I think that iteration in science is as good an
approximation of 'truth' as we can get from the world around us. Without
replication, any individual finding is suspect, no matter how much play it gets
in the popular press.

------
ninguem
Medicine is not science!

