
Chart of the Decade: Why You Shouldn’t Trust Every Scientific Study You See - curtis
https://www.motherjones.com/kevin-drum/2018/11/chart-of-the-decade-why-you-shouldnt-trust-every-scientific-study-you-see/
======
epistasis
I 100% agree with the headline, _especially_ for research papers, even more
so than for clinical trials.

However, I'm a bit puzzled by the weird direction the journalist ran with
this: straight to his preconceived notions, which aren't well supported by the
data he's looking at.

But there's a bit more to this than just that one chart. In addition to self-
correction (e.g. beginning to require pre-registration of trials), science is
somewhat additive. Is it not possible that the low-hanging fruit was picked
earlier, in the 1970s-1990s, and that, with the advances in treatments for
cardiovascular disease, the problem simply got harder?

And look at that "harm" study. Turns out it's one of many from the Women's
Health Initiative [1], and it's the test of whether hormone therapy for post-
menopausal women causes heart problems: that is, it's not a test of a drug to
prevent heart conditions, but a test of side effects.

How many other of these studies are like that: studies about effects on the
heart, not trials for new drugs to treat the heart? How did the ratio change
before and after 2000?

In any case, don't trust every popular news article you read about science,
particularly if it's written by Kevin Drum and posted on Mother Jones.

[1]
[https://www.whi.org/SitePages/WHI%20Home.aspx](https://www.whi.org/SitePages/WHI%20Home.aspx)

~~~
saalweachter
I'll go one stronger and say "You Shouldn't Trust Any One Scientific Study You
See".

Individual studies can be really interesting. They're important for
researchers to know about to inform their future work. But any one study -
even ones that are done honestly, with good methodology and sound foundations
- can be just totally wrong. There could be confounding factors you couldn't
have known about that completely invalidate the result. Your test subjects
could be unusual in some way, your animal models could be a poor analogue for
humans in this particular case, or you could have just hit an aberrant fluke
in your statistical sampling.

It's the _body_ of scientific research, the dozens, hundreds, thousands of
studies stacked on top of each other that bring certainty.

~~~
KineticLensman
Yes. Science can be an adversarial process as people converge on the truth.
Like a legal trial: you shouldn't take any single statement from a defence or
prosecution lawyer out of context as representative of the whole truth.

~~~
saalweachter
Or it’s like seeing one great play in the middle of the game. It may be a
beautiful play, it might have scored the team you’re rooting for a point, but
until you know the whole game you don’t know who won.

------
icelancer
_" Before 2000, researchers cheated outrageously. They tortured their data
relentlessly until they found something—anything—that could be spun as a
positive result, even if it had nothing to do with what they were looking for
in the first place. After that behavior was banned, they stopped finding
positive results. Once they had to explain beforehand what primary outcome
they were looking for, practically every study came up null. The drugs turned
out to be useless."_

This is a ridiculous "plain English" description of what is happening here,
and I say that as someone who is regularly very critical of academia, drug
trials, and research science (I've lived it; check my bio).

Clinical trials and mandatory registration are great things. It does not mean
that researchers were massively cheating in the past, however - it means that
they were reporting secondary and tertiary findings instead of the main
investigative thrust of the research. Yes, some blatant cheating happened, as
did p-hacking (a problem that still exists), but to act like the clinical
registration database completely stopped a massive ring of fraud is
ridiculous. The data does not support that, only a conspiratorial narrative
spun around it.

------
caymanjim
I'm cynical and have no trouble believing that results were twisted immensely
prior to the rule change in 2000. The incentives are huge: profit, prestige,
or simply job security.

I do wonder if this chart misrepresents something, though: there are studies
that produce incidental--but genuinely valuable--discoveries. It's unclear to
me if that accounts for the pre-2000 results or not. With the new rules, would
there have to be another study stating the new objective?

I'm not questioning that bad science occurs, but I am questioning what this
graph really tells us.

~~~
scotty79
> With the new rules, would there have to be another study stating the new
> objective?

I think it's only fair to require you to replicate, at least once, a positive
result you think you see in data collected for another purpose before you can
claim you've got something.

~~~
Balgair
Yes, the linked article is saying that researchers used to employ techniques
like 'p-hacking' (among others) in order to report results that were
favorable/novel - that the scientists and clinicians in charge teased the data
too much.

------
smrtinsert
Clinical trials != scientific studies in general. Mother Jones is doing a
great disservice by implying equivalence.

~~~
geggam
Explain that to the layman in a way the entire Internet will understand.

Until you can ... they are for all intents and purposes.

~~~
gmfawcett
Hardly equivalent "for all intents and purposes." The purposes of the layman
are nothing like those of the research community.

~~~
geggam
If you put it in the public domain, it is your responsibility to ensure it is
digestible by the audience you publish it to.

So yes... publicly posted information should be layman-ready.

~~~
radus
Do you apply this reasoning to articles or comments discussing kernel
development? Some topics have higher irreducible complexity than others.

More importantly, I can make whatever I want public, and I can do so with
whatever audience I have in mind.

~~~
geggam
Interesting.

So inciting panic online is fine but not in a theater ?

~~~
metaphor
Your remark has difficulty satisfying the "clear and present danger" test.

~~~
geggam
Vaccine studies

~~~
metaphor
...are of a certain irreducible complexity, undergo a process of peer review
prior to publication in scientific journals, and rightfully presume a minimum
prerequisite background for proper interpretation? I'm sure the topic and its
meta are interesting to the professional audience for which they were
intended.

------
superjan
One could argue that this chart too is an example of post-hoc data mining. I
still find the effect plausible, though.

~~~
ineedasername
Post-hoc or not would depend on what hypothesis the researcher had before
looking at the data. We don't really know; at least I couldn't tell from the
article.

------
certmd
While many on this forum may understand why the practice this analysis
highlights is not good science, the article does a poor job of explaining why
positive findings one wasn't originally looking for may not be reliable.

I can imagine a non-statistically-minded person thinking, "So what if it's not
what they were looking for originally? We're missing positive findings now.
This is a terrible regulation." In reality, these "positive" findings were
p-hacked to meet the minimum criteria for statistical significance and likely
arose by chance: in any study, some subset of the data will be a statistical
aberration purely by luck.
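
As a minimal sketch of that arithmetic (Python; the per-arm sample size,
endpoint count, and trial count below are all hypothetical), here is a
simulated drug with zero real effect that still "finds" a significant outcome
in most trials, simply because 20 endpoints get measured:

```python
import random
from statistics import NormalDist

random.seed(42)
ALPHA = 0.05      # conventional significance threshold
N_OUTCOMES = 20   # endpoints measured per simulated trial
N_PER_ARM = 100   # patients per arm
N_TRIALS = 1_000  # simulated trials; the "drug" has no effect at all

def p_value(treatment, control):
    """Two-sided z-test on the difference of means; both arms are N(0, 1)."""
    n = len(treatment)
    diff = sum(treatment) / n - sum(control) / n
    z = diff / (2 / n) ** 0.5  # the difference of means has sd sqrt(2/n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

trials_with_a_hit = 0
for _ in range(N_TRIALS):
    for _ in range(N_OUTCOMES):
        treatment = [random.gauss(0, 1) for _ in range(N_PER_ARM)]
        control = [random.gauss(0, 1) for _ in range(N_PER_ARM)]
        if p_value(treatment, control) < ALPHA:
            trials_with_a_hit += 1
            break  # one "significant" endpoint is enough for a paper

share = trials_with_a_hit / N_TRIALS
print(f"Null trials with at least one 'significant' endpoint: {share:.0%}")
# Expect about 1 - 0.95**20 ~= 64%, even though the drug does nothing.
```

Pre-registering a single primary endpoint collapses that ~64% back to the
nominal 5%.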

------
rossdavidh
While some have pointed out that this very article could be considered post-
hoc data mining, I think it is not. This exact kind of effect was what was
intended by the requirement to pre-register. Looking afterwards to see if the
intended effect (a change in how many studies report benefits) appears to have
happened in reality makes perfect sense. They didn't institute the requirement
at random and only later consider that it might reduce (perhaps spurious)
findings of significant benefit; that was more or less the only reason for
doing so.

~~~
LanceH
The article claims the drugs are worthless. The studies only show that they
didn't succeed on a single pre-registered outcome.

------
DrNuke
From a science perspective, positive results are overrated; the most
important aspect is to investigate a "good" question. Researchers understand
that "good" does not mean the same thing everywhere or to every stakeholder.
More broadly, blurry journalism is attacking science to undermine the
scientific method, aka meaningful questions + proper scrutiny + full
transparency. This is a sign of the times, though.

------
brownbat
In our world, it's considered malpractice if you have the option to include
double blinding in your study design, but opt not to for convenience.

In some alternate universe, it is considered malpractice for those who design
the study to be the same group that runs the study.

I don't think we can get there from here, but if we had a core track of
theorists who designed studies, and a second equally prestigious track of
practitioners, who independently tested and ran studies, experimental science
would be much more rigorous.

Your prestige should be tied to your ability to identify novel experiments to
try, or to run rigorous testing procedures, never to your ability to shape
data to make your claims appear grand.

------
qyz721
I often wonder about the kind of research done in computer science--how much
of it is influenced by the kinds of things that easily get you a Ph.D., versus
the kinds of things that are useful but less obviously flashy.

A lot of the PL students at my school are extremely wary of doing any follow-
up work on ideas that have been published before, even if the implementations
of those things are obviously shoddy and don't really demonstrate that the
idea works. There's a lot of novelty chasing, which is part of what's pushing
people to include deep learning in their work, since it often allows them to
claim novelty, even if their results aren't very good.

~~~
SomewhatLikely
Another big influence is what problems can attract funding.

------
qwerty456127
> Then, in 2000, the rules changed. Researchers were required before the study
> started to say what they were looking for. They couldn’t just mine the data
> afterward looking for anything that happened to be positive. They had to
> report the results they said they were going to report. And guess what? Out
> of 21 studies, only two showed significant benefits.

Why is this considered good? Isn't this just a counterproductive limitation?
Significant benefits found without knowing in advance what they would be are
still significant benefits, and if they are observed scientifically and proven
reproducible, I'm glad we've found them.

> Once they had to explain beforehand what primary outcome they were looking
> for, practically every study came up null. The drugs turned out to be
> useless.

Aren't newly discovered drugs meant to undergo strict and targeted clinical
trials? How can they even be considered drugs before this? And how can they
turn out to be useless after passing this stage?

Also, in some cases nobody wants to fund clinical trials despite very
interesting supposed life-enhancing effects, or it's clear that reaching
general availability through the fully sanctioned research-and-approval chain
will take longer than people want to wait. Then non-approved substances end up
being sold on eBay (or, for more questionable substances, on the black
market), hundreds or thousands of people buy them and report their experiences
on Reddit, and that data can be a source of further clues for research.

~~~
monktastic1
Are you familiar with the multiple comparisons problem[1]? The problem is that
in any sufficiently rich dataset, you can find _something_ unusual-looking,
and if you're not straightforward about how much digging you had to do to find
it, it will look much more special than it actually is. So if a benefit is
found by chance, that's fine, but it should only motivate further research
that specifically looks for that effect.

[1]
[https://en.wikipedia.org/wiki/Multiple_comparisons_problem](https://en.wikipedia.org/wiki/Multiple_comparisons_problem)
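
To put a rough number on "unusual-looking", here's a tiny synthetic sketch
(Python; the dataset dimensions are hypothetical and every column is pure
noise):

```python
import random

random.seed(0)
N_OBS = 50        # observations per column
N_FEATURES = 200  # meaningless candidate predictors to dig through

def corr(xs, ys):
    """Pearson correlation, computed by hand to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def noise_column():
    return [random.gauss(0, 1) for _ in range(N_OBS)]

outcome = noise_column()
best = max(abs(corr(noise_column(), outcome)) for _ in range(N_FEATURES))

print(f"Best |correlation| among {N_FEATURES} pure-noise features: {best:.2f}")
# Typically around 0.4: impressive-looking, unless you also report that
# 200 candidates were screened to find it.
```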

~~~
qwerty456127
> but it should only motivate further research that specifically looks for
> that effect

Indeed. But this doesn't disqualify looking for "something" as a valid and
useful stage of the whole research chain. That ought to be allowed, although
research papers produced this way should make this clear.

~~~
lolc
It's called exploratory research and everybody is fine with it when it's
labeled as such.

------
amirmasoudabdol
Although I agree with the argument, their chart and their data actually do
not represent what they are claiming. I checked their paper a while back [1]
and contacted them about it, but haven't heard anything back. Their argument
may be sound, but only coincidentally!

[1]: [https://amirmasoudabdol.name/likelihood-of-finding-null-effects-over-time/](https://amirmasoudabdol.name/likelihood-of-finding-null-effects-over-time/)

------
fizzigy
The article seems to suggest that scientific studies aren't reliable, but
fails to point out the key takeaway: now (after the registration requirement),
studies are _much_ more likely to be valid/useful than before.

So the headline really should read something like "Why you should trust
scientific studies a lot more now than you did before".

------
nemo44x
Good science has its methods published, is reproducible and is peer reviewed.

Another, perhaps better, headline would be: "Why You Shouldn't Trust Every
'Study' You See".

~~~
jacksnipe
No True Scotsman. In fact, I’m pretty sure this is just another form of post-
hoc reasoning.

------
cup
The downside of this though is that unexpected positive results sometimes get
buried. For instance, clinical trials have to report all adverse effects but
not positive effects. We've had drugs tested for symptom X fail to have any
effect, while causing bald people to have their hair grow back. Yet the
company we were testing it for wasn't interested and those results never made
it into the public domain.

~~~
rickycook
I would assume that the positive results still get reported somewhere in the
paper, so other people could still pick up on them in the future.

If they're positive enough, then the drug company will fund another study with
that as the primary outcome. That seems prudent too: it should be the primary
thing you're examining so that you can design the study correctly, rather than
a simple "oh, and by the way" side note.

------
corysama
This would seem to indicate that the results of drug research are
unpredictable. Therefore, if you are required to predict your results, you are
set up to fail.

My question is: was the free-range research _actually_ effective? Or was it
"technically-correct effective, just so you can't call me out on a failure,
moving on..."

------
whack
Is it possible that, prior to 2000, studies that found a null result were
simply not published, hence their underrepresentation in that chart? My
understanding is that it's very hard to get null results published in
scientific journals, though I'm not sure if that applies to clinical studies
as well.

~~~
Fomite
One interesting effect of pre-registration is that there are now several
journals that let you publish your _protocol_ and agree to publish the
follow-up paper about the results regardless of the outcome.

------
thedirt0115
This is very similar to A/B Experimentation -- Gotta force the user to select
a primary metric BEFORE the experiment starts to keep 'em honest.
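
As a rough illustration (a Python sketch; the experiment name, metrics, and
numbers are all hypothetical), the point is that the primary metric is frozen
before any data arrives, and the verdict ignores everything else:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: the plan can't be edited after the fact
class ExperimentPlan:
    name: str
    primary_metric: str            # the single metric the verdict hangs on
    secondary_metrics: tuple = ()  # reported, but exploratory only
    alpha: float = 0.05
    registered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def verdict(plan: ExperimentPlan, p_values: dict) -> bool:
    """Judge the experiment only on its pre-registered primary metric."""
    return p_values[plan.primary_metric] < plan.alpha

# Registered BEFORE the experiment starts:
plan = ExperimentPlan(
    name="checkout-button-color",
    primary_metric="completed_purchases",
    secondary_metrics=("time_on_page", "cart_additions"),
)

# Hypothetical results after the experiment runs:
p_values = {"completed_purchases": 0.21, "time_on_page": 0.03,
            "cart_additions": 0.40}
print(verdict(plan, p_values))  # False: a stray hit on a secondary
                                # metric doesn't flip the verdict
```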

------
sudofail
I have a general rule to ignore any statistical interpretation in articles
like these. Data interpretation is hard. There are a lot of variables that
need to be accounted for, and under the best circumstances of academic peer
review, we still get it wrong.

This article also feels like it has an agenda. Maybe it's just that I'm not
familiar with Mother Jones, but the tone of the article strikes me as
unprofessional. And the headline is obviously correct and means nothing. It's
like saying, "Why You Shouldn't Trust Every Stranger You Meet".

------
leesec
Reminds me of this old classic, one of my favorites:

[http://slatestarcodex.com/2014/12/12/beware-the-man-of-one-study/](http://slatestarcodex.com/2014/12/12/beware-the-man-of-one-study/)

------
ineedasername
I'd say a good rule of thumb is that the smaller the number of observations,
the less you should trust any research that hasn't been reproduced. And when
it comes to publication of false positive results, always remember your XKCD:
[https://xkcd.com/882/](https://xkcd.com/882/)

------
taruz
I want a chart like that but for journalism.

~~~
clircle
I don't follow this comment. What would you plot on the vertical axis?

~~~
nostrademons
Clickthroughs, presumably.

I assume that taruz is saying that journalists should be required to say what
they're investigating _before_ publishing an investigative report. Right now,
they start investigating, and if there's _something_ outrageous (even if it
wasn't what they were initially looking for), they publish. Sometimes they
even skip the "start investigating" part and just put up a tip-line for anyone
who has a beef to get a story out there.

This is great for manufacturing outrage and hence clicks, but it gives the
public a hugely skewed perspective on how the world is. Imagine that 0.01% (1
in 10,000) of all people's actions are outrageous and will piss off a large
portion of the planet. Most people, by those numbers, would say the majority
of folks are decent, law-abiding citizens. Now imagine that a news outlet is
allowed to freely go over someone's life, and they end up evaluating 1000
actions. There's a 10% chance that they'll find something outrageous. Now
imagine that 10 such reporters do this to 10 people, and if any one of them
finds something publishable (= outrageous), they go to press. Suddenly there's
a 64% chance that _one_ of them will find something, and you've likely got
your news cycle for the day.
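
The arithmetic above checks out; a quick sketch (Python, reusing the
comment's hypothetical rates):

```python
# 1 in 10,000 actions is outrageous (the comment's hypothetical rate)
p_outrageous = 0.0001

# One reporter combing through 1000 of a single person's actions:
p_one_target = 1 - (1 - p_outrageous) ** 1000
print(f"One target, 1000 actions: {p_one_target:.1%}")  # ~9.5%, i.e. ~10%

# Ten reporters, each doing this to a different person; the story runs
# if any one of them finds something:
p_newsroom = 1 - (1 - p_one_target) ** 10
print(f"Ten reporters: {p_newsroom:.1%}")  # ~63%, matching the ~64% above
```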

With the millions of people that a tip-line can mobilize to look for
something bad, outrage is virtually assured. And that's where journalism is
today.
The world isn't actually a worse place than it was in 1980; in fact, by most
metrics it's significantly better. But we've increased the amount of
unpleasantness that people can be _exposed_ to by 3-4 orders of magnitude and
then implemented a selection filter that ensures that only the worst stories
go viral. _Of course_ we get only bad news; that's all that's economically
viable, and we have such a large sample size that we can surely find it.

~~~
jernfrost
That is a natural outcome of the hyper-capitalist world we currently live in.
My mother has been a journalist for over 40 years, and I've heard for years
about the pressure to sell. Article titles often get changed against the will
of journalists to be more clickbait-y and sell more.

As people pay less for good journalism, and ad revenue is shrinking, the media
is getting desperate to stay afloat.

Journalists today have significantly less independence than they had 15 years
ago. Everything is far more controlled and geared towards sales.

I would like to see more alternative finance models for quality journalism.

Given how social media is actually starting to destroy democracy, it may be
worth considering government grants to independent media organizations as a
part of national defense. A democracy cannot function if the media is utterly
broken, because then citizens are no longer capable of making informed
decisions.

------
monochromatic
> Post-hoc data mining is very bad, except when you're ginning up evidence of
> anthropogenic climate change, which is very good

~~~
noname120
For the record, I believe you didn't get downvoted for suggesting that there
is post-hoc data mining happening around anthropogenic climate change, but
because you provided no evidence for it, and it's off-topic besides.

