

Registered clinical trials make positive findings vanish - Alex3917
http://www.nature.com/news/registered-clinical-trials-make-positive-findings-vanish-1.18181

======
dthal
It seems that the main issue here is that with pre-registration, study authors
have to pick a single measure of primary benefit at the outset, whereas
before, they _might_ have made that choice after getting results back. The
original study is at PLoS ONE, and it is not a difficult read[1]. From that
source:

>>Prior to 2000, investigators had a greater opportunity to measure a range of
variables and to select the most successful outcomes when reporting their
results... Among the 25 preregistered trials published in 2000 or later, 12
reported significant, positive effects for cardiovascular-related variables
other than the primary outcome.

That is, in most cases there are large effects for _some_ outcome, so if the
authors got to choose the primary outcome after looking at some results, they
may well have been cherry-picking the outcome variables.

[1]
[http://journals.plos.org/plosone/article?id=10.1371/journal....](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132382)

~~~
bunderbunder
_if they get to choose the primary outcome after looking at some results, they
could have been cherry-picking the outcome variables_

Not just could, would. Choosing your hypothesis after you run the experiment
is (or at least should be) a cardinal sin in science for good reason. At the
standard p-value cutoff of .05, even when there's absolutely no effect going
on, the probability of getting a spurious positive result when you do n
comparisons is equal to 1 - (.95^n).

So that 5% chance of a type I error if you only look at one test statistic
jumps to 40% if you look at ten, and to 72% if you look at 25.

Here's a nice piece of gonzo journalism that deals with this issue:
[http://io9.com/i-fooled-millions-into-thinking-chocolate-hel...](http://io9.com/i-fooled-millions-into-thinking-chocolate-helps-weight-1707251800)
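
The arithmetic above is easy to check with a few lines of Python (a sketch;
`familywise_error` is just my own name for the quantity):

```python
# Probability of at least one spurious "significant" result (p < .05)
# across n independent comparisons when there is no real effect:
# P(at least one false positive) = 1 - (1 - alpha)**n
def familywise_error(n, alpha=0.05):
    return 1 - (1 - alpha) ** n

print(familywise_error(1))   # ≈ 0.05
print(familywise_error(10))  # ≈ 0.40
print(familywise_error(25))  # ≈ 0.72
```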

~~~
jsprogrammer
Cardinals and sin are the realm of religion. Not sure why it's being brought
up here.

Choosing a hypothesis after the experiment is run is perfectly valid as long
as your experiment is valid for that hypothesis. Besides, you would always run
a new experiment anyway.

~~~
bunderbunder
So perhaps it's worthwhile to trot out the idea of exploratory vs.
confirmatory research.

In exploratory research you collect a bunch of data, and then mine it for
interesting associations that might merit further study. It's an essential
part of the scientific process, but anything you find from doing it needs to
be treated as extremely tentative because it's liable to produce spurious
results at least as often as it finds genuine effects.

But that's not what this paper's talking about. It's talking about experiments
that are being used to support the approval of new treatments and drugs.
That's confirmatory research. In that realm you absolutely must paint the
target on the wall before you throw your darts.

~~~
jsprogrammer
Those are ways of looking at it, but experiments have definite structures, and
that structure can be exploited to create experiments that potentially reveal
more information than other experiments.

If the experiment is constructed correctly, it can support validation of
multiple hypotheses.

I would completely believe that most experimentation being performed currently
is not structured to make further exploitation possible.

~~~
nitrogen
_I would completely believe that most experimentation being performed
currently is not structured to make further exploitation possible._

Could you provide an example of an experimental structure that is immune to
the statistics being discussed?

~~~
jsprogrammer
I should have clarified that statement as I don't think it's quite correct.

It should read:

 _I would completely believe that most experimentation being performed
currently is not structured to make further exploitation, of the type desired
/wished, possible._

There are almost certainly facts available in past experiments that have not
been discovered/discussed. Many of them are likely trivial and/or not what
researchers would wish or hope their data could tell them. However, they
can still be mined.

In any case though, you would still re-run experiments to further
validate/reject the hypotheses. That is simply basic science.

------
duaneb
Christ, that's frightening. I don't think pharmaceutical companies even need
to be conscious of their bias to cause massive harm.

Would this explain the propagation of, umm, data-challenged psychiatric drugs
in the 80s and 90s?

~~~
pdabbadabba
> Would this explain the propagation of, umm, data-challenged psychiatric
> drugs in the 80s and 90s?

I'm not sure what you're referring to. Could you unpack for those of us who
haven't followed this but would like to learn?

~~~
Alex3917
[http://www.alexkrupp.com/Citevault.html#pharmaceuticals](http://www.alexkrupp.com/Citevault.html#pharmaceuticals)

See also:

[http://www.amazon.com/Anatomy-Epidemic-Bullets-Psychiatric-A...](http://www.amazon.com/Anatomy-Epidemic-Bullets-Psychiatric-Astonishing/dp/0307452425/ref=pd_sim_14_1?ie=UTF8&refRID=1TPM11KTRTXM9SGBCHJT)

[http://www.amazon.com/Emperors-New-Drugs-Exploding-Antidepre...](http://www.amazon.com/Emperors-New-Drugs-Exploding-Antidepressant/dp/0465022006/ref=sr_1_1?ie=UTF8&qid=1439920866&sr=8-1&keywords=the+emperor%27s+new+drugs)

~~~
pdabbadabba
I was hoping for some sort of accessible synthesis of the available
literature, since I don't have the time or inclination to get eyeballs deep in
this issue. But thank you for the links.

------
danieltillett
Everyone knows what the problem is. It is not that the drugs don't work
(well, most of them do), it is that they only work for a small subset of
people. It is possible these days to figure out which drugs work for which
patients, but doing this makes the drug non-viable commercially. We are stuck
with a system where the way a drug is tested is totally different to the way
it will be used in practice - a drug has to work in a large percentage of
people in the trial, while once approved it just becomes another drug through
which people get cycled while their doctor tries to figure out which one works
for the individual.

~~~
ZoFreX
This isn't what "the problem" is. There are lots of problems, but one of the
big ones this combats is systematic deliberate fraud. Companies were cherry-
picking the trials that got results (and if you run enough trials, some will
have results just through chance) in order to get drugs approved that were not
any better than placebo.

If there is actual data to suggest who the drug works on best, and empirical
tests to determine who that is, it can absolutely be tested within those
limits and approved for use on that population.

~~~
danieltillett
The reason the companies are cherry picking the results is they know this
basic problem exists. Only drugs that have evidence of working at stage I and
II progress to stage III.

What happens is the drug companies decide at stage III what patients to let
into the trial. What they are trying to avoid is anything that will limit the
market size once approved. This means they try to skate as close as they can
to the effectiveness limit and still get the drug approved.

For example, if they have a new cancer drug that they know only works on
cancers with certain mutations, what they will try to do is test the drug in a
broad population of patients, including those without the mutation. Since for
the approval process it does not really matter how effective the drug is (as
long as overall it is statistically significant), and since you can only
market to the population you tested in, you will want to shove in as many
"filler patients" as you think the drug can support. Due to the difficulty of
designing large trials they often get the filler number wrong. I don't condone
this activity, but this is the commercial reality.

More fundamentally, the pharmaceutical industry is being crushed between two
very powerful forces - the cost of developing new drugs continues to rise
every year (mostly thanks to regulations), while the real market for each drug
shrinks as sub-populations are identified for each disease. Combine this with
a refusal of society and insurance companies to pay ever-increasing drug
prices and the outlook for the industry is not great.

~~~
ZoFreX
I've never heard this conspiracy theory before. Have you got some written
evidence to hand?

~~~
danieltillett
This is not a conspiracy theory. Who to include in a stage III trial is one of
the most difficult problems facing pharma management. They aim to make these
as broad as possible and lots of drugs have died because the stage III trials
were made too open.

If you want to get a good idea of what really goes on in the pharma industry,
have a read of the “In The Pipeline” blog [1]. Most of the real value is in
the comments.

1\.
[http://blogs.sciencemag.org/pipeline/](http://blogs.sciencemag.org/pipeline/)

------
storm90
Over 50% to under 8%, that's a big change. This casts doubt on the pre-
registration studies.

~~~
kbenson
Yes, but as noted in the article, some of that may be because of other factors
that have changed in the meantime. E.g. if hospital treatment for heart
disease has advanced and noticeably affects outcomes, that alone could change
the overall outcome of a study about heart disease.

I myself don't anticipate that accounting for much, though.

~~~
hga
Without intimate familiarity with the field it's really hard to figure out
cause and effect in something so complicated.

It's been noted that it's become much harder to find new drugs. Conventional
explanations are that we're pretty good at this by now and all the low-hanging
fruit has been picked (there are also fields, like antibiotics, where few are
trying, for unrelated reasons).

More rigor in studies could be a cause, and/or there may be fewer successes
because it's become that much harder, and more iffy attempts are being made.

~~~
kbenson
Yes, that's a more eloquent way of saying exactly what I was trying to
express, and what I interpreted the article as alluding to. Thanks!

------
lemming
If anyone would like (a lot) more information about why this is a significant
change, Ben Goldacre's book Bad Pharma is both excellent and terrifying. Since
reading it, I think much harder before taking medicine unless it's really
required.

Click click: [http://amzn.com/0865478066](http://amzn.com/0865478066)

------
Gatsky
I think the key finding is this:

"Results for all cause mortality were similar. Prior to 2000, 24 trials
reported all cause-mortality and 5 reported significant reductions in total
mortality (25%), 18 were null (71%) and one (CAST) reported significant harm
(Table 3). Following the year 2000, no study showed a significant benefit for
total mortality."

So 24 of the 30 pre-2000 trials they looked at reported all-cause mortality.
This is important because all-cause mortality is the gold-standard endpoint in
a clinical trial. If you were cherry-picking an outcome measure, this is the
last one you would choose, as it's the hardest one to get a positive result
for.

This casts doubt on the paper's central thesis. How to explain the difference
in pre- and post-registration all-cause mortality? The possibilities are:

1\. Trials performed after 2000 are genuinely less likely to be positive
because of altered funding, lack of low-hanging fruit, etc. It is a well-known
problem that estimating the survival of the control group in a randomised
trial based on historical data usually produces an underestimate, which messes
up the power calculation and the chance of a positive result. Beyond a certain
point, it becomes infeasible to do a clinical trial to show a benefit over a
very healthy control group, because you would need 100,000 patients.

2\. After registration became mandatory, fraudulent investigators who make up
their results stopped doing clinical trials.

The other point is that in most areas, there are only a few acceptable
clinical trial endpoints. For example in cancer studies, there are really only
3: overall survival (time alive), event free survival (time until the cancer
comes back or starts growing) and response rate (% of cases where the tumour
shrinks by 20% or more). Cardiology is a bit different because there is less
consensus about valid endpoints. Nevertheless, clinicians and regulatory
bodies are pretty strict about which endpoints they consider meaningful.

So in my opinion the authors chose the subject area that would give them the
most 'shocking' result (cardiology, due to having more options for endpoints),
glossed over the interpretation (why such a big difference in the gold-
standard endpoint that is supposed to be immune to manipulation?), and
over-hyped the significance of the result.

------
jostmey
Wow! In 25 trials, 2 treatments were beneficial and 1 was detrimental. It
would seem that the process of finding new treatments is horribly inefficient,
and perhaps just completely broken, given that so few treatments were found to
work. I mean, weren't most of these treatments found to work on animals first?
What's going on???

~~~
qiqing
We humans are biologically quite different from a lot of the species we
experiment on. To take a salient example, we've cured cancer (all sorts of
cancers) in mice hundreds of times. It's unlikely you'll find a tumor in a
mouse we can't fix. But the same things don't work in humans.

And although I feel like a weird contrarian saying this, it's entirely
possible that by selecting for treatments that are effective in other species,
we may be inadvertently missing out on things that might work in humans.

~~~
Florin_Andrei
> _we 've cured cancer (all sorts of cancers) in mice hundreds of times. It's
> unlikely you'll find a tumor in a mouse we can't fix._

Wait, what? Really?

If that's true, I can see only one explanation - with mouse models, we can do
far more aggressive research, and therefore we cover a bigger chunk of the
parameter space. Some of the same research strategies would not be ethical
with human subjects.

~~~
icegreentea
Well, there's a whole bunch of other stuff too. Most of the time with animal
models, we induce cancer by a variety of means - examples include using breeds
with certain cancer-suppressing genes knocked out, heavy exposure to
carcinogens, or direct grafting/injection. These are all mechanisms of getting
cancer that are extremely atypical in the human beings we are trying to cure,
which may (I think it's super likely, especially in the case of knockout genes
and grafting) influence cancer-treatment interaction.

~~~
danieltillett
Exactly. I know of no mouse study that actually tries to replicate what we
want to do in humans.

In regards to looking for a cure for cancer, we are like the drunk who lost
their keys in the dark alleyway but is searching around the street lamp since
the light is better. Until we actually start doing the right studies we will
really struggle to get anywhere.

------
RA_Fisher
For those interested, I did a little meta-study of my own on the distribution
of p-values.
[https://news.ycombinator.com/item?id=10077042](https://news.ycombinator.com/item?id=10077042)
It would be pretty interesting to build a larger p-value database. It could be
a first step toward measuring how bad non-registered statistics-based science
really is, and would let us measure the benefits of registration.
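
As a rough sketch of what such a database could detect (my own toy model, not
the method from the linked post): honestly reported null p-values should be
Uniform(0,1), while cherry-picked ones (the best of k outcomes) are not, and
even a crude KS-style statistic separates the two:

```python
import random

random.seed(1)

def ks_vs_uniform(ps):
    """Max gap between the empirical CDF of ps and the Uniform(0,1) CDF."""
    ps = sorted(ps)
    n = len(ps)
    return max(abs((i + 1) / n - p) for i, p in enumerate(ps))

# Honest null reporting: each p-value is a single Uniform(0,1) draw.
honest = [random.random() for _ in range(5000)]
# Cherry-picking: report only the best of 10 outcomes per study.
hacked = [min(random.random() for _ in range(10)) for _ in range(5000)]

print(ks_vs_uniform(honest))  # small (~0.01)
print(ks_vs_uniform(hacked))  # large (~0.7)
```

The cherry-picked sample piles up near zero (its CDF is 1 - (1-p)^10 rather
than p), which is exactly the kind of distortion a p-value database would make
visible at scale.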

------
BiologyRules
Looking at some of the studies in diseases that interest me on
clinicaltrials.gov, I find most of the studies are very uninspired.

------
novaleaf
I learned about this effect in Harry Potter and the Methods of Rationality.
Who knew?

