
Make journals report clinical trials properly - bootload
http://www.nature.com/news/make-journals-report-clinical-trials-properly-1.19280
======
Pyxl101
It's always interesting to look at past ages through the lens of the modern
age. I wonder if, decades or centuries from now, the misuse of statistics and
inadequate up-front documentation of research methods in science will be seen
as a great scientific scandal, or scientific failure, of the 20th and 21st
centuries. We should have known better, but many of us failed in these basic
ways.

An episode of NPR's Planet Money covered the "Experiment Experiment":
[http://www.npr.org/sections/money/2016/01/15/463237871/episo...](http://www.npr.org/sections/money/2016/01/15/463237871/episode-677-the-experiment-experiment) and described how the Reproducibility Project was unable to replicate a significant percentage of the studies it attempted to reproduce. There is legitimate concern that data-dredging leads to misunderstanding or misrepresentation of results; and even if some of the studies are valid science, it's clear that the documentation of the studies does not suffice to allow others to replicate them, and replication is the bedrock of science.

The episode above also discussed a number of psychological fallacies and
errors that experimenters make. Imagine that you begin a study of 200 people,
measuring some variable. You see promising results, so you decide to extend
the study and add more people; you test another 100 people.

By adding 100 more people to your study, have you _increased_ or _decreased_
the likelihood that your results were due to statistical chance? Counter-
intuitively, the answer is that you have _increased_ the odds that the result
was due to chance. You cannot add people to, or remove people from, a study while it is in progress, based on its preliminary results, without affecting its statistical validity in important ways.
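
A quick simulation makes the effect concrete (my own sketch, not from the episode; the group sizes, the 0.05 level, and the "promising" interim cutoff of p < 0.15 are all illustrative). Even when the two groups come from the same distribution, peeking after the first 200 people and adding 100 more whenever the interim result looks promising pushes the false-positive rate above the nominal 5%:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    ALPHA = 0.05      # nominal false-positive rate
    TRIALS = 20000    # simulated experiments; in every one there is NO real effect

    fixed_hits = peek_hits = 0
    for _ in range(TRIALS):
        # 300 people total (150 per group), all drawn from the same distribution.
        a, b = rng.normal(size=150), rng.normal(size=150)

        # Fixed design: analyze all 300 people, once.
        if ttest_ind(a, b).pvalue < ALPHA:
            fixed_hits += 1

        # Peeking design: test after the first 200 people (100 per group); if the
        # interim result looks "promising" (p < 0.15, an arbitrary cutoff), add the
        # remaining 100 and test again. Count a hit if either look is significant.
        p_interim = ttest_ind(a[:100], b[:100]).pvalue
        if p_interim < ALPHA or (p_interim < 0.15 and ttest_ind(a, b).pvalue < ALPHA):
            peek_hits += 1

    print("fixed design false-positive rate:  ", fixed_hits / TRIALS)  # about 0.05
    print("peeking design false-positive rate:", peek_hits / TRIALS)   # noticeably higher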

These and other changes that experimenters make during the course of an
experiment create a great risk of conducting bad science. I for one support
any effort to encourage all research studies and methods to be declared ahead
of time.

~~~
nonbel
>"Imagine that you begin a study of 200 people, measuring some variable. You
see promising results, so you decide to extend the study and add more people;
you test another 100 people.

By adding 100 more people to your study, have you increased or decreased the
likelihood that your results were due to statistical chance? Counter-
intuitively, the answer is that you have increased the odds that the result
was due to chance."

I'll assume you are talking about using a t-test to see if two groups are
samples from the same population. The problem you point out has nothing to do
with chance. It is that you started with a null hypothesis that your groups
were independent samples from the same distribution, but by having the second
100 people sampled conditional on the results of the first 200, you have
ensured this is not true.

Such research designs are just a way of making sure the null hypothesis is
false. If you get rid of the incorrect attribution to "chance", I don't see
what is counter-intuitive about it.

~~~
wodenokoto
> but by having the second 100 people sampled conditional on the results of
> the first 200, you have ensured this is not true.

I still think this is counter-intuitive, and you must have a lot of practice in the field to get an intuitive feel for such cases.

The next 100 people are sampled just as randomly as the first 200. So by
adding 100 people, we are essentially re-doing the experiment with a larger
sample.

So how could the original experiment have been valid if it had had 300 samples to begin with, when the now-augmented experiment, which for all intents and purposes is the same experiment, isn't valid?

I am not defending the validity of the above argument, only that it _sounds_ pretty damn obvious.

~~~
nonbel
You have to think about the hypothesis you are testing. This would be that
groups A and B (including both the first set of 200 and second set of 100
people) have been independently sampled from the same distribution. This
hypothesis is used to calculate a prediction of the expected results.

Instead your data is forced to consist of a sample where
mean(A)-mean(B)=delta, where delta>0, for the first 200 people. Knowing that,
would you make the same prediction about the final result (after the data from
all 300 people is in)? It would be the same as getting data from all 300
people at once and adding delta to the first 200 in Group A. Clearly you have
specifically created a deviation from the original hypothesis by _design_.
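
A small sketch of that conditioning argument (my own illustration; the "interim difference > 0" cutoff is arbitrary): even when the null is true, keeping only the runs whose first 200 people show a positive difference shifts the expected final difference away from zero, although every individual is sampled just as randomly as before.

    import numpy as np

    rng = np.random.default_rng(1)
    final_all, final_conditioned = [], []

    for _ in range(50000):
        # The null is true: A and B are sampled from the same distribution.
        a, b = rng.normal(size=150), rng.normal(size=150)   # 300 people total
        interim_delta = a[:100].mean() - b[:100].mean()     # first 200 people
        final_delta = a.mean() - b.mean()                    # all 300 people
        final_all.append(final_delta)
        if interim_delta > 0:          # keep only "promising" interim results
            final_conditioned.append(final_delta)

    print("mean final difference, all runs:        ", np.mean(final_all))          # about 0
    print("mean final difference, conditioned runs:", np.mean(final_conditioned))  # above 0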

Also, let me note I said nothing about validity. I don't consider testing a hypothesis different from the research hypothesis to be a valid scientific
activity. Just because the null hypothesis is false does not mean your
research hypothesis is accurate or useful, so it is pointless from a
scientific perspective. To me, this is on the level of arguing whether the
Holy Spirit proceeds from the Father, or the Father and the Son. The entire
premise behind the discussion is flawed, but we can still discuss the proper
application of reason/logic to the arguments flowing from this flawed premise.

------
Pyxl101
I would like to eventually see a "science results build system", where
experimenters can register (preferably ahead of time) the analytical methods
they will use to analyze their data, expressed as runnable code. Any data sets
already available on which they plan to rely would be similarly registered and made available to the "build system".

When the study is complete, the researchers upload their data, in as raw a
form as technology permits. All analysis performed on the data to achieve the
results should be expressed through runnable code evaluated by the build
system. Ideally 100% of this code would be registered in advance, though realistically some of it would be written along the way.

Whether the code is registered in advance or written and registered along the way, the build system will have the complete code and data set, and will be able to reproduce the results. Ideally the experimenters simply
take the output of the build system as their actual results. The scientific
community will have complete visibility into the analytical methods of the
study, since all data and analysis will have been captured by the build
system.

The build system would ideally be agnostic to tooling and data format. Perhaps
it is something like a data store alongside a runtime platform, capable of
taking a virtual machine image and a data set, and executing the VM with the
data set as input.
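
A very rough sketch of what that registration-then-build flow could look like (everything here is hypothetical: the file names, the hash-based registry, the way the analysis is invoked; a real system would execute the analysis inside the pinned VM or container described above):

    import hashlib, json, subprocess, sys
    from pathlib import Path

    REGISTRY = Path("registry.json")   # hypothetical pre-registration store

    def register(analysis_script):
        """Pre-register an analysis script before data collection by recording its hash."""
        digest = hashlib.sha256(Path(analysis_script).read_bytes()).hexdigest()
        entries = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
        entries[analysis_script] = digest
        REGISTRY.write_text(json.dumps(entries, indent=2))
        return digest

    def build(analysis_script, dataset):
        """Re-run the registered analysis on the raw data; refuse if the code changed."""
        entries = json.loads(REGISTRY.read_text())
        digest = hashlib.sha256(Path(analysis_script).read_bytes()).hexdigest()
        if entries.get(analysis_script) != digest:
            sys.exit("analysis code differs from the pre-registered version")
        # A fuller system would run this inside the pinned VM/container image instead.
        subprocess.run([sys.executable, analysis_script, dataset], check=True)

    # Usage (hypothetical file names):
    #   register("analysis.py")            # before data collection
    #   build("analysis.py", "trial.csv")  # after the raw data is uploaded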

Tagline: "GitHub for scientific data and runnable experimental analysis"

~~~
pjschlic
This reminds me of the recent Planet Money episode "The Experiment Experiment"; they pointed out that some journals are requesting pre-registered experiments (pre-committed experimental methods and analysis submitted somewhere online before the experiment is run). It's at the 18-ish minute mark.

------
timrpeterson
It's ironic for Nature to be speaking ill of other journals' research-reporting "standards". Pot. Kettle. Black.

~~~
capnrefsmmat
I was reasonably impressed with Nature's new(ish) reporting checklist, at
least, which exceeds what most other journals are doing:
[http://www.nature.com/authors/policies/checklist.pdf](http://www.nature.com/authors/policies/checklist.pdf)

~~~
ekianjo
Yet Nature regularly publishes studies that are not properly statistically powered, or that draw incorrect conclusions from p-values.

~~~
tokenadult
For example?

------
gdewilde
John Ioannidis: "Reproducible Research: True or False?" | Talks at Google

[https://www.youtube.com/watch?v=GPYzY9I78CI](https://www.youtube.com/watch?v=GPYzY9I78CI)

