
Congratulations, Your Study Went Nowhere - cpeterso
https://www.nytimes.com/2018/09/24/upshot/publication-bias-threat-to-science.html
======
carlmr
>Researchers should embrace negative results instead of accentuating the
positive

The problem starts here. Most researchers would love to show their negative
results; they're well aware of the problem. But they need to publish, they
need grants, they need money regardless of the outcome. They need an
alternative reality where negative results and reproducibility studies make
money.

As it stands, you get money for publishing praise: a lot of industrially
sponsored medical research is just "9/10 doctors recommend x" advertising.

We need to institute a rule that only preregistered studies can be published,
or at least used by the FDA for decisions.

Right now you can conduct 1000 studies until you find a handful which randomly
show the result you want.
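
To put rough numbers on that, here's a quick simulation (purely illustrative,
not from the article): 1000 studies of an "effect" that doesn't exist at all,
counting how many come out "significant" anyway.

    # 1000 null studies: both groups are drawn from the same distribution,
    # so every "significant" result below is a false positive.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    significant = 0
    for _ in range(1000):
        a = rng.normal(0, 1, 30)  # "treatment" group, no real effect
        b = rng.normal(0, 1, 30)  # control group, same distribution
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            significant += 1
    print(significant)  # roughly 50 of the 1000 look publishable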

~~~
08-15
> We need to institute a rule that only preregistered studies can be
> published, or at least used by the FDA for decisions.

That won't help, not by itself. Preregistration helps with "p-hacking", the
practice of moving the goalposts until some result becomes significant. A
bigger problem remains.

Medicine and Biology accepted the framework of Statistical Hypothesis
Inference Testing, where a null hypothesis is rejected at some p-value,
usually 0.05. Ignoring many faults of this framework (for example that the
alternative hypothesis is not tested at all, that it is a bad caricature of
the bad statistics R. A. Fisher introduced, that the numerical outcomes depend
on the probability of events that _didn't_ happen, etc.), the logic is that
you limit the false positive rate to below the p-value threshold.

Unfortunately, journals consistently reject submissions unless there is a
significant p-value somewhere in there. ("Highly significant" is better, even
though the term makes no sense.) So if plenty of researchers investigate some
random nonsense where the null hypothesis is actually true, 100% of published
studies will be spurious results, preregistration or not.
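
To make that arithmetic concrete (the alpha, power, and priors below are
illustrative assumptions, not figures from any study): the share of published,
"significant" results that are false depends entirely on how often the tested
hypotheses are true in the first place.

    # Share of *published* (i.e. significant) results that are false
    # positives, as a function of how often the tested hypotheses are true.
    alpha, power = 0.05, 0.8

    def false_share_of_published(prior_true):
        true_positives = prior_true * power
        false_positives = (1 - prior_true) * alpha
        return false_positives / (true_positives + false_positives)

    for prior in (0.5, 0.1, 0.0):
        print(prior, round(false_share_of_published(prior), 3))
    # 0.5 -> 0.059, 0.1 -> 0.36, 0.0 -> 1.0
    # With "random nonsense" (prior 0), every published result is spurious,
    # preregistration or not.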

To make progress, this kind of statistics has to go. Journals would have to
change their policies, which probably means peer reviewers have to change
theirs. I have no idea how to make that happen.

~~~
Bartweiss
Preregistration does help with this issue a bit. It won't make published
studies any better, but it still helps solve the file-drawer problem if the
results are available _somewhere_. If you take a shotgun approach to getting
significance out of null hypotheses, you presumably get 5% successes. (Sort
of. Maybe. p-values _really_ stink, as you point out.) But if 20 people all
run variants of the same bunk study, preregistration means that instead of
just seeing one success, we see one success on the cover of PNAS, and then at
least have the option to look up the ~10 prior failures and decide not to
accept the conclusion. (Well, one time in twenty it'll be the first try, but
it's still progress.)
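
(The arithmetic behind that, using the rough numbers above:)

    # Chance that at least one of 20 preregistered null studies clears
    # p < .05 by luck alone.
    alpha, n = 0.05, 20
    print(round(1 - (1 - alpha) ** n, 2))  # ~0.64
    # Better-than-even odds of one publishable "finding" from pure noise,
    # but with preregistration the other registered attempts stay visible.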

Of course, that's still going to push researchers to change up their
techniques for getting false positives. It rewards either covering unexplored
domains (so there's no counterexample to your false positive), or exploiting
other tricks to manipulate significance in preregistered work.

I don't think existing journals have much hope of improving their practices,
honestly. Between their ludicrous financial incentives and ideologues crying
"methodological terrorism", that matter seems depressingly settled. But I
wonder if there might be hope for creating a counterweight to this?

What happens if Gelman and Kahneman get together to create the _Very
Prestigious Journal of Replications_ and the _Somewhat Prestigious Journal of
Rejected Hypotheses_? The first one is neat because it rewards negative
results, since they change the state of what's understood - at the very least
it might encourage more people to start checking the most influential and most
suspicious results. The second thing might be a harder sell, but it could
reward comprehensive negative work on key topics (much like Alzheimer's
research has heavily consisted of excluding possible pathways). Honestly,
"create new prestigious journals from scratch" sounds far easier than "reform
existing journals" or "tenure and reward professors for reasons other than
publications".

~~~
08-15
> Honestly, "create new prestigious journals from scratch" sounds far
> easier...

What used to be "prestigious" has come to mean "high impact factor", mostly
because funding agencies needed an "objective" way to grant funding, and being
run by bureaucrats, they picked impact factor as a measure of scientific
success. For a Journal Of Replications to have impact, researchers would have
to cite papers about replications. But that's not expected; you cite the
original paper, not the one that replicates a result, and you never hear about
the one that failed to reproduce a result.

I don't see that approach working, but I'd like to see someone try it.

~~~
Bartweiss
> _For a Journal Of Replications to have impact, researchers would have to
> cite papers about replications._

Yep, this is definitely the biggest weakness for the idea. A journal about
replications would probably end up with low impact factor, and wouldn't be
able to grab a groundbreaking result to boost its prestige the way some
narrow-topic journals have.

My only thought for handling that was alluded to with "Gelman and Kahneman";
maintaining a tolerable reputation and impact factor would probably require
leaning on big names and people who are devoted to doing replication work
regardless of journal. Gelman, for instance, showed up in PNAS with references
when he criticized that godawful "boarding and air rage" study. If we're
willing to abuse the process even further, there might be room to publish
methodology-of-replication papers in statistical or procedural journals and
create further citations that way.

It's not a very good answer, and I don't think it's going to free anyone from
"publish or perish". The best I can really hope for is creating a bit of space
for talented, skeptical researchers to add replications to their other work
without needing to be tenured and entrenched first.

~~~
grigjd3
A big part of the problem is there are too many PhDs chasing too little
funding. There's no room for negative results when the competition is that
big.

~~~
Bartweiss
I wish I could find it, but there's a superb piece out there about the
inherent dysfunction of lottery-style careers. The basic idea was that there
are a handful of careers where success is highly rewarded, but job count and
quality don't really respond to the market.

In academia, there are however many tenured professorships with whatever
salary, set by largely non-market forces. In Hollywood, there are only so many
recognizable stars at a time who can guarantee sales, so their salaries don't
really decline no matter how many people vie for their spots. And the same in
pro sports, music, politics, Harvard admissions, etc.

And the result is that without adding new jobs or dropping salaries, the
people hiring balance the market _somehow_. Unpaid internships, selling
expensive credentials, would-be singers selling hits to existing stars,
massive university donations by parents, vicious hazing on football teams,
horrifying abuse by casting agents; people who control these 1,000:1 (or
higher) seeker-to-slot fields manage to exploit that eagerness to all kinds of
horrible ends. [1]

Academia is probably less bad than many of those fields, perhaps because the
rewards at the top are less spectacular. But it's not a coincidence that Brian
Wansink's big scandal was about pressuring a grad student to find results in
null data, or that grad students and postdocs end up working miserable hours
to be third author on their own research.

I don't have an answer here, really. But I certainly agree that as long as
there are endless ranks of people struggling to make their names, people are
going to keep finding ways to get dramatic-looking results, data be damned.

[1] I suspect this stuff really kicks in at several hundred to one or higher.
Harvard's raw rejection rate isn't that high, but note that it applies a
time-and-money filter before the applications even arrive.

------
petercooper
I got put off science at high school when I'd get marked down for having
"wrong" but truthful results to physical experiments, so I ended up going
around and averaging out other people's results (not their conclusions or
write-ups, just the raw numerical measurements) which yielded better marks. I
wonder if an element of that is at play in corners of the broader scientific
community.

~~~
Leszek
High school physics experiments are somewhat of a different case, as "wrong
but truthful" results are probably evidence of an incorrectly performed
experiment, and performing the experiment correctly is part of what one is
assessed on. That, or bad luck, but presumably the experiments are simple
enough that, performed correctly, they all but guarantee a "correct" result.

~~~
leetcrew
i guess it depends how complex the experiments are at your particular high
school, but in my experience we did not have terribly precise instruments or
techniques, so the labs would have a high degree of random error. since the
lab usually had to fit into 75 minutes, the procedure usually didn't call for
many trials. labs where, in college, we would perform 5 or more trials per
step (up to twenty in one lab) might have 3 or even just one trial in high
school.

if you have poor quality equipment and very few trials, the results are going
to be all over the place. if you average the whole class's values, you might
actually get close.
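
something like this toy sketch (made-up numbers, just to show the effect of
averaging when the error is random rather than systematic):

    # one student's few noisy trials vs. the average over a class of 30
    # students, measuring a true value of 9.8 with purely random error.
    import numpy as np

    rng = np.random.default_rng(1)
    true_value = 9.8
    one_student = rng.normal(true_value, 2.0, size=3).mean()
    whole_class = rng.normal(true_value, 2.0, size=(30, 3)).mean()
    print(round(one_student, 2), round(whole_class, 2))
    # a single student can easily be off by a unit or more; the class
    # average lands much closer (random error shrinks roughly as 1/sqrt(n))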

~~~
jerf
The time I learned how important it is to forge numbers was when we were
measuring gravity in high school: roll a ball off the end of a table and time
the fall with a stopwatch, starting it when the ball leaves the table and
stopping it when it hits the ground.

Free-falling one meter on Earth takes about 0.45 seconds.

If you are systematically off and get 0.35 seconds, you get a gravity of
about 16 m/s^2. If you are systematically off and get 0.55 seconds, you get
about 6.6 m/s^2. It isn't that hard to be systematically off in your button
presses by that amount, even across ten trials. The noise is huge compared to
the signal, and averaging several attempts can only smooth out random noise,
not a systematic misjudgment of when the ball leaves the table or lands.
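
For reference, the arithmetic (a drop from rest: h = 1/2 g t^2, so g = 2h/t^2):

    # g inferred from timing a ~1 m drop with a systematically-off stopwatch
    def inferred_g(height_m, timed_s):
        return 2 * height_m / timed_s ** 2

    for t in (0.35, 0.45, 0.55):
        print(t, round(inferred_g(1.0, t), 1))
    # 0.35 s -> 16.3, 0.45 s -> 9.9, 0.55 s -> 6.6 (all in m/s^2):
    # a 0.1 s systematic timing error swamps the quantity being measured.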

I got the second result. I got a bad grade. I learned my lesson. Remarkably
every subsequent experiment I ever ran was within ten percent of the real
value. My turnaround was truly amazing.

~~~
grigjd3
Manually triggered stopwatches are a terrible way to measure gravity. Human
error is going to dominate the result.

------
air7
I've had an idea about this problem that I'd be happy to hear your thoughts
about:

Simply put, a promise for an independent future reproduction study should be
part of the published paper.

Once a researcher achieves a publishable result, she looks for a peer-
researcher that will commit to perform a pre-determined reproduction study in
the near future. This promise is written in the original paper.

This ensures that a negative reproduction would definitely be published. It
incentivises the original researcher to not mess too much with the data post-
hoc, and to be as helpful as possible to their reproducing peer. The peer
gets a citation _before_ writing the paper, and all the help they'd want to
get the study done as quickly and easily as possible (Q&A, analysis code,
etc.).

~~~
Bartweiss
This is a really interesting idea.

Adversarial collaboration has produced some interesting results, and is
probably our best bet for settling arguments on topics where results are
consistently rejected over methodology. It's done good work on ESP, and shows
some promise on priming if anyone will actually sign on.

But that basically requires finding fields with conflicting viewpoints and
well-understood methodology spats, which means established debates. This idea
would get the same effects - experiment design that's accessible and
verifiable - on untested topics, while simultaneously baking replication
attempts into initial publication.

The more I think about this, the more impressed I am. It guarantees
replications, it guarantees data and methodology availability, and it makes
replications a publication-worthy step by making them part of the initial
'success'. It doesn't solve the file drawer or salami slicing problems, but it
does huge work to sidestep them by forcing another _p < .05_ which isn't
subject to them. And wildest of all, it might even be acceptable to journals
in a way that "publish replications and null results" isn't. Thanks for the
most creative approach to the replication crisis I've heard in ages!
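
(A rough sense of how much that buys, assuming the replication is independent
and there is in fact no real effect:)

    # If the original "finding" is pure noise, the chance that an independent,
    # preregistered replication also clears p < .05 is just alpha again.
    alpha = 0.05
    print(alpha * alpha)  # 0.0025: a fluke that survives its promised
                          # replication is ~20x rarer than a fluke that
                          # only has to get published once.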

------
afpx
Furthermore, it would be extremely useful if scientific research were
transparent from inception to outcome.

Scientific ‘papers’ are archaic forms of knowledge transfer suited for a time
when physical paper was the only way. These days, science would serve us
better if we could see everything involved in the process. I don’t want to see
only the results, I want to see the whole notebook and all the hurdles along
the way. Why can’t we follow the researchers and the accumulation of evidence
in real time?

~~~
radarsat1
Sounds nice, but as a scientist I can say that it would be a huge overhead to
have to constantly prepare half finished work for public consumption
throughout the entire process of a study. Scientific papers make sense because
the point is to summarize your results, discuss what they may imply, e.g. what
theories they support or deny, and describe what you did in a reproducible way
so that the work can be verified. The writing of a paper is an act of
communication. It's important that it is an explicit thing that researchers
are trained to do properly, not something that they just shit out at the end
of a poorly done study that didn't work, or worse, in the middle of something
where they can't draw any conclusions yet. E.g. journalists already pick up on
wrong or hype-inducing interpretations of poorly written press releases of
published papers. Imagine if they could also pick up on half-finished work.
"Study being done could mean something amazing, maybe, if it works!" You think
fake news is a problem _now_? Imagine the world you propose.

Now, if researchers were encouraged more to describe in detail their process
and everything that went right and wrong throughout a study, along with data
and algorithms and everything, perhaps as appendices or in supplemental
material like blogs or videos, etc. (and I find this is what is happening
lately), that would be fine, and a nice ideal to strive for.

But realize that scientists are _already_ required to not only perform the
study but write about it, and convince every skeptic that they are right, go
to conferences and get an article accepted by a journal which can take a year
or more. And now add to this that they are required to prepare the data and
software for public consumption, make videos and blog posts that describe
everything, answer all questions that the public has. Think about all that
overhead you are demanding that goes so far above and beyond _doing the actual
science_. It's not small. And they are not paid extra for it, in fact their
paycheck is probably half what they could make doing closed science for a for-
profit company. Meanwhile their job as an academic is only to explore new
ideas and convince their peers of their worth. Why should they go the extra
mile, for free, for every member of the public who demands answers and
transparency? Sorry, but it's too. much. work.

~~~
datenwolf
> Sounds nice, but as a scientist I can say that it would be a huge overhead
> to have to constantly prepare half finished work for public consumption
> throughout the entire process of a study.

How about writing into a (public) blog, instead of a lab notebook?

~~~
radarsat1
First, my notes are not nearly well-organized enough to have any kind of
coherence for anyone but me. Do you think it's interesting for me to answer
public questions, to defend ideas that I haven't even completely thought
through yet? But secondly, why would I want to publish my half-baked, unproven
ideas? I can see two possible outcomes: (1) I completely embarrass myself
because they are stupid ideas that I haven't verified, or (2) they are good
ideas and someone takes them and runs faster than I can and beats me to a
result. I see zero advantage for _me_ to do what you say.

I think the people saying let's do every step of science 100% in the open are
completely forgetting that science is a social process. Putting things out
there has consequences. The onus is on the scientist to verify things before
saying them. There is a word for "scientists" who do not. Personally I don't
believe that that is something that will change in the next 100 years.

~~~
bloomer
The current “science as a career” model, with public grants etc., stems
entirely from the development of the atomic bomb in World War II and Vannevar
Bush's resulting proposal for the NSF. I sincerely doubt that science will be
practiced in anything like its current form in 100 years. What it will look
like is an interesting question.

------
anonytrary
I fear that science is slowly going awry, particularly in fields where
outcomes have grave and immediate implications for businesses. In this
respect, physics is simpler than social science. Not getting the results you
expect is fine and often teaches you something.

~~~
grandmczeb
I don't think it's going awry but rather that certain fields have always been
suspect and we're just now realizing how bad it is. In mathematics, there are
proofs that have stood up for thousands of years; in physics there are models
that have remained useful for hundreds. How many fields can claim that kind of
longevity?

------
evandijk70
Posts like this often end up on Hacker News. No one doubts that
cherry-picking hypotheses is real. The same goes for 'spinning' negative
results.

However, the solution: "pre-register trials and only do what you say you are
going to do" oversimplifies things a lot. Testing and rejecting your
hypothesis is a very real part of doing science. But it's also a scientist's
job to come up with a new hypothesis that explains the data better. I think
the real problem here is that writing something up and saying: "our initial
hypothesis was wrong, we suggest this and this factor is at play" is not the
way science is done currently.

~~~
empath75
If your study took a lot of wrong turns, include that in an appendix at least.

------
tells
I've been kinda thinking this for more than a decade after working at one of
the big pharmas. I witnessed several trials with subpar results that would not
go on to be published. I think all studies should undergo a simple national
pre-registration and require a summary at the end of each study. One of the
things that makes humans special is our ability to store information and pass
it on to later generations; just throwing away unwanted results is not
helping anyone.

------
qubax
The problem with "studies/research" today is that most of it cannot
reproduced, not that it went nowhere. It's not really a matter of "cleaning
up" the research to make it "positive". In other words, most science today
isn't real science.

Throw in the issues of funding (government: political issues; private:
corporate issues) and there is very little incentive for real research. And
with the current academic environment at leading institutions like Yale,
scientists probably are too afraid to do research honestly on sensitive
topics.

Also, isn't this just a rehash of another nytimes article from last year?

https://www.nytimes.com/2017/05/29/upshot/science-needs-a-solution-for-the-temptation-of-positive-results.html

There are 3 or 4 nytimes articles on the frontpage. At this rate, how long
before the entire frontpage is just nytimes? Just redirect hn to nytimes and
be done with it?

~~~
thrower123
The NY Times just rotates through the same double-handful of subjects on
something like a six week timer. It does get a bit tedious, because there's
very little that's actually new to be said, and everyone just rehashes the
same tired arguments over and over and over.

------
Rainymood
I still think that science should be automated in some way, shape or form.
I'm imagining something like this: you have a dataset, and you upload that
dataset to some third party that checks its validity. Then you write down
exactly WHAT you are going to do with the data and send the proposed
"routines" in; this should produce automatically generated output and an
automated report of what was done and why. Of course, this is a completely
silly idea, but I'd love to know if anyone has tangentially related thoughts
on this.
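
For concreteness, here's a toy sketch of what I mean; every file name and
function in it is made up for illustration, not a real service.

    # Toy "preregistered analysis" runner (all names are illustrative):
    # the dataset and the analysis script are hashed and logged *before*
    # any results exist, so what actually ran can later be checked against
    # what was promised.
    import datetime
    import hashlib
    import json
    import subprocess

    def sha256(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def preregister(dataset, script, registry="registry.jsonl"):
        entry = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "dataset_sha256": sha256(dataset),
            "analysis_sha256": sha256(script),
        }
        with open(registry, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def run_registered(dataset, script, report="report.txt"):
        entry = preregister(dataset, script)
        # Run exactly the script that was registered; its stdout is the report.
        result = subprocess.run(["python", script, dataset],
                                capture_output=True, text=True, check=True)
        with open(report, "w") as f:
            f.write(json.dumps(entry, indent=2) + "\n\n" + result.stdout)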

~~~
Miltnoid
My research area has actually started doing that. Sure it's a CS discipline,
but yeah we have a second component of our conferences where we automate our
benchmarks, and other people run those benchmarks and validate the results.

~~~
MaxBarraclough
For CS publications, it should be the norm to insist on inclusion of all
source-code and data-sets used, unless there's a compelling reason they can't
be included.

It's absurd that reproducing results should be any more of a challenge than
simply re-running a freely available program.

~~~
TropicalAudio
All but one of the programs I wrote for scientific publications were written
on the payroll of corporate grants, and the IP rights went straight back to
the company paying for the whole show. Can't exactly open-source things you
don't actually own.

~~~
MaxBarraclough
Then you should be denied the opportunity to publish. Corporate obstinacy
should not trump good science.

Software is unique in that it can generally be duplicated and executed
trivially.

There's no way to make it trivial to reproduce a test on the strength
properties of a new ceramic. There _is_ a way to do this for software, and
it's rather silly that it isn't standard scientific practice to do so.

I realise I'm taking a strong line here, but I've never seen a good argument
against it.

~~~
TropicalAudio
The unfortunate answer is that instead of harder-to-reproduce science, no
science would have been published in all of those cases. Quite often, sharing
knowledge is a secondary goal to achieving the set goals of a project. In
niche fields, input from any source is welcomed gladly, as long as the papers
are sound. Banning corporate players from sharing knowledge in your journal if
that's the majority source for input in the field is not really an option.

------
tuxt
So, who is paying the bills?

~~~
MrEfficiency
For Engineers- Capitalism/Customers

For Academics- Whoever tells you to run the study.

Which one do you think is more often corrupted for centuries at a time?

~~~
leetcrew
"whoever tells you to run the study" is basically whoever approves the grant,
so either a business, the government, or a university (indirectly using the
government's money). all three are corruptible in their own ways.

~~~
MrEfficiency
What I'm saying is that Business is different from Government/University.

If a Business is wrong, they go out of business.

If a University is wrong, they teach it for the next 10 years.

If a Government is wrong, they go forward with the policy until the next
regime change.

One of these is different from the rest.

~~~
leetcrew
businesses certainly have a high incentive to get internal research correct.
they don't necessarily have the same incentives when it comes to publishing
results or funding external research. see tobacco health research
performed/funded by tobacco companies.

in general i trust businesses a bit more than governments, because I find it
easier to dissect their motivations, but either can be greatly incentivised to
distort scientific findings.

------
julienreszka
Can't learn from failure. Stop wasting people's time

