First analysis of ‘pre-registered’ studies shows sharp rise in null findings (nature.com)
415 points by danso 4 months ago | 162 comments

Geologist here. This is super-important for hypothesis science.

At the same time, I think one of the biggest inaccuracies in the public perception of science (and one of my pet peeves) is the idea that all science is hypothesis science. It turns out there's also still plenty of discovery science to be done -- and while it's less common than it was 100 or 200 years ago, it's quite important!

In geology, this often quite literally takes the form of a blank space on the map -- there are plenty of unmapped quadrangles on the geologic map of the world at 1:24,000 and finer scales, and the USGS will pay you just to learn how to fill them in [1].

This is one of the few subjects on which the misinformation is so pervasive that even the Wikipedia article [2] is substantially inaccurate (I blame the overly simplified formulation of the scientific method that most of us are first exposed to in elementary school). The one part that you can correctly infer from the Wikipedia article, though, is that discovery science is having a bit of a resurgence recently thanks to the proliferation and reuse of large datasets.

[1] https://ncgmp.usgs.gov/about/edmap.html
[2] https://en.wikipedia.org/wiki/Discovery_science

One of the simplest distinguishing characteristics is that there's no such thing as a negative result in discovery science: if you're mapping a blank area, whatever you find will be something we didn't know before.

One of the most painful parts of Economics is how many mathematicians there are in the field. They treat discovery science as hypothesis science, and try to force reality to fit their model - instead of gathering data about the real world first. Thus, we get models of Marginal Demand that don't include "not having enough money to buy something".

Edit: if this comment is unhelpful, please suggest an update.

I find it helpful, or at least informative, but since it confirms the model I already had of the field my opinion may not be trustworthy.

> Thus, we get models of Marginal Demand that don't include "not having enough money to buy something".

Which models?

Most of them, especially those developed before the 2000s.

Can you be more specific? “Not having enough money to buy something” sounds like a pretty important factor to miss.

I think neuroscience is much like exploration. You can make hypotheses, but we barely know what is going on, so the hypotheses aren't really well grounded. Find what's there, publish the result, build the map, so to speak, so that future scientists can make more informed hypotheses.

Oh man, in neuro, discovery-centric science is really controversial. Not the findings themselves, but the funding. If you propose to just stain/viral-inject some tissue, electro-physiologically map some connections, etc, you'll just get back comments that say 'fishing expedition'. Maybe you can do this with NIH when you propose to contrast it with diseased tissues, but even then, it's unlikely (Well, less likely than the 'normal' low rate of proposals being funded).

I'm appalled. That fundamental science is called 'a fishing expedition' in a derogatory way speaks volumes about how academia is failing.

Don’t be appalled. This is not a failure. There are limited funds. Hypothesis-driven work is thought to be more likely to be productive.

If you were handing out the funding, assuming equally good teams, do you choose the work that targets Parkinson’s mechanisms or the functional map of a less-explored bit of CNS?

We are doing basic exploration as well. We are just prioritizing targeted work in a resource constrained environment.

I think the problem is, solving large complex problems (like Parkinsons) requires a very solid foundation of basic research. If you don't pay for the basic research, you don't get the foundation - and everybody is essentially leaping around in the dark.

I don't think the issue is about resources per se. I think it's more about universities prioritizing high impact research, as in, the kind of research that appeals to non-scientists, in industry, media, or academia in general.

Something like 1 in 10 proposals submitted by universities are funded. Priorities are set by funding agencies, not universities.

In a field we know nearly nothing about, putting either one of those above the other looks like a problem to me. A field should go from purely exploratory, to both, to mostly hypothesis driven as it matures, but no sooner (with fractal subdivisions where subfields go through all those stages too).

Thus either neurology is way more mature than I'm aware of (a real possibility, I'm far from knowledgeable on it) or we do have a problem.

Nit: neurology != neuroscience.

We do know a LOT about neuroscience, but we've no idea how much we've yet to discover. Likely, it's a fractal, so we're not anywhere close and never will be. Like in Physics, we know a LOT about electrons and how to build a bridge, but we know nothing about 'dark energy'. It's also true in neuroscience that we know a LOT about action potentials but nearly nothing about how astrocytes interact with synapses. It's not a smooth march, but a VERY spasmodic stumble.

I don't think fields should progress in any manner whatsoever. They progress as they do and at their own paces. Otherwise it wouldn't be research, it'd be following plans. It's not something that can be prescribed, only described.

Maaaaybe. Truth be told, it's all about networking. Grants go to the friends of the reviewers more so than to those deemed most deserving [0]. Granted, it's really hard to read a bunch of proposals and determine whose is 'most' deserving. So reviewers act like the humans they are and give the grants to their friends.

[0] Total anecdata, I have no source for this.

I don't review grants for my friends. This is a strong community norm. It's unethical to do so. This causes a shortage of qualified reviewers, but that's less of a moral hazard than conflicts of interest.

If you admit you do not know this, then why say it?

I've had opportunities to review papers from people I have co-authored with before and declined the opportunity. I've reviewed a paper, then got an invitation from another journal to review the same paper. I declined, but sent the editor my former review. I think there are strong social mores in the scientific community.

Not disagreeing with your sentiment, but not all academia is like this. The comment applies very specifically to NIH & the life sciences (which admittedly is a really big chunk of science, both by dollar amounts and labor force).

Oh man, that's like, 0.1% of the issues in academia.

Discovery science is definitely harder to fund than hypothesis science.

To take geologic mapping as the example again, the economic ROI has fallen over the past few centuries: back when there were bigger unknown areas, you were more likely to literally or figuratively strike gold with only a coarse survey. So while there's still funding for pure mapping, there's not as much as there used to be.

I would argue that the scientific benefits of pure discovery are often still quite high, but it's typically easier to convince reviewers to fund a project when you can point to specific well-defined hypotheses that you'll be testing rather than just "because we don't know what's out there".

I suppose this kind of discovery science would be considered "preliminary results," meaning funding agencies like to see it done but won't fund it? Sigh.

Most neuroscience work has hypotheses, but I am arguing that it is more similar to discovery science even so.

Yeah. In areas where we know so little that it's hard to make informed hypotheses, there's probably a substantial component of discovery science.

It's a bit of a yin-yang situation IMO: discovery science often leads to testable hypotheses for future work, and things we learn from hypothesis-testing can open up new avenues for discovery

Genetics too. RNA sequencing data is almost exclusively considered exploratory and is expected to be replicated at least technically (by using the same biological samples for qPCR to measure a small number of genes) but also biologically with novel samples and experiments.

This would be a huge leap forward. I can't blame researchers for tailoring their research to help their own careers but this should help prevent that. A null finding might not be great for the researcher but it's a win for science.

> I can't blame researchers for tailoring their research to help their own careers but this should help prevent that.

It regularly astonishes me how easily people are willing to accept scientific malpractice with such excuses.

While I agree that mechanisms like registered reports are the way to fix these things at the core, I do think that these mentalities ("it's just the system, I can't do anything but cheat with my scientific publications, my career!") are a big part of the problem.

Good read on the issue: http://www.talyarkoni.org/blog/2018/10/02/no-its-not-the-inc...

> It regularly astonishes me how easily people are willing to accept scientific malpractice with such excuses.

I get that. But the other side is that we pump up a lot of kids about science, then filter for very ambitious, very dedicated people and shove them into a resource-constrained environment where some kinds of results are much more strongly rewarded than others. It's absurd to set up a system like that and expect that science will be done to the level of frankness that you (and I) want.

I know some really smart, hardworking, very honest PhDs who have shifted to the tech industry because the roll of the research dice didn't come up right for them early in their careers. They are far from happy about that. I wish they could have stayed in science.

And I'd note that the person you're replying to is not "accepting scientific malpractice". They're saying they can't blame people for trying to survive in the system that they're caught in. If you want to be mad at somebody, be mad at the people who have set up the system, who could change it but don't.

It's unscientific and illogical to expect humans to sacrifice their lives on the altar of your belief: statistically, that's not what they're like. Especially in a system that filters out the most honest, least career-focused people early on.

I think one of the problems is that it often doesn't feel like cheating. You have a hypothesis, but you don't preregister it. It lives only in your head. You do an experiment, you look at the results. They fail to confirm the hypothesis. But if you tweak the hypothesis just a little, the data suddenly confirm it. So why not publish the tweaked version?

This is a malpractice that effectively invalidates the research. But it doesn't feel so. It feels more like a thought-crime.

>> But if you tweak the hypothesis just a little, the data suddenly confirm it

This is 'data mining' right? And I've occasionally wondered about this, since I don't work in a scientific field but did once make use of the scientific method for some research I did. And yes the findings weren't especially conclusive but I'm not sure I could've tweaked the hypothesis to make it work.

So, had I found something really interesting that didn't fit the hypothesis, is the 'right way' to conduct a new experiment from scratch? So say I did that, and used the 'tweaked' hypothesis, of course I'd find something interesting, because it's already there.

In this new 'pre-registration' framework, how can I correct the problem and pursue the interesting idea but keep the science intact? If I used some sort of cross-validation at the outset and already have all the data available, I presumably can't change the sample, so the hypothesis presumably has to change.

Refining an experiment is not wrong. What is wrong, like you say, is going on a fishing expedition until you find a result you like.

There are methods to account for follow-up experiments. The Bonferroni correction [1], for instance, requires you to tighten your significance threshold with each new test: with m tests, each one has to reach p < α/m rather than p < α.

[1] https://en.m.wikipedia.org/wiki/Bonferroni_correction
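For concreteness, here is a minimal Python sketch of the Bonferroni rule applied to a list of p-values. The function name `bonferroni` and the example p-values are hypothetical, chosen only for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return which tests remain significant after a Bonferroni correction.

    Each of the m tests is compared against alpha / m instead of alpha,
    which keeps the family-wise error rate at or below alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values: one original test plus three follow-ups.
# With m = 4 the per-test threshold is 0.05 / 4 = 0.0125.
print(bonferroni([0.030, 0.004, 0.200, 0.011]))  # → [False, True, False, True]
```

Note that 0.030 would have passed on its own at α = 0.05, but not once it is one of four tests.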

It's harder than you might think to control for multiple comparisons. The Bonferroni correction assumes that each experiment is independent, and so penalises correlated experiments unnecessarily harshly.

On the other hand, other tests typically require the researcher to make explicit assumptions on the correlation structure of the experiments despite the fact that it is not directly observable.

You are probably thinking of Sidak correction when you state independence is needed. Bonferroni correction does not need independence. You are absolutely right about Bonferroni being a severely conservative correction though -- at least the 'first order' one that uses only the first term of the Bonferroni inequality. One can take more terms to be less conservative but those aren't as easy to apply as you need to know the joint distributions over larger and larger tuples of events.
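A small sketch comparing the two per-test thresholds, assuming a family-wise level of α = 0.05. Bonferroni's α/m holds for any dependence between the tests; Šidák's 1 - (1 - α)^(1/m) assumes independence and is only slightly larger (less strict) than α/m.

```python
def bonferroni_threshold(alpha, m):
    # Valid whatever the dependence structure among the m tests.
    return alpha / m


def sidak_threshold(alpha, m):
    # Gives exact family-wise control when the m tests are independent.
    return 1 - (1 - alpha) ** (1 / m)


for m in (1, 5, 20, 100):
    print(m, round(bonferroni_threshold(0.05, m), 6), round(sidak_threshold(0.05, m), 6))
```

Both corrections ignore any positive correlation between tests, which is why they can be much stricter than necessary for highly correlated experiments.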

Another, more recent approach to 'exploratory' yet statistically valid analysis is to exploit differential privacy and dithering.
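One published method in this vein is the 'reusable holdout' (Thresholdout): queries about a held-out set are answered with the training-set value unless the two disagree by more than a noisy threshold, in which case a Laplace-noised holdout value is returned. Below is a simplified Python sketch under stated assumptions: the parameter values are illustrative, the noisy threshold is redrawn on every query, and the original algorithm's budget bookkeeping is omitted, so treat it as a sketch of the idea rather than a faithful implementation.

```python
import random


def laplace(rng, scale):
    # Laplace(0, scale) noise as the difference of two exponentials with mean `scale`.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def thresholdout(train_mean, holdout_mean, rng, threshold=0.04, sigma=0.01):
    """Answer one query about the holdout set (simplified Thresholdout sketch).

    If the training estimate agrees with the holdout estimate up to a noisy
    threshold, return the training estimate, revealing nothing new about the
    holdout. Otherwise return the holdout estimate plus Laplace noise.
    """
    noisy_threshold = threshold + laplace(rng, 2 * sigma)
    if abs(train_mean - holdout_mean) > noisy_threshold + laplace(rng, 4 * sigma):
        return holdout_mean + laplace(rng, sigma)
    return train_mean


rng = random.Random(0)
# Estimates agree: usually returns the training estimate 0.62 unchanged.
print(thresholdout(0.62, 0.61, rng))
# Training estimate is badly overfit: returns a noisy value near the holdout's 0.55.
print(thresholdout(0.90, 0.55, rng))
```

The noise is what lets the same holdout be consulted many times without the analyst slowly overfitting to it.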

You can also split the dataset into two parts. Use first part to form a hypothesis. Register it. Then use the second part to confirm/disprove it.
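A minimal sketch of that split in Python, assuming the records can be shuffled independently (no clustering or time ordering that would let a random split leak information between the halves). The function name and the 50/50 ratio are illustrative choices.

```python
import random


def split_for_preregistration(records, explore_fraction=0.5, seed=42):
    """Randomly split a dataset into an exploration part and a confirmation part.

    Hypotheses may be generated freely on the exploration part; the
    confirmation part stays untouched until the hypothesis is registered.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]


explore, confirm = split_for_preregistration(range(1000))
print(len(explore), len(confirm))  # → 500 500
```

The fixed seed makes the split reproducible, so it can itself be written into the registration.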

>This is 'data mining' right?

That would be data mining done wrong. It's perfectly fine to look at data to provoke new hypotheses. But you should not use the same data to confirm the hypothesis that it provoked. Either use fresh data or make sure your analysis still preserves correctness when you reuse the data.

> You do an experiment, you look at the results. They fail to confirm the hypothesis. But if you tweak the hypothesis just a little, the data suddenly confirm it. So why not publish the tweaked version?

If between tweaking the hypothesis and publishing it, you add the step "you perform another experiment which tests the tweaked hypothesis", you have just described the scientific method.

I'm sure the budget for this new experiment will be given straight away, no questions asked.

Though of course, there's a difference between "drug A doesn't work for condition B but seems to work slightly for C" and "drug A doesn't work for condition B but 10 of 12 individuals with condition C have shown significant improvement"

> I'm sure the budget for this new experiment will be given straight away, no questions asked.

Most of the cases: yes, although your example of clinical trials is slightly different, and in that case, I do think the data should be publicly available to other researchers even in the case of a null.

If the original experiment was large enough [1], you could almost always find some C such that "drug A doesn't work for condition B but 90% [2] of individuals with condition C showed significant improvement".

In fact, you could replace [2] by a number arbitrarily close to 100% by increasing [1] accordingly.
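The claim is easy to check with a quick simulation. This Python sketch assumes a drug with no effect at all, where each patient 'improves' with probability 0.5 purely by chance, and treats every block of 10 patients as a hypothetical 'condition C' subgroup. As the trial grows, the best-looking subgroup's improvement rate creeps toward 100%.

```python
import random


def best_subgroup_rate(n_patients, subgroup_size=10, p_improve=0.5, seed=1):
    """Simulate a trial of a drug that does nothing.

    Each patient improves with probability p_improve regardless of treatment.
    Patients are then cut into arbitrary subgroups (standing in for
    'condition C'), and the best subgroup's improvement rate is returned.
    """
    rng = random.Random(seed)
    outcomes = [rng.random() < p_improve for _ in range(n_patients)]
    rates = [
        sum(outcomes[i:i + subgroup_size]) / subgroup_size
        for i in range(0, n_patients - subgroup_size + 1, subgroup_size)
    ]
    return max(rates)


for n in (100, 1_000, 10_000):
    print(n, best_subgroup_rate(n))
```

With 10,000 patients there are 1,000 such subgroups, and at least one of them showing 90% or more improvement is close to certain, even though the drug does nothing.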

If the original experiment was large enough to do that, then somebody was given way too much money for the original experiment. So I'd expect that's a very rare case.

It can be much more sneaky than this.

You have an hypothesis, do an experiment, it fails. You mark the hypothesis false and move on, never putting the work into publishing it (why would you?).

At the same time, 19 other researchers have the same idea. Some 18 of them do the same as you, but one does the experiment and gets a success. He will publish his work (why wouldn't he?), and it will be the only piece of literature available on the subject.

Where on this narrative did anybody do anything even remotely unethical?
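The arithmetic behind that scenario, as a small Python sketch: if the idea is wrong and each of the 20 labs tests it at α = 0.05, the chance that at least one of them gets a 'significant' result is 1 - 0.95^20, about 64%. The simulation assumes p-values are uniform under the null and that only significant results get published; both are simplifications.

```python
import random

ALPHA = 0.05
N_LABS = 20

# Chance that at least one of 20 labs gets p < 0.05 when the effect is not real.
p_at_least_one = 1 - (1 - ALPHA) ** N_LABS
print(round(p_at_least_one, 3))  # → 0.642


def simulate_literature(n_ideas=10_000, seed=7):
    """Fraction of false ideas that end up with at least one 'positive' paper
    if only significant results are published (the file drawer effect)."""
    rng = random.Random(seed)
    published = 0
    for _ in range(n_ideas):
        # Under the null, each lab's p-value is uniform on [0, 1].
        if any(rng.random() < ALPHA for _ in range(N_LABS)):
            published += 1
    return published / n_ideas


print(simulate_literature())
```

Nobody in the story p-hacked, yet most wrong ideas tested this way would leave a published false positive behind.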

> "It regularly astonishes me how easily people are willing to accept scientific malpractice with such excuses."

Agreed in theory. In practice, such practices don't even qualify, by definition, as science. Using the word science in such contexts gives such things far more credit and legitimacy than they deserve.

Whether you blame researchers or not is inconsequential. People respond to incentives with a spectrum of responses from incredibly principled to entirely unprincipled. If the unprincipled are rewarded and the principled are penalized, the principled will leave the field for something else. Moralize all you like but the incentives define the field.

> It regularly astonishes me how easily people are willing to accept scientific malpractice with such excuses.

Likewise. The BBC reports that 2/3 of scientific research cannot be reproduced. Let’s be honest about what that means: 2/3 of scientists are fraudulent or incompetent.


> Let’s be honest about what that means: 2/3 of scientists are fraudulent or incompetent.

No, it means that 2/3 of the attempts to reproduce research fail.

When you work on the very edge of science, the lab you're working in is probably either the only one, or one of a very small number, able to perform the research you're doing, due to a combination of highly specialised equipment, subject knowledge and just plain experience. Without that combination of factors, it's quite easy to mess up an experiment and therefore fail to reproduce the research.

If we want to give people the benefit of the doubt, it could also mean that scientific research is difficult and even competent individuals often make mistakes.

This is incorrect and unhelpful. Science in fields with lots of complexity, such as neuroscience and psychology, suffers from hard problems such as unmeasured causal variables and divergent sample characteristics. You might chalk this up to incompetence, but there are in fact real concerns of feasibility. You cannot hope to measure or control all relevant cognitive factors; you just try to balance your sample groups on the more obvious measures, increase your sample size, and then hope that other differences average out with random assignment.

To put it otherwise: for some problems, none of the 7 billion humans alive is competent enough right now. For some of them, we'll collectively eventually get there, if we don't stop trying.

Agreed - this should help "level the playing field" so scientists can at least be somewhat more evaluated on the quality of their hypotheses and scientific protocols as opposed to the (largely governed by luck) outcome of their studies.

> the (largely governed by luck) outcome of their studies

I've never heard that. Is there some evidence that it's largely luck? Does the theory imply that Einstein and Newton were extremely lucky?

You're almost creating a straw man. There have probably been scientists as intelligent as or more intelligent than Einstein who worked in fields that proved fruitless, or who didn't manage to finish their work within a lifetime. There is a large element of luck in any research, which has to align with capability. The matter at hand is how to keep those promising people in a science career and not just the ones who publish a lot. Publishing negative but honest results is a step. Though I hope it doesn't end up being gamed as well.

Einstein did his Nobel Prize-winning work (the photoelectric effect) very early in life, and then went on to make big successes continually. If he had started out spinning his wheels, he would have had plenty of time to switch.

That doesn't mean he was immune to unsuccessful investigations, on which he spent a large part of his later life.


Not the best examples. They both came up with analytical models and tested them. They were also amazing outliers, in that they achieved so much.

For your everyday empirical research there is a lot of luck. You have to disprove a lot of possibilities to get to the true data. And as we only reward the positive findings, you're lucky if you test that one truth. If you're sifting through drugs to find their possible other uses, you're going to have a lot of null results. But those are good; they're still extra knowledge.

Even if it doesn't, there's a whole world outside of theoretical physics.

So Darwin was very lucky? The questions remain for any field: Is there evidence, and is the implication that the leading scientists are considered very lucky?

Yes, he was the lucky person chosen to be the field geologist on the Beagle

Darwin was lucky that Wallace came to the same theory independently but contacted Darwin for feedback. This forced Darwin to publish his own results. If Wallace hadn't contacted Darwin before publishing, Darwin would probably be an unknown today.

Nah, I think you’re missing the point that you can google “bullshit psych studies” and get pages of results. This isn’t an indictment of science, this is an indictment of journals and academia today.

Had this registration been in effect 300 years ago, Newton/Darwin would still be fine.

They have the advantage of being in fairly trivially replicable fields, so we can say with confidence they found something true.

As to luck, who can say?

I'm not 100% sure about Newton but Einstein didn't do any empirical research, which is the topic of this discussion.

> A null finding might not be great for the researcher

The fix for that would be to change science funding so a null finding doesn't hurt the researcher's career.

Never gonna happen. Think about it: the principle of science is mapping the empirical world with a logical one. A null result represents a failure to do this. In a binary case (either this or that), a null result gives us information, but the majority of null results tell us nothing more than "our hypothesis was wrong". Having a wrong hypothesis means you got unlucky, sure - but repeated enough times and treated statistically, it means you are a less skilled scientist than someone who gets more affirmative results.

> Think about it: the principle of science is mapping the empirical world with a logical one. A null result represents a failure to do this.

Let's say you try a particular educational intervention, and it turns out that it has no effect on children's learning.

This is really useful information, because it means that we now know that there is no point in schools trying it.

You're in a maze of twisty passages. If someone can tell you 'turning left is just a wall' then you can focus on the other routes.

But in reality, it's more like you were blind in a forest, and that someone was operating a LIDAR with binary output (there is/isn't something less than 50 meters ahead). They tell you that at 120.001 degrees there is something. Then they tell you that at 120.002 degrees there is something. Etc. If null results were treated the same as positive findings, people would just spam null results.

You're absolutely right. Isn't it good that my rubbish attempt was there to stimulate your useful one ^_^

There's something to be said for brute force on rare occasions though.

Well, we do still want to reward researchers for coming up with good hypotheses. It's just that we want to also make it harder to game the metrics.

"I have not failed 700 times. I have not failed once. I have succeeded in proving that those 700 ways will not work. When I have eliminated the ways that will not work, I will find the way that will work." -- Edison

By publishing null results you avoid 100 scientists going the same endearing but unproductive routes, which can also help build good hypotheses and speed up progress.

Getting a null result doesn't prove that the null hypothesis is true. It means that we can't say anything with confidence based on this experiment. There is a huge difference between failing to make X work and proving that X doesn't work, and I fear that if people start conflating the two, many potentially productive routes will be abandoned far too early.

"Absence of evidence is not evidence of absence."

A null result (no significant evidence for anything, a p value above 0.05) is truly null. It doesn't even confirm the null hypothesis.

Common wisdom says that Statistical Hypothesis Inference Testing doesn't work because researchers engage in "P Hacking". That's a half-truth; researchers really don't understand the meaning of the p-value.

Which is completely bogus. Absence of evidence, when evidence is expected is indeed (possibly weak) evidence of absence.

The stronger "absence of proof is proof of absence" would be fallacious.
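A worked Bayes' rule example of that point, with illustrative numbers that are assumptions rather than data: the same 'nothing found' outcome lowers a 50% prior to under 10% when the study had 90% power, but leaves it near 50% when the study had only 10% power.

```python
def posterior_after_no_evidence(prior, p_detect_if_present, p_false_alarm=0.0):
    """P(effect present | we looked and saw nothing), via Bayes' rule.

    prior:               P(effect present) before looking
    p_detect_if_present: chance the study shows evidence if the effect exists (power)
    p_false_alarm:       chance of spurious evidence if the effect is absent
    """
    p_nothing_if_present = 1 - p_detect_if_present
    p_nothing_if_absent = 1 - p_false_alarm
    numerator = p_nothing_if_present * prior
    return numerator / (numerator + p_nothing_if_absent * (1 - prior))


# A well-powered study that finds nothing lowers our belief a lot...
print(round(posterior_after_no_evidence(0.5, 0.9, 0.05), 3))  # → 0.095
# ...while an underpowered one barely moves it.
print(round(posterior_after_no_evidence(0.5, 0.1, 0.05), 3))  # → 0.486
```

When evidence was never expected (power near zero), finding none leaves the prior where it was.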

> Absence of evidence is not evidence of absence.

A more accurate way to state this is “Absence of evidence is not always evidence of absence”. Or “Absence of evidence may or may not be evidence of absence depending on the circumstance”, though it doesn’t roll off the tongue as easily.

Well, the long version is this:

"When trying to reject the null hypothesis (sensu Fisher), absence of evidence (for its invalidity) is not evidence of absence (of its invalidity)."

Fisher wasn't the only statistician, he wasn't the only confused one either. We can go with Neyman and Pearson instead:

"When trying to reject the null hypothesis with a sufficiently powerful experiment, absence of evidence is evidence of absence."

This requires a power analysis, which is something most studies lack. Constantly reminding everyone of these details and misguided philosophical differences is tiresome, so I prefer to go with Laplace, Bayes, and Jaynes:

"Everything is evidence."
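Since power analysis comes up: a rough Python sketch of the common normal-approximation sample-size formula for comparing two group means, n per group = 2((z_{1-α/2} + z_{1-β}) / d)^2, where d is the standardized effect size (Cohen's d). The function name is hypothetical, and a calculation based on the t distribution gives a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided, two-sample comparison
    of means (normal approximation), given a standardized effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


print(n_per_group(0.5))  # → 63
print(n_per_group(0.2))  # → 393
```

Halving the effect size roughly quadruples the required sample, which is why small studies of small effects so often come back null.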

The interests of humanity and the interests of the organizations that have the capital to fund science projects are not frequently aligned.

This must change.

Exactly! Stop government funded research!

Today, most science is funded by government agencies, like the NIH. This means bureaucrats who know nothing about science allocate other people's money to research they don't care about. Since they don't understand the research, they need objective measures of "research quality", and they picked number of publications and impact factor of the journals as measures. What you measure becomes your objective, and we got "publish or perish" and all sorts of scientific misconduct.

Science needs a free market, too.

Most program managers at NIH are ex-scientists with expertise in the fields they are funding.

I'm going to hope this was poorly phrased sarcasm.

Is there anything you disagree with? Or do you simply dislike the conclusion?

Five million euros of taxpayer money have been blown on a Neanderthal genome. All we got out of it is a hypersensitive analysis that picked up an artifact (source: first-hand experience with the raw data) and called it admixture. But it's a high-profile publication thanks to the catchy headline, and more taxpayer money flows in the same direction.

This wouldn't have happened if the bureaucrats who presided over the grant money knew anything about the science and looked into the methods. Or maybe it would have happened anyway, because this was never science but just a publicity stunt. Either way, funding "science" this way is bad.

The fix is to use Bayesian statistics, where every finding updates your beliefs and therefore there is no such thing as a null result. The cult of the p value needs to end.

Are there any practical introductions/guides to applying this method? Interested for my own research.

E. T. Jaynes: "Probability Theory: The Logic of Science"

A great introduction to the topic and a great general guideline. It's not a collection of recipes, more like a nudge in a direction where you don't need canned recipes anymore.

Surely that only works if your hypothesis is partly right?

I don't know how a hypothesis can be partly right, but I'll read this as

"Surely that only works if your prior belief in your hypothesis is neither exactly 0 nor 1?"

And that's correct. Prior distributions are never completely concentrated at either extreme, since those could never be updated. If your hypothesis is totally wrong, your confidence in it (the posterior probability) will converge toward zero.
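A minimal Python sketch of that behaviour, using a hypothetical coin example: H says the coin lands heads 70% of the time, the alternative says it is fair, and the data come from a fair coin. Repeated updates drive a 50% prior toward zero, while priors of exactly 0 or 1 never move, whatever the data.

```python
def update(prior, likelihood_if_h, likelihood_if_not_h):
    # One application of Bayes' rule for a binary hypothesis H.
    numerator = likelihood_if_h * prior
    return numerator / (numerator + likelihood_if_not_h * (1 - prior))


def posterior_after_flips(prior, flips, p_heads_if_h=0.7, p_heads_if_not_h=0.5):
    """H: 'the coin lands heads 70% of the time'; not-H: 'the coin is fair'."""
    belief = prior
    for heads in flips:
        l_h = p_heads_if_h if heads else 1 - p_heads_if_h
        l_not_h = p_heads_if_not_h if heads else 1 - p_heads_if_not_h
        belief = update(belief, l_h, l_not_h)
    return belief


# Data that look like a fair coin: 100 heads and 100 tails, alternating.
fair_flips = [True, False] * 100
print(posterior_after_flips(0.5, fair_flips))  # shrinks to a tiny value, on the order of 1e-8
print(posterior_after_flips(0.0, fair_flips))  # → 0.0
print(posterior_after_flips(1.0, fair_flips))  # → 1.0
```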

If a hypothesis is considered as a species of proposition, then yes, there are a variety of ways in which a hypothesis can be "partly true". Firstly, if the hypothesis is a conjunction, then some of the conjuncts could be true. Propositions can also contain vague predicates with ambiguous truth values. Finally, there is the truthlikeness of the propositions. Some propositions seem to be more true, or have more verisimilitude, than others (for which I refer you to https://plato.stanford.edu/entries/truthlikeness/).

If that is indeed what GP meant, then no, truthlikeness is completely irrelevant to Bayesian reasoning.

Null results are, almost by definition, easier to find than statistically significant ones. What you need are highly improbable null results: null results that are truly surprising, surprising because the preponderance of other evidence, theoretical and realized, points to the effect being real.

In other words, it's generally much easier for your head to come out of the barrel without an apple.

>The fix for that would be to change science funding so a null finding doesn't hurt the researcher's career.

I get what you're saying, but the counter to that is it would be too easy for scientists to conjure up hypotheses they have no reason to believe are true, do an experiment, publish the null finding (which, frankly, they were expecting), and ask for more money.

It's almost trivial to think up experiments that will give you a null finding.

It already doesn't, null findings are simply not published.

Unless of course you want to reward null findings which might, depending on the circumstances, open another can of worms.

At least in my field (high-energy physics at the LHC), null findings are routinely published. Everybody who is searching for physics beyond the standard model publishes their null findings left and right.

Above, a commenter makes a distinction between negative and null result, saying that null results are those without any statistically significant conclusion (neither negative nor positive). Those don't get published, the BSM papers usually rule out parts of parameter space which makes them negative results.

Science should be publicly funded. Capitalist funding of academic research has perverse incentives.

That in itself wouldn't change the incentives too much. Government grant boards don't like that you 'waste' their money either and come up with 'nothing'.

We can elect our officials; we do not choose who to make rich, or what they do with their money, unfortunately.

Most of it is.

Yes, I did not mean to imply otherwise, my apologies.

One thing that this achieves is that it makes null findings publishable. At least in my field, I pretty much have to discard null results because no reasonable venue will publish them; it's not worth the time to write them up if they'll be rejected anyway - at best, no-improvement results can get a mention, a line in a table, or possibly a paragraph in a published paper about something that actually does make a difference.

>I can't blame researchers for tailoring their research to help their own careers

What about a lowly restaurant-owner who skimps on food quality or sanitary costs to "help their career?" A psychopath or "can't blame them?"

Erroneous research can also have human costs.

White collar and blue collar workers should both be scrutinized equally.

I'm not sure where this acceptance of moral depravity among more-privileged peoples comes from, but I really don't like it.

"Moral depravity" is maybe overstating the case. There are some pretty egregious cases that have come to light, but the vast majority of the problem is just a natural result of how even people with the best of intentions will still be influenced by a bad incentive structure.

For example, a big part of how the file drawer effect happens is that writing papers is expensive and time consuming, and it's hard to get anyone to publish negative papers, and academics' careers operate under a rather brutal "publish or perish" regime. All that adds up to, if you get a negative result, you've got a whole lot of very concrete reasons to cut your losses and move on. The goody two-shoes who's scrupulous about reporting all their findings is not going to get rewarded for their efforts with a job. Nor will they be rewarded with the satisfaction of knowing they've done their part to improve the average quality of the published literature. Out-of-work scientists don't get many opportunities to do research.

There is a massive difference between doing something that will clearly hurt someone and choosing to do only work that benefits you. Researchers are currently under no obligation to write up their null findings, and it would be hard to get null findings in prestigious journals. This should be fixed at an institutional level, like the pre-registration of studies in the article, not as a mandate to each individual researcher to write up and publish every study.

>There is a massive difference between doing something that will clearly hurt someone and choosing to do only work that benefits you.

This is the most-downvoted comment I've ever made on HN, and I think your explanation is in summary what people disagree with.

However, the problems of bias and integrity in scientific research can and do have costs in terms of harm to human life. It's just that the connection between following incentives and bad scientific research is much more abstract, and therefore it isn't clearly intentional negligence, as is the case with something like food safety.

There's a great recent blog post on the basic point you're making: http://www.talyarkoni.org/blog/2018/10/02/no-its-not-the-inc...

Yeah, that looks to be spot on. Thanks, I appreciate the link.

Do you think they preregistered their study into the effect of preregistered studies on the null hypothesis rate?

According to the article, they did not, but they plan to do another study with more data and preregister it.

This was a meta-analysis, not a study of its own.

You can totally p-hack a meta-analysis, you have many degrees of freedom in interpreting the data.

Preregistration of a meta-analysis makes sense and should be standard practice.

The whole framing of meta-analyses as the pinnacle of reliability is sort of concerning. They're a huge help with messy questions like "what the hell is up with priming?", but between choosing which studies to include, standardizing their controls, normalizing their results, etc. it's amazing how far they can be skewed.

One thing I don't know: what does preregistration even look like for a meta-analysis? You can state some conditions up front, but a lot of the degrees of freedom only come into play once you're deep in the work looking at specific studies.

What difference does that make?

The point, I think, is that it means something slightly different to pre-register a hypothesis that depends on retrospective data. When you register the hypothesis, the data already exist, and one could (in theory) have already started looking at them. One would hope that researchers would not explicitly cheat like this, but it would be hard for experts in the field not to have some sense of the trends within their field before running the meta-analysis.

Yeah, but of course hypotheses don't arise ex nihilo but from a catalog of focused perceptions catalyzing conceptions, which may suggest hypotheses encompassing novel ordering functions, innovations, ameliorations (or whatnot) depending on the field of study. If an individual considers a hypothesis of sufficient potential pragmatic value to research and document, then, objectively, if negative results manifest, these too need to be reported as negative data germane to other researchers in the field. The problem is that current academic culture is predisposed to reward only success. If your beautiful theory is murdered by an ugly fact (Huxley), then you don't get your doctorate or your departmental status diminishes, since only positive results are reinforced. From this arises the temptation to be disingenuous or to outright prevaricate, if you can get away with it. It does seem a cultural change is in order. Registering hypotheses in a way that guarantees publication regardless of results may be useful.

> One would hope that researchers would not explicitly cheat like this

The broad problem isn’t cheating, it’s people not publishing null results. That applies equally to experiments and meta-analyses.

Except a null result would have been interesting here.

What the poster above meant is that if you do a meta-analysis on already published results you could already know if it's a null result or not. If you are inclined to not publish null results, you simply won't "preregister".

Of course to be a proper study they should pre-register and only report on future publications.

You are right about what the problem is, but cheating in this context would mean registering a hypothesis after you've already looked at the data and assured yourself you'll get a positive result. That, in my mind, is even worse than the current status quo.

Both are problems. Both are addressed by pre-registration.

Negative findings are just as good to publish, so people don’t repeat the same mistake twice.

The problem is that it is too easy to have wrong ideas. If the only metric is how many studies you (or your group) have published, some will try to game it.

In domains like chemistry, you have this research process of "try every combo of material x process and see what you get".

Hopefully, detailed null results will keep too many people from going down the same path. But even more hopefully, they will let other researchers read the process and perhaps find another way to actually get to success.

"We tried A x B in this way, that way, some other way, none of them worked" is pretty valuable info. It's also pretty good when it comes to confirming theoretical work.

>We tried A x B in this way, that way, some other way, none of them worked

How searchable is this data? Like, do I need to be an expert who is up-to-date on most proceedings in the subfield to know this, or is this information easy to pull up with a few searches?

In my experience, it's buried in dissertations.

You probably need to know enough technical language to enter the right search terms.

Not all negative results are created equal.

It is very easy to run a terrible study.

Even with a good idea, it is very easy to run a terrible study.

In my view, a negative finding needs to be as solid as a positive one, i.e., just as hard to publish. Since folks haven't until recently published negative results, we probably haven't developed standards for reviewing those results. If you're a reviewer of a paper that reports a negative result, how do you decide if the work was carried out with sufficient skill to make the result useful?

Should negative results be replicated?

Negative results generally carry weight IFF they are well powered (large samples), and conducted with real expertise and care.
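To put a rough number on "well powered": a sketch of the power of a two-sided, two-sample z-test, using the normal approximation and a standardized effect size (Cohen's d). The function name and the d = 0.3 example are illustrative assumptions, not taken from any study discussed here.

```python
from statistics import NormalDist

def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test (hypothetical helper).

    effect_size is the standardized mean difference (Cohen's d); this is a
    normal approximation and ignores the small-sample t correction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative: d * sqrt(n / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability the statistic lands beyond either critical value
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Illustrative assumption: a true effect of d = 0.3
for n in (20, 50, 100, 400):
    print(n, round(two_sample_power(0.3, n), 2))
```

With d = 0.3, power is only around 0.16 at 20 per group but above 0.95 at 400 per group, so a null result from the smaller study says very little either way.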

A possible problem I see is that it's hard to distinguish a truly bad idea from a low-quality implementation of a good one.

" it is too easy to have wrong ideas."

I would be so published all the time...

"Negative" findings or "null" findings don't indicate a mistake. Let's say I'm studying whether or not a certain vitamin, say Vitamin HN, helps prevent the common cold. The null hypothesis would be that it does not help prevent the common cold, as most things do not, and that's where we must assume our Vitamin HN will fall. If we find that it doesn't actually prevent colds, that is not a mistake; that is simply the calculation of a lack of treatment effect difference between two groups in a sample of a population. There is no "mistake" inherent to null vs. confirmed hypotheses.

Negative findings don't prove that the idea was a mistake; they're just a failure to find sufficient evidence. You might have just been unlucky.

Meta-analyses based on lots of studies, though, benefit from having unbiased publication of study results.

This is critical - even if no individual study has positive results, the data from each study can be aggregated and that meta-analysis can find results much more easily. I remember reading (but can't find the link to) an article about a medicine that took many trials to have one trial with statistical significance, but a meta-analysis of the first five or so trials would have revealed the efficacy of the drug under test much sooner and at a much lower cost.
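A minimal sketch of that kind of pooling, assuming a fixed-effect (inverse-variance) model. The five trials are made-up numbers, not the drug trials mentioned, and the helper names are hypothetical. Each trial alone misses p < 0.05, but the pooled estimate clears it comfortably.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * NormalDist().cdf(-abs(z))

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = 1 / sqrt(sum(weights))
    return pooled, pooled_se

# Hypothetical trials: effect estimates and their standard errors
estimates = [0.18, 0.22, 0.15, 0.25, 0.20]
std_errors = [0.12, 0.13, 0.11, 0.14, 0.12]

for est, se in zip(estimates, std_errors):
    print(f"trial: z={est / se:.2f}, p={two_sided_p(est / se):.3f}")

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
print(f"pooled: est={pooled:.3f}, z={pooled / pooled_se:.2f}, "
      f"p={two_sided_p(pooled / pooled_se):.4f}")
```

A random-effects model is often preferred when trials differ in design or population; it would widen the pooled interval somewhat.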

Meta-analyses suffer from publication bias, though. Studies with one outcome (typically a positive one) are more likely to be published than those with some other outcome, and therefore the input for the meta-analysis is biased.
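A toy simulation of that bias, under loudly hypothetical assumptions: the true effect is exactly zero, every study has the same standard error, significant positive results are always published, and everything else is published only 5% of the time. Because the standard errors are equal, the fixed-effect pooled estimate reduces to a plain average. The function name is made up for illustration.

```python
import random

def simulate_publication_bias(n_studies=5000, true_effect=0.0, se=0.1,
                              publish_null_prob=0.05, seed=1):
    """Return (mean of all estimates, mean of published estimates, n published).

    All parameters are illustrative assumptions, not estimates of any field.
    """
    rng = random.Random(seed)
    all_estimates, published = [], []
    for _ in range(n_studies):
        est = rng.gauss(true_effect, se)
        all_estimates.append(est)
        # "Positive finding": significant in the hoped-for direction
        significant = est / se > 1.96
        if significant or rng.random() < publish_null_prob:
            published.append(est)
    # Equal standard errors, so inverse-variance pooling is just the mean
    return (sum(all_estimates) / len(all_estimates),
            sum(published) / len(published),
            len(published))

everything, literature, n_pub = simulate_publication_bias()
print(f"all studies: {everything:.3f}, published only ({n_pub}): {literature:.3f}")
```

The average over all studies sits near the true value of zero, while a meta-analysis of only the published studies reports a clearly positive effect.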

A negative result is not necessarily the result of a mistake. Null results can be as important for the progress of science as positive findings.

Hard to disentangle two possible mechanisms: (1) null findings that would normally get tossed are being submitted and accepted at higher rates (i.e. preregistration increases acceptability of null findings), (2) null findings that would normally get p-hacked into positive findings and published are not (preregistration working as intended). Both good.
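One concrete form of mechanism (2) is optional stopping: re-running the test after each new batch of data and stopping at the first p < 0.05. A rough simulation, assuming the null is true, unit-variance normal data, and a hypothetical schedule of 20 looks (every 10 observations up to 200). Fixing the sample size in advance, as a preregistration would, brings the false-positive rate back to about 5%.

```python
import random
from statistics import NormalDist

def optional_stopping_rate(n_studies=2000, start=10, step=10, max_n=200,
                           alpha=0.05, seed=3):
    """Fraction of studies (null true: data ~ N(0, 1)) that reach p < alpha when
    the analyst tests after every batch and stops at the first significant look.

    Hypothetical helper; the look schedule is an illustrative assumption.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_studies):
        total, n = 0.0, 0
        while n < max_n:
            batch = start if n == 0 else step
            total += sum(rng.gauss(0, 1) for _ in range(batch))
            n += batch
            z = total / n ** 0.5  # one-sample z-test of mean 0 with known sd 1
            if abs(z) > z_crit:
                hits += 1
                break
    return hits / n_studies

peeking = optional_stopping_rate()
fixed_n = optional_stopping_rate(start=200, step=200, max_n=200)  # one look only
print(f"peek every 10: {peeking:.3f}, single look at n=200: {fixed_n:.3f}")
```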

If I'm understanding correctly, the novel studies had a higher hypothesis success rate than the replicated studies. I imagine this is because there were many (successful) "debunking" attempts?

There's the aspect of why you would choose to replicate a particular study instead of doing something else - there's always more ideas than your ability to execute them.

At least for me, there are two cases why I'd bother to do that and why I could write in a grant proposal that this work is necessary - either I want to build on that study; in which case most likely I wouldn't publish just a pure replication but rather a comparison of my changes with my replication of the original study, and this paper would be counted as a novel study. Alternatively, I'm doing a replication because I'm not certain whether the original study is actually true, because I have some solid reason to believe that it's wrong.

All studies need directly replicating, surely.

Possibly yes, someday, if they're still relevant. I mean, not every paper even deserves to be read, there's a lot of garbage published somewhere with an imitation of peer review. There's a lot of publications that have never been cited and likely won't ever be, much less replicated.

As I said before, at any point of time in any scale of research, the next research steps that "need to be done" vastly outnumber the resources to do them, so obviously not all of these things can be done. The majority of reasonable grant proposals get rejected, so that research doesn't get done. It's all a matter of prioritization; unless there's a solid argument that this research task is within, say, the top 20% of the important research tasks, it won't get done. And most studies are not important enough to justify repeating the effort; perhaps it was justified to expend X resources to get to that result, but that doesn't necessarily mean it's worth spending 2X resources to get a slightly more certain result after replication. It only needs replicating if lots of people are going to build their research on top of these results, and that simply doesn't happen for most studies.

Here's one way to think of the value of positive or negative results: Everyone talks about the value of negative results, but will we ever see one of those studies on the front page of HN?

But wouldn't it be beneficial to have these published publicly somewhere? Otherwise, I imagine, there could be plenty of "re-invention" going on.

Also, at the very least, if I had a general idea, these "nulls" would help me to further refine what I might want to poke at. The way null is used here is wrong and misleading.

Null results are beneficial to the scientific community at large, but detrimental to the researcher(s) publishing the null result. This results from a system in which researchers are judged on a metric approximated by the percentage of positive results they publish.

Not to sound snarky (I know that's not welcome here), but that doesn't sound very scientific to me. If science is ultimately a method-based process for finding truth, then hiding facts feels hypocritical.

No doubt, I agree with you. We all do. My rub is that I see repeated calls from science that the public bow to its mastery. Unfortunately, an institutional lack of transparency is hardly grounds for trust. Perhaps it's time for science to hold itself to its own standards?

> Allen adds that their analysis is exploratory, and that there could be other explanations for the findings.

I like how they're clear that their findings are dogfood at this point. They should pre-register a proper meta-study now :-)

In a more serious vein: Do they actually intend to pre-register meta-studies too?

I wonder how much of the bias for publishing positive results is a holdover from the days of having only paper journals:

On the Internet, we have unlimited room for publishing results; there's no reason not to publish the negative ones.

When research was published only in paper journals, there was scarce space even for positive results; possibly it would have been considered a waste to use that precious space for negative results. Also, people reading those journals had limited time and wanted the most important results; they may have expected what we would call a 'curated' collection of studies. These days the studies are published in databases and nobody expects them all.

The Internet may have unlimited space for publishing, but it does not have unlimited bandwidth in peer review. So, the increased publication of null results will still mean slightly fewer published positive results.

Will the negative results be peer reviewed?

Not to mention even a study that didn't pan out still has a lot of data that may be used for something else in the future.

Even a failed creation has parts that can be reused in other projects, after all.

The obsession with positive findings is the most absurd thing when you think about it right?

Like, I've read anecdotes from people saying that the editors or whoever at some of these journals might turn down publishing null findings. Think about that for a second. What does that tell you about their mindset? What possible reason would you have for not publishing null findings?

Editor: "We gotta move these journals johnny! They need those spicy findings. If the findings aren't spicy, this stuff won't sell."

That's hyperbole, but that's essentially the only reasoning I can think of, and it's absurd. Like, what professionals reading journals are going to be like, "Whelp, the findings in this journal haven't been spicy. I can't dab on this nonsense. I'm going to start reading the other journal."? Said no researcher ever, neither literally nor in essence.

Null findings are often not as useful as you might think.

A point null hypothesis for a continuous variable is literally always false. Especially in the softer sciences, I've even seen studies mocked for having too many data points, since it's known that with enough data, null hypotheses are false.

The story is better if your null hypothesis is an interval, but then you're really just obliquely using the interval to bound something you could be measuring more directly anyway.

What I'd like to see is moving away from null hypothesis testing altogether and focusing on measuring things. For example, focusing on measuring effect sizes, or the probability that a hypothesis is true.
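A small illustration of both points, with a hypothetical true difference of 0.02 standard deviations: the p-value collapses as n grows, while the estimate and its confidence interval keep saying the same thing, namely that the effect is tiny. This assumes a one-sample z-test with known SD and, for simplicity, an observed mean that lands exactly on the true mean; the helper name is made up.

```python
from statistics import NormalDist

def z_test_vs_estimate(true_diff: float, sd: float, n: int, level: float = 0.95):
    """Hypothetical helper: the p-value and CI you would report for a one-sample
    z-test if the observed mean landed exactly on the true mean."""
    se = sd / n ** 0.5
    z = true_diff / se
    p = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * se
    return p, (true_diff - half_width, true_diff + half_width)

# Illustrative assumption: a tiny but nonzero effect of 0.02 SD
for n in (100, 10_000, 100_000):
    p, ci = z_test_vs_estimate(0.02, 1.0, n)
    print(f"n={n:>7}: p={p:.2g}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The "significance" is entirely a function of sample size here; the interval is what tells you whether the effect is large enough to matter.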

But what does "measuring more directly anyway" mean if you're trying to measure something that may or may not exist?

For instance, in searches for new phenomena in high-energy physics, one usually puts an upper limit on deviations from the expectation of "known physics" (i.e., standard model). That essentially translates to statements like "if this particle exists, its mass should be higher than X TeV, or else we would have seen it already in our data". Of course, in reality, the particle probably does not exist, so you cannot really measure its mass!
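For a simple counting experiment, that kind of upper limit can be sketched as a classical Poisson limit. This assumes no background and no systematic uncertainties, which real searches do account for, and it stops at a limit on the signal rate; turning that into a mass limit needs a physics model not shown here. Function names are hypothetical.

```python
from math import exp, factorial

def poisson_cdf(k: int, mu: float) -> float:
    """P(N <= k) for N ~ Poisson(mu)."""
    return sum(exp(-mu) * mu**i / factorial(i) for i in range(k + 1))

def poisson_upper_limit(n_observed: int, cl: float = 0.95) -> float:
    """Smallest signal mean mu for which observing n_observed or fewer events
    would happen with probability at most 1 - cl (assumes zero background)."""
    lo, hi = 0.0, 100.0 + 10.0 * n_observed
    for _ in range(200):  # bisection; poisson_cdf decreases as mu grows
        mid = (lo + hi) / 2
        if poisson_cdf(n_observed, mid) > 1 - cl:
            lo = mid  # data still too likely at this mu, so the limit is higher
        else:
            hi = mid
    return (lo + hi) / 2

for n in (0, 1, 3):
    print(n, round(poisson_upper_limit(n), 2))
```

With zero observed events the 95% limit is ln(20), about 3.0 signal events; with three observed events it is about 7.75.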

Sorry this is so late, but you can measure the probability that the particle exists.

Null hypothesis tests basically try to calculate the probability of a data set given the null hypothesis. What you really want is the probability of a hypothesis given the data set.

So in that case, you want to estimate the probability of theories of physics, such as those that include the particle and those that don't.

s/probability/odds ratio/
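A minimal sketch of the distinction in the last few comments, treating P(data | H) as likelihoods and using hypothetical numbers: the data are unlikely under the null and 50 times more likely under the "new particle" hypothesis, but a 1% prior keeps the posterior probability around 0.34. The odds form (posterior odds = Bayes factor × prior odds) gives the same answer. Function names are illustrative.

```python
def posterior_prob(prior_h1: float, p_data_h1: float, p_data_h0: float) -> float:
    """P(H1 | data) via Bayes' theorem, with H0 and H1 the only options."""
    joint_h1 = prior_h1 * p_data_h1
    joint_h0 = (1 - prior_h1) * p_data_h0
    return joint_h1 / (joint_h1 + joint_h0)

def posterior_odds(prior_h1: float, p_data_h1: float, p_data_h0: float) -> float:
    """Odds form: posterior odds = Bayes factor * prior odds."""
    bayes_factor = p_data_h1 / p_data_h0
    prior_odds = prior_h1 / (1 - prior_h1)
    return bayes_factor * prior_odds

# Hypothetical numbers: P(data | H0) = 0.01, P(data | H1) = 0.5, prior P(H1) = 0.01
prob = posterior_prob(0.01, 0.5, 0.01)
odds = posterior_odds(0.01, 0.5, 0.01)
print(round(prob, 3), round(odds / (1 + odds), 3))
```

A small P(data | H0) does not by itself make P(H0 | data) small; here the null is still the more probable hypothesis after seeing the data.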

You damn well know you can more easily dab on positive findings.

There should probably be a Nobel Prize for null findings to fix the incentive structure. But how do you grade and compare the different null findings? By effort? The ramifications of a null finding are likely more limited.

Since Nobel Prizes are usually awarded long after the discovery at hand, it might be possible to evaluate the impact of null findings on later positive findings.

Essentially: when awarding a Nobel Prize, map out which null findings narrowed the path sufficiently to support the Nobel Prize worthy work, and give them recognition as "supporting acts".

Null results often don't mean anything other than insufficient power, poor study design, failure to control for relevant variables, etc. The reasoning against publishing null results is that one cannot prove a null, and therefore there is no finding, AND that the null finding could be due to uninteresting reasons (e.g. sample size). With pre-registered studies, there is greater focus on sample size and study design, since the study will get published either way. This balances out the concerns somewhat.

The obsession with p values is the most absurd thing.

You write as if there are "positive findings" and "negative findings". This isn't true in orthodox statistics. There are only "negative findings" (the null hypothesis is rejected) and "null findings" (nothing is rejected, nothing is confirmed). Only the negative findings get published.

What doesn't exist is "positive findings". Nothing is ever confirmed: not the null (it's assumed to be true) and not the alternative (it's not even tested).

Now who wants to print a journal in which 95 out of 100 articles say "we learned nothing" and the other 5 can't be reproduced? Much better to print a journal in which every article claims a result, even if none of them can be reproduced.

All 100 are results.

The studies saying the five can't be reproduced, where are they.

If I'm designing an experiment to attempt to confirm a theoretical model then finding similarities in the 10 prior attempts that failed could give me clues as to what to try. Certainly if 10 respected labs have done things in exactly the way I was going to try then it's worth questioning long and hard whether I really need to repeat that procedure.

Why did these all fail to reject the null hypothesis. That's a powerful question.

"Why did these all fail to reject the null hypothesis. That's a powerful question."

Because that's nearly always the outcome. By conventional statistical metrics, the null hypothesis isn't rejected >=95% of the time.

"The studies saying the five can't be reproduced, where are they."

They don't get published, because of the aforementioned statistical problem. The bias toward positive results isn't irrational; it's a natural response to the fact that the vast majority of what any scientist produces will be a "negative" result.

The way you learn what not to try is by studying under experienced scientists, and talking to other current practitioners. For any field, there's a vast shared experience that guides experimentation. As a new researcher, a good place to find this kind of information is in review articles and book chapters. But mostly you get it by working with experienced people.
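The 95% figure above comes from the conventional 0.05 threshold, which fixes how often a test rejects a null that is actually true. A quick simulation under exactly that assumption (both groups drawn from the same unit-variance normal distribution; the sample sizes, seed, and function name are arbitrary choices for illustration) shows roughly 5% of studies "finding" something.

```python
import random
from statistics import NormalDist

def false_positive_rate(n_studies=4000, n_per_group=30, alpha=0.05, seed=7):
    """Share of simulated two-group studies that reject H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_studies):
        # Both groups come from the same N(0, 1) distribution: no real effect
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # Variance is known (1), so a z-test on the difference in means is exact
        diff = sum(a) / n_per_group - sum(b) / n_per_group
        z = diff / (2 / n_per_group) ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_studies

print(false_positive_rate())
```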

There's a much simpler reason. A null result can often just indicate a bad hypothesis, and there are lots of bad hypotheses.

On the other hand I think something that does support your point is that there are also a lot of bad hypotheses being published, increasingly even in reputable journals, after what is clearly extensive p-hacking. 'So, yeah we took these 29 variables measured in arbitrary, yet extremely specific, fashion, and lo and behold - our hypothesis is affirmed!' It's hard to see how these papers get published outside of the 'spiciness'.
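The arithmetic behind the 29-variable example, assuming 29 independent tests at alpha = 0.05 with every null true: the chance of at least one spurious "hit" is about 77%. The Bonferroni correction shown is one standard (conservative) fix; real variables are usually correlated, so treat this as an approximation. Function names are illustrative.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) across independent tests when every null is true."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that caps the familywise error at alpha."""
    return alpha / n_tests

print(round(familywise_error(0.05, 29), 3))
print(bonferroni_threshold(0.05, 29))
print(round(familywise_error(bonferroni_threshold(0.05, 29), 29), 4))
```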

Nobody ever won a Nobel prize for a null result... So it's not entirely without reason that people would prefer to get positive results.

While of course true, there have been a few groundbreaking null results such as the Michelson–Morley experiment.

Michelson did win the Nobel prize, though the citation reads "for his optical precision instruments and the spectroscopic and metrological investigations carried out with their aid", so not explicitly for the null result.
