Hacker News
Survey of scientists sheds light on reproducibility crisis (2016) (nature.com)
262 points by nabla9 on Jan 5, 2020 | 173 comments

As a working scientist, I feel both sides of the problem every day. Most papers I come across turn out to be difficult if not impossible to reproduce, and I'm sure some of my own papers have fallen into that group at some point (although given that I'm a theorist, what "reproducible" means can be a bit fuzzy sometimes). At the same time, I'm confused every time I see people wondering about the scale of the problem or what to do about it. There is absolutely no mystery here whatsoever.

Scientists are generally fairly smart people. Put smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those. What does academia reward? Large numbers of publications with lots of citations; nothing else matters. So, naturally, scientists flock to hip buzzword-filled topics and churn out papers as quickly as they can. Not only is there absolutely no reward for easy reproducibility, it can actually harm you if someone manages to copy you quickly enough and then beat you to the next incremental publishable discovery in the neverending race. This is absurd and counterproductive to science in general, but such is the state of the academic job market.

Anyone who purports to somehow address the situation without substantially changing the fundamental predicates for hiring and funding in the sciences is just wasting their breath. As long as the incentives are what they are, reproducibility will be a token most scientists pay tribute to in name only.

Biology postdoc here, seconded.

But what really gets me is the disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible". This was mentioned in the survey and it conforms to my informal experience of attitudes.

I do not know how you square that circle. Maybe, because of the pressures you mention, we are all supposed to engage in an informal convention of pretending to believe most previously published work is true, was done correctly, and is reproducible even if we know damn well how unlikely that is. I find it hard to do.

One day the public is going to cotton on to all of this. I cringe every time I hear extremely authoritative lectures on "what scientists say" about highly politicized public policy matters. These are not my fields, but if they are as prone to error, bias, irreproducibility, etc, as my own, I'd exercise some humility. It is one thing for us scientists to lie -- errr, project unwarranted confidence -- to each other for the sake of getting grants, but it is quite another to do it directly to the public.

But when the public does figure it out, what do you think will happen to funding? It will get tighter, and make the problem even worse. We need reform from within before the hammer falls, and quickly.

I don't know that we can reform - if you spend too much time trying, the system will automatically cull you. Personally, I'm lucky enough to be at least slightly insulated from the true scale of the problem by working in theoretical/computational soft matter physics, where the costs/grants/impact factors are comparatively on the small end of things. Medicine and biology seem to be affected the most, for good reason - this is where the messiest problems intersect with the most interest from society at large, so you get the most acutely misaligned incentives.

That being said, communicating limits of applicability and degrees of certainty to a popular audience is hellishly difficult even if you're trying to be perfectly honest. Even in a hypothetical world where we've somehow fixed academia, this will always be a hard problem most scientists are ill equipped to tackle.

I bet a lot would change if the largest funding institutions publicly prioritized funding scientists who have a history of publishing reproducible work.

Sure - for some definition of "reproducible". Any metric you come up with that's easily measurable will be gamed into oblivion. Anything meaningful - like looking at dedicated reproduction studies, etc. - requires a complete change in how academia functions, because right now publishing a paper that just aims to reproduce (or fail to) a previous work is effectively impossible in most fields. Not to mention that given the incentive structure you'd be pretty crazy to even try.

It may well be some kind of law that every positive incentive creates at least one perverse incentive, but that doesn't mean positive incentives don't work or aren't worth trying.

I tend to argue that 10%-20% of a grant should go to reproducibility. This does two things:

1. It would help fund / ensure funding to several labs in the space.

2. It would help ensure reproducibility of results

Yes, it can be gamed, but reproducing results should generally be easier than producing them in the first place, so 10%-20% of the funding to follow directions should be okay. Of course, this could lead to one group just constantly doctoring results to show something is not reproducible. In which case, a third lab would need to get some funding to check it out.

First they'd have to fund the reproducibility studies.

Well they shouldn’t have any trouble funding the first one...

They have plenty of trouble funding much of anything.

Completely agreed on all counts.

By "reform from within", I mainly meant the NIH, NSF, big funders, etc, need to pay more than lip service to this before Congress gets involved. Although there are people who have built careers on irreproducibility itself. Ioannidis for example. But that requires a lot of dedication and tends to piss a lot of people off.

Another counterintuitive possibility is that scientific publication could go more of the PLoS route and actually lower standards for initial publication -- and then rely more heavily on post-publication peer review. Then irreproducible papers would get publicly "downvoted". And conversely, papers that didn't seem useful enough at the time to make it into Nature, but turned out to be pathbreaking, would get the recognition they deserve.

Further incentives for publishing as much raw data as possible and, where applicable, code, would help too. The NIH has done a good job here. They require a lot of high-throughput datasets to be made available raw if they were collected with NIH funding. They provide hosting for this. It means people can go back and re-analyze it, and you don't have to trust the authors' analyses.

Full disclosure: I am the CEO and founder of a software company that works on reproducibility.

The disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible" has extended to every domain in my experience.

With the way the funding model currently works, my hypothesis is that doing more, completely reproducible studies will drown out the non-reproducible studies, and the funding model behind them (funders ultimately vote with their checkbooks).

That seems like a second logical disconnect, and completely impossible if the cost model remained the same. However, the question is whether studies operate in a way that is maximally efficient for the scientists or for the economy of science publication and knowledge dissemination. We, at MyIRE, believe that technology can tip the scales in favor of scientists.

Read more about our platform and business plan if you're interested:


NB: your website name reads like "My Ire". Is that intentional?


I'll also plug https://www.protocols.io/ and https://codeocean.com/ here --- might be interesting platforms for folks concerned about reproducibility to have on their radar.

> "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible". [...] I do not know how you square that circle.

I don't understand why these seem contradictory? To me the obvious (maybe naive?) reading is that scientists believe reproducibility is essential to the advancement of their fields, but also believe that being unable to reproduce a result can often be explained by their lack of sufficient information (or resources) to reproduce said results, rather than always assuming the other person must be lying (or just pretty darn lucky). I'm guessing this must've happened because they have actually come across such situations and observed many results that are indeed correct, but just difficult to reproduce. Why can't this explanation square that circle?

It suggests that the scientists are evaluating results using social proof.

In most fields that is fine; social proof is about as good as anything else if someone is studying literature. But the hard sciences are specifically about studying things that are best proven using technical over social proofs.

If social proof is being used excessively then there are grave questions about what the standard of technical proof is. We don't really know from the outside looking in, but the problem being highlighted is that there is technical evidence of a problem and scientists are proceeding based on social proof that there is not. Not a good sign, but also difficult to evaluate from outside the field.

However the incentive structures built around academics do look risky. Maximizing papers and citations isn't a good incentive for high quality work. It incentivises low quality work and collusion.

That is correct. Even when reviewing papers, it is true. As a reviewer, you do not question, for instance, whether the authors did the experiments exactly as stated, or whether they tried to analyze the results 20 different ways until they found the way that looked best.

You take it on trust that they did these things correctly, and focus on whether their conclusions are justified from their data.

If reviewers take so much on trust, how much more so readers, then? There are very, very few actual standards of technical proof that are in play here.

Especially when a paper says something that isn't particularly novel. If the results match "prior probabilities" from the literature, the paper will be believed without much question. If they don't, it will get more scrutiny. People quickly learn that it is easier to publish when your result fits the status quo.

Take a look at slide 10 of [1]. There are numerous examples like this where physical constants were first measured as a value, and gradually trended up over a long period of time towards today's "true" value. If the experiments were really independent, they would, generally, scatter randomly before converging on the true value. The fact that they did not suggests investigators were using methods to "smooth" the difference between their data and prior findings.

And thus, we get self-perpetuating cycles of groupthink. And that, in turn, is why supposedly independent experiments cannot so easily be taken as independent points of evidence for an overall hypothesis.

[1] https://www.pas.rochester.edu/~sybenzvi/courses/phy403/2015s...
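The convergence pattern described above can be illustrated with a toy simulation (all numbers here are hypothetical, purely for illustration): if each lab "smooths" its raw result toward the last published value, the series creeps from a biased starting point toward the truth, instead of scattering around the truth from the outset.

```python
import random

random.seed(0)

TRUE_VALUE = 1.0   # the "real" physical constant (hypothetical units)
NOISE = 0.1        # measurement standard deviation
ANCHOR = 0.7       # how strongly a result is pulled toward the last published value

def run(n, anchored, start_bias=-0.3):
    """Simulate n successive measurements of TRUE_VALUE.

    Independent labs just report truth + noise; anchored labs 'smooth'
    their raw result toward the previously published number.
    """
    published = TRUE_VALUE + start_bias  # the first measurement came out low
    history = [published]
    for _ in range(n - 1):
        raw = random.gauss(TRUE_VALUE, NOISE)
        if anchored:
            published = ANCHOR * published + (1 - ANCHOR) * raw
        else:
            published = raw
        history.append(published)
    return history

independent = run(20, anchored=False)
anchored = run(20, anchored=True)
# Independent results scatter around the true value immediately after the
# biased first paper; anchored results creep gradually from 0.7 toward 1.0,
# reproducing the suspicious trend seen in historical constant measurements.
```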

I'm curious about the ethical implications of your last point.

The maximally pessimistic view, which you and Feynman seem to be espousing, is that people explicitly "put their thumb on the scale" so that they get the right number. That's clearly bad.

The PDF you linked presents it as a more emergent phenomenon, driven by how people usually work. It's possibly an argument for working more slowly and carefully, or the use of pre-registration, but it seems ethically neutral.

Finally, you could think about this as a form of Bayesian updating, in which each experiment nudges our previous best estimate of the value. Obviously, it would be better to do this formally, but it does seem more rational than completely discarding the past.
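That Bayesian framing can be made concrete with a minimal sketch (illustrative numbers; a normal prior and normal likelihood are assumed, which gives a simple conjugate update):

```python
def update(prior_mean, prior_var, measurement, meas_var):
    """Conjugate normal-normal update: returns posterior mean and variance."""
    w = prior_var / (prior_var + meas_var)          # weight given to the new data
    post_mean = prior_mean + w * (measurement - prior_mean)
    post_var = prior_var * meas_var / (prior_var + meas_var)
    return post_mean, post_var

mean, var = 0.7, 0.04                # early, biased estimate with wide error bars
for m in [0.95, 1.02, 0.98, 1.01]:   # later, more precise experiments
    mean, var = update(mean, var, m, 0.01)

# The posterior mean moves toward ~1.0 and the variance shrinks with each
# experiment, rather than the latest paper simply overwriting the record.
print(mean, var)
```

Done formally like this, the past is neither discarded nor silently smoothed into; each experiment's weight depends on its stated precision.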

The PDF takes the point of view that it is caused by various forms of cognitive bias. I generally agree with that.

The reason that I don't think it's totally ethically neutral is that it is a basic responsibility of scientists to be on guard against cognitive biases to the best of their ability. It's possibly even the main feature that separates science from non-science.

Cognitive biases can become ethically bad particularly when they intersect with a person's personal interests. For example, if an investigator thinks "I won't be able to publish this result as easily if it diverges too much from the historical values, so I'll just run this experiment again", this is a problem. Even if it occurs totally subconsciously, it is a breach of duty because the scientist should take great care to avoid this kind of thing.

It could be viewed as Bayesian updating, yes. But my main point is that it greatly complicates the process of literature review and knowing how much certainty to assign to a scientific finding. If there are 10 papers saying X, but each one is highly dependent on the last, there is much less evidence for X than there appears to be, particularly to an outsider looking in.

> You take it on trust...

Is it trust? Or is it closer to "I'll scratch your back today, if you scratch my back tomorrow"? That is, if you blow the whistle (so to speak) on someone, that's bad community karma, and that will come back to haunt you.

That's not trust. That's a cartel.

It’s trust.

Specifically, you need to trust that when the authors claim to have reared mice in a high-oxygen environment, trained a monkey to move a joystick, or whatever the paper says, they did something like that. You can (and should) ask to see data demonstrating that they did it well, like oxygen levels in the mouse cage or trajectories produced by the monkey. However, unless those values are bizarre, it’s virtually impossible to know if they’re real or completely made up. Realistically, no one is going to fly you out so you can “audit” an experiment or record and review thousands of hours of surveillance footage.

When the act is mutually beneficial to both it's not trust. There is a clear incentive here for the reviewer to be less than thorough.

Put another way, the starting hypothesis should be: this study is flawed. The reviewer should then approach it as such.

Not only is it not trust, it's a violation of the scientific method. Yeah, sadly ironic (read: hypocritical).

I’m not saying you shouldn’t be skeptical. The reviewers’ job is to ask whether the approach used and data collected make sense and if so, whether and how well they support the authors’ claims.

At the same time, peer review is meant to be an advisory process, not an adversarial one. You should certainly flag things that seem problematic, and ideally offer solutions, but you’re not expected to (and cannot, really) tear down the experiment and root out every possible mistake or malfeasance. All you’ve got is a day or so and a 6000-word description of the project, so if the manuscript claims that the mouse weighed 12.3 grams, you’ve basically got to take that number at face value. Trust might be too strong a word (you could certainly question a weight of 123 grams, which is improbably large) but you should at least start in equipoise with respect to some stuff.

I also don’t see the mutual benefit, beyond nebulous things like enjoying consensus. A short review is also easier to write than a long one, but it’s just as easy to be dismissive as uncritical. Reviews are usually unsigned, and the authors and reviewers may not even know each other[0], so it’s hard for there to be an explicit quid pro quo.

[0] Of the stuff I reviewed in 2019, I knew exactly one of the authors, and here ‘knew’ means ‘Once amiably chatted with them in a coffee line at a conference’.

Asking “does this make sense” when it’s a monkey trained to use a joystick is something very different from trying to question and confirm “our model which is entirely based on these 40 other models which were all results of estimations of samples taken over the last 100 years then extrapolated out”... you can ask to see the monkey using the joystick.

The model verification would be years of work, so everyone just “trusts” that it’s right, although we basically know it’s not, because the chance that it isn’t is just much more likely.

It doesn’t have to be a quid pro quo any more than it has to be “ugh, that sounds really boring to confirm, I’ll just believe it”.

When you review a paper, no one knows who you are except the editor of the journal. If there are any incentives at all for the reviewer, it is not to be arsed to review at all, since it takes time and you get nothing out of it.

There is some truth to the idea that some reviewers are lazy and don't bother to examine the paper as well as they should because that takes time. But when they do that, it irritates the editor, who is trying to make an informed decision on the paper, so they get bad karma for that.

If you want to be cynical and look for community politics, you should be directing your wrath at study sections / grant review panels. That is a totally different ball game and there is a fair amount of corruption there. Peer review may be imperfect and prone to some cognitive biases, but it is not generally corrupt.

> but also believe that being unable to reproduce a result can often be explained by their lack of sufficient information (or resources) to reproduce said results, rather than always assuming the other person must be lying (or just pretty darn lucky).

Most certainly, when you go to replicate something, and fail, your first assumption is that you did something wrong. Even the second or third time. But after enough tries, you start to look for another explanation. "Irreproducibility", as Nature and the common scientist would use the term, doesn't mean "I tried once and failed", it means "I tried everything I could possibly think of and it still doesn't work".

Also, for many types of experiment it is questionable whether they are reproducible even in principle. One of the common analyses I do is called RNA-seq. It quantifies, for each of ~25K genes, whether and how much it went up or down with a given perturbation.

So the result of such an experiment is a 25K long list with numbers attached to each gene. What would it mean to replicate such a result? Surely the list of significant genes and pathways will not be identical even if a robot were to perform exactly the same experiment, due to biological variability.

Some people find this property of nonfalsifiability to be convenient, however...
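A toy sketch of that comparison problem (gene names and significance calls entirely hypothetical): even a "successful" replication of an RNA-seq experiment yields a partially overlapping list of significant genes, so agreement is a matter of degree rather than a binary pass/fail.

```python
# Significant genes called by two runs of the "same" experiment.
# Biological variability guarantees the sets won't be identical.
run_a = {"GeneA", "GeneB", "GeneC", "GeneD", "GeneE"}   # lab A's calls
run_b = {"GeneB", "GeneC", "GeneD", "GeneF"}            # lab B's calls

overlap = run_a & run_b                     # genes both runs agree on
jaccard = len(overlap) / len(run_a | run_b) # overlap as a fraction of the union

print(sorted(overlap))
print(jaccard)  # 0.5 here: substantial but far from exact agreement
```

With 25K genes per experiment, some threshold on a metric like this (or on correlation of the underlying fold-changes) is about the best operational definition of "replicated" available.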

> "Irreproducibility", as Nature and the common scientist would use the term, doesn't mean "I tried once and failed", it means "I tried everything I could possibly think of and it still doesn't work".

I’m not a scientist, but it is surprising to me that the term would not be regarded as at least ambiguous. The article quotes a microbiologist who says “there is no consensus on what reproducibility is or should be.”

> But what really gets me is the disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible". [...] I do not know how you square that circle.

If I was an academic, I imagine I would think:

1. Well, most of the papers I read are widely cited, reputable papers in widely read, reputable journals. A paper with 50 citations couldn't possibly be unreproducible!

2. Everyone knows there are scam journals and conferences that will accept everything. I'm sure it'd be easy to get a bad work accepted there, but I don't read anything like that.

3. And anyway, aren't the problems mostly in other fields? Everyone knows there are problems in comparative international underwater sports broadcasting studies, not serious subjects like mine.

I don’t think the public is going to cotton on. The incentives that created the reproducibility crisis (the application of demand-driven managerialism to education and research generally) have also compromised the public’s ability to observe and care about the problem. Changes to the funding model for basic research will continue to be driven by bureaucrats, not politicians responding to public pressure.

I agree with the last two paragraphs of your post viz. humility and reform.

> the disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible"

Could it be that many scientists think: "there is a crisis, but not in my field!"? To be honest, I personally think this quite often (and then realize I am being naive).

> But what really gets me is the disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible".

It sounds very much like an academic equivalent of Gell-Mann Amnesia - the phenomenon in which you notice that every news article on a topic in your field is complete garbage, people you know in other professions report the same situation with articles on topics in their fields, and yet when you turn the page and see an article on something outside your field, you forget about the whole thing and treat it as gospel.

I find the science crisis even more worrying now that society and the Internet spread the notion of "source everything or it's false".

If even 'scientists' themselves fail, then society is heading toward an absurdly frightening faux-intellectual inquisitive period.

Agreed! Seen with “fact checking” websites, which aren’t about nuance, just a political position that argues for “the part of the truth we want you to spread”.

I wouldn't conclude that much. I think they just believe that a paper is an absolute truth, but most of them don't have enough knowledge about the ways and history of the scientific field. To me it's mostly an anxiety-based reaction due to this era's lack of 'promises' and emerging problems. People are running for certainty.

I would.

Fact checking, unfortunately, isn’t what we think it is. Despite the superficial appearance, fact checking isn’t a helpful tool for determining the truth and for forming an accurate opinion. Instead, it’s actually an in/out group filter which segregates people by belief and value, while allowing each group to believe they hold the Factual High-Ground, and to claim any subsequent moral position which proceeds from being “factually correct.”

> Despite the superficial appearance, fact checking isn’t a helpful tool for determining the truth and for forming an accurate opinion

Sure it is. Not the bare conclusions viewed uncritically, but the support with references, if it exists (which in most notable fact checkers it does) absolutely is.

Isn't it the exact opposite? Presumably the papers scientists are reading are primarily the ones in their own field.

That depends on how critical you are. Our current incentive structure is out of control across the board and everything is skewing towards maximizing ROI in about every aspect of life money is involved in.

I apply this perspective to everything I see ("how are they getting money from this?") then work backwards and it seems to lead to accurate predictions, at least anecdotally. Perhaps I'm just jaded and cynical but it works well, unfortunately.

> Perhaps I'm just jaded and cynical but it works well, unfortunately.

It works well for me too.

I've learned to extend the "how are they getting money from this?" question into a generalized, first-principles thinking - look at the incentive structures and think about what they imply, about what's the expected behavior of a system running under those incentives. Because while not everything is about the money, systems will evolve over time along the lines of the incentives contained in them.

I'm arguing that the phenomena seem identical in their underlying structures. In both cases, we're dealing with a situation in which a person faces mountains of evidence that a source (a newspaper, or scientific papers in a given field) keeps pumping out inaccurate or wrong publications, but despite all that evidence, they assume that whatever isn't explicitly pointed out as wrong must be 100% true, accurate and honest.

Ah, yes. Wishful thinking.

Ideally, but once you are deep in a field what a field even is becomes murky. You have a biology paper you submit for review, but methodologically it's not really a biology paper, but maybe a statistics paper, or a computer science paper, or a theoretical math paper. Who reviews that paper, the person who knows the biology in question or the methods through and through? Sometimes the only person in the world who knows the theory and the technology in question the best is the author.

> people you know in other professions report the same situation with articles on topics in their fields

The speculation referred to as the "Gell-Mann Amnesia effect" starts with a subject identifying a low-quality article riddled with errors from a section of the newspaper within their area of expertise. They then turn the page to another section of the newspaper outside their area of expertise and "treat it as gospel" without thinking critically.

Maybe it's true, maybe it's not. But you are adding another part: colleagues who use relevant expertise to inform the subject that the other sections of the newspaper are also low-quality and riddled with errors.

With that addendum I'm confident that I can now beat the house. I will take the bet against this modified Gell-Mann Amnesia effect for any amount the casino is willing to let me wager.

Edit: clarification

> One day the public is going to cotton on to all of this.

I mean, the flat-earth, anti-vax, climate-denier folks are well and cottoned. In the US, one of the major political parties, which controls a fair amount of federal and state legislature seats, is pretty sure 'scientists' are just liars and should be treated as such.

These chickens aren't going to come home to roost, they are roosting and their chicks are hatching already.

In a thread where the topic is “lots of science might be wrong”, your takeaway is “doesn’t matter, everything is republicans’ fault” and not “we should go slow on using complex non-reproducible science to drive public policy, because we know lots can’t be confirmed and there is no great way to tell what is definite”... well I suppose you demonstrated why it’s a “crisis”.

”One day the public is going to cotton on to all of this. I cringe every time I hear extremely authoritative lectures on "what scientists say" about highly politicized public policy matters....I'd exercise some humility.”

You shouldn’t be cringing. You should be educating people about how science actually works, and how it simply doesn’t matter very much whether any particular paper is reproducible. It’s a straw-man argument, because most papers aren’t worth reproducing. I don’t know many good scientists who take what they read (in any journal) at face value. If you do, you’ve been misled somewhere during your training (fwiw, I also have a PhD). At best, papers are sources of ideas. Interesting ideas get tested again. Most get ignored. Even if a few hokum theories become popular for a while, eventually they’re revealed for what they are.

The tiny percentage of subjects that rise to the level of public policy discussion end up being so extensively investigated that reproduction of results is essentially guaranteed. And yeah, you hear lots of silly noise from university PR departments, but that stuff is a flash in the pan.

For example, nobody legitimate is in doubt of the broader facts of global climate change or evolution or vaccination, even if 95% of (say) social science results turn out to be complete bunk. Yet climate deniers, anti-vaxxers and “intelligent design” trolls absolutely love it when this distinction is ignored, because it allows them to confuse the public on the legitimacy of science as a process.

It's true that science doesn't value every published paper equally, but it's also true that publish or perish is creating ever-growing mountains of worthless papers. This is a real problem, and drags the quality of everything down.

Besides, the fact that there isn't one reputable journal in most fields that remains untarnished by the replication crisis is both a practical problem and a problem of public trust. A lot of this BS science is paid for directly by the public's tax dollars, or else by their student loans. I wouldn't expect the public to be so forgiving if 95% of it is bunk.

The public is oblivious at the moment because science still has credibility. It's going to take a major catastrophic misstep of science (which will inevitably happen the way things are going) for the public to lose trust.

“it's also true that publish or perish is creating ever-growing mountains of worthless papers.”

Is this true? Prove your claim.

”This is a real problem, and drags the quality of everything down.“

Let’s assume your first assertion is true. Is it automatically true that your second claim follows? Why?

I see no evidence that the individual productivity of scientists has changed much in the last 30 years, nor do I notice much of a change in the aggregate quality of science. Crappy science existed hundreds of years ago, and it continues to exist today. The main difference, as far as I can tell, is that we have a lot more scientists now.

In any case, these are just assertions, not arguments.

You would think crappy science would go down with progress? I think not.

You should look for papers in psychology/sociology, AI (recommendation engines, accuracy), economics, nutrition and medicine. Marketing papers are also interesting, I guess.

As an anecdote, I dug into sexuality, gender papers recently and was baffled at the amount of shit I came across. I couldn't believe someone published it.

> Crappy science existed hundreds of years ago, and it continues to exist today. The main difference, as far as I can tell, is that we have a lot more scientists now

And the ability to influence a lot more people faster than ever before, with an increasing level of distrust. Not to mention, papers from the US and EU affect other nations perhaps more. Many people blindly piggyback due to not having enough funding to replicate or perform our own analysis. Funding being scarce promotes hype and flocks people towards whatever the media popularizes.

It results in a very weird disconnection on topics that are dependent on population, history, culture, and other location sensitive data. The base is contaminated, anything built on it is not going to suddenly turn into truth.

There have always been weaker scientists, but there hasn't always been the economic incentive to publish in order to maintain a teaching position. This is a relatively recent (few decades) thing and is due to structural factors in academia and society. If you require proof, I'm not really sure what to say to you, as evidence is not hard to find, but if you don't already see it, I'm unlikely to change your mind.

If you want to claim that “publish or perish” (which, btw, has been a part of academic life essentially forever) is somehow recently affecting the volume of papers being produced, you should be able to provide evidence of that in a straightforward manner. One obvious test: is the per-capita rate of publication increasing? (my experience says “no”, but I’m open to contrary evidence.)

You have a hypothesis of what’s going on, but you’ve provided no evidence for that hypothesis, and when challenged to provide some, you tell other people it’s their job to do it for you.

It’s not my job to prove your extraordinary claims.

The number of publications per year per academic seems to me to have increased over the last 50 years. I don't have a citation.

Regardless, my original claim was that the absolute number of papers is growing, and most of them are trash. I think the sheer volume of trash has consequences that were not so serious 100 years ago, even if the percentage of trash was the same. I strongly suspect the percentage of trash has been going up, as well.

My argument is that "publish or perish" makes less and less sense the more active scientists and researchers there are, even if the average quality and the rate of publication per academic were constant, because the appetite and rate at which research can be assimilated by society is limited, and does not scale with population, while the number of scientists does.

I don't think these claims are extraordinary, and if you do, I'm not going to go looking for extraordinary evidence to try to convince you. I don't think I'm the only one who sees these effects, however.

”The number of publications per year per academic seems to me to have increased over the last 50 years. I don't have a citation.”

Yeah, that’s not evidence.

The absolute number of papers is growing - along with the number of working scientists. There’s been huge growth in academic science since the 1970s.

> One obvious test: is the per-capita rate of publication increasing?

It definitely is, in biology at least. My graduate mentor was really interested in publication metrics (as in, he published studies on them). The main driver is not necessarily crappy journals though, it is the increasing number of authors per paper.

I have no idea how you would evaluate something like "the average quality of papers is decreasing". I actually agree with GP that it is, but that's just, like, my opinion, man.

Author lists may (again: evidence required) be growing, but that’s not at all the same thing as per-capita publication rate, and doesn’t support the GP claim.

A priori, long author lists indicate collaboration, which is generally a good thing.

The number of citations/references has ballooned as well. So, if papers stand on more shoulders, most of which are crappy shoulders, they'll make crappy science.

Reminds me of the anti-vaxxers and trolls who grab those research papers to recruit people. If you see their arguments without the contrary points, it's not impossible to fall for them.

> it simply doesn’t matter very much whether any particular paper is reproducible

If we are talking about highly abstract types of science, I agree entirely. The problem is that there are strong incentives for groupthink, even where public politics aren't involved.

For example, I'm involved in the aging field. One of the popular aging hypotheses was oxidative stress. Because of the number of scientists with careers invested in that hypothesis, research kept on for well over a decade after it was debunked. In fact, I work with a person who cowrote the paper that authoritatively debunked it over a decade ago, and that person still studies it and writes as if it were still true!

How much more so, then, if the subject is politicized beyond a narrow community of scientists. I do not want to get into a political debate here, but evolution and vaccinations have over a century of scrutiny, whereas other fields do not.

Another example relatively close to my area is nutrition. Scientists have been totally convinced that fat is bad, sugar is bad, both are bad, all things are good in moderation... Even the public considers nutrition to be a joke for this reason. It is not enough for a community of scientists to agree on something, a fact needs time to "settle".

I agree totally with the "process vs individual paper" distinction, I would just propose the heuristic that "the more politicized the subject, the longer the process takes".

> I don’t know many good scientists who take what they read (in any journal) at face value

Sure, in journal club people tear apart papers. And maybe in private conversation. But then, on the other hand, look at the statistics in this survey. Or look at the way these same papers that they might privately pooh-pooh will be uncritically cited in a grant application or paper if they support their hypothesis.

"The tiny percentage of subjects that rise to the level of public policy discussion end up being so extensively investigated that reproduction of results is essentially guaranteed."

I really don't think this is true.

In politics, ideas are at war, and so truth is generally the first casualty.

Golly I wish bureaucrats would pay more attention to the nuances of science.

Science is not politics, nor should it change in response to political forces.

The root problem is not that science has changed, but that you’re seeing political attacks on science.

> Golly I wish bureaucrats would pay more attention to the nuances of science.

Instead we get a PR campaign to put a self-described unstable 16-year-old who can “literally see invisible CO2” on the world stage, frowning and yelling about stolen childhood.

I have no hopes for political reforms to actually look at nuance while this nonsense seems to work.

This rings true. But I think basic science is poor on interesting ideas. That's due to the pursuit of ‘minimal publishable ideas’ and the consequent lack of conviction and long-term perseverance. It's easier to follow the next trendy thing than to exhaust a search space.

Outside of academia, the patent situation looks much like this. In theory, patents have two parts -- the claims (what specific attributes of your invention are getting patent protection) and the teaching (the bulk of the text/diagrams, which are supposed to teach one of "ordinary skill in the art" what would be needed to duplicate the invention).

Turns out that you want claims as broad as possible and teachings as useless as possible -- so that nobody can read your patent and then patent other surrounding things.

It's gotten so bad that most companies now instruct their employees never to read any patents -- the potential liability increase, because they "knew" that something potentially-patented was out there, so fantastically dwarfs what you could learn by reading the "teachings".

It's all about incentives, as you say. And science today has more or less just as perverse incentives as the IP marketplace.

> Scientists are generally fairly smart people. Put smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those. What does academia reward? Large numbers of publications with lots of citations, nothing else matters. So, naturally, scientists flock to hip buzzword-filled topics and churn out papers as quickly as they can.

Until the fundamentals change - which might never happen - it would help if the science-centric communities showed some humility, as well as some transparency. Even with proper incentives, the process is flawed. It's human-based, so it will always be flawed. There's nothing wrong with that. It's the best we've got. But perfection it is not.

It troubles me when I see those who question science shut down and marginalized. As if science has some perfect track record. Per the OP, even scientists have questions about science. Fair enough. But does it have to be wrapped in denial?

Maybe we should be asking the question: "Why are academics citing papers that can't be reproduced?"

A lack of reproducibility of cited papers IMO undermines the credibility of citing papers.

>undermines the credibility of citing papers

That credibility is already undermined by the fact that for citation-measuring purposes, there's no real difference between any of the following citations:

(1) "Introduction. This paper is a journey into the amazing consequences made possible by (Smith et al, 2019)"

(2) "Introduction. In this paper we show that (Smith et al, 2019) is a steaming pile of crap"

(3) "Footnote. This minor remark is vaguely reminiscent of (Smith et al, 2019)"

All contribute the same citation juice to Smith et al, 2019.

>smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those. What does academia reward? Large numbers of publications with lots of citations, nothing else matters. So, naturally, scientists flock to hip buzzword-filled topics and churn out papers as quickly as they can.

For some other opinions, consider:

1) Paul Graham, who states this is the biggest lesson to unlearn from academia. He states academia selects for people who tend to 'hack' their way to the easiest grades/favors rather than learn: https://news.ycombinator.com/item?id=21729619

2) Feynman who often stated things like "the pleasure is in finding things out" and that "the whole academic department was rotten."

I'm not convinced academia is selecting primarily the attribute: intelligence.

This situation is also absolutely devastating for modeling science. Pretty much any model can be "based on empirical studies" since just about anything has been found to be statistically "significant". Unconstrained models are useless, as are unconstrained theories.

It's also not enough to say "this study is not reproducible". Why isn't it? In what direction should the experimenters move next? It's not enough for biology to do "trial and error" studies; they should be continued and provide more depth to their findings. Sadly, this is not happening, largely because of a lack of faith: the scientists themselves don't have enough faith in their own results to be making long bets on them.

> Put smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those.

Or you select for hypercompetitive people who don't mind bending the rules. Smart people also have options outside of college.

It’s kind of weird, though, I’d imagine that most of them didn’t get into academia to try to spend their time gaming a system, and they’re giving up quite a lot in material terms to be in academia. So why do it? Are they mostly in it for the prestige rather than the ability to actually move the state of the world forward?

Ideals clashing with reality. Some, like yours truly, give up and go to industry. Some try to make it work, game the system a bit, do some good. Then they have kids, get older, and generally run out of energy to do two things in one job. And guess which one can be sacrificed without upending your life.

Thanks for sharing, but if I might nitpick for a moment:

> Put smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those.

This is true for anybody, not just "smart" people. It's basically human nature.

Well, there are two simple steps to change those incentives:

1. Journals should accept hypotheses/procedures, and commit to publish whatever conclusion results, before the experiment is even started. If the experiment is not completed for whatever reason, a transparency document should be published explaining why.

2. Journals should accept only a small percentage[1] of new research. The rest should be attempts to reproduce old research, again only evaluating the hypotheses/procedures, before the attempt to reproduce has started.

The challenge here is that there's a bit of a chicken-and-egg problem: journals won't want to commit to publish a result if no one has committed to fund the experiment, but funding sources won't want to fund experiments if no one has committed to publish the result. So there would need to be some collaboration between journals and funders.

[1] Choosing this percentage is the proper usage of P values. The goal is to attempt to reproduce experiments until the product of the P-values of the results reaches a target aggregate P. Note that this target applies to P and P'.

Example 1: Your target P is P=0.01. You perform an experiment, get a P=0.3 result. Then you attempt to reproduce, and get a P=0.4 result, for an aggregate P of 0.12. You then attempt to reproduce again, and get a P=0.2 result, for an aggregate P of 0.024. Finally, you attempt to reproduce again, and get a P=0.4 result, for an aggregate P of 0.0096, below your target P. This proves the alternative hypothesis with the target confidence.

Example 2: Your target P is P=0.01. You perform an experiment, get a P=0.7 result (P'=0.3). You then attempt to reproduce, with a P=0.9 result, for an aggregate P' of P'=(0.3 x 0.1)=0.03. You then attempt to reproduce, with a P=0.9 result, for an aggregate P' of P'=(0.03 x 0.1)=0.003, below your target P. This proves the null hypothesis with the target confidence.

The example P values were chosen for a few reasons. First, it demonstrates that you can find fairly conclusive confidence values eventually from aggregating experiments with fairly inconclusive results. Second, it demonstrates that the P=0.05 that's frequently used now is actually a very low bar of confidence, when you consider that reproducing even very unsurprising results a few times gives you a much higher confidence.
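The multiplication scheme the examples walk through can be sketched in a few lines (the `aggregate` helper name is mine; note that multiplying raw p-values is the commenter's simplification — a formal meta-analysis would combine them with something like Fisher's method):

```python
def aggregate(p_values):
    """Multiply per-experiment p-values into one aggregate value."""
    result = 1.0
    for p in p_values:
        result *= p
    return result

# Example 1: evidence against the null hypothesis.
# 0.3 * 0.4 * 0.2 * 0.4 = 0.0096, below the target P = 0.01.
example_1 = aggregate([0.3, 0.4, 0.2, 0.4])

# Example 2: evidence for the null; use p' = 1 - p for each experiment.
# 0.3 * 0.1 * 0.1 = 0.003, also below the target P = 0.01.
example_2 = aggregate([1 - 0.7, 1 - 0.9, 1 - 0.9])
```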

> Scientists are generally fairly smart people. Put smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those.

I don't understand this. Scientists could make a lot more money in industry, so this makes me wonder why these "corner-cutting" scientists are doing the work they are doing. Surely not for idealistic reasons.

Actually, it is mostly for idealistic reasons. Scientists compartmentalize and often don't view it as corner cutting with respect to the science. I'll try to explain.

Scientists generally like doing science. Nowadays, the higher you get in science the more you get buried under things that are not science - endless project proposals, reports, reviews, applications and yes, publications. Most scientists cut corners as much as they can get away with on most of these because if you don't, you will never get to actually doing science (others just work 100 hour weeks, but you can't really sustain that).

Now, unlike all of the bureaucratic garbage, publications are supposed to be about communicating your work to your peers, an integral part of science, but in the existing system it's really hard to actually retain that view. Why? Because publications are now intrinsically linked to all the metrics - you need X publications for project Y, containing the right set of buzzwords, submitted to journals that satisfy the right quantitative metrics and are listed in the right databases, with the right set of coauthors, references and acknowledgments, and you need to submit them before the right deadline, or else. Spend a decade or two in that environment and it's easy to lump writing publications in with the endless proposals and reports - as mindless drudgery you have to dispense with in order to squeeze out some time for actual science.

Of course, you also need to devote enough attention to all the things I listed and much more in order to actually retain your ability to do science and not starve for it, so it's an eternal balancing act that breeds a lot of resentment.

> Scientists could make a lot more money in industry

That may be true for a field like machine learning. But for most of science, this is a highly misleading claim. Some science (e.g. drug development, materials science) can also be done in industry, although with a lot less freedom than in academia. But most science simply doesn't have an equivalent option in industry. Scientists are doing the work they are doing largely because they are interested in researching specific questions, which cannot be pursued anywhere else.

> Most papers I come across turn out to be difficult if not impossible to reproduce

How is this even possible? Isn't Reproducibility one the pillars of Science? What gives such a paper credibility over say, an eyewitness description of a UFO sighting?

It's one thing to not perform the actual reproduction experiments, but publishing claims that (almost) can't be reproduced is another.

I.e., take the money out of it, then. Pay researchers the same regardless of number of publications. Cut 30%, at least, of the administration of a major research university and use that money to fund research, raise professor pay, and lower undergrad tuition. Then put a block on admin raises unless researchers and professors get an equal % bump.

Then elect better politicians who actually understand at least a little bit of science and how it correlates with bettering society, so they'll get bills passed to fund higher education and research. We've got the money for it, but it's all in bombs at the moment.

I have seen your point of view, shared over and over again.

A lack of incentives for reproducible research (and for verification of research) is the fundamental problem.

If you are in the shoes of policy makers (eg politicians). What changes would you propose (through both laws and funding/grants models)?

"Multiply the impact of empirical findings by (#positive reproductions - #negative reproductions)" sounds nice, probably bound it to some interval like [-4, +3].

But this means your measurement pipeline needs to understand more of the paper than just the bibliography section.
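The bounded multiplier proposed above could be sketched as follows (the `reproduction_weight` name and the clamping behavior are my assumptions about the intent):

```python
def reproduction_weight(positive, negative, lo=-4, hi=3):
    """Clamp (#positive reproductions - #negative reproductions) to [lo, hi]."""
    return max(lo, min(hi, positive - negative))

# A finding reproduced 5 times with no failures gets the maximum boost (+3);
# one that failed 10 reproduction attempts is pinned at the floor (-4).
```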

I briefly addressed this here: https://news.ycombinator.com/item?id=21964840

Ah the Bubka trick, thanks for pointing it out :)


Progress has to be incremental because reality is local. It’s the speed of progress (energy of the system) that is important. If you are too slow, people will pass you. If you are too fast, your competitors will catch up faster. That’s why the champions and hustlers alike don’t show all their tricks unless absolutely necessary to their challengers. Even industry / monopolies do incremental releases if they are ahead of their competitors... but if intel or nvidia slow down too much with their incremental progress amd can sneak in.

How to fix?

The answer is simple, but it's also one that's completely useless for anyone reading this post. What needs to happen is a gradual realignment in both the funding and hiring criteria across all of academia.

There needs to be less emphasis on quantitative metrics like citation counts, h-indices and other trivially gameable nonsense, and more emphasis on human judgment along with a set of basic criteria to provide a floor to research quality across the board (things that every good scientist should do, as opposed to things that you should relentlessly maximize). This would require the non-scientific managerial class to abrogate some of the power they now wield in the academia and relevant funding institutions, therefore is very unlikely to happen.

And - probably, most annoyingly to the HN crowd - there is nothing meaningful that technological or business disruption can accomplish here. These are systemic social problems that have been continuously and systemically building up since (approximately) WW2. At this point the only entities large enough to make a difference are probably the NIH and NSF in the U.S. and the relevant EU funding agencies in Europe. In my judgment, these are precisely the agencies least likely to institute such change, so here we are.

At my institute, there is only one metric that matters: dollars you bring in through grants. There are a variety of ways to get to the cash: you can do it with a small number of high-impact papers, a larger number of low-impact papers, by finding a valuable niche, or by chasing fads.

So I would say that fundamentally the problem is with study sections and how they think. They are incredibly conservative, but in all the wrong ways. They highly value institution of origin, preliminary data from pilots, no matter how shitty, and the sort of hypothesis that seems reasonable from prior literature. There are also a lot of political games on study sections.

This means that if you repeat a lie often enough in the literature, and you know people, then voila, you now have support for the same hypothesis in future grants.

Peer review is not working for funding. I would say the NIH (in my case) needs to hire completely independent full-time reviewers, and needs to place a real, high emphasis on reproducibility. It is in the NIH's interest to do so, because as I said, its funding will be cut if this continues.

Is this the case in elite institutions? I thought researchers there have more freedom in what they do.

Haha, no. The more elite the institution, the more money you are expected to bring in. It is compensated somewhat by the fact that a big name institution will give you an advantage in getting grants.

There is a bit of a difference between working directly for a university and working for a research nonprofit, like Dana Farber or MD Anderson or whatever.

If you work for a university, and you don't pull in a lot of money, what will happen is that you will be expected to teach more hours to "make up for it". You are less likely to be fired. At a nonprofit, if you don't cover your own salary and expenses through grants, you will quickly become unemployed.

> At this point the only entities large enough to make a difference are probably the NIH and NSF in the U.S. and the relevant EU funding agencies in Europe.

I'm really not sure about that, as the publishing bubble coincides with the rise in funding by public bodies [1]. Funding safe projects is a way to disperse funds to as many people as possible, and that's not necessarily good. At least in the EU it feels that way. The politics of funding seem to have naturally gravitated to the current situation. I doubt the model of funding can change with the same stakeholders.

[1] https://phoenixrising.me/wp-content/uploads/NIH_Total-Fundin... , https://www.researchgate.net/publication/327917254/figure/fi...

There may be some tiny influence that the HN crowd can have through the impact of hiring influences in industry on respective incentives in academia, but overall, I agree completely. Solutions are easy to state but as always the difficulty is finding a lever and a place to stand to put them in practice.

If we put more weight on human judgment then we'll get more favoritism, nepotism, and focus on politically popular topics. So we might end up just exchanging one set of problems for another.

You won't get much more focus on politically (in the context of the academia) popular topics, the existing system already imposes that particular bias exceedingly well (how do you think you get citations?). But yes, you'll get more everyday human issues along with the judgment. In my view, that's far preferable to the existing situation, if only for the simple reason that the human problems will vary from place to place, whereas the existing system distorts virtually the entirety of academic research in roughly the same ways everywhere, to the uniform detriment of all.

I used to be more worried about favoritism, nepotism et al. but these days, I'm more worried about quantifiable metrics. The former are situational, and there's only so much nepotism you can engage with before your peers in other places start considering it to be in bad taste. The latter, however, can be ruthlessly and efficiently optimized, to the full exclusion of any value that's not captured in the metrics.

At least favoritism has some variability in it. The problem with metrics, besides being gamed, is that they are so uniform.

I wish people would stop using the word “simple” to describe problems like massively realigning all the incentives in a huge decentralised system against the wishes of the existing power-holders in that system. That’s not simple, it’s probably the most complex problem that exists.

That's the hundred-billion-dollar question, isn't it.

My radical proposal is to separate the research institutions from the universities -- especially the public universities. If you want public research institutes, that's fine -- but they should be their own entities.

The university system, with its mix of research and instruction, and its tenure-vs-non-tenure system creates terrible incentives.

Top researchers are rarely also the best instructors -- and so it makes little sense for the two occupations to be so intimately tied.

The university funding model is terribly destructive to top science labs as well -- the university sees them as a source of prestige, sure, but more importantly as a source of that all-too-critical grant money (entirely too much of which is bogarted by the rest of the university) -- putting many of the labs in this place where they have to keep securing grants at an ever-increasing pace no matter what, to keep the rest of the university flush with cash to spend elsewhere.

There's really no good reason why research needs to be done at a university, or why a research institution needs a body of lecturers and students.

I absolutely agree.

My wife did a PhD at an elite private university, undergrad tuition north of $50k/year not including room and board and other expenses. At one point three senior professors in her department went to the department chair and the dean with a proposal to revamp a notoriously unpopular intro class in their department. Rather than having it taught by adjuncts or postdocs, the three of them would team teach it. They would change the syllabus to bring it up to date with the current methods in the field--it had hardly been touched in two decades.

They were told to forget about it. The University said it was a waste of time for them to teach undergrads.

Presumably the kids applying to these types of universities are not aware of how little concern the institution has for the quality of their education (the perceived quality, on the other hand, is extremely important--but only loosely connected to the actual quality).

The current undergraduate university market is almost entirely a reputational marketplace -- people are shopping for reputation/prestige.

It turns out that prestige has very little to do with undergraduate education quality.

This used to make lots of sense -- for centuries, college was more important as a place to meet other people as it was a place to actually learn useful skills -- and so you were primarily trying to go to the highest-class place you could get yourself in to.

In this day and age, though, I don't think that's the right way to approach choosing a school.

I agree with all of this except the last sentence, which I entirely disagree with.

The gap between elites and non-elites is growing, and as a result, the value of breaking into the social circles of the wealthy and "legacy" students who dominate elite schools is also growing.

Meanwhile, high-quality educational content is becoming more widely available - often published for free by those same elite schools - so the value of the education itself is approaching zero. (For some programs, notably elite MBA programs, the value of the education is basically zero already and has been for some time.)

If a course is taught by a TA or adjunct who barely speaks the language in which it's conducted, rather than by the distinguished professors whose names are on the department website, that's an annoyance, but all it costs you is some time on Khan Academy or Coursera to learn the material on your own. Your well-connected classmate who introduces you to a hiring manager or angel investor will have a much larger impact on your life.

What do you think has changed? Is shopping for prestige a less useful strategy than it used to be? Now that more people go to university, arguably prestige is even more important.

Most research at universities is held afloat by incredibly cheap yet fairly capable and specialized student labor: everyone from postdocs to undergrads. I'm not sure how you get around that unless you can nationally and efficiently provide the central administrative services that universities currently provide for their research institutions.

You could of course lower compensation for academics to incentivize other professionals to support research but you'll find academics fleeing to other opportunities in many cases.

The post doc that works at a university today could work at a research institution tomorrow in an otherwise-identical job.

Accreditation could be performed for research institutions just as it is for university departments today -- PhD candidates could work in research institutions just as well as they do in university departments. The few classes they take are all within the department that they're researching in anyway, and are also all separate from classes that undergrads take -- so moving that education out of the university and in to the research institute seems fine too.

That's not to say that instituting my plan would be easy. I did call it 'radical', after all.

Academic compensation is already low compared to what a most capable researchers could earn doing less interesting but more profitable work in industry. The competitiveness of the field isn't because the pay is good, people are competitive for all the same reasons that they are willing to work in the academy to begin with.

Until university positions and research grants stop being given out based on prior research _results_ (which is what journals tend to look for), we won't be able to trust the research performed there.

There are millions of dollars on the line for researchers involved. It is the difference between a well paid career and a life of destitution while being a slave to huge student debt.

Professorship and research grants should be given out based on criteria that are only incidental to research results.

Evaluate profs and grants based on:

1. Domain knowledge (test the applicants).
2. Math skills (test the applicants; make sure they know how to avoid p-hacking, use preregistration, etc.).
3. Motivation and leadership.
4. Prior and current research _proposals_ (but without looking at the results or whether they have been published).
5. Other skills such as communication, interpersonal skills, outstanding achievements, etc.

Universities should not rely on journals to evaluate their professors. This corrupts the whole system. Journals have different goals. They want to publish well done research with interesting results. Universities should hire researchers that do good quality research with interesting _questions_ regardless of the results.

If universities keep giving out jobs based on having generated interesting publishable results, they are going to keep getting researchers who ignore biases and bad science practices in order to generate publishable (but unreliable and often false) results.

Making a significant contribution to your field used to be a reasonable requirement for a Ph.D. However, we are churning out more and more Ph.Ds, and there just aren't that many new discoveries to be made. This is creating a pressure to create more and more splintered fields, and more and more useless publications that nobody will ever read. Furthermore the academic system uses publication to measure the value of professors or researchers, creating yet more useless crap published every year. Only changing the incentives can improve the situation and it seems academia left to its own devices never will. Maybe it's time for governments to take a more active role in basic research again. Maybe we need more "patent clerk" jobs where people have time to think without having to chase structured research grants.

Reward being right, penalize being wrong, and transfer information in a manner that facilitates determining what is right and what is wrong.

One page (https://www.kn-x.com/static/PWFeb17forum.pdf), two page (https://ssrn.com/abstract=2835131), and three page (https://ssrn.com/abstract=2713998) versions of this are available. Further details are provided at http://kn-x.com.

I doubt anyone on this thread is going to like this solution ... but I think most on this thread agree that the underlying problem is the incentives, that fixing the problem requires changing the incentives, and that changing the incentives within the existing system is very hard.

Not the OP, but as a working scientist, it seems to me there are two important issues, among many others.

(i) Reproducibility is not a criterion for tenure evaluation, therefore it bears little relevance for career advancement for young scientists. This is an overworked and underpaid group in academia. (In the bay area, they earn 1/3--1/2 the salary of an entry level software engineer but probably work 50% more hours.) They simply don't have the luxury to take time off to reproduce someone else's work.

(ii) No major funding agency would be willing to support the kind of work that reproduces published studies. When something is published, it is considered "done" and not worth spending more money on it.

There needs to be a sea change. But sadly, academia is at best paying lip service to the structural problems behind the "reproducibility crisis". I'd expect it to continue like this for most fields.

Historically, the sciences were limited to people who had the wealth to independently research and verify their work. These days, people perform scientific research for a living wage (which is not in itself a bad thing; scientific progress is worth paying for), but they have to optimize for income-producing potential, which means pumping out papers, even if they are of dubious quality, whether intentionally or not. Not all scientists are optimizing for number of publications, but a good number are.

It's already fixed to some degree. Any good scientist who can capture economic value spins off to private labs.

It's the good scientist who produces economic value but cannot capture it whom we have to support, but maybe that person cannot easily be found. The state just has to overfund and accept that some large percentage goes to nonsense.

In my mind, that's okay, though, since the guys who can capture economic value can do a lot with what we have.

Science shouldn't be funded this way

Is short-termist smart really smart at all?

One factor that might increase reproducibility is to enhance negative reputational effects of publishing an un-reproducible study. That is, it should be so rare, so outré, to publish something unreproducible, your professional reputation is severely damaged, and if you do it again, you will leave academia.

It sounds harsh, but severe self-regulation is what prevents the catastrophic failure of any institution that is really only answerable to itself. Professors aren't doing any favors by rubberstamping weak papers, or giving weak students PhD's (or BS's for that matter).

EDIT: down-votes and no replies, as expected. Hey academics, sometimes to save the patient you have to amputate the limb. I care about science and want it to survive, so you guys continuing to be nice to each other even when you publish BS isn't serving that purpose.

The problem is that not being reproducible is not necessarily a failure and does not necessarily indicate a bad actor; it may be bunk, but it may show a worthwhile direction for others to investigate further. The problems come when a non-reproducible study gains some authority: citations in other studies, or publication in the media without proper vetting.

It's probably worth punishing people who consistently publish un-reproducible work, but it shouldn't be the rare, career-destroying event you are suggesting.

>not being reproducible is not necessarily a failure and does not necessarily indicate a bad actor

I disagree with the first part and agree with the second. I would argue that failure to reproduce is perhaps the worst kind of scientific failure, because the activity cannot be called science anymore. And no, I don't mean anything personal when I say 'failure', you don't have to be malicious to do bad science -- after all 'doing bad science' is the default human condition.

An analogy with classical music: A musician who consistently fails to reproduce the score with his violin will lose his job quickly. It doesn't mean he's a bad person, but it does mean he's not good enough for the orchestra. The metaphor for the "reproducibility crisis" in an orchestra is what happens if they let bad players stay: the orchestra sounds worse and worse, and finally the audience stops coming. In the apocalyptic scenario, cultural forces cause ALL orchestras to stop honestly evaluating the skill of their players, and all orchestras simultaneously lose their ability to accurately play any but the simplest music.

Standards are painful for those that can't meet them, but they make the world better, overall.

You can judge a musician in a 20-minute audition, but you can't tell if research will reproduce without doing the work. However, there are papers that are so vague and poorly written that they cannot possibly be reproducible, and this is becoming the norm in some fields, so I think you're right that standards would help there.

Also, sometimes strange things just happen randomly (like the CERN faster-than-light neutrinos) and there's no shame in publishing results that aren't correct as long as you're honest about it. Some early-stage, low-power studies are going to randomly show results that don't reproduce, and that's also fine. Suppressing these results would hurt progress just as much as making the opposite error.

'I would argue that failure to reproduce is perhaps the worst kind of scientific failure, because the activity cannot be called science anymore.'

This is only true if the results of research are deterministic. In many fields, if a study is estimating an effect that exists in the world as a distribution, a failure to reproduce might not be indicative of...anything.

No, that's unfair. If I pre-register my hypothesis, run good methods, and conclude at p < 0.01, I did things well. I should expect about 1% of my studies to fail to reproduce. That means that I routinely do not fail to reproduce, but failing a few repros over my career is normal.

The real problem is the other guy who didn't pre-register the hypothesis he tested. We don't know whether he got his result by chance until the replication fails.
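The arithmetic in the comment above is easy to check with a quick simulation (a toy sketch with hypothetical numbers, not from the thread): run many honest, pre-registered tests on pure noise, and roughly 1% clear p < 0.01 by chance alone.

```python
import math
import random

# Hypothetical numbers: 10,000 pre-registered studies run on pure noise,
# each a two-sample test with n = 50 per group and alpha = 0.01.
random.seed(0)
n_studies, n = 10_000, 50

def null_p_value():
    # Both groups drawn from N(0, 1): the null hypothesis is true.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # Two-sample z statistic with known unit variance:
    # z = (mean_a - mean_b) / sqrt(2 / n).
    z = (sum(a) - sum(b)) / (n * math.sqrt(2 / n))
    # Two-sided p-value from the standard normal tail.
    return math.erfc(abs(z) / math.sqrt(2))

false_positives = sum(null_p_value() < 0.01 for _ in range(n_studies))

# Roughly 1% of true-null, well-run studies clear the threshold by
# chance -- and those are exactly the ones that later fail to replicate.
print(false_positives / n_studies)
```

So a small base rate of failed replications is expected even from scrupulous researchers; the crisis is the gap between that ~1% and the observed rates.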

Non-reproducibility doesn't imply shoddy work so it isn't fair to damage people's reputations for this reason. A paper may be non-reproducible because it isn't descriptive enough, because their data is not published yet, because a software library is deprecated, because a URL has changed etc, all of which are only discovered when somebody goes to reproduce it. Researchers can strive to make their work reproducible and still unwittingly fail.

If Github goes bust in 20 years you'll find a significant chunk of current work is non-reproducible despite author's best efforts.

There is definitely an argument to be made for increased negative (reputational) effects. However, many unreproducible results are only discovered years, if not decades, after initial publication. This makes the discovery less impactful for the original researchers. In that timespan, many have left academia; others have more recent "shiny" publications to point to instead. It's sad, but in addition to the publish-or-perish, there's also a publish-and-forget mentality in (some) academia, in which your project is finished and you move on to the next "hot" topic. Therefore, my concern is with the efficacy of such measures. What are your thoughts on how to implement such a reputational effect? Public shaming? Should we register fraudulentprofessor.com?

> Professors aren't doing any favors by rubberstamping weak papers, or giving weak students PhD's

Agreed. However, I think this is a sort of tragedy of the commons, in which "good" behavior of some professors will not stop the onslaught of "bad" behavior of others. Certainly the good ones will be outnumbered in terms of graduates and since university performance reviews rely mostly on sheer numbers (students, papers..), there is an argument to be made that universities self-select for the rubber-stamp professor over the honest ones.

I feel a fair solution is to re-imagine the universities' performance review system, although I have a hard time coming up with concrete and good measures of how to select the good from the bad professors.

>What are your thoughts on how to implement such a reputational effect?

I think it should be similar to how journalists are treated after they are involved in a major scandal (usually reporting something speculative as fact). Personally, I don't think there's any excuse for unreproducible results. They are worse than silence, they are lies that spread other lies.

(To continue your journalist analogy)

So what happens to "the journalist" when something speculative is reported in a newspaper piece that is written by 7 people five years ago? In addition, five of the seven writers are now working different jobs in the private sector, no longer writing in any newspaper.

Nothing happens in journalism much like in science and in many other fields like disgraced police officers who end up climbing the ladder still.

Case in point: Michael Riley who wrote Bloomberg's thoroughly debunked story on SuperMicro "The Big Hack", has now been promoted to manage all of the their cybersecurity coverage.

I think one problem with this approach is that maybe fewer scientists will take on fields / topics where the data is inherently elusive or just very hard to get. Could be rare natural phenomena, highly sensitive data, or whatnot. Maybe the collection of data is so prohibitively expensive or specialized that other researchers are not able to recreate the process accurately enough, which in turn leads to irreproducibility.

Kinda-OT but: During my studies I've read papers on everything from organisational theory (MBA stuff) to pretty "pure" CS (like compiler construction), and have in that process read so many papers in many different fields that just don't provide any insight or are barely comprehensible.

Of course, sometimes it can probably be blamed on the reader (me) for not having enough insight, but many times the papers are just crap.

My naive imagination of academic papers before university was that they were like those I now read in the __top__ journals / conferences in each field: concise, understandable writing with actual insights and a clear methodology that can be repeated. But papers like these are probably something like less than 1% of everything published. Many papers seem to be almost approaching the level of the Bogdanov theses [0].

To relate to the topic: I'm wondering if we'll see a shift in the academic culture and system, where the pressure to publish is lowered (somehow), and where focus is on quality and actually providing insight and reproducibility.

The worst possible thing is that bad science gets influential on false grounds, which needs to be avoided. But increasing the signal-to-noise-ratio is probably not a bad idea either. And I suspect that they will both turn out to be improved by fixing the underlying incentive problems.

[0]: https://en.wikipedia.org/wiki/Bogdanov_affair

It's the same as with programming: many people now have a career in science because that's something that comes with prestige rather than that they are driven by their curiosity and need to understand how stuff works. Once established the treadmill needs to keep turning so you get tons of low value papers and the occasional gem. Publish or perish does nothing to improve the quality of what is published, rather to the contrary.

The bar for publication (like much of our media) used to be far higher. Particularly with the push to get everyone through college, even at the cost of lowering standards, there's a sort of regression to the mean with respect to publication quality. It happens with any difficult, technical field that initially attracts the more capable in society.

Hell, look at what the internet was back when it was still mostly nerds with passion.

College students don't produce papers. Published papers and how many students go through college are unrelated.

But you have to go through college to do a PhD so I guess they are correlated.

I really think there's a lost opportunity in the typical American master's program. Typically for a master of science you need to perform original research for your thesis, ostensibly to prepare you for the process of publication. But I think a master's could be a far more useful stepping stone between undergraduate academics and PhD or industry research if it were taken as a chance to have graduates reproduce important research. The grade would be based on the merits of the reproduction, which would be an excellent final introduction to research work, while also hugely benefiting the community by verifying previous research. Right now there simply isn't any incentive to reproduce anything but the biggest results.

Worth mentioning that scientific research, even if difficult to reproduce, is still better than, say, superstitious crap. Looking at you, chem trail antivax climate deniers.

> Worth mentioning that scientific research, even if difficult to reproduce, is still better than, say, superstitious crap.

No, it isn't, because once something is labeled as "superstitious crap" it's not going to have any great impact on public policy. But if something is labeled as "scientific research", even if it's of no better quality or reliability than superstitious crap, it does have impact on public policy.

The "reproducibility crisis" is not a crisis because many scientific results are difficult to reproduce. That is to be expected in scientific research, since research is supposed to push the boundaries of what is already known, and that means many results will be sketchy and uncertain.

The "reproducibility crisis" is only a crisis for those people who took sketchy and uncertain scientific results and declared them as "settled science" and used them to drive public policy--and then everyone found out how sketchy and uncertain the results were when they could not be reproduced. So now the public doesn't know when "science" can be trusted, because scientists themselves have squandered its trustworthiness.

Is it though? Some ideas are so obviously nonsense to most of us, whereas bad research can have a more robust veneer of legitimacy - which makes that research a potentially useful tool for bad actors. The internet is full of authoritative sounding tripe backed up by "science".

Bad research doesn't usually hold up. Even good research doesn't always hold up over time. Was Bohr a bastard for getting the atomic model wrong, or just offering his best interpretation given what was known at the time?

> Bad research doesn't usually hold up.

Your faith in the reliability of the scientific process is touching, but naive. Bad research is driving public policy in many areas.

> Even good research doesn't always hold up over time.

In the sense that it gets superseded by better models (as Bohr's original model of the atom did), sure. That's to be expected in a healthy scientific field. But one of the things that's needed to keep a field healthy in that respect is the ability to do controlled experiments with high accuracy. That's how scientists figured out that Bohr's original model of the atom wasn't right--it couldn't match the results of the experiments as they got more accurate.

Very strange that you have a new account railing on "deniers" in regard to a topic where even what should be moderate-complexity experiments are failing to replicate, let alone the amazingly difficult models in climate science.

It’s not helping whatever point you think you are making. Instead... consider the idea that WHAT IF climate models were failing reproducibility left and right along with the other sciences - should that mean a policy for a global tax model should be pushed as hard as it is?

You don’t have to be a “denier” to be skeptical of the course that the “true believers” are taking.

Some of us think the “climate religion” is a bad thing for a good cause. Long term counter productive.

My feeling is that this reproducibility crisis points to deeper stuff just waiting to be collectively understood about how to use math in experimental sciences.

So many researchers rely on assumptions of linear dynamics. So many experiments and studies are designed without consideration of observer effects.

Is it any wonder that a model might fit one day but not the next?

There is so much to be learned from applied topology, dynamical systems theory and (quantum) information theory, but methods from these disciplines are only just barely starting to become more widely accessible.

Making the model more complicated doesn't add external validity. It seems like the problem often comes from trying to make do with collecting a minimum amount of data, collected in the most convenient way. (For example, doing experiments on local college students.)

It seems like a practice of at least collecting the data at two different colleges might help? But this would require more coordination.

Just because alternative methods aren't widespread, it doesn't necessarily mean they are "more complicated", just that they rely on different assumptions.

I'm really excited, for example, about "model-free" time series methods emerging from non-linear topological data analysis. Takens' embedding theorem shows that low-dimensional "shadows" of high-dimensional attractors can be constructed from a time series alone, and that they can reveal deep, useful facts about the system a time series is part of, even if that system defies geometric modeling. These kinds of methods are radically different from conventional statistical methods.

see here: https://www.youtube.com/watch?v=NrFdIz-D2yM

Previously, researchers found themselves flailing around with geometry, trying to come up with models that accurately described the dynamical nature of the systems they were studying (e.g. fish populations). A model might work well for a few years, then stop working. These topological methods cut through and get straight to the underlying relationships that drive various observables in a complex system, without relying on fickle geometric assumptions about how exactly those relationships will be expressed.
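For the curious, the delay-embedding construction behind Takens' theorem is only a few lines of code. A toy sketch (my own illustration, not from the linked talk): observe only x(t) = sin(t) from a harmonic oscillator, and the 2-D delay embedding recovers a circle, the "shadow" of the hidden (position, velocity) orbit.

```python
import math

# A single observable from a 2-D system (harmonic oscillator): x(t) = sin(t).
dt = 0.05
series = [math.sin(i * dt) for i in range(2000)]

def delay_embed(x, dim, tau):
    """Map x[i] -> (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    return [tuple(x[i + j * tau] for j in range(dim))
            for i in range(len(x) - (dim - 1) * tau)]

# With tau near a quarter period, the second coordinate is ~cos(t),
# so the embedded points trace out the unit circle -- the attractor's
# shadow, reconstructed from one coordinate alone.
tau = int((math.pi / 2) / dt)            # ~ quarter period, in samples
shadow = delay_embed(series, dim=2, tau=tau)

radii = [math.hypot(a, b) for a, b in shadow]
print(min(radii), max(radii))            # both close to 1.0
```

The same construction applied to a chaotic series (fish populations, EEG, etc.) yields an embedding whose geometry can be analyzed without ever writing down the governing equations; that is the sense in which these methods are "model-free".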

The whole, complex scientific field of metrology exists in order to make physical measurements anywhere on Earth comparable in a meaningful way. Trying to derive conclusions about the psychological traits of humans by observing n=200 of your local college students certainly seems... rather dubious.

IMHO the simple problems with reproducibility (of statistical studies) can be solved with pre-registration, which should become a standard. The rest really is a system crisis.

Imagine: Scientist A uses (a statistical) method M to assert X. People praise A until scientist B uses method M and cannot assert X (which BTW usually doesn't imply X is wrong). People wait for scientist C to use method M and re-check assertion X. People then take the majority vote (or meta-analysis) for true.

That's the way much of science works today. It becomes a problem only in the following situations:

0. People hunt for statistically significant results.

1. X is mostly irrelevant to anyone who does not belong to the in-group of A, B, C, and peers. So nobody else really notices or cares about X.

2. There is no other way to observe any meaningful consequence of X other than by method M -- which eventually results in #1.

3. Method M is really expensive and complex, it's an almost impossible undertaking so that B or C most likely won't get a funding.

4. Everything is ok, actually, but the people who pay for the study don't care about A's methodological fineprint as long as the results play well with their other goals.

The presented list of "factors that build reproducibility" focuses on #0, for which pre-registration is a simple and clean solution. IMHO the list is much too narrow and focused on academic practice, though.
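Point #0 above has a simple arithmetic core that pre-registration removes: test enough null hypotheses and some will clear any fixed significance threshold. A stdlib-only sketch with hypothetical numbers (20 hypotheses fished from one dataset, alpha = 0.05):

```python
import random

random.seed(1)

def null_p_value():
    # Under a true null with a correctly calibrated test, the p-value
    # is uniformly distributed on [0, 1].
    return random.random()

alpha, n_hypotheses, n_trials = 0.05, 20, 10_000

# Probability that at least one of 20 null tests comes out "significant"
# in a single fishing expedition: analytically 1 - 0.95**20, about 0.64.
hits = sum(
    any(null_p_value() < alpha for _ in range(n_hypotheses))
    for _ in range(n_trials)
)
print(hits / n_trials)
```

Pre-registration pins the researcher to one hypothesis before seeing the data, collapsing that ~64% chance of a spurious "finding" back down to the nominal 5%.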

Maybe because some of the "science" today is more about spreading some political view rather than anything else.

For example, there are a lot of gender studies that are presented as a science in my country which I believe is complete bullshit and an ideology.

People are pushing shit like that today and I believe we need more of the hard sciences and less of the soft sciences.

Another thing I'd like to ask from my fellow colleagues: please at least to some extent detail in your papers the things that you've tried that didn't work. I see this in my field (computer vision / deep learning) from time to time, but very rarely.

There are typically ablation studies which aim to determine how much each of the _successful_ improvements contributes to the result, but there's almost never any mention of things that looked promising on paper but didn't pan out in practice, nor is there any discussion of the reasons why, even though the authors often have a good idea post-facto.

Reproducibility is referred to as a "crisis," but I'd like to know if it's really a new thing. What if we tried to replicate the studies that surrounded great developments such as electromagnetism or thermodynamics. Did those advances emerge from an unbroken series of reproducible studies, or from a tangle of good and bad results?

Before looking for root causes that invariably turn a cynical eye towards the motivations of scientists, let's make sure the effects don't precede the causes.

I really like this idea of having students (not just advanced ones) reproduce classic experiments (ideally before getting the theory, so the class can attempt to puzzle out what's happening). My sense is that early science was very hands-on and readers of such results were also doers. Plus, doing the experiment before theory puts them in the same spot as the scientist, and might give them a little more respect for an achievement that is really quite magical (and too often is taught in a kind of arrogant hindsight that implies the discovery is/was sneeringly obvious to any human that draws breath)

Hmm... 40% of irreproducibility is due to fraud? Who in their right mind would put in all the effort that science requires if they think everyone else's work is commonly malicious bullshit?

Obviously this survey is biased towards people with a certain outlook.

Academics thinking little of other academics is not surprising.

Is there room for a genre of "failed to reproduce" journals? Contrarianism seems like it might be powerful enough to support a journal whose only purpose is to dispute major findings.

The problem is that it's very easy to fail to reproduce a result.

Maybe even easier than the p-hacking used to produce an irreproducible result.

This is why I went to industry after MIT: science is really real when it manifests as a useful technology that someone will pay for. Also, I didn't want to spend so much time writing grants...

Is a 30% reproduction rate unusual?

We know 19th-century science was productive. Was it similarly plagued by irreproducibility?

I’m sceptical of this argument. But we need a baseline to render judgement.

As a chemistry PhD, this sure looks a lot like the reproducibility is directly proportional to the complexity of the underlying phenomena being studied.

As somebody only used to CS research, can any explain to me how much the costs of reproducing is considered into unreproducibility?

There is a startup called http://www.myire.com that is working on this issue. They have a platform that provides an A-Z approach to publishing research.

Their marketing is probably the worst in class but the CEO is a passionate developer who has worked hard on this.

Really doubt that this is an issue that can be solved by one startup. It's a procedural and cultural issue that affects the whole scientific world; not a problem that can be solved by business or technology.

From my experience, math/computer-science type research is almost always exactly reproducible.

> math/computer-science type research is almost always exactly reproducible.

Math, including computer science, is almost entirely pure logic, not empirical science (despite the name of the latter), so it's not even in the same epistemological domain where reproducibility is conceptually a concern.

It would be surprising if it weren't. You are rerunning a list of instructions in a common syntax on common equipment shared among everyone. Every instruction and line is documented somewhere by someone.

Imagine if programming languages were a thing produced by nature and no one knew any of the commands, and new commands were found by just testing them out or theorizing that they might exist. You'd be getting the Nobel prize for proving ls can take arguments.

Biology is not a clean science. Results are noisy. Results can be different depending on the manufacturers batch of the reagent.

On the other hand we have the government saying they don’t want other countries reproducing our results in AI, cryptography, genetics...

How would the state of social science progress if every academic had all Facebooks data available to them?

Weird how there can be a reproducibility crisis, but also climate change is 100% unquestionable science.

There were a few articles citing the use of software like git, simple formats, Jupyter notebooks, etc.

Is there anything of the sort that helped a bit?

Mildly funny to see this when nix and guix (and pure FP) are popping up at the same time in a very different context.

I wish more people would map their understanding of the reproducibility crisis to things like say, the sokal squared hoax. The current pressure to publish regularly in academia is very high, and it results in poor work and journals that don't do their due diligence, in every field.

Is there an incentive to make certain experiments harder to reproduce?

One explanation that I've heard (often in terms of global south v. global north) is to prevent better-funded labs from quickly mirroring your setup/method and, given their greater means/staff, depleting that line of research, leaving your lab with no publications, patents, etc. and therefore no money. Science as a whole may benefit, but individual scientists still usually appreciate keeping their jobs.

Big labs don't work any faster than small labs, they just have more people to chase more projects; generally it's the same amount of people per project as a small lab. Chances are if you got scooped it was just because someone got started with the idea before you did. After all, the future directions at the end of the paper are usually active research by the time you are drafting and submitting for review.

Getting 'scooped' isn't so bad in biology at least. If you were thinking about XYZ and someone publishes X and Z, great. Cite that paper, now you don't have to bother so much validating X and Z and can focus on strengthening your Y argument or adding argument W to your paper.

> Big labs don't work any faster than small labs

Not true, at least in my field. If $BigLab can throw 100k CPU cores at a simulation it is going to finish way faster than $SmallLab's 1k simulation. If $BigLab builds 10 dedicated test rigs they are going to finish that parametric study way faster than $SmallLab with 1 shared test rig.

Getting scooped may not be a big problem for papers (except when you start getting rejections for lack of novelty), but it is a big problem for patents, licenses, royalties and such things that allow small labs to survive when their governments have little to no money to spend.

What you describe has a perfect analogy in how established actors squeeze out smaller actors in a race to the bottom in other areas of the economy. So it's certainly great that we don't have that in science, yet...

Not really in biology at least. If you invent a fancy new way to measure gene expression and use it in your paper, reviewers are going to ask for that result compared to the existing gold standard experiments to test gene expression. If there are no existing gold standards, you will need to support your claim with more experiments that lead to the same interpretation in different ways.

Reviewers are there to make sure you dotted your i's and crossed your t's, and will burn you if they can't easily determine that while skimming your paper over lunch break.

Note a spike in Earth science at 30% confidence. Probably reflects a lot of vested interest funding politically beneficial results in this discipline.

This is from 2016.

Can we get one of the moderators to add (2016) to the headline?

Done, thanks.

I guess it's time to evaluate and grade scientific findings?

It would be interesting to see how reproducibility varies by field, university, country, etc. Although I guess scientific reputation would already give a clue over the quality of scientific work?
