Scientists are generally fairly smart people. Put smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those. What does academia reward? Large numbers of publications with lots of citations; nothing else matters. So, naturally, scientists flock to hip buzzword-filled topics and churn out papers as quickly as they can. Not only is there absolutely no reward for easy reproducibility, it can actually harm you if someone manages to copy you quickly enough and then beat you to the next incremental publishable discovery in the never-ending race. This is absurd and counterproductive to science in general, but such is the state of the academic job market.
Anyone who purports to somehow address the situation without substantially changing the fundamental predicates for hiring and funding in the sciences is just wasting their breath. As long as the incentives are what they are, reproducibility will be a token most scientists pay tribute to in name only.
But what really gets me is the disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible". This was mentioned in the survey and it conforms to my informal experience of attitudes.
I do not know how you square that circle. Maybe, because of the pressures you mention, we are all supposed to engage in an informal convention of pretending to believe most previously published work is true, was done correctly, and is reproducible even if we know damn well how unlikely that is. I find it hard to do.
One day the public is going to cotton on to all of this. I cringe every time I hear extremely authoritative lectures on "what scientists say" about highly politicized public policy matters. These are not my fields, but if they are as prone to error, bias, irreproducibility, etc, as my own, I'd exercise some humility. It is one thing for us scientists to lie -- errr, project unwarranted confidence -- to each other for the sake of getting grants, but it is quite another to do it directly to the public.
But when the public does figure it out, what do you think will happen to funding? It will get tighter, and make the problem even worse. We need reform from within before the hammer falls, and quickly.
That being said, communicating limits of applicability and degrees of certainty to a popular audience is hellishly difficult even if you're trying to be perfectly honest. Even in a hypothetical world where we've somehow fixed academia, this will always be a hard problem most scientists are ill-equipped to tackle.
1. It would help ensure funding for several labs in the space.
2. It would help ensure reproducibility of results.
Yes, it can be gamed, but it should generally be easier to reproduce results than to produce them, so allocating 10%-20% of the funding to following others' directions should be okay. Of course, this could lead to one group constantly doctoring results to show something is not reproducible, in which case a third lab would need to get some funding to check it out.
By "reform from within", I mainly meant the NIH, NSF, big funders, etc, need to pay more than lip service to this before Congress gets involved. Although there are people who have built careers on irreproducibility itself. Ioannidis for example. But that requires a lot of dedication and tends to piss a lot of people off.
Another counterintuitive possibility is that scientific publication could go more of the PLoS route and actually lower standards for initial publication -- and then rely more heavily on post-publication peer review. Then irreproducible papers would get publicly "downvoted". And conversely, papers that didn't seem useful enough at the time to make it into Nature, but turned out to be pathbreaking, would get the recognition they deserve.
Further incentives for publishing as much raw data as possible and, where applicable, code, would help too. The NIH has done a good job here. They require a lot of high-throughput datasets to be made available raw if they were collected with NIH funding. They provide hosting for this. It means people can go back and re-analyze it, and you don't have to trust the authors' analyses.
The disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible" has extended to every domain in my experience.
With the way the funding model currently works, my hypothesis is that doing more completely reproducible studies will drown out the non-reproducible studies (and their funding model), because funders ultimately vote with their checkbooks.
That seems like a second logical disconnect, and completely impossible if the cost model remains the same. However, the question is whether studies operate in a way that is maximally efficient for the scientists or for the economy of science publication and knowledge dissemination. We, at MyIRE, believe that technology can tip the scales in favor of scientists.
Read more about our platform and business plan if you're interested:
I don't understand why these seem contradictory. To me the obvious (maybe naive?) reading is that scientists believe reproducibility is essential to the advancement of their fields, but also believe that being unable to reproduce a result can often be explained by their own lack of sufficient information (or resources) to reproduce said results, rather than always assuming the other person must be lying (or just pretty darn lucky). I'm guessing this is because they have actually come across such situations and observed many results that are indeed correct, but just difficult to reproduce. Why can't this explanation square that circle?
In most fields that is fine; social proof is about as good as anything else if someone is studying literature. But the hard sciences are specifically about studying things that are best proven using technical over social proofs.
If social proof is being used excessively then there are grave questions about what the standard of technical proof is. We don't really know from the outside looking in, but the problem being highlighted is that there is technical evidence of a problem and scientists are proceeding based on social proof that there is not. Not a good sign, but also difficult to evaluate from outside the field.
However, the incentive structures built around academics do look risky. Rewarding the most papers and citations isn't a good incentive for high quality work. It incentivises low quality work and collusion.
You take it on trust that they did these things correctly, and focus on whether their conclusions are justified from their data.
If reviewers take so much on trust, how much more so readers, then? There are very, very few actual standards of technical proof that are in play here.
Particularly when a paper says something that isn't particularly novel. If the results match "prior probabilities" from the literature, the paper will be believed without much question. If it doesn't, it will get more scrutiny. People quickly learn that it is easier to publish when your result fits the status quo.
Take a look at slide 10 of . There are numerous examples like this where physical constants were first measured at one value and then gradually trended, over a long period of time, towards today's "true" value. If the experiments were really independent, they would generally scatter randomly before converging on the true value. The fact that they did not suggests investigators were using methods to "smooth" the difference between their data and prior findings.
And thus, we get self-perpetuating cycles of groupthink. And that, in turn, is why supposedly independent experiments cannot so easily be taken as independent points of evidence for an overall hypothesis.
The maximally pessimistic view, which you and Feynman seem to be espousing, is that people explicitly put their thumb on the scale so that they get the "right" number. That's clearly bad.
The PDF you linked presents it as a more emergent phenomenon, driven by how people usually work. It's possibly an argument for working more slowly and carefully, or the use of pre-registration, but it seems ethically neutral.
Finally, you could think about this as a form of Bayesian updating, in which each experiment nudges our previous best estimate of the value. Obviously, it would be better to do this formally, but it does seem more rational than completely discarding the past.
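To make that concrete, here is a minimal sketch of what the formal version could look like, under the simplifying assumption that every measurement is Gaussian; all numbers below are invented for illustration:

```python
# Toy Bayesian update of a measured constant. Assumes a Gaussian prior and
# Gaussian measurement noise; all values are illustrative, not real data.

def update(prior_mean, prior_var, measurement, meas_var):
    """Precision-weighted combination of prior belief and a new measurement."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
    post_mean = post_var * (prior_mean / prior_var + measurement / meas_var)
    return post_mean, post_var

mean, var = 10.0, 4.0  # hypothetical early, uncertain published value
for value, noise_var in [(10.8, 1.0), (11.2, 0.5), (11.1, 0.25)]:
    mean, var = update(mean, var, value, noise_var)
    print(f"updated estimate: {mean:.3f} +/- {var ** 0.5:.3f}")
```

Each new experiment shifts the estimate without discarding the past, which is what "nudging" means here; the difference from the thumb-on-the-scale version is that the weighting is explicit.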
The reason that I don't think it's totally ethically neutral is that it is a basic responsibility of scientists to be on guard against cognitive biases to the best of their ability. It's possibly even the main feature that separates science from non-science.
Cognitive biases can become ethically bad particularly when they intersect with a person's personal interests. For example, if an investigator thinks "I won't be able to publish this result as easily if it diverges too much from the historical values, so I'll just run this experiment again", this is a problem. Even if it occurs totally subconsciously, it is a breach of duty because the scientist should take great care to avoid this kind of thing.
It could be viewed as Bayesian updating, yes. But my main point is that it greatly complicates the process of literature review and knowing how much certainty to assign to a scientific finding. If there are 10 papers saying X, but each one is highly dependent on the last, there is much less evidence for X than there appears to be, particularly to an outsider looking in.
Is it trust? Or is it closer to "I'll scratch your back today if you scratch my back tomorrow"? That is, if you blow the whistle (so to speak) on someone, that's bad community karma, and it will come back to haunt you.
That's not trust. That's a cartel.
Specifically, you need to trust that when the authors claim to have reared mice in a high-oxygen environment, trained a monkey to move a joystick, or whatever the paper says, they did something like that. You can, and should, ask to see data demonstrating that they did it well, like oxygen levels in the mouse cage or trajectories produced by the monkey. However, unless those values are bizarre, it's virtually impossible to know if they're real or completely made up. Realistically, no one is going to fly you out so you can "audit" an experiment, or record and review thousands of hours of surveillance footage.
Put another way, the starting hypothesis should be: this study is flawed. The reviewer should then approach it as such.
Not only is it not trust, it's a violation of the scientific method. Yeah, sadly ironic (read: hypocritical).
At the same time, peer review is meant to be an advisory process, not an adversarial one. You should certainly flag things that seem problematic, and ideally offer solutions, but you're not expected to (and cannot, really) tear down the experiment and root out every possible mistake or malfeasance. All you've got is a day or so and a 6000 word description of the project, so if the manuscript claims that the mouse weighed 12.3 grams, you've basically got to take that number at face value. Trust might be too strong a word (you could certainly question a weight of 123 grams, which is improbably large), but you should at least start in equipoise with respect to some stuff.
I also don't see the mutual benefit, beyond nebulous things like enjoying consensus. A short review is also easier to write than a long one, but it's just as easy to be dismissive as uncritical. Reviews are usually unsigned, and the authors and reviewers may not even know each other, so it's hard for there to be an explicit quid pro quo.
Of the stuff I reviewed in 2019, I knew exactly one of the authors, and here 'knew' means 'once amiably chatted with them in a coffee line at a conference'.
The model verification would be years of work, so everyone just "trusts" that it's right, although we basically know it's not, because the chance that it isn't is just much higher.
It doesn’t have to be a quid pro quo any more than it has to be “ugh, that sounds really boring to confirm, I’ll just believe it”.
There is some truth to the idea that some reviewers are lazy and don't bother to examine the paper as well as they should because that takes time. But when they do that, it irritates the editor, who is trying to make an informed decision on the paper, so they get bad karma for that.
If you want to be cynical and look for community politics, you should be directing your wrath at study sections / grant review panels. That is a totally different ball game and there is a fair amount of corruption there. Peer review may be imperfect and prone to some cognitive biases, but it is not generally corrupt.
Most certainly, when you go to replicate something, and fail, your first assumption is that you did something wrong. Even the second or third time. But after enough tries, you start to look for another explanation. "Irreproducibility", as Nature and the common scientist would use the term, doesn't mean "I tried once and failed", it means "I tried everything I could possibly think of and it still doesn't work".
Also, for many types of experiment it is questionable whether they are reproducible even in principle. One of the common analyses I do is called RNA-seq. It quantifies, for each of ~25K genes, whether and how much it went up or down with a given perturbation.
So the result of such an experiment is a 25K long list with numbers attached to each gene. What would it mean to replicate such a result? Surely the list of significant genes and pathways will not be identical even if a robot were to perform exactly the same experiment, due to biological variability.
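If anything, "replication" here could only mean substantial overlap between the significant gene sets, not identical lists. Here is a toy sketch of that kind of comparison; the gene names, fold changes, and cutoffs are all made up:

```python
# Compare two runs of a differential-expression analysis by the overlap of
# their significant gene sets. All genes and numbers are invented.

def significant(results, alpha=0.05, min_lfc=1.0):
    """Genes passing an adjusted p-value and log2 fold-change cutoff."""
    return {gene for gene, (lfc, padj) in results.items()
            if padj < alpha and abs(lfc) >= min_lfc}

run1 = {"GeneA": (2.1, 0.001), "GeneB": (-1.4, 0.030), "GeneC": (0.2, 0.60)}
run2 = {"GeneA": (1.8, 0.004), "GeneB": (-0.9, 0.090), "GeneC": (0.3, 0.41)}

s1, s2 = significant(run1), significant(run2)
jaccard = len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 1.0
print(f"run 1: {sorted(s1)}  run 2: {sorted(s2)}  Jaccard overlap: {jaccard:.2f}")
```

Even an exact repeat of the experiment will score below 1.0 due to biological variability, so the hard question becomes what threshold counts as "reproduced".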
Some people find this property of nonfalsifiability to be convenient, however...
I’m not a scientist, but it is surprising to me that the term would not be regarded as at least ambiguous. The article quotes a microbiologist who says “there is no consensus on what reproducibility is or should be.”
If I were an academic, I imagine I would think:
1. Well, most of the papers I read are widely cited, reputable papers in widely read, reputable journals. A paper with 50 citations couldn't possibly be unreproducible!
2. Everyone knows there are scam journals and conferences that will accept anything. I'm sure it'd be easy to get bad work accepted there, but I don't read anything like that.
3. And anyway, aren't the problems mostly in other fields? Everyone knows there are problems in comparative international underwater sports broadcasting studies, not serious subjects like mine.
> the disconnect between "most scientists agree there is a reproducibility crisis" and "most scientists believe most of the papers they read are basically true and reproducible"
Could it be that many scientists think: "there is a crisis, but not in my field!"? To be honest, I personally think this quite often (and then realize I am being naive).
It sounds very much like an academic equivalent of Gell-Mann Amnesia: the phenomenon in which you notice that every news article on a topic in your field is complete garbage, people you know in other professions report the same situation with articles on topics in their fields, and yet when you turn the page and see an article on something outside your field, you forget about the whole thing and treat it as gospel.
If even 'scientists' themselves fail, then society is heading toward an absurdly frightening faux-intellectual inquisitive period.
Fact checking, unfortunately, isn’t what we think it is. Despite the superficial appearance, fact checking isn’t a helpful tool for determining the truth and for forming an accurate opinion. Instead, it’s actually an in/out group filter which segregates people by belief and value, while allowing each group to believe they hold the Factual High-Ground, and to claim any subsequent moral position which proceeds from being “factually correct.”
Sure it is. Not the bare conclusions viewed uncritically, but the support with references, if it exists (which for most notable fact checkers it does), absolutely is.
I apply this perspective to everything I see ("how are they getting money from this?") then work backwards and it seems to lead to accurate predictions, at least anecdotally. Perhaps I'm just jaded and cynical but it works well, unfortunately.
It works well for me too.
I've learned to extend the "how are they getting money from this?" question into a generalized, first-principles thinking - look at the incentive structures and think about what they imply, about what's the expected behavior of a system running under those incentives. Because while not everything is about the money, systems will evolve over time along the lines of the incentives contained in them.
The speculation referred to as the "Gell-Mann Amnesia effect" starts with a subject identifying a low-quality article riddled with errors from a section of the newspaper within their area of expertise. They then turn the page to another section of the newspaper outside their area of expertise and "treat it as gospel" without thinking critically.
Maybe it's true, maybe it's not. But you are adding another part -- colleagues who use relevant expertise to inform the subject that the other sections of the newspaper are also low-quality and riddled with errors.
With that addendum I'm confident that I can now beat the house. I will take the bet against this modified Gell-Mann Amnesia effect for any amount the casino is willing to let me wager.
I mean, the flat-earth, anti-vax, climate-denier folks have well and truly cottoned on. In the US, one of the major political parties that controls a fair amount of federal and state legislature seats is pretty sure 'scientists' are just liars and should be treated as such.
These chickens aren't going to come home to roost, they are roosting and their chicks are hatching already.
You shouldn't be cringing. You should be educating people about how science actually works, and how it simply doesn't matter very much whether any particular paper is reproducible. It's a straw-man argument, because most papers aren't worth reproducing. I don't know many good scientists who take what they read (in any journal) at face value. If you do, you've been misled somewhere during your training (fwiw, I also have a PhD). At best, papers are sources of ideas. Interesting ideas get tested again. Most get ignored. Even if a few hokum theories become popular for a while, eventually they're revealed for what they are.
The tiny percentage of subjects that rise to the level of public policy discussion end up being so extensively investigated that reproduction of results is essentially guaranteed. And yeah, you hear lots of silly noise from university PR departments, but that stuff is a flash in the pan.
For example, nobody legitimate is in doubt of the broader facts of global climate change or evolution or vaccination, even if 95% of (say) social science results turn out to be complete bunk. Yet climate deniers, anti-vaxxers and "intelligent design" trolls absolutely love it when this distinction is ignored, because it allows them to confuse the public on the legitimacy of science as a process.
Besides, the fact that there isn't one reputable journal in most fields that remains untarnished by the replication crisis is both a practical problem and a problem of public trust. A lot of this BS science is paid for directly by the public's tax dollars, or else by their student loans. I wouldn't expect the public to be so forgiving if 95% of it is bunk.
Is this true? Prove your claim.
> "This is a real problem, and drags the quality of everything down."
Let’s assume your first assertion is true. Is it automatically true that your second claim follows? Why?
I see no evidence that the individual productivity of scientists has changed much in the last 30 years, nor do I notice much of a change in the aggregate quality of science. Crappy science existed hundreds of years ago, and it continues to exist today. The main difference, as far as I can tell, is that we have a lot more scientists now.
In any case, these are just assertions, not arguments.
You should look for papers in psychology/sociology, AI (recommendation engines, accuracy), economics, nutrition and medicine. Marketing papers are also interesting, I guess.
As an anecdote, I dug into sexuality, gender papers recently and was baffled at the amount of shit I came across. I couldn't believe someone published it.
> Crappy science existed hundreds of years ago, and it continues to exist today. The main difference, as far as I can tell, is that we have a lot more scientists now
And the ability to influence a lot more people, faster than ever before, amid increasing levels of distrust.
Not to mention, papers from the US and EU affect other nations perhaps even more. Many people blindly piggyback on them because there isn't enough funding to replicate or perform our own analysis. Scarce funding promotes hype and flocks people towards whatever the media popularizes.
It results in a very weird disconnect on topics that are dependent on population, history, culture, and other location-sensitive data. The base is contaminated; anything built on it is not going to suddenly turn into truth.
You have a hypothesis of what’s going on, but you’ve provided no evidence for that hypothesis, and when challenged to provide some, you tell other people it’s their job to do it for you.
It’s not my job to prove your extraordinary claims.
Regardless, my original claim was that the absolute number of papers is growing, and most of them are trash. I think the sheer volume of trash has consequences that were not so serious 100 years ago, even if the percentage of trash was the same. I strongly suspect the percentage of trash has been going up, as well.
My argument is that "publish or perish" makes less and less sense the more active scientists and researchers there are, even if the average quality and the rate of publication per academic were constant, because the appetite and rate at which research can be assimilated by society is limited, and does not scale with population, while the number of scientists does.
I don't think these claims are extraordinary, and if you do, I'm not going to go looking for extraordinary evidence to try to convince you. I don't think I'm the only one that sees these effects, however.
Yeah, that’s not evidence.
The absolute number of papers is growing - along with the number of working scientists. There’s been huge growth in academic science since the 1970s.
It definitely is, in biology at least. My graduate mentor was really interested in publication metrics (as in, he published studies on them). The main driver is not necessarily crappy journals though, it is the increasing number of authors per paper.
I have no idea how you would evaluate something like "the average quality of papers is decreasing". I actually agree with GP that it is, but that's just, like, my opinion, man.
A priori, long author lists indicate collaboration, which is generally a good thing.
If we are talking about highly abstract types of science, I agree entirely. The problem is that there are strong incentives for groupthink, even where public politics aren't involved.
For example, I'm involved in the aging field. One of the popular aging hypotheses was oxidative stress. Because of the number of scientists with careers invested in that hypothesis, research kept on for well over a decade after it was debunked. In fact, I work with a person who cowrote the paper that authoritatively debunked it over a decade ago, and that person still studies it and writes as if it were still true!
How much more so if the subject is politicized beyond a narrow community of scientists, then. I do not want to get into a political debate here, but evolution and vaccinations have over a century of scrutiny, whereas other fields do not.
Another example relatively close to my area is nutrition. Scientists have been totally convinced that fat is bad, sugar is bad, both are bad, all things are good in moderation... Even the public considers nutrition to be a joke for this reason. It is not enough for a community of scientists to agree on something, a fact needs time to "settle".
I agree totally with the "process vs individual paper" distinction, I would just propose the heuristic that "the more politicized the subject, the longer the process takes".
> I don’t know many good scientists who take what they read (in any journal) at face value
Sure, in journal club people tear apart papers. And maybe in private conversation. But then, on the other hand, look at the statistics in this survey. Or look at the way these same papers that they might privately pooh-pooh will be uncritically cited in a grant application or paper if they support their hypothesis.
I really don't think this is true.
Ideas are at war in politics, and so the truth is generally the first casualty.
Golly I wish bureaucrats would pay more attention to the nuances of science.
The root problem is not that science has changed, but that you’re seeing political attacks on science.
Instead we get a PR campaign to put a self-described unstable 16-year-old who can "literally see invisible CO2" on the world stage, frowning and yelling about stolen childhood.
I have no hopes for political reforms to actually look at nuance while this nonsense seems to work.
Turns out that you want claims as broad as possible and teachings as useless as possible -- so that nobody can read your patent and then patent other surrounding things.
It's gotten so bad that most companies now instruct their employees never to read any patents -- the potential liability increase, because they "knew" that something potentially-patented was out there, so fantastically dwarfs what you could learn by reading the "teachings".
It's all about incentives, as you say. And science today has more or less just as perverse incentives as the IP marketplace.
Until the fundamentals change - which might never happen - it would help if the science-centric communities showed some humility, as well as some transparency. Even with proper incentives, the process is flawed. It's human-based, so it will always be flawed. There's nothing wrong with that. It's the best we've got. But perfection it is not.
It troubles me when I see those who question science shut down and marginalized. As if science has some perfect track record. Per the OP, even science has questions about science. Fair enough. But does it have to be wrapped in denial?
A lack of reproducibility of cited papers IMO undermines the credibility of citing papers.
That credibility is already undermined by the fact that for citation-measuring purposes, there's no real difference between any of the following citations:
(1) "Introduction. This paper is a journey into the amazing consequences made possible by (Smith et al, 2019)"
(2) "Introduction. In this paper we show that (Smith et al, 2019) is a steaming pile of crap"
(3) "Footnote. This minor remark is vaguely reminiscent of (Smith et al, 2019)"
All contribute the same citation juice to Smith et al, 2019.
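In code, the naive metric everyone is evaluated on looks roughly like this; the "sentiment" field is hypothetical, and the point is precisely that real citation counting never consults it:

```python
# Toy citation counter: all three citations above earn Smith et al. the same
# credit, whether the citing text praises, rebuts, or barely mentions them.

citations = [
    {"target": "Smith et al, 2019", "where": "introduction", "sentiment": "praise"},
    {"target": "Smith et al, 2019", "where": "introduction", "sentiment": "rebuttal"},
    {"target": "Smith et al, 2019", "where": "footnote",     "sentiment": "tangential"},
]

count = sum(1 for c in citations if c["target"] == "Smith et al, 2019")
print(f"Smith et al, 2019: {count} citations")  # 3, sentiment never consulted
```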
For some other opinions:
1) Paul Graham, who states this is the biggest lesson to unlearn from academia. He states academia selects for people who tend to 'hack' their way to the easiest grades/favors, not learn: https://news.ycombinator.com/item?id=21729619
2) Feynman who often stated things like "the pleasure is in finding things out" and that "the whole academic department was rotten."
I'm not convinced academia is selecting primarily for intelligence.
It's also not enough to say "this study is not reproducible". Why isn't it? In what direction should the experimenters move next? It's not enough for biology to do "trial and error" studies; they should be continued and provide more depth to their findings. Sadly, this is not happening, largely because of a lack of faith: the scientists themselves don't have enough faith in their own results to be making long bets on them.
Or you select for hypercompetitive people who don't mind bending the rules. Smart people also have options outside of college.
> Put smart people in a hypercompetitive environment and they will quickly identify the traits that are being rewarded and optimize for those.
This is true for anybody, not just "smart" people. It's basically human nature.
1. Journals should accept hypotheses/procedures, and commit to publish whatever conclusion results, before the experiment is even started. If the experiment is not completed for whatever reason, a transparency document should be published explaining why.
2. Journals should accept only a small percentage of new research. The rest should be attempts to reproduce old research, again only evaluating the hypotheses/procedures, before the attempt to reproduce has started.
The challenge here is that there's a bit of a chicken-and-egg problem: journals won't want to commit to publish a result if no one has committed to fund the experiment, but funding sources won't want to fund experiments if no one has committed to publish the result. So there would need to be some collaboration between journals and funders.
Choosing this percentage is the proper usage of P values. The goal is to attempt to reproduce experiments until the product of the P-values of the results reaches a target aggregate P. Note that this target applies to both P and its complement P' = 1 - P.
Example 1: Your target P is P=0.01. You perform an experiment, get a P=0.3 result. Then you attempt to reproduce, and get a P=0.4 result, for an aggregate P of 0.12. You then attempt to reproduce again, and get a P=0.2 result, for an aggregate P of 0.024. Finally, you attempt to reproduce again, and get a P=0.4 result, for an aggregate P of 0.0096, below your target P. This proves the alternate hypothesis with the target confidence.
Example 2: Your target P is P=0.01. You perform an experiment, get a P=0.7 result (P'=0.3). You then attempt to reproduce, with a P=0.9 result, for an aggregate P' of P'=(0.3 x 0.1)=0.03. You then attempt to reproduce, with a P=0.9 result, for an aggregate P' of P'=(0.03 x 0.1)=0.003, below your target P. This proves the null hypothesis with the target confidence.
The example P values were chosen for a few reasons. First, it demonstrates that you can find fairly conclusive confidence values eventually from aggregating experiments with fairly inconclusive results. Second, it demonstrates that the P=0.05 that's frequently used now is actually a very low bar of confidence, when you consider that reproducing even very unsurprising results a few times gives you a much higher confidence.
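Here is a short sketch of the aggregation rule as described above, reproducing both worked examples. (Note that simply multiplying raw p-values is not a standard combination method like Fisher's; this just mirrors the proposal.)

```python
# Multiply P-values across replications until either the product (evidence
# against the null) or the product of P' = 1 - P (evidence for the null)
# crosses the target. Mirrors the two examples above.
from math import prod

TARGET = 0.01

def aggregate(p_values):
    agg_p = prod(p_values)                        # aggregate P
    agg_p_prime = prod(1 - p for p in p_values)   # aggregate P'
    return agg_p, agg_p_prime

# Example 1: 0.3 * 0.4 * 0.2 * 0.4 = 0.0096 < 0.01 -> alternate hypothesis
print(aggregate([0.3, 0.4, 0.2, 0.4]))
# Example 2: 0.3 * 0.1 * 0.1 = 0.003 < 0.01 on the P' side -> null hypothesis
print(aggregate([0.7, 0.9, 0.9]))
```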
I don't understand this. Scientists could make a lot more money in industry, so this makes me wonder why these "corner-cutting" scientists are doing the work they are doing. Surely not for idealistic reasons.
Scientists generally like doing science. Nowadays, the higher you get in science the more you get buried under things that are not science - endless project proposals, reports, reviews, applications and yes, publications. Most scientists cut corners as much as they can get away with on most of these because if you don't, you will never get to actually doing science (others just work 100 hour weeks, but you can't really sustain that).
Now, unlike all of the bureaucratic garbage, publications are supposed to be about communicating your work to your peers, an integral part of science, but in the existing system it's really hard to actually retain that view. Why? Because publications are now intrinsically linked to all the metrics - you need X publications for project Y, containing the right set of buzzwords, submitted to journals that satisfy the right quantitative metrics and are listed in the right databases, with the right set of coauthors, references and acknowledgments, and you need to submit them before the right deadline, or else. Spend a decade or two in that environment and it's easy to lump writing publications in with the endless proposals and reports - as mindless drudgery you have to dispense with in order to squeeze out some time for actual science.
Of course, you also need to devote enough attention to all the things I listed and much more in order to actually retain your ability to do science and not starve for it, so it's an eternal balancing act that breeds a lot of resentment.
That may be true, as well as accurate, for a field like machine learning. But for most of science, this is a highly misleading claim. Some science (e.g. drug development, materials science) can be done also in industry, although with a lot less freedom than in academia. But most science simply doesn't have an equivalent option in industry. Scientists are doing the work they are doing largely because they are interested in researching specific questions, which cannot be pursued anywhere else.
How is this even possible? Isn't reproducibility one of the pillars of science? What gives such a paper credibility over, say, an eyewitness description of a UFO sighting?
It's one thing to not perform the actual reproduction experiments, but publishing claims that (almost) can't be reproduced is another.
Then elect better politicians who actually understand at least a little bit of science and how it correlates to bettering society, so they'll get bills passed to fund higher education and research. We've got the money for it, but it's all in bombs at the moment.
Lack of incentives for reproducible research (and for verification of research) is the fundamental problem.
If you were in the shoes of policy makers (e.g. politicians), what changes would you propose (through both laws and funding/grant models)?
But this means your measurement pipeline needs to understand more of the paper than just the bibliography section.
There needs to be less emphasis on quantitative metrics like citation counts, h-indices and other trivially gameable nonsense, and more emphasis on human judgment along with a set of basic criteria to provide a floor to research quality across the board (things that every good scientist should do, as opposed to things that you should relentlessly maximize). This would require the non-scientific managerial class to abrogate some of the power they now wield in the academia and relevant funding institutions, therefore is very unlikely to happen.
And - probably, most annoyingly to the HN crowd - there is nothing meaningful that technological or business disruption can accomplish here. These are systemic social problems that have been continuously and systemically building up since (approximately) WW2. At this point the only entities large enough to make a difference are probably the NIH and NSF in the U.S. and the relevant EU funding agencies in Europe. In my judgment, these are precisely the agencies least likely to institute such change, so here we are.
So I would say that fundamentally the problem is with study sections and how they think. They are incredibly conservative, but in all the wrong ways. They highly value institution of origin, preliminary data from pilots, no matter how shitty, and the sort of hypothesis that seems reasonable from prior literature. There are also a lot of political games on study sections.
This means that if you repeat a lie often enough in the literature, and you know people, then voila, you now have support for the same hypothesis in future grants.
Peer review is not working for funding. I would say the NIH (in my case) needs to hire completely independent full-time reviewers, and needs to place a real, high emphasis on reproducibility. It is in the NIH's interest to do so, because as I said, its funding will be cut if this continues.
There is a bit of a difference between working directly for a university and working for a research nonprofit, like Dana Farber or MD Anderson or whatever.
If you work for a university, and you don't pull in a lot of money, what will happen is that you will be expected to teach more hours to "make up for it". You are less likely to be fired. At a nonprofit, if you don't cover your own salary and expenses through grants, you will quickly become unemployed.
I'm really not sure about that, as the publishing bubble coincides with the rise in funding by public bodies. Funding safe projects is a way to disperse funds to as many people as possible, and that's not necessarily good. At least in the EU it feels that way. The politics of funding seem to have naturally gravitated to the current situation. I doubt the model of funding can change with the same stakeholders.
My radical proposal is to separate the research institutions from the universities -- especially the public universities. If you want public research institutes, that's fine -- but they should be their own entities.
The university system, with its mix of research and instruction, and its tenure-vs-non-tenure system creates terrible incentives.
Top researchers are rarely also the best instructors -- and so it makes little sense for the two occupations to be so intimately tied.
The university funding model is terribly destructive to top science labs as well -- the university sees them as a source of prestige, sure, but more importantly as a source of that all-too-critical grant money (entirely too much of which is bogarted by the rest of the university) -- putting many of the labs in this place where they have to keep securing grants at an ever-increasing pace no matter what, to keep the rest of the university flush with cash to spend elsewhere.
There's really no good reason why research needs to be done at a university, or why a research institution needs a body of lecturers and students.
My wife did a PhD at an elite private university, undergrad tuition north of $50k/year not including room and board and other expenses. At one point three senior professors in her department went to the department chair and the dean with a proposal to revamp a notoriously unpopular intro class in their department. Rather than having it taught by adjuncts or postdocs, the three of them would team teach it. They would change the syllabus to bring it up to date with the current methods in the field--it had hardly been touched in two decades.
They were told to forget about it. The University said it was a waste of time for them to teach undergrads.
Presumably the kids applying to these types of universities are not aware of how little concern the institution has for the quality of their education (the perceived quality, on the other hand, is extremely important--but only loosely connected to the actual quality).
It turns out that prestige has very little to do with undergraduate education quality.
This used to make lots of sense -- for centuries, college was more important as a place to meet other people than as a place to actually learn useful skills -- and so you were primarily trying to go to the highest-class place you could get yourself into.
In this day and age, though, I don't think that's the right way to approach choosing a school.
The gap between elites and non-elites is growing, and as a result, the value of breaking into the social circles of the wealthy and "legacy" students who dominate elite schools is also growing.
Meanwhile, high-quality educational content is becoming more widely available - often published for free by those same elite schools - so the value of the education itself is approaching zero. (For some programs, notably elite MBA programs, the value of the education is basically zero already and has been for some time.)
If a course is taught by a TA or adjunct who barely speaks the language in which it's conducted, rather than by the distinguished professors whose names are on the department website, that's an annoyance, but all it costs you is some time on Khan Academy or Coursera to learn the material on your own. Your well-connected classmate who introduces you to a hiring manager or angel investor will have a much larger impact on your life.
You could of course lower compensation for academics to incentivize other professionals to support research but you'll find academics fleeing to other opportunities in many cases.
Accreditation could be performed for research institutions just as it is for university departments today -- PhD candidates could work in research institutions just as well as they do in university departments. The few classes they take are all within the department that they're researching in anyway, and are also all separate from classes that undergrads take -- so moving that education out of the university and in to the research institute seems fine too.
That's not to say that instituting my plan would be easy. I did call it 'radical', after all.
There are millions of dollars on the line for researchers involved. It is the difference between a well paid career and a life of destitution while being a slave to huge student debt.
Professorship and research grants should be given out based on criteria that are only incidental to research results.
Evaluate profs and grants based on:
1. Domain knowledge (test the applicants)
2. Math skills (test the applicants; make sure they know how to avoid p-hacking, about preregistration, etc.)
3. Motivation and leadership.
4. Prior and current research _proposals_ (but without looking at the results or whether they have been published).
5. Other skills such as communication, interpersonal skills, outstanding achievements etc.
Universities should not rely on journals to evaluate their professors. This corrupts the whole system. Journals have different goals. They want to publish well done research with interesting results. Universities should hire researchers that do good quality research with interesting _questions_ regardless of the results.
If universities keep giving out jobs based on having generated interesting publishable results, they are going to keep getting researchers that ignore biases and bad science practices in order to generate publishable (but unreliable and often false) results.
One page (https://www.kn-x.com/static/PWFeb17forum.pdf), two page (https://ssrn.com/abstract=2835131), and three page (https://ssrn.com/abstract=2713998) versions of this are available. Further details are provided at http://kn-x.com.
I doubt anyone on this thread is going to like this solution ... but I think most on this thread agree that the underlying problem is the incentives, that fixing the problem requires changing the incentives, and that changing the incentives within the existing system is very hard.
(i) Reproducibility is not a criterion for tenure evaluation, therefore it bears little relevance for career advancement for young scientists. This is an overworked and underpaid group in academia. (In the bay area, they earn 1/3--1/2 the salary of an entry level software engineer but probably work 50% more hours.) They simply don't have the luxury to take time off to reproduce someone else's work.
(ii) No major funding agency would be willing to support the kind of work that reproduces published studies. When something is published, it is considered "done" and not worth spending more money on it.
There needs to be a sea change. But sadly, academia is at best paying lip service to the structural problems behind the "reproducibility crisis". I'd expect it to continue like this for most fields.
It's the good scientist who produces economic value but cannot capture it that we have to support, but maybe that person cannot easily be found. The state just has to overfund and accept that some large percentage goes to nonsense.
In my mind, that's okay, though, since the guys who can capture economic value can do a lot with what we have.
It sounds harsh, but severe self-regulation is what prevents the catastrophic failure of any institution that is really only answerable to itself. Professors aren't doing any favors by rubberstamping weak papers, or giving weak students PhD's (or BS's for that matter).
EDIT: down-votes and no replies, as expected. Hey academics, sometimes to save the patient you have to amputate the limb. I care about science and want it to survive, so you guys continuing to be nice to each other even when you publish BS isn't serving that purpose.
It's probably worth punishing people who consistently publish unreproducible work, but it shouldn't be a rare, career-destroying event like you are suggesting.
I disagree with the first part and agree with the second. I would argue that failure to reproduce is perhaps the worst kind of scientific failure, because the activity cannot be called science anymore. And no, I don't mean anything personal when I say 'failure', you don't have to be malicious to do bad science -- after all 'doing bad science' is the default human condition.
An analogy with classical music: A musician that fails to reproduce the score with his violin consistently, will lose his job quickly. It doesn't mean he's a bad person, but it does mean he's not good enough for the orchestra. The metaphor for the "reproducibility crisis" in an orchestra is what happens if they let bad players stay, the orchestra sounds worse and worse, and finally the audience stops coming. In the apocalyptic scenario, cultural forces cause ALL orchestras to stop honestly evaluating the skill of their players, and all orchestras simultaneously lose their ability to accurately play any but the simplest music.
Standards are painful for those that can't meet them, but they make the world better, overall.
Also, sometimes strange things just happen randomly (like the CERN faster-than-light neutrinos) and there's no shame in publishing results that aren't correct as long as you're honest about it. Some early-stage, low-power studies are going to randomly show results that don't reproduce, and that's also fine. Suppressing these results would hurt progress just as much as making the opposite error.
This is only true if the results of research are deterministic. In many fields, if a study is estimating an effect that exists in the world as a distribution, a failure to reproduce might not be indicative of...anything.
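A toy simulation of that point: if the "true" effect is itself a distribution across contexts, two honest studies can sample different local effects and reach different conclusions. All parameters below are invented:

```python
# Each study lands in a context with its own local effect size, drawn from a
# population-level effect distribution, so replications legitimately disagree.
import random

random.seed(1)

def run_study(n=30):
    local_effect = random.gauss(0.3, 0.4)  # effect varies across contexts
    samples = [random.gauss(local_effect, 1.0) for _ in range(n)]
    return sum(samples) / n

for i in range(5):
    print(f"study {i + 1}: observed mean effect = {run_study():+.2f}")
```

Some runs will show a clear positive effect and others nothing at all, with no fraud or error anywhere in the chain.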
The real problem is the other guy who didn't pre-register the same hypothesis that he tested. So we don't know that I got it by chance until the repro failed.
If GitHub goes bust in 20 years, you'll find a significant chunk of current work is non-reproducible despite the authors' best efforts.
> Professors aren't doing any favors by rubberstamping weak papers, or giving weak students PhD's
Agreed. However, I think this is a sort of tragedy of the commons, in which "good" behavior of some professors will not stop the onslaught of "bad" behavior of others. Certainly the good ones will be outnumbered in terms of graduates and since university performance reviews rely mostly on sheer numbers (students, papers..), there is an argument to be made that universities self-select for the rubber-stamp professor over the honest ones.
I feel a fair solution is to re-imagine the universities' performance review system, although I have a hard time coming up with concrete and good measures of how to select the good from the bad professors.
I think it should be similar to how journalists are treated after they are involved in a major scandal (usually reporting something speculative as fact). Personally, I don't think there's any excuse for unreproducible results. They are worse than silence, they are lies that spread other lies.
So what happens to "the journalist" when something speculative was reported in a newspaper piece written by 7 people five years ago? In addition, five of the seven writers are now working different jobs in the private sector, no longer writing for any newspaper.
Case in point: Michael Riley, who wrote Bloomberg's thoroughly debunked SuperMicro story "The Big Hack", has now been promoted to manage all of their cybersecurity coverage.
Of course, sometimes it can probably be blamed on the reader (me) for not having enough insight, but many times the papers are just crap.
My naive imagination of academic papers before university was that they were like those I now read in the __top__ journals / conferences in each field: concise, understandable writing with actual insights and a clear methodology that can be repeated. But papers like these are probably something like less than 1% of everything published. Many papers seem to be almost approaching the level of the Bogdanov theses.
To relate to the topic: I'm wondering if we'll see a shift in the academic culture and system, where the pressure to publish is lowered (somehow), and where focus is on quality and actually providing insight and reproducibility.
The worst possible thing is that bad science gets influential on false grounds, which needs to be avoided. But increasing the signal-to-noise-ratio is probably not a bad idea either. And I suspect that they will both turn out to be improved by fixing the underlying incentive problems.
Hell, look at what the internet was back when it was still mostly nerds with passion.
No, it isn't, because once something is labeled as "superstitious crap" it's not going to have any great impact on public policy. But if something is labeled as "scientific research", even if it's of no better quality or reliability than superstitious crap, it does have impact on public policy.
The "reproducibility crisis" is not a crisis because many scientific results are difficult to reproduce. That is to be expected in scientific research, since research is supposed to push the boundaries of what is already known, and that means many results will be sketchy and uncertain.
The "reproducibility crisis" is only a crisis for those people who took sketchy and uncertain scientific results and declared them as "settled science" and used them to drive public policy--and then everyone found out how sketchy and uncertain the results were when they could not be reproduced. So now the public doesn't know when "science" can be trusted, because scientists themselves have squandered its trustworthiness.
Your faith in the reliability of the scientific process is touching, but naive. Bad research is driving public policy in many areas.
> Even good research doesn't always hold up over time.
In the sense that it gets superseded by better models (as Bohr's original model of the atom did), sure. That's to be expected in a healthy scientific field. But one of the things that's needed to keep a field healthy in that respect is the ability to do controlled experiments with high accuracy. That's how scientists figured out that Bohr's original model of the atom wasn't right--it couldn't match the results of the experiments as they got more accurate.
It's not helping whatever point you think you are making. Instead, consider this: what if climate models were failing reproducibility left and right, along with the other sciences? Should a policy for a global tax model be pushed as hard as it is?
You don’t have to be a “denier” to be skeptical of the course that the “true believers” are taking.
Some of us think the “climate religion” is a bad thing for a good cause. Long term counter productive.
So many researchers rely on assumptions of linear dynamics. So many experiments and studies are designed without consideration of observer effects.
Is it any wonder that a model might fit one day but not the next?
There is so much to be learned from applied topology, dynamical systems theory and (quantum) information theory, but methods from these disciplines are only just barely starting to become more widely accessible.
It seems like a practice of at least collecting the data at two different colleges might help? But this would require more coordination.
I'm really excited, for example, about "model-free" time series methods emerging from non-linear topological data analysis. Takens' embedding theorem shows that low dimensional "shadows" of high dimensional attractors can be constructed from a time series alone, and that they can reveal deep, useful facts about the system a time series is part of, even if that system defies geometric modeling. These kinds of methods are radically different from conventional statistical methods.
see here: https://www.youtube.com/watch?v=NrFdIz-D2yM
Previously, researchers found themselves flailing around with geometry, trying to come up with models that accurately described the dynamical nature of the systems they were studying (e.g. fish populations). A model might work well for a few years, then stop working. These topological methods cut through and get straight to the underlying relationships that drive various observables in a complex system, without relying on fickle geometric assumptions about how exactly those relationships will be expressed.
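For the curious, here is a minimal sketch of the delay-embedding construction at the heart of Takens' theorem; the signal and the parameters (delay tau, dimension m) are purely illustrative:

```python
# Rebuild a low-dimensional "shadow" of a system's attractor from a single
# observed time series via time-delay embedding. Toy signal, toy parameters.
import math

series = [math.sin(0.1 * t) + 0.5 * math.sin(0.37 * t) for t in range(500)]

def delay_embed(x, m=3, tau=8):
    """Map a scalar series to m-dimensional delay vectors (x_t, x_{t+tau}, ...)."""
    return [tuple(x[i + j * tau] for j in range(m))
            for i in range(len(x) - (m - 1) * tau)]

shadow = delay_embed(series)
print(len(shadow), shadow[0])  # points tracing the reconstructed attractor
```

The appeal is that nothing here commits you to a geometric model of the underlying system; the structure of the delay vectors alone carries the dynamics.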
Imagine: Scientist A uses (a statistical) method M to assert X. People praise A until scientist B uses method M and cannot assert X (which BTW usually doesn't imply X is wrong). People wait for scientist C to use method M and re-check assertion X. People then take the majority vote (or meta-analysis) for true.
That's the way much of science works today. It becomes a problem only in the following situations:
0. People hunt for statistically significant results.
1. X is mostly irrelevant to anyone who does not belong to the in-group of A, B, C, and peers. So nobody else really notices or cares about X.
2. There is no other way to observe any meaningful consequence of X other than by method M -- which eventually results in #1.
3. Method M is really expensive and complex, an almost impossible undertaking, so B or C most likely won't get funding.
4. Everything is ok, actually, but the people who pay for the study don't care about A's methodological fineprint as long as the results play well with their other goals.
The presented list of "factors that build reproducibility" focuses on #0, for which pre-registration is a simple and clean solution. IMHO the list is much too narrow and focused on academic practice, though.
For example, there are a lot of gender studies that are presented as a science in my country which I believe is complete bullshit and an ideology.
People are pushing shit like that today and I believe we need more of the hard sciences and less of the soft sciences.
There are typically ablation studies which aim to determine how much each of the _successful_ improvements contributes to the result, but there's almost never any mention of things that looked promising on paper but didn't pan out in practice, nor is there any discussion of the reasons why, even though the authors often have a good idea post-facto.
Before looking for root causes that invariably turn a cynical eye towards the motivations of scientists, let's make sure the effects don't precede the causes.
Obviously this survey is biased towards people with a certain outlook.
Maybe even easier than the p-hacking used to produce an irreproducible result.
We know 19th-century science was productive. Was it similarly plagued by irreproducibility?
I’m sceptical of this argument. But we need a baseline to render judgement.
Their marketing is probably the worst in class but the CEO is a passionate developer who has worked hard on this.
Math, including computer science, is almost entirely pure logic, not empirical science (despite the name of the latter), so it's not even in the same epistemological domain where reproducibility is conceptually a concern.
Imagine if programming languages were a thing produced by nature and no one knew any of the commands, and new commands were found by just testing them out or theorizing that they might exist. You'd be getting the Nobel Prize for proving ls can take arguments.
Biology is not a clean science. Results are noisy. Results can be different depending on the manufacturer's batch of the reagent.
Is there anything of the sort that has helped a bit?
Mildly funny to see this when Nix and Guix (and pure FP) are popping up at the same time in a very different context.
Getting 'scooped' isn't so bad in biology at least. If you were thinking about XYZ and someone publishes X and Z, great. Cite that paper, now you don't have to bother so much validating X and Z and can focus on strengthening your Y argument or adding argument W to your paper.
Not true, at least in my field. If $BigLab can throw 100k CPU cores at a simulation, it is going to finish way faster than $SmallLab's 1k simulation. If $BigLab builds 10 dedicated test rigs, they are going to finish that parametric study way faster than $SmallLab with 1 shared test rig.
Getting scooped may not be a big problem for papers (except when you start getting rejections for lack of novelty), but it is a big problem for patents, licenses, royalties and such things that allow small labs to survive when their governments have little to no money to spend.
Reviewers are there to make sure you dotted your i's and crossed your t's, and will burn you if they can't easily determine that while skimming your paper over lunch break.
Can we get one of the moderators to add (2016) to the headline?
It would be interesting to see how reproducibility varies by field, university, country, etc. Although I guess scientific reputation would already give a clue over the quality of scientific work?