Peer review is essential for science. Unfortunately, it's broken (arstechnica.com)
54 points by rbanffy 4 months ago | 48 comments



> And at the height of the COVID-19 pandemic, I watched in alarm as public trust in science disintegrated.

It is not that common for science to air its dirty laundry in public. With covid, the debates and disagreements were very public.

Was covid-19 a severe disease or "just a flu"? Does it spread primarily by droplets, or is it airborne? Do masks work, or not? Were the lockdowns useful, or not? Did the border closures help, or not? Did the virus likely escape from a laboratory, or jump to humans from a natural animal population? Is long covid a serious condition, or is it just psychological?

There isn't a single policy-relevant question about covid-19 where you don't have one camp of highly credentialed experts in medical science and epidemiology arguing for A, and another camp of equally highly credentialed experts arguing for B.

People are not that stupid. They can see that science doesn't agree on pretty much anything about covid. And not just tiny minutiae, but the big, relevant questions. So the loss of public trust is well earned. Epidemiology just isn't nearly as well developed a science as we thought it was.

And if other sciences only take a meta-level approach of "please trust the science", like article author Paul Sutter does here, it doesn't help. Maybe if other sciences called epidemiology out ("please trust our science at least, we're not as bad as epidemiology"), it could help some.


The demonstrated lack of scientific agreement was not the problem. People are ok with that. The problem was that those who held the levers of power instituted policies as if there was no disagreement. If you’re not sure whether community mask mandates will materially alter the trajectory of a pandemic, you can encourage masks, but you cannot — at least in a liberal democracy — require masks. Same goes for public messaging. If you’re not sure vaccinations will stop viral transmission in its tracks, you cannot claim it will, as Dr. Anthony Fauci did.

Trust in public health authorities has thus been severely compromised, as has trust in scientists, who as a whole proved too invested in marginal protections to their personal health and community standing, and too little invested in their role as dispassionate scientific observers.


> Was covid-19 a severe disease or "just a flu"? Does it spread primarily by droplets, or is it airborne? Do masks work, or not? Were the lockdowns useful, or not? Did the border closures help, or not? Did the virus likely escape from a laboratory, or jump to humans from a natural animal population? Is long covid a serious condition, or is it just psychological?

> There isn't a single policy-relevant question about covid-19, where you don't have one camp of highly-credentialed experts in medical science and epidemiology arguing for A, and another camp of equally highly-credentialed experts arguing for B.

A big problem with this is that on a lot of these questions the experts agreed on the premise, but that point got lost in disagreements about details above the actual premise.

> Was covid-19 a severe disease or "just a flu"?

The answer is both, because the flu is also a severe illness, especially without vaccination (and potentially even with it). So you had one side arguing that we should be treating it like the flu and the other side arguing for much harsher restrictions, the kind we should also be taking for dangerous new flu variants.

> Does it spread primarily by droplets, or is it airborne?

And here again the answer is both. Technically it spreads via droplets, but it spreads effectively on very fine, aerosolized droplets, so for a normal person it's basically airborne (just being in the same confined space can cause infection).

> Do masks work, or not?

Again the answer is both. Masks are extremely effective at preventing the spread of disease. But they generally do little to protect you unless you have a very specific type of well-fitted mask. So do they work? Yes. Will they keep you from getting sick? Probably not.

> Did the border closures help, or not?

And both again. Did they help? Yes. Were they done too late for true containment? Yes. Did they help slow the spread of variants across borders? Yes. In the way they were done (i.e. too little too late), were they worth the cost? Maybe not.

> Did the virus likely escape from a laboratory, or jump to humans from a natural animal population?

Skipping this one because it's less medical science and more a question for investigation/history/politics.

> Is long covid a serious condition, or is it just psychological?

Both. Long-term damage from covid is serious, especially for those who got the more severe covid variants without a vaccine. But there's also much harder-to-quantify damage from covid that sticks around for a long time, and in a lot of cases it looks a lot like Chronic Fatigue Syndrome, which unfortunately also has the unfounded "it's just psychological" stigma attached to it.

And so the bigger issue during this whole event was that you had professionals trying to hold these discussions at a level that your average American could follow along with from home. This unfortunately meant throwing away a lot of nuance and as a result disagreements over smaller points were reinterpreted as disagreements over the fundamental premise instead.


This criticism is mostly outdated: including the analysis code is now required by most good journals. As a computational biology academic PI, I review a lot of papers, and I don’t approve anything without code unless the analysis is trivial, like a standard off-the-shelf statistical test. In the past this objection was overruled by editors, but not in the last few years.

That said, people misunderstand what peer review is and trust it too much. It is just a quick sanity check, and serves mostly to get feedback on how to make a paper more understandable. Good faith is assumed; reviewers do not look for fraud or hidden mistakes, and that isn’t necessary or the point of peer review, because fraud will eventually come out and the consequences are severe. In most cases when I review a paper I have less than an hour to look it over, and mostly try to suggest things to make the paper easier to understand.


I can see your analysis being used by participants cited as paper co-authors...

"...my name's on the paper, but I mostly just cheered from the sidelines. I didn't dive into the data or run any analyses. Honestly, most co-authors don't. We just add names to look impressive. If the paper's not great, it's because real work is hard, and I was more of a supportive figure rather than an actual contributor..." :-)


I don't think the criticism is outdated, in spite of what you mention. Indeed, in the last few decades, releasing the code has gone from uncommon to standard practice. But in my view, both you and the post author are greatly overstating the importance of releasing the code. Sure, releasing it is clearly better than not releasing it; it's a step forward. But it makes very little difference to peer review, if any, since peer reviewers hardly ever actually open the code. You're already lucky if you can find reviewers who read the whole paper rather than skimming through the dense central sections - good luck finding reviewers who will actually check the code, considering that it takes much more time and effort than reading a paper. When I'm a reviewer myself, I also penalize papers with no code - but papers with code might as well link the Minesweeper source and most of the time I wouldn't notice, because I don't have time to check - and I devote more time per review than most people I know.

Regarding your second paragraph, that is what peer review actually is in the present time, but the point is that it's supposed to be much more than that. And the problem is not only outright fraud that might have severe consequences, the post itself mentions "soft fraud" that will hardly ever have any consequence (who is going to notice that someone made several experiments and only published the one favoring their conclusion?). And there are also problems not related to fraud - for example, peer review is supposed to be not only about correctness but choosing the best work for top-tier venues, good but not that impactful work for second tier, etc. Shallow peer review often means that work that is not fraudulent, but not stellar is accepted in better venues due to famous authors/institutions or just authors that are especially skilled in "pitching" their work, and vice versa.


In practice, the fraud and file-drawer effects actually observed (estimated from all the large-N replication projects that have sprung up) are much less common than the replication crisis headlines imply. A lot of it seems to simply be that a "null hypothesis" is better understood as a continuous distribution, because the null model doesn't match reality perfectly. See "The replication crisis is not a crisis of false positives", Coleman et al. 2024, https://osf.io/preprints/socarxiv/rkyf7


In my experience, getting someone else's code from a paper to run is a PITA. In life science, the person who wrote the code is often not really that experienced in this kind of thing, and certainly didn't have ease of sharing and running in mind when they started the project.

I've long had in the back of my mind some kind of cloud-based platform for writing, running, storing, and sharing scientific code. It would be the standard, and journals would (in time) ask that analysis be done using it, or that reasons are provided as to why it could not be used.

Maybe one already exists?


> I've long had in the back of my mind some kind of cloud-based platform for writing, running, storing, and sharing scientific code.

And now there are 15 competing standards.

Everyone uses the computing resources available at their institution, and each institution makes its own choices. Running someone else's code almost always involves migrating it to a new environment. Even using something as common as Docker is problematic, as it doesn't always interact nicely with the rest of the environment (such as Slurm).


One success story is that of the ImageJ/Fiji project, which provided (and still provides) a standard way of writing and running image analysis workflows.


I just don't really think peer review is that important... it doesn't need to be more thorough, you don't need people to double check every little thing or try to replicate things before publication. Good scientists are already skeptical, and will remain so about publications no matter how much peer review it goes through.

In the past, it mostly helped journals weed out papers that weren't interesting, to save print costs. Nowadays, as a scientist, I'd rather everything just get posted quickly on preprint servers, and make my own judgement. Reviewers won't ever look deeply, but you can bet another scientist planning several years of work based on someone else's paper will look really deeply.

IMO when I've had papers not published because of bad reviews, it was usually from competitors that were bitter about me beating them to something important, or using a different approach than they would... or inappropriate reviewers that weren't qualified to understand the paper. I don't think more aggressive gatekeeping by peers would do anyone any good, and should probably just be eliminated entirely.

Having the code is important because the code is part of the methods. Without the code, the work isn't always even possible to reproduce.


In a perfect world, studies should not only be required to include the code, any of the following should also be a reason to reject a publication:

* The code can't be run outside the original researcher's undocumented special snowflake system

* A fellow scientist who understands the domain can't understand what the code is doing and so is unable to verify that it isn't making any mistakes

* It'd be really nice if there were some sort of test cases that could verify that the code in question produced results in line with what the scope of the paper in question considers to be established science in that domain before it was applied to the experimental data - probably not always possible, but it'd be good to at least try

Yeah I know this doesn't touch the dozens of other reasons the whole institution is in trouble, and for those reasons usually nobody who has the skill and time to properly review the code actually reviews it carefully. For that matter, most scientists aren't very good at writing code that's easy to understand and will run everywhere. Just one more set of things that ought to go on the list of things that really should be changed.


Most scientists unfortunately lack the technical skill to make their code easy to run elsewhere... however I think it is getting better.

Someday I would like to see it become a requirement that the code has to run anywhere with a single command, e.g. 'docker compose up' or such, and then automatically produce all of the figures and analysis in the paper. There have been some pushes to start journals where the articles themselves can't have any plots or numbers directly, but must automatically generate them from code on the journal's servers, e.g. with something like R Markdown. It hasn't caught on because the work and technical skill required are too much.
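
Something like this, as a rough sketch (the file layout, image, and entry point here are made up, not taken from any real journal's requirements): a docker-compose.yml at the repo root whose single service rebuilds every figure, so 'docker compose up' either reproduces the paper's outputs or fails loudly.

    # docker-compose.yml (hypothetical example)
    services:
      paper:
        build: .                    # Dockerfile pins language and package versions
        volumes:
          - ./data:/work/data:ro    # raw data mounted read-only
          - ./output:/work/output   # regenerated figures and tables land here
        command: ["make", "all"]    # one entry point that rebuilds everything

A reviewer would then only need Docker installed; anything that doesn't build or run under that one command is the authors' problem to document.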


That's not the code from my point of view. That's the script that processes the output of the code. The actual code often needs nontrivial time and/or money to run, and you may need to spend some effort to match the resources you have with the resources the code needs.


The vast majority of papers, even computational ones, can be run that way. If there are compute-intensive steps, the same process can be used, with a step that includes the previous output so that it only gets regenerated if the user manually deletes it. I usually do this with Makefiles.
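
For what it's worth, the pattern is roughly this (a toy sketch; the script and file names are placeholders, not from any particular paper): the expensive fit is its own target whose output is kept around, and the figures depend on that cached file rather than on rerunning the computation.

    # Makefile (sketch; recipe lines must start with a tab)
    all: figures/fig1.pdf

    # expensive step: only reruns if results/fit.rds is missing
    # or its inputs changed; delete the file to force a rerun
    results/fit.rds: data/raw.csv fit_model.R
    	Rscript fit_model.R data/raw.csv $@

    # cheap step: rebuild the figure from the cached fit
    figures/fig1.pdf: results/fit.rds plot_fig1.R
    	Rscript plot_fig1.R $< $@

'make all' then regenerates only what is out of date, which is exactly the "don't redo the expensive part unless its cached output is gone" behavior described above.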


This article reads like nonsense - even the header is clickbait material.

Claiming science needs peer review is like saying a painting isn't art until a critic reviews it. The results exist independently of external validation.

Peer review is not essential for science. Science can be done, whether or not it's peer-reviewed.

Especially when you're dealing with scientific frontiers, where no one can really verify things and peers don't even exist. How does peer review work then?

The idea that something can be 'less scientific' just because it's not been peer-reviewed is nonsense. Unreviewed scientific work can stand on its own merit. Moreover this insistence on peer review really opens up more room for academic dogma and politics to creep into the hard sciences - not a good thing.

Given the amount of scientific fraud going on, even with 'peer review', it's just not necessary for true science and it's something that should be done away with.


There are many problems with how people perceive the peer-review process, which I've gathered from my interactions with non-academics.

1. They assume that peer-review means reproducing the paper.

This is not the case. It just means that the work followed the scientific method and does not have serious and obvious problems (like being complete nonsense). There is a quality curve for that, of course, depending on the editors, journals, and peer reviewers.

2. Academics have to do peer review as part of their jobs.

It is always surprising for them to learn that peer review is volunteer work in the vast majority of cases. It is even a burden for many people, one that cuts into their actual funded work. The publishing industry has one of the highest profit margins for a reason.

3. The peer review process is the same across fields.

There is no way that CERN papers will be treated the same as papers from a neuroscience lab at an average research institution. Not because one is better or higher quality, but because of different fields, journals, etc.

4. Peer-review is an essential part to confirm the results.

In reality, peer review is a filter to help scientists navigate the sea of new papers in their fields. And by filter I mean filtering out the nonsense and the low-quality research, not confirming the results. It is also about making sure that the methodology and results make sense.

Those points are usually what I see people discussing and forming opinions based on. Any solution to the current academic problems will need more than regulation and new policies. The real need is a change in the structure of funding and how it is allocated (with an actual increase in budget). This is where many people will start to think that there are other priorities for a budget increase.

By actual I mean an increase that does more than barely keep up with the inflation rate.


One receives grants for doing new research and writing papers about that research. There's no financial incentive for reviewing the existing body of work, and since one needs money to eat, guess what doesn't get done?

It's basically a case of lots of people are talking and few of them are listening. It's understandable that a lot of trash talk gets ignored.


That's not true, because a very common way of getting a new idea is to start from something existing and then extend it.

Thus a pretty standard task for any PhD student is to reproduce someone's published result as the first step towards testing the new idea. And even if you start by assuming some previous work is correct, frequently you will end up reproducing it in some way, because you want to sanity-check the extension you built on top of it.


The overwhelming majority of the groups doing those extensions are the same groups who did the original research, and the increments are oftentimes so small that they don't expose the flaws in the underlying body of work.


I guess… how do we get people paid for reviewing? It’s an interesting point that the incentives are not helping us here.


The whole academic world gets increasingly ignored due to the lack of care from its participants. We all need to eat, but I believe most are far from hunger, and many seek a good and wealthy life on top. Ironically, they chose the wrong profession then; there are better-paying professions and occupations that demand less brainpower.


We should seriously consider and question what peer review is.

Peer review is very obviously not essential for science, as the most astounding science, which gave us the modern era we all have come to enjoy, was done without peer review.

In fact, it's well worth reading about Robert Maxwell and his influence in birthing the scientific publishing industry (yes, strangely Ghislaine Maxwell's father) and the inception of peer review. And it's well worth asking how effective peer review is at encouraging scientific progress, how effective it is at producing gatekeepers, and how it incentivizes those gatekeepers. I'd suggest it's far better at the latter, and produces poor incentives.

But in today's discourse, peer review has become unquestionable and that should make us ask more questions.


Agree. People act like peer review is this unshakeable tenet of the scientific method, when it only became widespread in the 70s and 80s. To my mind, papers written these days are usually a lot worse than those written before peer review, often due to the demands of that process (soften your language here, obscure your point there, cite thirty other papers unnecessarily just in case one of the authors is the reviewer).


I have come to realize that when we do a peer review for Journal X, all we are really doing is protecting the journal's reputation by helping weed out substandard papers.

There seems to be a belief that the peer review process is like a universal system, so if a paper is 'rejected' during peer review for Journal X then it will be thrown in the garbage and never trouble us again. But in reality, the authors will submit it to as many journals as necessary until it sneaks through, and eventually it will sneak through.

I repeat: Every rejected paper will eventually be published somewhere. Perhaps not in Nature, or Top Journal A, but it will appear in Respectable Journal B or Mediocre Journal C. And it will bear the seal of "peer reviewed".

Ironically, I guess my point is that peer review is not broken; it's working great for the journals.


I've found it interesting to hear about the history of peer review, which started as a way to keep out "the unqualified masses" from the review process, and also as a way to sell more journals. The really interesting part of this is that "peer review" wasn't a formal thing before the 1960s. This means that the lion's share of what we consider the greatest advances in scientific endeavor never went through "peer review" in the sense it's meant now. Not Newtonian mechanics, not Maxwell's electromagnetism, not Einstein's relativity, not quantum mechanics... Peer review became about segmenting and exploding the number of journals universities had to purchase to be "current." It was really a publishing company coup on the scientific process.

Eric Weinstein talks about this in his critique of the whole Terrence Howard kerfuffle.


> I repeat: Every rejected paper will eventually be published somewhere. Perhaps not in Nature, or Top Journal A, but it will appear in Respectable Journal B or Mediocre Journal C. And it will bear the seal of "peer reviewed".

I agree, but I don't know if it's a bug or a feature. We had one paper that took a while to get published.

I prefer to use "peer reviewed and published in a good journal", where I'm not sure if "good" means A, B, or C on your scale. But there are also D, E, F, G, ... depending on how finely you want to classify the crap and predatory journals.

The problem is that outside my area, I really don't know which journals are good and which are bad. One of my criteria for not sending a journal to the Z slot is whether I recognize its publisher. Sometimes the impact index is useful, but it changes too much from area to area. I take a look at the other papers in the same journal. Too many single-author papers or too many unrelated topics are red flags.

At the end of the day, for research or for posting angry comments on HN, the only solution is to RTFA. I read the abstract, skip the introduction that is usually too optimistic, and go directly to the tables and graphics. I learned that Ctrl+F "exclusions" is important in medicine, because I've seen weird methods used to filter the patients.


Honestly I think it's a good system at its core, it just needs some adjustment of incentives (mainly that reviewers need to actually be fairly compensated for their time).

The way I see it, journals are more or less independent reviewing orgs. That a paper has been peer-reviewed means nothing. It's all in the reputation of the reviewer and what they said.

Personally I'd actually like to see journals publish their reviews including rejections because I think it's way more important to know why something was or wasn't accepted than that it was.


Publishing a record of rejections and reasons is a very interesting idea! I will be adding that to my "deserves more thought" pile!


From an earlier comment:

I have long maintained that the NIH should set aside 20% of its budget to fund spot checking of the research output of the remaining 80% of funded research. Even if this funding is only sufficient to review a handful of papers each year, it stands to have a transformative effect on the level of hype and p-hacking in many fields, and could nearly eliminate the rare cases of actual fraud. It would also improve the overall quality and reliability of the literature.


What I'm going to say is tangential to the article's main issue (scientific fraud), but I take issue with the "essential" in the headline. Not only was there science going on without peer review, there are some very prominent examples of it, too. None of Einstein's annus mirabilis papers were peer reviewed. Einstein didn't care much for peer review either. Later on, when Physical Review tried to peer review one of his papers with Rosen, he chose to submit the paper elsewhere.

More about it in: https://hsm.stackexchange.com/questions/5885/how-did-the-pub...


I presume that the current review process was essential for, and formulated in, the age of printed journals. It was good to know, before going to press, whether a work was worth the costs and hassle, beyond the simplicity of publication that the article mentions.

Meanwhile, journals went online to present the papers, yet the procedures remained in the Gutenberg era, with pre-publication reviews.

Why not use modern technology not just to decrease printing costs and push a PDF onto a site (while still charging a lot for the service), but to extend it with the possibility of post-publication review? Add further reviews later, at the source of the article itself, whenever something relevant happens (advancement, refutation, reproduction (hehe), fraud, ...). Something like an advanced and improved commenting service from the journals, or better yet, a consortium of journals forming a common ground for Web 2.0-style interaction for scientists and perhaps other stakeholders and the general public too, where all can review posts and rate the journal, with the weight of each review based on the reviewer's familiarity with the topic, likely determined by the quality and amount of their own reviewed publications. So a review attached by the kind of experienced researcher currently sought out for peer review would carry more weight than one from Average Joe of the Lowerville Technical Society buying access to the review system. Oh yes, the system is not free to maintain, so those publishing and wanting review would subscribe to it at the appropriate level (government support is not excluded, obviously; this is a thing for the benefit of mankind after all, in theory :) ).

It will not solve the fundamental replication and fraud crisis, but it may make it more traceable and make quality easier to discover, if the system is built right. Which naturally is an immense difficulty on its own: how to build it right and reliably, with the current players not really interested in change. But probably worth trying, no?


I'm always baffled that merely putting "Ph.D." in big letters is apparently enough to grant an author universal authority; somehow the actual subject one has studied is of no importance to a lot of people.

Anyway, this piece doesn't actually have much to say about the practical aspects of peer review, this is about it:

> Peer reviewers don’t have the time, effort, or inclination to pick over new research with a fine-toothed comb, let alone give it more than a long glance.

Basically, "it's broken, there's massive fraud" is a statement proved by assertion which ironically is the type of shoddy work the text is supposed to oppose.

For a different perspective, see "The replication crisis is not a crisis of false positives", Coleman et al. 2024, https://osf.io/preprints/socarxiv/rkyf7


From the article: "How am I supposed to judge the correctness of an article if I can’t see the entire process?" If you can't, then don't. When software developers do code reviews, the obscurity of a change is sufficient reason to reject the pull request.

If a peer reviewer can't follow a scientific paper then they shouldn't be rubber stamping it. I'd say it's the responsibility of the paper's authors to ensure it has enough information in it to determine its validity. Reviewers shouldn't be afraid to send back papers that don't.


What isn’t broken nowadays?


Everything has been broken forever, we just get to hear about it nowadays.


In the field I used to work in (cogsci, neurocognition), peer review has been broken for over 30 years. Peer reviewers didn't care about the conclusion. If there was data and a p-value that met superficial scrutiny, any conclusion would be accepted. Combined with shoddy methodology and a disdain for statistics, this led to a deluge of completely untested theories, and the reproducibility crisis.


The competency crisis will march along slowly, then explode all at once.


Competence crisis? Crikey, that's a lot of crises! Sounds like we're in a crisis crisis.


iPhones?


I'm not sure I follow the author. He seems to complain that peer review is not sufficient to counter incentives to game the h-index[1].

It sounds like the h-index has ceased to be a measure and is now a target. He doesn't explain the incentives around a higher h-index (which decisions are based on the h-index, which benefits it confers), or what we might do to replace it as a target.

1. https://en.wikipedia.org/wiki/H-index
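
For anyone unfamiliar, the metric itself is trivial to compute; here's a quick sketch of my own, just to illustrate the definition from [1]:

    def h_index(citations):
        """Largest h such that h of the papers have at least h citations each."""
        cites = sorted(citations, reverse=True)
        return sum(1 for rank, c in enumerate(cites, start=1) if c >= rank)

    # e.g. h_index([10, 8, 5, 4, 3]) == 4

Which also makes it easy to see why it works poorly as a target: anything that adds citations to mid-ranked papers (self-citation, citation rings, salami-sliced publications) moves it directly.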


Create Nobel prize category for proving existing research false


Off topic but... Why post ARS links in the long form instead of https://arstechnica.com/?p=2030357 ?


So many ads, and this feels like an AI wrote it.


> So many ads

It's dangerous to go alone. Take this: https://ublockorigin.com/


yeah...i have it off right now but generally it's on.


Definitely doesn't seem like an AI, based on sentence, paragraph, and word structure.


Have you tried the latest Anthropic model? Sonnet 3.5 is pretty stunning. I really like reading what it produces.



