Understanding Bias in Peer Review (googleblog.com)
97 points by runesoerensen on Nov 30, 2017 | 18 comments



I'm extremely glad that they measured this, and are sharing it. Even though I think it's obvious ahead of time, some people will be surprised about it. Which makes it doubly important to share, IMHO.

Why does Google get a ton of press for so many things? Because they are Google and they attract attention. Why does MIT get so much press for so many things? Because they are MIT and they get a ton of attention. Reputation matters a lot and is self reinforcing.

Peer review is only partly about saying "this seems to be well evaluated science." For journals and conferences, it's much more about "this is interesting to the audience." That is a hugely subjective evaluation, based on all sorts of cultural clues.

It would be extremely odd if scientists were somehow able to ignore that human tendency in peer review. If you can see names and institutions, and you have reputations associated with them, that will without doubt enter into the highly subjective evaluation of peer review. I would guess that even if you know about this human bias in yourself, you would still be subject to it subconsciously. I'm not sure there's any way to get around it except things like double blinding. Even then, for many sciences, the techniques and data sets used may very well give away who did the work.


I’m in history but it’s the same problem in our field - all the peer reviewing I’ve done has been double blind, but I can guess the author perhaps half of the time based on things like self-citation. It’s a hard problem to solve, for sure.


Yeah, double blinding is ineffective in almost every field. A good article on double blind peer review is [1].

Being aware of the bias is one thing, although I'm afraid that indeed only partially mitigates it. Making peer reviews public after the fact could help somewhat in allowing people to judge for themselves and make the biases less damning, likewise for post-publication peer review. In any case, that would make it more feasible to do research on peer review and perhaps find better solutions.

[1] http://blogs.plos.org/absolutely-maybe/2017/10/31/the-fractu...


This is very cool, but I'm having trouble following the method behind the results.

Concretely, take the result that single-blind reviewers give higher reviews to papers from prestigious sources. Can anyone help me understand? https://arxiv.org/pdf/1702.00502.pdf

They give each paper a "blind" score bqps from the blind reviewers only. Then they fit the vector Theta to the model in 5.1 and present Theta in Table 5. They also present a p-value for each entry of Theta. In particular, for com (top company), uni (top university), and fam (famous author) the p-values are below 0.05.

But I didn't see how these p-values are computed or what they mean, or what the entries of Theta are supposed to mean. What's the null hypothesis? If the single-blind authors and double-blind authors acted the same way, would we expect the entries of Theta to all be zero except for the coefficient of bqps and the constant? (If so, why?)

...

In other words, I don't understand how the exponential odds-ratio model at the beginning of 5.1 maps to the claim in the introduction "We find that the likelihood for single-blind reviewers to enter a positive review is significantly higher for papers" from say top companies.

(Edit:) One thing that would help me is if the authors fit the same model in the reverse direction, i.e. computed bqps from the single-blind reviews and fit a Theta' in predicting the double-blind reviews. I am tempted to say that all their result shows is that, conditioned on the double-blind reviews, famousness is useful in predicting single-blind reviews and the trend is positive; but this could conceivably be true in the reverse case as well.


It looks like they fit a (multinomial?) logistic regression. Unfortunately, the model coefficients are hard to interpret directly because each coefficient is a log odds ratio, which is an awkward scale to reason about.

I would suggest looking at the odds ratio column for a number that is more interpretable. An odds ratio of 1 represents equal odds of being accepted. An odds ratio greater than 1 represents better odds of acceptance (e.g., a paper from a university has 1.5 times the odds of acceptance of a paper not from a university or, said another way, its odds are 50% higher).

Odds ratios less than 1 have a similar but opposite interpretation. For example, women have 0.78 times the odds of acceptance that men have (or 22% lower odds).

I wish they had reported marginal probabilities of acceptance, because odds ratios can be misleading when marginal probabilities are small or large (e.g., if group 1 has a marginal probability of acceptance of 1% and group 2 has 3%, their odds ratio will be about 3, but the absolute difference is only 2 percentage points, which is basically negligible).
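To make the odds-versus-probability point concrete, here is a small Python sketch. The function names are made up for illustration, and the 50% baseline is an assumed number, not a figure from the paper.

```python
import math


def odds(p):
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of group A relative to group B."""
    return odds(p_a) / odds(p_b)


def apply_odds_ratio(p_base, ratio):
    """Probability obtained by multiplying the baseline odds by `ratio`."""
    new_odds = odds(p_base) * ratio
    return new_odds / (1.0 + new_odds)


def coefficient_to_odds_ratio(beta):
    """A logistic regression coefficient is a log odds ratio; exponentiate it."""
    return math.exp(beta)


if __name__ == "__main__":
    # The 1% vs 3% example: a large odds ratio, a tiny absolute difference.
    print(odds_ratio(0.03, 0.01))        # about 3.06
    print(0.03 - 0.01)                   # 2 percentage points

    # An odds ratio of 1.5 on an assumed 50% baseline gives 60%, not 75%.
    print(apply_odds_ratio(0.5, 1.5))
```

So "1.5 times the odds" only approximates "1.5 times as likely" when the baseline probability is small, which is why marginal probabilities would be a useful complement.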


Thanks for the help. The odds ratios are easier to interpret, but the big question I still have is how this shows a (statistically significant) difference between double-blind and single-blind reviews....


Devil's advocate: sometimes, knowing the author of a work really can allow a reviewer/reader to make a more accurate assessment of its value.

As an extreme example, if Stephen Hawking submits a paper that calls into question the speed of light as a universal constant, I'm much more likely to take it seriously (and in the peer-review case, would be much more likely to recommend it for publication) than the same paper from some unknown "up and coming" physicist. This is because Hawking is an expert in the field, and I trust that if there were any obvious flaws in the analysis he would be more likely to have discovered them than a randomly-selected unestablished researcher.

No reviewer has time to independently investigate and verify every single claim a paper makes, and knowing that the paper comes from an elite institution might very reasonably be used as a signal that it should be given the benefit of the doubt.


>Devil's advocate: sometimes, knowing the author of a work really can allow a reviewer/reader to make a more accurate assessment of its value.

>As an extreme example, if Stephen Hawking submits a paper that calls into question the speed of light as a universal constant, I'm much more likely to take it seriously

I'm sure Mr. Hawking, being so skilled in his field, can appropriately cite related work and lay out what he sees as its flaws without resorting to an appeal to authority.

If you need such crutches to know when someone's paper is high quality, perhaps you are not fully invested in the reviewing process and should allow someone willing to take the time to fully evaluate all submitted works.


How are you being a devil's advocate? As far as I can tell, your viewpoint is completely described in the conclusion, quoted here:

"In conclusion, the heart of our findings is that single-blind reviewers make use of information about authors and institutions. Specifically, single-blind reviewers are more likely to bid on papers from top institutions and more likely to recommend for acceptance papers from famous authors or top institutions, compared with their double-blind counterparts."

"The primary ethical question is whether this behavior is acceptable. In one interpretation, single-blind reviewers make use of prior information that may allow them to make better overall judgments. As a consequence, however, it may be that other work is disadvantaged, in the sense that two contributions of roughly equal merit might be scored differently by single-blind reviewers, in favor of the one from a top school, while double-blind reviewers may not show this bias as strongly."

Or are you arguing that there is no ethical concern at all? The introduction seems to cover the main pro and con arguments, as part of the justification for the need for experimental evidence.


> How are you being a devil's advocate?

Popular opinion among scientists, engineers, etc. is that science stands (and should stand) on its own without needing support from appeals to authority, bandwagon effects, etc.


Okay, but how was kcorbitt being a devil's advocate with respect to this paper?


He was being devil's advocate with respect to popular opinion, not this particular paper.


The paper was pretty clear that the popular opinion (among those with knowledge of the topic) is mixed, with advocates for both single blind and double blind, and that the vast majority of evaluations are done single blind - that is to say, aligned with kcorbitt's position.

What do you mean by "devil's advocate"? Because I can't make sense of how to apply either my understanding of the term or the description at Wikipedia to kcorbitt's comment.


A scientific review should focus on the findings alone. Using the author's name as a shortcut around (or an excuse to avoid) thoroughly reviewing the content should not be encouraged.


> Compared to double-blind reviewers, we saw about a 22% decrease in the odds that a single-blind reviewer would give a female-authored paper a favorable review, but due to the smaller count of female-authored papers this result was not statistically significant. In an extended version of our paper, we consider our study as well as a range of other studies in the literature and perform a “meta-analysis” of all these results. From this larger pool of observations, the combined results do show a significant finding for the gender effect.

This sounds like p-hacking. They're testing for significance, and when that fails, use the same data again along with other data to achieve significance. Am I mistaken?


Are you referring to meta-analyses? They are often used to find p-hacking:

[1] http://training.cochrane.org/resource/introduction-meta-anal...

[2] http://journals.plos.org/plosbiology/article?id=10.1371/jour...
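For anyone wondering how pooling can reach significance when no single study does, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis. The three "studies" are hypothetical numbers chosen for illustration, and this is not necessarily the method used in the extended paper.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted pooling of independent effect estimates.

    Returns (pooled_estimate, pooled_std_error, z, p_value).
    """
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    return pooled, pooled_se, z, two_sided_p(z)


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors from three small studies.
    log_ors = [-0.25, -0.25, -0.25]
    ses = [0.15, 0.15, 0.15]

    for est, se in zip(log_ors, ses):
        print("single study p =", two_sided_p(est / se))

    pooled, pooled_se, z, p = pool_fixed_effect(log_ors, ses)
    print("pooled estimate =", pooled, "se =", pooled_se, "p =", p)
```

With these made-up inputs each study alone misses the 0.05 threshold, while the pooled estimate has a smaller standard error and clears it. The pooling assumes the studies are independent and estimate the same effect.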


Having submitted to that conference, I really wish they had told authors which reviews came from each arm. Very frustrated with that submission.

Good, important findings for the experiment.


It is a bit odd that the study did not include a zero-blind process (where both the reviewers and the authors know each other's identities). I'd be very interested in seeing how the results compare to the two conventional peer review processes.



