
Understanding Bias in Peer Review - runesoerensen
https://research.googleblog.com/2017/11/understanding-bias-in-peer-review.html
======
epistasis
I'm extremely glad that they measured this, and are sharing it. Even though I
think it's obvious ahead of time, some people will be surprised about it.
Which makes it doubly important to share, IMHO.

Why does Google get a ton of press for so many things? Because they are Google
and they attract attention. Why does MIT get so much press for so many things?
Because they are MIT and they get a ton of attention. Reputation matters a lot
and is self-reinforcing.

Peer review is only partly about saying "this seems to be well evaluated
science." For journals and conferences, it's much more about "this is
interesting to the audience." That is a hugely subjective evaluation, based on
all sorts of cultural clues.

It would be extremely odd if scientists were somehow able to ignore that human
tendency in peer review. If you can see names and institutions, and you have
reputations associated with them, that will without doubt enter into the
highly subjective evaluation of peer review. I would guess that even if you
know about this human bias in yourself, you would still be subject to it
subconsciously. I'm not sure there's any way to get around it except things
like double blinding. Even then, for many sciences, the techniques and data
sets used may very well give away who did the work.

~~~
benbreen
I’m in history but it’s the same problem in our field - all the peer reviewing
I’ve done has been double blind, but I can guess the author perhaps half of
the time based on things like self-citation. It’s a hard problem to solve, for
sure.

------
bo1024
This is very cool, but I'm having trouble following the method behind the
results.

Take, concretely, the result that single-blind reviewers give higher reviews
to papers from prestigious sources. Can anyone help me understand it?
[https://arxiv.org/pdf/1702.00502.pdf](https://arxiv.org/pdf/1702.00502.pdf)

They give each paper a "blind" score bqps from the blind reviewers only. Then
they fit the vector Theta to the model in 5.1 and present Theta in Table 5.
They also present a p-value for each entry of Theta. In particular, for com
(top company), uni (top university), and fam (famous author) the p-values are
below 0.05.

But I didn't see how these p-values are computed or what they mean, or what the
entries of Theta are supposed to mean. What's the null hypothesis? If the
single-blind reviewers and double-blind reviewers acted the same way, would we
expect the entries of Theta to all be zero except for the coefficient of bqps
and the constant? (If so, why?)

...

In other words, I don't understand how the exponential odds-ratio model at the
beginning of 5.1 maps to the claim in the introduction "We find that the
likelihood for single-blind reviewers to enter a positive review is
significantly higher for papers" from say top companies.

(Edit:) One thing that would help me is if the authors fit the same model in
the reverse direction, i.e. computed bqps from the single-blind reviews and
fit a Theta' in predicting the double-blind reviews. I am tempted to say that
all their result shows is that, conditioned on the double-blind reviews,
famousness is useful in predicting single-blind reviews and the trend is
positive; but this could conceivably be true in the reverse case as well.
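For what it's worth, here's a toy sketch of the model family 5.1 seems to describe (synthetic data and made-up coefficients, not the paper's actual data or code): a logistic regression of single-blind accept decisions on the blind score bqps plus prestige indicators, where the reported per-coefficient p-values would be the usual Wald tests of the null that each prestige coefficient is zero.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the paper's features (hypothetical, not their data):
# bqps is the blind quality score; fam/uni/com are prestige indicators.
bqps = rng.normal(size=n)
fam = rng.binomial(1, 0.2, size=n)
uni = rng.binomial(1, 0.3, size=n)
com = rng.binomial(1, 0.1, size=n)

# Simulate single-blind accept decisions with a built-in prestige effect.
eta = -0.5 + 1.0 * bqps + 0.8 * fam + 0.5 * uni + 0.6 * com
accept = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# Fit the logistic regression by Newton-Raphson.
X = np.column_stack([np.ones(n), bqps, fam, uni, com])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, X.T @ (accept - p))

# Wald test per coefficient: under H0 (the coefficient is zero, i.e. the
# reviewers ignore that feature), beta_hat / se is approximately N(0, 1).
se = np.sqrt(np.diag(np.linalg.inv(hess)))
pvals = np.array([math.erfc(abs(z) / math.sqrt(2)) for z in beta / se])
```

Under that reading, a small p-value on fam/uni/com means prestige carries information about single-blind scores beyond what the blind score explains, which is consistent with the claim in the introduction but, as you say, doesn't by itself rule out the reverse fit behaving similarly.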

~~~
pacbard
It looks like they fit a (multinomial?) logistic regression. Unfortunately,
the raw model coefficients are difficult to interpret because each
coefficient is a log odds ratio - therefore on a weird metric.

I would suggest looking at the odds ratio column for a number that is more
interpretable. An odds ratio of 1 represents equal odds of being accepted. An
odds ratio greater than 1 represents better odds of acceptance (e.g., a paper
from a top university has 1.5 times the odds of acceptance of a paper not from
a top university or, said another way, is roughly 50% more likely to be
accepted).

Odds ratios less than 1 have a similar but opposite interpretation. For
example, female-authored papers have 0.78 times the odds of acceptance of
male-authored ones (or roughly 22% lower).

I wish they had reported marginal probabilities of acceptance because odds
ratios can be misleading when marginal probabilities are small or large (e.g.,
if group 1 has a marginal probability of acceptance of 1% and group 2 has 3%,
their odds ratio will be about 3 but the absolute difference is basically
negligible).
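The odds-ratio arithmetic (and the small-baseline caveat) is easy to check directly; a minimal sketch with made-up baselines:

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

# Apply the same odds ratio of 3 at two hypothetical baseline probabilities:
# at a 1% baseline the absolute change is tiny, at 40% it is large.
for base in (0.01, 0.40):
    new = prob_from_odds(3 * odds(base))
    print(f"baseline {base:.0%} -> {new:.1%} (absolute change {new - base:.1%})")
```

Same multiplicative effect on the odds, very different practical meaning, which is exactly why marginal probabilities would have helped.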

~~~
bo1024
Thanks for the help. The odds ratios are easier to interpret, but the big
question I still have is how this shows a (statistically significant)
difference between double-blind and single-blind reviews....

------
kcorbitt
Devil's advocate: sometimes, knowing the author of a work really _can_ allow a
reviewer/reader to make a more accurate assessment of its value.

As an extreme example, if Stephen Hawking submits a paper that calls into
question the speed of light as a universal constant, I'm much more likely to
take it seriously (and in the peer-review case, would be much more likely to
recommend it for publication) than the same paper from some unknown "up and
coming" physicist. This is because Hawking is an expert in the field, and I
trust that if there were any obvious flaws in the analysis he would be more
likely to have discovered them than a randomly-selected unestablished
researcher.

No reviewer has time to independently investigate and verify every single
claim a paper makes, and knowing that the paper comes from an elite
institution might very reasonably be used as a signal that it should be given
the benefit of the doubt.

~~~
eesmith
How are you being a devil's advocate? As far as I can tell, your viewpoint is
completely described in the conclusion, quoted here:

"In conclusion, the heart of our findings is that single-blind reviewers make
use of information about authors and institutions. Specifically, single-blind
reviewers are more likely to bid on papers from top institutions and more
likely to recommend for acceptance papers from famous authors or top
institutions, compared with their double-blind counterparts."

"The primary ethical question is whether this behavior is acceptable. In one
interpretation, single-blind reviewers make use of prior information that may
allow them to make better overall judgments. As a consequence, however, it may
be that other work is disadvantaged, in the sense that two contributions of
roughly equal merit might be scored differently by single-blind reviewers, in
favor of the one from a top school, while double-blind reviewers may not show
this bias as strongly."

Or are you arguing that there is no ethical concern at all? The introduction
seems to cover the main pro and con arguments, as part of the justification
for the need for experimental evidence.

~~~
humanrebar
> How are you being a devil's advocate?

Popular opinion among scientists, engineers, etc. is that science stands (and
should stand) on its own without needing support from appeals to authority,
bandwagon effects, etc.

~~~
eesmith
Okay, but how was kcorbitt being a devil's advocate with respect to this
paper?

~~~
humanrebar
He was being devil's advocate with respect to popular opinion, not this
particular paper.

~~~
eesmith
The paper was pretty clear that the popular opinion (among those with
knowledge of the topic) is mixed, with advocates for both single blind and
double blind, and that the vast majority of evaluations are done single blind
- that is to say, aligned with kcorbitt's position.

What do you mean by "devil's advocate"? I can't make sense of how to apply
either my understanding of the term or the description at Wikipedia to
kcorbitt's comment.

------
lancebeet
> Compared to double-blind reviewers, we saw about a 22% decrease in the odds
> that a single-blind reviewer would give a female-authored paper a favorable
> review, but due to the smaller count of female-authored papers this result
> was not statistically significant. In an extended version of our paper, we
> consider our study as well as a range of other studies in the literature and
> perform a “meta-analysis” of all these results. From this larger pool of
> observations, the combined results do show a significant finding for the
> gender effect.

This sounds like p-hacking. They're testing for significance, and when that
fails, they reuse the same data along with other data to achieve significance.
Am I mistaken?
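For reference, the mechanics of the pooling they describe can be seen in a toy fixed-effect (inverse-variance) meta-analysis - toy numbers, not the paper's: estimates that individually miss p < 0.05 can combine into a significant pooled estimate, which on its own is legitimate, not p-hacking.

```python
import math

# Hypothetical per-study results: (log odds ratio estimate, standard error).
studies = [(-0.25, 0.15), (-0.30, 0.20), (-0.20, 0.18)]

# Fixed-effect pooling: weight each study by the inverse of its variance.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# Two-sided p-value from the pooled z statistic.
z = pooled / pooled_se
p = math.erfc(abs(z) / math.sqrt(2))
```

The statistical concern you raise would be about selectively deciding to run the meta-analysis after seeing a non-significant result, not about the pooling itself.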

~~~
tomw2005
Are you referring to meta analyses? They are often used to find p-hacking:

[1] [http://training.cochrane.org/resource/introduction-meta-analysis](http://training.cochrane.org/resource/introduction-meta-analysis)

[2] [http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106](http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106)

------
bagrow
Having submitted to that conference, I really wish they had told authors which
arm each review came from. I'm very frustrated about that submission.

Good, important findings from the experiment.

------
greatfireball
It is a bit odd that the study did not include a zero-blind process (where
both the reviewers and the authors know each other's identity). I'd be very
interested in seeing how the results compare to the two conventional peer
review processes.

