
A/B testing the gnupg developers. Submitting patches as female and male pseudonyms - slyall
https://twitter.com/isislovecruft/status/811502983615840256
======
Latty
This is one of those things that makes you uncomfortable, because you can't
help wondering whether you would have been one of the people who rejected
those patches.

Unconscious bias is a real thing, and we all know that, and most of us try to
ensure we are as meritocratic as possible. The problem is, it's really hard to
measure your own success at that. I'd like to think that I'm going to be fair
and equal, but it is always hard to inspect your own bias. How are you meant
to ensure you are inspecting that code as well as you would if the author had
a different name?

Of course, it's possible that there are devs that are just straight up
bigoted. It seems like the person tweeting really dislikes the project - it'd
be good to see more than tweets to get a feel for the full story and see the
numbers. If it's 100% of a non-trivial sample size, that's pretty damning.

~~~
Sacho
I suspect it's extremely unlikely that the person tweeting this conducted a
meaningful study. First off, they call it A/B testing, but unless they were
sending the same patch under different pseudonyms, they're using the term
incorrectly.

Second, how did they determine pseudonyms as "male" or "female" sounding? If
you want to study unconscious bias, you'd need to know whether the things that
supposedly trigger it actually do so.

Furthermore, did they account for their own unconscious bias? I can imagine,
using the same unconscious bias theory, that they sent self-assessed lower
quality patches with "female" pseudonyms unconsciously, to prove their theory
correct.

Lastly, the 100% figure is very close to unbelievable, since it carries the
implicit assertion that identity politics, rather than any coding merit, is
the primary driver of decisions in the developer team. To me, this is more
likely to be projection from an identity-politics activist (for whom, by
definition, identity politics is the primary driver).

~~~
isislovecruft
First, you're correct: I never meant to conduct a meaningful study. I did this
purely out of my own curiosity. However, that does not preclude someone else,
of any gender, doing so more scientifically. And to answer your accusation:
yes, I did submit the same patch twice, under two differently gendered names
(although never publicly; I would copy-paste the patch and privately email
it).

Second, my primary source for whether a name is "female-" or "male-sounding"
is historical lists of most popular baby names for each biological sex, found
through google searches.

Furthermore, I certainly have bias. Unfortunately, I have so much bias! As
someone who, in grade school, was named Charlie, and had the shit beaten out
of me for it, I very strongly know how it feels to be a man/boy who doesn't
fit the socially-prescribed norms. However, when I am writing patches for code
which determines a user's security, I do not play games; I want that code
accepted, no matter what. The code submitted as a "female" was usually either
rewritten by a certain gnupg developer (often to include further
vulnerabilities and UX pitfalls) or later accepted as a "male".

Lastly, if you submit 10-30 patches over a decade, it's not very hard to make
the acceptance/rejection rates come out to nice even numbers. I don't think
there's anything as high-level as "identity politics" at play here; I'm merely
suggesting that there is bias and that all of us, no matter what our gender or
how equitable we are, may have it.

~~~
petertodd
It'd be good if you published a list of those patches.

Also if anyone else repeats this experiment, note that there exist crypto
techniques to make it more difficult for the experimenter to do the experiment
dishonestly:
[https://twitter.com/petertoddbtc/status/811559259590660097](https://twitter.com/petertoddbtc/status/811559259590660097)

(I would have said "prove the experimenter's honesty", but those techniques
aren't a 100% thing - you could, for instance, run the whole
commitment/random-beacon process multiple times and only publish the one set
that gets the results you want.)

edit: To be clear, submitting the _same_ patch twice isn't a good way to do
this - if I were a maintainer and learned that an identical patch had been
received from two different pseudonyms, I'd certainly consider rejecting it
purely on the basis that something fishy was going on; that test doesn't say
much about maintainer bias in normal situations.
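For reference, the commitment technique alluded to above can be sketched as a
plain hash commitment: publish a digest of the experiment plan before
submitting any patches, then reveal the plan and nonce afterwards. This is a
minimal sketch, not the scheme from the linked tweet, and the pseudonym names
and plan structure are made up for illustration:

```python
import hashlib
import json
import os

def commit(data: bytes) -> tuple[bytes, bytes]:
    # Bind to `data` with a random nonce so the commitment hides the
    # plan's contents until reveal, yet can't later be reopened to a
    # different plan without breaking SHA-256.
    nonce = os.urandom(32)
    digest = hashlib.sha256(nonce + data).digest()
    return digest, nonce

def verify(digest: bytes, data: bytes, nonce: bytes) -> bool:
    # Anyone holding the earlier-published digest can check the reveal.
    return hashlib.sha256(nonce + data).digest() == digest

# Hypothetical experiment plan, committed to before any patches go out.
plan = json.dumps({"alice_w": "female", "bob_m": "male"},
                  sort_keys=True).encode()
digest, nonce = commit(plan)  # publish `digest` now (e.g. in a tweet)
# ... run the experiment, then publish (plan, nonce) for verification:
assert verify(digest, plan, nonce)
```

As the parenthetical above notes, this alone doesn't prove honesty: an
experimenter could generate many commitments and reveal only the favorable
one, which is why pairing the commitment with a public random beacon is
suggested.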

------
slyall
Quotes from the tweet thread:

 _Have I mentioned I've been A/B testing the gnupg developers over the last
decade by submitting patches as both female and male pseudonyms_

 _SPOILER ALERT: they denied 100% of the patches with female-sounding
pseudonyms_

 _the way i submit patches now is i email one of the decent devs and ask them
to pretend they wrote my code and send it to werner_

 _so far my data shows deviance is high, so it's mostly that there's the
really awful projects and the ones with decent human beings_

------
uitgewis
Any substance to this, other than a couple of tweets?

~~~
uitgewis
Found a paper linked to by the same person:
[https://peerj.com/preprints/1733/](https://peerj.com/preprints/1733/)

~~~
Sacho
That paper describes the opposite effect to what OP is claiming - women as a
whole get their pull requests accepted more often than men. I just skimmed the
paper, but it offers a number of explanations for why this might be the case
(e.g. only more experienced women contribute, women are more "well-known",
women make smaller pull requests, etc.), and all the ones I skimmed were
rejected by the data. While there is probably a meaningful explanation (with
many factors) for why women are more likely to be accepted than men, at the
very least, the implausible proposition that gnupg developers were rejecting
__100%__ of "female" patches is not supported by this paper in any way.

~~~
mijoharas
The paper states that women's acceptance rates are higher only when they are
not identifiable as women:

> However, women's acceptance rates are higher only when they are not
> identifiable as women

~~~
Sacho
Thanks, this is the relevant part of the paper for that claim:

> For outsiders, we see evidence for gender bias: women's acceptance rates
> drop by 10.2% when their gender is identifiable, compared to when it is not
> (χ²(df = 1, n = 18,540) = 131, p < .001). There is a smaller 5.7% drop for
> men (χ²(df = 1, n = 659,560) = 103, p < .001). Women have a higher
> acceptance rate of pull requests overall (as we reported earlier), but when
> they are outsiders and their gender is identifiable, they have a lower
> acceptance rate than men.
Accepting arguendo that this is clear evidence of a gender bias, its effects
are not as clearly pronounced as the tweet tries to imply they are.

~~~
Latty
Sure, except the author of the tweets also states that they believe this
project is a particularly bad outlier, with results varying between projects:
[https://twitter.com/isislovecruft/status/811510038166695937](https://twitter.com/isislovecruft/status/811510038166695937)

