
Scraping Google's GitHub - throwaway9342
Google does a lot of their development in the open. Almost all members use either a real name, a real photograph, or a handle that can be traced back to their identity. I assume these Googlers are all programmers.

I went through all 1494 users at https://github.com/orgs/google/people and classified them into Male/Female/Unknown (this took about ten hours). I then used the GitHub API to pull information about each user, as well as contributions data from all of Google's GitHub repositories.

I'm not entirely sure what to make of these results. They're potentially pretty incendiary, which is why I'm posting under a throwaway. Screenshot: http://i.imgur.com/bJXyTqX.png, OpenOffice spreadsheet: https://filetea.me/#n3wTD4xSK6rTn6xxZMtkxLq2w.

It's pretty easy to grab the data from GitHub's API if anybody wants to try to replicate this.
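
For anyone attempting a replication, here is a minimal Python sketch of one way the pull and the tally could be done. It is not the script used for the numbers above. The helper names (`fetch_json`, `get_all`, `summarize`) and the `M`/`F`/`U` labels are hypothetical. It assumes the public REST endpoints `/orgs/{org}/public_members`, `/orgs/{org}/repos` and `/repos/{owner}/{repo}/contributors`, which only expose public members and are rate-limited without a token, so the counts may not match the people page.

```python
import json
import urllib.request
from collections import Counter

API = "https://api.github.com"


def fetch_json(url):
    """Fetch one page of JSON from the GitHub REST API (unauthenticated)."""
    req = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "replication-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_all(path, fetch=fetch_json):
    """Walk a paginated list endpoint until an empty page comes back."""
    items, page = [], 1
    while True:
        sep = "&" if "?" in path else "?"
        batch = fetch(f"{API}{path}{sep}per_page=100&page={page}")
        if not batch:
            return items
        items.extend(batch)
        page += 1


def summarize(labels, contributions):
    """Return {label: (share of users, share of contributions)}.

    labels:        login -> hand-assigned label, e.g. "M", "F" or "U"
    contributions: login -> total contribution count across repositories
    Contributors without a label (non-members) are ignored.
    """
    users = Counter(labels.values())
    commits = Counter()
    for login, count in contributions.items():
        if login in labels:
            commits[labels[login]] += count
    total_users = sum(users.values())
    total_commits = sum(commits.values())
    return {
        label: (
            users[label] / total_users,
            commits[label] / total_commits if total_commits else 0.0,
        )
        for label in users
    }
```

In use, `get_all("/orgs/google/public_members")` would give the member logins, and summing the `contributions` field from `get_all(f"/repos/google/{name}/contributors")` over every repository would give the second argument to `summarize`. The classification itself still has to be done by hand.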
======
mtmail
What question does scraping this data aim to answer? What is your own opinion
after examining the raw data you collected?

~~~
throwaway9342
My opinion is that this is very surprising given their PR. When I think of
Google's gender ratio in engineering I think "20% women".

If you look at their GitHub, it is immediately obvious that the skew is much
more than that. I expect them to play it up a bit, but the actual disparity is
insanely high.

I would not have predicted that women would account for only 5% of their open
source developers and 1% of the contributions. And I'm still suspicious that
something weird must be going on to get this result.

