The more interesting finding is that women occupy high-status (first or last) author positions more often in the open science community (at least when author lists are small), and that their share of such positions in this community is growing. This is what you would predict from theories that frame STEM as an individualistic enterprise less likely to attract people with communal values. Women are more likely to hold communal values, and this is often offered as a cultural explanation for gender gaps in STEM. But the open science community is a part of STEM that treats communal practices (specifically, publishing data and code alongside findings) as key to improving science. This contrasts with the reproducibility community, which has legitimate criticisms of established scientific practices but does not emphasize pro-social practices in the same way.
In sum, I think the paper is useful by:
1. Showing that while "open science" and "reproducibility" have some superficial similarities, they are distinct communities with interesting differences.
2. Showing ways in which the open science community seems more collaborative and communal, and thus more attractive to women (and it may be this way precisely because women are helping to drive it).
The paper also has some shortcomings. Names are not gender, and gender isn't binary. There's also a lot of discussion of diversity and team science that honestly doesn't have much to do with the paper's empirical contributions.
Is this true even when excluding fields and papers where author names are listed in alphabetical order?
I think the reason this works for pure maths and not some other fields is that pure maths is quite “slow”, in that people don’t release papers nearly as frequently as in other fields, and the papers tend to be longer.
From time to time, I review scientific papers for Machine Learning journals.
Modern machine learning has theoretical papers of the type: we looked at problem A, here is a pattern B, and it holds because <math>.
Nearly all papers that I get for a review are applied papers: We took dataset A, model B, tweaked knob C and our result is better than paper D, but we do not know why. I am fine with this approach. Many good ML ideas were introduced in this way.
But I reject papers whose authors did not provide the dataset and code needed to reproduce the results.
This doesn't bother other reviewers, and my reject is often overridden by editors, so the papers get accepted even though there is no way anyone will ever be able to reproduce them.
If someone asked me for a list of where to find pseudo-science, it would be a fairly small set: social psych, nutrition, and AI.
If your field uses RCTs, it's a science, even if the results are occasionally wrong due to incompetence or statistical error.
The issue with social psych et al. is that they are methodologically /incapable/ of producing reproducible results. It's association modelling of extremely complex domains. Essentially superstitious.
Even then, in many fields like comp. sci. the role of RCTs is pretty fuzzy (and many results, e.g. about OO or FP, are more superstition than science). So I do consider much of C.S. to be more akin to philosophy than science. Still, in, say, physics, the concept of an RCT doesn't apply directly, but the effect is the same: the underlying hypothesize -> experiment -> re-hypothesize -> experiment cycle. That cycle requires predictions of novel events, which fulfills the same role; otherwise it's just data fitting.
One "technology" that could strongly improve our ability to do proper science without RCTs, or even improve RCTs and hypothesis formulation, would be properly integrating causation into statistics (particularly via Bayesian methods). See the work of Judea Pearl for a digest of current work on causal maths. Scientific fields could then clearly define the degree of effect or causation for various results.
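A minimal sketch of what causal adjustment buys you, using only observational data. This is my own toy illustration of Pearl's backdoor adjustment, P(Y | do(X)) = sum_z P(Y | X, Z=z) P(Z=z), not an example from his texts: with a confounder Z recorded, the adjusted estimate recovers the (here, zero) causal effect that naive conditioning badly overstates.

```python
import random

random.seed(0)

# Simulated observational data with a confounder:
# Z -> X and Z -> Y, but X has NO real effect on Y.
n = 100_000
data = []
for _ in range(n):
    z = random.random() < 0.5                   # hidden common cause
    x = random.random() < (0.8 if z else 0.2)   # Z pushes X up
    y = random.random() < (0.8 if z else 0.2)   # Z pushes Y up; X irrelevant
    data.append((z, x, y))

def p_y_given_x(x_val):
    """Naive conditional P(Y=1 | X=x), ignoring the confounder."""
    rows = [y for (z, x, y) in data if x == x_val]
    return sum(rows) / len(rows)

# Naive association: X appears to strongly "cause" Y.
naive_effect = p_y_given_x(True) - p_y_given_x(False)

def p_y_do_x(x_val):
    """Backdoor adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) P(Z=z)."""
    total = 0.0
    for z_val in (True, False):
        rows = [y for (z, x, y) in data if x == x_val and z == z_val]
        p_y = sum(rows) / len(rows)
        p_z = sum(1 for (z, _, _) in data if z == z_val) / n
        total += p_y * p_z
    return total

adjusted_effect = p_y_do_x(True) - p_y_do_x(False)

print(f"naive association:      {naive_effect:+.3f}")    # large, spurious
print(f"backdoor adjustment:    {adjusted_effect:+.3f}")  # near zero
```

The catch, of course, is that the adjustment is only valid if you have measured the right confounders, which is exactly the assumption an RCT lets you avoid.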
Even in medicine/biology, RCTs don't always confirm a theory, since you can have overlapping effects where the tested variable does work, but not for the reasons in a given hypothesis. For example, see statins and cholesterol, which work to a degree, but largely because cholesterol is a necessary but not the only ingredient in plaque formation and heart disease. RCTs confirmed early results for statins, but newer, more powerful ones failed to materialize benefits. It turns out it's much more complicated (one random Google result plucked for context).
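One toy mechanism that reproduces that pattern (early RCT wins, then a plateau as drugs get stronger) is risk running through two pathways while the drug touches only one. This is my own construction for illustration, not the actual statin biology:

```python
import random

random.seed(1)

# Toy model, NOT real biology: an event occurs if EITHER of two independent
# risk pathways crosses a threshold. The drug shifts only the first pathway
# down by `reduction`; the second pathway is untouched.
def event_rate(reduction, n=50_000):
    events = 0
    for _ in range(n):
        treated = random.random() - reduction  # pathway the drug acts on
        other = random.random()                # pathway the drug ignores
        if treated > 0.7 or other > 0.7:
            events += 1
    return events / n

placebo = event_rate(0.0)       # ~0.51
moderate = event_rate(0.15)     # ~0.40: early RCT shows a clear benefit
strong = event_rate(0.3)        # ~0.30: treated pathway fully suppressed
very_strong = event_rate(0.6)   # ~0.30: a more powerful drug adds nothing

print(placebo, moderate, strong, very_strong)
```

The "microbenchmark" RCT on each drug is perfectly valid, yet extrapolating from it to "stronger is better" fails because the residual risk lives in the pathway the intervention never touched.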
I imagine those cases in medicine as similar to "microbenchmarking" in C.S., where it's easy to show that small-scale microbenchmarks are true, yet they fail to improve overall system performance. Sometimes they make the overall system worse. Medicine in particular doesn't do enough real science: RCTs are only performed at the "microbenchmark" level and rarely for the overall system, partly due to cost and difficulty. Hence my view that something like causal maths could help society formally and explicitly estimate causal properties from groups of RCTs.
Sorry for the semi-rant, but it's been bouncing around my head for a few weeks now!