
Differences between open science and reproducibility - quibbles
https://doi.org/10.1073/pnas.1921320117
======
groceryheist
This paper is about differences between the academic communities discussing open
science and reproducibility. One main takeaway is perhaps unsurprising: the
open science community is apparently more collaborative, with a more connected
co-authorship network and more use of "prosocial" language in paper abstracts.

The more interesting finding is that there are more women in high-status
(first or last) author positions in the open science community (when the
number of authors is smaller), and there is an increasing trend in this
community of women occupying such positions. This is what would be predicted
by theories that STEM is an individualistic enterprise less likely to attract
people with communal values. Women are more likely to hold communal values, and
this is often offered as a cultural explanation for gender gaps in STEM. But
the open science community is a part of STEM that sees communal practices
(specifically, the publication of data and code along with information about
findings) as key to improving science. This is in contrast to the
reproducibility community, which has legitimate criticisms of established
scientific practices but does not emphasize prosocial practices in the same
way the open science community does.

In sum I think the paper is useful by

1\. Showing that while "open science" and "reproducibility" have some
superficial similarities they are distinct communities with interesting
differences.

2\. Showing ways that the open science community seems more collaborative and
communal, and thus more attractive to women (and it may be this way because
women are helping to drive it).

The paper also has some shortcomings. Names are not a reliable proxy for
gender, and gender isn't binary. There's also a lot of discussion about
diversity and team science that honestly doesn't seem to have much to do with
the empirical contributions of the paper.

~~~
dependenttypes
> there are more women in high status (first or last) author positions in the
> open science community

Is this true even when excluding fields and papers where names go in
alphabetical order?

~~~
devaboone
Which fields put names in alphabetical order? In medicine, at least, deciding
who goes in the first and last author spots is a really big deal, and fights
over this sometimes ruin professional relationships.

~~~
joppy
Author names in alphabetical order is pretty ubiquitous in pure mathematics.
There are also usually few (1-5) authors on a paper, and some people who
contribute a small amount to a paper (say, a nicer proof of a lemma) are not
listed as an author, but thanked elsewhere in the paper.

I think the reason this works for pure maths and not some other fields is that
pure maths is quite “slow”, in that people don’t release papers nearly as
frequently as other fields, and the papers tend to be longer.

------
johndoe42377
There is no science without reproducibility. Merely stories, or formalized
stories people call models.

~~~
ternaus
Not all people share your point of view. :(

From time to time, I review scientific papers for machine learning journals.

In modern machine learning, theoretical papers take the form: we looked at
problem A, here is a pattern B, and it holds because <math>.

Nearly all papers that I get for review are applied papers: we took dataset
A, model B, tweaked knob C, and our result is better than paper D, but we do
not know why. I am fine with this approach. Many good ML ideas were introduced
in this way.

But I reject papers whose authors did not provide the dataset and code that
reproduce the results.

This is not an issue for other reviewers, and my reject is often overridden by
the editors: papers get accepted even if there is no way anyone will ever be
able to reproduce them.

------
analog31
My take-away from the abstract is that this is not about open science and
reproducibility _per se_ but about academic disciplines that have emerged to
study those things.

~~~
austincheney
Some academic fields, such as social psychology, have serious problems with
reproducibility. The preference for social interaction over reproducibility
can in part be explained by the subjectivity of the field.

~~~
amitport
I just want to add that almost all fields have serious problems with
reproducibility. In my case: distributed machine learning... You may think
otherwise because subjectivity is not involved, but many "top-tier" papers
publish code and release parameters that, _if runnable_ , do not reproduce the
paper's results.

~~~
mjburgess
Err.. well, social psych and ML don't constitute "almost all fields".

If someone asked me for a list of where to find pseudo-science, it would be a
fairly small set: social psych, nutrition, AI.

~~~
elcritch
Biological fields (cancer drugs, early-stage drug research, etc.) often don't
perform much better. In many other hard-science fields, like say graphene
production via CVD (which a friend worked in), researchers intentionally make
their work almost impossible to replicate. The papers look fine, but the
authors quickly learn which parameters to exclude to make the results nearly
impossible to reproduce. Otherwise they'd lose their competitive advantage, and
potential money from chip manufacturers, etc. Pretty much everyone from
toddlers to PhD researchers is a master of game theory, conscious or not.
Luckily there are also a decent number of scientists who try to counterbalance
those urges, but I believe the current socioeconomics are pushing them out.
Hopefully I'm wrong there.

~~~
mjburgess
RCTs are category-changing.

If your field uses RCTs, it's a science, even if the results are occasionally
wrong due to incompetence or bad statistics.

The issue with social psych et al. is that they are methodologically
/incapable/ of producing reproducible results. It's association modelling of
extremely complex domains. Essentially superstitious.

~~~
elcritch
Agreed, the amount of superstition in many fields is mind-boggling.

Even then, in many fields like comp. sci. the role of RCTs is pretty fuzzy
(and many results about OO and FP are more superstition than science). So I do
consider much of C.S. to be more akin to philosophy than science. In, say,
physics, the concept of RCTs doesn't apply directly, but the effect is the same
underlying hypothesis -> experiment -> re-hypothesize -> experiment cycle.
That cycle requires predictions of novel events, which fulfills the same role;
otherwise it's just data fitting.

One "technology" that could strongly improve our ability to do proper science
without RCTs, or even improve RCT design and hypothesis formulation, would be
properly integrating causation into statistics (particularly via Bayesian
methods). See the work of Judea Pearl [1] for a digest of current work on
causal maths. Scientific fields could then clearly define the degree of effect
or causation for various results.
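
To make the idea concrete: Pearl-style backdoor adjustment can be sketched in a
few lines on synthetic data. The variables, effect sizes, and setup below are
all made up for illustration; the point is only that the naive conditional
estimate is biased by a confounder, while adjustment recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Synthetic data: confounder Z influences both treatment X and outcome Y.
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))   # treatment assignment depends on Z
y = rng.binomial(1, 0.1 + 0.2 * x + 0.5 * z)      # true effect of X on Y is 0.2

# Naive (confounded) estimate: E[Y | X=1] - E[Y | X=0]
naive = y[x == 1].mean() - y[x == 0].mean()

# Backdoor adjustment: E[Y | do(X=xv)] = sum_z E[Y | X=xv, Z=zv] * P(Z=zv)
def do_x(xv):
    return sum(
        y[(x == xv) & (z == zv)].mean() * (z == zv).mean()
        for zv in (0, 1)
    )

adjusted = do_x(1) - do_x(0)
print(f"naive: {naive:.3f}, adjusted: {adjusted:.3f}")
```

With this setup the naive difference lands near 0.5 while the adjusted
estimate recovers roughly 0.2, the effect actually built into the simulation.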

Even in medicine/biology, RCTs don't always confirm a theory, since you can
have overlapping effects where the tested variable does work, but not for the
reasons in a given hypothesis. For example, see statins and cholesterol, which
work to a degree, but largely because cholesterol is a necessary but not the
only ingredient in plaque formation and heart disease. RCTs confirmed the early
results on statins, but newer, more powerful statins failed to deliver the
expected benefits. It turns out it's much more complicated (one random Google
result plucked for context [2]).

I imagine those cases in medicine as similar to "microbenchmarking" in C.S.,
where it's easy to show that small-scale microbenchmarks hold but they fail to
improve overall system performance. Sometimes they make the overall system
worse. Medicine in particular doesn't do enough real science: RCTs are
performed at the "microbenchmark" level but rarely for the overall system,
partly due to cost and difficulty. Hence I think something like causal maths
could _really_ help society formally and explicitly estimate causal properties
from groups of RCTs.

Sorry for the semi-rant, but it's been bouncing around my head for a few weeks
now!

1: http://cdar.berkeley.edu/wp-content/uploads/2017/04/Lisa-Goldberg-reviews-The-Book-of-Why.pdf
2: https://www.statnews.com/2019/04/03/statin-heart-disease-prevention-more-than-medicine/

