
How a coding error fueled a dispute between two condensed matter theorists - SiempreViernes
https://physicstoday.scitation.org/do/10.1063/PT.6.1.20180822a/full/
======
southern_cross
This article harks back to the discussions of the "reproducibility crisis" in
science that took place here just recently (see link below). In this case, not
coughing up the code used in the simulations in a timely manner led to an
apparently unnecessary multi-year dispute.

[https://news.ycombinator.com/item?id=17702380](https://news.ycombinator.com/item?id=17702380)

~~~
BeetleB
I'm glad this is coming to the fore. Depending on which forum I'm in, I'm
heavily downvoted when I talk about reproducibility problems in hard sciences
- particularly physics. My thesis was in condensed matter, and (in those
days), no one published code. There were a _lot_ of pointless arguments like
the one in this submission (although usually not as heated) - all because
professors were against sharing their code. It was routine to hear researchers
discuss whether they believed a particular paper. Why should belief be a
factor?

Simulation code contains a _lot_ of numerical tricks and hacks - and the
papers _never_ discuss them.[1] So reproducing the results of a paper was
mostly pointless, and the general practice was "Let's see if we can reproduce
their findings using _our_ technique". If you got somewhat similar results,
you'd publish. If you didn't, you'd sometimes publish (remember, there's a
bias against negative results).

[1] In fact, one reviewer even insisted that those details be removed because
the journal was about publishing interesting new findings in science - not
interesting techniques for making the computation more tractable.

------
ApostleMatthew
My PhD adviser was a good friend of Chandler's, and a lot of my doctoral work
was based on previous work done in his lab. I'm honestly really surprised one
of his students/post-docs made a mistake of this magnitude -- generating a
velocity distribution which does not follow the Boltzmann distribution is not
something I'd expect to be a result of code coming from his lab. But we all
make mistakes, and this really just lends more weight to the idea that all
code used in producing published results should be posted along with the
article itself, just like data sets often are (at least from studies with
government funding).
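
For what it's worth, this particular failure mode is cheap to test for: in
equilibrium, each Cartesian velocity component should be Gaussian with
variance kT/m. A minimal sketch of such a check (Python; the function and
names are illustrative, not from anyone's actual code):

    # Sanity check: at temperature T, each Cartesian velocity component
    # of a particle of mass m should be Gaussian with variance kB*T/m.
    import numpy as np
    from scipy import stats

    def check_boltzmann(velocities, mass, kT, alpha=0.001):
        """velocities: (n_samples, 3) array of simulation output."""
        sigma = np.sqrt(kT / mass)  # expected std dev of each component
        for axis in range(velocities.shape[1]):
            # Kolmogorov-Smirnov test against the expected Gaussian
            stat, p = stats.kstest(velocities[:, axis], 'norm',
                                   args=(0.0, sigma))
            if p < alpha:
                raise AssertionError(f"component {axis}: KS p={p:.2e}; "
                                     "velocities are not Boltzmann-distributed")

    # Stand-in for real simulation output (kT/m = 1 in reduced units):
    rng = np.random.default_rng(0)
    check_boltzmann(rng.normal(0.0, 1.0, size=(100_000, 3)), mass=1.0, kT=1.0)

A sampler with the kind of bias described in the article would fail a check
like this on any decent-sized run.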

~~~
ur-whale
Refusing to share their code falls under the "we all make mistakes" umbrella,
you think?

~~~
whateveryou381
Sharing code is not always the answer. Independent thinking implies
independence, in code and more abstractly.

The real 'mistake' is allowing the drama to escalate to the point that it is
toxic. People who are truly interested don't care about who is right and who
is wrong.

When disagreements like this come up, having common and good test cases is
probably the most important thing (and is indeed how the problem, the
generation of non-Boltzmann behavior, was found).

In the end, ST2 itself is flawed, and Princeton admitted that the discussion
of water was not significantly advanced through this drama. Was it worth it?

Understanding the argument and its importance should be the focus.

~~~
YorkshireSeason

       Sharing code is not always the answer

Why let the perfect be the enemy of the good?

Sharing code is not the _full_ answer, especially in case of Monte-Carlo
simulations in physics, because that kind of algorithm is hard to test: What
is the testing oracle? What is the specification?
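
The closest thing to an oracle I know of is statistical: run the code on a
case with a known closed-form answer and demand agreement within a few
standard errors. A toy sketch (Python; purely illustrative):

    # Partial oracle for Monte-Carlo code: estimate a quantity with a known
    # closed-form value and check agreement within the statistical error.
    import numpy as np

    rng = np.random.default_rng(42)

    def mc_second_moment(n):
        """Monte-Carlo estimate of E[x^2] for x ~ Uniform(0,1); exact: 1/3."""
        samples = rng.random(n) ** 2
        return samples.mean(), samples.std(ddof=1) / np.sqrt(n)

    mean, stderr = mc_second_moment(1_000_000)
    # 4-sigma tolerance: false alarms ~0.006%, real bugs fail every time
    assert abs(mean - 1.0 / 3.0) < 4 * stderr, (mean, stderr)

A check like this only catches gross bias, though, not subtler sampling
problems.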

But sharing code is _part_ of the answer.

Setting up a culture where it is unacceptable to submit a paper _without_
open-sourcing the code _and_ suitable testing (for simple edge cases), _and_
suitable scripts that make reproducing the software simulations easy, is good
scientific 'hygiene'. See for example [1, 2] for efforts towards reproducible
software submissions in computer science.

Reproducibility is the very essence of the scientific method.

[1]
[http://evaluate.inf.usi.ch/artifacts](http://evaluate.inf.usi.ch/artifacts)

[2] [http://www.artifact-eval.org/](http://www.artifact-eval.org/)

------
fwdpropaganda
Having been in academia, I can tell you from my experience that there are two
main reasons why scientists are often reluctant to share their code:

A) embarrassment at having other people see how bad it is. Not necessarily
wrong, but bad in other ways.

B) wanting to keep an advantage by continuing to publish on top of the work
already done, whereas anyone else wanting to get in on the same idea would
have to first re-build the original work.

~~~
ApostleMatthew
In my case, it is very much column A.

~~~
ur-whale
If you're embarrassed by your research code, how do you know it works, and how
do you know what you're going to publish isn't built on quicksand?

~~~
ApostleMatthew
Embarrassed by its cobbled-togetherness, not by whether or not it works.

~~~
ur-whale
Not really answering the question, which was "how do you know it works?"

I would contend that if you have proven to yourself that your code works and
are therefore capable of proving it to other folks (via e.g. solid testing),
you should not be ashamed of the spit and glue.

It is research code; we all know what that looks like. But if, on the other
hand, you haven't proven to yourself that it works, then it's definitely
something to be ashamed of - scientifically speaking.

~~~
ApostleMatthew
It's much the same reason why you wouldn't go to work wearing a days-old,
stained shirt and ripped pants -- yeah, you're more than likely going to work
just as hard and as well as if you were wearing clean clothes, but that's not
going to stop people from judging you based on your appearance.

Most academics write code to just work. Not to work well, or to be
generalized, or to be efficient -- just to work. And while that's absolutely
fine, as your results being reproducible from the code is all that really
matters, a lot of people don't see it that way: they will only see code
slapped together haphazardly and dismiss you because of it.

~~~
mannykannot
This article shows very clearly why that does not work. The Princeton team
could not reproduce the Berkeley results, yet the inaccessibility of the code
meant that the latter persisted as a road-block for almost a decade.

Imagine if this reproducibility excuse were applied to experimental results
and technique: we don't have to be careful or explain in detail what we are
doing, as reproducibility will take care of any errors. One consequence would
be that, as the current state of knowledge became less certain, it would
become less clear what to do next.

------
ur-whale
Modern science journals, in the same way they don't accept articles without
proper references to previous work, should not accept research articles unless
there's a link to the source code in the article.

------
bigiain
> “I had and was very willing to share the code,” he says. What he didn’t
> have, he says, was the time or personnel to prepare the code in a form that
> could be useful to an outsider.

That's what this license is for:
[http://matt.might.net/articles/crapl/](http://matt.might.net/articles/crapl/)

~~~
nikanj
Once you release code, people googling you will find that code - and use it as
a measuring stick of your coding skills.

In today's rockstar-obsessed recruiting hell, I would not want any quick-and-
dirty hacks published under my name.

~~~
adrianN
Condensed matter theorists usually don't look for a job that measures them on
the quality of their research code.

~~~
nikanj
Tenure track is a cruel funnel, and if you don't make the cut, it's time to
find a job outside of academia.

