I am one of those people who actually did the extra weeks/months to properly test/review/document/release my code and data sets (you can apt-get install my "research artifacts").
In retrospect, it was a poor use of my time and a poor use of my sponsoring institution's time. "apt-get install" is NOT what we mean by "reproducible" in science.
High-quality or easy to install code is not necessary for a result to be reproducible. True reproduction would mean coding the algorithm from scratch by following the written description, and that's what it literally means in most other fields of science.
You can't download & install a Large Hadron Collider in an afternoon. Does that mean the LHC experiments are "not reproducible"? Of course not.
But that's not even the important point. The really important point is that, in most cases, high-quality code is not even sufficient for a result to be reproducible! See: the article we're discussing.
IMO, the "blindly rerun the code" definition of reproduction is actually a HUGE barrier to creating a true culture of reproducibility in computer science. It results in super lazy reviewing where "public source code that's easy to install and puts the correct-sounding shit into STDOUT" becomes a stand-in for "paper actually describes a novel idea in enough detail that it can be truly reproduced".
> So what is the optimal outcome here?
An optimal allocation of scientists' time and effort.
As a scientist who has actually done that leg work, I don't think packaging code so that it runs with a single click is the best use of public money in science in 99.9999% of cases. That time is much better spent on writing and other dissemination explaining the ideas that make the code work (in some cases well-documented source code is the best description but in other cases prose is much more effective and illuminating). Or on coming up with new ideas that are even better than the old ones.
Which I guess is just another way of saying that scientists should spend their time on science, not engineering.
P.S. When shitting on "scientists" for not being good enough software engineers, please remember who's going to be doing the actual work you're demanding. It's mostly PhD students who make $30K/yr. And they have to do this work in their free time because their 60 hr/wk day job is fully allocated to doing the actual science. I.e., treat scientists who maintain their code as you would treat FOSS contributors who are making 5x-10x+ less than you while working longer hours. Because maintaining high-quality code is something they are almost certainly doing in their free time.
> High-quality or easy to install code is not necessary for a result to be reproducible. True reproduction would mean coding the algorithm from scratch by following the written description, and that's what it literally means in most other fields of science.
Which failed here. However, you're going to have a difficult time convincing many people that the attributes you described are bad properties; at most you can argue that they might not justify their expense.
> You can't download & install a Large Hadron Collider in an afternoon. Does that mean the LHC experiments are "not reproducible"? Of course not.
By the same token though, the LHC repeats experiments and solicits feedback on how to improve their methods, which they go to great lengths to publish and simulate, because they're aware of this problem.
> IMO, the "blindly rerun the code" definition of reproduction is actually a HUGE barrier to creating a true culture of reproducibility in computer science. It results in super lazy reviewing where "public source code that's easy to install and puts the correct-sounding shit into STDOUT" becomes a stand-in for "paper actually describes a novel idea in enough detail that it can be truly reproduced".
Ah, yes. Yes. "If this code is TOO reproducible then people might reproduce it, and handwave handwave the quality of papers would decline."
That's certainly NOT the case in pure CS papers, which have only improved since the days when folks felt that "Lenses, Bananas and Barbed Wire" was how folks should go about writing papers.
Now, physics might be different. But there is surely a middle ground between "I've shipped you an LHC, just plug it in lol" and "This paper doesn't even remotely describe how we achieved the results."
If you believe that wasting the time of scientists is bad, then surely you're for clear papers with accurate descriptions of the methods so that those who go and reproduce your work are not sent on wild goose chases?
> As a scientist who has actually done that leg work, I don't think packaging code so that it runs with a single click is the best use of public money in science in 99.9999% of cases.
No, we got that part. But surely someone does and maybe you can design your work to leverage that rather than reproducing and discarding scaffolding. My big concern here is that a lot of scientists (like you claim to be) are underqualified and unpracticed at software, and thus are surely seeing at least some aspect of their work distorted by software and hardware issues.
> Which I guess is just another way of saying that scientists should spend their time on science, not engineering.
Scientists are not going to be able to escape engineering. No one else is going to build what they need besides them.
> please remember who's going to be doing the actual work you're demanding. It's mostly PhD students who make $30K/yr. And they have to do this work in their free time because their 60 hr/wk day job is fully allocated to doing the actual science.
Yeah, I'm aware. I suspect their lot would be better if your attitude wasn't that their work is disposable and unimportant.