
Keeping computers from ending science's reproducibility - yummyfajitas
http://arstechnica.com/science/news/2010/01/keeping-computers-from-ending-sciences-reproducibility.ars?utm_source=rss&utm_medium=rss&utm_campaign=rss
======
deckard
Here's a practical solution I have proposed in my community (autonomous
robots):

1. Package the code and data into a tarball or VCS repository.
2. Place the package on a long-lived website.
3. Compute a SHA1 hash or similar from the package (if git is used, this is conveniently the revision ID).
4. Publish the URI and hash in any paper that makes claims based on that code or data.
5. As a reviewer, prefer papers that follow this method, all else being equal.
6. As an editor, suggest that submissions use this method.

(Edit 2: In case it's not obvious, the purpose of the hash is to allow users
to be pretty confident that the code they downloaded is indeed exactly the
code used in the paper. By putting the hash in the paper, I make this promise.
If I want to make an improved version available, I just put it up at the same
site, but I _must_ make the exact original available and identifiable as such.
This simple method of ensuring identifiability is our contribution.)
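
(For the curious, here is a minimal sketch of the verification step in Python, with a made-up tarball name: compute the SHA-1 of the downloaded package and compare it against the hash printed in the paper. With git, `git rev-parse HEAD` on the checked-out revision plays the same role.)

```
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, e.g. the released tarball."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare this digest against the hash published in the paper.
print(sha1_of_file("experiment-code-and-data.tar.gz"))
```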

My group does this with every paper. I have a paper describing this method
coauthored with a student under review now at a good journal, and I'm looking
forward to seeing the response. (Edit 1: see link in comment below)

I'd also appreciate feedback from HN.

~~~
plinkplonk
"My group does this with every paper."

Any links?

~~~
deckard
Sure. We started doing this last summer, and this is the first paper accepted
for publication that uses the method.

<http://autonomy.cs.sfu.ca/doc/wawerla_icra10.pdf>

and here's a draft of our paper on the methodology, with rationale.

Long paper: <http://autonomy.cs.sfu.ca/doc/wawerla_submitted_2009.pdf>

Original short workshop paper:

<http://autonomy.cs.sfu.ca/doc/wawerla_rss09_workshop.pdf>

For context, here's the lab's publication list:

<http://autonomy.cs.sfu.ca/publications.html>

We plan to continue this, even if the idea doesn't catch on.

~~~
plinkplonk
From your draft paper,

"A few months ago, a graduate student in another country called me (Vaughan)
to ask for the source code of one of my multi-robot simulation experiments.
The student had an idea for a modification that she thought would improve the
system’s performance ... we were able to offer the requesting student some
code that may or may not be that used in the paper. This was better than
nothing, but not good enough, and we suspect this is quite typical in our
community."

FWIW, I've had a similar experience.

A few years ago I was reading Daphne Koller's "Using learning for
approximation in stochastic processes" and it was a very elegant and powerful
idea, but it didn't seem to me at the time that there was enough detail in the
paper to implement the algorithm and I wrote to Dr Koller and asked if she had
source code available. She replied that her student had departed a decade ago
and the code was effectively lost.

Dr Koller was very helpful and clarified some doubts I had, and in the end I
managed to re-implement the algorithm, so a happy ending after all. But if
code were archived and made public for every paper, it would have really
helped.

Awesome that you guys are adopting this methodology.

------
andrewcooke
i had this problem many years ago (15?). at the time i was working as a
postdoc, calculating the evolution of the ionizing background with redshift
from the inverse effect (lyman alpha clouds near quasars get fried by the
quasar; the extent of this gives an indirect way to measure the ionizing
background at that redshift).

i had a bunch of perl scripts (ah, those were the days) that mangled various
files before feeding them into fortran least-squares stats code that took a
day or so to run.

by the end, it was pretty much chaos. i was a self-taught programmer, these
were probably the second or third "significant" programs i had ever written.
nothing was documented, everything took so long that i couldn't check much...
i had bugs, of course.

in the end i published. maybe 6 months later i got an email from someone in
the states. they were trying to reproduce my results. in the end, they did (as
far as i know).

so the system worked.

incidentally (perhaps the only useful point here) they must have used
different data. that's something worth explaining in more detail - "my" data
came from years of painstaking work by a bunch of people working for my thesis
supervisor. yet 6 months later the results could be duplicated from a week or
so of data from a much more powerful telescope (the keck). so data in research
often aren't as critical as you might think. things progress at such a rate
that even if you don't share data, it's trivial to reproduce just a short time
later... (and i am pretty sure that this is true in gene sequencing, for
example)

~~~
frossie
But this is a problem that _can_ be tackled, and those who take it seriously
already do so. For example, in our pipelines we use an infrastructure that
adds every command executed on the file, with its exact parameters, to the
metadata of the file, starting from one canonical archived file - and hence
one can indeed reproduce the result of the pipeline manually, given
sufficient time and dedication. [Edit: we also write the git SHA-1]
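
A toy sketch of the general idea (the command and file names are made up, and a real pipeline writes into the file's own metadata rather than a sidecar log): each processing step appends the exact command and its parameters to a provenance record kept with the data, so the result can be replayed from the canonical archived file.

```
import json
import subprocess
import time

def run_and_record(cmd, provenance_path):
    """Run one pipeline step, then append the exact command and a
    timestamp to a provenance log kept alongside the data file."""
    subprocess.run(cmd, check=True)
    entry = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "command": cmd,
    }
    with open(provenance_path, "a") as log:
        log.write(json.dumps(entry) + "\n")

# Hypothetical pipeline step and file names, for illustration only.
run_and_record(["calibrate", "--flat", "flat.fits", "raw.fits"],
               "raw.fits.provenance")
```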

In the same way that, say, biology labs have processes they engage in to
convince us that their samples are not contaminated, we can have processes
that lead to the reproducibility of data.

~~~
andrewcooke
[is that frossie frossie? hi!]

i'm not sure what problem you're talking about - someone reproduced the
results with separate code and data, so what's to worry about? (i don't mean
that because it was confirmed it was ok, but rather that if it had been wrong,
we would have known in the end... after all, people make mistakes all the time
- science is a collective enterprise that relies on many overlapping,
interlocking pieces)

~~~
frossie
[Yeah, didn't I share an office with you 15 years ago? :-) ]

But data can't always be reproduced - Shoemaker-Levy won't hit Jupiter again
any time soon, and while big results will cause a rush for verification, the
little steps are usually believed as is. Moreover astronomy is a special case
where - let's face it - if we go down the wrong path for a decade or so,
nobody's bleeding.

Take, on the other hand, the processing of climate data - if crap engineering
(not malice) causes garbage to come out of the data, this is a problem for
everybody. So I think the OP is right that proper auditing of computer-
processed science data should be possible; I was just pointing out that it's
not as hard (technically) to achieve as people think.

------
bioweek
Speaking of reproducibility, my friend, who is getting a PhD in finance, told
me he was writing a paper using data from some brokerage. I asked if he would
publish the data, and he told me it's confidential.

I talked myself blue in the face trying to explain how science doesn't work if
you don't give people enough information to reproduce your research! I
couldn't get him to understand though. Arggh so frustrating.

~~~
dantheman
It depends on what type of science you're doing, and what stage the science
is in. At the data-gathering, hypothesis-building stage, observations and
case studies are important, and they are often based on confidential data.

------
jerf
I am often told that I should keep my nose out of other science domains'
business because they know more than I do. However, I think when they start
building their science on top of computers, I start getting a say again.
Here's what concerns me about this increasing use of computers:

It seems like the vast bulk of these simulations are iterative, and therefore
subject to mathematical chaos. How many of these researchers have any clue
what door they are walking through? How many of them know what a strange
attractor is? I'm sure the answer is non-zero; I'm equally sure the answer is
nowhere near 100%.

Small errors cascade even if you consider a non-chaotic classical model. (That
is, not that there is such a thing as an iterative model that is not
potentially subject to chaos, but rather that even if you don't understand
chaos you can see that small errors can cascade. Chaos just makes it worse,
and weirder.) A simulation will have bugs like any other large problem. A non-
programmer approaches bugs by banging on the program until it seems to
generate expected results. (About 50-80% of programmers do that too.)
Therefore, many of these simulations are simply reflections of the simulator's
expected result, due to the effect of the researcher's selection mechanism
running on the results of the simulations they run. How do we verify that this
is not the primary factor in the result of the simulation? _This need not be
conscious_. It need not be ideological, either; I can easily envision a
simulation that "should" return a boring or trivial result being monkeyed with
until it produces something "interesting", because the simulators think the
boring result should not obtain.
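
A tiny illustration of the cascading-error point (my own toy example, the logistic map, nothing to do with any particular field's simulations): two runs that differ only in the twelfth decimal place of the initial condition become completely unrelated within a few dozen iterations.

```
# Iterate x -> r*x*(1-x) from two nearly identical starting points
# and watch the trajectories diverge.
r = 4.0
x, y = 0.3, 0.3 + 1e-12
for step in range(1, 61):
    x, y = r * x * (1 - x), r * y * (1 - y)
    if step % 15 == 0:
        print(f"step {step:2d}: x={x:.6f}  y={y:.6f}  |x-y|={abs(x - y):.1e}")
```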

A lot of algorithms you can use in these simulations are fundamentally
unstable when used iteratively; some exacerbated by floating point errors,
some mathematically unstable even with perfect real numbers. How many of these
simulations use something unstable without even realizing it, given that it
could take a professional mathematician to work out whether that's the case?
Even algorithms thought to be stable and reliable can fall apart under
pathological situations, and one of the odd things about mathematics is just
how often you end up hitting those pathological situations when programming;
far more often than it seems like it should be.
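
And a sketch of the instability point, again a textbook toy rather than anything from a real simulation: the integrals I_n = integral from 0 to 1 of x^n * e^(x-1) dx satisfy the exact recurrence I_n = 1 - n*I_{n-1}, but running that recurrence forward in floating point multiplies the rounding error in I_0 by n!, and the computed values soon become nonsense.

```
import math

# True values of I_n shrink monotonically toward 0, but the forward
# recurrence I_n = 1 - n*I_{n-1} amplifies the rounding error in I_0
# by n!, so the computed values soon go negative or blow up, which is
# impossible for an integral of a positive function.
I = 1.0 - math.exp(-1.0)  # I_0, accurate to ~1e-16
for n in range(1, 26):
    I = 1.0 - n * I
    if n % 5 == 0:
        print(f"I_{n:2d} = {I: .6g}")
```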

In information-theory terms, a simulation cannot contain more information
than the sum total of the input data and the content of the simulation
algorithm. How many simulators understand the full implications of that
statement? _I_ sure don't understand the _full_ implications of that, but what
I do understand makes me pause a bit. Very simple simulations with rules that
can be verified and initial data that is very solid I can deal with; for
instance, I like the cell-automata based social theories that show the spread
of information or political views or something, especially when it is clear
the researchers understand that it's only an approximation. But as the initial
data starts getting sketchy or the simulation grows enormous, I start getting
nervous about the actual information content of the output. Just because the
output _appears_ to be information doesn't prove that it is. It is vitally
necessary to be able to check the simulation against _real data_. For
instance, physical simulations of, say, cars crashing can be verified. How
many simulations can actually be verified, though? Frequently the reason
computers were reached for in the first place is the inability to do the real
experiment. Any simulation that can't be verified should be _presumed_
worthless by default. How often does that happen? (It's 20-f'ing-10 and "the
computer said it, it must be right" still runs rampant through our
culture....)

And of course there's the whole reproducibility issue, where the _absolute
bare minimum_ for science would be to publish the _full_ simulation program,
_all_ data, the necessary invocation and compile instructions to bring the two
together, and all necessary information to understand the input and the
output. Clearly, this is not something that fits in a journal paper, but how
often does this happen at all?

No, I am not referring to any specific discipline here and in particular I'm
not actually referring to climate science. I'm nervous about the whole
movement towards simulations in general.

Note that I'm not reflexively against the idea. Meet these bars and I'm happy;
give me enough data for reproducibility and verify that your simulation is in
fact simulating something real and corresponds to reality and I am happy.
(Many physical simulations fit in here.) But as more disciplines jump in I am
concerned that these bars are not well understood, and I'm seeing ever more
press releases about simulations that can't possibly meet these bars.

~~~
yummyfajitas
_I am often told that I should keep my nose out of other science domains'
business because they know more than I do._

I've only heard such statements coming out of a few fields: math education,
labor economics, climate science and psychometrics of race/gender. You should
ignore such statements; they are nothing more than an attempt to bully you
into accepting received wisdom from activists with a PhD.

As an actual scientist (rather than a political activist with a PhD), I
strongly encourage you to stick your nose into any or all of my fields
(quantum mechanics, PDEs, medical imaging, complex analysis, prediction
markets). If you come up with dumb ideas, I'll even explain why they are dumb,
rather than just demanding that you leave things to the experts.

~~~
btilly
Such statements also come out of biologists discussing evolution. This is not,
however, evidence that they aren't really doing science. Instead it is
evidence that they've been burned out explaining basics over and over again to
Creationists and want to get on with their lives.

However, some do put in the energy for those explanations. One of the results
of that energy is <http://www.talkorigins.org/>.

Hopefully some day someone will take the energy to do the same with climate
science. Because as much as there is a lot of politics there, there is some
real science there as well.

~~~
lionhearted
> Such statements also come out of biologists discussing evolution.

A lot of my friends are scientists, including a couple of biochemists and some
other people who do more or less serious research into topics like that. In
my experience, you're off on how they deal with stupid people and stupid
arguments - instead of saying "get out of biology" and moving on with their lives,
they tend to address and correct errors, debate if necessary, or at least
point the person towards a relevant piece that explains things and take new
criticisms and arguments and address them if necessary.

The scientists I know have always been open to me saying stupid things (and
occasionally not stupid things) and correcting me if I'm wrong, or exploring
together if I might not be wrong. Good biologists don't say, "Get out of
biology".

> Hopefully some day someone will take the energy to do the same with climate
> science.

Oh, hah, I honestly responded quickly and originally missed the climate
science analogy. If you wanted to make an apologia for climate science, then
I do understand why you'd want to draw a parallel between
biology:evolution:creationism and climate science:global warming:deniers.

I was responding to the "want to get on with their lives" part as being wrong
based on my experience, since scientists are usually rather encouraging and
tolerant of dissent. Climate science is not so interested in people and
data that disagree with it, which is a pretty big problem.

~~~
btilly
I'm married to a biology PhD, and at one point spent a couple of years
watching people, including biologists, deal with a constant stream of
Creationists in places like talk.origins.

My experience is that if you're a personal friend, you get more serious
conversation. If you're someone they know but not so closely, they'll have the
argument if pushed but don't feel the need to actively educate. And if you're
a random person spouting on the Internet, it isn't worth their time to get
involved. If pushed to be involved, they don't feel the need to be pleasant
about it.

After spending time myself explaining the same thing over and over again, I've
come to feel the same way. I've also come to realize that there are plenty of
smart people who do not wish to be educated. Including in my direct personal
experience, at least one PhD in mathematics and another in molecular biology.

Based on this experience I have some sympathy for the position of climate
scientists. You spend your life climbing around glaciers in Greenland, and you
don't really feel like spending the rest of it convincing people who don't
want to bother learning the basics about climate.

------
jvdh
A similar argument actually goes for experiments that are somehow affected by
computer networks.

If scientists use grid computing, cloud computing, or just the plain regular
Internet, there is no way to accurately reproduce results for distributed
applications. Luckily, some researchers are aware of this, and there are now
projects starting to build testbeds and infrastructures that provide
environments where experiments can be reliably reproduced.

~~~
sparky
Links, for the curious:

<http://www.planet-lab.org>

[http://www.hpl.hp.com/open_innovation/cloud_collaboration/cl...](http://www.hpl.hp.com/open_innovation/cloud_collaboration/cloud_technical_overview.html)

<http://cloud.cs.illinois.edu/>

<http://www.cs.duke.edu/courses/spring08/cps214/project.html> (Testbeds and
Emulation section)

~~~
wmf
PlanetLab and the clouds are great ways to do _non-reproducible_ research,
because the other users in the system (where sometimes "the system" is the
public Internet) create interference. Reproducible networking research either
has to be simulated or has to be run on an isolated testbed such as Emulab.

~~~
sparky
The results from PlanetLab et al. may not be deterministic, but you should be
able to reach the same conclusions based on repeated experiments. Otherwise,
your results may be a little too fragile to form the basis of sweeping
conclusions.

This may not be ideal, but it is no worse than any branch of science which is
not purely digital.

------
m104
My (biotech) employer largely solves this issue by keeping a copy of the
formal research specs outside of software altogether. All validation documents
and research data are kept in paper form (in addition to digital form) in such
a way that future researchers or inspectors could take those documents and
data and reconstruct the research.

It wasn't always this way, unfortunately. I've been involved with trying to
glean some formulaic/methodological insights from spreadsheets and code and
it's not always possible to reverse engineer the essential methodology or be
sure that mistakes were avoided.

Proper scientific research practices are similar to proper data backup
practices: the documentation (the backup files) is important, but it doesn't
matter if you can't achieve successful reproducibility (restoration).

~~~
bbgm
In the life sciences, provenance is as important as reproducibility, and you
can use software systems to manage provenance. The paper requirement will go
away over time. Even the FDA, which required paper in the past, is moving to
an electronic model. But you do need to document methods somewhere other than
just in code: a reference document that can be shared, even if the specific
implementation is different.

------
ynniv
I have a strong (probably unusual) standard when it comes to computer-modeled
science: if an explanation requires software or data that I don't have access to,
I completely disregard it. That might sound kind of crazy given the state of
science and my chosen profession, but (maybe because of my chosen profession)
I know that a complex system can tell you whatever you want it to tell you. It
won't be the truth, but it will be "convincing" to most people. If the
research really mattered, someone would reproduce it with independent software
and data anyway.

I think this is a good solution to the problem. If most people only believed
in reproducible science, there would be pressure on authors to use software
and data that can be shared, or no publisher would carry their article. Seems
easy to me.

------
alex_stoddard
As a working programmer in biological research science, I would love to see a
requirement that papers involving software be published in the "literate
programming" paradigm. At the very least, peer-reviewed publication should
require that all software be open source (at least in a loose sense). It is
depressing how often a published result depends on custom closed-source software.

------
khaless
Hmm, from what I remember, reproducibility became a big issue in SIGMOD. In
2008 they introduced some new requirements for submitting a paper.

[http://behind-the-enemy-
lines.blogspot.com/2007/08/experimen...](http://behind-the-enemy-
lines.blogspot.com/2007/08/experimental-repeatability-requirements.html)

------
abscondment
Reminds me of an interesting Wired article from 2008: "The End of Theory: The
Data Deluge Makes the Scientific Method Obsolete"
[http://www.wired.com/science/discoveries/magazine/16-07/pb_t...](http://www.wired.com/science/discoveries/magazine/16-07/pb_theory)

~~~
earthboundkid
A fun article, but Peter Norvig of Google has said he feels like it severely
misunderstands his work.

