
Science Code Manifesto - michael_nielsen
http://sciencecodemanifesto.org/
======
noodly

      Without better software, science cannot progress.

Science can progress even without software; the only difference software makes
is the rate of progress and its cost.

      But the culture and institutions of science have not yet adjusted to this reality.

Did you ask yourself why? Maybe because what exists now just works for them? I
don't see on this page any description of the problems this manifesto wants to
solve.

      The code is the only definitive expression of the data-processing methods used: without the code, readers cannot fully consider, criticize, or improve upon the methods.

The code is not as important as the descriptions of the algorithms and the
ideas behind it. Readers should not concentrate on the code but on everything
behind it, and if they want to verify results they should write the code
themselves, increasing the plausibility of the results through independent
verification - including independence of the code.

I'm not buying this.

~~~
lmkg
> _Science can progress even without software; the only difference software
> makes is the rate of progress and its cost._

Depends on the science. Facial recognition software? That requires software to
make progress.

> _Did you ask yourself why? Maybe because what exists now just works for
> them? I don't see on this page any description of the problems this
> manifesto wants to solve._

It is usually to the benefit of the _individual researcher_ to keep the code
under wraps, but not to the benefit of the _field of research_. A researcher
benefits from generating results. They do not benefit from other people being
able to verify their results. In fact, they may even benefit from raising the
barrier to entry, as it makes their own results more important in the field.

> _The code is not as important as the descriptions of the algorithms and the
> ideas behind it_

In theory, this should be 100% true. In practice, results are determined as
much by parameters, implementation choices, and flat-out bugs as they are by
the algorithms and the big ideas. I absolutely agree that people should
re-implement the algorithms rather than share the bugs (Dijkstra wrote against
this). However, what happens when you find a discrepancy between your results
and what a paper said? Without having access to the original source code, you
can't figure out what caused the difference. Anecdotally, my girlfriend has
been in this situation before: she couldn't replicate a (computer vision)
result from someone else's paper. The original author was reluctant to share
their code, so she didn't know whether the difference was a bug on her end, a
bug on their end, or a platform difference. Open source code might have
helped answer that question.
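
To make the "implementation choices" point concrete, here's a minimal sketch
(a toy of my own, not from any particular paper): floating-point addition
isn't associative, so even the order in which two implementations sum the
same numbers can change the result.

    import random

    random.seed(0)
    # toy data with wildly mixed magnitudes, as real measurements often have
    xs = [random.uniform(-1.0, 1.0) * 10 ** random.randint(0, 12)
          for _ in range(10000)]

    s1 = sum(xs)          # one implementation sums in arrival order
    s2 = sum(sorted(xs))  # another happens to sort the data first
    print(s1 == s2)       # typically False: same "algorithm", different sum
    print(s1 - s2)        # small, but can matter at a paper's last digit

Neither sum is wrong, but a replicator comparing against a published number
sees a discrepancy with no way to tell where it came from.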

~~~
noodly
> _Depends on the science. Facial recognition software? That requires software
> to make progress._

Like every computation - you can simulate it on paper :)

> _It is usually to the benefit of the individual researcher to keep the code
> under wraps, but not to the benefit of the field of research. A researcher
> benefits from generating results. They do not benefit from other people
> being able to verify their results. In fact, they may even benefit from
> raising the barrier to entry, as it makes their own results more important
> in the field._

It benefits not only the individual researcher but also the institutions,
because they don't need to maintain code repositories, so publishing is not as
costly as it would be if releasing source code were mandatory. From the
field-of-research point of view, as I wrote earlier, source code is not
important. It may be useful in some situations, but only to an individual
researcher (like your girlfriend).

> _Without having access to the original source code, you can't figure out
> what caused the difference._

I think that's a good reason to share the code, or for discussion between
researchers, but I don't think it's enough to make sharing code mandatory.
Your girlfriend could, for example, write an article pointing out the
differences between her result and the previous result(s) and be done
(assuming she was certain about her result), without ever looking at anyone
else's source code.

------
feral
In one way, I really agree with this initiative. I'd like to raise a
counterargument about replication, though.

In a nutshell, if someone has to reimplement the code from the details in the
paper they've read, it's a great check that the original author isn't just
reporting the results of subtle bugs, or particularities, in their software.

It's true that opening the source allows other researchers to look for bugs in
the code, and that's good. But such checks are inherently less thorough than
having another group replicate the results in a separate environment, working
just from the published paper's details.

One objection to this 'clean room' replication is that perhaps it's not
practical to re-engineer the code that was written for a paper - that's just
too much work. But that's basically saying "well, we can't replicate the
results of this paper, it's too much work" - generally, that is not the way
you want to go about doing science. A cornerstone of science is that we are
willing to accept slower short-term progress in return for more certainty
that what we are doing is correct (and hence, hopefully, faster long-term
progress); painstaking replication is a key part of this philosophy, I think
everyone agrees.

Consider the neutrinos: they did an experiment; others are trying to suggest
reasons for the surprising results. In the unlikely event the result lasts,
other scientists are going to want to redo the experiment. Ultimately, it's
not enough to analyze the data or the experimental setup; you want to
replicate, ideally from the ground up - in a different lab, with different
equipment - and with different code to process the results.

Now, maybe the manifesto is aimed at a world where it's not possible to
replicate based solely on what's in the paper - perhaps there's just too much
detail that can't be included, and hence a lot of the 'science' is inherently
tied up in the detail of the code. I'm sceptical about whether that's a good
way to do our science - but if we decide to go down that route, then the
'paper' as publication, the de facto unit of scientific output, is something
we will also have to really rethink; and reviewers are going to have to be
responsible for signing off on the code, which they don't generally do at
present.

I think there's an argument, currently, for telling people "I could give you
the code - but it'll only take you a couple of days to write your own code to
replicate the results, and it'd be much better if you could do that" - but I'm
sure this varies drastically across domains.

I'm not sure where I stand overall - but I don't think it's black and white.

~~~
rflrob
> I think there's an argument, currently, for telling people "I could give you
> the code - but it'll only take you a couple of days to write your own code
> to replicate the results, and it'd be much better if you could do that" -
> but I'm sure this varies drastically across domains.

While I can definitely see a PI assigning this kind of task to a student, I
don't think that releasing the code suddenly makes the task impossible. If
(and when) the two independently written programs disagree, it will then be
possible to immediately step through each and figure out where the point of
divergence happens, and which (if either) implementation is more likely
correct, rather than starting an email chain with the original authors
saying, "we get different results, but we don't know why".

~~~
feral
If people continued to write independent programs purely to replicate and
verify claimed results, then certainly it'd be beneficial for them to have the
source, in order to track down where a discrepancy arises. This is often how
it works currently, in practice, in my limited experience: you mail the
authors, ask them to help you track down the problem, and get either code or
support; but I acknowledge this won't always work.

My concerns are that 1) I don't trust people to do the hard thing and
re-implement to replicate, rather than take the easier way and just use the
code that's provided. There's very little credit currently given for
replicating existing (even recent) results.

2) More importantly, when replicating a paper that's a little vague, it'll be
more and more tempting to just peek at the source; and suddenly the paper
isn't the document of record anymore. I feel that if we go down the route
where the code becomes the detailed documentation of the scientific process,
that's a very fundamental shift from the current model, where the paper is
supposed to be repeatable in and of itself.

If we go down that road, we probably need a whole different review
infrastructure; are reviewers really going to spend the time to review large
and hastily written scientific codebases?

I doubt it; so how does review work when "The code is the only definitive
expression of the data-processing methods used: without the code, readers
cannot fully consider, criticize, or improve upon the methods"? Will it no
longer be possible to criticise a paper for lacking sufficient detail to
reproduce the results? Will the reply be 'read the source'?

Maybe that's just the way things are going to go. There's a lot to like in
that manifesto. But there are going to be positives and negatives to letting
the source become the documentation. The discussion around the manifesto on
their website does not acknowledge such tradeoffs; it takes a pretty one-sided
view. Maybe that's just how you are supposed to write manifestos :-) But I'd
like to see some discussion of these tradeoffs.

------
maxs
Great thought behind this manifesto. I wish more authors would attach their
source code to their manuscripts. Even when the code is not clean (academic
code usually isn't written with extensibility in mind), it would be very
useful to be able to verify precisely what the code is computing.

Many times I've had occasions where the paper doesn't clearly explain
parameter values, the order of updates in the simulation, etc.
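
To make the update-order point concrete, here's a minimal sketch (a made-up
two-variable model, not from any specific paper). "Update a and b each step"
is ambiguous: computing both updates from the old state gives a different
trajectory than updating in place, even though the published equations would
look identical.

    def simultaneous(a, b, steps):
        # both updates computed from the previous state
        for _ in range(steps):
            a, b = 0.5 * a + 0.3 * b, 0.2 * a + 0.6 * b
        return a, b

    def in_place(a, b, steps):
        # each update sees the value just written
        for _ in range(steps):
            a = 0.5 * a + 0.3 * b
            b = 0.2 * a + 0.6 * b  # uses the *new* a
        return a, b

    print(simultaneous(1.0, 1.0, 10))  # these two runs drift apart
    print(in_place(1.0, 1.0, 10))

Without the code, a replicator has no way to know which convention the
authors actually ran.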

A tip I've discovered: you may have a chance to get access to the source code
if you email the corresponding author (if they're still alive!).

------
bigiain
See also:

<http://matt.might.net/articles/crapl/>

"Toward these ends, I've drafted the CRAPL--the Community Research and
Academic Programming License. The CRAPL is an open source "license" for
academics that encourages code-sharing, regardless of how much how much Red
Bull and coffee went into its production. (The text of the CRAPL is in the
article body.)"

~~~
gfodor
The crapple? Seriously?

------
TheEzEzz
Yes!

I wasted almost 6 months last year attempting to replicate some results from a
paper I was reading. It turned out the authors had left out of the paper a key
addition to their equations, which they were nonetheless using in their code.
ANGER.

------
fluidcruft
I wonder if there are some unintended intricacies involving licenses that need
to be addressed.

For example, the GPL lets you do lots of things privately but restricts your
ability to distribute. And there is a good deal of useful GPL glue out there
(Octave, R, etc.). Sometimes it might be expedient to just use FastICA rather
than limit yourself to what's available and what the GPL might allow you to
distribute. This gets really murky because the FSF is not at all clear about
what constitutes a derived work in a scripting language (a distinct case,
because source and executable are indistinguishable). The FSF - i.e. the
advice you get from licensing@ - plays pretty loose with this, choosing to
draw the line somewhat arbitrarily, on a case-by-case basis.

Or maybe academic publication automatically qualifies as a fair use exception?

------
da-bacon
The discussion document is interesting and spells out a lot of what is
intended: <http://sciencecodemanifesto.org/discussion>

One thing that is missing is whether the quality of code should be used in
peer review. If you are going to make software essentially part of the
publication record, then is it valid to reject a paper because its code is
spaghetti and not understandable? A paper whose derivation the reviewer cannot
follow will get rejected; should a similar standard be applied to the
software?

------
mturmon
Good idea, good intentions? Yes.

Will anyone provide funding to support code doc and release? I don't see any
signs of it.

Is there incentive for PIs or grad students to do this? Not much.

Is there cultural pressure to do so? Very little.

Conclusion: this is not going to happen.

~~~
TheEzEzz
Cultural pressure must start with awareness.

~~~
mturmon
Quite true, and kudos to them for trying. Just being realistic about the
chances for results in the next decade or two.

Lobbying sponsors of research would be another pathway to adoption. It would
raise a lot of questions about proper allocation of resources between
consolidation and exploration.

------
cabalamat
Pirate Party UK are currently having an open policy consultation where anyone
can propose and debate policy -- <http://www.reddit.com/r/Policy2011/>

I have added the Science Code Manifesto to it --
<http://www.reddit.com/r/Policy2011/comments/lddvh/adopt_the_science_code_manifesto/>

