
If you're going to do good science, release the computer code too - rglovejoy
http://www.guardian.co.uk/technology/2010/feb/05/science-climate-emails-code-release
======
lutorm
I am not surprised that people find errors in code written by researchers and
grad students who have little training in software development and, perhaps
more importantly, are doing so in a culture which values them writing papers,
not good code. (See for example <http://lanl.arxiv.org/abs/0903.3971> for a
discussion of this situation in astronomy/astrophysics.)

I find it much more surprising that professionally developed software used for
scientific research is also error ridden. And while it might be difficult to
convince individual researchers to release their code, that's nothing compared
to the difficulties of convincing Wolfram research to release the source code
to Mathematica...

But I do think that research is somewhat undeservedly singled out for this,
just _because_ some academic software is open for inspection. Like the article
mentions, it certainly seems like the financial software has caused a lot of
badness. How about the navigation software error that doomed NASA's Mars
Climate Orbiter? Who knows how many innocent lives have been lost due to software
errors in military systems like UAVs and missiles. Maybe none, but we can't
know because it's all secret. Shouldn't they be required to show their code,
too?

~~~
smallblacksun
But the military and NASA don't claim to be generating reproducible knowledge
through the use of their code. In particular, the military doesn't WANT other
people to be able to reproduce what their code does. Also, there is a
difference between operational code (code that runs a physical object like a
lander or a UAV) and analytical code. NASA makes some of their code available
here: <http://opensource.arc.nasa.gov>

~~~
lutorm
Cool, I didn't know about the NASA open source project.

You are right that knowledge production isn't the purpose of those other
entities, of course. However, in my mind the purpose is less important than
the outcome -- why is it more harmful to society if scientists produce a
flawed scientific result than if the military kills innocents or the financial
sector brings on a market crash because of flawed models? They all hurt
society and could all benefit from more scrutiny. I admit the military case is
a stretch, but certainly the financial sector seems like a relevant example.

------
jackfoxy
If science is to remain science, and not devolve into mysticism, data and
computer models must be available to other researchers in order to repeat
experiments and provide knowledgeable criticism. Calling anything "settled
science" which is not openly available to all researchers is not scientific.

~~~
kurtosis
I have no beef with open audits of published science that is used in decisions
of economic consequence.

But I would only add that sometimes you learn a lot more from trying to
reproduce a result without the code/schematics of the original experiment. If
you implement it yourself and get a different answer, you should publish it
and not bias yourself by paying too much attention to the original authors'
interpretation. As long as you can justify your methods, you should be fine.

Also, I feel that it's a lot more fun to design an experiment knowing that
it's possible than it is to merely copy someone else's published procedure. A
month in the lab spares you a day in the library!

------
regularfry
A sound idea.

While I can imagine any number of reasons people might not wish to release
code after the fact, if it were developed from the start with the intention of
releasing it, I think we'd all benefit.

Inevitably, the cost of doing so would increase the cost of the research, but
I believe it would be worth it.

~~~
anamax
> Inevitably, the cost of doing so would increase the cost of the research,
> but I believe it would be worth it.

I'm not convinced that it would increase costs.

I'll bet that there's a lot of reinvented code in science. If every project
released their code, new projects would start reusing code from current
projects. In some cases, that sharing and reuse would reduce costs.

~~~
JunkDNA
I have seen code reinvention in my career a number of times. In one instance,
I was actually asked to code up a method where the code and method had been
published in a scientific journal. When I asked why I should implement this on
my own, instead of using code developed by the group who published the method,
I was told, "Because you can't trust anyone else's code. It's better to write
everything from scratch so you know it's _right_."

I don't personally have the hubris to think I can code up a method better than
the people who invented it in the first place. That aside, it's just so
wasteful.

So instead of spending time on novel work _we_ were doing, I spent a month
implementing a half-baked version of something _other_ people had done.

~~~
btilly
As silly as the explanation was, there is actually a good reason to re-
implement: if nobody does, then any bug in the original code will survive to
contaminate who knows how many results before anyone catches it.

Reimplementing from scratch then comparing with the original gives an
opportunity to find such bugs.
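That reimplement-and-compare approach is essentially differential testing. A
minimal sketch of the idea (both `published_mean` and `reimplemented_mean` are
hypothetical stand-ins, not anyone's actual research code):

```python
import random

def published_mean(xs):
    # Stand-in for the "original" published implementation (hypothetical).
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

def reimplemented_mean(xs):
    # Independent re-implementation that deliberately takes a different
    # code path, so a shared bug is less likely.
    return sum(sorted(xs)) / len(xs)

# Differential testing: feed both versions random inputs and flag any
# disagreement beyond floating-point noise.
random.seed(0)
for _ in range(1000):
    xs = [random.uniform(-1e6, 1e6) for _ in range(random.randint(1, 50))]
    a, b = published_mean(xs), reimplemented_mean(xs)
    tol = 1e-9 * max(abs(x) for x in xs)
    assert abs(a - b) <= tol, f"disagreement on {xs}: {a} vs {b}"
print("no disagreements found")
```

Any input where the two versions diverge is exactly the kind of bug-exposing
case the comment above is describing.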

~~~
barrkel
Yes, but such arguments apply at different levels of abstraction.

I doubt one would rewrite the OS, compiler or runtime libraries because they
couldn't be trusted; though all these can also have bugs.

~~~
btilly
One would probably not rewrite them. However people both can and do take their
software and run it on a different operating system, compiled with a different
compiler, linked with different run-time libraries, on a different type of
hardware. And yes, I've seen bad software assumptions flushed out by doing so.
(Don't use floating point for complex financial calculations please. OK??)
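To illustrate that parenthetical: binary floating point cannot represent most
decimal fractions exactly, which is the classic failure mode for money (a
generic sketch, not anyone's actual financial code):

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so cents drift:
print(0.1 + 0.2 == 0.3)            # False
print(sum([0.1] * 10) == 1.0)      # False: accumulated rounding error

# Exact decimal arithmetic (or integer cents) avoids the drift:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
print(sum([Decimal("0.10")] * 10) == Decimal("1.00"))      # True
```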

------
Lewisham
It's surprising how few Computer Science papers release code as well. I don't
care if it's platform-specific and it requires ridiculous numbers of obscure
libraries and only operates on proprietary data that you can't release. I
don't care, I want the code to be open-source. I want to see what you did, and
whether I believe that it does what you claim it does in the paper.

Where possible, I open-source everything I try to get published. There's only
one project I haven't (a scraper for the WoW Armory), but even then I released
the library I built for it.

There's no excuse to not do so. Unless you have something to hide.

~~~
lutorm
_There's no excuse to not do so. Unless you have something to hide._

Not true, for the same reason that commercial ventures don't like to release
source code even if they don't have something to hide.

Having a capable computer code can be a substantial competitive advantage and
make it possible to do studies no one else can. While this is less than
desirable from the standpoint of science, it's perfectly understandable given
the career pressures that individual scientists operate under.

~~~
j_baker
This creates a conflict of interest, though. Is the research legit, or has it
been "enhanced" to help a business venture the researcher has in the works?

~~~
lutorm
Oh, for sure. But I wasn't even talking about any business ventures (those are
rare in astrophysics...) but more about keeping your code under wraps to
prevent others from benefiting from your hard work. Especially, when (as I
said in another post), code development is not especially beneficial for your
career.

Though it's hard to find a situation where people don't have a (short-term)
incentive to make their work _look_ good. One can hope it will catch up with
them in the long run, but more likely by then they'll have a new job (and, in
academia, tenure) at a place that will never hear about their past shoddy work.

~~~
btilly
The solution is to make peer reviewed code produced for a paper be considered
equivalent to a paper in tenure decisions. And for all papers in peer reviewed
journals that do computer analysis to be backed up by peer reviewed, published
code.

That makes code development beneficial for your career, gives an incentive to
not keep it under wraps, improves quality, and is likely to reduce the number
of published incorrect results.

Of course that is a pipe dream at this point, but what's wrong with dreaming?

------
maurycy
Finally. Finally a discussion about this.

~~~
timr
Enough with the false melodrama, please. Aside from the fact that your comment
is content-free and inane, scientists have been discussing this subject since
computer simulation first became a part of science. A lot of scientists _do_
share their code (I'm one of them, and I believe in sharing code). But there
are good arguments on the other side. Among them:

1) Papers describe methods in enough detail to reproduce them. If they don't,
there's a _serious_ problem.

2) Independent lines of verification. If simulation code becomes a reference,
it's inevitable that the same bugs/bad assumptions will contaminate an entire
field. Independent re-implementation of the same algorithms is a strong hedge
against this phenomenon (even if it means that there are more bugs overall).

3) Money. A lot of scientists fund their research in part through licensing of
implementations of their algorithms. I don't like it, but until someone gets
around to repealing Bayh-Dole (a _real_ scientific travesty, IMO), this is
going to continue to be a problem.

In short, what you really meant to say was that finally someone wrote a
_newspaper article_ about this subject. It's not a new discussion.

~~~
DaniFong
Closed academic publishing is intellectually bankrupt, and is probably one of
the greatest problems affecting research today. People don't share code, and
put a paywall between themselves and the public. There are open journals, but
they are rarely as prestigious, and so are not as valuable to those seeking
tenure. These academics put tenure before fruitful scientific discussion.

~~~
lutorm
So would you rather people publish in "low-impact" journals and then leave
science completely because they can't get a permanent job?

"Intellectually bankrupt" is a pretty strong term to use for people who work
for a small fraction of the amount of money normally talked about on this
site.

I'm not saying there aren't issues, but blaming the individuals who are trying
to make a living by doing science isn't going to help. The success rate of
getting permanent jobs in science might be higher than that of startups, but
the "payoff" is a small fraction.

~~~
DaniFong
I have not left science completely: I've made my own job. It is possible but
it is only made harder because of the closed system.

There are many of us who've left academia and still do science. We're
generally maligned, and removed from the ability to even participate in a
discussion due to a variety of academic access restrictions, and why?

What's more, day by day people are showing how to achieve scientific
credibility and influence through their blogs and paper hosting services like
arXiv or, as Michael Nielsen points out, open journals like PLoS Biology. The
majority of scientists still bow to tenure pressure, and frankly I don't
understand why. There are other opportunities if you want to gain status, and
one doesn't even have to gain traditional academic status if one wants to do
real science. There are other options.

~~~
lutorm
Which academic access restrictions are you talking about? I know people who
have started independent "institutes" but the only reason you need to do so is
to receive federal funding. It's true that if you brand yourself as an
"independent researcher", people might be inclined to think you are a
crackpot, but publishing real papers should take care of that.

I'm not sure blogs are a relevant source for scientific studies though. Not
necessarily because I think peer review is the greatest system, but having
your paper published in an actual journal (open journals are fine) at least
means you managed to convince a few other people that it's worth looking at
the paper.

------
merraksh
There are a few examples of how this can be done. One of them is Mathematical
Programming Computation (MPC), a journal where articles submitted must be
accompanied by the source code that was used to produce the results. The
article is peer-reviewed, and the code submitted is tested by "technical
editors" to verify that the results are correct. See <http://mpc.zib.de>

------
moron4hire
Opening the source for research software is absolutely vital to the concept of
reproducibility. However, the limited programming training of most scientists
is a major issue. A lot of novice programmers fall into the trap of "it runs
without error, so it must be right." Even expert programmers struggle with
verifying that their results are correct; in the general case, program
verification is mathematically undecidable. So reproducing the results of
software-based research is a daunting task to start with.

This is only compounded by the fact that reading source code sucks. Source
code is an end result of multiple processes that occur in feedback loops. With
just the source code, you never see _how_ the code got that way. It's like
showing someone a maze with the start and end points marked but the middle of
the map blocked out.

Different programmers' conceptions of what constitutes good code vary
widely. One man's golden code is another's garbage. Just because the source
code is available doesn't mean anyone is going to understand it or be able to
work with it effectively.

Compounding this all is the fact that few people are going to _want_ to read
the source code. Analyzing source code is dull work, maybe the worst job a
programmer can take while still doing programming. Most programmers are far
happier to discard old code and start from scratch. This is often a bad idea
and doesn't lead to a better product, but at least you don't want to kill
yourself while you're doing it.

When it comes to reproducing algorithmic results, I would prefer having a
description of the algorithm, a set of inputs, and a set of outputs. I would
then write the actual code myself and see if I get the same results. This, I
think, is much closer to the concept of reproducing lab results in the
physical sciences. You wouldn't use the same exact particle accelerators if
you were verifying the results from a paper on nuclear physics. I'm afraid
having access to the raw source code will be used as a crutch, with logic
errors missed because portions of code get reused without much thought about
the consequences. Take, for instance, the subtle differences in implementations of
the modulo operator across programming languages:
<http://en.wikipedia.org/wiki/Modulo_operator#Common_pitfalls>
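The modulo pitfall is easy to demonstrate. Python's `%` uses floored division,
so the result takes the sign of the divisor, while C, C++, and Java truncate
toward zero; naively porting code between the two conventions changes results
for negative operands:

```python
import math

# Python: floored remainder, sign follows the divisor.
print(-7 % 3)                 # 2
print(7 % -3)                 # -2

# math.fmod truncates toward zero, matching what C's -7 % 3 would give:
print(int(math.fmod(-7, 3)))  # -1
```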

It would be great if scientific software were open. Unfortunately, it won't
matter a lick if it is.

------
jgrahamc
Yes, tell me about it:
<http://www.jgc.org/blog/2010/02/something-odd-in-crutem3-station-errors.html>

------
eshi
I might be alone in this, but this seems like a symptom of the problems of IP
laws.

~~~
artsrc
One problem with IP laws is that to fully enforce them you need a police
state.

I don't know precisely what you are thinking, but my view is that the IP
framework should be: for a published work to be eligible for copyright, its
source code must be published. Something like a cross between GitHub and the
Library of Congress.

Publishing source code does not currently relinquish all rights. This change
would add greatly to our society's store of knowledge and would help prevent
IP theft from the code of published works.

~~~
eshi
This is sort of what I was getting at. I agree that releasing source code
shouldn't be a matter of giving up property rights. In fact, plenty of
commercial systems and software do allow source code access. However, it
always seems to be through messy licenses and cumbersome legal agreements to
not divulge anything.

As it stands, companies seem more motivated to protect their IP rights than to
produce tools that would keep science reliable. IMHO, companies view source
code as the product of their investments, and as secrets worth protecting. The
main fear seems to be that, if these secrets were published, competitors could
use them to boost their own R&D efforts by deriving methods and processes from
the original company's work.

This doesn't seem like just a software problem since I've heard wetware horror
stories from biotech and agriculture folks.

It honestly makes me wonder if software should be something you can patent. At
some level, it seems disturbingly similar to companies that patent colors,
genes, or derived living organisms.

------
albertcardona
The title captures the reason why we created Fiji
(<http://pacific.mpi-cbg.de>): so that instead of releasing a Matlab script
with no documentation of its many parameters and exact Matlab version, as a
printout (or nowadays, a downloadable .m file as supplementary material), we
could offer a ready-to-download, version-controlled, and fully working
program.

A colleague of mine made similar remarks recently:

"... if you can’t see the code of a piece of ... software, then you cannot say
what the software really does, and this is not scientific."

