

Secret Computer Code Threatens Science - emcl
http://www.scientificamerican.com/article.cfm?id=secret-computer-code-threatens-science

======
bravura
Back in my academic days, I became a proponent of open-notebook science.

Far too often, researchers never release code because it's never "polished"
enough.

So I began publishing my code on GitHub from the moment I started the
project, e.g. <https://github.com/turian/neural-language-model>

However, some of my more conservative colleagues were averse to this approach.
I constantly debated with my office-mate, who was of the opinion that there
are many reasons not to present half-finished work.

So there is also a large cultural barrier to more open science. If there were
publishing pressure on researchers to open their code, then it might effect a
cultural shift.

~~~
ender7
Every piece of academic code I have ever seen has been an unmitigated
nightmare. The sciences are the worst, but even computer science produces some
pretty mind-crushing codebases.

So, I don't blame them for being embarrassed to release their code. However,
to some degree it's all false modesty since all of their colleagues are _just
as bad_.

Add to this the fact that no one in academia understands version control
systems, and it's a hard hill to climb.

~~~
ahelwer
Perhaps this attitude contributes to the problem (I am not excepted from it).
If you were given an earful about how your code is an unsalvageable flaming
pile of rubbish by everyone you showed it to, would you want to release it
alongside a paper to which you've given years of your life?

~~~
jordanb
Yeah no kidding. There's a huge difference between the disposable one-off code
produced by a scientist trying to test a hypothesis, and production code
produced by an engineer to serve in a commercial capacity.

The original transistor, produced by Shockley's team at Bell Labs, "worked"
only in a nominal sense. It didn't do anything other than prove a concept. To
turn it into something usable in real equipment took years of effort by other
scientists and engineers. Thank god they published the details of it instead
of just saying "we made it and it worked, here are the results" out of fear of
releasing something that was "a pile of rubbish."

------
jgrahamc
Nice to see. Back in February I had a paper in Nature (with two co-authors)
arguing for the same thing
(<http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html>).
With this paper in Science
(<http://www.sciencemag.org/content/336/6078/159.summary>) it means that the
top two journals in the world have now published papers arguing for source
code openness.

Probably time for international cooperation on defining open code policies:
<http://blog.jgc.org/2012/04/more-support-for-open-software-in.html>

------
Tichy
Hm, shouldn't the articles contain the information necessary to rewrite the
code? Then rewriting the code could be seen as replicating the experiment.

Both sharing and not sharing seem to have pros and cons. For example, if the
code is buggy and shared, the odds might be higher that the bugs will never be
found, because nobody will bother trying to write the code again.

~~~
drucken
I completely agree. If this is publishable science, then a strong and
reproducible description of the science and algorithms used is all that is
necessary.

I would be very interested if the author had actually given even a single
instance where the lack of the software code that merely _implements_ the
experiment has completely impeded progress on the science in a paper. Even if
this were the case, would that not simply imply that more algorithmic detail
is required?

Of course, for all of the above, I am referring to non-computer science. There
may be special circumstances in computer science where the code itself _is_
the published algorithm or an intended description of the underlying science.

~~~
qznc
Also agree.

For an example of a specific circumstance, consider theorem provers, because
the proofs are usually too large for a paper publication. The Archive of
Formal Proofs (AFP) [0] is a repository for Isabelle proofs, which my
colleagues use. They submit a proof to the AFP and write a paper about the
results, in which they cite the AFP publication.

[0] <http://afp.sourceforge.net/about.shtml>
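
As a toy illustration of what machine-checked proof code looks like (written
here in Lean 4 for brevity; actual AFP entries are Isabelle/HOL, and this
sketch is mine, not taken from any AFP entry):

    -- The "program" is a theorem statement plus a proof term; the
    -- prover mechanically verifies that the term proves the statement.
    -- Lean 4; Nat.add_comm is a core library lemma.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

Real AFP entries are far larger than this, which is exactly why they are
archived separately and cited from the paper rather than inlined into it.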

------
strictfp
All researchers know that they should release their code. The problem is that
they are just bad programmers. Programming has turned into a required skill
for many scientists, but the school system is lagging behind. So right now we
have all these scientists lacking fundamental skills. Sooner or later this
will be recognized and schools will then hopefully accept programming or
computer science as just another subject in the curriculum.

------
tzs
If the programs are actually important in the production or verification of
the research, then don't we want peers who try to independently reproduce the
experiment to also independently develop their own programs, so that their
reproduction is truly independent?

~~~
mistercow
Possibly, but it would also be good if the source code were available for
scrutiny by independent scientists.

------
CJefferson
My biggest problem with releasing code is that people will expect support.

I released some code that only compiled on Visual Studio 6, with a specific
version of a fairly expensive library. I got several emails asking for a Mac
or Linux version, rather than an update for more modern compilers.

Personally, I would have preferred that people just reimplement the code from
the paper. I suspect it would have been less work for them.

------
rlvesco7
I've often thought there should be an open code/data license that restricts
usage and dissemination only to those who agree to make their code and data
equally available.

Why? Because in many fields there is a negative incentive to provide code and
data. It not only takes time, but it opens you up to criticism by people who
wouldn't be willing to make their own code/data available. Perhaps something
like this would raise the bar and encourage more people to share their
code/data. Just a thought.

------
raphinou
I'm experiencing this first hand as I implement a machine learning algorithm
described in a paper. Questions keep arising about what experiments they ran
and how, and about details of the algorithm, which I can't deduce from the
paper. Hence, I'm guessing, but still unable to reproduce their results,
leaving me to wonder whether I have a bug or whether I misinterpreted
something.

~~~
qznc
Which just means that the paper was incomplete.

The more interesting question is how we can check a paper for completeness. I
fear the answer is to try to implement it, which is too costly to do as part
of the peer review process.

------
alexkappa
The title is somewhat misleading. It made me think of a "secret code" hidden
behind a bush waiting to cut the throat of Science...

------
cek
Pedantic, I know, but: Source code. Not source codes.

Seeing this cost the author some credibility on the subject.

~~~
ajax77
Pedantry isn't so bad, if it's correct... As was stated, the term "source
codes" is very common in many communities, particularly scientific computing
and associated academic circles. The only credibility lost here is... well,
let's just move on, shall we?

------
bbgm
To some extent it is culture and the current incentive model, and to some
extent it's just a need to be pragmatic. If you're a grad student who wants to
defend in a certain amount of time and you have various deadlines
(conferences, concerns about being scooped), you end up hacking up some code
that gets your work done and allows you to analyze your data and publish.
That's what gets you recognition, helps you defend, etc. In some cases, the
code is your work, and those groups tend to spend more time on making sure the
code is robust, re-usable, and sustainable.

In general, though, the system doesn't encourage you to follow good practices
at all. Having said that, I've definitely seen a change over the last few
years towards more awareness.

------
bbgm
Various people, e.g. Titus Brown, have been trying to take a different
approach. Titus doesn't practice open-notebook science, but he does try to
practice "replication". More on replication:
<http://ivory.idyll.org/blog/apr-12/replication-i.html>

The paper gets its own website: <http://ged.msu.edu/papers/2012-diginorm/>,
which includes the arXiv preprint, data and code repositories, and even an AMI
with everything loaded. Basically everything you need to replicate the work in
the paper.

------
shawn-butler
It seems to me that most finished academic papers, including dissertations, in
the field of computer science also lack source code. A lot of institutions
even discourage, by policy, the submission of source to examiners unless it is
illustrative of the text. This isn't based on any serious survey, just on what
I read, so I may be wrong in this belief.

------
sabalaba
Reproducible research is a really interesting topic. One of my good friends in
academia showed me how he was using Babel (an Emacs Org-mode feature) to do
literate programming and reproducible research. I think it's a fantastic idea:
the data, the conclusions, and the code used to arrive at them should all be
part of the peer review process. Open-source research.
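
For anyone curious, here is a minimal sketch of what such an Org-mode/Babel
document looks like (the heading and data values below are made up for
illustration): prose, code, and the generated result all live in one file, and
re-evaluating the block regenerates the result.

    * Mean of the trial measurements
    #+BEGIN_SRC python :results output
    # Hypothetical data; in a real notebook this would be loaded from
    # the raw data files published alongside the paper.
    measurements = [4, 3, 5]
    print(sum(measurements) / len(measurements))
    #+END_SRC

    #+RESULTS:
    : 4.0

Because the #+RESULTS: block is produced by executing the source block, a
reviewer can re-run it and check that the numbers in the document actually
follow from the data and the code.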

