
No One Peer-Reviews Scientific Software - cwan
http://chicagoboyz.net/archives/10436.html
======
mattheww
I work in a fairly large scientific collaboration. Our data sets are on the
scale of a few hundred terabytes per year. So, there's a lot of software to
convert from the bits that stream out of the detectors to something that end
users (grad students) can analyze.

We have a software group devoted to maintaining the core code. Our code has a
nightly autobuild for development code, and libraries are released every couple
of months.

The code is based on the ROOT package (about one million lines of code)
maintained by folks at CERN and is well established within the community and
elsewhere. Our libraries probably come to about a million more lines of code,
which is maintained scrupulously by our collaboration.

Now, neither of these sets of code actually does any analysis; they just make
the data usable. In my analysis, there are probably ~10k lines of code to do
the analysis and make the figures. In total, there are probably 500k or more
lines of analysis code that are not officially maintained.

That doesn't include all of the code loaded onto FPGAs and custom chips
reading out the detectors or the code in the trigger system. Nor does it
include code written into various simulators used to determine "expected"
response of the detectors.

So for peer review to look at code at the level suggested, people would have
to look over literally millions of lines of code. To even be able to make it
run, a user would have to set up an environment that would take between a few
days and a week.

Instead, when I go to publish something, I lay out both my method and how I
verified that it works. Reviewers then check that my method is sound and that
my verification looks right. They have to trust that I implemented the method
as I said I did.

In my collaboration, there is actually an internal review process that
verifies code runs and for which a much longer private note must be written to
explain all of the details of the analysis. However, there is still a high
level of trust that everything was implemented as stated. This review does not
qualify as "peer" review because it's conducted by people who will be listed
as authors.

While I agree that more review is better than less, I hope this comment
illustrates why software peer review is not a reasonable expectation.

~~~
lutorm
The code I wrote for my simulations is not that big, but it's certainly big
enough to be complicated and buggy. And it's GPL and available on my website.

And while I certainly don't expect any of my referees to go over the code
line-by-line, one of the rationales I had for GPLing the code was to make it
_possible_ for someone to replicate my study. One of the basic tenets about
publishing research is that articles should contain all the information needed
to reproduce the results. This is not possible for simulation papers unless
the source code is public (or maybe a binary is provided, but that has many
pitfalls by itself).

Results generated with proprietary codes could just as well be hocus-pocus in
my mind. It's not clear how it adds to the field unless others can reproduce
and build on the results. And with reasonably complicated codes, it's not
realistic to think that someone will implement their own version (though that
is of course the absolute best thing).

So while I don't necessarily think scientific software should be peer
reviewed, I'm very hesitant about results obtained with proprietary software.

~~~
spamizbad
What's your view on the mountains of scientific code that depend on
proprietary math kernels and in some cases proprietary compiler platforms?

~~~
lutorm
As you may guess, I'm unhappy about it. In astronomy, almost everyone uses
IDL, which is a proprietary, closed-source language. (It also sucks, but
that's beside the point.) I think this is bad, because it locks practically
the entire research field into a closed platform. (Though GDL is working to be
a replacement. I don't know how conformant it is, though.)

Still, I think this is less of a problem than the high-level scientific code.
Most of the low-level functionality is pretty simple and (hopefully) much
better tested than most scientific codes.

------
timr
I did my Ph.D. in a field requiring complex computer simulations. This
article, like most climate "skeptic" critiques, is a logical fallacy writ
large. A kernel of truth, surrounded by massive layers of exaggeration and
misinterpretation of fact.

First, the notion that anyone could "peer-review" software for correctness is
obviously absurd. We know this is an impossible standard. We accept that
software is a (probably buggy) model, and (in my experience) reviewers are
therefore _extremely_ skeptical of results from computer models -- to the
point that they're actually biased _against_ publication. And since reviewers
know that they can't verify correctness, they tend to look for theoretical
papers that very closely reproduce experimental results.

Second, this type of critique is pushing hard on a straw man. The implicit
assumption is that each paper is a stand-alone morsel of truth, and that the
peer-review process somehow _guarantees_ a paper's claims. This is not the
case, and no scientist believes it to be true. It doesn't matter that peer
review fails sometimes, because the system doesn't depend on it being
flawless.

Science is about process, not people. Peer review is an important part of that
process, but it's openly acknowledged that review has flaws. Thus, ask a
scientist about any particular paper, and while s/he may be enthusiastic about
the result or the theory, you'll almost never hear a scientist take the paper
for granted as fact. More often than not, discussion of an interesting paper
will center around designing _other_ tests that can independently reproduce
the paper's result. Ultimately, you have a system where a lot of genuinely
smart, skeptical people are attacking ideas from all sides until either a
consensus emerges, or the idea is destroyed.

The "skeptics" have it wrong, not because they're incorrect about human
nature, but because they're attacking a fictional vision of science that
exists only in their minds. They imagine a world revolving around a theory
that is supported by one piece of data, extrapolated by buggy computer models,
and presented by corrupt individuals who (for some unknown reason) have an
incentive to doctor the results in the same way. In reality, science is
massively redundant and competitive, computer models are treated as ancillary
data (at best), and individual corruption is exposed by the redundancy of the
system. If the "skeptics" put as much effort into understanding the science
they critique as they do trying to find "smoking guns" for individual
researchers, they would understand this dynamic.

~~~
jacoblyles
Then I think there is a certain disconnect between how scientists understand
science and how it is presented to the public at large. It certainly doesn't
help in this regard that many scientists involved in Environmental Science are
also passionate political activists. Maybe it is understood by scientists that
peer review has flaws, but when they appear on morning talk shows the term
"peer review" is wielded as a holy sword.

Regardless of the state of the current process, how could review become
anything but better from increased openness?

>"And since reviewers know that they can't verify correctness, they tend to
look for theoretical papers that very closely reproduce experimental results."

This seems to be part of the problem. "We'll use this data series for the 60
years that it agrees with everybody else, then throw it out for the 40 years
where it doesn't, then not show anybody the 40 years we threw out".

But then again, I'm no climate scientist. Although, one would think that no
harm would be done by releasing the data, even if there were legitimate
statistical reasons for truncating it.

>"Ultimately, you have a system where a lot of genuinely smart, skeptical
people are attacking ideas from all sides until either a consensus emerges, or
the idea is destroyed."

If you're a smart, skeptical person attacking a paper that shows global
warming to be severe, Michael Mann will get you fired. Or ignore your ideas.
Mann just published a paper that appears to use an inverted data series after
McIntyre already correctly pointed out that it was inverted, out of pride I
suppose. Fine science, that.

~~~
DanielBMarkham
_It certainly doesn't help in this regard that many scientists involved in
Environmental Science are also passionate political activists._

Absolutely.

I think we're going to have to have some kind of new certification system for
research that affects public policy. Scientists simply cannot be activists --
the conflict of interest is too great. If you find a meteor is going to strike
the earth next year, you'd better be spending your time on orbital
calculations and mass estimates, not on the Today show. Leave that to the
politicians. Mind your knitting.

In the code arena I'll be charitable and say that mistakes were made at CRU.
Public-funded science should run from a public wiki or source control system
where the scientists are the only authors but the public at large are readers.
Yes, I know that will drive some scientists mad compared to the current
secretive system but it's time to let some daylight in, folks. You'll never,
ever win an argument with an honest skeptic if you're keeping secrets. As for
the dishonest ones? They're not your job. Mind your knitting.

~~~
timr
_"Public-funded science should run from a public wiki or source control system
where the scientists are the only authors but the public at large are
readers."_

The IPCC report is free and public, for exactly this reason. It's quite plain
from their arguments that few "skeptics" ever actually look at it.

_"Yes, I know that will drive some scientists mad compared to the current
secretive system but it's time to let some daylight in, folks. You'll never,
ever win an argument with an honest skeptic if you're keeping secrets."_

Riiight. Because the politicians and lobbyists who advocate for the "skeptics"
are paragons of honesty and openness. Those dirty scientists could learn a
thing or two.

(I have to say...your comment is the most brazen example of black-is-white,
up-is-down spin I've ever seen on HN. Bravo.)

~~~
billswift
The IPCC report is a political document. It uses results that support its
position and ignores everything else. The entire book "The Deniers" is a
collection of complaints by scientists that the IPCC twisted their results to
support its agenda. Ironically, nearly all of the scientists still "believe"
in global warming, they just think THEIR work was misused.

------
jeremyw
Two myths my non-science friends understand to be true.

a) A peer reviewed paper is very close to fact.

b) Scientists are essentially free from backbiting, tribalism and empire-
building. They dispassionately rely on provable data.

Modify slightly to ignore any scientist funded by the wrong people.

~~~
troystribling
You may also want to add,

c) Data obtained with public money is considered non-proprietary.

I have spoken with several people who read about climate scientists seeming to
hide or be reluctant to share information collected with public money and
interpret this as the scientists being evasive or untruthful. They are always
surprised when I tell them that project scientists, particularly on large space
and earth science projects, usually have proprietary access to the data for
some number of years as payment for their involvement in support of the
project.

------
araneae
This reminds me of one case where a person published a phylogeny, a reader
felt that it was flawed, and so they went over the code together. They
discovered a major bug which, when fixed, completely invalidated the results.
The paper was retracted. (I can't source this, because it was a conversation I
had with a postdoc and I don't remember the details)

In general, though, the fact that code isn't reviewed isn't the biggest
problem with peer review. The problem is that it's simply very difficult to
come up with the real problems if you're just reading a paper, even for lab
experiments: just as you can't see the code, you can't watch the authors
perform the experiments. With my own work, reviewers never actually pick up on
what are the _actual_ problems with the research (which I know only too well.)

------
DanielBMarkham
This is an excellent find. Thanks.

_How did we let this problem develop? I think it was simply a matter of
creeping normalcy._

Probably so. I know from working with various industries, such as military
radio guys, that gradually their jobs became all about software. Twenty years
ago you'd have one software guy and 30 engineers and now you have 30 software
guys and 1 engineer. It was very late in the day when folks sat down and said
"Hey! We're really a software organization now, and this stuff is really
important, so we'd better start tightening things up." I imagine some types of
academic research are in that same place now.

It just creeps in over many years.

Open Source Science is looking better and better.

------
tybris
*sigh* The general public will be heartbroken when they find out that
scientists are in fact regular human beings. Science has become a surrogate for
their faith.

------
swolchok
It's unusual to peer review software in computer science research as well.
There is no _requirement_ that you release the source or even the binary of
your shiny new system, especially in time for review. I've recently reviewed a
paper that had a reference to an open-source project associated with the paper
on a well-known site for hosting such things, but the project was content-
free.

------
tome
I think this article presents a brief, well written summary of an important
issue from someone who clearly knows what he's talking about and I would
recommend reading it.

------
RichKatz
Regarding: the "NO ONE" peer reviews (as the title of the article claims) and
"creeping normalcy" as was apparently implied. I can think of numerous
counter-examples in the software industry to what is stated about lack of
review. For instance:

1. A company in the banking industry subjects every step of its software
development process to code review, plus security code review, and Q/A testing
every time they release code for production.

So.. Shannon Rose wasn't talking about the banking industry or business code
review. The author was talking about scientific systems. Ok.

2. Let's take a bioinformatic knowledge-base company. The code used to search
and extract information from the knowledge base, though proprietary, is
subject to code review and Q/A testing. In addition, the knowledge base itself
is vigorously Q/A tested, peer reviewed and signed off before release.

But Shannon Rose wasn't talking about bioinformatics? Really? Ok....

3. Let's take a civil engineering company, particularly one that has been in
the nuclear engineering industry (such as it has been). Structural engineering
code used in nuclear power plants is constantly verified by running known
engineering cases against the code base and rigorously checking the result,
footnoting and explaining every detailed difference down to the nth decimal
place.

But Shannon Rose wasn't talking about actual critical engineering code ... But
wait. There's more.

The entire engineering software code base was open-source, as are many
scientific software projects. Furthermore, it was looked at in detail by
physicists, engineers, and software engineers from within the company who
weren't even working on the project. The code base was thus open to
examination by engineers and scientists who use it as well as outsiders.

But, Shannon Rose wasn't talking about the nuclear engineering industry. No.
This article concerns a particular set of useful but not life-threatening
scientific findings that the author disagrees with - and where there are people
who oppose learning and understanding what the truth is and what it means.
Maybe that idea violates some person's understanding of some biblical text
somewhere because somehow humans happen to be responsible for mucking up a
whole planet.

So, when that biblical text was written, who did peer review on that text? The
interpretation?

The fact is, I can think of countless examples where people who work on
scientific software, who after all believe that they are responsible for what
they do, because maybe they're just that way, take extra precaution to verify
their results and subject their work to review. (Because, after all, we don't
want another Three Mile Island, or Y2K).

How many examples do I have to cite where there are responsible people before
we can reject the idea that "no one" does peer review of any scientific
software?

A footnote: the author also wrote that they think the Obama administration is
"bringing martial law to the U.S." And why is this? Because the Bush
administration was too incompetent to put heinous criminals on trial, we won't
get to convict them.

And somehow this mistake is Obama's fault. If McCain had won it would
presumably be his fault instead. Because one president couldn't be responsible
enough to begin with. Khalid Sheikh Mohammed was actually put under military
tribunal and allegedly "confessed." Who did peer review on the trial of Khalid
Sheikh Mohammed?

Answer: No one. There was a document written up. No one signed it.

No. This article is not about science, or software. It claims instead there is
this general widespread failure by people who work and review in scientific
software. That claim is irresponsible on its own. Its follow-up conclusion,
that this "completely irresponsible" software review can be "cleaned up" by
so-called "responsible" politicians (such as those who brought us Abu Ghraib,
torture, and the possible untriability of Khalid Sheikh Mohammed), is beyond
hogwash.

Peer review certainly is an issue and always requires addressing. But I think
this author should take their overactive bile somewhere else. </rant>

------
spamizbad
Also, these climate models are likely calculated on Intel hardware, and Intel
is notorious for floating-point bugs. Even if we verify the software is correct,
unless we see the RTL that describes the logic behind these chips, it's all
just hocus-pocus as far as I'm concerned.

~~~
titusflavius
You can't verify software is correct; it can always fail at time t+1.

Basic configuration testing on PPC or other architectures would highlight a
lot of the error-sensitive paths in the math libraries.
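
To make "error-sensitive paths" a bit more concrete, here is a minimal Python
sketch of one thing such testing can expose: floating-point sums that depend
on the order of evaluation. Different compilers or architectures may reorder
arithmetic, so order-sensitive code is a plausible place for results to
diverge. This is only an illustration; the helper names naive_sum and
results_by_order are made up for the example and are not taken from any
climate code or math library.

```python
import itertools
import math


def naive_sum(values):
    """Left-to-right float accumulation, as a plain loop would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def results_by_order(values):
    """Distinct naive sums obtained over every ordering of `values`."""
    return {naive_sum(p) for p in itertools.permutations(values)}


# Hypothetical input chosen so that rounding matters: near 1e16 the spacing
# between adjacent doubles is 2, so adding 1.0 can be lost entirely.
values = [1e16, 1.0, -1e16]

print(sorted(results_by_order(values)))  # more than one answer means order matters
print(math.fsum(values))                 # correctly rounded reference sum
```

If the set has more than one element, the computation is order-sensitive and
worth checking against a correctly rounded reference such as math.fsum.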

