
How Bad Software Leads to Bad Science - danso
http://motherboard.vice.com/read/how-bad-software-leads-to-bad-science
======
swatow
I'm happy that people are discussing this, but I really don't like the way
the article uncritically repeats the message of the Software Sustainability
Institute.

The SSI study only proves that people are writing software, and that many of
these people have no training in software engineering. The SSI claims this
training is necessary to prevent things like the journal-article retractions
discussed in the article. But their survey doesn't go as far as providing
evidence of a link between formal training in software engineering and errors
in scientific journals.

If this article were science, I would call it bad science, because the data
it presents is not sufficient evidence for its hypothesis.

~~~
stinos
While you are right regarding hard data and evidence, I think the core idea is

    
    
        Problems arise when that software is designed by researchers who really don’t know what they’re doing when it comes to coding. A single mistake in the code can lead to a result that appears innocuous enough, but is actually incorrect. 
    

and I am actually sure that is, unfortunately, very much a real problem. The
reasons I believe so are anecdotal, but so overwhelming I cannot ignore them.
I have personally prevented a couple of false results from making it into a
paper. And that is just me, in just one research group, in just one
university. And it mostly happened by accident (just staring at a guy's
screen during a conversation and noticing errors all over the place, redoing
an analysis for more subjects and figuring out it is completely wrong, ...).
The mistakes I've seen are close to terrifying. Scripts ignoring all input
and instead using a generated fixed dataset. Scripts always yielding 'good'
results even if you give them white noise as input. Basically, _all_ typical
programming errors and antipatterns ever invented come together in huge god
scripts that produce the results, while in the background Matlab spits out
one warning after another, console windows scream errors, and all of it is
simply ignored by a researcher who thinks he/she has just mastered the skill
of programming. Or knows he/she hasn't, but just doesn't care. As long as
there are good results, you know.
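A cheap defence against the "always yields good results" failure mode is a
negative control: run the whole analysis on pure noise and check that it
reports (almost) nothing. A minimal sketch in Python; `detect_effect` is a toy
stand-in for a real pipeline, not anything from the thread:

```python
import random
import statistics


def detect_effect(samples, threshold=3.0):
    """Toy stand-in for an analysis pipeline: reports an 'effect'
    when the sample mean lies more than `threshold` standard errors
    away from zero."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / n ** 0.5
    return abs(mean / sem) > threshold


def white_noise_control(analysis, n_runs=200, n_samples=100, seed=42):
    """Run the analysis on pure Gaussian noise many times and return
    the fraction of runs that (wrongly) report an effect."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_runs):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if analysis(noise):
            false_positives += 1
    return false_positives / n_runs


# A pipeline that "always yields good results" fails this check loudly:
rate = white_noise_control(detect_effect)
print(f"false-positive rate on white noise: {rate:.2%}")
assert rate < 0.05, "analysis reports effects in pure noise!"
```

The same idea works for any pipeline: if feeding it garbage produces
publishable-looking output more often than the nominal false-positive rate,
something upstream is wrong.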

Oh yes, I'm getting a bit sentimental now, and it's not like that
_everywhere_, and not _every_ researcher writes code like that. But if even I,
with over 10 years of programming experience and all my automated builds and
tests and whatnot, cannot reliably produce code without a single mistake, how
on earth are researchers, for whom the programming part is often just a
byproduct, a necessary evil, going to do that?

~~~
venomsnake
If a scientist is good enough to design a proper experiment, he is good enough
to design a good program. The skillset is the same.

~~~
stinos
The skillset is similar, maybe, but putting it into practice is something
else. Your claim skips the part where the scientist used his skills to
_become_ good enough to design a good program. As long as that learning
process hasn't happened, and it takes time, he/she will not be good enough. I
know _excellent_ scientists who are known and praised in their field and come
up with inventive designs, yet their code to do something as simple as
plotting a histogram is a mess.

------
guscost
As a general rule, we shouldn't take seriously any computer models for which
the source code is not available.

There is no valid reason why this should not already be a universal practice.
If you are looking at any paper which cites the output of a custom-built
computer program but omits the source code, it's probably safe to assume that
the program is a mess and therefore the conclusions are not (yet) robust.

~~~
crosvenir
I mostly agree, though if I had to choose between the model on which the code
was based or the code itself, I'd prefer the model.

While it would be great and preferable to have both, isn't the code just a
language/stack/programmer(s) specific interpretation of the underlying model?

~~~
ScottBurson
> isn't the code just a language/stack/programmer(s) specific interpretation
> of the underlying model?

There's the catch -- it _should_ be, but there's no way to verify it's correct
without comparing the source code against the model. (Well, I suppose if you
had access to the raw data _and_ the time to rewrite the software yourself
from scratch, you could do that and see if you got the same results. But who
has time for that?)

~~~
marvy
I don't know who has time for that, but it's far more reliable than auditing
source code. Two people using two different languages are not likely to code
up the same bug, but overlooking a bug in code someone else wrote is easy.
Besides, coding things up yourself is not much slower than a full audit:
understanding other people's code is hard. There's a reason that many
developers have an urge to throw out "legacy" code and redo it from scratch.
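The reimplement-and-compare approach can be mechanised once both versions
exist. A small sketch, assuming the quantity of interest is a single number;
the two variance routines below are placeholders for the two independent
rewrites (different algorithms standing in for different languages):

```python
import math


def variance_textbook(xs):
    """First implementation: two-pass textbook formula."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_welford(xs):
    """Independent re-implementation: Welford's online algorithm."""
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


def cross_check(impl_a, impl_b, data, rel_tol=1e-9):
    """Run both implementations on the same raw data and compare.
    Agreement is evidence (not proof) that both are correct."""
    ra, rb = impl_a(data), impl_b(data)
    return math.isclose(ra, rb, rel_tol=rel_tol), ra, rb


ok, ra, rb = cross_check(variance_textbook, variance_welford,
                         [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(ok, ra, rb)
```

The tolerance matters: demanding bit-exact equality across independent
floating-point implementations produces false alarms, while a sloppy tolerance
hides real bugs.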

------
Thriptic
I am a researcher with 0 software development experience who is trying to
refactor a piece of software for image analysis that a former post-doc left
behind. Many individuals in our lab (including myself up until recently) write
software for analysis and simulation without any understanding of basic coding
best practices such as version control, unit testing, documentation,
formatting for readability, or even commenting. This comes back to bite our
lab in the ass frequently.

This occurs because many of our people take, for example, the Codecademy
course on Python and then assume they "know how to code" when they finish; if
they run into problems, they can simply consult Stack Overflow for anything
they don't know. They therefore never learn anything about formal software
development best practices unless they put in extra effort to do so, which
they generally don't.

I have learned a lot trying to refactor over the last few months, and I have
tried to pass on some of this knowledge to my labmates. We have actually come
a long way over the last few months. We now employ version control, use style
guides, and more robustly document programs. I am currently trying to sell my
labmates on the merits of unit testing but we'll see how that goes :)
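For labmates who have never seen one, a first unit test can be very small:
pin down a function's behaviour on inputs where the right answer is known by
hand, including the degenerate cases. A sketch using the standard library's
`unittest`; the `normalize_image` function is a made-up example, not actual
lab code:

```python
import unittest


def normalize_image(pixels):
    """Scale pixel intensities to the [0, 1] range.
    `pixels` is a flat list of numbers."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: avoid division by zero
        return [0.0] * len(pixels)
    return [(p - lo) / (hi - lo) for p in pixels]


class TestNormalizeImage(unittest.TestCase):
    def test_known_values(self):
        # Answer computed by hand: (p - 0) / (10 - 0)
        self.assertEqual(normalize_image([0, 5, 10]), [0.0, 0.5, 1.0])

    def test_flat_image_does_not_crash(self):
        self.assertEqual(normalize_image([7, 7, 7]), [0.0, 0.0, 0.0])

    def test_output_stays_in_range(self):
        out = normalize_image([3, 1, 4, 1, 5, 9, 2, 6])
        self.assertTrue(all(0.0 <= p <= 1.0 for p in out))


# Run with: python -m unittest <this_file>.py
```

Three tests like these take minutes to write, and the flat-image case is
exactly the kind of edge case that otherwise surfaces months later as a
mysterious division-by-zero in someone's analysis.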

~~~
stinos
Things like this make me happy. Not all labs have the proper atmosphere for
getting such improvements pushed through, so if you try and are starting to
succeed, that's marvellous.

------
88e282102ae2e5b
If you're interested in improving this situation, the Software Carpentry
Foundation trains scientists in modern programming practices, and also teaches
scientists and programmers how to teach programming to others:
[http://software-carpentry.org/](http://software-carpentry.org/)

------
bioinformatics
It's not only a problem of bad software; it's a problem of lack of testing
and lack of proper development management (two factors of bad software,
indeed), but mainly a problem of poor or missing documentation. Not code
comments, but actual documentation that allows other people to use the
program. The lack of it leads to improper usage and to errors that cannot be
solved and are not reported and/or noticed by whoever is running the
software.

Not to mention the publish-and-forget.

~~~
craigyk
Absolutely. But I think that testing is still the bigger problem; a lot of
scientific software is used in conjunction with other scientific software, and
the interfaces themselves are also usually undefined and untested. So
something as simple as IT upgrading a package to fix a bug or add a feature
might break a lab's workflow in ways that are difficult to detect. It's easy
to mess this kind of software up by creating code that produces incorrect
results despite appearing OK by the field's dominant scoring criteria.
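One lightweight guard against an upgrade silently breaking a workflow is a
golden-output regression test: keep one small input whose correct result is
known, re-run the pipeline after every environment change, and compare against
a stored fingerprint. A sketch under the assumption that the output can be
serialised deterministically; the `pipeline` function is a trivial
placeholder for the real workflow:

```python
import hashlib
import json


def pipeline(values):
    """Stand-in for a multi-tool analysis workflow: here it just
    computes a cumulative sum over the sorted input."""
    out, total = [], 0.0
    for v in sorted(values):
        total += v
        out.append(round(total, 6))  # round so the fingerprint is stable
    return out


def fingerprint(result):
    """Stable hash of a pipeline result, to be checked in next to the code."""
    blob = json.dumps(result, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


# Step 1 (once, on a trusted environment): record the golden fingerprint.
# In practice this hex string would be pasted into a file under version control.
golden_input = [3.0, 1.0, 2.0]
golden_hash = fingerprint(pipeline(golden_input))

# Step 2 (after every package upgrade): re-run and compare.
assert fingerprint(pipeline(golden_input)) == golden_hash, \
    "pipeline output changed -- do not trust new results until explained"
print("pipeline output unchanged:", golden_hash[:12], "...")
```

This doesn't prove the pipeline is correct, only that it still does whatever
it did when the golden record was made, which is exactly the property an IT
upgrade can silently break.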

Also publish and forget... that's just the natural fallout of the way research
is conducted and funded. I've written scientific software that I don't feel
much personal obligation to maintain or update. It's challenging work which
has greatly reduced payoff (to the powers that be) after it has been tallied
on the scoreboard. Also, most positions in scientific labs are on a clock; not
exactly an environment that fosters long-term stewardship.

I'd love to see science funding agencies spend a couple million to fund
reasonably sized groups of 5-10 programmers/scientists to produce code over ~3
year funding cycles. Just to make sure that signals don't get crossed, these
groups should be told not to spend a single minute of their time on publishing
to journals. Dissemination is limited to short talks and training workshops.
Continued funding would also focus more heavily on user surveys than third-
party citations.

~~~
bioinformatics
I agree with everything you mentioned. I have tried, with zero success, to
establish some testing procedures in the labs I have been in, where even
backing up data was a myth. With the requirement to compile the code on four
different OSs, testing and some automated procedure were key, but not one
person came on board.

Now, moving to a diagnostic setting and working by myself, testing and
documentation have been my main goals. And now, every single piece of
third-party software has to be exhaustively validated in every aspect before
being put into production.

"I'd love to see science funding agencies spend a couple million to fund
reasonably sized groups of 5-10 programmers/scientists to produce code over ~3
year funding cycles. Just to make sure that signals don't get crossed, these
groups should be told not to spend a single minute of their time on publishing
to journals. Dissemination is limited to short talks and training workshops.
Continued funding would also focus more heavily on user surveys than third-
party citations."

On the software/funding side, your ideas are spot on. I think it is about
time we had professional, steady funding to keep software working and
available. I haven't published app notes or the like in the past, and most of
the software I worked on was never published, but I see important
applications where the code is not open source and there has been no testing
and no updates for years. We need to move from on-the-clock science to some
model that allows scientific software to be properly maintained.

Maybe treating scientific software like we treat "regular IT" is the key.

------
jacquesm
That's only one part of the problem. Bad data leads to bad science just as
frequently, if not more; a limited understanding of how sample size affects
the output quality of statistical procedures is another, as is not knowing
what the software is actually doing internally.

My main exposure to this has been watching a biologist draw absolutely
unsupportable conclusions from a bunch of very low-quality data, using some
software as if it were a magic incantation. Not that that bothered the
supervisors or anybody else; as long as the funding kept coming, everybody
was absolutely happy.

The software wasn't all that bad, even though the user interface was crappy.
The whole pipeline did not stand up to scrutiny, and of course _my_ sample
rate of this sort of thing is too low to draw conclusions from, but I really
hope that was an example of how it is normally not done.

Personally, I think that if you aren't using software to do something faster
than you could do it yourself manually, then you probably shouldn't be using
that software.

------
plg
The alternative (at least in the extreme) is bad as well --- scientists who
can't program anything and rely on pre-canned point-and-click analyses and
programs. There isn't a button for X, so I can't do X. I pressed the Y
button, so of course my Y analysis is valid. Etc.

Edit: you may think this is hyperbole, but I've heard the above two
statements more often than you might guess, coming out of the mouths of
prominent scientists.

~~~
hobs
I don't think it's something relegated to scientists; many people without
software experience view software as magic, and assume that when it says the
X button does X, it does it without error and without effort.

