
How to support open-source software and stay sane - sohkamyung
https://www.nature.com/articles/d41586-019-02046-0
======
a3_nm
Argh, this article is going to give researchers the misleading impression that
releasing code as open-source is complicated, that you need to maintain it,
ask yourself difficult questions, be an expert in programming, find funding to
keep it around, etc.

But in my field, in most cases the source code is never released at all.
That's a far bigger problem than not having support to use it.

So fellow academics, please don't use this article as an excuse to not release
your code. When in doubt, just push the thing to GitLab as is, add a README
that says "This is research code for paper X and it is unmaintained.", and
disappear. It's not ideal, it's not the best way to do science -- but it's
much better than not releasing your code.

Related: the CRAPL
[http://matt.might.net/articles/crapl/](http://matt.might.net/articles/crapl/)

~~~
_dps
> But in my field, in most cases the source code is never released at all.
> That's a far bigger problem than not having support to use it.

I agree but I think it's good for people to know that almost any moderately
successful open project will attract whiners, complainers, entitled people,
and in some cases outright abuse.

From the perspective of society, having no open code is worse than having open
code that's unmaintained. We're agreed here.

From an individual contributor's perspective, opening yourself up to varying
forms of whining and abuse "for the social good" (and not, say, for your
tenure or publication count or whatever a researcher cares about in the
moment) is a bigger problem than just sitting quietly on stuff you don't want
to become a drain on your life.

~~~
Bakary
What exactly are the whiners going to do if you just ignore them?

~~~
krageon
If you use an online presence that is traceable across the internet, some
subset of them will bother you _everywhere_ or decide to make your life
miserable when you ignore them. As such, anything that can remotely be seen as
connected to you with any longevity is best put online without your name and
with a throwaway nickname.

Given that precaution, nothing! Sadly a lot of people don't take this
precaution.

~~~
zrobotics
If the code is released anonymously, what good is it though (in the context of
research code)? Ideally it should be connected to the paper to allow for
replication, and what academic would want to release a paper anonymously?

~~~
krageon
I think we agree then that releasing code connected to a paper is something I
wouldn't recommend anyone actually do, even if I do like it when it does
happen :)

------
boron1006
Great article, and fantastic to see a spotlight on an issue that I've thought
a lot about.

The sad part is that to a lot of scientists and researchers, software and
software engineers aren't worth paying for. It's not uncommon to see
"programmer" jobs that are looking for 3+ years of experience that offer <$15
an hour in the US. Sometimes they're "volunteer intern" positions. Of
course the people who end up filling these positions aren't usually actual
developers, so the software gets built poorly, eventually gets scrapped, and
the cycle continues.

Management also hasn't really evolved past the '90s. Non-technical scientists
often want 100% of control and to make each decision, but don't want to spend
any time on it. This means developers often have little to no specs to work
with, but spend all of their time guessing about what the scientists want, and
having to go back and fix everything after.

> “That’s really the tragedy of the funding agencies in general,” says
> Carpenter. “They’ll fund 50 different groups to make 50 different algorithms,
> but they won’t pay for one software engineer.”

This is the crux of my frustration. It's not even 50 different algorithms
often. A lot of the time, 50 different research groups will be working on very
similar programs, and none will be able to deliver a working version.

Though the article mentions that research funding does exist, clicking on one
of those funding pages and looking through their examples reveals that only
~1/10 of their websites are actually still active, and they aren't old sites.
Again this goes back to the whole "scientists don't value software" thing.
I've seen scientists happily sign off on spending $20,000+ on hardware
components that would usually cost <$100 to make, but balk at contributing $50
yearly to support open source.

I got lucky that I managed to find a place where I get paid fairly, and my
boss is actually technical and can manage tech projects well, but these places
are few and far between.

~~~
opportune
A lot of the points you bring up are related to cost. Here's the thing about
cost: let's say it costs $100k/year to hire a good software engineer capable
of writing scientific code (able to program and test complex algorithms, write
HPC code, turn whitepapers into code), which might even be an underestimate
depending on how benefits are paid out and the area. You can also fund 3 more
grad students for that kind of money. The grad students will directly convert
a PI's money into authorships while the software engineer's contribution will
be only indirect, and likely take years to pay off.

Plus, with only a single software engineer, there's a good chance you get
unlucky and end up with someone clueless/lazy. You would probably need 3-4
software engineers to make a functioning team with best practices and hedge
your bets against accidentally hiring someone who sucks. So now we're talking
10+ grad students.

Open source software is a bit different because many labs can band together to
fund things they find useful. But again there are still issues with cost-
effectiveness. I'm guessing most lab contributors to OSS would want some sort
of quid-pro-quo which may not be realistic for all OSS projects. And by
funding OSS you are also funding competing labs' abilities to use the same
features you use, which is good for science in general but not good for
people's careers sometimes.
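
The back-of-the-envelope comparison above can be written out as a tiny Python sketch. The figures are just the rough estimates from this comment ($100k per engineer per year, about three grad students for the same money); the names `ENGINEER_COST`, `STUDENTS_PER_ENGINEER` and `students_forgone` are made up for illustration and this is not real salary data.

```python
# Rough estimates taken from the comment above, not real salary data.
ENGINEER_COST = 100_000      # USD per year for one software engineer
STUDENTS_PER_ENGINEER = 3    # "3 more grad students for that kind of money"


def students_forgone(num_engineers: int) -> int:
    """Grad students the same budget could have funded instead."""
    return num_engineers * STUDENTS_PER_ENGINEER


# One engineer, then the 3-4 person team suggested in the comment.
for team_size in (1, 3, 4):
    budget = team_size * ENGINEER_COST
    print(f"{team_size} engineer(s): ${budget:,}/year, "
          f"or about {students_forgone(team_size)} grad students")
```

Under these assumptions a 3-4 person team displaces roughly 9 to 12 students, which is where the "10+" figure comes from.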

~~~
bluGill
Why hire an engineer? Why not walk over to the CS department (it might have a
different name) and talk to a professor there? They can set you up with plenty of
undergrads who need this experience, and they should be able to guide them
into something that is maintainable long term.

Note that I said "should" there. How to write maintainable programs seems to
be an under-researched area.

~~~
opportune
I was such a CS undergrad working in a lab once upon a time. I don't think you
really want that because the undergrad will probably only be working for like
4-15hr/week potentially for only a single semester. For a summer position,
sure it's 40hr/week but still only for about 10-15 weeks.

And still, you're getting a generic CS undergrad's caliber of work and
responsibility which I would say on average is not great. They might not be as
familiar with version control, best practices, etc. and could just end up
writing code just as bad as the scientists.

I think if hiring a team you would need at least one somewhat experienced
full-time software engineer to act as team lead/PM for the other developers,
whether fulltime or students.

~~~
Master_Odin
Yeah, picking up undergrads (or even grad students) from the CS department is
not a surefire way to end up with code that follows best practices, is well
maintained/documented, etc. and I'd probably argue if your team leader doesn't
have that experience, you're more likely to end up with something that doesn't
follow great practices (especially with regards to any sort of test suite).

~~~
bluGill
All the replies about undergrad quality are correct. I stand by my statement
though: we need to figure out how to solve this problem and research is sorely
lacking.

------
dddddaviddddd
Maintenance is a challenge anywhere where software is developed in-house
without a dedicated development team. Development is often led by one person
and becomes very difficult when they depart. It seems like all the regular
maintenance challenges are present in these situations, just exacerbated. Not
sure what organizations that aren't software-focused can do to improve their
situation in this regard.

~~~
PascLeRasc
One thing we can do as users is champion the idea that open-source authors
don't owe us anything. Having support or getting help with problems is great,
but the author's already done us a huge favor by writing the software we
needed in the first place, and they aren't required to go beyond that or do
anything specifically for an individual.

~~~
crispyambulance
Mostly agree, however, at some point the authors DO NEED to do something
beyond creating the thing or else face the extinction of that piece of
software.

I think most people who have created something will generously bend over
backwards to help individuals in the early stages of its lifecycle. You can
see that all the time on GitHub.

The problems come when the project takes off to the point where there isn't
enough support for the number of people using it BUT the software isn't
mature/popular/fit-enough to be "under the wing" of a larger organization who
can afford to pay for its maintenance and evolution.

Is there a way to bridge the gap between author's-generosity-support and
corporate/organizational stewardship? We do have the social networks in place
to allow that, they're just focused on different objectives.

~~~
SpaceManNabs
Wasn't there a thing recently where an author just gave away one of his node.js
libraries, and then it was used maliciously by the requester to attempt to
hijack bitcoin wallets?

Found it: [https://arstechnica.com/information-
technology/2018/11/hacke...](https://arstechnica.com/information-
technology/2018/11/hacker-backdoors-widely-used-open-source-software-to-steal-
bitcoin/)

I don't blame anyone in this scenario because the culture of open source
projects and their interplay with enterprise encourages it.

------
ylem
I think part of it is funding--but part of it is also recognition. In a number
of fields, the development of scientific software receives little recognition
compared to the science--even if the impact is large in terms of the
discoveries it enables. So, for say tenure or career advancement, it might be
"nice" for you to do it, but nowhere near as important as publishing high
impact papers (at least in many fields). Especially if a graduate student or
postdoc commits too much time to it instead of research results, they risk not
being able to continue in their field (though they probably have more options
if they decide to leave science to become software developers).

------
xvilka
Speaking about biomedical software - I suggested[1] making a Julia flavor of
Biostar Handbook[2], an amazing introduction into the field of bioinformatics
and genomics from the Biostars[3] Q&A site authors. Porting algorithms to
Julia will greatly improve the speed and maintainability of corresponding
programs.

[1] [https://discourse.julialang.org/t/biostar-handbook-
computati...](https://discourse.julialang.org/t/biostar-handbook-
computational-genomics-and-julia-to-be-or-not-to-be/25732)

[2] [https://www.biostarhandbook.com/](https://www.biostarhandbook.com/)

[3] [https://www.biostars.org/](https://www.biostars.org/)

------
dekhn
Great article, thrilled to see PIs coming out and saying explicitly that the
funding agencies are making a huge mistake funding discovery-driven science at
the cost of long-term production work.

------
zdw
Determining worthiness of projects to fund is a really difficult battle. Do
you go based on popularity? Importance? If something is worthy, how much is it
funded? For how long? Who is paid to do the work?

Even other projects that tried to address this like the Core Infrastructure
Initiative seem to have unintended consequences. For example OpenSSL got CII
funding, then used some portion of that to relicense as Apache 2 which breaks
compat with the more free LibreSSL fork, weakening the overall community.

------
jgamman
if people paid the price of a coffee for good software, post-docs could attach
themselves to a research group and fund themselves by maintaining quality
software products - freelance scientist FTW.

the problem here is that lots of people want _other_ people to work for free.
if you're not being paid, it's a hobby and you don't owe anyone anything. if
the science system relies on free labour and refuses to support it, that's a
very different conversation and the results are predictable.

------
FpUser
"...Scientists writing open-source software often lack formal training in
software engineering, which means that they might never have learnt best
practices for code documentation and testing..."

This is the most ridiculous statement. I have had the chance to work in both
environments. Having had that experience, for the quality of the end result I
will take a scientist (preferably a physicist or mathematician) with self-taught
software development skills over a formally trained agile guru any time.

Of course there are exceptions but ...

~~~
oneepic
I have the exact opposite opinion, plus my cousin is a robotics/ML professor
who has to deal with this same issue in his research group.

Indeed, tons of scientists really have no idea about style, testing, etc.
They're happy to just write an imperative C/C++/Python program with no docs
whatsoever, run it, and be done with it.

~~~
FpUser
I was talking about scientists who switched from science to writing software
as a product for whatever reason. They surely produce good docs (properly
documenting their work is the very basis of being a scientist).

As for writing imperative code in C/C++/Whatever other language they seem to
choose: nothing is wrong with that.

~~~
opportune
Writing medium-sized amounts of imperative code is wrong if it is going to be
shared with anyone other than the single developer writing it. And even if
only a single person will ever see it, you are still better off writing it in
components once the project reaches a certain size.

~~~
FpUser
Imperative programming and components are 2 different things. The latter could
be implemented with numerous approaches including imperative programming

~~~
opportune
That's true, but when someone mentions writing something in an "imperative
style" I think it's common for that to actually just mean one huge file that
executes sequentially, which is not amenable to testing, causes lots of merge
conflicts when someone else works on a different part of the code, is very
hard to refactor, etc.
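
To make the distinction in this subthread concrete, here is a minimal Python sketch of code that stays plainly imperative (loops, no classes) but is split into small pieces that can be tested on their own. The function names and the sample data are invented for illustration; nobody in the thread proposed this exact layout.

```python
# Hypothetical analysis script: names and data are illustrative only.
from statistics import mean


def parse_measurements(lines):
    """Turn raw text lines into floats, skipping blanks and '#' comments."""
    values = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            values.append(float(line))
    return values


def remove_outliers(values, limit):
    """Keep values whose distance from the mean is at most `limit`."""
    centre = mean(values)
    kept = []
    for v in values:
        if abs(v - centre) <= limit:
            kept.append(v)
    return kept


def main():
    # In a one-big-file script these steps would be inlined here,
    # leaving nothing that a test could call in isolation.
    raw = ["# sensor dump", "1.0", "1.2", "", "0.8", "9.0"]
    values = parse_measurements(raw)
    cleaned = remove_outliers(values, limit=3.0)
    print(f"mean of {len(cleaned)} values: {mean(cleaned):.2f}")


if __name__ == "__main__":
    main()
```

Each step is still written imperatively, but because `parse_measurements` and `remove_outliers` take inputs and return results, they can be tested or moved to separate files without touching the rest.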

