
We need a hub for software in science - juretriglav
http://juretriglav.si/why-we-need-a-hub-for-software-in-science/
======
dalke
While the principle is sound, I have a few issues with the explanatory text.

> By building a hub for research software, where we would categorize it and
> aggregate metrics about its use and reuse, we would be able to shine a
> spotlight on its developers,

What are these metrics? Download statistics? Number of forks? Number of stars?
How do they help 'shine a spotlight'?

Organizations have download statistics already, though they are far from
accurate. For example, I co-authored the structure visualization program VMD.
It included several third-party components, for example, the STRIDE program to
assign secondary structures, and the SURF program to compute molecular
surfaces. How would the original authors know about those uses?

(In actuality, we told them we used their software, and the SURF developer's
PI once asked us for download statistics.)

> if you’re a department head and a visit to our hub confirms that one of your
> researchers is in fact a leading expert for novel sequence alignment
> software, while you know her other “actual research” papers are not getting
> traction, perhaps you will allow her to focus on software.

The hub proposal offers nothing better for this use case than the current
system. People who use a successful sequence alignment program end up
publishing the results. These papers cite the software used. If the software
is indeed one of the best in class, then the department head right now can
review citation statistics. What does the hub add?

Suppose, as is often the case, that one of the researchers is a contributor to
a large and successful project. How does the department head evaluate if the
researcher's contribution is significant to the overall project?

As the post says, this is a rabbit hole. But it's one that has to be solved, and
solved clearly enough for the department head to agree with the solution, in
order to handle this use case. I'm not sure that it can be.

Personally, the best solution I know of is a curated list (like ASCL).

Perhaps as good would be something like PubPeer, to allow reviews of the
software.

> Research software is often incredibly specific, and trying to Google for it
> is more often than not, an exercise in futility ... “sickle”

More often, research software that people write is incredibly generic. "Call
four different programs, parse their outputs, combine the results into a
spreadsheet, and make some graphs." This might take a couple of weeks, and
doesn't result in any publishable paper or good opportunities for code reuse.

Yet this is surely more typical of what a 'research software engineer' does,
than developing new, cutting edge software.

This leads to another possible use case. Suppose you want to read in a FITS
file using Python. Which package should you use? A search of ASCL -
[http://ascl.net/code/search/FITS](http://ascl.net/code/search/FITS) - has
"WINGSPAN: A WINdows Gamma-ray SPectral Analysis program" as the first hit,
and the much better fit "FTOOLS: A general package of software to manipulate
FITS files" as the second.

Way down the list is 'PyFITS: Python FITS Module'. And then there's 'Astropy:
Community Python library for astronomy' which has merged in "major packages
such as PyFITS, PyWCS, vo, and asciitable".

The task then is, which metrics would help a user make the right decision?

~~~
juretriglav
This is a great comment! Thank you!

> What are these metrics? Download statistics? Number of forks? Number of
> stars? How do they help 'shine a spotlight'?

I decided against going deeper into the description of what the hub would
ideally look like, and what kind of metrics it would collect, because that is
a vast topic by itself. Download statistics and distributed version control
stats definitely come into play here (number of active contributors, open
issues, frequency of updates, responsiveness to opened issues, etc.), but the
main statistic is probably software citations. I'd like to spend some
dedicated time to try and figure out which numbers (if any) are indicative of
the quality of software, but I'm sure you're well aware of how difficult that
task is. Do you have any suggestions perhaps?
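To make the idea of aggregated metrics concrete, here is a minimal sketch of how a hub might fold a few repository statistics into one number. The field names follow GitHub's repository API, but the weights, the `citations` field, and the whole scoring formula are illustrative assumptions, not a proposal for the actual metric:

```python
# Hypothetical sketch: combine a few repository statistics into a
# rough "activity" score a hub could surface. Weights are invented.
def activity_score(metrics):
    """`metrics` is a dict shaped like GitHub's repository API response
    (stargazers_count, forks_count, open_issues_count), plus a
    hypothetical citation count from a literature index."""
    weights = {
        "stargazers_count": 1.0,
        "forks_count": 2.0,   # forks suggest reuse, weighted higher
        "citations": 10.0,    # citations treated as the strongest signal
    }
    score = sum(weights[k] * metrics.get(k, 0) for k in weights)
    # Penalize a large backlog of open issues slightly.
    score -= 0.5 * metrics.get("open_issues_count", 0)
    return score

sample = {"stargazers_count": 120, "forks_count": 30,
          "open_issues_count": 8, "citations": 17}
print(activity_score(sample))  # 120 + 60 + 170 - 4 = 346.0
```

Whether any such weighting actually tracks software quality is, of course, exactly the open question above.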

> I co-authored the structure visualization program VMD. It included several
> third-party components, for example, the STRIDE program to assign secondary
> structures, and the SURF program to compute molecular surfaces. How would
> the original authors know about those uses?

That's the idea behind transitive credit. As long as you can read and parse
the dependency tree, you can give credit to dependency authors, such as STRIDE
and SURF. This can go multiple levels deep, in theory.
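A hedged sketch of what that propagation might look like, assuming the dependency tree is available as metadata. The split ratio and the toy tree are illustrative assumptions; notably, the tree below treats STRIDE and SURF as declared dependencies of VMD, which is exactly the metadata that goes missing when code is copied into a distribution:

```python
# Sketch of "transitive credit": propagate one unit of credit down a
# dependency tree, keeping a fraction at each node and passing the
# rest to its direct dependencies. Ratio and tree are invented.
def assign_credit(tree, package, credit=1.0, keep=0.5, ledger=None):
    """Record `credit * keep` for `package` and divide the remainder
    equally among its direct dependencies, recursively. Leaf packages
    (no dependencies) keep everything they receive."""
    if ledger is None:
        ledger = {}
    deps = tree.get(package, [])
    kept = credit if not deps else credit * keep
    ledger[package] = ledger.get(package, 0.0) + kept
    if deps:
        share = credit * (1 - keep) / len(deps)
        for dep in deps:
            assign_credit(tree, dep, share, keep, ledger)
    return ledger

print(assign_credit({"VMD": ["STRIDE", "SURF"]}, "VMD"))
# → {'VMD': 0.5, 'STRIDE': 0.25, 'SURF': 0.25}
```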

> If the software is indeed one of the best in class, then the department head
> right now can review citation statistics. What does the hub add?

I'd like to believe this to be true, but then why is every researcher involved
in writing code unhappy with how software is treated in academia? In sickle's
case, for example, some authors don't use a regular citation and instead only
use a link to GitHub (e.g.
[http://europepmc.org/articles/PMC4410666](http://europepmc.org/articles/PMC4410666),
ctrl+f sickle). This happens more often than you would think and is something
that regular citation statistics just don't cover, and the hub can tackle
easily.

> How does the department head evaluate if the researcher's contribution is
> significant to the overall project?

In software written using version control systems, it's at least a bit easier
than in research papers. Which author's contribution is significant for the
overall paper? Those problems remain the same, but, thanks to the transparent
nature of software, they are perhaps more tangible here.

> Perhaps as good would be something like PubPeer, to allow reviews of the
> software.

That's absolutely true! And this can be easily added to the hub. It could be
as much an automated collection process as a hand curated crowdsourced
process. There has to be a good balance in there somewhere.

> More often, research software that people write is incredibly generic. "Call
> four different programs, parse their outputs, combine the results into a
> spreadsheet, and make some graphs." This might take a couple of weeks, and
> doesn't result in any publishable paper or good opportunities for code
> reuse.

Perhaps that is true. But what I've seen from experience (and what you can see
for yourself if you search for e.g. github.com on EuropePMC
[http://europepmc.org/search?query=github.com](http://europepmc.org/search?query=github.com))
is that the software that is cited in papers is rarely generic. We can argue
back and forth on this, but my theory is that what people reference in papers
is a specific, novel software-based piece of the puzzle. It would be nice to
get some data on this. Stay tuned :)

> The task then is, which metrics would help a user make the right decision?

Searching for FTOOLS' citation on Google Scholar (Blackburn, J. K. 1995, in
ASP Conf. Ser., Vol. 77, Astronomical Data Analysis Software and Systems IV ..
etc), versus WINGSPAN's citation, gives the following results: 170 cites for
FTOOLS, and 0 for WINGSPAN. This is a metric that the hub could surface and
make those decisions easier.

Again, thank you for the insightful comment!

~~~
dalke
> Do you have any suggestions perhaps?

No, I don't. Our funding sources really wanted the number of users. The number
of downloads was a reasonable proxy since at the time each organization hosted
its own downloads, so they were roughly comparable to each other.

Over time it has become harder, and I don't know what the funding sources now
want.

There is no good measure of software quality. Even the number of users isn't
useful except perhaps by comparing to alternatives in the same class. (The
number of users of semi-empirical quantum mechanics software is certainly less
than the number of users of BLAST, even when the QM software has been in heavy
development for over 40 years.)

My underlying point is that the text doesn't explain how the solution - a hub
- solves the given use case any better than the current system. Readers have
to accept your belief.

There is, by the way, a long history of code hubs, at least in the form of
aggregate archives. In my field, the oldest is the CCL archive, at
[http://ccl.net/chemistry/resources/software/index.shtml](http://ccl.net/chemistry/resources/software/index.shtml)
. You can see from the dates that 1) it's old, and 2) few use it any more.

There are also previous attempts at crowd-sourced software reviews. Here's one
for molecular visualization software:
[http://molvis.sdsc.edu/visres/](http://molvis.sdsc.edu/visres/) . (The author
uses the term 'Visitor-Maintained Indices' because it predates 'crowdsource'.)

Why didn't similar earlier systems succeed? Was it simply poor metrics? Lack
of automation? Or something else? And how will the hub handle the churn of
constantly updated software and outdated information?

> As long as you can read and parse the dependency tree

I think what you are saying is that _if_ a project references another project
as an external dependency, then that can be captured and analyzed.

However, in the case of VMD, STRIDE and SURF are included as part of the
distribution. There is no dependency tree.

This is quite common. Some of the other code I write includes third-party
packages as submodules, either to simplify installation or because they
require a few tweaks. (Consider, for example, when the original developer has
graduated and there is no replacement maintainer.) I've seen others
incorporate my code the same way.

> This happens more often than you would think and is something that regular
> citation statistics just don't cover, and the hub can tackle easily.

Actually, I don't think you explained how the hub can tackle that problem. The
hub might have more data, but that data might not be useful.

How might a review of the hub data help the department head make a decision?

> Perhaps that is true

Really? My experience is that munging different formats is incredibly common,
and unpublishable.

I was asked just last week if I was interested in writing a program to call
another program in order to explore parameter space. This is a call via the
subprocess module, wrapped in a few loops, and some parser code to get the
output from each run and put a summary in an output file.

Not publishable. And they are willing to pay $1,000 for it. (I said I would
rather teach them how to do it.)

> what I saw from experience ... is that the software that is cited in papers
> is rarely generic

Of course not. No one cites Microsoft Windows, or the Linux kernel, or the GNU
tools.

My statement was a different one. Most science software development goes
completely unpublished, with neither a repository nor a published paper.

My analogy is to the glass blower. Once upon a time all chemists blew their
own glass, and a department would have its own glass shop. There could even be
a specialist for making the more advanced glassware. Rather like how most
researchers do some programming, but there are also research programmers.

We see these in the names, like the Erlenmeyer flask, Schlenk tube, and Dewar
flask. But most of the glassware is quite boring - enough that these days it's
mostly outsourced and mass-produced. Most of the people who produce them, just
like most other lab technicians, don't have their names in scientific
publications.

Most software is in the boring category of making a variation of something
well known. A slightly different type of Schlenk tube, for example.

But let's get back to the scenario, where someone distributes a world-class
algorithm X. How does the department head judge the effectiveness?

You said citations weren't sufficient, because they aren't always cited.

However, presumably the same is true for every other program in the field,
with about the same error rate. So the department head should be able to
compare citations with similar programs and establish a ranking.

Which, I'll note, is exactly what you suggest with 'Searching for FTOOLS'
citation on Google Scholar'.

So the hub doesn't provide anything that couldn't be done by a meta-tool on
top of Google Scholar, and so citations _are_ sufficient, no?
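A toy illustration of the ranking argument, with invented program names and numbers: if miscitation hits every program in a class at roughly the same rate, the relative ranking by citation count survives.

```python
# All names and numbers below are made up for the example.
true_citations = {"AlignerA": 500, "AlignerB": 200, "AlignerC": 50}

# Suppose only ~60% of uses yield a formal, countable citation:
observed = {name: int(n * 0.6) for name, n in true_citations.items()}

rank_true = sorted(true_citations, key=true_citations.get, reverse=True)
rank_observed = sorted(observed, key=observed.get, reverse=True)
print(rank_observed)               # ['AlignerA', 'AlignerB', 'AlignerC']
print(rank_true == rank_observed)  # True: the ranking is preserved
```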

------
khinsen
The analysis of the problem is good, but I am not convinced about the
solution.

There has never been a central hub for scientific papers, but that has never
been a problem. Why should it be a problem for software? We have technologies
for making software citable with a DOI, so we could aim for the same network-
of-references approach to discoverability that has worked reasonably well for
journal articles.

As for impact metrics, they have done more harm than good for papers, so I
don't see why we should rush to make the same mistake for software. Moreover, we
could do much better. Given that we are slowly moving towards provenance
tracking and workflow management for replicability, we could use that same
provenance information for measuring software use in a way that is verifiable
and hard to game. I have outlined such an approach in a recent paper
([http://dx.doi.org/10.12688/f1000research.5773.3](http://dx.doi.org/10.12688/f1000research.5773.3),
see the Conclusions), which should be combined with transitive credit
([http://openresearchsoftware.metajnl.com/articles/10.5334/jor...](http://openresearchsoftware.metajnl.com/articles/10.5334/jors.be/)).
Such a metric would measure how much a piece of software has contributed to
published computational results.

