> By building a hub for research software, where we would categorize it and aggregate metrics about its use and reuse, we would be able to shine a spotlight on its developers,
What are these metrics? Download statistics? Number of forks? Number of stars? How do they help 'shine a spotlight'?
Organizations have download statistics already, though they are far from accurate. For example, I co-authored the structure visualization program VMD. It included several third-party components, for example, the STRIDE program to assign secondary structures, and the SURF program to compute molecular surfaces. How would the original authors know about those uses?
(In actuality, we told them we used their software, and the SURF developer's PI once asked us for download statistics.)
> if you’re a department head and a visit to our hub confirms that one of your researchers is in fact a leading expert for novel sequence alignment software, while you know her other “actual research” papers are not getting traction, perhaps you will allow her to focus on software.
The hub proposal offers nothing better for this use case than the current system. People who use a successful sequence alignment program end up publishing the results. These papers cite the software used. If the software is indeed one of the best in class, then the department head right now can review citation statistics. What does the hub add?
Suppose, as is often the case, that one of the researchers is a contributor to a large and successful project. How does the department head evaluate if the researcher's contribution is significant to the overall project?
As the post itself says, this is a rabbit hole. But it's one that has to be solved, and solved clearly enough for the department head to agree with the solution, in order to handle this use case. I'm not sure that it can be.
Personally, the best solution I know of is a curated list (like ASCL).
Perhaps as good would be something like PubPeer, to allow reviews of the software.
> Research software is often incredibly specific, and trying to Google for it is more often than not, an exercise in futility ... “sickle”
More often, research software that people write is incredibly generic. "Call four different programs, parse their outputs, combine the results into a spreadsheet, and make some graphs." This might take a couple of weeks, and doesn't result in any publishable paper or good opportunities for code reuse.
Yet this is surely more typical of what a 'research software engineer' does, than developing new, cutting edge software.
This leads to another possible use case. Suppose you want to read in a FITS file using Python. Which package should you use? A search of ASCL - http://ascl.net/code/search/FITS - has "WINGSPAN: A WINdows Gamma-ray SPectral Analysis program" as the first hit, and the much better fit "FTOOLS: A general package of software to manipulate FITS files" as the second.
Way down the list is 'PyFITS: Python FITS Module'. And then there's 'Astropy: Community Python library for astronomy' which has merged in "major packages such as PyFITS, PyWCS, vo, and asciitable".
The task then is, which metrics would help a user make the right decision?
> What are these metrics? Download statistics? Number of forks? Number of stars? How do they help 'shine a spotlight'?
I decided against going deeper into the description of what the hub would ideally look like, and what kind of metrics it would collect, because that is a vast topic by itself. Download statistics and distributed version control stats definitely come into play here (number of active contributors, open issues, frequency of updates, responsiveness to opened issues, etc.), but the main statistic is probably software citations. I'd like to spend some dedicated time to try and figure out which numbers (if any) are indicative of the quality of software, but I'm sure you're well aware of how difficult that task is. Do you have any suggestions perhaps?
> I co-authored the structure visualization program VMD. It included several third-party components, for example, the STRIDE program to assign secondary structures, and the SURF program to compute molecular surfaces. How would the original authors know about those uses?
That's the idea behind transitive credit. As long as you can read and parse the dependency tree, you can give credit to dependency authors, such as STRIDE and SURF. This can go multiple levels deep, in theory.
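To make the idea concrete, here is a minimal sketch of how transitive credit might be apportioned once a dependency tree has been parsed. The tree structure and the 50/50 split at each level are purely illustrative assumptions, not part of any real credit scheme:

```python
# Illustrative sketch: distribute one unit of credit across a
# dependency tree, keeping half at each node and passing the rest
# down to direct dependencies. The retained fraction is an
# assumption for illustration only.

def transitive_credit(project, deps, share=1.0, retained=0.5, credit=None):
    """Give `project` a fraction of `share`; recursively pass the
    remainder, split evenly, to its direct dependencies."""
    if credit is None:
        credit = {}
    children = deps.get(project, [])
    if not children:
        # leaf: keep the whole remaining share
        credit[project] = credit.get(project, 0.0) + share
        return credit
    credit[project] = credit.get(project, 0.0) + share * retained
    passed = share * (1.0 - retained)
    for child in children:
        transitive_credit(child, deps, passed / len(children), retained, credit)
    return credit

# VMD bundling STRIDE and SURF, as in the example above
deps = {"VMD": ["STRIDE", "SURF"]}
print(transitive_credit("VMD", deps))
# {'VMD': 0.5, 'STRIDE': 0.25, 'SURF': 0.25}
```

The real difficulty, of course, is obtaining the tree and choosing defensible weights; the recursion itself is the easy part.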
> If the software is indeed one of the best in class, then the department head right now can review citation statistics. What does the hub add?
I'd like to believe this to be true, but then why is every researcher involved in writing code unhappy with how software is treated in academia? In sickle's case, for example, some authors don't use a regular citation and instead only use a link to GitHub (e.g. http://europepmc.org/articles/PMC4410666, ctrl+f sickle). This happens more often than you would think, and it's something that regular citation statistics just don't cover but that the hub could tackle easily.
> How does the department head evaluate if the researcher's contribution is significant to the overall project?
In software written using version control systems, it's at least a bit easier than in research papers. Which author's contribution is significant for the overall paper? Those problems remain the same, but due to the transparent nature of software, are perhaps at least more tangible here.
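For instance, the output of `git shortlog -sn` can be reduced to per-author commit shares in a few lines. This is a crude proxy at best (one commit can be a typo fix or a whole subsystem), and the author names here are hypothetical:

```python
# Crude sketch: turn `git shortlog -sn` style output (commit count,
# a tab, then author name) into per-author commit fractions.
# Commit counts are a rough proxy for contribution, nothing more.

def commit_shares(shortlog_output):
    counts = {}
    for line in shortlog_output.strip().splitlines():
        count, author = line.strip().split("\t", 1)
        counts[author] = int(count)
    total = sum(counts.values())
    return {author: c / total for author, c in counts.items()}

example = "   120\tAlice\n    30\tBob\n"   # hypothetical authors
print(commit_shares(example))  # {'Alice': 0.8, 'Bob': 0.2}
```

Even this simple number is more than a paper byline offers, which is the point: the raw material for the analysis is at least there.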
> Perhaps as good would be something like PubPeer, to allow reviews of the software.
That's absolutely true! And this can be easily added to the hub. It could be as much an automated collection process as a hand curated crowdsourced process. There has to be a good balance in there somewhere.
> More often, research software that people write is incredibly generic. "Call four different programs, parse their outputs, combine the results into a spreadsheet, and make some graphs." This might take a couple of weeks, and doesn't result in any publishable paper or good opportunities for code reused.
Perhaps that is true. But what I saw from experience (and what you can see for yourself if you search for e.g. github.com on EuropePMC http://europepmc.org/search?query=github.com) is that the software that is cited in papers is rarely generic. We can argue back and forth on this, but my theory is that what people reference in papers is a specific software-based piece of the puzzle that is novel. Would be nice to get some data on this. Stay tuned :)
> The task then is, which metrics would help a user make the right decision?
Searching for the FTOOLS citation on Google Scholar (Blackburn, J. K. 1995, in ASP Conf. Ser., Vol. 77, Astronomical Data Analysis Software and Systems IV .. etc), versus the WINGSPAN citation, gives the following results: 170 cites for FTOOLS, and 0 for WINGSPAN. This is a metric that the hub could surface to make those decisions easier.
Again, thank you for the insightful comment!
No, I don't. Our funding sources really wanted the number of users. The number of downloads was a reasonable proxy since at the time each organization hosted its own downloads, so they were roughly comparable to each other.
Over time it has become harder, and I don't know what the funding sources now want.
There are no good measures of software quality. Even the number of users isn't useful except perhaps by comparing to alternatives in the same class. (The number of users of semi-empirical quantum mechanics software is certainly less than the number of users of BLAST, even though the QM software has been in heavy development for over 40 years.)
My underlying point is that the text doesn't explain how the solution - a hub - solves the given use case any better than the current system. Readers have to accept your belief.
There is, by the way, a long history of code hubs, at least in the form of aggregate archives. In my field, the oldest is the CCL archive, at http://ccl.net/chemistry/resources/software/index.shtml . You can see from the dates that 1) it's old, and 2) few use it any more.
There are also previous attempts at crowd-sourced software reviews. Here's one for molecular visualization software: http://molvis.sdsc.edu/visres/ . (The author uses the term 'Visitor-Maintained Indices' because it predates 'crowdsource'.)
Why didn't similar earlier systems succeed? Was it simply poor metrics? Lack of automation? Or something else? How will the hub handle the churn of constantly updated software and the accumulation of outdated information?
> As long as you can read and parse the dependency tree
I think what you are saying is that if a project references another project as an external dependency, then that can be captured and analyzed.
However, in the case of VMD, STRIDE and SURF are included as part of the distribution. There is no dependency tree.
This is quite common. Some of the other code I write include third-party packages as a submodule, in order to simplify installation, or because it requires a few tweaks. (Consider, for example, when the original developer has graduated and there is no replacement maintainer.) I've seen others incorporate my code the same way.
> This happens more often than you would think and is something that regular citation statistics just don't cover, and the hub can tackle easily.
Actually, I don't think you explained how the hub can tackle that problem. The hub might have more data, but that data might not be useful.
How might a review of the hub data help the department head make a decision?
> Perhaps that is true
Really? My experience is that munging different formats is incredibly common, and unpublishable.
I was asked just last week if I was interested in writing a program to call another program in order to explore parameter space. This is a call via the subprocess module, wrapped in a few loops, and some parser code to get the output from each run and put a summary in an output file.
Not publishable. And they are willing to pay $1,000 for it. (I said I would rather teach them how to do it.)
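The whole job is on the order of a dozen lines. A minimal sketch, where the "external program" is just Python echoing a computation so the example is self-contained (a real case would call the actual tool and parse its real output format):

```python
# Minimal parameter-sweep wrapper: run an external program once per
# parameter value, parse one number out of its stdout, and collect
# a summary. The invoked "program" here is a stand-in that prints
# "score = <x squared>"; the command and output format are made up.
import subprocess
import sys

def run_once(x):
    result = subprocess.run(
        [sys.executable, "-c", f"print('score =', {x} ** 2)"],
        capture_output=True, text=True, check=True)
    # parse "score = <number>" from the program's output
    return float(result.stdout.split("=")[1])

def sweep(params):
    """Explore parameter space: one subprocess call per value."""
    return {x: run_once(x) for x in params}

print(sweep([1, 2, 3]))  # {1: 1.0, 2: 4.0, 3: 9.0}
```

Useful, common, and exactly the kind of code that never appears in a repository or a paper.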
> what I saw from experience ... is that the software that is cited in papers is rarely generic
Of course not. No one cites Microsoft Windows, or the Linux kernel, or the GNU tools.
My statement was a different one. Most science software development goes completely unpublished, with neither a repository nor a published paper.
My analogy is to the glass blower. Once upon a time all chemists blew their own glass, and a department would have its own glass shop. There could even be a specialist for making the more advanced glassware. Rather like how most researchers do some programming, but there are also research programmers.
We see these in the names, like the Erlenmeyer flask, Schlenk tube, and Dewar flask. But most of the glassware is quite boring - enough that these days it's mostly outsourced and mass-produced. Most of the people who produce them, just like most other lab technicians, don't have their names in scientific publications.
Most software is in the boring category of making a variation of something well known. A slightly different type of Schlenk tube, for example.
But let's get back to the scenario, where someone distributes a world-class algorithm X. How does the department head judge the effectiveness?
You said citations weren't sufficient, because they aren't always cited.
However, presumably the same is true for every other program in the field, with about the same error rate. So the department head should be able to compare citations with similar programs and establish a ranking.
Which, I'll note, is exactly what you suggest with 'Searching for the FTOOLS citation on Google Scholar'.
So the hub doesn't provide anything that couldn't be done by a meta-tool on top of Google Scholar, and so citations are sufficient, no?
There has never been a central hub for scientific papers, but that has never been a problem. Why should it be a problem for software? We have technologies for making software citable with a DOI, so we could aim for the same network-of-references approach to discoverability that has worked reasonably well for journal articles.
As for impact metrics, they have done more harm than good for papers, so I don't see why we should run to make the same mistake for software. Moreover, we could do much better. Given that we are slowly moving towards provenance tracking and workflow management for replicability, we could use that same provenance information for measuring software use in a way that is verifiable and hard to game. I have outlined such an approach in a recent paper (http://dx.doi.org/10.12688/f1000research.5773.3, see the Conclusions), which should be combined with transitive credit (http://openresearchsoftware.metajnl.com/articles/10.5334/jor...). Such a metric would measure how much a piece of software has contributed to published computational results.