But they didn't even bother to cite a tagged release or even a specific commit. Presumably master will change and could be different from what it was when the paper was published. I guess you could compare dates (although who knows how far before the submission date the repo's master branch last matched the exact code used for the paper), but why make reproducibility more obscure when all it costs you is a `git tag` on the appropriate commit, which gives everyone an easily obtainable copy of the repo exactly as it was for everything that was submitted?
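For concreteness, here is roughly all it costs. This is a minimal sketch, not the authors' workflow; the tag name and message are made up, and the git commands are wrapped in Python only so the example is self-contained:

```python
# Minimal sketch: pin the exact code state used for a submission.
# The tag name and message below are hypothetical examples.
import subprocess

def tag_submission(tag: str = "paper-v1-submitted",
                   message: str = "Code state used for the submitted manuscript") -> None:
    # Create an annotated tag on the current commit...
    subprocess.run(["git", "tag", "-a", tag, "-m", message], check=True)
    # ...and publish it so readers can check out exactly this state.
    subprocess.run(["git", "push", "origin", tag], check=True)

if __name__ == "__main__":
    tag_submission()
```

A reader can then check out that tag and get, byte for byte, the code behind the submitted results.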
Providing the source is better than nothing, but we still have so far to go educating academics about the vital need for real and effortful data provenance and reproducibility.
Providing a git repo is so many orders of magnitude better than the usual "source code available upon request" throwaway line that I'd hesitate to complain about it. You'd be surprised how many people in non-CS fields still think that source code is something to be hidden from colleagues.
But yes, I would recommend that academics check out http://zenodo.org/ , which not only helps with tagging a GitHub repo but also provides a citable DOI.
While we're at it, we might as well point out that source code alone is useless if it doesn't come with instructions on what kind of environment is required to install and run it, as well as the actual input data files used by the authors.
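For what it's worth, even a crude manifest shipped next to the code is better than nothing. Here is one possible sketch (the file name and format are my own invention, not any standard) that records the interpreter, platform, and exact installed package versions:

```python
# One possible way to snapshot the environment alongside released code.
# A sketch; the output filename and format are arbitrary choices.
import sys
import platform
from importlib import metadata

def write_environment_manifest(path: str = "ENVIRONMENT.txt") -> None:
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    # Record every installed package and its exact version.
    dists = sorted(metadata.distributions(),
                   key=lambda d: (d.metadata["Name"] or "").lower())
    for dist in dists:
        lines.append(f"{dist.metadata['Name']}=={dist.version}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_environment_manifest()
```

It obviously doesn't replace proper packaging or container images, but it at least tells a reader what the authors were actually running.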
Man, they use Git and you complain? I've collaborated with pretty brilliant PhDs[1] who would say "Subversion is so complicated, can't we just use Dropbox?"
[1] Insult to injury: computer science, not physics.
I don't know if you're in TCS or an applied branch, but my experience with TCS labs is that at most 10% of the lab (postgrads and faculty) are able to use any version control system.
Not a joke: courses that are supposed to teach programming are run through Dropbox and email attachments.
They could use git with fancy pre-commit hooks and hook their repo up to travis-ci to check every push for software and mathematical correctness, and I would still complain if it isn't transparent exactly which code produced the particular results in the publication.
Time and again, new services have come out, for decades, that supposedly will help solve the versioning, collaboration, and reproducibility issues with academic papers.
They never work, and the tools are always misused in this same way: as a catch-all bucket for disparate and unrelated snapshots of the paper, the code, and the data.
Merely moving from manual versioning in Dropbox to manual versioning in git is not progress in the least. We have to actually curb the underlying problematic behavior (e.g. this repo has only a single commit, so the concept of a version doesn't even apply) before it counts as even the tiniest step in the right direction.
In fact, I see it as a huge problem that people are chiming in to say that using git is in and of itself some kind of positive step. Why?
I don't care if you use svn, darcs, mercurial, git, perforce, or any other versioning system. The fundamental issue is actually keeping track of the changes you've made, and ensuring the precise version used for publication is trackable and available.
Even using Dropbox with clear documentation of changes made would be better than using a git repo where, prior to your first commit, you have destroyed all version history information and manually put everything into the "good" commit that you feel like "showing" to everyone.
That behavior is the entire problem. Using git, in and of itself, doesn't represent any degree of departure from the old practices. Just because git is popular and new-ish and sexy doesn't mean simply dropping stuff in a git repo constitutes progress. That's a dangerous implication.
Unfortunately, I don't have a Bitbucket account because I don't approve of Atlassian, so I can't. But hopefully someone here who does use Atlassian products will do it?
The fact that there is only one commit is a bit disconcerting. It suggests that they edited, debugged, and modified the code "offline" until they felt it was "good enough" to share, so whatever revision history there was when the paper was being created, revised, and submitted, is lost.
We can only take the authors' word for it that in between getting results for the paper and releasing the commit, they didn't make any significant alterations that would affect reproducibility.
The point is not to be critical of these particular authors. Good on them for sharing their code.
The point is to highlight that it's harmful for academics to "hide" their code until they get it all together for the "perfect" commit. It defeats the point of version control and their repo functions instead as basically nothing but a download link.
All the reproducibility problems would remain exactly the same as if the code had been provided as just a tarball linked from some academic web page.
> The fact that there is only one commit is a bit disconcerting. It suggests that they edited, debugged, and modified the code "offline" until they felt it was "good enough" to share, so whatever revision history there was when the paper was being created, revised, and submitted, is lost.
Obsessions with code history are a little disconcerting. It's interesting to see, but has little relation to the quality of the paper (or the code base, for that matter).
> Obsessions with code history are a little disconcerting.
Wanting to see code history is not anywhere close to an "obsession" -- that's just a loaded word that does nothing but distract from the issue.
> It's interesting to see, but has little relation to the quality of the paper (or the code base, for that matter).
This is just completely wrong. I don't want to see code history because I care about the authors' earlier mistakes, previous implementation attempts, refactorings, etc., from some judgemental, coding-skill point of view.
The reason it's important is that maybe they used version 3 when they ran some part of the study and generated some of their data. They figured it was good enough, but then later they updated some code to version 4. They didn't need to re-run things for that older data, but they needed version 4 for something else.
But version 4 makes a change that causes an error, or undefined behavior, or a change to some statistical degrees-of-freedom calculation or something, which would imply that the data from version 3 also needs updating.
So the authors are using something that came out of version 3, looked good, and they thought never needed revisiting. Then they are also using something from version 4 that, overlooked by them, invalidates or changes something from version 3.
All of this gets whitewashed because they go through and manually finalize the code, and upload it to a public repo.
Now you, as the reader, are not even aware that there ever was a version 3, let alone that some of the data came from version 3, some came from version 4, and the mixup rests on an oversight.
The code history really matters. If you cannot reproduce the results immediately with the tarball the authors release as the "good" version, and there is no version history, then, effectively, there is zero reproducibility. You have to push back on the authors, who weren't making their changes under version control when the version 3 to version 4 issue arose, and so good luck ever getting an answer.
I've actually experienced this first hand. I had to reproduce some quant finance research on country-based beta calculations. It turned out that in an early draft of the work the team had used a completely different data set, and in that earlier data set the country code for Sweden had been set to "SW", even though the ISO country code for Sweden is "SE". They got some intermediate statistics out of that data set and saved them for later use.
Then later on they used another data set, and someone had the bright idea of using "SW" for Switzerland, even though Switzerland's ISO code is "CH".
But because all that remained from the first data set was aggregate statistics, and no version history of the code that had loaded the spreadsheet and made these assumptions at that time, there was no way to figure this out.
In the final result, Swiss and Swedish data were being aggregated together and throwing the numbers off. And those two countries both have very small market caps, so this was hard to spot and track down.
After a ton of outrageous debugging, we just happened to get lucky and notice that the discrepancy was almost exactly the size of the largest Swiss company in that time period (Nestle, I think). That got us wondering about the "SW" issue, and we then tried every permutation of separating the two data sets into Swiss and Swedish data until things matched.
If we could have bisected the original source revisions, it would have saved weeks' worth of tedious, expensive data-drilldown nonsense just to replicate a result.
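To make the failure mode concrete, here is a toy reconstruction. The numbers and structure are entirely invented (this is not the actual research code); it just shows how reusing "SW" with two different meanings silently merges two countries:

```python
# Toy reconstruction of the "SW" mixup; all figures are invented.

# Early data set: "SW" was (incorrectly) used for Sweden (ISO code "SE").
early_returns = {
    "SW": [0.012, -0.004, 0.007],   # actually Swedish stocks
    "DE": [0.003,  0.001, 0.002],
}

# Later data set: someone reused "SW" for Switzerland (ISO code "CH").
later_returns = {
    "SW": [0.020, -0.010, 0.015],   # actually Swiss stocks
    "DE": [0.004,  0.000, 0.001],
}

# Naive aggregation by country code silently merges Sweden and Switzerland.
combined = {}
for dataset in (early_returns, later_returns):
    for code, values in dataset.items():
        combined.setdefault(code, []).extend(values)

for code, values in sorted(combined.items()):
    print(code, sum(values) / len(values))
# "SW" now mixes two countries, and nothing in the output hints at it.
```

With the revision history of the loading code, the moment "SW" changed meaning would have been one `git bisect` away instead of weeks of forensic accounting.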
What? Do I have an obsession with writing paragraphs or looking at source code? What you are saying doesn't make sense and doesn't engage the conversation. Instead you are being needlessly hostile.
Besides, even if I did have an obsession with reading source code, it would not have any bearing on the usefulness of expecting version history in general. Here you are talking about me which makes pretty much zero sense, whether I am obsessive about it or not.
It's important to write a lot about this because the nature of your comments borders on unstable hostility. Even if you disagree with what I am saying about version history, and even if you disagree with how passionately I feel about it, I still deserve better than to endure petty insults that literally have nothing to do with the context of the conversation.
Don't take it personally, I'm not insulting you personally for having an obsession with reading the source code (I think that's great), I'm insulting everyone who is obsessed with the code history.
In science, reproducibility is what matters, not the history. Although if someone didn't tag the revision they used to generate data (or to compile and send to customers), that's pretty annoying.
I see, thank you for clarifying. I misread your comment, but if it is a general comment about the concern for version history of code that supports a scientific conclusion, then I absolutely withdraw my comments about it being insulting or personal.
Without the history to prove exactly what happened and when, there is no reproducibility.
I would even go further and say that each and every model-fitting attempt should be recorded in some kind of version history. That way, researchers can't engage in multiple-comparison errors, where they test lots of things and fish for a significance threshold that may be met solely by chance given how many things they tried. It also guards against researcher-degrees-of-freedom problems, where seeing the output of intermediate model fits causes you to change the experimental design mid-stream (like collecting more data in the context of a model where that would change the required significance threshold).
These kinds of pitfalls are extremely common and go unreported most of the time. If someone found a significant relationship after testing hundreds of things and fine-tuning the data collection, we should deeply discount that result even if it is statistically significant. But what if there is no record anywhere that all those previous attempts took place and that they motivated the experimenter to tweak the experimental design?
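As a toy illustration of why an unrecorded fishing expedition is dangerous (purely synthetic data, not from any real study): test enough unrelated predictors against pure noise and you will reliably get "significant" hits at p < 0.05.

```python
# Toy illustration of multiple-comparison fishing on pure noise.
import random
import math

random.seed(0)
n_subjects = 50
n_predictors = 100

# Outcome and predictors are independent random noise, so every
# "discovery" below is a false positive.
outcome = [random.gauss(0, 1) for _ in range(n_subjects)]

def correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

hits = 0
for _ in range(n_predictors):
    predictor = [random.gauss(0, 1) for _ in range(n_subjects)]
    r = correlation(predictor, outcome)
    # Rough two-sided 5% significance cutoff for |r| at n = 50.
    if abs(r) > 0.28:
        hits += 1

print(f"{hits} of {n_predictors} pure-noise predictors look 'significant'")
```

On the order of five of the hundred noise predictors will clear the threshold, which is exactly what a paper reporting only the final "successful" test can quietly hide.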