Plus, if people actually provided code, someone might actually get it into their heads to run it. That can't happen. No, literally, it can't happen. The level of professionalism with regard to source control, documentation, distribution, etc. in most academic labs is insufficient to allow the code to be executed outside of the environment it was written on. If you put a tarball up somewhere, somebody who tries to run it is going to get a compile error because they're missing library foo, or because they're running on an architecture that doesn't match the byte lengths hardcoded into the assembly file. Then they're going to email you, and that is going to suck up your time doing "customer support" when you should be doing what academics actually get paid to do: write grant proposals.
This, by the way, means that peer review by necessity consists of checking that you cited the right people, scratched the right backs, and wrote your paper in the style currently in fashion in your discipline, because reproducing calculations or data sets is virtually impossible.
From Richard Feynman's 1974 Caltech commencement address:
"We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.
Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of--this history--because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong--and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that. We've learned those tricks nowadays, and now we don't have that kind of a disease.
But this long history of learning how to not fool ourselves--of having utter scientific integrity--is, I'm sorry to say, something that we haven't specifically included in any particular course that I know of. We just hope you've caught on by osmosis."
Should be "anxious not to publish", or "anxious about publishing", I think.
Source code is a very precise way to exhibit the work you have done. It is so precise that even a machine can understand it. I am really happy to see that recently at NIPS more and more people have published their MATLAB code along with their papers.
And for code that is not easy to install (e.g., requires many libraries, frameworks, dev tools, etc.), you need to spend time on documentation, and even then you often need to answer questions like "how can I create an Eclipse plug-in to use your code?" or "lxml failed to install"...
This x 1000! This is a topic near to my heart, as I've actually had people use models & software we provided to them without making an effort to understand the fundamental concepts and theory behind it (or just being incapable of understanding), and then publish papers saying our models are crap! (Not even providing better ways, nor comparing to anything else, just saying 'I put in some data and the results were unrealistic, so this model is bad'.)
My point is that there is not a lot of incentive to release code: it requires a lot of your time, it is not always clear that granting agencies/employers consider it more valuable than publications, and it exposes your work to all sorts of unfair replication/comparison.
Now, if research work without released code were never cited, that would be a terrific incentive to release your code :-)
P.S. I speak from experience and I release the code of my most important research work.
Ha. That doesn't apply to climate scientists. And there was the case of Andy Schlafly of Conservapedia (a creationist lawyer) demanding all sorts of nonsense from a scientist who documented evolution through many, many generations of bacteria.
At its worst, you get ideological cranks ganging up and demanding information, possibly through FOIA requests, pretty much for the sole purpose of using up all the scientist's time responding. Then when they don't respond, and/or it turns out they aren't subject to FOIA requests, the cranks freak out, break into a server, steal emails, and declare a conspiracy.
To be fair, no climate scientist is asking anyone to bet the economy on their results; it's the various interest groups that pressure scientists into giving an answer that is less nuanced than they would have liked, and then blame them when things go wrong. (At least that's true for the 10 or so I know personally, but I'm pretty sure no others do, either.) Actually this is true for all scientists I know in fields that have political relevance.
We all want scientists to share their code because that's the positive-sum action. But individual scientists aren't paid based on how well the scientific community is doing. They're awarded positions, grants and prestige on their individual performance against other scientists. So scientists worried about their career think in zero-sum terms: "If I publish this source code, will I be pipped to the next paper? Well, I'll publish this other piece to make myself look good, since I'm not following it up; and then I'll collect the citations too."
We can wring our hands about scientists acting in bad faith all we like, but it's obvious we just have to change the incentives. Funding agencies need to give higher weight to journals that demand source releases, eventually transitioning to weighting only those journals.
In the prisoner's dilemma, the parties can work together to yield a common, greater result. But they might not do so because the common solution requires trust; any individual might go for the easy answer that brings himself a return while screwing the others.
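To make the incentive structure concrete, here it is as a tiny Python sketch; the payoff numbers are invented, and only their ordering matters:

    # Toy payoff matrix: "share" = release your code, "hoard" = keep it private.
    # payoffs[(mine, theirs)] = my payoff (numbers are made up).
    payoffs = {
        ("share", "share"): 3,  # everyone verifies and builds on each other
        ("share", "hoard"): 0,  # I get scooped with my own code
        ("hoard", "share"): 5,  # I scoop them using their code
        ("hoard", "hoard"): 1,  # status quo: slow, hard-to-verify science
    }

    for theirs in ("share", "hoard"):
        best = max(("share", "hoard"), key=lambda mine: payoffs[(mine, theirs)])
        print(f"If they {theirs}, my best reply is to {best}")
    # Prints "hoard" both times: hoarding dominates individually,
    # even though (share, share) beats (hoard, hoard) for everyone.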
You could argue that a prisoner's dilemma view of it is more realistic. In this view, the scientists all desperately do want to get it right, but they know they can't because they'll be punished for doing it by their peers, who'll seize the opportunity to get ahead at their expense. This model is plausible too, but it's different from the one I suggested. So it's not actually a terminological difference.
Incentives are difficult to get right.
If I write a paper describing an algorithm (or process, or simulation, etc.) and open source my code, someone attempting to reproduce and confirm my work is likely to take my code, run it, and (obviously) find that it agrees with my published results. No confirmation has actually taken place: bugs in the code will simply reproduce the same erroneous results. Further work may then be based on this code, compounding the errors.
If, however, I carefully describe my algorithm and its purpose in my paper, but don't open source the code, anyone who wishes to reproduce my results will have to re-implement my code, based on my description. This is vastly more likely to highlight any bugs in my implementation and will therefore be more effective in confirming or disconfirming my findings.
I'm not sure yet what I think about this argument. It seems to only apply in certain domains and within a limited scope (what if the bug exists in my operating system? Or my math library?) but in relatively simple simulation models, it may have some validity.
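As a toy illustration of the re-implementation effect, in Python: two implementations of "compute the variance", both faithful to the same written description, can disagree loudly when one hides a numerical bug. Rerunning my code would never surface this; rewriting it from the description does.

    import numpy as np

    def variance_naive(x):
        # One-pass formula E[x^2] - E[x]^2: algebraically correct, but it
        # suffers catastrophic cancellation when the mean dwarfs the spread.
        x = np.asarray(x, dtype=np.float64)
        return np.mean(x * x) - np.mean(x) ** 2

    def variance_reimplemented(x):
        # Independent two-pass implementation from the same written spec.
        x = np.asarray(x, dtype=np.float64)
        return np.mean((x - np.mean(x)) ** 2)

    data = 1e9 + np.random.default_rng(0).normal(0.0, 1.0, 10_000)
    print(variance_naive(data))          # numerical garbage (can even be negative)
    print(variance_reimplemented(data))  # ~1.0, as expected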
What do you think?
 From Adrian Thompson, if you're interested: http://www.informatics.sussex.ac.uk/users/adrianth/ade.html
Note that this is nothing terribly new.
Sometimes experiments testing a hypothesis give false positives because something went wrong in the experiment.
In that sense there is no real difference between a faulty thermometer and a bug in your code.
Having written this, I think your argument in fact does apply after all.
In the natural sciences one would argue that if your buggy code confirms an invalid hypothesis, someone redoing your experiment with the same code would not uncover the problem. Publishing your code invites people to use your faulty thermometer.
Of course I'm assuming that the published paper contains all necessary details to reproduce the experiment.
In the case of, e.g., climate models or modelling the formation of galaxies, that might be a problem because the code _is_ the experiment.
Describing what code does is very hard, and it would be easier to just publish it instead.
Depends on what level you're describing it. "It does an FFT" or "it sorts" are pretty clear. It gets hairy if you describe the specific details of the implementations. But the implementation is likely irrelevant, because other scientists can choose any implementation. Even with complex models you ought to be able to piece much of them together out of descriptions at that higher level.
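As a sketch of why the spec level is enough: given "it computes the DFT", anyone can write their own implementation straight from the textbook definition and check it against whichever library implementation the original authors happened to use.

    import numpy as np

    def dft_from_description(x):
        # Direct implementation of the definition:
        # X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)
        N = len(x)
        n = np.arange(N)
        return np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N))
                         for k in range(N)])

    x = np.random.default_rng(1).normal(size=64)
    assert np.allclose(dft_from_description(x), np.fft.fft(x))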
It's true that third parties can apply methods easily to new data. But that is a testament to the method, and references will help build the reputation of the original inventor.
Another concern only addressed in the comments on this blog post is that most scientists do not produce beautiful programs. The reasons are twofold:
- Programs are hacked together as quickly as possible to produce results. Scientists are mostly concerned with testing their theories, and not so much in producing software for public consumption.
- Most scientists are not great programmers.
Consequently, scientists usually do not want to make their source code available.
This situation sucks, given that in many countries taxpayers fund science.
I think it'd be perfectly reasonable for society and the gov't to demand that scientists who produce results based on data and data processing make the data and the programs available to their peers and the general public, and also demand that it satisfies some reasonable software engineering expectations. (These will be set by their own peers anyway.)
It is accepted by scientists that it is important to communicate research findings in the form of well-written human-readable papers. Over time the community must accept that including well-written machine-parsable data and code is just as important a part of the scientific communication framework.
Finally, a personal anecdote. I was contacted by a senior researcher to work on a cutting-edge numeric simulation project, and he sent me the source code, but made it clear that it is "top secret", even though it is gov't funded, as it is much better than its competitors, also gov't funded. He estimated that this advantage will last for several years, meaning several papers.
But according to what you say, this means that he (who still programs in Fortran) would need to start caring about versioning, cross-platform issues (Fortran, remember), readability (well, it's Fortran; to me it is always unreadable) and maintainability. In addition to his usual duties of "doing research", giving lectures, answering students, advising PhDs and asking for grants.
The problem is most likely to be solved when everyone in the field can code the algorithm in the paper. If a peer reviewer can write a program that follows the algorithm, apply it to the data, and validate the results, this ends up working well. I doubt (very profoundly) that giving reviewers the source will result in better peer reviews. The peer reviewer may have the code, but he will most likely just compile it, run it and check that the numbers are the same.
I strongly agree with you on this point. The peer reviewer obviously won't have the time to check the code. But the very act of releasing the code will force the original author to be more thorough. In this sense, it will result in better, less buggy code, better results and better science.
Also, going one step beyond the peer reviewer, I think an interested party has the right to check the results for software and data bugs, given that we're in agreement that the peer reviewer probably won't have the time?
Another anecdote. We found a really interesting signal in a popular astro dataset, which would have put us on the front page of a few magazines for sure. After many sleepless nights, it turned out to be a bug in the processing pipeline software. It took me an unreasonable amount of time to find the bug, because the software wasn't released, so I had to go hunting for clues in the appropriate papers. (This was a very large survey, so there were tons of papers.) If we had higher software engineering standards, maybe this bug would not have made it into the production pipeline and polluted the dataset.
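For what it's worth, even one pinned regression test per pipeline stage goes a long way. A minimal sketch in the pytest style; calibrate_flux and the numbers are hypothetical stand-ins, the point is freezing a hand-checked input/output pair for each stage:

    import numpy as np

    def calibrate_flux(raw_counts, zero_point=25.0):
        # Hypothetical pipeline stage: convert raw counts to magnitudes.
        return zero_point - 2.5 * np.log10(raw_counts)

    def test_calibrate_flux_known_values():
        # Values checked by hand once, then frozen: a bug introduced into
        # this stage later fails loudly instead of polluting the dataset.
        assert np.isclose(calibrate_flux(100.0), 20.0)
        assert np.isclose(calibrate_flux(1.0), 25.0)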
So, in general, the professor's strategy could be to simply make sure he always has such a person (PhD or postdoc) as part of his team, who advises on coding standards, versioning, cross-platform issues, etc.
On the other side, I disagree with "most scientists are not great programmers". What is a "great programmer"? In my definition, it is someone who can write a program to solve a problem without too much hassle. And a lot of scientists I know satisfy this to a terrific degree. Of course, they don't care about orthogonality, don't use source control, don't do extreme programming and usually don't write test cases. They just do what is asked as quickly as possible, so they can keep doing what needs to be done.
Great programmers somehow only appear to write books and give pristine examples of how you build infinitely extensible architectures.
Donald Knuth is the author of literate programming, a framework for writing human-readable programs.
And I've read the Quake 3 code extensively. Great code, but certainly not pristine, and I'm sure if you handed it for code review to virtually anyone you know, they'd find a whole bunch of stylistic and architectural issues with it. Take a look at the playerDie code. You're telling me you wouldn't have said "Rewrite this" if a colleague handed it to you?
And yes, Knuth is the author of literate programming, but that's not how the code started out. Read his letters on computer science.
Tarjan wrote a similar thing, I think in his ACM Turing Award lecture.
I picked those names because they are the best our industry has. But even with that, they all pretty much write code the way the previous post noted.
Hunting down all the right citations, putting results in good prose, formatting the paper to the standard of where you are trying to get it published, responding to peer reviews (as silly as they can get), doing peer reviews for others... all things that are not "doing research" (strictly speaking) and to some are not enjoyable.
But they have to be done anyway, it's a quality standard that has to be met, everyone suffers for it and everyone benefits from it. It's just that the scientific community has let itself set really lousy standards for code sharing. It should be rectified.
Do bear in mind that those people are not cheap, so it's doubtful that anyone would want to incur the expense.
And why dismiss so casually the argument that running the code used to generate a paper's result provides no actual independent verification of that result? How does running the same buggy code and getting the same buggy result help anyone? As long as a paper describes its methods in enough detail that someone else can write their own verification code, I would actually argue that it's better for science for the accompanying code to not be released, lest a single codebase's bugs propagate through a field.
The real problem, if there is one here, is the idea that a scientist's career could go anywhere if their results aren't being independently validated. A person with a result that only they (or their code) can produce just isn't a scientist, and their results should never get paraded around until they're independently verified.
Because this recent rash of articles is a result of "ClimateGate". Clearly the issues raised are more general.
"And why dismiss so casually the argument that running the code used to generate a paper's result provides no actual independent verification of that result? How does running the same buggy code and getting the same buggy result help anyone?"
I think it's a bogus argument because it's one scientist deciding to protect another scientist from doing something silly. I like your argument about the code base's bugs propagating but I don't buy it. If you look at CRUTEM3 you'll see that hidden, buggy code from the Met Office has resulted in erroneous _data_ propagating through the field even though there was a detailed description of the algorithm available (http://blog.jgc.org/2010/04/met-office-confirms-that-station...). It would have been far easier to fix that problem had the source code been available. It was only when an enthusiastic amateur (myself) reproduced the algorithm in the paper that the bug was discovered.
But that's the actual problem, that nobody else tried to verify the data themselves before accepting it into the field. If you could reproduce the algorithm in the paper without the source code, why couldn't they?
And while it may have meant that the Met Office's code would itself have been fixed faster, I don't buy the idea that having the code available necessarily would have meant the errors in the resulting data would have been discovered faster. That would imply that people would have actually dived into the code looking for bugs, but we've already established that the people in the field are bad programmers who feel they have more interesting things to do. Why isn't it just as plausible that they would have run the code, seen the same buggy result, and labored under the impression they had verified something?
Writing your own code for anything but trivial analysis is a huge time sink. If I can take someone else's code instead of writing my own, I'll do so. There is a very real chance that making all codes public will seriously increase overall consolidation and decrease independent verifications. (Independent verifications are a problem anyway because funding agencies are unlikely to fund redoing the same experiment and journals are less likely to publish them.)
 Programmer = somebody who spends 8 hours a day at it.
Incidentally, my experience is that plenty of people who are programmers are ashamed of a lot of their code too, at least in the sense that they wouldn't want anyone else reading it and judging them. Writing code that looks good as well as getting the job done is hard, whoever's doing it, and it's by no means always worth the effort.
If you don't like that deal, governments have an even better one: we give you patent rights on what you invent -- as long as you show us how it is done.
These deals aren't altruism on the part of the public. Nobody thinks science is a charity. It's vital to the interests of the particular nations and the species as a whole.
In my opinion, no institution of higher learning that is supported by taxpayers should be giving out credentials to people who are so insecure and unprofessional as to not be able or willing to completely describe how they reached whatever conclusions they have. And that's not even getting into the issue of taking research and making political arguments out of it. That raises the bar even higher.
It's a scandal. And the only reason it's coming out is because some people -- for whatever reason -- have a bug in their shorts about climate science.
It's time to set some ethical standards for all scientific research. Open data, open programming on standardized platforms, and elimination of scientist-as-activist. There's just too much dirt and conflict of interest in certain areas of science. Not all, by any means. But enough to leave a bad taste in the average citizen's mouth. I love science. We deserve better than this. Something needs fixing.
Even worse, I take offence at the sentiments expressed in it. It's time to set ethical standards?! Really? Are they so unethical? I would claim that scientists have, by and large, very high ethical standards. Considering that they themselves are being pushed more and more to market their research to get funding, the highest standard for our work remains the truth, and nothing but the truth.
Elimination of scientists as activists is a non-statement. Assuming that scientists believe in their own work--a very common affliction--what does it mean to be an activist? Just that they try to convince others of the validity of their work? Isn't it unethical _not_ to warn the world of the impending doom your research has uncovered? And to be somewhat insistent if people do not want to hear it?
Don't confuse politicians, crooks and businessmen in the climate change debate with the scientists. While all pretend to use scientific arguments, very few do.
The political question should be: does the risk of climate change justify the costs of the measures that might prevent it?
On the other hand, scientists have spent decades fully aware that they'll have to build their own labs, obtain their own equipment, get their own animals if the work requires them, and follow the described procedures.
They can't use the other guy's mice or zebrafish or monkey. They probably can't use the other guy's lab. They might be able to use the same telescope for astronomy, but it's better to use a different telescope to control for some fluke of the original.
So given all that, it hardly seems a huge deal to not be able to use someone else's code, as long as you know what the code was supposed to be doing.
But that's a vacuous statement. The only description of what the code is doing is the code. The description you get in a paper is the goal of the code, but goals aren't results. (Mistaking stated goals for results is a surprisingly common systematic error, programmers make it all the time when they believe the hype a project puts out before the project actually has any results. Once you start looking for it it's hard to see a day go by without someone doing this.) I can make a simple information theoretic argument that it is blindingly obvious that any significant code base will have many more bits in it than any journal article could, so it is literally mathematically impossible for a journal article to accurately describe the code. If goals were the same as results all of our jobs would look very different.
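A back-of-envelope version of the bits argument (all numbers are rough guesses, but the gap is robust to them):

    paper_bits = 8 * 6_000 * 8     # ~8 pages x ~6,000 characters/page x 8 bits
    code_bits = 20_000 * 40 * 8    # ~20k lines x ~40 characters/line x 8 bits
    print(code_bits / paper_bits)  # ~17x: the article physically cannot
                                   # carry the information that is in the code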
Moreover, the reason why you can't use the other guy's monkeys is purely physical. If they could use the other guy's monkeys, they would. They try as it is; the entire purpose of controlled breeding gene lines is to try to erase variations. If they could use guaranteed-atomically-identical lab hardware, they would. (The story of cold fusion would probably be very different if this were possible, for instance; instead of an effect that nobody could reliably replicate, there would be one set of results that everybody could replicate and probably rapidly explain. Note I'm not making a claim about cold fusion itself, just the way history turned out.) Don't elevate an accidental physical limitation to an essential component of science. Very few things hit the true ideal of science, and in general the closer you can get, the closer you should get.
No, they wouldn't, any more than a scientist would use only one monkey. Because animals vary. Hell, my Mom had two uteruses. If you don't look at different specimens, you don't know if what you observe is peculiar to the one specimen, or generally applicable.
Using atomically identical hardware would be a bad idea, because then you wouldn't know for sure if your results are replicated because the experiment was good, or because of some quirk in the apparatus.
Replication of results is merely a step in the process. Once you establish replicated results, you proceed from there. Maybe you observe that the results are irrelevant because your monkey is weird in some way, and you test it on another monkey to establish that point; others can then examine their copies and decide whether you've got a point. Maybe you build on the experiment with a standard test bed. Maybe you take apart the apparatus to demonstrate how a bit of impurity corrupted the results, and then everybody else can do the same disassembly. Maybe you shuffle in some new monkeys and hardware and run the test again to be sure. Not being able to precisely replicate results is a handicap that the various sciences proceed through anyhow because they have no choice, not a desirable part of the process. Sure, there's a small amount of danger that you might overfit your results, but there are no guarantees anywhere; it's less than the danger that you face from non-replicable results.
Not having replicability only throws away options, options that smart people could use to further their science, options that when missing can only slow progress down. That dumb people might misuse it really isn't a very interesting point.
And again, I reiterate that, to the extent possible, animal researchers do in fact try their best to get animals as identical as possible, so I give you not only the theory I outline above, but the practice of biology as well, where carefully controlled standardized gene lines are used.
Macaques don't really offer those kinds of options, at least not that I've ever heard of. You just hope they're healthy, disease-free, relatively smart, and are easy to get along with.
This would be horrifying. Maybe it could work in some fields, but not in mine (neuroscience). What, I'm going to use Java v.<whatever> on Ubuntu v.<whatever> because a funding agency tells me I have to?
I agree with much of your post, but not this bit.
edit: I think much of the professional insecurity of my field -- which feeds into the stuff you're complaining about -- is driven by the fact that way more grad student positions are funded than there are academic jobs. I wrote about this in more detail here: http://news.ycombinator.com/item?id=470181
Successfully reimplementing the experiment on disparate platforms would likely serve to support the findings even more. It might be more work up front for researchers to have to do the complete implementation on their preferred platform and then work out the kinks, but it might improve the actual science being done.
If you don't publish the code and I come up with a different result while supposedly using the same algorithm, data, and assumptions, how do we know where a discrepancy between my results and yours comes from?
Note that if you're taking public money, the code isn't yours. It's the public's. Don't like that? Don't take public money. (If you work for Google, Facebook, etc., they own what you do on their dime; same deal here.)
The majority of scientists go into science to add to the common pool of knowledge. For many, seeing their work being used in a positive way is extremely gratifying.
The pay walls are not the choice of most scientists. There is a push towards open-access, but it's not easy. To publish a single open-access article a scientist must pay thousands of dollars. For example, a single article in BMC bioinformatics costs US$1805. This must come out of already hard fought for and limited grant money, money that must also pay the salaries of many younger scientists still in training.
How many conference intros have we all heard that say, "X person has been published X times and in Y journals"? Are you saying you've never heard advisors talk about splitting articles up and the like? How about the decline of repeating published experiments?
These aren't isolated incidents at all. Status seeking is in all of us, and I don't consider it a bad thing. The metrics just need to be aligned with the goals of the endeavor - furthering knowledge.
Currently, we have a system where most grad students are chattel slaves seeking PhDs for tenure-track positions that 95% of them won't be able to get. So competitive is the tenure system that people are doing the above things just to have a chance.
It's like the idealist-activist who becomes a pragmatic-politician and learns along the way the cruel facts of life and the system. If that's an ignorant or insulting point of view, then welcome to humanity... ;-)
1. Spread FUD (Fear, Uncertainty, and Doubt) about the scientific results used to create evidence for global warming.
2. Observe that the training and skills of scientists processing data, building models, and drawing conclusions from data need to be improved.
3. Promote a very limited view of the scientific method where "replicating a result" means "accessing another scientist's data and computer programs and duplicating the processing that was performed". Independent verification usually means that a totally independent experiment is run to test the same hypothesis, new data is gathered and processed and a result produced which is compared with previous results (and those predicted by current theories). Verification means that the same phenomenon is observed at the same level modulo the statistics of measurement.
3. Also not correct. I simply don't believe that not releasing source code is the right answer. It's one group of scientists claiming to save another group from themselves. The argument appears to be that if they released the code others would run it and be satisfied with the result. So? That's just bad science and tells you something about the people who run the code. The solution isn't to protect idiots from themselves.
First, I don't think that we've learned how to make complex models yet. But in fairness, it's a really hard problem. If my numerical code is wrong, I won't get a segfault. Rather, I may notice "unusual" patterns in my model output, which could be:
- A genuine physical effect
- An artifact of the assumptions we used (because models are simplifications)
- A numerical method that hasn't converged, or whose accuracy is insufficient
- A bug
Untangling this is nigh impossible, unless you rely on very, very careful testing of independent parts. That's how NASA does it, but it's simply not within the realm of what the typical physicist can/will do (and understandably so; numerics is hard).
The solution would be to have tried and tested libraries, built by numerical specialists, so that physicists would only have to specify the equations to solve. That's what Mathematica does, and it's the only sane way I know of making complex models.
But it's slow, so physicists use Fortran instead and code their own numerical routines in the name of efficiency. Tragedy ensues. Fortran's abstraction capabilities are below C's. Modularity is out of the window.
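In Python-land the library route looks something like this; a minimal sketch, with a toy equation and tolerances of my own choosing. The physicist specifies dy/dt = -2y, a battle-tested integrator does the numerics, and the result is checked against the known analytic solution:

    import numpy as np
    from scipy.integrate import solve_ivp

    # Specify the equation; let a tested library handle the integration.
    sol = solve_ivp(lambda t, y: -2.0 * y, t_span=(0.0, 5.0), y0=[1.0],
                    rtol=1e-8, atol=1e-10, dense_output=True)

    # Validate against the exact solution y(t) = exp(-2t).
    t = np.linspace(0.0, 5.0, 50)
    assert np.allclose(sol.sol(t)[0], np.exp(-2.0 * t), atol=1e-6)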
I spent a summer working on one particularly huge model, that had been developed and tweaked over twenty years. At some point I encountered a strange 1/2 factor in a variable assignment, and questioned my advisor about it.
"Oh, is that still in there? That's a fudge factor, we should remove it."
A fudge factor. No comment, no variable name, just 1/2.
Another scientist told me: "No one really knows anymore what equations are solved in there," to which my advisor replied, "Ha, if we gathered all the scientists for an afternoon, we could probably figure it out."
But I agree with the other posters and jgrahamc: the incentives for producing quality code and models are just not there. And sadly, I don't see them changing anytime soon.
 (At least, the subset of Fortran used by the physicists I've met. Modern Fortran is a bit different.)