
In one way I really agree with this initiative. I'd like to raise a counter-argument about replication, though.

In a nutshell, if someone has to reimplement the code from the details in the paper they've read, it's a great check that the original author isn't just reporting the results of subtle bugs, or particularities, in their software.

It's true that opening the source allows other researchers to look for bugs in the code, and that's good. But such checks are inherently less thorough than having another group replicate the results in a separate environment, working just from the published paper details.

One objection to this 'clean room' replication is that perhaps it's not practical to re-engineer the code that was written for a paper - that's just too much work. But that's basically saying "well, we can't replicate the results of this paper, it's too much work" - generally, that is not the way you want to go about doing science. A cornerstone of science is that we are willing to accept slower short-term progress in return for more certainty that what we are doing is correct (and hence, hopefully, faster long-term progress); painstaking replication is a key part of this philosophy, I think everyone agrees.

Consider the neutrinos: they did an experiment; others are trying to suggest reasons for the surprising results. In the unlikely event the result lasts, other scientists are going to want to redo the experiment. Ultimately, it's not enough to analyze the data or the experimental setup; you want to replicate, and ideally from the ground up - in a different lab, with different equipment - and with different code to process the results.

Now, maybe the manifesto is aimed at a world where it's not possible to replicate based solely on what's in the paper - perhaps there's just too much detail that can't be included, and hence a lot of the 'science' is inherently tied up in the detail of the code. I'm sceptical about whether that's a good way to do our science - but if we decide to go down that route, then the 'paper' as publication, the de facto unit of scientific output, is something we will also have to really rethink; and reviewers are going to have to be responsible for signing off on the code, which they don't generally currently do.

I think there's an argument, currently, for telling people "I could give you the code - but it'll only take you a couple of days to write your own code to replicate the results, and it'd be much better if you could do that" - but I'm sure this varies drastically across domains.

I'm not sure where I stand overall - but I don't think it's black and white.




> I think there's an argument, currently, for telling people "I could give you the code - but it'll only take you a couple of days to write your own code to replicate the results, and it'd be much better if you could do that" - but I'm sure this varies drastically across domains.

While I can definitely see a PI setting this kind of task for a student, I don't think that releasing the code suddenly makes the task impossible. If (and when) the two independently written programs disagree, it will then be possible to immediately step through each and figure out where the point of divergence happens, and which (if either) implementation is more likely correct, rather than starting an email chain with the original authors saying, "we get different results, but we don't know why".
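
For what it's worth, here's a minimal sketch of that workflow in Python/NumPy - two toy moving-average implementations stand in for the published and replicated code, so everything here is hypothetical. The idea is to pair up the intermediate stages of each pipeline and report the first one where the outputs differ:

    import numpy as np

    def moving_average_a(x, w=3):
        # Implementation A: convolution-based smoothing
        # (stand-in for the original authors' code).
        return np.convolve(x, np.ones(w) / w, mode="valid")

    def moving_average_b(x, w=3):
        # Implementation B: cumulative-sum smoothing, written
        # independently "from the paper" (stand-in for the replication).
        c = np.cumsum(np.insert(x, 0, 0.0))
        return (c[w:] - c[:-w]) / w

    def first_divergence(x, stages):
        # Walk the paired pipeline stages; report the first one whose
        # outputs disagree beyond floating-point tolerance.
        for name, (impl_a, impl_b) in stages.items():
            a, b = impl_a(x), impl_b(x)
            if a.shape != b.shape or not np.allclose(a, b):
                return name, a, b
        return None  # every stage agrees within tolerance

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    stages = {"smooth": (moving_average_a, moving_average_b)}
    print(first_divergence(x, stages))  # None -> implementations agree

With both sources in hand you can add the real stages to that table and bisect straight to the offending one; working only from the paper, you're reduced to guessing which stage the authors meant.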


If people continued to write independent programs purely to replicate and verify claimed results, then certainly it'd be beneficial for them to have the source, in order to track down where any discrepancy arises. This is often how it works currently, in practice, in my limited experience: you mail the authors and ask them to help you track down the problem, and get either code or support; but I acknowledge this won't always work.

My concerns are that 1) I don't trust people to do the hard thing and re-implement to replicate, rather than take the easier way and just use the code that's provided. There's very little credit currently given for replicating existing (even recent) results.

2) More importantly, when replicating a paper that's a little vague, it'll be more and more tempting to just peek at the source; and suddenly the paper isn't the document of record anymore. I feel that if we go down the route where the code becomes the detailed documentation of the scientific process, that's a very fundamental shift from the current model, where the paper is supposed to be repeatable in and of itself.

If we go down that road, we probably need a whole different review infrastructure; are reviewers really going to spend the time to review large and hastily written scientific codebases?

I doubt it; so how does review work when "The code is the only definitive expression of the data-processing methods used: without the code, readers cannot fully consider, criticize, or improve upon the methods"? Will it no longer be possible to criticise a paper for lacking sufficient detail to reproduce the results? Will the reply be 'read the source'?

Maybe that's just the way things are going to go. There's a lot to like in that manifesto. But there are going to be positives and negatives to letting the source become the documentation. The discussion around the manifesto on their website does not acknowledge such tradeoffs; it's taking a pretty one-sided view. Maybe that's just how you are supposed to write manifestos :-) But I'd like to see some discussion of these tradeoffs.


> But such checks are inherently less thorough than having another group replicate the results in a separate environment, working just from the published paper details.

Not so. Research into "N-version programming" shows that independent implementations tend to have bugs clustered in the same subsystems: Knight and Leveson's classic 1986 experiment, for instance, found that independently written versions of the same program failed on the same inputs far more often than independence would predict.

Besides, if scientists can invent a way to reliably communicate requirements in a 12-page paper in Nature, then I have a requirements document in Brooklyn I'd like to sell them.


Most academic code is poor quality and not easily extensible/wrappable. I suspect many people would still develop their own code.

People might be lazier in some situations, I agree, but at the same time the number of eyes per line of code will increase, which may be a net gain.



