Hacker News

I would think most people would be unwilling to make their "crappy" code public because no matter how many disclaimers they provide with it, they will be judged by others on it.


Why on earth would anyone trust descriptions that they cannot verify?

Trusting without the ability to verify goes against everything scientific.

If you think your code is too "crappy" for publication, why do you believe it is bug-free enough to produce dependable answers?


Re-running their crappy code and getting the same result they got doesn't really prove or verify anything. Re-implementing the algorithm they describe in the paper and getting the same result (or not) is far more interesting.


> Re-running their crappy code and getting the same result they got doesn't really prove or verify anything.

Yes it does.

Very often, the data selected for publication is cherry-picked. Running the same crappy code on a more complete data set (or alternatively, on a partial data set) would give a very quick indication of the robustness of the results, and unlike re-implementing, might be doable in a day rather than months of effort.

Furthermore, when you actually re-implement (if you do), it is extremely helpful to compare intermediate results, which is impossible unless you have the original everything.
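To make the intermediate-results point concrete, here is a rough sketch (not anyone's actual pipeline) of finding the first stage where a re-implementation drifts from the original. The stage names, the numbers, and the `first_divergence` helper are all made up for illustration; it assumes both versions can dump their per-stage outputs as lists of floats.

```python
import math


def first_divergence(original, reimplemented, rel_tol=1e-9):
    """Return the name of the first stage whose values differ, or None if all match."""
    for stage, expected in original.items():
        actual = reimplemented.get(stage)
        # A missing stage or a length mismatch already counts as a divergence.
        if actual is None or len(actual) != len(expected):
            return stage
        if any(not math.isclose(a, e, rel_tol=rel_tol) for a, e in zip(actual, expected)):
            return stage
    return None


# Hypothetical per-stage dumps from the original code and from a re-implementation.
original = {"normalized": [0.1, 0.5, 0.4], "smoothed": [0.2, 0.4, 0.4]}
reimplemented = {"normalized": [0.1, 0.5, 0.4], "smoothed": [0.2, 0.45, 0.35]}

print(first_divergence(original, reimplemented))
```

Without the original code and data you only get to compare the final number, which tells you that something differs but not where.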

> Re-implementing the algorithm they describe in the paper and getting the same result (or not) is far more interesting.

Yes, but very rarely done in fields that are not CS or EE (and not very common in these either). Usually, results are just taken as gospel.

Also, there is a ridiculous amount of negligence (and even fraud) in publications. Just running the crappy code, seeing the results, and having a cursory look at the code and data would reveal a lot of that.


> Why on earth would anyone trust descriptions that they cannot verify?

> Trusting without the ability to verify goes against everything scientific.

Hasn't this always been true about scientific papers? Descriptions can be verified by reproducing the experiment. Why is a paper any less trustworthy just because there's code involved?


The need for reproducibility in experiments is an accident of the fact that our universe is horrifically complicated and true reproducibility is a myth, thus we must make a deliberate, conscious effort to come as close as possible, or no progress can be made. When that is no longer true and it becomes possible to run (under certain constrained circumstances) fully deterministic experiments that can be freely replicated to the bit by anybody, it's time to rethink the assumptions made lo these many centuries ago.
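As a toy illustration of "replicated to the bit", here is a minimal sketch assuming a seeded random walk stands in for a real simulation. The `run_simulation` and `fingerprint` names are hypothetical, and the "constrained circumstances" here are the same code, the same seed, and the same Python version and platform.

```python
import hashlib
import random
import struct


def run_simulation(seed, steps=1000):
    """Toy stand-in for a simulation: a seeded Gaussian random walk."""
    rng = random.Random(seed)  # private generator, so no global state leaks in
    position = 0.0
    trajectory = []
    for _ in range(steps):
        position += rng.gauss(0.0, 1.0)
        trajectory.append(position)
    return trajectory


def fingerprint(values):
    """SHA-256 over the raw IEEE 754 bytes of each value, so any bit flip changes it."""
    digest = hashlib.sha256()
    for v in values:
        digest.update(struct.pack("<d", v))
    return digest.hexdigest()


# Two runs with the same seed should agree bit for bit; a different seed should not.
print(fingerprint(run_simulation(42)) == fingerprint(run_simulation(42)))
print(fingerprint(run_simulation(42)) == fingerprint(run_simulation(43)))
```

Publishing the code, the seed, and the expected fingerprint would let anybody check that they are looking at the same run the authors were.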

People arguing against source code release often argue as if those of us in favor think that re-running the original simulation is the end-all, be-all of reproducibility. Clearly that is not the case. No one simulation can truly prove anything, and independent reverification will always have a place. But since we do have the source artifacts and original data, why not release them and show exactly what was done and how it was done? Again, the idea that experiments should not do so is merely an artifact of the fact that scientific papers could only be 10 very expensive pages or so in a journal; why carry unexamined assumptions based on that now outdated fact forward into the future?

Accidents of the past are nothing more than accidents of the past, not holy writ. And I'm not aware of a good argument against release of source code that, when deeply examined, doesn't boil down to "well, that's just not how we do it."


> Hasn't this always been true about scientific papers? Descriptions can be verified by reproducing the experiment. Why is a paper any less trustworthy just because there's code involved?

It was always true to an extent.

Code is a force multiplier that makes it significantly harder to evaluate the paper without it (and without reproducing an equivalent).

I'm not in academia myself, but I've heard from friends more than once that when they actually received code (and/or data) they requested from an author, the code turned out not to match what the paper described, and the data had often been massaged to fit in a way that wasn't precisely described either.

The question shouldn't be "why aren't you satisfied with what was good 20 years ago?", but rather "when sharing the bits that make everything reproducible is a 'git push' away, why isn't it considered mandatory?"

It is a common error to think that science is about proving things; the scientific method is actually about trying to disprove things and failing to do so. If what you want to do is science, why not make your results as easy as possible to disprove?



