
(Disclaimer: Author of the OP)

I absolutely understand your feelings on hacky code. Nearly every academic produces hacky code; there are precious few who don't. When I started, I did not want to release my code for the same reason.

However, once I began to realize that we were all in the same boat, the HMS Hacked Together, that feeling began to dissipate. My advisor calls it "research code", and it's fine, because as academics, we're all used to it!

That's why I usually just ask for source. I assume the build won't execute on my Mac, and that's OK. I'm not really interested in running the tool, just in finding out exactly how you solved the problem.




(Disclaimer: I work in your lab. ;-))

I've asked people for code a few times, but I don't really ask anymore, because once I had it, it never helped me. What I really want in most cases is a clear enough English writeup, perhaps with pseudocode, so that I can understand how they solved their problems and ideally reimplement the solution myself. At least, that's the case at a scale where reimplementation is feasible; if they built something absolutely gigantic then it might be another story, but then their megabytes of messy research code I can't grok aren't very useful to me either, and I have no real choice but to wait for the cleaned-up release.

In short, I think "can this be reimplemented by a third party from the published literature?" is a better test for reproducibility than .tar.gzs are. And there's certainly a ways to go on that front, not least because in areas where 6-to-8-page conference papers are the norm, even well-meaning authors can't include enough details, and most don't get around to writing the detail-laden tech report version. But I guess I find code mostly useless for that purpose; it might as well be an asm dump for all the good I usually get out of it.


I agree with your TL;DR, but as you say, we're in a culture of 6-to-8 pages. I actually quite like the 8-page limit for most papers: it forces authors to a brevity of expression that aids focus. But you're right that details are the first thing jettisoned.

If I'm going to propose a probably impossible sea change, taking the baby step of saying "just show me what you've already done" instead of "now write another 12-20 page set of documentation" is the more likely of the impossible two :) In a perfect world, we'd have both!


This is also very true.

And I think part of the reason why this is worse with research code is that the meaty part of the code tends to be (at least in ML/NLP) a few equations from the paper, implemented in a hacky and convoluted way. Unless you're a world-class expert at keeping track of indexes and one-Greek-letter variable names, there's very little to get from the code that a well-written paper wouldn't give you. I make an exception for tuning parameters, constants, tweaks, etc., but these shouldn't matter much anyway.



