I agree with the sentiment, but a lot of the time scientific software is just not ready for real usage. My software, at least, is completely useless to anyone else most of the time: you have to open it in the interpreter and type the right incantations, the data has to be massaged in a non-trivial way, filesystem paths are hardcoded everywhere, and so on. It takes real effort to turn a works-for-me research tool into something releasable, and that effort might even hinder future research, since the fluidity of the hacky code is sometimes necessary for further tweaking. On top of that, there are usually dozens of dependencies, some of them patched to work with that particular piece of software, with different and incompatible licenses, etc.
I think, however, that most journals should at least ask for a shell script that downloads the code and data, runs the experiments, and regenerates the graphs and tables as they appear in the paper. This isn't always practical (a lot of papers deal with over a terabyte of data, for example), but it is more often than not. For my own papers, at least, this is a nearly-attained goal.
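To make that concrete, here's a rough sketch of the kind of thing I have in mind (written in Python rather than shell for readability; the URL, script names, and config file are all made up, not from any actual paper):

    # reproduce.py -- hypothetical one-shot reproduction script.
    # The data URL, run_experiments.py, make_figures.py, and paper.yaml
    # are placeholders for whatever a given paper actually uses.
    import subprocess
    import urllib.request

    DATA_URL = "https://example.org/paper-data.tar.gz"  # placeholder

    def main():
        # 1. Fetch and unpack the archived data used in the paper.
        urllib.request.urlretrieve(DATA_URL, "data.tar.gz")
        subprocess.run(["tar", "xzf", "data.tar.gz"], check=True)

        # 2. Re-run the experiments with the exact configuration reported.
        subprocess.run(["python", "run_experiments.py", "--config", "paper.yaml"],
                       check=True)

        # 3. Regenerate the figures and tables that appear in the paper.
        subprocess.run(["python", "make_figures.py", "--out", "figures/"],
                       check=True)

    if __name__ == "__main__":
        main()

Nothing fancy; the point is just that a reviewer (or a future me) can run one command and get the same graphs back.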
I absolutely understand your feelings on hacky code. Every academic produces hacky code; there are precious few who don't. When I started, I myself did not want to release my code for the same reason.
However, once I realized that we are all in the same boat, HMS Hacked Together, that feeling began to dissipate. My advisor calls it "research code", and it's fine, because as academics we're all used to it!
That's why I usually just ask for the source. I assume it won't build on my Mac, and that's OK. I'm not really interested in running the tool so much as in finding out exactly how you solved the problem.
I've asked people for code a few times, but having received it, I've never found it to help me, so I don't really ask anymore. What I really want in most cases is a clear English writeup, perhaps with pseudocode, so that I can understand how they solved their problem and ideally reimplement it myself. At least, that's the case when it's at a scale where reimplementation is feasible; if they built something absolutely gigantic, it might be another story, but then megabytes of messy research code I can't grok aren't very useful to me either, and I have no real choice but to wait for a cleaned-up release.
In short, I think "can this be reimplemented by a third party from the published literature?" is a better test for reproducibility than .tar.gzs are. And there's certainly a ways to go on that front, not least because in areas where 6-to-8-page conference papers are the norm, even well-meaning authors can't include enough details, and most don't get around to writing the detail-laden tech report version. But I guess I find code mostly useless for that purpose; it might as well be an asm dump for all the good I usually get out of it.
I agree with your TL;DR, but as you say, we're in a culture of 6-to-8 pages. I actually quite like the 8-page limit for most papers; it forces authors into a brevity of expression that aids focus. But you're right that details are the first thing jettisoned.
If I'm going to propose a probably impossible sea change, the baby step of saying "just show me what you've already done" instead of "now write another 12-20-page set of documentation" is the more likely of the two impossibilities :) In a perfect world, we'd have both!
And I think part of the reason this is worse with research code is that the meaty part of the code tends to be (at least in ML/NLP) a few equations from the paper, expressed in a hacky and convoluted way, and unless you're a world-class expert at keeping track of indices and one-Greek-letter variable names, there's very little the code adds over a well-written paper. I make an exception for tuning parameters, constants, tweaks, etc., but those shouldn't matter much anyway.
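To illustrate with a made-up example (not from any real paper): the "method" often ends up as a couple of lines like the following, and without the paper in hand you'd have a hard time recognizing which equation they implement.

    import numpy as np

    # Hypothetical research-code rendering of a softmax-weighted update.
    # th = theta, g = gradient, lam = step size -- the usual one-letter soup.
    def upd(th, g, lam=0.1):
        w = np.exp(g - g.max())   # unnormalized weights
        w /= w.sum()              # softmax over the gradient entries
        return th - lam * w * g   # elementwise weighted step

A well-written paragraph describing the same update tells me far more than those few lines do.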
I was just typing my comment as you posted yours. I absolutely agree with this, especially the "fluidity of the hacky code": the environment that makes cool research happen is often the opposite of clean software engineering.
One thing that'd make it particularly difficult for me is that most of my early-stage experimentation is done in image-based environments with REPLs, not by editing code (Lisp, R; the Smalltalk people are also big on it). So I don't even have code to send! Well, some parts usually are in code, but it won't run unless you load it into my image and do the right thing with it. I have working images / image dumps, session transcripts, notes about what I did (some of which may be stuff I did by hand for the first proof-of-concept stage), etc. I find it a much more fluid way to work than code in a text editor, personally.
That is a huge part of my problem as well. Sometimes, when showing the code to an advisor or colleague, I have to bring up an interpreter and type over 10 different commands before anything starts to work (and don't even get me started on the fact that half of my bookkeeping, logging, and plotting is done with made-at-the-time Emacs macros).
It's been a few years since this was relevant for me, but my primary concern would be that in order to publish actual code I would either have to spend months cleaning it up, writing documentation etc., or spend the next year fielding support calls. Or both.
I agree with another poster here that having somebody else repeat the experiment with their own implementation is a better test of validity: if a second paper just copies the source code from the first and makes a few tweaks, mistakes could easily carry over.