Mathiness is something that can usually be detected by reverse engineering the work and seeing if a derivation was important. Unfortunately, I learned this in my own discipline (statistics) by comparing how implementations actually functioned to how they were originally derived and presented. My own dissertation contained a rather terrifying bit of analytic derivation that, upon finding suitably large samples, yielded to an approximation that was many orders of magnitude faster. But hey, my committee loved that gamma.
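To give a flavor of the kind of simplification I mean (a hypothetical illustration, not the actual derivation from my dissertation): many exact small-sample expressions involve ratios of gamma functions, and for large samples these collapse to a simple power law, since Γ(n + a)/Γ(n) ≈ n^a as n grows. The names and constants below are my own invention.

```python
import math

def gamma_ratio_exact(n: float, a: float) -> float:
    """Exact Gamma(n + a) / Gamma(n), computed in log space for stability."""
    return math.exp(math.lgamma(n + a) - math.lgamma(n))

def gamma_ratio_approx(n: float, a: float) -> float:
    """Large-n asymptotic: Gamma(n + a) / Gamma(n) ~ n**a."""
    return n ** a

n, a = 1e6, 0.5
exact = gamma_ratio_exact(n, a)
approx = gamma_ratio_approx(n, a)
rel_err = abs(exact - approx) / exact
# For n this large, the relative error is on the order of a*(1 - a)/(2*n),
# i.e. roughly 1e-7 here -- negligible, and no lgamma calls in the hot loop.
```

The point is the reverse-engineering test: if swapping the derived expression for the trivial approximation changes nothing downstream, the derivation was decoration.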
The issue of reviewer scarcity is of a piece with all serious research. If it were easy, somebody would have solved it already. Post-publication review is the only sustainable model, and it's more beneficial for the reader if they go in with a skeptical mindset anyhow. Journals and conferences simply cannot serve as more than the weakest of filters.
I can't help but think: if they aren't going to filter for rigor and quality, at least they can filter for style, clarity and communication ability. If we're going to basically start treating Real Journals as one tier above Arxiv, then I want those review processes to help make sure that papers are actually readable.
> We encourage authors to ask “what worked?” and “why?”, rather than just “how well?”.
> Typical ML conference papers choose an established problem (or propose a new one), demonstrate an algorithm and/or analysis, and report experimental results. While many questions can be addressed in this way, for addressing the validity of the problems or the methods of inquiry themselves, neither algorithms nor experiments are sufficient (or appropriate).
Comparatively, the other problems are orders of magnitude more important and troubling. Experiments which mislead people into thinking the technique matters more than the compute (like in OpenAI's Glow paper) and intentional anthropomorphization to get media coverage (OpenAI Dota bot's "cooperation and teamwork") are much more serious and could have used more attention in your paper. I think these are more serious because the former can fool even experts in the field, and the latter seriously misinforms the lay public on the topic of AI.
I'm glad you started this important conversation.
> mislead people into thinking the technique matters more than the compute (like in OpenAI's Glow paper)
Are you saying that the results they got had more to do with the amount of computing power they have, as opposed to novel techniques?
Honestly, who cares about these papers in the case of ML though? It's not like other experiments, where you can't check the result yourself without spending $100k. Just share the code and explain it in a blog post or README. Instead it's usually some PDF with equations no one is solving, proofs based on assumptions that don't apply to my situation, and some cherry-picked overfitting of the CV.
The full text exists somewhere online, but I can't find it on my phone.
Edit: summary to save you the click:
"Published in 1973, it was compiled by Lighthill for the British Science Research Council as an evaluation of the academic research in the field of artificial intelligence. The report gave a very pessimistic prognosis for many core aspects of research in this field, stating that 'In no part of the field have the discoveries made so far produced the major impact that was then promised'."