Hacker News
Troubling Trends in Machine Learning Scholarship [pdf] (dropbox.com)
113 points by nabla9 7 months ago | 11 comments

This paper is lovely (and the authors do not spare themselves criticism, which is admirable in itself).

Mathiness is something that can usually be detected by reverse engineering the work and seeing if a derivation was important. Unfortunately, I learned this in my own discipline (statistics) by comparing how implementations actually functioned to how they were originally derived and presented. My own dissertation contained a rather terrifying bit of analytic derivation that, upon finding suitably large samples, yielded to an approximation that was many orders of magnitude faster. But hey, my committee loved that gamma.

The issue of reviewer scarcity is of a piece with all serious research. If it were easy, somebody would have solved it already. Post-publication review is the only sustainable model, and it's more beneficial for the reader if they go in with a skeptical mindset anyhow. Journals and conferences simply cannot serve as more than the weakest of filters.

>Journals and conferences simply cannot serve as more than the weakest of filters.

I can't help but think: if they aren't going to filter for rigor and quality, at least they can filter for style, clarity, and communication ability. If we're going to basically start treating Real Journals as one tier above arXiv, then I want those review processes to help make sure that papers are actually readable.

I really liked the authors' main "Suggestions for Improvement". It's important to understand "why" and to make reasonable claims rather than angle for press coverage.

> We encourage authors to ask “what worked?” and “why?”, rather than just “how well?”.

> Typical ML conference papers choose an established problem (or propose a new one), demonstrate an algorithm and/or analysis, and report experimental results. While many questions can be addressed in this way, for addressing the validity of the problems or the methods of inquiry themselves, neither algorithms nor experiments are sufficient (or appropriate).

Thanks for sharing! Happy to discuss here if anyone wants to engage in some (pre)debate. Also happy to butt out :).

I wasn't as convinced by the mathiness and speculation/explanation arguments as by the others. Sure, there may be a few examples of obfuscation via math, but most papers, in my view, don't do that. Adding a couple of math equations doesn't really hurt (it doesn't add anything either). I also interpret the explanations in most deep learning papers as speculation, since short of rigorous experimentation or theory (not found in DL papers), there's no way to prove an explanation is correct.

Comparatively, the other problems are orders of magnitude more important and troubling. Experiments which mislead people into thinking the technique matters more than the compute (like in OpenAI's Glow paper) and intentional anthropomorphization to get media coverage (OpenAI Dota bot's "cooperation and teamwork") are much more serious and could have used more attention in your paper. I think these are more serious because the former can fool even experts in the field, and the latter seriously misinforms the lay public on the topic of AI.

I'm glad you started this important conversation.

What do you mean by:

> mislead people into thinking the technique matters more than the compute (like in OpenAI's Glow paper)

Are you saying that the results they got had more to do with the amount of computing power they have, as opposed to novel techniques?

Ah, I saw this last night. Here is a blog post about the article: http://approximatelycorrect.com/2018/07/10/troubling-trends-...

This is only going to get worse. I see literally no counterforce putting up any kind of barrier against the incentive to, e.g., overfit in order to get exaggerated results. The rewards are too lucrative not to fall into shady tactics (which are sometimes not even conscious; not being critical enough of your own work is a form of loosening your standards that acts in favour of more impressive results).

This is everywhere in academia. Peer review is like the goggles in the face of a seemingly neverending wave of BS: https://www.youtube.com/watch?v=juFZh92MUOY

Honestly, who cares about these papers in the case of ML though? It's not like running other experiments, where you can't check it yourself without spending $100k. Just share the code and explain it in a blog or README. It's usually some PDF with equations no one is solving, proofs based on assumptions that don't apply to my situations, and some cherry-picked overfitting of the CV.
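To make the "cherry-picked overfitting of the CV" concrete, here is a minimal sketch (my own illustration, not from the paper): if you try enough configurations and report the one that scores best on your validation split, even a model with zero real signal looks well above chance, while honest held-out performance stays at roughly 50%. The `predict` function below is a hypothetical stand-in for a tuned model; each seed plays the role of one hyperparameter setting.

```python
import random

random.seed(0)

# Binary labels for a small validation split and an independent held-out test set.
y_val = [random.randint(0, 1) for _ in range(100)]
y_test = [random.randint(0, 1) for _ in range(100)]

def predict(seed, n):
    # A "model" with no real signal: its predictions are pure noise keyed by the seed.
    r = random.Random(seed)
    return [r.randint(0, 1) for _ in range(n)]

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

# Cherry-pick: try 1000 "configurations" and keep the one that happens
# to score best on the validation split.
best_seed = max(range(1000), key=lambda s: accuracy(predict(s, 100), y_val))

val_acc = accuracy(predict(best_seed, 100), y_val)   # inflated by selection
test_acc = accuracy(predict(best_seed, 100), y_test)  # honest: ~chance level
print(f"cherry-picked validation accuracy: {val_acc:.2f}")
print(f"honest held-out test accuracy:     {test_acc:.2f}")
```

The gap between the two numbers is entirely an artifact of selecting on the validation split, which is exactly why results should be checked against data the tuning loop never saw.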

Reminds me of the Lighthill Report: https://en.m.wikipedia.org/wiki/Lighthill_report

The full text exists somewhere online, but I can't find it on my phone.

Edit: summary to save you the click: "Published in 1973, it was compiled by Lighthill for the British Science Research Council as an evaluation of the academic research in the field of artificial intelligence. The report gave a very pessimistic prognosis for many core aspects of research in this field, stating that "In no part of the field have the discoveries made so far produced the major impact that was then promised"."

