
Troubling Trends in Machine Learning Scholarship [pdf] - nabla9
https://www.dropbox.com/s/ao7c090p8bg1hk3/Lipton%20and%20Steinhardt%20-%20Troubling%20Trends%20in%20Machine%20Learning%20Scholarship.pdf?dl=0
======
apathy
This paper is lovely (and the authors do not spare themselves criticism, which
is admirable in itself).

Mathiness is something that can usually be detected by reverse engineering the
work and seeing whether a derivation was actually important. Unfortunately, I
learned this in my own discipline (statistics) by comparing how
implementations actually functioned to how they were originally derived and
presented. My own dissertation contained a rather terrifying bit of analytic
derivation that, once sample sizes were suitably large, yielded to an
approximation that was many orders of magnitude faster. But hey, my committee
loved that gamma.
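That kind of shortcut (an exact gamma-based expression giving way to a fast
large-sample approximation) can be sketched with a textbook example. This is
not the dissertation's actual derivation, just an illustration using
Stirling's series for log Gamma(x), which converges on the exact value as x
grows:

```python
import math

def stirling_lgamma(x):
    """Stirling's series for log Gamma(x), with the first correction term.

    Accurate for large x; the error shrinks on the order of 1/x^3.
    """
    return (x - 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi) + 1.0 / (12 * x)

# Compare against the exact value from the standard library.
for x in (10.0, 100.0, 1000.0):
    exact = math.lgamma(x)
    approx = stirling_lgamma(x)
    print(f"x={x:7.0f}  exact={exact:.6f}  approx={approx:.6f}  err={abs(exact - approx):.2e}")
```

The approximation is a handful of cheap arithmetic operations, which is why
this kind of replacement can be orders of magnitude faster than evaluating the
exact derivation at scale.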

The issue of reviewer scarcity is of a piece with all serious research: if it
were easy, somebody would have solved it already. Post-publication review is
the only sustainable model, and readers benefit from going in with a skeptical
mindset anyhow. Journals and conferences simply cannot serve as more than the
weakest of filters.

~~~
eli_gottlieb
>Journals and conferences simply cannot serve as more than the weakest of
filters.

I can't help but think: if they aren't going to filter for rigor and quality,
at least they can filter for style, clarity and communication ability. If
we're going to basically start treating Real Journals as one tier above Arxiv,
then I want those review processes to help make sure that papers are actually
readable.

------
denzil_correa
I really liked the authors' main "Suggestions for Improvements". It's
important to understand "why" and make reasonable claims rather than angle for
press coverage.

> We encourage authors to ask “what worked?” and “why?”, rather than just “how
> well?”.

…

> Typical ML conference papers choose an established problem (or propose a new
> one), demonstrate an algorithm and/or analysis, and report experimental
> results. While many questions can be addressed in this way, for addressing
> the validity of the problems or the methods of inquiry themselves, neither
> algorithms nor experiments are sufficient (or appropriate).

------
zackchase
Thanks for sharing! Happy to discuss here if anyone wants to engage in some
(pre)debate. Also happy to butt out :).

~~~
backpropaganda
I wasn't as convinced by the mathiness and speculation/explanation arguments
as by the others. Sure, there may be a few examples of obfuscation via math,
but most papers, in my view, don't do that. Adding a couple of math equations
doesn't really hurt (it doesn't add anything either). I also interpret the
explanations in most deep learning papers as speculation, since short of
rigorous experimentation or theory (rarely found in DL papers), there's no way
to prove an explanation is correct.

Comparatively, the other problems are orders of magnitude more important and
troubling. Experiments which mislead people into thinking the technique
matters more than the compute (like in OpenAI's Glow paper) and intentional
anthropomorphization to get media coverage (OpenAI Dota bot's "cooperation and
teamwork") are much more serious and could have used more attention in your
paper. I think these are more serious because the former can fool even experts
in the field, and the latter seriously misinforms the lay public on the topic
of AI.

I'm glad you started this important conversation.

~~~
jlelonm
What do you mean:

> mislead people into thinking the technique matters more than the compute
> (like in OpenAI's Glow paper)

Are you saying that the results they got had more to do with the amount of
computing power they have, as opposed to novel techniques?

------
asafira
Ah, I saw this last night. Here is a blog post about the article:
[http://approximatelycorrect.com/2018/07/10/troubling-trends-in-machine-learning-scholarship/](http://approximatelycorrect.com/2018/07/10/troubling-trends-in-machine-learning-scholarship/)

------
tw1010
This is only going to get worse. I see no counterforce putting up any kind of
barrier against the incentive to, e.g., overfit in order to get exaggerated
results. The reward is too lucrative not to fall for shady tactics (which are
sometimes not even consciously chosen; not being critical enough of your own
work is a form of model loosening that acts in favour of more impressive
results).
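That dynamic doesn't even require dishonesty; selection pressure alone
inflates results. A toy sketch (hypothetical setup, not any specific paper):
evaluate enough skill-free models on one test set and report the best, and the
headline number beats the true baseline purely by chance.

```python
import random

random.seed(0)

n_test = 100    # size of the held-out test set
n_models = 200  # number of model variants tried against it

# True labels are coin flips, so no model can genuinely beat 50% accuracy.
labels = [random.randint(0, 1) for _ in range(n_test)]

def accuracy(preds):
    return sum(p == y for p, y in zip(preds, labels)) / n_test

# Each "model" is just a random predictor: none has any real skill.
scores = [accuracy([random.randint(0, 1) for _ in range(n_test)])
          for _ in range(n_models)]

mean = sum(scores) / n_models
best = max(scores)
print(f"mean accuracy: {mean:.3f}")  # stays near the true 0.50 baseline
print(f"best accuracy: {best:.3f}")  # well above 0.50, by selection alone
```

Reporting only `best` while tuning against the same test set is exactly the
kind of self-deception that needs no conscious intent.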

~~~
nonbel
This is everywhere in academia. Peer review is like the goggles in the face of
a seemingly neverending wave of BS:
[https://www.youtube.com/watch?v=juFZh92MUOY](https://www.youtube.com/watch?v=juFZh92MUOY)

Honestly, who cares about these papers in the case of ML though? It's not like
other fields, where you can't check a result yourself without spending $100k
to rerun the experiment. Just share the code and explain it in a blog post or
README. Instead it's usually some PDF with equations no one is solving, proofs
based on assumptions that don't apply to my situation, and some cherry-picked
overfitting of the cross-validation.

------
tomkat0789
Reminds me of the Lighthill Report:
[https://en.m.wikipedia.org/wiki/Lighthill_report](https://en.m.wikipedia.org/wiki/Lighthill_report)

The full text exists somewhere online, but I can't find it on my phone.

Edit: summary to save you the click: "Published in 1973, it was compiled by
Lighthill for the British Science Research Council as an evaluation of the
academic research in the field of artificial intelligence. The report gave a
very pessimistic prognosis for many core aspects of research in this field,
stating that "In no part of the field have the discoveries made so far
produced the major impact that was then promised"."

~~~
thedailymail
For the curious, here's the full text:
[http://www.math.snu.ac.kr/~hichoi/infomath/Articles/Lighthill%20Report.pdf](http://www.math.snu.ac.kr/~hichoi/infomath/Articles/Lighthill%20Report.pdf)

