
Troubling Trends in Machine Learning Scholarship - nabla9
http://approximatelycorrect.com/2018/07/10/troubling-trends-in-machine-learning-scholarship/
======
tw1010
This is only going to get worse. I see literally no counterforce putting up any
kind of barrier against the incentive to, e.g., overfit in order to get
exaggerated results. The reward is too lucrative not to fall into shady tactics
(which are sometimes not even conscious; not being critical enough of your own
work is a form of loosening the model that acts in favour of more impressive
results).

~~~
chriskanan
A major factor is that machine learning is so conference deadline driven. Does
the main result look good enough? Submit it! Ablation studies are the last
thing you do, and often you are pressed for time due to the deadline. Since so
much work is done by PhD students, they are incentivized to get their work out
as fast as possible because they need to graduate and papers in top-
conferences play a huge factor in getting jobs in academia and industry.

I'm aware of multiple papers by top labs where the state-of-the-art results
really come from some very minor change (e.g., a pre-processing technique,
bigger images) that was swept under the rug in the paper. All the math and the
complex model had minimal impact on the actual numeric result, and the actual
reason would be unpublishable in a top venue.
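
As a concrete illustration of what attributing the gain looks like, here is a
minimal ablation sketch (all names and numbers are hypothetical, not taken from
any of the papers alluded to): re-run the experiment with one component
switched off at a time and report the change.

    # Minimal ablation sketch (hypothetical names/numbers): disable one
    # component at a time and attribute the gain, instead of reporting
    # only the full system.

    def run_experiment(config):
        # Placeholder for "train and evaluate"; returns a made-up validation
        # score so the sketch runs end to end. In practice this is a full
        # training run.
        return (0.70
                + 0.02 * config["new_architecture"]     # the claimed contribution
                + 0.05 * config["fancy_preprocessing"]  # the "minor change"
                + 0.03 * config["bigger_images"]
                + 0.04 * config["tuned_lr"])

    full_config = {
        "new_architecture": True,
        "fancy_preprocessing": True,
        "bigger_images": True,
        "tuned_lr": True,
    }

    full_score = run_experiment(full_config)
    for component in full_config:
        ablated = dict(full_config, **{component: False})
        drop = full_score - run_experiment(ablated)
        print(f"removing {component}: score drops by {drop:.3f}")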

~~~
at-fates-hands
Your entire first paragraph makes me cringe. Coming from an academic/research
background, all of the things you mentioned go against everything I was
taught in terms of being rigorous with your research.

Is this more evidence that we have really entered the "click-baiting" era,
where the end result of the research is now secondary to getting published and
into big conferences to gain some notoriety?

~~~
michaelt

      all of the things you mentioned go against
      everything I was taught in terms of being
      rigorous with your research.
    

There are certain areas where "the system" asks people to do things right -
but provides vast rewards for doing things wrong, if you can avoid detection.

If you're a pro cyclist who isn't doping, good for you! But there's no medal
for finishing fourth; you'll be rewarded with medals and sponsorship cash for
doping undetectably.

Academics whose rigorous papers aren't _quite_ getting into top-tier journals
are in a similar situation.

------
drewm1980
"The ubiquity of this issue is evidenced by the paper introducing the Adam
optimizer [35]. In the course of introducing an optimizer with strong
empirical performance, it also offers a theorem regarding convergence in the
convex case, which is perhaps unnecessary in an applied paper focusing on non-
convex optimization. The proof was later shown to be incorrect in [63]."

I strongly disagree here. Researchers, please keep trying to prove that your
proposed optimizers are correct, even if you can only manage to prove it for a
subclass of problems, and even if it risks publishing an accidentally
incorrect proof.

~~~
AstralStorm
Yes, but why do it on paper when you can use an automated prover instead,
since you're proving properties of an algorithm?

It smells of laziness...
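
As a toy illustration of the kind of machine-checked statement an optimizer
paper could include (a sketch, not from the Adam paper or its critique): one
exact gradient step on f(x) = x^2 with step size 1/2 lands on the minimizer,
checked in Lean 4 with Mathlib.

    import Mathlib

    -- Toy sketch (not from any cited paper): for f(x) = x^2 the gradient is
    -- 2x, so one step with learning rate 1/2 lands exactly at the minimizer 0.
    def step (x : ℚ) : ℚ := x - (1 / 2) * (2 * x)

    example (x : ℚ) : step x = 0 := by
      unfold step
      ring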

~~~
shoo
Can you cite any non-trivial examples where people writing papers include
machine-checked proofs, in fields of study unrelated to the study of machine-
checked proofs?

------
joe_the_user
_" Failure to identify the sources of empirical gains, e.g. emphasizing
unnecessary modifications to neural architectures when gains actually stem
from hyper-parameter tuning."_

I think everyone is looking for gains coming from architecture modification
since more power and clever hyperparameter tuning can only go so far.

But given that any of these systems requires hyperparameter tuning, and that
tuning takes a lot of time, it is inherently hard to distinguish between
tuning and a novel architecture. If someone says "X worked for me but Y
didn't", tuning could always be the explanation.

It seems like one could only really scientifically distinguish two
architectures if the hyperparameter methodology were fixed. But that's about
the opposite of current practice, as far as my amateur exposure to the field
goes.

~~~
AstralStorm
You could also try to find optimal hyperparameters with a genetic algorithm
(or similar global optimizer) for every architecture, with a large, fixed
number of generations.

That should be an apples-to-apples comparison... or best-to-best. At the same
time, the fragility of the hyperparameters can be evaluated.
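
A minimal sketch of that fixed-budget idea (hypothetical architectures and
scoring; plain random search standing in for the genetic algorithm): give every
architecture the same search space, the same trial budget, and the same seed,
then compare the best score of each and the spread across trials.

    # Equal-budget comparison sketch (hypothetical evaluate(); random search
    # stands in for a genetic algorithm / global optimizer).
    import random

    def evaluate(architecture, lr, batch_size):
        # Placeholder for "train this architecture with these hyperparameters
        # and return validation accuracy"; made-up numbers so the sketch runs.
        base = {"arch_a": 0.80, "arch_b": 0.79}[architecture]
        return base - abs(lr - 3e-4) * 50 - abs(batch_size - 128) / 5000

    def fixed_budget_search(architecture, trials=50, seed=0):
        rng = random.Random(seed)           # same seed => identical search
        scores = []
        for _ in range(trials):
            lr = 10 ** rng.uniform(-5, -2)  # same search space for every arch
            batch_size = rng.choice([32, 64, 128, 256])
            scores.append(evaluate(architecture, lr, batch_size))
        return scores

    for arch in ["arch_a", "arch_b"]:
        scores = fixed_budget_search(arch)
        # Best-to-best comparison, plus the spread as a crude fragility signal.
        print(arch, "best:", round(max(scores), 3), "worst:", round(min(scores), 3))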

I really appreciate the detailed analysis in the article.

~~~
bitL
Genetic algorithms, Bayesian optimization, simulated annealing, etc. are
101-level optimization algorithms; they won't get you anywhere. Hyperparameter
tuning is about as NP-hard as it gets, in the same theoretical ballpark as
cryptology in terms of how demanding it is. In Deep Learning you are
basically doing meta-optimization, as the process of learning the neural
network itself is already non-linear optimization (hint: Adam is a non-linear
optimizer); non-linear optimization is generally NP-hard unless restricted to
special cases like convex quadratic programming; here you want to optimize
over an infinite set of already NP-hard problems.

~~~
nonbel
What are you saying goes beyond "101 optimization algorithms"? Are you saying
to put another "layer" of deep learning to tune the parameters of the first
"layer"? Doesn't this just add even more parameters to tune?

~~~
bitL
No, you don't put another layer of Deep Learning (well, you technically could
if you had some training data for optimization, but almost certainly you
don't).

You simply put another layer of (mixed-integer) non-linear optimization on top
of the Deep Learning hyperparameters, i.e. the hyper-parameters like learning
rate, batch size, category weights, even the composition of the loss function,
become the variables you optimize. The (mixed-integer) non-linear optimization
method will have its own set of parameters (they all do), so later you might
indeed want to optimize those as well, if you get hold of some hyper-computing
device that lets you compress the first two steps to less than your lifetime.
You can probably do this kind of hyper^n optimization as deeply as you like,
but I doubt it would lead you anywhere, given how terrible non-linear
optimization performance on general functions is and how short the existence
of the Universe is.
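
A minimal sketch of those two layers on a toy problem (hypothetical names;
random search standing in for a real mixed-integer non-linear optimizer): the
inner loop is ordinary non-linear optimization, and the outer loop treats its
hyperparameters, continuous and integer alike, as the variables.

    # Two nested layers (toy problem): the inner loop is plain non-linear
    # optimization, the outer loop searches the mixed hyperparameter space.
    import random

    def inner_training_run(lr, steps, l2_weight):
        # Inner layer: minimize f(w) = (w - 3)^2 + l2_weight * w^2 by gradient
        # descent. Stands in for training a network.
        w = 0.0
        for _ in range(steps):
            grad = 2 * (w - 3) + 2 * l2_weight * w
            w -= lr * grad
        return (w - 3) ** 2 + l2_weight * w ** 2    # final loss

    def outer_search(trials=100, seed=0):
        # Outer layer: search over the hyperparameters themselves. Random
        # search stands in for a mixed-integer non-linear optimizer.
        rng = random.Random(seed)
        best = None
        for _ in range(trials):
            hp = {
                "lr": 10 ** rng.uniform(-3, 0),             # continuous
                "steps": rng.randint(5, 50),                # integer
                "l2_weight": rng.choice([0.0, 0.01, 0.1]),  # discrete choice
            }
            loss = inner_training_run(**hp)
            if best is None or loss < best[0]:
                best = (loss, hp)
        return best

    print(outer_search())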

~~~
nonbel
Sorry, it's still unclear. Do you consider genetic algorithms, etc. to be an
example of (mixed-integer) non-linear optimization? Or are you talking about
some other method?

>"You can probably do this kind of hyper^n optimization as deeply as you like
but I doubt it would lead you anywhere, given how terrible non-linear
optimization performance on general functions is and how short the existence
of Universe is."

In practice you don't need to find the actual optimum, though, just to do
better than the alternatives. This reminds me of "a single NN layer can
approximate any function". Sure, but in practice a reasonable approximation
arises much more easily with other architectures.

EDIT:

By "101" I thought you meant something like "low level" or "first thing you
learn/try". Did you mean something else?

------
nmca
I skimmed this, found it troubling, and started trying to make a checklist of
dos/don'ts to apply to my own work:

[https://github.com/N-McA/ml-paper-checklist/blob/master/README.md](https://github.com/N-McA/ml-paper-checklist/blob/master/README.md)

------
tjpaudio
"Mathiness" as the author puts it is one of my main complaints of many
academic papers, this is not at all limited to ML. I think we really need to
get away from typical math notation and start to present the calculations as
code. This is more readable to the modern practicing professionally and the
algorithm's degree of parallelization opportunities (or troubling lack
thereof) would be more clear.
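
For example, a definition that usually shows up as a line of subscripted
notation, written directly as code (numpy assumed; purely illustrative): the
softmax over a batch of logit vectors, where the axis handling and the
numerical-stability trick are explicit instead of hidden.

    # Softmax written as code rather than notation (numpy assumed).
    # The axis argument and the max-subtraction for numerical stability are
    # the kind of practical detail that dense notation tends to hide.
    import numpy as np

    def softmax(logits, axis=-1):
        shifted = logits - logits.max(axis=axis, keepdims=True)  # stability
        exp = np.exp(shifted)
        return exp / exp.sum(axis=axis, keepdims=True)

    print(softmax(np.array([[1.0, 2.0, 3.0]])))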

~~~
ssivark
The paper needs to explain _what_ is being done, _why_, and under what
_assumptions_. As long as each of those is clearly explained, in my
experience it doesn't really matter whether it is explained in
words/math/code. Yes, one form might be a little more work for people from
a certain background, but that's a hump you can get over relatively easily
after reading a few papers on the topic. The worst is when you have to guess
at what the authors might be doing, why they're doing it that way, and what
assumptions they might be operating under... and all they've done is dump
their results within the page limit and before the conference submission
deadline.

------
sctb
Yesterday:
[https://news.ycombinator.com/item?id=17497235](https://news.ycombinator.com/item?id=17497235).

------
gcb0
Isn't that true for absolutely everything in CS?

~~~
noelwelsh
Doing bad science is an issue that all scientific fields need to be vigilant
about (e.g. the "replication crisis" in psychology). It is particularly
relevant to machine learning at this point in time because i) the field is
growing extremely rapidly and ii) there are lots of $s on the table for those
who achieve commercialisable results. So the incentives to do bad science are
increasing while the checks against them are weakening. These forces are not
at play in, say, theoretical CS.

~~~
ItsMe000001
> _e.g. the "replication crisis" in psychology_

Just an aside, it's not just psychology. When you google "replication crisis"
most articles say "... in science" for good reason.

[https://en.wikipedia.org/wiki/Replication_crisis](https://en.wikipedia.org/wiki/Replication_crisis)

When mentioning "replication crisis" I should probably add articles like this
one too:
[https://www.insidehighered.com/news/2018/04/05/scholar-challenges-idea-reproducibility-crisis](https://www.insidehighered.com/news/2018/04/05/scholar-challenges-idea-reproducibility-crisis)

~~~
thanatropism
That article is incredibly frightening.

> to conclude that, although misconduct and questionable research methods do
> occur in “relatively small” frequencies,

which is kind of expected in a career track that doesn't exactly come with
hedge fund-type money -- why would many people be dishonest for grants?

> there is “no evidence” that the issue is growing.

which might mean that the issue isn't in how science is practiced nowadays,
but that the scientific process (grants, publication, peer review, research
programs) has always been broken.

It's not _clear_ that any of these is the case, but it's depressing to see
"well, people are generally not crooks and the game has always been played
this way" as an optimistic view of the replication crisis.

~~~
nonbel
While outright fraud is probably rare, "questionable research
methods/practices" are standard. Most people using them don't even know there
is something wrong with them. Pick any biomed or social science paper and I
will quickly find some indication of them (lack of blinding, weird unexplained
sample-size changes, missing controls, failure to show a direct comparison of
the variables you claim are related, etc.).

