
Critique of Paper by “Deep Learning Conspiracy” - rndn
http://people.idsia.ch/~juergen/deep-learning-conspiracy.html
======
zxcvvcxz
So you mean to tell me that rediscovering older work with faster computers and
strategically referencing (or omitting references) to make yourselves look
like the founding fathers of a resurgent field is a good way to get your
university-funded research lab snatched up for millions of dollars by Big
Companies?

And to add onto this, another academic is getting his jimmies rustled because
he didn't get the money his former PhD students did??

Heavens to Betsy!

---

Everyone self promotes and does things in their best interests. There is no
clear divide between academic and industrial interests. Maybe in something
more pure where truths are evident (e.g. pure math). But not something like
machine learning, where your success and funding depends on armies of grad
students fine-tuning stuff like the number of "hidden units" in some over-
complicated model whose ultimate goal is to overfit a training set and cause
more hype, etc.

Nothing wrong with this though, progress happens continually, just not
linearly:
[https://en.wikipedia.org/wiki/Hype_cycle](https://en.wikipedia.org/wiki/Hype_cycle)

~~~
joe_the_user
Well,

What I'm curious about is how such hype cycles affect our understanding of
_whether this shit actually works_.

As far as I can tell, the "great advance" attributed to deep learning is doing
benchmarks better than other methods, which seems to amount to doing the same
old machine learning tasks more accurately than before.

The main thing we hear about is more "edgy" stuff like learning video games by
itself, describing pictures with phrases, or giving answers to "philosophy".
But since these applications are easy to cobble together in a half-assed
fashion, is there really more here than incremental progress?

I wouldn't know one way or the other, so I'd like to find out.

~~~
kastnerkyle
I disagree with the "great advance" dig. If you took a time machine back 5
years and showed people any of the recent advances in deep neural networks
(without showing the algorithmic techniques), they would say "this is AI". There are
huge fundamental gains happening every day where the rubber meets the road
with tasks that could feasibly be seen in the real world.

We are _inventing_ new "real world" benchmarks to try and counteract this (MS
COCO, dialog datasets, the big Flickr datasets, translation generally), but
many approaches from the 90s were clearly right mathematically (as Juergen
says) and just needed more data fuel. So it is obvious to go back and find
interesting ideas that didn't get their due _as long as_ proper attribution is
given. Profs also have their pet projects that didn't quite pan out, and often
want to breathe new life into a cool idea.

These things are only easy to cobble together half-assed in hindsight AND/OR
if you have expertise - having the knowledge and know-how to input conditional
information, interpreting deep networks as modeling joint probability
distributions, etc. is just as much algorithmic design as any other task in
graphical modeling, statistics etc. Slapping a big deep convnet (or
feedforward net) on new datasets _IS_ easy, and usually not interesting
scientifically, but also doesn't get published and is reserved for the
blogosphere or bad arXiv papers.
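
To make the "input conditional information" point concrete, here is a minimal
sketch (all names, shapes, and weights are my own illustrative choices, not
from the thread) of the simplest form of conditioning: concatenating a context
vector to the network input. Deciding _what_ to condition on, and _where_ to
inject it, is the kind of algorithmic design being described.

    import numpy as np

    def forward(x, c, W1, b1, W2, b2):
        # Concatenate the raw input x with the conditioning vector c
        # (e.g. a label embedding) before the first layer -- the
        # simplest way to feed conditional information to a network.
        h_in = np.concatenate([x, c])
        h = np.tanh(W1 @ h_in + b1)   # hidden layer
        return W2 @ h + b2            # output logits

    rng = np.random.default_rng(0)
    x = rng.normal(size=8)                   # input features
    c = rng.normal(size=4)                   # conditioning information
    W1 = rng.normal(size=(16, 8 + 4)) * 0.1  # hidden weights see both x and c
    b1 = np.zeros(16)
    W2 = rng.normal(size=(3, 16)) * 0.1      # output weights
    b2 = np.zeros(3)
    print(forward(x, c, W1, b1, W2, b2))     # 3 logits, now condition-dependent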

Incremental progress is 0.5% performance gains in major benchmarks like
ImageNet etc. - company PR (and university PR as well) will crow about this
but no one in academia really cares _unless_ it is accompanied by interesting
scientific ideas or fundamental questions being answered.

------
speechduh
So, for anyone not aware: Schmidhuber is _obsessed_ with this. He wrote an
enormous literature review of deep learning [0] basically because he felt that
people weren't crediting ideas enough. This isn't a one-off essay for him;
he's been banging this drum for quite a while.

Not saying he's wrong, just FYI.

[0] [http://arxiv.org/abs/1404.7828](http://arxiv.org/abs/1404.7828)

------
jandrewrogers
While I do not have anything invested in Deep Learning, I do have a similar
reaction because I am familiar with the research from 10-20 years ago,
particularly around neural Turing machines. From that perspective, most modern
Deep Learning is essentially that older research with the primary novelty
being better marketing and _much_ faster computers. I can understand why
someone like Schmidhuber would be irritated by the apparent assignment of
credit to people who are essentially repackaging old computer science, given
how much Schmidhuber has done in the field.

DeepMind is a bit of an exception to this. At least one of the founders was
involved in quite a bit of original research way back then.

This phenomenon is common in theoretical computer science. Timing and
marketing matter a lot when it comes to getting credit for important
inventions. I've seen it many times.

~~~
sitkack
I go on the cite rant every now and again. It isn't so much for the attribution
itself, but that it breaks the knowledge graph. By not citing past or similar
work, these researchers prevent others from learning, exploring, and ingesting
knowledge from a field.

A few of the ideas at play here are:

    * Wanting to appear more cutting edge than is actually the case
    * Limiting or strengthening patent applicability
    * Preventing loss of focus via competitors' research

I think researchers should actually get penalized for having a deficient bib.

------
pmelendez
I took machine learning courses from 1999 until late 2001. One of my
professors (who had worked with Vapnik back when we didn't know if support
vector machines were a good idea) said that he didn't use ANNs much because
Hinton was probably the only one who knew how to use them.

I'm telling this anecdote because, even though I agree that we are forgetting
to mention a lot of names, that "PR" work that Hinton et al. did was necessary
(IMO) to bring ANNs back to the mainstream.

~~~
sonabinu
The Canadian research team seems to have kept the work going when others were
skeptical [http://www.thestar.com/news/world/2015/04/17/how-a-
toronto-p...](http://www.thestar.com/news/world/2015/04/17/how-a-toronto-
professors-research-revolutionized-artificial-intelligence.html)

~~~
pmelendez
Exactly my point... I think Hinton deserves a lot of credit for being "stubborn"
enough to find grants for a field that a lot of people had serious doubts
about.

------
sirseal
This is a good critique: it's important to cite the people who have laid the
early groundwork, regardless of how far in the past that work was done.

~~~
GFK_of_xmaspast
You gonna cite Newton and/or Leibniz when you differentiate a function? Use
a zero? Don't forget to credit Brahmagupta!

~~~
fatjokes
Only if you were going to later cite yourself as a pioneer of calculus.

------
irickt
The author certainly seems accomplished, but his tone and egotism undercut his
message. For example from the front page of his site:

"His formal theory of creativity & curiosity & fun explains art, science,
music, and humor."

I've also read papers of his that take completely off-the-wall pot-shots at
other researchers.

~~~
joe_the_user
"Since age 15 or so, Prof. Jürgen Schmidhuber's main scientific ambition has
been to build an optimal scientist through self-improving Artificial
Intelligence (AI), then retire"...

"His formal theory of creativity & curiosity & fun explains art, science,
music, and _humor_."

Maybe he's a madman, and maybe he just hasn't tweaked his "theory of humor"
quite enough to know some people won't get it.

~~~
eli_gottlieb
What do you mean, _maybe_? Schmidhuber is insane, in the best possible way.
He's exactly the kind of person for whom academia exists: an incredibly
competent, devoted scientist who won't stand for bullshit cooked up by
marketing departments.

------
bachback
Lecture on the topic by the author:
[https://www.youtube.com/watch?v=JSNZA8jVcm4](https://www.youtube.com/watch?v=JSNZA8jVcm4)

Several founders of DeepMind were his PhD students.

------
kastnerkyle
One thing which works against the "cite everything" approach is that most of
the major conferences have page limits of 8-10 pages with 1 page bonus for
references. That means if you go over 1 page of references (at least at NIPS),
you cut into the meat of the paper, reviewers look on in disdain and give
poor marks, etc. So you often have to actively prune for the most recent and
directly relevant citations, which sometimes counts out semi-relevant but
older work in favor of more relevant recent work.

Much of Dr. Schmidhuber's work is very interesting and _especially_ relevant
now that RNNs are really heating up again - but it is sometimes hard to figure
out exactly _which_ of his papers to cite because many are partially relevant.
And having a full page of only Schmidhuber citations is no good either...

Speaking as a member of the Montreal lab, I am much more up to date with the
work that happens here - so it is hard to fight the natural tendency to cite
recent papers you know (since they all came from work you know of, cause you
were _there_). Notice too that all 3 (Hinton, LeCun, and Bengio) worked
directly together at some point, and collaborated often beyond that. So a
version of this is in effect, whereas Juergen has been more separated (both
geographically and in work focus) than the other 3. NYU, Toronto, and
Montreal are all within an 8-hour triangle!

Not to take anything away from his points (I try to cite as many of his papers
as possible without seeming ridiculous, generally) but these are the general
factors at play. We cannot possibly cite every paper in the field, and shining
the light on new work can be more important than citing older work _AS LONG
AS_ no one claims as a pure innovation work that was already done "in the
nineties".

Claiming to improve some technique, or take it from curiosity to usable, is
more than fine - but given the recent deep learning hype, even recent papers are
getting overshadowed by others claiming some new innovation which already
exists in _very current literature_.

Especially given the work that is coming out of industrial labs (Google, FB,
MSR, etc.) it is fairly frequent to see the same model being touted as new
(with minor citations if lucky) when the exact same technique first appeared 6
months ago. Being well-read is not an option as an academic - it is a
requirement! The PR machine of these companies is unfortunately very effective
at dominating the airwaves if you have competing or related work, especially
if you are not from a school with good press, e.g. MIT or Stanford.

------
quietplatypus
I don't know.

On the one hand, at the conceptual level, unless you are at the cutting edge
of CS theory, I'm pretty sure almost anything else that is done in computer
science is a mere re-wording of something that was done in the 1970s-1980s. So
there is no "holier than thou" at this level.

On the other hand, in terms of practical results in context, there are many
important consequences of being able to take old concepts and run them faster,
because the hardware has improved and, well, generally, the entire world is
different.

A big part of "popularizing" a technique is having a good implementation that
takes advantage of advances in computing speed. So the author of the article
misses the practical value of popularizing.

At the same time, what he says is valuable because he touches on a fundamental
choice that we all make: do you want to be a groundwork layer or a
popularizer?

The problem, of course, is that groundwork layers are mostly forgotten, with
their contributions recognized posthumously; that's how far out you have to be
to lay any new groundwork, and it's difficult to predict what will be the
foundation for the centuries to come.

It's not just deep learning: basically anything that becomes popular enough
to be noticed here probably has a long history behind it. If we are to move
forward, we need to be in the headspace of those who had the sense back then
to form it, and not in the space of popularizing or being the tool of the
popularizer.

------
murbard2
Right or not (I tend to think he is), it's incredibly shortsighted to write
such an article, which is bound to make him look bitter and low-status. If you
want to do this, you get a third party to do it, come on...

~~~
eli_gottlieb
You basically _can't_ make Schmidhuber look "low-status" within the subfield
of neural networks. He puts some weird stuff on his personal website, but his
work wins contests and his publication record speaks for itself.

~~~
murbard2
Status as in having a Nature article centered around you and your work?

There's no denying that he is a brilliant pioneer, and his accomplishments are
indeed incredible, but that does not translate into status, which is probably
very upsetting.

~~~
mafribe
There is an informal honour code among scientists that you don't take credit
for other people's ideas. While it is forgivable for a PhD student or postdoc
to be unaware of older work -- after all, there's sooo much of it -- a senior
researcher should not act carelessly with attributions. And where they do,
it's useful to remind them that they have transgressed dearly held ethical
norms.

