
Failures of Deep Learning - stochastician
https://arxiv.org/abs/1703.07950
======
Houshalter
Just yesterday there was a big discussion on here about how academic papers
needlessly complicate simple ideas. Mainly by replacing nice explanations with
impenetrable math notation in an attempt to seem more formal. This paper is
very guilty of this.

E.g. page 5. They attempt to explain a really simple idea, that they generated
images of random lines at a random angle. Then labelled the lines positive or
negative examples, based on whether the angle was greater than 90 degrees or
not. Then they take sets of these examples and label them based on whether
they contain an even or odd number of positive examples.
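That setup fits in a few lines of code. A minimal sketch, assuming a tuple size of 4 (the function names and `k` are illustrative, and the actual rendering of each angle as a line image is omitted):

```python
import random

def make_example(rng):
    """Draw a random angle in [0, 180) degrees; label it
    positive (+1) if the angle exceeds 90 degrees, else -1."""
    angle = rng.uniform(0.0, 180.0)
    label = 1 if angle > 90.0 else -1
    return angle, label

def make_tuple(rng, k=4):
    """Group k labelled examples; the tuple's label is +1 when it
    contains an even number of positives, -1 when odd."""
    examples = [make_example(rng) for _ in range(k)]
    positives = sum(1 for _, lbl in examples if lbl == 1)
    tuple_label = 1 if positives % 2 == 0 else -1
    return examples, tuple_label

rng = random.Random(0)
examples, label = make_tuple(rng)
```
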

They take several paragraphs, over half a page, to explain this, filled with
dense mathematical notation. If you don't know what symbols like U, :, ->, or
~ mean, you are screwed, because that's not googleable. It takes way longer to
parse than it should. Especially since I just wanted to quickly skim the
ideas, not painfully reverse engineer them.

Hell, even _the concept of even or odd_ is pointlessly redefined and
complicated as multiplying +1s and -1s together. I was scratching my head for a
few minutes just trying to figure out what the purpose of that was. It's like
reading bad code without any comments. Even if you are very familiar with the
language and know what the code does, it takes a lot of effort to figure out
_why_ it's that way. If it's not explained properly.

The worst part is, no one ever complains about this stuff because they are
afraid of looking stupid. I sure fear that by posting this very comment. I
actually am familiar with the notation used in this example. I still find it
unnecessary and exhausting to decode.

~~~
FridgeSeal
May I ask what your background is? I come from a maths + stats background and
the "impenetrable math notation" is literally second nature to me now: I can
parse and understand it with far more speed and ease than I do any programming
language.

Academic papers make the very reasonable assumption that you are familiar with
basic mathematical notation and, probably, the basics of your field.

Its intended audience already knows these things; criticising these papers
because, essentially, "I don't like the notation" is a pretty useless exercise.

Side note: maths papers used to be written with words instead of notation, but
we dropped that, because it was inefficient and difficult to read and
understand.

~~~
Houshalter
Math notation isn't inherently bad. Even in this example, it could be
justified, if it were accompanied by any sort of explanation of what it meant.

No matter how familiar you are with the notation, it's always going to be less
clear and take time to unravel. "OK, I see the author is multiplying a bunch of
numbers together? Why are they doing this? [Reads over it 10 more times]...
Oh... it's to tell if it's even or odd. Why on Earth couldn't they just say
that.."

And literally every use of notation in this paper is like that. I was
incredibly fortunate they decided to stop at some point, and just say they
then draw a line with those parameters. They could have kept going and defined
a procedure for drawing lines. I would not be remotely surprised.

I believe it's entirely about signalling. Math notation looks professional and
academic. Just like describing yourself as "we". There's also something called
the illusion of transparency. It's a bias where people believe they are much
more understandable than they actually are. Like if you explain an idea to
someone, you will expect them to have a much higher chance of understanding it
than they actually do. I believe people who write papers are incredibly
guilty of that.

And every freaking academic paper is like this. So many papers pointlessly
give exact equations for a neural network. Instead of just saying "and we
created a 3-layer neural net with a softmax output trained with stochastic
gradient descent." But figuring out that's what the equations describe is
going to take 15 minutes and lots of confusion.
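For what it's worth, that plain-English description really is nearly complete on its own. A toy version of "a 3-layer net with a softmax output trained with SGD" fits in a screenful; the layer sizes, tanh nonlinearity, and learning rate here are made up for illustration, not taken from any paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three weight layers, softmax output, trained by plain SGD.
W1 = rng.normal(0, 0.1, (20, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 16)); b2 = np.zeros(16)
W3 = rng.normal(0, 0.1, (16, 3));  b3 = np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def step(x, y, lr=0.1):
    """One SGD step on a single (x, y) pair with cross-entropy loss.
    Returns the loss computed before the weight update."""
    global W1, b1, W2, b2, W3, b3
    # Forward pass.
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    p = softmax(h2 @ W3 + b3)
    # Backward pass: compute all gradients before touching the weights.
    d3 = p.copy(); d3[y] -= 1.0            # dL/dlogits for cross-entropy
    d2 = (d3 @ W3.T) * (1 - h2 ** 2)       # tanh' = 1 - tanh^2
    d1 = (d2 @ W2.T) * (1 - h1 ** 2)
    W3 -= lr * np.outer(h2, d3); b3 -= lr * d3
    W2 -= lr * np.outer(h1, d2); b2 -= lr * d2
    W1 -= lr * np.outer(x, d1);  b1 -= lr * d1
    return -np.log(p[y])

x = rng.normal(size=20); y = 1
loss_before = step(x, y)
loss_after = step(x, y)
```

Repeating `step` on the same example drives the loss down, which is all the equations in a typical paper are saying.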

Really, imagine a programmer who explained his ideas entirely in code without
any comments, with one-letter, undescriptive variable names, and who used as
much premature optimization as possible to obscure it further. I doubt such a post
would make it to the top of HN. But this crap regularly does.

~~~
yorwba
> If it was accompanied by any sort of explanation of what it meant.

Most of the paper is just text, explaining their results in plain English,
interspersed with technical vocabulary.

> "OK, I see the author is multiplying a bunch of numbers together? Why are
> they doing this? [Reads over it 10 more times]... Oh... it's to tell if it's
> even or odd. Why on Earth couldn't they just say that.."

Using powers of -1 to test for parity is a fairly standard trick, and they
explain it right at the bottom of page 2. Did you overlook that, were you
unfamiliar with this use, or is the explanation not clear enough? Only one of
these possibilities can be blamed on the authors.
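The trick itself is tiny. A minimal sketch of the powers-of-(-1) parity check, alongside the boolean version it replaces (function names are mine, not the paper's):

```python
from functools import reduce
from operator import mul

def parity_product(labels):
    """labels: a list of +1/-1 values. Multiplying them gives +1
    exactly when the number of -1 entries is even -- the standard
    powers-of-(-1) parity trick."""
    return reduce(mul, labels, 1)

def parity_bool(labels):
    """Equivalent boolean even/odd check, for comparison."""
    return 1 if labels.count(-1) % 2 == 0 else -1

print(parity_product([1, -1, -1, 1]))  # two -1s: even, prints 1
```

The product form is what makes the later algebra convenient: products of parity functions stay parity functions.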

Or maybe you were talking about the formulas at the top of page 4, where they
are multiplying two of those random parity functions together? That's not
really to check whether they are even or odd, it's to prove their mutual
orthogonality, which it says in the sentence above. In these transformations,
the notation using powers of -1 is very convenient, since it allows them to
apply simple algebraic transformations that would have been very tedious had
they used a boolean even/odd indicator.

> They could have kept going and defined a procedure for drawing lines.

They do this: _Note that as the images space is discrete, we round the values
corresponding to the points on the lines to the closest integer coordinate._
They don't use notation for this, because it's not required for clarity.

Basically, I think the ideas in this paper are laid out pretty clearly for
someone who is already familiar with the notation and conventions used.
Research papers are primarily about communicating ideas to people working in
the same field, who don't need to have the basics explained to them. That this
makes the results hard to understand for a more general audience, even if they
are experts in their field, is unfortunate, but basically unavoidable. Adding
links to definitions and examples in introductory textbooks to every research
paper would be pretty awesome, but it shouldn't be the author's burden.

~~~
reader5000
The first author on this paper does have a great intro book on precisely this
area where the notation ramps up more slowly.

------
dmreedy
I think some of the most exciting and interesting work comes out of proving,
not just capabilities, but constraints for systems, be it Gödel, Shannon,
Aaronson, or any of the others in the smaller-than-desirable tradition of
those who say, "No". I think a better understanding of what Deep Learning
_can't_ do (well) is fertile material for better understanding the kinds of
problems it _can_ do, and am very excited to see more work in this space, and
movement towards an underlying structural theory.

~~~
bitL
I think the issue with Deep Learning is that it is like a hyperdimensional
sequence of optimization heuristics, surpassing what the human mind can
comprehend and pushing the limits of computing (a depth of about 100 layers
at most at the moment). Given how difficult far more trivial optimization
techniques are, and how hard proving bounds for them is, it seems the times
when we could just define a new approach and prove some nice properties are
behind us :-(

~~~
boxcardavin
I don't think the problem here is what the human mind can comprehend because
identifying where these methods break down is actually pretty easy from a math
perspective, and the breakdown doesn't change as you go up and down with the
number of dimensions. What has always surprised me about ML and DL is how far
we can stretch simple regression techniques and how useful it is on such a
variety of problems.

I agree that it has turned into an NP problem in a lot of cases now but the
concepts behind the problem of N-dimensional optimization are pretty well
understood.

~~~
curuinor
It's not like we don't know anything about NP-complete problems, either: the
critical phase transition of the clause-to-variable ratio alpha in random
k-SAT, the realization that this is neither necessary nor sufficient for
NP-completeness but just a really common complementary phenomenon, etc.

~~~
bitL
If you use 40M+ optimization variables and a really, really bad (but fast)
optimization technique (gradients), as is the case in DL, the number of nice
practical things (e.g. limits) you might be able to say could be very
small.

Yes, we have all been stunned when somebody pulled off some awesome trick and
found a better upper/lower bound on something, yet sometimes it's better to be
realistic - maybe once we move to quantum computers, we can test more
limits practically by utilizing their parallel power.

------
csfoo
The lead author will be giving a talk on this work next week (which will be
live streamed and recorded) as part of a workshop on Representation Learning:

[https://simons.berkeley.edu/talks/shai-shalev-shwartz-2017-3-28](https://simons.berkeley.edu/talks/shai-shalev-shwartz-2017-3-28)

------
smdz
Link to the PDF:
[https://arxiv.org/pdf/1703.07950.pdf](https://arxiv.org/pdf/1703.07950.pdf)

------
bra-ket
the biggest failure of deep learning is the lack of common sense

~~~
visarga
It is growing, gradually: ontologies, word embeddings, mechanical dynamics
prediction - but it will take some time. I also don't know why there isn't
more of a push to bring together all the common sense resources.

