
Deep Learning’s Impact on Image Processing, Mathematics, and Humanity - alfaiate
https://sinews.siam.org/Details-Page/deep-deep-trouble-4
======
backpropaganda
I question this meme that deep learning has less mathematical elegance or
interpretability than other machine learning models. It's simply not true. At
this point, it reads like a mindless copypasta, possibly with the political
intention of reducing AI funding and use, such as the EU regulation which
requires all algorithms to "explain" their reasoning. Either that, or the
author is just more comfortable with familiar old-school methods, which makes
them seem more elegant to him. This push towards elegance is unscientific and
counterproductive, much like the mistaken push Einstein and co. made for
deterministic interpretations of quantum mechanics. We can now see that its
motivation was simply a biased notion of elegance born of familiarity with
classical mechanics.

Examples of elegance in deep learning: The gating mechanisms of LSTMs (also
adds to interpretability); structure of convolutions, i.e. weight sharing,
translation invariance; the policy gradient theorem (Sutton 1999); dropout,
its relationship to biology and variational inference; Generative Adversarial
Networks, and connections to game theory, split-brain; the reparametrization
trick; the log-derivative trick; connectionist temporal classification. These
examples only scratch the surface; depending on your specialization within
deep learning, you'll find many more.
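
To make one of these concrete, here's a minimal numpy sketch of the
log-derivative (score-function) trick, checking the identity
d/dθ E[f(x)] = E[f(x) d/dθ log p(x; θ)] by Monte Carlo for a Gaussian; the
choice of f and of the Gaussian family is just an illustrative assumption:

```python
import numpy as np

# Log-derivative trick: for x ~ N(theta, 1),
#   d/dtheta E[f(x)] = E[f(x) * d/dtheta log p(x; theta)]
#                    = E[f(x) * (x - theta)]
rng = np.random.default_rng(0)
theta = 1.5
f = lambda x: x**2                       # illustrative choice of f

x = rng.normal(theta, 1.0, size=1_000_000)
mc_grad = np.mean(f(x) * (x - theta))    # score-function estimator

# Closed form: E[x^2] = theta^2 + 1, so the true gradient is 2*theta.
print(mc_grad, 2 * theta)                # ~3.0 vs 3.0
```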

Examples of interpretability: Nguyen et al.'s synthesizing of preferred inputs
to hidden neurons, Zeiler et al.'s convnet visualizations, Guillaume Alain's
linear probes of hidden layers, attention readouts in attentional models for
machine translation or speech synthesis, and many more. Ultimately, deep
learning methods are probabilistic, and a decent deep learning engineer can
tell you why a model is doing what it's doing by printing probabilities and
activation statistics, much like with any other probabilistic machine learning
model.
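
As an illustration of the linear-probe idea, here's a minimal sketch; the
random "hidden layer" is a stand-in for a real trained network, and sklearn is
an assumption of convenience. The probe is just a linear classifier trained on
a layer's activations, and its accuracy tells you how much class information
is linearly decodable at that depth:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy two-class data; in practice these would be real inputs and labels.
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Stand-in for a hidden layer: a fixed random projection plus ReLU.
W = rng.normal(size=(20, 50))
H = np.maximum(0.0, X @ W)

# The probe: a linear classifier trained only on the activations.
probe = LogisticRegression(max_iter=1000).fit(H, y)
print("linearly decodable accuracy at this layer:", probe.score(H, y))
```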

~~~
maze-le
>> This push towards elegance is unscientific

I think you are right on that one. The Standard Model, for example, is
absolutely not elegant; it is a huge convoluted mess of parameters and
constants (just google "Standard Model Lagrangian"). Yet it is the best we
have come up with for explaining the dynamics of the fields and particles[0]
that surround us. Correct answers don't have to be elegant per se; it's nice
when they are, but it's not a prerequisite.

But consider this: you can train a neural network to predict the distance of
an object, thrown with velocity v_0 at an angle of α, under the influence of
gravitational acceleration g. After a few hundred rounds of training, a
suitable NN can reasonably predict the outcome of said experiment from the
inputs v_0, α and g. And, as you pointed out, a researcher can explain to you
why this is the case based on activations, feedback loops, the learning
algorithm and other parameters. But neither the NN nor the researcher will be
able to give you a rule like: "F=(G m_1 m_2)/(r^2)" to explain the underlying
reasons for the object-trajectory dynamics.
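
That thought experiment is easy to run. A minimal sklearn sketch (the
architecture, ranges and sample size are arbitrary choices), generating data
from the flat-ground range formula R = v_0^2 sin(2α)/g:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Generate (v0, alpha, g) -> distance data from the flat-ground formula.
n = 5000
v0 = rng.uniform(1, 10, n)
alpha = rng.uniform(0.1, np.pi / 2 - 0.1, n)
g = rng.uniform(5, 15, n)
R = v0**2 * np.sin(2 * alpha) / g

X = np.column_stack([v0, alpha, g])
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, R)

# The net approximates the distance, but its thousands of weights contain
# no legible statement of the rule R = v0^2 sin(2*alpha) / g.
print(nn.predict([[5.0, np.pi / 4, 9.81]]), 5.0**2 / 9.81)
```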

A Neural Network can give you answers and predictions, yes, but you are not
able to incorporate them into a wider theory, since the output is always
numerical in nature. It is also always tied to one specific fact you are
interested in and cannot give you a generalization.

[0]: In the model of QFT particles are also fields of a different kind.

~~~
DrNuke
> But neither the NN nor the researcher will be able to give you a rule like:
> "F=(G m_1 m_2)/(r^2)" to explain the underlying reasons for the object-
> trajectory dynamics.

Come on, you can just fit the NN's numerical outputs with Excel and come up
with tentative explanations for both the underlying elegant rule and the
science. The point of NN-based science is that you are not left guessing much;
instead, you retrofit the NN's numerical output and build a coherent narrative
with other accepted results.
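
In that spirit, a minimal scipy sketch of the "retrofit" step; the power-law
ansatz is an assumption you'd have to guess, and the noisy samples below
merely stand in for a trained NN's outputs:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Noisy predictions standing in for a trained NN's numerical outputs.
v0 = rng.uniform(1, 50, 2000)
alpha = rng.uniform(0.1, 1.4, 2000)
g = rng.uniform(1, 20, 2000)
R_pred = v0**2 * np.sin(2 * alpha) / g * rng.normal(1.0, 0.01, 2000)

# Guessed ansatz: R = c * v0^a * sin(2*alpha)^b / g^d
def ansatz(X, c, a, b, d):
    v0, alpha, g = X
    return c * v0**a * np.sin(2 * alpha)**b / g**d

(c, a, b, d), _ = curve_fit(ansatz, (v0, alpha, g), R_pred, p0=[1, 1, 1, 1])
print(c, a, b, d)  # ~ (1, 2, 1, 1): the recovered "blurred" closed form
```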

~~~
kosmet
Without that equation, you cannot build the rocket that goes to the moon,
however.

~~~
DrNuke
Feeding the retrofitted "blurred" equation to mathematicians and letting them
come up with tentative exact models, when you'd already be almost there?

~~~
pshc
Anything that fits within the error bars is viable if you ask me. Whatever
works works.

~~~
sgt101
A lot of investors agree with you. Often they get very rich, especially if
they head for the beach before the fit breaks down.

On the other hand...

------
arximboldi
This is an interesting article. It made me think of the socio-political
consequences of the move towards neural networks.

Many of us, children of the personal computer revolution, were attracted to
computers for their empowering and democratizing effects. With just a computer
you could build anything! You were as powerful as any of "them"! The
proprietary software model was developed to monetize end consumers by
enchaining them, but the free software model and the fertile communities of
the early Internet showed that the model of personal computation could not
simply be displaced by legalese.

But if we take the author's position on the impact of deep networks, the model
of computation is changing again.

If teaching deep networks is the new way to write useful programs, our brains
and personal computers are not enough... they are obsolete. We now need
clusters and, most importantly, massive access to data! This gives
extraordinary leverage to the hoarders of data and servers of the Internet,
the googles and the facebooks.

I'd like to think that maybe we are just at the beginning, the early
"mainframe neural networks" era. That we just have to wait until there is
enough technology and new markets discovered to build the "personal neural
network". That consensual, open and distributed ways of sharing data will
emerge and that new massively parallel computers will become affordable by the
masses. That the models of neural networks will become well divulged and
simplified and kids will be able to program them with their "Neural Basic"...

But at the moment the prospects don't look good. The Internet is becoming more
and more centralized, personal computers are getting harder and harder to
program, and there is a general "war on general computation" [1]. Even
universities seem to be displaced by the Internet Lords in driving neural
network development...

Maybe it is already a good time to start thinking about what "Libre Neural
Networks" would look like, and how we can get there.

[1]
[https://www.youtube.com/watch?v=HUEvRyemKSg](https://www.youtube.com/watch?v=HUEvRyemKSg)

------
frankmcsherry
There seems to be a lot of angst in the responses about how deep nets et al.
"are too elegant; are too interpretable!".

It's important to remember that the intended product of science is not tools,
it is understanding. Deep nets may produce very elegant and interpretable
tools, from a very elegant and interpretable theory, but that is not what the
scientists are looking for. They are generally creating and assessing theories
_in their subject domains_, which are then evaluated by the tools they have
built.

Deep learning does do a great job of providing a baseline ("your theory must
be this accurate to matter"), but it seems to do a much less great job at
extracting new understanding.

~~~
backpropaganda
Machine learning departed from the venture of understanding-based science well
before deep learning [1][2]. Thus, singling out deep learning reeks of
motivated reasoning, or just a lack of awareness of the history of machine
learning.

Having said that, there still exists a lunatic fringe (it's a compliment)
inside deep learning that continues to work on the task of generative modeling
in the hope of "understanding the world" rather than "solving a task". Yoshua
Bengio and Yann LeCun don't miss an opportunity to impress on the world how
important unsupervised learning is.

Physics too has abandoned understanding for predictive power. If you take
philosophy of science seriously, you either have to pick "science is whatever
the elite society believes in" (Kuhn) or "science gives us tools for
prediction" (post-Kuhn). Claiming that science helps in finding truth or
understanding puts you on a very slippery, indefensible slope.

"Understanding" is a subjective human concept, and not worth pursuing. Humans
evolved to run from tigers, and struggle with harder tasks like understanding
quantum mechanics or understanding the brain. The directionality of scientific
progress as evidenced by QM is not "understanding", but tooling and predictive
power. I hope Deep Learning is going to follow the path of predictive power,
rather than regress into the pseudoscientific mess of elegance and
understanding.

[1]:
[http://www2.math.uu.se/~thulin/mm/breiman.pdf](http://www2.math.uu.se/~thulin/mm/breiman.pdf)

[2]: [http://norvig.com/chomsky.html](http://norvig.com/chomsky.html)

~~~
bamboozled
"Understanding" is a subjective human concept, and not worth pursuing."

Isn't human understanding of how our environment works precisely how you're
able to sit in your chair and write comments on HN for others to read?

~~~
cercatrova
Sure, but I might ask: if I were to create a chair by generative methods,
would I need to understand how it was created for it to be useful?

~~~
bamboozled
If you have ever crafted a chair, a table, or anything else built by a
craftsperson, you will soon realise why such a profession exists: because it's
very difficult.

You might slap a few pieces of wood together and, by brute force, create
something like a chair, and that might do, but it's very unlikely to be a
good, safe, attractive, comfortable, long-lasting chair. Building quality
furniture is complex and difficult work which takes a lot of skill, experience
and understanding to do.

I understand your point; I never said there was anything wrong with
experimenting with NNs, I just don't agree with the OP's sentiment.

------
hnarayanan
I echo the feelings of the author of this piece, and I'm very wary of a world
where predictive power always trumps deeper understanding.

(Edit: I knew I'd expressed this feeling before:
[https://twitter.com/copingbear/status/825098385548009472](https://twitter.com/copingbear/status/825098385548009472))

------
return0
I have no doubt that some day (soon?) neural networks will be able to explain
themselves (using language). But the explanations they give will be less than
satisfactory by our aesthetic criteria. Occam's razor is a great aesthetic
rule, but it doesn't guarantee beauty. We will most likely find out that the
computational vision of the past had managed to find some rules covering some
cases, but missed many more, most of which will be higher-order and not that
elegant. And it may turn out that denoising was a rather messy problem, as was
probably speech recognition.

Drugs work, but we don't know how a surprisingly large number of them work;
yet chemists don't complain about them. Theoreticians should not complain
either. Deep nets give them a huge subject matter that probably hides many
interesting insights. I mean, don't learned convolutional filters look like
Gabor filters?

The question of what can be understood from them can go deep, to the limits of
the ability of language/math to express ideas.

~~~
nske
I don't see a problem in splitting all these domains into an "applied" school,
which focuses research on evolving solutions that currently seem effective at
approximating answers for certain engineering problems, and a more theoretical
school that has the potential to carve new revolutionary paths while pursuing
some (abstract?) ideal such as mathematical elegance. One shouldn't exclude
the other, and apparently there are enough scientists around who are attracted
to each of the two approaches.

------
NumberSix
Deep learning/neural networks -- like a number of other "machine learning"
methods -- amount to fitting a mathematical model with a huge number of
tunable parameters and component functions to data. It has been known for a
long time -- arguably back to the discovery of the Taylor series expansion in
the early days of calculus -- that broad classes of functions can be
approximated arbitrarily well by an arbitrarily large composition of other
functions.

If the task is interpolation between the data points, this can be highly
accurate. If the task is extrapolation, such as prediction or designing a
truly new machine or system in an engineering application, the approximation
will often fail. The roughly one percent error rate in predictions of
planetary motions from epicycles is one of the earliest cases of this problem.

The simple example is approximating data with a polynomial with an arbitrary
number of terms, as in the Taylor series expansion. With enough terms, a
polynomial model can always approximate any data set arbitrarily well.
Polynomial models can interpolate very well unless the data has some genuinely
unusual behavior -- changing unpredictably at successively finer scales, for
example. However, finite polynomial models almost always extrapolate grossly
incorrectly. As you move away from the data set in the space of independent
variables such as X, the largest power N in X^N dominates and the polynomial
approximation blows up to either plus or minus infinity, which is rarely
physical.
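
A minimal numpy illustration of that interpolation/extrapolation asymmetry
(the degree and target function are arbitrary choices):

```python
import numpy as np

# Fit a degree-9 polynomial to sin(x) on [0, pi].
x = np.linspace(0, np.pi, 50)
p = np.poly1d(np.polyfit(x, np.sin(x), deg=9))

# Interpolation: excellent agreement inside the data range.
print(abs(p(1.0) - np.sin(1.0)))   # tiny error

# Extrapolation: the x^9 term dominates and the fit blows up.
print(p(10.0), np.sin(10.0))       # huge value vs. a bounded truth
```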

What we think of as "understanding" corresponds conceptually, in part, to the
ability to make accurate predictions. "Understanding" or "explanation"
corresponds mathematically not to some arbitrary super-complex function with
large numbers of arbitrary parameters, but rather to a mathematical object
such as a system of differential equations (e.g. Maxwell's equations for
electromagnetism, or the general theory of relativity) that expresses
interrelationships among the variables and data points.

~~~
ppod
Machine learning researchers are well aware of the problem of out-of-sample
prediction; in fact, most of the work goes into making systems generalize
appropriately, by making architectures robust through dropout or
regularization (a sketch of dropout follows below). The resulting systems
might not correspond to our intuitive definition of "understanding", but they
look more similar to what we know about the human brain than to an elegant
function with a small number of interpretable parameters.
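
For the dropout mentioned above, a minimal numpy sketch of the standard
"inverted" variant (shapes and rate are arbitrary):

```python
import numpy as np

def dropout(a, p=0.5, train=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero each unit with probability p at train time,
    rescaling survivors so the expected activation is unchanged."""
    if not train:
        return a                   # identity at test time
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)

print(dropout(np.ones((2, 4))))    # roughly half the units zeroed
```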

------
swagonomixxx
This is the first comment I've made on HN, but I feel compelled to say
something. This post reeks of the author pining for the "good old days" of
mathematical elegance.

Firstly, mathematical elegance is not an objective metric by any means. Many
mathematicians disagree on what results are most elegant, though there are
many commonalities as well.

Secondly, all of machine learning is quite elegant. I'm by no means an expert,
since I only took two courses in my fourth year that surveyed many methods
(quite mathematically rigorous, btw), but I've taken enough math to be able to
judge what is elegant and what is not.

As the author himself says, he has slightly modified his research methods.
These days you'd be a fool to ignore deep learning, whether you have a deep
understanding of it or not.

~~~
matt4077
I felt the article was quite fair in acknowledging the intriguing results of
deep learning, and decidedly arguing against ignoring them.

"Elegance" can indeed not be defined, but I believe the misunderstanding here
is more about the point of view: Deep neural networks are plenty elegant when
looked at from an ML POV. But, if you're an expert in, for example, image
processing, using a DNN as a tool, the solution it may give you won't be
"elegant" by anybody's definition. It is, after all, a repetitive formula with
X million arbitrary floats.

Previous ML methods usually resulted in models, or formulas, that were small
enough to "grasp" intuitively. The "beauty", or "elegance" was that often, you
could find connections from terms in your model to the real world.

~~~
ganfortran
> Previous ML methods usually resulted in models, or formulas, that were small
> enough to "grasp" intuitively.

Which is not true in practice. An SVM is really an instance-based method, with
thousands to tens of thousands of high-dimensional vectors as its
'parameters'. Random forests, famous for their ease of interpretability, are
no easy meal either when you end up with hundreds of trees with outrageous
branching. Not to mention that real-world models are embarrassingly complex
ensembles of smaller, but still complex, models.

------
hprotagonist
Deep learning is a perfectly useful engineering solution.

As a tool for basic research, particularly in the biomedical realm, it's
awful. You get a system that performs pretty well most of the time, but tells
you nothing of interest.

That's fine, as far as it goes, but "performs pretty well" is so much more
useful in industry than "we know how X biological system works" that entire
fields that are interested in the mechanics of how perceptual biological
systems work and may be fixed are getting eaten alive. To our collective
detriment, I think.

------
projectorlochsa
The author mentions style transfer as a product of NN research.

Yet this paper [1] from 2014 works extremely well with no neural networks.
Here's one from 2008 [2] that worked extremely well too, better than most
early NN techniques.

Also, deep learning is far from the unintuitive black art it was 10 years ago.
It's clearer now how to reason about models, layers, activation functions,
etc. And even granting that there's no mathematical foundation, I do not
believe one helps that much in the case of SVMs or whatever else; there still
has to be experimentation, approximation and proper testing.

[1]:
[https://dspace.mit.edu/handle/1721.1/100018](https://dspace.mit.edu/handle/1721.1/100018)
[2]:
[http://link.springer.com/chapter/10.1007%2F978-3-540-88690-7...](http://link.springer.com/chapter/10.1007%2F978-3-540-88690-7_3?LI=true)

~~~
jph00
To claim that those techniques are even in the same ballpark as deep learning
based style transfer is just silly. They are extremely limited in their
application, as you can clearly see by looking at the examples. And the
approaches themselves are overfit to the particular applications they show, so
they can't be readily extended.

These are actually excellent examples of how deep learning has totally changed
our expectations of what can be done with computer vision.

~~~
projectorlochsa
The deep learning based style transfer that went viral a year or two ago was
worse than this 2008 result, especially in its application to portraits.

Yet the hype for the 2008 result was nowhere near as big, just because there
was no deepness.

I admit that [https://arxiv.org/abs/1705.01088](https://arxiv.org/abs/1705.01088)
beats everything and is extremely powerful and simple, but the hype for deep
things seems a bit too strong.

------
leoc
> To put it bluntly, your grandchild is likely to have a robot spouse.

I'll take the other side of that bet.

~~~
johanneskanybal
I was already on the fence about the article being standard deep learning
clickbait when I saw that :)

------
deepnotderp
In my opinion, this push towards "mathematical elegance" and
"interpretability" is hurting us and is unscientific. On interpretability,
I'll paraphrase Yann LeCun: you don't really care too much whether or not your
taxi driver is interpretable; he's a black box. Not only that, but there is a
variety of newer methods, such as attention mechanisms, gradient masks and
auxiliary explainer networks, that give really good interpretability (see the
sketch below). And it's funny how this criticism never comes up for SVMs...
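
For the gradient masks mentioned above, a minimal PyTorch sketch of a vanilla
saliency map; the tiny untrained model is a stand-in, and in real use you
would load a trained classifier:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be a trained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

x = torch.randn(1, 1, 28, 28, requires_grad=True)
score = model(x)[0].max()          # score of the top class
score.backward()

# Gradient magnitude per pixel: which inputs most affect the decision.
saliency = x.grad.abs().squeeze()
print(saliency.shape)              # torch.Size([28, 28])
```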

I also think it's unscientific to expect mathematical elegance from
everything. Math is ultimately a logical descriptor tool for AI, a means, not
an end in itself. Besides, there's actually a host of involved mathematics
underlying the seemingly simple SGD of deep nets. For example, tropical
geometry has been used to analyze the loss surfaces of ReLU networks and
random matrix theory has been employed to analyze the loss surfaces with
respect to the quality of local minima.

Tl;dr: deep nets are far more interpretable than they've been given credit
for, and they're also more mathematical than some (such as this author) would
have you believe.

------
crb002
Good Computerphile episode on understanding what is going on inside a neural
net:
[https://www.youtube.com/watch?v=BFdMrDOx_CM](https://www.youtube.com/watch?v=BFdMrDOx_CM).
Looking at high-information neurons to see what they pick up can yield a lot
of insight into the problem.

------
gaius
But this is true of neural nets as well, so I'm not sure why the author was
trying to contrast them. Essentially it is a question of: do you want a system
that works well but that no one understands, or one that works less well but
is understandable? But if you meta it, it's the same.

~~~
backpropaganda
I don't think even this dichotomy is real. Deep learning methods are both
performant and elegant.

~~~
baq
They're elegant only if the sentence 'consider a model with a billion
features' doesn't make you feel disgusted, I'm afraid.

~~~
trevyn
elegant, _adjective_ : (of a scientific theory or solution to a problem)
pleasingly ingenious and simple.

One easily-described approach that is state-of-the-art in a massive number of
domains? That's pretty f'ing elegant.

------
mwytock
Many methods in machine learning and statistics benefit from two types of
theory:

1) Optimization theory - which says, if I repeat this iterative method N times
I will find a (nearly) globally optimal solution

2) Statistical theory - which says, if I observe this process N times I can
accurately estimate a population quantity with high probability

Deep learning does not benefit from the same theoretical guarantees.
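
A minimal numpy illustration of the gap in point 1: the same gradient descent
that provably reaches the global optimum of a convex loss can land in
different minima of a non-convex one, depending on initialization (the
functions here are arbitrary choices):

```python
import numpy as np

def gd(grad, x0, lr=0.01, steps=5000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Convex: f(x) = x^2. Any starting point converges to the global minimum at 0.
print(gd(lambda x: 2 * x, x0=5.0))          # ~0.0

# Non-convex: f(x) = x^4 - 3x^2 + x has two basins; the answer depends on x0.
grad = lambda x: 4 * x**3 - 6 * x + 1
print(gd(grad, x0=-2.0), gd(grad, x0=2.0))  # two different minima
```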

For the most part, the response from the community is "but it works really
well!", which is a fair and valid response, especially since what most
practitioners care about is predictive accuracy.

Personally, I find applying neural networks extremely annoying at times due to
the amount of twiddling of hyperparameters, slow convergence, etc.

------
linux_devil
" The facts speak loudly for themselves; in most cases, deep learning-based
solutions lack mathematical elegance and offer very little interpretability of
the found solution or understanding of the underlying phenomena."

I don't agree with this statement; a simple look at the cs231n lecture series
will show you how much math is involved. A lot of articles/people claim it's a
black box, but when writing even a small network architecture you realise it's
not. Concepts like stride, padding, activations, learning rate, optimization
and dropout give you "aha" moments, each followed by a mathematical
explanation. One should study the topic thoroughly before criticising it.

~~~
gjulianm
Having a lot of math involved does not mean that it is mathematically elegant.
As a mathematician, I ask myself several questions. First of all, what really
is a neural network? Is it an approximating function? Is it a geometric
separation of a space, as with SVMs? Is it a manifold classifier? (See
[http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/),
which is very interesting.)

Also, what are we approximating? Continuous functions? Non-continuous
functions? Are they even functions and not probability measures? Are those
functions arbitrary or do they represent something like a manifold?

And most important of all: how well are we approximating whatever we want to
approximate? The universal approximation theorem gives uniform convergence for
measurable functions, but does not specify at what rate or depending on which
parameters. It is a strong theorem, but not that surprising from the
mathematical standpoint, where you already know that you can approximate any
function by continuous, compactly supported functions.
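
For reference, one common form of the theorem being alluded to (Cybenko,
1989); note that the statement is purely existential, with no rate:

```latex
% Universal approximation (Cybenko, 1989), one common form: for any
% continuous f on the cube and any eps > 0, some one-hidden-layer network
% with sigmoidal activation sigma is uniformly eps-close to f.
\forall f \in C([0,1]^n),\ \forall \varepsilon > 0,\ \exists N,\
\alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n :\quad
\sup_{x \in [0,1]^n} \Big| f(x) - \sum_{i=1}^{N} \alpha_i \,
\sigma(w_i^{\top} x + b_i) \Big| < \varepsilon .
```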

Finally, how do you mathematically define the problems that arise in neural
networks? What is overfitting? How does the learning algorithm affect the
results?

The fact that some techniques are justified by mathematical explanations does
not mean that the field is mathematically elegant. For it to be mathematically
elegant you should have, at the very least, clear definitions of the objects
of study and of the problems you want to solve. I don't think this is the case
for neural networks.

~~~
linux_devil
"Having a lot of math involved does not mean that it is mathematically
elegant" I agree to your point, but at the same time how much effort is being
made to understand the intricacies of deep learning is my question. In my
opinion, those who dont understand these techniques bluntly say its a black
box but its not entirely a black box. I am confident if more reearchers start
to peel it off layer by layer , a lot more insights will be generated given
the field is relatively new .

------
ganfortran
> deep learning-based solutions lack mathematical elegance and offer very
> little interpretability of the found solution or understanding of the
> underlying phenomena

I disagree. Elegance and interpretability come with one big flaw: assumptions,
lots of them, while deep learning based methods assume few or none. If the
classical methods are so elegant, why do they underperform? Probably because
the problem we are trying to solve doesn't follow the assumptions, like
convexity, etc.

In that sense, one can say deep learning methods are even more elegant,
because they can work end-to-end. In that sense, an NN can be a blessing: now
that we have the answer, we don't need to assume anything, we just need to
DECODE it.

~~~
js8
There is a no-free-lunch theorem in learning, which says that without limiting
your hypothesis space, you cannot learn. That means you need to have some
assumptions.

No doubt DNNs are subject to the same theorem; the only question is, what does
their hypothesis space look like? Does anyone have any idea about this? I
suspect we don't really know what the DNNs' assumptions are.

~~~
yorwba
The DNN's assumptions are "the true model can be fit by this network
architecture". If you use convolutions, then an additional assumption is
translation invariance.
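
A minimal numpy check of that convolutional assumption (circular convolution
is used so the identity is exact; real conv layers with zero padding satisfy
it only away from the borders): shifting the input shifts the output, which is
what weight sharing buys you.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=32)            # 1-D input signal
k = np.zeros(32)
k[:3] = rng.normal(size=3)         # small filter, zero-padded

# Circular convolution via the FFT.
conv = lambda a, b: np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

shift_then_conv = conv(np.roll(x, 5), k)
conv_then_shift = np.roll(conv(x, k), 5)
print(np.allclose(shift_then_conv, conv_then_shift))  # True
```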

~~~
js8
> The DNN's assumptions are "the true model can be fit by this network
> architecture"

Yeah, but that's tautological. What I mean is: can somebody sit down and, for
a given DNN architecture, write down (at least approximately) the set of
functions that it can learn? Or, more importantly, the functions it cannot
learn? Or at least, how many bits are assumed and how many bits have to be
learned?

I think that is what bothers people about DNNs. I personally think they are
sometimes even inefficient: we are giving them many more parameters (bits)
than the actual hypothesis space requires.

~~~
yorwba
> What I mean is: can somebody sit down and, for a given DNN architecture,
> write down (at least approximately) the set of functions that it can learn?

For a two-layer architecture with ReLU activations and n units in the hidden
layer, this is the set of piecewise linear continuous functions with at most n
kinks.
~~~
Klockan
How many of those do you need to tell the difference between cats and dogs in
random internet pictures?

------
pinouchon
> In its feed-forward architecture, layers of perceptrons—also referred to as
> neurons—first perform weighted averaging of their inputs, followed by
> nonlinearities such as a sigmoid or rectified-linear curves.

I have a really hard time continuing to read after this passage.

------
ngcc_hk
You can do a regression,

Exam_score = cat_a + cont_b + err,

and read off statements like "males vs. females have a 10-point score
advantage, and the impact of family income is ...". But how do you say that
with an NN?
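
For contrast, a minimal numpy sketch of that regression readout, on made-up
data in which the true sex effect is 10 points (all numbers are illustrative
assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

male = rng.integers(0, 2, n)           # categorical predictor
income = rng.normal(50, 15, n)         # continuous predictor (k$/year)
score = 40 + 10 * male + 0.3 * income + rng.normal(0, 5, n)

# OLS: score ~ intercept + male + income
X = np.column_stack([np.ones(n), male, income])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# The coefficients ARE the explanation: a ~10-point male/female gap and
# ~0.3 points per k$ of family income.
print(beta)  # ~ [40, 10, 0.3]
```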

Could Galileo have invented physics by doing NNs?

------
c0achmcguirk
> _" But what about us scientists? What is the true objective behind the vast
> effort that we invested in the image denoising problem?"_

This is the telegraph operator lamenting all the time spent learning Morse
Code when the telephone was invented.

I think scientists can be motivated by looking for new ways to solve real
problems. Don't weep over the time spent understanding the classic models.
Rejoice that we have a much better tool to further mankind.

