
Deep Learning – The “Why” Question (2018) - YeGoblynQueenne
https://blog.piekniewski.info/2018/10/13/deep-learning-the-why-question/
======
debbiedowner
Everybody knows that biologists are the ones doing real science. They look
under microscopes and see how life works firsthand.

But any chemist will tell you that everything the biologists see really comes
down to chemical reactions; all those behaviors are a function of electrons
being transferred and molecules changing shape. Yes, chemists are the ones
doing real science. They weigh things, measure the temperature and pressure of
their reaction vessels, and see the effects of all the chemicals that interact
firsthand.

But any physicist will tell you that everything the chemists see is down to
electrical interactions, that their chemical models are just crude
implementations of the laws of nature that only physicists understand,
studying them in colliders and synchrotrons and seeing the fundamental
particles that make up the universe. Yes, everybody knows that physicists are
the ones doing real science.

But any mathematician will tell you that the physicists simply take the most
beautiful equations and strip away all the elegant details to a crude
approximation, and say these basic objects represent well the simple
experiments and measurements they do. Only mathematicians who understand the
fundamental rules and symmetries of the universe can truly understand the
universe through the study of equations that reflect how everything works. Yes
everybody knows that mathematicians are the only ones doing real science.

But any logician can tell you that a mathematician is just applying axioms and
predicates willy-nilly to get to some expression that piles together
fundamental truths in such a spaghetti heap that the beauty of the real,
knowable rules of existence is lost. The mathematicians merely apply the
findings of logicians to reach much more trivial and applied goals. Yes,
logicians are the ones doing real science.

But no. EVEN a biologist knows a logician does not do science.

~~~
denzil_correa
[https://xkcd.com/435/](https://xkcd.com/435/)

:-)

------
AlexCoventry
SATnet is an interesting paper, in this regard. It's built on a theory of
backpropping through the semi-definite programming relaxation of a MAXSAT
problem, to learn the MAXSAT problem associated with a class of training
examples. It gets good results: learns to solve sudoku puzzles with only
binary feedback on its performance (i.e. "that's the solution" or "that's not
the solution", no partial feedback on partially-correct solutions). They say
it's the first time a neural net has been able to do that. On the other hand,
if you look at the coefficient matrix they learn for the MAXSAT problem, it
doesn't correspond to any logical proposition because the overwhelming
majority of the learned coefficients are close to 0, whereas in the
representation of a MAXSAT problem, the coefficients must be -1, 0 or 1, where
0 means the corresponding variable doesn't occur.
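To make the representation issue concrete, here is a toy sketch (not SATNet's actual code; the helper name, the example matrices, and the tolerance are all illustrative): a MAXSAT clause matrix encodes each clause as a row with entries in {-1, 0, 1}, and a matrix of mostly small-but-nonzero learned coefficients fails that check.

```python
import numpy as np

def is_valid_clause_matrix(S, tol=0.1):
    """A MAXSAT clause matrix has entries in {-1, 0, 1}: +1 for a
    positive literal, -1 for a negated one, 0 when the variable does
    not occur in the clause. True if every entry is within `tol` of
    one of those three values."""
    dists = np.abs(S[..., None] - np.array([-1.0, 0.0, 1.0]))
    return bool(np.all(dists.min(axis=-1) <= tol))

# A hand-written instance: (x1 OR NOT x2) AND (x2 OR x3).
S_discrete = np.array([[1.0, -1.0, 0.0],
                       [0.0,  1.0, 1.0]])

# The kind of matrix described above: most coefficients small but not
# exactly zero, so it rounds to no logical proposition at all.
S_learned = np.array([[0.03, -0.41, 0.22],
                      [0.17,  0.02, -0.58]])

print(is_valid_clause_matrix(S_discrete))  # True
print(is_valid_clause_matrix(S_learned))   # False
```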

Is it a major result? Well I'm certainly impressed. Do we understand what the
network is doing? I don't think we do... I'm still glad to have read the
paper, though. I don't know what standard you would hold researchers to, to
let through results like SATnet but reject spray-and-pray results which only
look good because of offline multiple sampling. If you could require a lab
notebook of authors, detailing every step of development, that might work.

[https://arxiv.org/abs/1905.12149](https://arxiv.org/abs/1905.12149)

------
martindbp
Call it what you will, but from what I can see progress exploded when
theoretical justification was thrown out the window. Maybe intelligence just
is messy at its core? Science is all about induction, but what if there are no
simple rules to break this down?

The only successful intelligence we know of came about through evolution, not
science. The way I see it, researchers are now doing what evolution did, but
in a more guided way and much more quickly. Whether you call that science or
engineering or tinkering, it doesn't really matter.

~~~
otabdeveloper4
> progress exploded

Did it? Deep Learning provided us with some mindblowing performance art
(FaceApp, DeepFake, etc.), but as far as I know any actual business value that
claims to be derived from 'AI' is fake news.

~~~
lukasLansky
Voice recognition, translation, OCR, image classification, etc. is done
through neural networks mostly nowadays.

If
[https://en.m.wikipedia.org/wiki/Google_Translate](https://en.m.wikipedia.org/wiki/Google_Translate)
does not seem like having business value to you, then I don't understand what
you mean by "business value".

~~~
otabdeveloper4
> If
> [https://en.m.wikipedia.org/wiki/Google_Translate](https://en.m.wikipedia.org/wiki/Google_Translate)
> does not seem like having business value to you, then I don't understand
> what you mean by "business value".

There's no way Google Translate will ever be monetized. So yes, it's
effectively an expensive art project for Google, not something that brings
real business value.

Hotword detection ('ok google') is a practical application of AI, but that's
really slim pickings considering how hyped 'AI' was. (Also, the costs of data
collection for this feature are still way too high.)

~~~
lukasLansky
I think we are really torturing language here. I don't want to get into
specifics of what constitutes business value, how to value brand and customer
satisfaction and so on. I can add more examples like
[https://deepmind.com/blog/article/deepmind-ai-reduces-google-data-centre-cooling-bill-40](https://deepmind.com/blog/article/deepmind-ai-reduces-google-data-centre-cooling-bill-40)
that show how neural networks help in the pure bottom-line sense you care
about.

Let me close my argument: neural network techniques developed in the last ten
years are super useful here and now, they are used by billions of people every
day and they do make their lives easier.

~~~
otabdeveloper4
> I think we are really torturing language here.

Not really. My point is simple: as of 2020, any proposition for a business to
'invest in AI' is a money-losing proposition. (Just like 'blockchain'.)

------
Barrin92
It's one of the fundamental critiques Chomsky already brought up decades ago
with the ascent of statistical methods in linguistics.

They may be useful in the sense of generating commercial value, just like ML
is useful to random-walking you to some solution that you couldn't have come
up with, but it is not science.

There is little insight to be gained from this; it is more guesswork and art
than anything else, and the practical results will likely diminish at some
point, as soon as one stumbles upon more fundamental problems and has no
model or structure to reason with.

ML is essentially behaviourism on steroids.

~~~
nmfisher
My problem is that Chomsky seems ardently committed to the "botanical"
approach to natural language as the One True Way (TM).

By this, I mean the assumption that there is some universal logical structure
to human language, and that our job is simply to come up with the labels for
each constituent part and arrange them all correctly like a jigsaw puzzle.
Once the puzzle is complete - boom! We've "solved" language.

That would have been a reasonable starting point back in the 60s, but I don't
think it's advanced the field nearly as far as statistical methods.

Chomsky would retort that "advancing the field" is meaningless unless it
allows us to "understand the field". But if natural language is a largely
arbitrary collection of rules in constant flux, there would be very little of
this "understanding" to be had. You could be bashing your head against the
wall for centuries trying to induce structure into a bunch of symbols where
_none was ever to be had_.

Now I do believe that purely statistical methods will hit a wall - the same
way you could never teach a baby to communicate by throwing it a copy of
Wikipedia and nothing else.

But if some parts of language are essentially arbitrary, then statistical
methods are the best available tool to uncover that.

~~~
sytelus
Language is infinite but the generating processes (i.e. our brains or
computers) are finite. The generators come in two flavors: rigid and
rule-based (as Chomsky suggests) or stochastic (the more modern view). The
stochastic generators are obviously more powerful but difficult to
"understand". Unfortunately, we associate the word "understand" with the
ability to write down a complete list of static rules. In my opinion,
probabilistic rules are just as "understandable". Then there are dynamic
probabilistic generators, which I think our brains aren't evolved enough to
truly "understand".
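The two flavors can be sketched in a few lines (a toy illustration; the grammar, the weights, and the function names are made up): a rigid generator always applies the same rule, while a stochastic one samples each expansion from a distribution over rules.

```python
import random

# Toy grammar: each nonterminal maps to a list of possible expansions.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["dog"], ["cat"]],
    "V":  [["sees"], ["chases"]],
}

# The stochastic variant attaches a probability to each expansion.
WEIGHTS = {
    "N": [0.7, 0.3],
    "V": [0.5, 0.5],
}

def generate(symbol, stochastic=False):
    if symbol not in RULES:               # terminal symbol
        return [symbol]
    options = RULES[symbol]
    if stochastic and symbol in WEIGHTS:
        expansion = random.choices(options, weights=WEIGHTS[symbol])[0]
    else:
        expansion = options[0]            # rigid: always the first rule
    out = []
    for s in expansion:
        out.extend(generate(s, stochastic))
    return out

print(" ".join(generate("S")))                   # always "the dog sees the dog"
print(" ".join(generate("S", stochastic=True)))  # varies run to run
```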

------
hcta
This seems to be more a broadcasting of the author's values than anything
else; the part of it which is objective observation is IMHO obvious.

I think there is an appropriate generic response to critiques like this, which
is that people are generally doing their best, and they may have different
motivations for getting into the field than you do, and this diversity is a
good thing. Probably every nascent science has its Faradays and Maxwells, who
complement one another.

------
YeGoblynQueenne
There was a similar discussion in another thread, someone pointed to the
piekniewski.info blog and I found that article that I think is spot-on (though
I don't like the blog overall because it seems to have a real bone to pick).

Scientific progress is marked by hypotheses that explain observations and
generate predictions that can then be verified (or falsified) by new
observations. Machine learning research has produced scant few such verifiable
hypotheses in its very long history. To a great extent the same goes for the
entire field of AI. I quote from John McCarthy's article on the Lighthill
Report [1]:

 _Much work in AI has the "look ma, no hands" disease. Someone programs a
computer to do something no computer has done before and writes a paper
pointing out that the computer did it. The paper is not directed to the
identification and study of intellectual mechanisms and often contains no
coherent account of how the program works at all. As an example, consider that
the SIGART Newsletter prints the scores of the games in the ACM Computer
Chess Tournament just as though the programs were human players and their
innards were inaccessible. We need to know why one program missed the right
move in a position - what was it thinking about all that time? We also need an
analysis of what class of positions the particular one belonged to and how a
future program might recognize this class and play better"._

McCarthy wrote that in 1974. The criticism is every bit as valid today as it
was back then. Machine learning and AI research remains an endeavour that is
rarely and only incidentally scientific. Despite 70 years of AI research and
the recent amazement at the "progress" in machine learning (a "progress" only
in the context of the field's own measures of progress) we have learned very,
very little from AI research that we didn't know already. Big machines can
compute big programs. So, what?

_________

[1] The quote is from McCarthy's review of the Lighthill report. The review
was published in the journal Artificial Intelligence Vol. 5, No. 3, 1974. A
pdf copy is here:

jmc.stanford.edu/artificial-intelligence/reviews/lighthill.html

McCarthy is best known to computer scientists as the father of Lisp but he is
also one of the founders of AI and the man who named it. The Lighthill Report
was a report by James Lighthill on the progress of AI research, commissioned
by the UK government and published in 1973. It was extremely negative and
caused AI funding to freeze for years, thus bringing on the first "AI winter"
and basically killing Good, Old-Fashioned, logic-based AI.

~~~
syrrim
Science isn't about understanding how something works. It is, as you say,
about generating and testing hypotheses. In the case of machine learning,
those hypotheses take the form of a program and an associated criterion. The
hypothesis is verified by demonstrating the efficacy of the program. That a
machine learning specialist is able to consistently produce effective programs
demonstrates that they have a good understanding of the relationship between
the program and its functioning. This is true even if they couldn't explain
how the program "works" to the satisfaction of an observer.

~~~
YeGoblynQueenne
>> In the case of machine learning, those hypotheses take the form of a
program and an associated criterion. The hypothesis is verified by
demonstrating the efficacy of the program.

This is the first time I hear anything like that. Can you say where this idea
comes from? I've never seen it mentioned in any machine learning paper or
textbook etc.

------
abhgh
Recommend the paper "A Pendulum Swung Too Far"[0] as a nice take on
Rationalism vs Empiricism. Interestingly, the paper was written in 2007 when
the current crop of deep learning methods were not around (the provocative
Hinton/Sutskever/Krizhevsky paper was in 2012) ... so the pendulum has swung
even farther! To the point that, as the post points out, we don't even have
good statistical justifications for network architecture design choices. We
have some answers, for some choices, but mostly a compendium of techniques
empirically validated to work very well.

[0] [http://languagelog.ldc.upenn.edu/myl/ldc/swung-too-
far.pdf](http://languagelog.ldc.upenn.edu/myl/ldc/swung-too-far.pdf)

------
mtgp1000
>What does that tell us? A few things, first the authors are completely
ignoring the danger of multiple hypothesis testing and generally piss on any
statistical foundations of their "research"

This is a really poor take, considering I make the same gut-feel choices
professionally and they generally work in production.

This is still a brand-new field and we fundamentally don't have answers as to
why many of these tweaks work better than others. That shouldn't stop someone
from publishing a novel architecture or improvement.
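For what it's worth, the multiple-hypothesis-testing danger the quoted post refers to is easy to simulate (a toy sketch; the accuracy, test-set size, and run count are made up): if you train many variants and report only the best test score, the reported number is biased upward even when no variant is actually better.

```python
import random

random.seed(0)

def eval_run(n_test=100, true_acc=0.70):
    """One training run: the model's true accuracy is fixed, but the
    measured test accuracy fluctuates with the random test outcomes."""
    return sum(random.random() < true_acc for _ in range(n_test)) / n_test

# Honest protocol: one run, one number.
single = eval_run()

# "Spray and pray": train 50 variants, report only the best one.
best = max(eval_run() for _ in range(50))

print(f"single run: {single:.2f}")
print(f"best of 50: {best:.2f}")  # typically several points above the true 0.70
```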

~~~
7532yahoogmail
I do not object to publishing a better result. As the paper said, serious
craft goes into that and there's nothing wrong with typing it up and having it
published. But it's not science; one day, somehow, some way, the black
boxes have got to open up to address why. This is a very valid question and an
on-point criticism. Even in basic software development the why (e.g.
requirements) should be known. It's not consequence-free to proceed otherwise.

~~~
mtgp1000
>But it's not science

That's just not true. We are probing a novel domain.

In fact how else would you expect this to proceed? We've discovered a new
phenomenon, which likely requires novel mathematics, yet through this exact
kind of experimentation we are building the intuition that will guide more
rigorous formalization later.

Sure, I get it, the quality on arxiv isn't the same as some physics journal;
but to dismiss this as unscientific is not only wrong but very much unfair.
I'm working at the cutting edge at work, and we're documenting our discoveries
as we map the structure of a new frontier - if that isn't science, I don't
know what is.

tldr this is how science progresses in new domains before novel mathematics
and formalisms are developed to address the new class of problems. This is a
really exciting time if neural nets don't hit any serious blocks.

Edit: and by the way, my coworkers are all graduate-educated scientists from
various backgrounds. What else are they doing if not science?

------
fabmilo
Wait a second. I am missing something. The definition of science is:

sci·ence /ˈsīəns/ noun the intellectual and practical activity encompassing
the systematic study of the structure and behavior of the physical and natural
world through observation and experiment.

I might be biased (as I build such GPU rigs), but I don't see how trying
different architecture configurations is not a scientific process.

I like the Sturgeon's Law reference from this thread.

~~~
xapata
I'd rephrase the critique to say that you're doing _descriptive_ science. It's
like a naturalist drawing pictures of birds. The most interesting work comes
later, when someone suggests a theory to explain it all.

~~~
sdenton4
There are lots of theories! It's just that there's not really sufficient
mathematics available to determine which ones are provably correct. (My
favorite example of this is the seemingly never-ending arguments over the REAL
reason that batch normalization works...)

And there are real advances happening because of advances on the theory side,
as well: I would say that ResNet is an excellent example of this, bringing
some insights from differential equations into model architecture, and greatly
advancing the quality of classification.
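The differential-equations reading mentioned above can be sketched in a few lines (a toy illustration with made-up weights, not the actual ResNet architecture): a residual block computes x + f(x), which has the exact form of a forward-Euler step of the ODE dx/dt = f(x).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))  # weights of one toy residual branch

def f(x):
    """The residual branch: a small learned update, not a full re-mapping."""
    return np.tanh(W @ x)

def residual_block(x):
    # ResNet: x_next = x + f(x) -- exactly a forward-Euler step of
    # dx/dt = f(x) with unit step size. The identity path comes for free.
    return x + f(x)

def plain_block(x):
    # A plain (non-residual) layer must re-learn even the identity map.
    return f(x)

x = rng.normal(size=8)
h = x
for _ in range(32):
    h = residual_block(h)
# Stacking residual blocks integrates the ODE forward in time: the signal
# is perturbed gradually instead of being rewritten at every layer.
print(h.shape)  # (8,)
```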

All that said, I do worry that 'fundamental science' advances might get
overlooked if they don't contribute to new high scores. For example, if you
can prove that a certain model will 'work' (for some value of 'work'), the
model may be hobbled by the need to prove things, and thus not competitive...
In which case, the proof techniques might be lost in the howling void of the
arxiv.

------
Veedrac
Was Edison's invention of the light bulb not science because Thomson hadn't
discovered the electron yet? Should Edison have politely waited for a quantum
description of charge before daring to build something with it?

The fact that we don't fully understand why neural networks are so effective
does not imply it's not science.

~~~
burntoutfire
> Was Edison's invention of the light bulb not science because Thomson hadn't
> discovered the electron yet?

Umm... yes it wasn't? Invention of a light bulb was not a scientific
discovery. No new physical laws were discovered in the process. You can invent
something useful by bruteforcing through many trials and errors or even by
just plain luck, without creating any new scientific understanding along the
way. (Btw Wikipedia does not even regard Edison as a scientist.)

------
cs702
For many and perhaps most state-of-the-art models, the answer to "why?" is
"because it works."

As a practitioner, I would say Deep Learning is _a trade_. The more you work
with these models, the more you develop intuitions and habits for things that
work and things that don't, like a sculptor who learns to craft beautiful or
functional objects by chiseling stone. ("Why did you hit the marble that way?"
"Because it works.")

If we want to be more generous, we can call Deep Learning _an experimental
science_ , because some researchers working with deep models are truly doing
methodical, tedious, experimental work and documenting it for posterity. They
and everyone else, including me, hope that we will eventually be able to
answer the "whys." But there's no guarantee, a priori, that we will find
satisfactory answers.

Quoting Rich Sutton: "the actual contents of minds are tremendously,
irredeemably complex; we should stop trying to find simple ways to think about
the contents of minds, such as simple ways to think about space, objects,
multiple agents, or symmetries. All these are part of the arbitrary,
intrinsically-complex, outside world. They are not what should be built in, as
their complexity is endless; instead we should build in only the meta-methods
that can find and capture this arbitrary complexity. Essential to these
methods is that they can find good approximations, but the search for them
should be by our methods, not by us. We want AI agents that can discover like
we can, not which contain what we have discovered. Building in our discoveries
only makes it harder to see how the discovering process can be done." (Source:
[http://incompleteideas.net/IncIdeas/BitterLesson.html](http://incompleteideas.net/IncIdeas/BitterLesson.html))

Quoting Geoff Hinton: "One place where I do have technical expertise that’s
relevant is [whether] regulators should insist that you can explain how your
AI system works. I think that would be a complete disaster. People can’t
explain how they work, for most of the things they do. When you hire somebody,
the decision is based on all sorts of things you can quantify, and then all
sorts of gut feelings. People have no idea how they do that. If you ask them
to explain their decision, you are forcing them to make up a story. Neural
nets have a similar problem. When you train a neural net, it will learn a
billion numbers that represent the knowledge it has extracted from the
training data. If you put in an image, out comes the right decision, say,
whether this was a pedestrian or not. But if you ask “Why did it think that?”
well if there were any simple rules for deciding whether an image contains a
pedestrian or not, it would have been a solved problem ages ago." (Source:
[https://www.wired.com/story/googles-ai-guru-computers-
think-...](https://www.wired.com/story/googles-ai-guru-computers-think-more-
like-brains))

~~~
YeGoblynQueenne
>> As a practitioner, I would say Deep Learning is a trade.

"Trades" don't have conferences and journals and researchers whose job it is
to publish in them. No, deep learning is a field of research and it should be
treated as such and evaluated and -if necessary- criticised accordingly.

~~~
cs702
Deep learning practitioners in industry practice a trade. These individuals
should not be treated, evaluated, or criticized as if they were scientists.
However, I do agree that people who claim to be doing scientific research
should be judged as scientists :-)

> "Trades" don't have conferences and journals and researchers whose job it is
> to publish in them

Actually, there is a remarkably large number of associations, conferences,
journals, and researchers for a remarkably large number of trades. I mean, you
can find researchers in, say, the trucking trade figuring out whether using
one type of wheel or another can reduce costs per mile driven by a cent or two
in a particular class of trailer truck. The number of trade associations in
the US alone is close to 100,000, covering nearly every aspect and level of
skill in our economy:
[https://www.google.com/search?q=how+many+trade+associations+...](https://www.google.com/search?q=how+many+trade+associations+are+there+in+the+us)

~~~
YeGoblynQueenne
>> Deep learning practitioners in industry practice a trade.

That's a different statement than the one in the previous comment that I
quoted, that "deep learning is a trade". I disagree with that statement
specifically. Deep learning is not a trade. It's a subject of research in the
field of AI, which remains a scientific field, despite the shoddy science that
is typical in it.

That there are people who apply (or try to) the results of deep learning
research in the industry is another matter. People in the industry apply the
results of computer science research. That doesn't make computer science
research "a trade" in the sense that you say it.

------
killjoywashere
Q: Why did you build that super-car, Mr. Ferrari?

A: Because I can.

Q: Why did you build those tractors, Mr. Lamborghini?

A: Because people are hungry and need food.

Q: Mr. Lamborghini, then why did you build a super-car?

A: Because fuck Mr. Ferrari.

------
fizixer
Sturgeon's law.

