
The Limitations of Deep Learning - olivercameron
https://blog.keras.io/the-limitations-of-deep-learning.html
======
therajiv
As someone primarily interested in interpretation of deep models, I strongly
resonate with this warning against anthropomorphization of neural networks.
Deep learning isn't special; deep models tend to be more accurate than other
methods, but fundamentally they aren't much closer to working like the human
brain than e.g. gradient boosting models.

I think a lot of the issue stems from layman explanations of neural networks.
Pretty much every time DL is covered by media, there has to be some contrived
comparison to human brains; these descriptions frequently extend to DL
tutorials as well. It's important for that idea to be dispelled when people
actually start applying deep models. The model's intuition doesn't work like a
human's, and that can often lead to unsatisfying conclusions (e.g. the panda
--> gibbon example that Francois presents).

Unrelatedly, if people were more cautious about anthropomorphization, we'd
probably have to deal a lot less with the irresponsible AI fearmongering that
seems to dominate public opinion of the field. (I'm not trying to undermine
the danger of AI models here, I just take issue with how most of the populace
views the field.)

~~~
nerdponx
Well said. It's just curve fitting.

~~~
pishpash
Maybe everything is "curve fitting." -- Note: I think it's more hierarchical
than that but curve fitting is certainly one of the important capabilities of
biological systems.

~~~
kxyvr
I don't think so. There's an incredibly important art and science to model
selection that is not encapsulated in curve fitting. For example, say we
observe a boy throwing a ball and we want to predict where the ball will land.
From basic physics, we know the model is `y = 0.5 a t^2 + v0 t + y0` where `a`
is the acceleration due to gravity, `v0` is the initial velocity, and `y0` is
the initial height. After observing one or two thrown balls, even with error,
we can estimate the parameters `a`, `v0`, and `y0` relatively well.
Alternatively, we could apply a generic machine learning model to this
problem. Eventually, it will work, but how much more data do we need? How many
additional parameters do we need? Do the machine learning parameters have
physical meaning like those in the original model? In this case, I contend the
original model is superior.
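
For illustration, a minimal sketch of that physics-model fit (NumPy; the
observations below are invented numbers):

    import numpy as np

    # a handful of noisy (time, height) observations from one thrown ball
    t = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
    y = np.array([1.52, 2.38, 2.93, 3.02, 2.74])

    # fit y = 0.5 a t^2 + v0 t + y0, i.e. a degree-2 polynomial in t
    c2, c1, c0 = np.polyfit(t, y, 2)
    a, v0, y0 = 2 * c2, c1, c0
    print(a, v0, y0)  # a should come out near -9.8 m/s^2

Three physically meaningful parameters recovered from five noisy points; a
generic model would need far more data to do the same.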

Now, certainly, there are cases where we don't have a good or known model and
machine learning is an extremely important tool for analyzing these cases.
However, the process of making this determination and choosing what model to
use is not solved by curve fitting or machine learning. This is a decision
made by a person. Perhaps some day that will change, and that will be a major
advance in intelligent systems, but we don't have that now and it's not clear
to me how extending existing methods will lead us there.

Basically, I agree with the sentiment of the grandparent post. Machine
learning is largely just curve fitting. How and when to apply a machine
learning model vs another model is currently a decision left up to the user.

~~~
pishpash
You're talking about the complexity of the model. If you take a purely input-
output view of the world (which by the way, even classical Physics does),
every problem _is_ curve fitting in a sufficiently high dimensional space.
There is no _conceptual_ problem here. There is perhaps a complexity problem,
but that's why I wrote that "I think it's more hierarchical than that."

~~~
eli_gottlieb
>If you take a purely input-output view of the world (which by the way, even
classical Physics does), every problem _is_ curve fitting in a sufficiently
high dimensional space.

Not all spaces are Euclidean, and "purely input-output" still contains a lot
of room for counterfactuals that ML models fail to capture.

~~~
nerdponx
What do you mean by counterfactuals? NNs are function approximation
algorithms, in any geometry. No ifs ands or buts about it.

~~~
eli_gottlieb
Oh, I agree that neural networks are function approximators with respect to
some geometry. When I say "counterfactuals", I'm talking about typical Bayes-
net style counterfactuals, but as also used in cognitive psychology. We _know_
that human minds evaluate counterfactual statements in order to test and infer
causal structure. We thus know that neural networks are insufficient for
"real" cognition.

------
toisanji
There is some good information in there and I agree with the limitations he
states, but his conclusion is completely made up.

"To lift some of these limitations and start competing with human brains, we
need to move away from straightforward input-to-output mappings, and on to
reasoning and abstraction."

There are tens of thousands of scientists and researchers studying the brain
at every level, and we are making only tiny dents in understanding it. We have
no idea what the key ingredient is, nor whether it is one or many ingredients
that will take us to the next level. Look at deep learning: we have had the
techniques for it since the '70s, yet it is only now that we can start to
exploit them. Some people think the next thing is the connectome, time,
forgetting neurons, oscillations, number counting, embodied cognition,
emotions, etc. No one really knows, and it is very hard to test; the only
"smart beings" we know of are ourselves, and we can't really do experiments on
humans for legal and ethical reasons. Computer scientists, like many of us
here, like to theorize about how AI could work, but very little of it is
tested out. I wish we had a faster way to test out more competing theories and
models.

~~~
eli_gottlieb
>I wish we had a faster way to test out more competing theories and models.

Luckily, the state of actual cognitive science and neuroscience is fairly far
ahead of, "Gosh there's all these things and we just don't know."
Unfortunately, MIT-style cogsci hasn't generated New Jerseyan fast-though-
wrong algorithms for Silicon Valley to hype up, so the popular press keeps
spreading the myth of our total ignorance.

Besides which, we do know what's missing from deep learning: the ability to
express anything other than a trivial Euclidean-space topological structure.
We know that real data is sampled from a world subject to cause-and-effect,
and that any manifold describing the data should carry the causal structure in
its own topology.

------
Houshalter
This article is a bit misleading. I believe NNs are a lot like the human
brain. But just the lowest level of our brain. What psychologists might call
"procedural knowledge".

Example: learning to ride a bike. You have no idea how you do it. You can't
explain it in words. It requires tons of trial and error. You can give a bike
to a physicist that has a perfect deep understanding of the laws of physics.
And they won't be any better at riding than a kid.

And after you learn to ride, change the bike. Take one where the handlebars
are reversed, so turning right turns the wheel left. No matter how good you
are at riding a normal bike, no matter how easy it seems it should be, it's
very hard. It requires relearning how to ride basically from scratch. And when
you are done, you will even have trouble going back to a normal bike. This
sounds a lot like the problems of deep reinforcement learning, right?

If you use only the parts of the brain you use to ride a bike, would you be
able to do any of the tasks described in the article? E.g. learn to guide
spacecraft trajectories with little training, through purely analog controls
and muscle memory? Can you even sort a list in your head without the use of
pencil and paper?

Similarly, recognizing a toothbrush as a baseball bat isn't as bizarre as you
think. Most NNs get one pass over an image. Imagine you were flashed that
image for just a millisecond and given no time to process it, no time to even
scan it with your eyes. Are you certain you wouldn't make any mistakes?

But we can augment NNs with attention, with feedback to lower layers from
higher layers, and other tricks that might make them more like human vision.
It's just very expensive.

And that's another limitation. Our largest networks are incredibly tiny
compared to the human brain. It's amazing they can do anything at all. It's
unrealistic to expect them to be flawless.

~~~
BooglyWoo
It's a good article in a lot of ways, and provides some warnings that many
neural net evangelists should take to heart, but I agree it has some problems.

It's a bit unclear whether Fchollet is asserting that (A) Deep Learning has
fundamental theoretical limitations on what it can achieve, or rather (B) that
we have yet to discover ways of extracting human-like performance from it.

Certainly I agree with (B) that the current generation of models are little
more than 'pattern matching', and the SOTA CNNs are, at best, something like
small pieces of visual cortex or insect brains. But rather than deriding this
limitation I'm more impressed at the range of tasks "mere" pattern matching is
able to do so well - that's my takeaway.

But I also disagree with the distinction he makes between "local" and
"extreme" generalization, or at least would contend that it's not a hard, or
particularly meaningful, epistemic distinction. It is totally unsurprising
that high-level planning and abstract reasoning capabilities are lacking in
neural nets because the tasks we set them are so narrowly focused in scope. A
neural net doesn't have a childhood, a desire/need to sustain itself, it
doesn't grapple with its identity and mortality, set life goals for itself,
forge relationships with others, or ponder the cosmos. And these types of
quintessentially human activities are what I believe our capacities for high-
level planning, reasoning with formal logic etc. arose to service. For this
reason it's not obvious to me that a deep-learning-like system (with
sufficient conception of causality, scarcity of resources, sanctity of life
and so forth) would ALWAYS have to expend 1000s of fruitless trials crashing
the rocket into the moon. It's conceivable that a system could know to develop
an internal model of celestial mechanics and use it as a kind of staging area
to plan trajectories.

I think there's a danger of questionable philosophy of mind assertions
creeping into the discussion here (I've already read several poor or
irrelevant expositions of Searle's Chinese Room in the comments). The high-
level planning, and "true understanding" stuff sounds very much like what was
debated for the last 25 years in philosophy of mind circles, under the rubric
of "systematicity" in connectionist computational theories of mind. While I
don't want to attempt a single-sentence exposition of this complicated debate,
I will say that the requirement for "real understanding" (read systematicity)
in AI systems, beyond mechanistic manipulation of tokens, is one that has been
often criticised as ill-posed and potentially lacking even in human thought;
leading to many movements of the goalposts vis-à-vis what "real understanding"
actually is.

It's not clear to me that "real understanding" is not, or at least cannot be
legitimately conceptualized as, some kind of geometric transformation from
inputs to outputs - not least because vector spaces and their morphisms are
pretty general mathematical objects.

EDIT: a word

~~~
glenstein
I similarly find myself frustrated with philosophy of mind "contributions" to
conversations on deep learning/consciousness/AI. There seems to be a lot of
equivocation between the things you label as (a) and (b) above, and a lot of
apathy toward distinguishing between them. But (a) and (b) are completely
different things, and too often it seems like critics of computers doing smart
things treat arguments for one like they are arguments for the other.

Probably the most famous AI critic, Hubert Dreyfus, said "current claims and
hopes for progress in models for making computers intelligent are like the
belief that someone climbing a tree is making progress toward reaching the
moon." But it _is_ progress. Because by climbing a tree I've gained much more
than height. I actually did move toward the moon. I've gained the insight that
I'm using the right principle.

------
CountSessine
Surely we shouldn't rush to anthropomorphize neural networks, but we'd be
ignoring the obvious if we didn't at least note that neural networks do seem
to share some structural similarities with our own brains, at least at a very
low level, and that they seem to do well with a lot of pattern-recognition
problems that we've traditionally considered to be co-incident with brains
rather than logical systems.

The article notes, " _Machine learning models have no access to such
experiences and thus cannot "understand" their inputs in any human-relatable
way_". But this ignores a lot of the subtlety in psychological models of human
consciousness. In particular, I'm thinking of Dual Process Theory as typified
by Kahneman's "System 1" and "System 2". System 1 is described as a tireless
but largely unconscious and heavily biased pattern recognizer - subject to
strange fallacies and working on heuristics and cribs, it reacts to its
environment when it believes that it recognizes stimuli, and notifies the more
conscious "System 2" when it doesn't.

At the very least it seems like neural networks have a lot in common with
Kahneman's "System 1".

~~~
eli_gottlieb
>In particular, I'm thinking of Dual Process Theory

Which has been at least partly debunked as psychology's replication crisis
went on, and has been called into question on the neuroscientific angle as
well.

~~~
CountSessine
Partly, yes - especially with ego depletion on the ropes. I'm not sure that
dual process theory needs to be thrown out along with ego depletion, though.

~~~
eli_gottlieb
I can see three reasons to "throw it out":

1) Replication failure, plain and simple.

2) Overfitting. There are dozens to hundreds of "cognitive biases" on _lists_
:
[https://en.wikipedia.org/wiki/List_of_cognitive_biases](https://en.wikipedia.org/wiki/List_of_cognitive_biases).
When you have hundreds of individual points, you really ought to draw some
principles, and the principle should not be, "The system generating all this
is rigid and inflexible."

3) Imprecision! Again, _dozens to hundreds_ of cognitive biases. What possible
behavior or cognitive performance _can't_ be assimilated into the heuristics
and biases theory? What can falsify it _overall_, even after so many of its
individual supporting experiments and predictions have fallen down?

It looks like a mere taxonomy of observations, not a substantive _theory_.

~~~
CountSessine
_1) Replication failure, plain and simple._

How many meta-analyses have been conducted as of 2017 showing one result or
the other? I don't think ego depletion itself has been thoroughly "debunked"
yet. If it is a real effect, it's probably quite small - but I don't think
that ego depletion has been thrown in the bin just yet.

 _2) Overfitting. There are dozens to hundreds of "cognitive biases" on lists:
[https://en.wikipedia.org/wiki/List_of_cognitive_biases](https://en.wikipedia.org/wiki/List_of_cognitive_biases).
When you have hundreds of individual points, you really ought to draw some
principles, and the principle should not be, "The system generating all this
is rigid and inflexible."_

 _3) Imprecision! Again, dozens to hundreds of cognitive biases. What possible
behavior or cognitive performance can't be assimilated into the heuristics
and biases theory? What can falsify it overall, even after so many of its
individual supporting experiments and predictions have fallen down?_

Wait a second - has anyone ever tried to explain the "IKEA Effect" using Dual
Process Theory? What does a laundry-list of supposed cognitive biases have to
do with the theory? Is anyone really trying to explain/predict all this
almanac-of-cognitive-failings with Dual Process?

~~~
eli_gottlieb
>Is anyone really trying to explain/predict all this almanac-of-cognitive-
failings with Dual Process?

To my understanding, yes. That's basically what Dual Process theories _exist_
for: to separate the brain into heuristic/bias processing as one process, and
computationally expensive model-based cause-and-effect reasoning as another
process. Various known cognitive processes or results are then sort of
classified on one side of the line or another.

When you apply Dual Process paradigms to specific corners of cognition, they
can be useful. For example, I've seen papers purporting to show that measured
uncertainty allows model-free and model-based reinforcement learning
algorithms to trade off decision-making "authority". This is _less elegant_
than an explicitly precision-measuring free-energy counterpart, but it's still
a viable hypothesis about how the brain can implement a form of bounded
rationality when bounded in both sample data and compute power.

But when you scale Dual Processes up to a whole-brain theory, it's just too
good at describing _anything_ that involves dichotomizing into a "fast-and-
frugal" form of processing and another expensive, reconstructive form of
processing. One of the big issues here is that _besides_ the potentially false
original evidence for Dual Processes, we don't necessarily have reason to
believe there exists any _dichotomy_ , rather than a more continuous tradeoff
between frugal heuristic processing and difficult reconstructive processing.
The precision-weighting model-selection theory actually makes much more sense
here.

~~~
CountSessine
This is a fantastic answer - thank you, Eli. So what do you think of the
original article?

~~~
eli_gottlieb
>This is a fantastic answer - thank you, Eli.

Thanks! I've been doing a lot of amateur reading in cog-sci and theoretical
neurosci. The subject enthuses me enough that I'm applying to PhD programs in
it this upcoming season.

>So what do you think of the original article?

Thorough and accurate. I'll give a little expansion via my own thought. One
thing taught in every theoretically-focused ML class is the No Free Lunch
Theorem. In colloquial terms it says, "If you don't make _some_ simplifying
assumptions about the function you're trying to learn (and the distribution
noising your data), you can't reliably learn."

I think experts learn this, appreciate it as a point of theory, and then often
forget to really bring it back up and rethink it where it's applicable. _All_
statistical learning takes place subject to assumptions of "niceness". _Which_
assumptions, though?

Seems to me like:

* If you make certain "niceness" assumptions about the functions in your hypothesis space, but few to none about the distribution, you're a Machine Learner.

* If you make niceness assumptions about your distribution, but don't quite _care_ about the generating function itself, you're an Applied Statistician.

* If you make niceness assumptions about your _data_ , that it was generated from some family of distributions on which you can make inferences, you're a fully frequentist or Bayesian statistician.

* If you want to make almost no assumptions about the generating process yielding the data, but still want just enough assumptions to make reasoning possible, you may be working in the vicinity of any of cognitive science, neuroscience, or artificial intelligence.

The key thing you always have to remind yourself is: you _are_ making
assumptions. The question is: which ones? The original article reminds us of a
whole lot of the assumptions behind current deep learning:

* The "layers" we care about are compositions of a continuous nonlinear function with a linear transform.

* The functions we care about are compositions of "layers".

* The transforms we care about are probably convolutions or just linear-and-rectified, or just linear-and-sigmoid.

* Composing layers enables gradient information to "fan out" from the loss function to wider and wider places in the early layers.

* The data spaces we care about are usually Euclidean.
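
As a minimal sketch of the first two bullets (a toy forward pass in NumPy;
the shapes are chosen arbitrarily):

    import numpy as np

    def dense_relu(x, W, b):
        # one "layer": a linear transform composed with a continuous nonlinearity
        return np.maximum(0.0, x @ W + b)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 4))                    # an input in R^4
    W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
    W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

    # the functions we care about are compositions of such layers
    y = dense_relu(dense_relu(x, W1, b1), W2, b2)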

These are things every expert knows, but which most people only _question_
when it's time to look at the limitations of current methods. The author of
the original article appears well-versed in everything, and I'm really excited
to see what they've got for the next part.

------
siliconc0w
A neat technique to help 'explain' models is LIME:
[https://www.oreilly.com/learning/introduction-to-local-
inter...](https://www.oreilly.com/learning/introduction-to-local-
interpretable-model-agnostic-explanations-lime)

There is a video here
[https://www.youtube.com/watch?v=hUnRCxnydCc](https://www.youtube.com/watch?v=hUnRCxnydCc)

I think this has some better examples than the panda vs. gibbon example in the
OP if you want to 'see' why a model may classify a tree frog as a tree frog
rather than a billiard (for example). IMO this suggests some level of
anthropomorphizing is useful for understanding and building models, as the
pixels the model picks up on aren't really too dissimilar to what I imagine a
naive, simple mind might use (e.g. the tree frog's goofy face). We like to
look at faces for lots of reasons, but one of them is probably that they're
usually more distinct, which is roughly the same reason the model likes the
face. This is interesting (to me at least) even if it's just matrix
multiplication (or uncrumpling high-dimensional manifolds) under the hood.
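
For the curious, a rough sketch of how LIME is typically called (this assumes
the `lime` Python package; `image` and `model` are stand-ins for your own
input array and Keras-style classifier):

    import numpy as np
    from lime import lime_image

    explainer = lime_image.LimeImageExplainer()

    # `image` is an HxWx3 array; classifier_fn must return class probabilities
    explanation = explainer.explain_instance(
        image,
        classifier_fn=lambda batch: model.predict(np.array(batch)),
        top_labels=5,
        hide_color=0,
        num_samples=1000,
    )

    # superpixels that contributed most to the top predicted label
    img, mask = explanation.get_image_and_mask(
        explanation.top_labels[0], positive_only=True, num_features=5
    )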

------
cm2187
I think the requirement for a large amount of data is the biggest objection to
the reflex "AI will replace [insert your profession here] soon" that many
techies, in particular on HN, have.

There are many professions where there is very little data available to learn
from. In some cases (self-driving), companies will invest large amounts of
money to build this data, by running lots of test self-driving cars or paying
people to create the data, and it is viable given the size of the market
behind it. But the typical high-value intellectual profession is often a niche
market with a handful of specialists in the world. Think of a trader of
financial institution bonds, a lawyer specialized in cross-border mining
acquisitions, a physician specializing in a rare disease, or a salesperson for
aviation parts. What data are you going to train your algorithm with?

The second objection, probably equally important, also applies to "software
will replace [insert your boring repetitive mindless profession here]", even
after 30 years of broad adoption of computers. If you decide to automate some
repetitive mundane tasks, you can save the salary of the people who did these
tasks, but now you need to pay the salary of a full team of AI specialists /
software developers. For many tasks (CAD, accounting, mailings, etc.), the
market is big enough to justify a software company making this investment. But
there is a huge number of professions where you are never going to break even,
and where humans are still paid to do stupid tasks that software could easily
do today (even in VBA), and will keep doing so until the cost of developing
and maintaining software or AI has dropped to zero.

I don't see that happening in my lifetime. In fact, I am not even sure we are
training that many more computer science specialists than 10 years ago. Again,
it didn't happen with software for very basic things, so why would it happen
with AI for more complicated things?

~~~
kmicklas
> and will keep doing so until the cost of developing and maintaining software
> or AI has dropped to zero.

I have no idea about the progress of AI, but normal software will get an order
of magnitude cheaper to develop as we slowly wake up from the Unix/worse-is-
better/everything-is-text mindset and abandon the dynamically typed and
imperative languages, broken systems abstractions, etc. that hold us back.

~~~
sadlion
I sincerely would like to know: what do you think the alternatives are?

~~~
goatlover
Sounds something like Haskell with a Smalltalk environment. Functional,
statically typed with powerful type extensions, but with an image instead of
text files that you modify.

From just using Jupyter Notebooks, I can see the appeal of working with a live
environment, and that's only a fancy REPL, not a full Lisp or Smalltalk
environment.

~~~
cm2187
If it has to be a general-public language, I'm afraid it will have to be light
on special characters and on abbreviations or acronyms that made sense 30
years ago. I'd say a Basic- or Python-like language, but modernised, and with
strong typing to enable the IDE to help users a lot with auto-completion and
error checking.

But if you think about it, most business users are intimidated even by VBA. So
it will have to be very fluffy, and I don't think you can skip the mandatory
coding 101 teaching at school.

------
meh2frdf
Correct me if I'm wrong, but I don't see that with 'deep learning' we have
answered or solved any of the philosophical problems of AI that existed 25
years ago (I stopped paying attention around then).

Yes we have engineered better NN implementations and have more compute power,
and thus can solve a broader set of engineering problems with this tool, but
is that it?

~~~
nv-vn
Yep. The whole machine learning craze is just fueled by the fact that it's now
feasible to create models for handwriting/voice/image recognition that
actually work reliably. But in terms of the underlying technology, we haven't
had some "breakthrough" that explained how the brain works or anything even
close to that.

~~~
landon32
This is totally true, but I think it's still important to note that while
something like Artificial General Intelligence is still way beyond the state
of the art, the state of the art still has a huge impact on the world. A tiny
slice of that can be seen in autonomous vehicles and the impact that they seem
poised to have.

~~~
brittohalloran
Don't underestimate the self fulfilling prophecy effect. Quite possible that
the massive influx into the field right now will move the needle.

~~~
landon32
Hmm, sometimes I think that we won't get super close to AGI until we can
actually model something the size of a human brain (in terms of neurons or
synapses). The human brain has on the order of 100 billion neurons and
hundreds of trillions of synapses. At one bit per neuron, that's about 12.5 GB
to hold all at once, if you're representing each neuron as either 0 or 1.
However, in reality neurons are much more complicated, and you could only
treat them as binary if you had a neuromorphic computer. So we would need to
deal with many, many times that much data at once, even if we had really
efficient ways of storing it.
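
Rough arithmetic behind those figures (the counts are order-of-magnitude
assumptions, not exact numbers):

    neurons = 1e11    # ~100 billion neurons
    synapses = 1e14   # ~100 trillion synapses, give or take an order of magnitude

    binary_state_gb = neurons / 8 / 1e9   # one bit per neuron -> ~12.5 GB
    weights_tb = synapses * 4 / 1e12      # one float32 per synapse -> ~400 TB
    print(binary_state_gb, weights_tb)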

That's a lot of data to deal with, especially since you need to train it,
running huge computations using each neuron.

I know nothing about hardware, and this is a very crude prediction/estimation
of how AGI might happen, but my point is that we might be limited by hardware
for a few more years.

------
kowdermeister
> In short, deep learning models do not have any understanding of their input,
> at least not in any human sense. Our own understanding of images, sounds,
> and language, is grounded in our sensorimotor experience as humans—as
> embodied earthly creatures.

Well, maybe we should train systems with all our sensory inputs first, the way
newborns learn about the world. Then make these models available open source,
like we release operating systems, so others can build on top of them.

For example, we have ImageNet, but we don't have WalkNet, TasteNet, TouchNet,
SmellNet, HearNet... or other extremely detailed sensory data recorded for an
extended time. And these should be connected to match the experiences. At
least, I'm not aware of any such datasets being out there :)

~~~
jonobelotti
Brooks' 'Intelligence Without Representation'
([http://people.csail.mit.edu/brooks/papers/representation.pdf](http://people.csail.mit.edu/brooks/papers/representation.pdf))
starts with a pretty strong argument imo against the story of 'stick-together'
AGI you're describing.

~~~
webmaven
Thanks for the link to this interesting paper.

I think we're seeing some recapitulation of those arguments WRT 'ensembles of
DL models' approaches.

~~~
thundergolfer
I agree. Google has come out with some papers that are, to put it harshly,
basic gluing together of DL models followed by loads of training on their
compute resources.

~~~
webmaven
Not just Google. The FractalNet paper comes to mind.

------
debbiedowner
People doing empirical experiments cannot claim to know the limits of their
experimental apparatus.

While the design process of deep networks remains founded in trial and error,
and there are no convergence theorems and approximation guarantees, no one can
be sure what deep learning can do, and what it could never do.

------
andreyk
"Here's what you should remember: the only real success of deep learning so
far has been the ability to map space X to space Y using a continuous
geometric transform, given large amounts of human-annotated data."

This statement has a few problems. There is no real reason to interpret the
transforms as geometric (they are fundamentally just processing a bunch of
numbers into other numbers; in what sense is this geometric?), and the focus on
human-annotated data is not quite right (deep RL and other things such as
representation learning have also achieved impressive results in deep
learning). More importantly, saying "a deep learning model is 'just' a chain
of simple, continuous geometric transformations" is pretty misleading; things
like the Neural Turing Machine have shown that enough composed simple
functions can do surprisingly complex stuff. It's good to point out that most
of deep learning is just fancy input->output mappings, but I feel like this
post somewhat overstates the limitations.

~~~
divideby0829
Yeah, this was my main problem. I guess he is technically right because they
are geometric, but many of his analogies, like the paper crumpling, were
deeply misleading, as they would imply that the transformations are linear.
The fact that they are not is fundamental to neural networks working.

~~~
tsbertalan
Paper-crumpling is nonlinear. Maybe your complaint is rather that paper-
crumpling is "only" a topology-preserving diffeomorphism?

------
eanzenberg
This point is very well made: 'local generalization vs. extreme
generalization.' Advanced NNs today can locally generalize quite well, and
there's a lot of research spent on inching their generalization further out.
This will probably be done by increasing NN size or increasing the complexity
of the NN building blocks.

~~~
xamuel
Or maybe increasing NN size/complexity is the 21st century version of adding
epicycles to make geocentrism work.

[http://wiki.c2.com/?AddingEpicycles](http://wiki.c2.com/?AddingEpicycles)

~~~
rhaps0dy
Heh, but it makes geocentrism work better! And we don't yet know what 21st
century heliocentrism will look like, while adding epicycles is less daunting.

~~~
mojomark
Yay, I found the rabbit hole - technically, no celestial body rotates purely
around another (thanks, mass!). So perhaps adding epicycles wasn't erroneous
after all - just a measurement from a different reference point.

------
pc2g4d
Programmers contemplating the automation of programming:

"To lift some of these limitations and start competing with human brains, we
need to move away from straightforward input-to-output mappings, and on to
reasoning and abstraction. A likely appropriate substrate for abstract
modeling of various situations and concepts is that of computer programs. We
have said before (Note: in Deep Learning with Python) that machine learning
models could be defined as "learnable programs"; currently we can only learn
programs that belong to a very narrow and specific subset of all possible
programs. But what if we could learn any program, in a modular and reusable
way? Let's see in the next post what the road ahead may look like."

~~~
visarga
The author said in a Twitter conversation today that he is aware that this
phrase is ignoring something essential - namely, that we have systems with
memory and attention. That is something different from simple X to y mappings.
With memory you can do general computation, recursion, graphs, anything. They
work well on some problems such as translation, but still need to become much
better in order to match general-purpose programming. But at least we're past
the X->y phase.

~~~
divideby0829
Considering they're the author of a Python-based machine learning library, I
would sure hope so. Still, it seems like a pretty grievous oversight to write
the dang thing without them at all, considering how popular memory-ful
networks are becoming, at least in my fields of research.

~~~
visarga
It was reserved for part 2.

------
gallerdude
I'm sorry, but I don't understand why wider & deeper networks won't do the
job. If it took "sufficiently large" networks and "sufficiently many"
examples, I don't understand why it wouldn't just take another order of
magnitude of "sufficiency."

If you look at the example with the blue dots on the bottom, would it not just
take many more blue dots to fill in what the neural network doesn't know? I
understand that adding more blue dots isn't easy - we'll need a huge amount of
training data, and huge amounts of compute to follow; but if increasing the
scale is what got these to work in the first place, I don't see why we
shouldn't try to scale it up even more.

~~~
nshm
"sufficiently large" could be much more than number of atoms in the universe.
You just do not have resources to run computation at such scale.

~~~
randcraw
This is my problem with the thesis that simply scaling deep nets to new
heights will ultimately subsume all brain function. If it takes weeks to train
a simple object recognizer deep net, how long would it take a grand unified
deep net to learn to tie its shoelaces? Puberty?

------
eli_gottlieb
>But what if we could learn any program, in a modular and reusable way? Let's
see in the next post what the road ahead may look like.

I'm really looking forward to this. If it comes out looking like something
faster and more usable than Bayesian program induction, RNNs, neural Turing
Machines, or Solomonoff Induction, we'll have something really revolutionary
on our hands!

------
fnl
Put a lot simpler: Even DL is still only very complex, statistical pattern
matching.

While pattern matching can be applied to model the process of cognition, DL
cannot really model abstractive intelligence on its own (unless we phrase it
as a pattern learning problem, viz. transfer learning, on a _very_ specific
abstraction task), and much less can it model consciousness.

------
cs702
Yes.

Here's how I've been explaining this to non-technical people lately:

"We do not have intelligent machines that can reason. They don't exist yet.
What we have today is machines that can learn to recognize patterns at higher
levels of abstraction. For example, for image recognition, we have machines
that can learn to recognize patterns at the level of pixels as well as at the
level of textures, shapes, and objects."

If anyone has a better way of explaining deep learning to non-technical people
in a few short sentences, I'd love to see it. Post it here!

------
randcraw
I really enjoyed this article. It's the first attempt I've seen to assess deep
learning against the integrated end goal of human-level cognition or AGI.
I found one point especially noteworthy: " So even though a deep learning
model can be interpreted as a kind of program, inversely most programs cannot
be expressed as deep learning models—for most tasks, either there exists no
corresponding practically-sized deep neural network that solves the task, or
even if there exists one, it may not be learnable, i.e. the corresponding
geometric transform may be far too complex, or there may not be appropriate
data available to learn it.

Scaling up current deep learning techniques by stacking more layers and using
more training data can only superficially palliate some of these issues. It
will not solve the more fundamental problem that deep learning models are very
limited in what they can represent, and that most of the programs that one may
wish to learn cannot be expressed as a continuous geometric morphing of a data
manifold. "

What he seems to be suggesting is that a human level cognition built from deep
nets will not be a single unified end-to-end "mind" but a conglomeration of
many nets, each with different roles, i.e., a confederation or "society" of
deep nets.

I suspect Minsky would have agreed, and then suggested that the interesting
part is how one defines, instantiates, and then interconnects the components
of this society.

------
lordnacho
I'm excited to hear about how we bring about abstraction.

I was wondering how a NN would go about discovering F = ma and the laws of
motion. As far as I can tell, it has a lot of similarities to how humans would
do it. You'd roll balls down slopes like in high school and get a lot of data.
And from that you'd find there's a straight line model in there if you do some
simple transformations.
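
Something like this, perhaps (a toy sketch with invented numbers, where the
"simple transformation" is just regressing distance on t squared):

    import numpy as np

    # distance rolled down a slope at each time, from a pretend experiment
    t = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
    d = np.array([0.13, 0.48, 1.12, 1.97, 3.15])   # metres, with some noise

    # after transforming t -> t^2, a straight line through the origin fits:
    # d = 0.5 * a * t^2, so the slope is half the acceleration
    slope, *_ = np.linalg.lstsq(t[:, None] ** 2, d, rcond=None)
    print(2 * slope[0])   # recovered acceleration along the slope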

But how would you come to hypothesise about what factors matter, and what
factors don't? And what about new models of behaviour that weren't in your
original set? How would the experimental setup come about in the first place?
It doesn't seem likely that people reason simply by jumbling up some models
(it's a line / it's inverse distance squared / only mass matters / it matters
what color it is / etc), but that may just be education getting in my way.

A machine could of course test these hypotheses, but they'd have to be
generated from somewhere, and I suspect there's at least a hint of something
aesthetic about it. For instance you have some friction in your ball/slope
experiment. The machine finds the model that contains the friction, so it's
right in some sense. But the lesson we were trying to learn was a much simpler
behaviour, where deviation was something that could be ignored until further
study focussed on it.

~~~
thearn4
I've had similar thoughts when it comes to recognizing the underlying
(potential) simplicity of a phenomenon of interest.

For example, consider a toy experiment where you take dozens of high-speed
sensors pointed at a rig in order to study basic spring dynamics (i.e. Hooke's
law).

You could apply "big data analytics" or ML methods to break apart the dynamics
to predict future positions based on past positions.

But hopefully, somewhere along the way, you have some means of recognizing
that it is a simple 1D phenomenon and that most of the volume of data that you
collected is fairly pointless for that goal.

~~~
jacquesm
Almost all deep learning progress is optimization on a scale going from
'incredibly inefficient use of space and time' to 'quite wasteful' to
'optimal'. You're jumping the gap from 'quite wasteful' to 'optimal' in one
step because you _understand the problem_. If you could find a way to do that
algorithmically you likely would have created an actual AI.

------
ilaksh
Actually there are quite a few researchers working on applying newer NN
research to systems that incorporate sensorimotor input, experience, etc., and
more generally, some of them are combining an AGI approach with those new NN
techniques. And there has been research coming out with different types of NNs
and ways to address problems like overfitting or slow learning/requiring huge
datasets, etc. When he says something about abstraction and reasoning, yes,
that is important, but it seems like something NN-ish may be a necessary part
of it, because the logical/symbolic approaches to things like reasoning have
previously proven mainly inadequate for real-world complexity and for the
expectations we generally have for these systems.

Search for things like "Towards Deep Developmental Learning" or "Overcoming
catastrophic forgetting in neural networks" or "Feynman Universal Dynamical"
or "Wang Emotional NARS". No one seems to have put together everything or
totally solved all of the problems but there are lots of exciting developments
in the direction of animal/human-like intelligence, with advanced NNs seeming
to be an important part (although not necessarily in their most common form,
or the only possible approach).

------
pron
> Doing this well is a game-changer for essentially every industry, but it is
> still a very long way from human-level AI.

We're still a long way from even _insect_ level "intelligence" (if it could
even be called that), hence the harm in calling it AI in the first place. The
fact that machine learning performs some _particular_ tasks better than humans
means little. That has been true of computers since their inception. How much
closer we are to human-level AI than to the starting point of machine learning
and neural networks over 70 years ago is very much an open question. That
after 70 years of research into neural networks in particular and into machine
learning in general we are still far from insect-level intelligence makes
anyone suggesting a timeline for human-level AI sound foolish (although,
hypothetically, the leap from insect-level to human-level intelligence could
be technically simple, but we really have no idea).

------
plaidfuji
As a chemical engineer who started learning deep learning after learning
regular old regression-based empirical modeling, my interpretation of deep
learning is that it's just high-dimensional non-linear interpolation.

If what you're trying to predict can't be represented as some combination of
your existing data, it breaks immediately. Data drives everything; all models
are wrong, but some are useful. (George Box)

~~~
plaidfuji
Incidentally, humans aren't very good at extrapolation, either, but our
ability to generate good hypotheses differentiates us strongly from these
models.

------
danielam
"This ability [...] to perform abstraction and reasoning, is arguably the
defining characteristic of human cognition."

He's on the right track. Of course, the general thrust goes beyond deep
learning. The projection of intelligence onto computers is first and foremost
wrong because computers are not able, not even in principle, to engage in
abstraction, and claims to the contrary make for notoriously bad,
reductionistic philosophy. Ultimately, such claims underestimate what it takes
to understand and apprehend reality and overestimate what a desiccated,
reductionistic account of mind and the broader world could actually
accommodate vis-a-vis the apprehension and intelligibility of the world.

Take your apprehension of the concept "horse". The concept is not a concrete
thing in the world. We have concrete instances of things in the world that
"embody" the concept, but "horse" is not itself concrete. It is abstract and
irreducible. Furthermore, because it is a concept, it has meaning. Computers
are devoid of semantics. They are, as Searle has said ad nauseam, purely
syntactic machines. Indeed, I'd take that further and say that actual,
physical computers (as opposed to abstract, formal constructions like Turing
machines) aren't even syntactic machines. They do not even truly compute. They
simulate computation.

That being said, computers are a magnificent invention. The ability to
simulate computation over formalisms -- which themselves are products of human
beings who first formed abstract concepts on which those formalisms are based
-- is fantastic. But it is pure science fiction to project intelligence onto
them. If deep learning and AI broadly prove anything, it is that in the narrow
applications where AI performs spectacularly, it is possible to substitute
what amounts to a mechanical process for human intelligence.

~~~
ThomPete
The Chinese Room argument is one of the least convincing arguments against AI.
Of course the man in the room isn't conscious; neither are the individual
neurons in your brain. It's the whole house that becomes conscious.

The reality is that we just don't know.

------
deepGem
This is because a deep learning model is "just" a chain of simple, continuous
geometric transformations mapping one vector space into another.

Per my understanding, each vector space represents the full state of that
layer, which is probably why the transformations work for such vector spaces.

A sorting algorithm unfortunately cannot be modeled as a set of vector spaces
each representing the full state. For instance, an intermediary state of a
quick sort algorithm does not represent the full state. Even if a human was to
look at that intermediary step in isolation, they will have no clue as to what
that state represents. On the contrary, if you observe the visualized
activations of an intermediate layer in VGG , you can understand that the
layer represents some elements of an image.

------
latently
The brain is a dynamic system and (some) neural networks are also dynamic
systems, and a three-layer neural network can learn to approximate any
continuous function. Thus, a neural network can approximate brain function arbitrarily
well given time and space. Whether that simulation is conscious is another
story.

The Computational Cognitive Neuroscience Lab has been studying this topic for
decades and has an online textbook here:

[http://grey.colorado.edu/CompCogNeuro](http://grey.colorado.edu/CompCogNeuro)

The "emergent" deep learning simulator is focused on using these kinds of
models to model the brain:

[http://grey.colorado.edu/emergent](http://grey.colorado.edu/emergent)

~~~
geofft
That's about as interesting as saying that a Taylor series can approximate any
analytic function arbitrarily well given time and space. Or that a lookup
table can approximate any function arbitrarily well given time and space: see
also the Chinese room example.

The first question is whether that neural network is _learnable_. Sure, some
configuration of neurons may exist. Is it possible given enough time and space
to discover what that configuration is, given a set of inputs and outputs?

The second question is whether "enough time and space" means "beyond the
lifetime and resources of anyone alive," in which case it seems perfectly
reasonable to me to call it a limitation. I generally want my software to work
within my lifetime.

~~~
latently
I like your comment. The real question is whether they are conscious.

The analogy between deep neural networks and the brain has proven to be very
fruitful. Other analogies may as well. See our upcoming paper for more info.

[https://grey.colorado.edu/mediawiki/sites/mingus/images/3/3a...](https://grey.colorado.edu/mediawiki/sites/mingus/images/3/3a/ccn_style.pdf)

~~~
nojvek
I think a lot of people end up conflating being alive with being conscious. Is
a tree conscious? Is a self-driving car conscious?

If we use the definition "aware of its surroundings, responding and acting
towards a certain goal", then a lot of things fit that definition.

When an AI plays Atari games, learns from them, and plays at a human level, I
would call it conscious. It's not a human-level conscious agent, but conscious
nonetheless.

~~~
latently
Consciousness has a specific meaning -
[https://en.wikipedia.org/wiki/Qualia](https://en.wikipedia.org/wiki/Qualia)

------
reader5000
Recurrent models do not simply map from one vector space to another and could
very much be interpreted as reasoning about their environment. Of course they
are significantly more difficult to train and backprop through time seems a
bit of a hack.

~~~
pishpash
Sure they do. The spaces are just augmented with timestep related dimensions.

~~~
unlikelymordant
No they aren't? RNNs have state that gets modified as time goes on. The RNN
has to learn what is important to save as state, and how to modify it in
response to different inputs. There is no explicit time-stamping.

~~~
webmaven
There is an implicit ordering of timesteps ("before" and "after") though,
right? If you have that, you can dispense with an explicit time dimension.

~~~
divideby0829
Not necessarily. Depending on the usage, RNN-based models are sometimes
trained in both directions, i.e. for every sample of, say, video, you show it
to the network in its natural time direction and then also reversed. Some say
this is motivated by a desire to eliminate dependence on the specific order of
sequences and instead to train an integrator.

~~~
webmaven
So, time's arrow can be reversed, and the model can thus extrapolate both
forward and backward. Cool!

However, that doesn't actually eliminate the axis/dimension. Eliminating
timestamps only makes the dimension a _unitless_ scalar (IOW 'time'
tautologically increments at a 'rate' of 'one frame per frame').

------
denfromufa
If the deep learning network has enough layers, then can't it start
incorporating "abstract" ideas common to any learning task? E.g. could we re-
use some layers for image/speech recognition & NLP?

~~~
unlikelymordant
This is exactly what happens in transfer learning. A recent paper by Google
([https://research.googleblog.com/2017/07/revisiting-unreasona...](https://research.googleblog.com/2017/07/revisiting-unreasonable-effectiveness.html))
shows that pre-training on a very large image database leads to improvements
in the state of the art for several different image problems. This is because
the weights required for one image problem are not necessarily all that
different from those for another image problem, especially in the early
layers. There may not be as much common ground between images and e.g. NLP.
Perhaps at much higher abstraction levels, but we aren't there yet.
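
A hedged sketch of what that looks like in Keras (reusing pretrained ImageNet
convolutional layers as a frozen feature extractor; `num_classes` and the
input size are placeholders for your own task):

    from keras.applications import VGG16
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model

    num_classes = 10  # placeholder

    # early convolutional layers learned on ImageNet, reused as-is
    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False

    # only the new task-specific head gets trained
    x = GlobalAveragePooling2D()(base.output)
    outputs = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy')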

~~~
webmaven
Transfer learning has been shown to improve training times in other modes
(such as using an image classification model to initialize an NLP model) over
randomly initialized values.

------
19eightyfour
If this article is correct about limitations, couldn't one simply include a
Turing machine model into the process to train algorithms?

Some ideas:

- The vectors are Turing tapes, or

- Each point in a tape is a DNN, or

- The "tape" is actually a "tree": each point in the tape is actually a branch
point of a tree with probabilities going each way, and the DNN model can
"prune this tree" to refine the set of "spanning trees" / programs.

Or, hehe, maybe I'm leading people off track. I know absolutely nothing about
DNN ( except I remember some classes on gradient descent and SVMs from
bioinformatics ).

~~~
nullc
You can bolt all kinds of funny structures into some DNN system, but if the
system doesn't have well behaved gradients (or if it isn't even
differentiable) it won't train.

------
msoad
Then people are assuming Deep Learning can be applied to a Self Driving Car
System end-to-end! Can you imagine the outcome?!

~~~
webmaven
Yes. Death Race 2000.

------
zfrenchee
My qualm with this article is that it is disappointingly poorly backed up. The
author makes claims, but does not justify those claims well enough to convince
anyone but people who already agree with him. In that sense, this piece is an
opinion piece masquerading as science.

> This is because a deep learning model is "just" a chain of simple,
> continuous geometric transformations mapping one vector space into another.
> All it can do is map one data manifold X into another manifold Y, assuming
> the existence of a learnable continuous transform from X to Y, and the
> availability of a dense sampling of X:Y to use as training data. So even
> though a deep learning model can be interpreted as a kind of program,
> inversely most programs cannot be expressed as deep learning models
> [why?]—for most tasks, either there exists no corresponding practically-
> sized deep neural network that solves the task [why?], or even if there
> exists one, it may not be learnable, i.e. the corresponding geometric
> transform may be far too complex [???], or there may not be appropriate data
> available to learn it [like what?].

> Scaling up current deep learning techniques by stacking more layers and
> using more training data can only superficially palliate some of these
> issues [why?]. It will not solve the more fundamental problem that deep
> learning models are very limited in what they can represent, and that most
> of the programs that one may wish to learn cannot be expressed as a
> continuous geometric morphing of a data manifold. [really? why?]

I tend to disagree with these opinions, but I think the author's opinions
aren't unreasonable; I just wish he would explain them rather than
re-iterating them.

~~~
ndh2
For one, input and output size has to be fixed. All these NNs doing image
transformations or recognition only work on fixed-size images. How would you
sort a set of integers of arbitrary size using a neural network? What does
"solve with a NN" even mean in that context?

Another problems/limitation I can think of is that in NNs you don't have
state. The NN can't push something on a stack, and then iterate. How do you
divide and conquer using NNs?

Are NNs Turing complete? I don't see how they possibly could be.

~~~
unlikelymordant
Input and output sizes don't have to be fixed. E.g. speech recognition doesn't
work with fixed-size inputs. Natural language processing deals with sequences
of many different lengths. seq2seq networks are explicitly designed to deal
with problems that have variable-length inputs and outputs that are also
variable in length and different from the input.

How would you sort integers? Using neural Turing machines:
[https://arxiv.org/abs/1410.5401](https://arxiv.org/abs/1410.5401)

NTMs and other memory network architectures have explicit memory as state
(including stacks!), and indeed any recurrent neural net has state.

Are NNs Turing complete? Yes!
[http://binds.cs.umass.edu/papers/1992_Siegelmann_COLT.pdf](http://binds.cs.umass.edu/papers/1992_Siegelmann_COLT.pdf)
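
To make the variable-length point concrete, here is a minimal Keras
encoder-decoder sketch (the sizes are placeholders; the `None` in the input
shapes is what lets the sequence length vary):

    from keras.layers import Input, LSTM, Dense
    from keras.models import Model

    num_tokens, latent_dim = 64, 256   # placeholder sizes

    # encoder: consumes a sequence of any length, keeps only its final state
    encoder_inputs = Input(shape=(None, num_tokens))
    _, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

    # decoder: produces an output sequence, also of arbitrary length
    decoder_inputs = Input(shape=(None, num_tokens))
    decoder_outputs = LSTM(latent_dim, return_sequences=True)(
        decoder_inputs, initial_state=[state_h, state_c])
    decoder_outputs = Dense(num_tokens, activation='softmax')(decoder_outputs)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)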

~~~
ndh2
Interesting, thanks! On
[https://www.tensorflow.org/tutorials/seq2seq](https://www.tensorflow.org/tutorials/seq2seq)
I found a link to
[https://arxiv.org/abs/1406.1078](https://arxiv.org/abs/1406.1078), which says

> One RNN encodes a sequence of symbols into a fixed-length vector
> representation, and the other decodes the representation into another
> sequence of symbols.

To me it sounds like they use an RNN to learn a hash function.

Thanks for the NTM link, I'll check it out.

------
thanatropism
This is evergreen:

[https://en.m.wikipedia.org/wiki/Hubert_Dreyfus%27s_views_on_...](https://en.m.wikipedia.org/wiki/Hubert_Dreyfus%27s_views_on_artificial_intelligence)

See also, if you can, the film "Being in the world", which features Dreyfus.

------
LeanderK
The author raises some valid points, but I don't like the style it is written
in. He makes some elaborate claims about the limitations of Deep Learning, but
doesn't convey why they are limitations. I don't disagree that there are
limits to Deep Learning and that many may be impossible to overcome without
completely new approaches. But I would like to see more emphasis on why things
that are theoretically possible, like generating code from descriptions, are
absolutely impossible and out of reach today, rather than giving the
impression that the task itself is impossible (like the halting problem).

------
LeicaLatte
This is why Elon Musk is projecting. We are a long way away from AI.

------
ezioamf
This is why I don't know whether it will be possible (given current
limitations) to let insect-like brains fully drive our cars. It may never be
good enough.

~~~
pishpash
Insects can drive themselves quite well, occasional splatters aside. This is
one of the tasks that I feel is tractable. However, propose letting insects
drive and people will never accept it; yet somehow they trust the SV hype men.

------
nimish
This is basically the Chinese Room argument though?

~~~
krastanov
Not really. Deep learning does not give you an Artificial General Intelligence
(what the Chinese Room is supposed to be). The author just explains why this
is so (admittedly, in a handwavy, not necessarily convincing fashion).

------
graycat
On the limitations of machine learning as in the OP, the OP is correct.

So, right, current approaches to "machine learning" as in the OP have some
serious "limitations". But this point is a small, tiny special case of
something else much larger and more important: Current approaches to "machine
learning" as in the OP are essentially some applied math, and applied math is
commonly much more powerful than machine learning as in the OP and has much
less severe limitations.

Really, "machine learning" as in the OP is not _learning_ in any significantly
meaningful sense at all. Really, apparently, the whole field of "machine
learning" is heavily just hype from the deceptive label "machine learning".
That hype is deceptive, apparently deliberately so, and unprofessional.

Broadly machine learning as in the OP is a case of old empirical curve fitting
where there is a long history with a lot of approaches quite different from
what is in the OP. Some of the approaches are under some circumstances much
more powerful than what is in the OP.

The attention to machine learning omits a huge body of highly polished
knowledge that is usually much more powerful. In a cooking analogy, you are being sold
a state fair corn dog, which can be good, instead of everything in Escoffier,

Prosper Montagné, _Larousse Gastronomique: The Encyclopedia of Food, Wine, and
Cookery_ , ISBN 0-517-503336, Crown Publishers, New York, 1961.

Essentially, for machine learning as in the OP: if you (A) have a LOT of
_training_ data, (B) have a lot of _testing_ data, (C) by gradient descent or
whatever build a _model_ of some kind that fits the training data, and (D) the
model also predicts well on the testing data, then (E) you may have found
something of value.

But the test in (D) is about the only assurance of any value. And the value in
(D) needs an assumption: applications of the model will, in some suitable
sense rarely made clear, be _close_ to the training data.
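
As a minimal sketch of that (A)-(E) workflow (synthetic data and an
off-the-shelf scikit-learn model standing in for "a LOT of data" and "a model
of some kind"):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# (A)/(B): synthetic data, split into training and testing sets.
X, y = make_regression(n_samples=5000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# (C): fit some model to the training data.
model = GradientBoostingRegressor().fit(X_train, y_train)

# (D): the held-out error is about the only check of value -- and it only
# means something if future inputs stay "close" to the training distribution.
print(mean_squared_error(y_test, model.predict(X_test)))
```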

Such fitting goes back at least to

Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone,
_Classification and Regression Trees_ , ISBN 0-534-98054-6, Wadsworth &
Brooks/Cole, Pacific Grove, California, 1984.

so it is not nearly new. This work is commonly called CART, and there has long
been corresponding software.

And CART goes back to versions of regression analysis that go back maybe 100
years.

So, sure, in regression analysis, we are given points on an X-Y coordinate
system and want to fit a straight line so that, as a function of the points on
the X axis, the line does a good job of approximating the points in the X-Y
plot. Being more specific would require mathematical notation that is awkward
for simple typing and, really, likely not needed here.

Well, to generalize, the X axis can have several dimensions, that is,
accommodate several variables. The result is _multiple linear regression_.
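
To make that concrete without the awkward notation, a minimal sketch: multiple
linear regression is just an ordinary least-squares solve (synthetic data and
made-up coefficients below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 4 + X @ [2, -1, 0.5] + noise (coefficients made up).
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept is fitted along with the slopes.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)   # approximately [4.0, 2.0, -1.0, 0.5]
```

The classical theory cited below is largely about the distribution of
`beta_hat` and of the prediction errors under assumptions on the noise.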

For more, there is a lot of theory with a lot of guarantees. These can be found
in short and easy form in

Alexander M. Mood, Franklin A. Graybill, and Duane C. Boes, _Introduction to
the Theory of Statistics, Third Edition_ , McGraw-Hill, New York, 1974.

with more detail but still easy form in

N. R. Draper and H. Smith, _Applied Regression Analysis_ , John Wiley and
Sons, New York, 1968.

with much more detail and carefully done in

C. Radhakrishna Rao, _Linear Statistical Inference and Its Applications:
Second Edition_ , ISBN 0-471-70823-2, John Wiley and Sons, New York, 1967.

Right, this stuff is not nearly new.

So, with some assumptions, get lots of guarantees on the accuracy of the
fitted model.

This is all old stuff.

The work in machine learning has added some details to the old issue of
_overfitting_, but, really, the math in old regression takes that into
consideration: a case of overfitting will usually show up in larger estimates
of the errors.

There is also spline fitting, fitting from Fourier analysis, autoregressive
integrated moving average processes,

David R. Brillinger, _Time Series Analysis: Data Analysis and Theory, Expanded
Edition_ , ISBN 0-8162-1150-7, Holden-Day, San Francisco, 1981.

and much more.

But, let's see some examples of applied math that totally knock the socks off
model fitting:

(1) Early in civilization, people noticed the stars and the ones that moved in
complicated paths, the _planets_. Well, Ptolemy built some empirical models
based on _epicycles_ that seemed to fit the data well and have good
predictive value.

But much better work was from Kepler, who discovered that, really, if we assume
that the sun stays still and the earth moves around the sun, then the paths of
the planets are just ellipses.

Next Newton invented the second law of motion, the law of gravity, and
calculus and used them to explain the ellipses.

So, what Kepler and Newton did was far ahead of what Ptolemy did.

Or, all Ptolemy did was just some empirical fitting, and Kepler and Newton
explained what was really going on and, in particular, came up with much
better predictive models.

Empirical fitting lost out badly.

Note that once Kepler assumed that the sun stands still and the earth moves
around the sun, he actually didn't need much data to determine the ellipses.
And Newton needed nearly no data at all except to check his results.

Or, Kepler and Newton had some good ideas, and Ptolemy had only empirical
fitting.

(2) The history of physical science is just awash in models derived from
scientific principles that are, then, verified by fits to data.

E.g., some first-principles derivations show what the acoustic power spectrum
of the 3 K background radiation should be, and the fit to the actual data from
WMAP, etc. was astoundingly close.

News Flash: Commonly some real science or even just real engineering
principles totally knock the socks off empirical fitting, for much less data.

(3) E.g., here is a fun example I worked up while in a part time job in grad
school: I got some useful predictions for an enormously complicated situation
out of a little applied math and nearly no data at all.

I was asked to predict what the survivability of the US SSBN fleet would be
under a special scenario of global nuclear war limited to sea.

Well, there was a WWII analysis by B. Koopman that showed that in search, say,
of a submarine for a surface ship, an airplane for a submarine, etc. the
encounter rates were approximately a Poisson process.

So, for all the forces in that war at sea, for the number of forces surviving,
with some simplifying assumptions, we have a continuous-time, discrete-state-
space Markov process subordinated to a Poisson process. The details of the
Markov process come from a little data about detection radii and the
probabilities that, at a detection, one dies, the other dies, both die, or
neither dies.

That's all there was to the set up of the problem, the _model_.

Then to evaluate the model, just use Monte Carlo to run off, say, 500 sample
paths, average those, appeal to the strong law of large numbers, and presto,
bingo, done. Also can easily put up some confidence intervals.
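
To show the shape of that computation (and only the shape: this is NOT the
original study; the force sizes, encounter rate, horizon, and outcome
probabilities below are invented), a toy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_path(blue=20, red=20, rate=0.5, horizon=30.0):
    """One sample path: Poisson-timed encounters, simple outcome probabilities."""
    t = 0.0
    while blue > 0 and red > 0:
        t += rng.exponential(1.0 / rate)          # time to the next encounter
        if t > horizon:
            break
        outcome = rng.choice(4, p=[0.25, 0.25, 0.10, 0.40])
        if outcome == 0:
            blue -= 1                             # blue unit lost
        elif outcome == 1:
            red -= 1                              # red unit lost
        elif outcome == 2:
            blue -= 1; red -= 1                   # both lost
        # outcome 3: neither lost
    return blue

# 500 sample paths, averaged; the strong law of large numbers says the mean
# converges, and a normal approximation gives a quick confidence interval.
paths = np.array([sample_path() for _ in range(500)])
mean = paths.mean()
half_width = 1.96 * paths.std(ddof=1) / np.sqrt(len(paths))
print(f"expected blue survivors ~ {mean:.2f} +/- {half_width:.2f}")
```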

The customers were happy.

Try to do that analysis with big data and machine learning and you will be in
deep, bubbling, smelly, reeking, flaming, black and orange, toxic sticky
stuff.

So, a little applied math, some first principles of physical science, or some
solid engineering data commonly totally knocks the socks off machine learning
as in the OP.

~~~
srean
There is a whole lot of difference between curve fitting and curve fitting
with performance guarantees on future data under a distribution-free
(limited-dependence) model.

BTW the 'machine learning' term is a Russian coinage and its genesis lies in
non-parametric statistics. The key result that sparked it all off was Vapnik
and Chervonenkis's theorem, which is essentially a much generalized and non-
asymptotic version of Glivenko-Cantelli. The other result was Stone's, which
showed that universal algorithms achieving the Bayes error in the limit not
only exist, but also constructed such an algorithm. This was the first time it
was established that 'learning' is possible.
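
For reference, one commonly quoted form of that kind of distribution-free
guarantee (Vapnik-style; exact constants differ between sources): with
probability at least 1 - delta over an i.i.d. sample of size n, simultaneously
for every f in a class of VC dimension d,

```latex
R(f) \;\le\; \hat{R}_n(f) \;+\; \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

where R(f) is the true risk and \hat{R}_n(f) the empirical (training) risk;
nothing is assumed about the data distribution beyond the sample being i.i.d.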

------
x40r15x
I am sorry but GMO is actually bad for you.... Monsanto tried to spread gmo
corn in France, they tested it on rats for a year and the rats developed
multiple tumors the size of an egg.

~~~
x40r15x
Oops wrong article

------
known
DL/ML == Wisdom of Crowds

------
erikb
I don't get it. If reasoning is not an option, how does deep learning beat the
board game Go?

~~~
backpropaganda
Memorisation + small amounts of generalization.

~~~
erikb
Unlikely. If it were mostly memorisation it couldn't learn from playing itself.

And what you describe is how AI beats chess. The problem with that is that it
is a quite inhuman way to play. But AlphaGo plays quite humanly.

~~~
backpropaganda
1. Imagine infinite compute capability. Exhaustively play all possible games,
and use that to figure out the best moves at any state. This is essentially
what AlphaGo did, but using translation invariance to reduce the search space.

2. There is no contradiction here. We just have to accept that human-like play
can emerge from memorization.

~~~
erikb
Can you explain what from your point of view is the difference between AlphaGo
and a Chess AI? Because to me it sounds like the one should have resulted as
an evolution from the other if it would be that simple.

~~~
backpropaganda
Yes. In Chess, it's relatively easy to judge how good a board position is, and
thus people have been successful by hand-engineering the board position
evaluator (also called value function in RL lingo), and then just doing tree
search to take the action which improves board position the most. In Go,
evaluating board position is much more difficult, and it's not possible to
approximate the value function with hand-engineered code. Thus, AlphaGo
approximates the value by simulating games to a win/loss from arbitrary
board positions. This doesn't really require neural
networks. You could also do the same with table lookup. What neural networks
offer here is some translation invariance generalization, and capability to
compress the table into fewer parameters by identifying common input features.
It's possible to achieve AlphaGo performance by just having a BIG table of
state-values and using some kernel to do nearest neighbour search (such as
is done by DeepMind here:
[https://arxiv.org/abs/1606.04460](https://arxiv.org/abs/1606.04460))
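
As a toy sketch of the "simulate the game till win/lose" idea (not AlphaGo; the
game below is a trivial take-1-2-or-3-stones game, used only to show
rollout-based value estimation standing in for a hand-engineered evaluator):

```python
import random

random.seed(0)

# Toy game: players alternately take 1-3 stones; whoever takes the last stone wins.
def legal_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]

def rollout_value(stones, our_turn, n_rollouts=3000):
    """Estimate P(we win) from this position by playing games out at random."""
    if stones == 0:
        return 0.0 if our_turn else 1.0   # whoever just moved took the last stone
    wins = 0
    for _ in range(n_rollouts):
        s, us = stones, our_turn
        while s > 0:
            s -= random.choice(legal_moves(s))
            if s == 0 and us:
                wins += 1                  # we took the last stone and won
            us = not us
    return wins / n_rollouts

def best_move(stones):
    """Monte Carlo evaluation in place of a hand-engineered position evaluator."""
    return max(legal_moves(stones),
               key=lambda k: rollout_value(stones - k, our_turn=False))

# Optimal play is to leave the opponent a multiple of 4 stones.
print(best_move(10))
```

A neural net (or the kernel/table-lookup approach in the linked paper) would
replace `rollout_value` with something cheaper and more general.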

------
sarah5
nice article

------
beachbum8029
Pretty interesting that he says reasoning and long term planning are
impossible tasks for a neural net, when those tasks are done by billions of
neural nets every day. :^)

~~~
igammarays
Pretty interesting that you seem to have conclusively determined that our
brain is nothing but a neural net.

------
deepnotderp
I'd like to offer a somewhat contrasting viewpoint (although this might not
sit well with people): deep nets aren't AGI, but they're pretty damn good.
There's mounting evidence that they learn similarly to how we do, at least in
vision: [https://arxiv.org/abs/1706.08606](https://arxiv.org/abs/1706.08606)
and
[https://www.nature.com/articles/srep27755](https://www.nature.com/articles/srep27755)

There's quite a few others but these were the most readily available papers.

Are deep nets AGI? No, but they're a lot better than Mr. Chollet gives them
credit for.

~~~
emodendroket
Similar, perhaps, but certainly not the same, since no human being, at
whatever stage of development, would be confused by the adversarial image of a
panda given as an example.

------
AndrewKemendo
_the only real success of deep learning so far has been the ability to map
space X to space Y using a continuous geometric transform, given large amounts
of human-annotated data._

Yes, but that's what humans do too, only much, much better from a
generalization perspective.

I think that fundamentally this IS the paradigm for AGI, but we are in the
pre-infant days of optimization across the board (data, efficiency, tagging
etc.).

So I wholeheartedly agree with the post, that we shouldn't cheer yet, but we
should also recognize that we are on the right track.

I say all this because prior to getting into DL and more specifically
Reinforcement Learning (which is WAY understudied, IMO), I was working with
Bayesian Expert Systems as a path to AI/AGI. RL totally transformed how I saw
the problem and in my mind offers a concrete pathway to AGI.

