
The future of deep learning - nicolrx
https://blog.keras.io/the-future-of-deep-learning.html
======
computerex
> In DeepMind's AlphaGo, for example, most of the "intelligence" on display is
> designed and hard-coded by expert programmers (e.g. Monte-Carlo tree
> search);

Not true. This paraphrases the original paper:
[https://www.tastehit.com/blog/google-deepmind-alphago-how-
it...](https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/)

> They tested their best-performing policy network against Pachi, the
> strongest open-source Go program, and which relies on 100,000 simulations of
> MCTS at each turn. AlphaGo's policy network won 85% of the games against
> Pachi! I find this result truly remarkable. A fast feed-forward architecture
> (a convolutional network) was able to outperform a system that relies
> extensively on search.

Also, this article reeked of AGI ideas. Deep learning isn't trying to solve
AGI. Reasoning, abstraction, and other high-level AGI concepts don't, I think,
apply to deep learning. I don't know the path to AGI, but I don't think it
will be deep learning; I think it will have to be something fundamentally
different.

~~~
taneq
> Deep learning isn't trying to solve AGI.

Well, I dunno about "deep learning", but AGI is DeepMind's explicitly stated
goal.

~~~
computerex
And your source for this is? I could not find any such claim on their site.

~~~
taneq
> We really believe that if you solved intelligence in a very general way,
> like we're trying to do at DeepMind, then step 2 ['use intelligence to solve
> everything else'] would naturally follow on.

They go on to talk about general purpose learning machines.

Source:
[https://youtu.be/ZyUFy29z3Cw?t=4m42s](https://youtu.be/ZyUFy29z3Cw?t=4m42s)

~~~
computerex
Dr. Hassabis is awesome, but that video and its language are misleading to a
layman. He is distinguishing between expert-driven systems that rely on
heuristics/feature engineering and systems that learn from raw input and
derive their own optimal set of features (unsupervised learning).

This is a far cry from AGI. I think Dr. Hassabis rather tongue-in-cheek played
with the terminology in the video. Deep learning and all the modern AI stuff
you hear about is within the realm of "narrow AI", or more formally, applied
AI. In the video, he uses "narrow AI" for systems that rely on expert-based
heuristics and feature engineering, and "general-purpose AI" for what they are
currently doing with reinforcement learning.

Whilst it's wonderful that their advances in reinforcement learning have been
applied successfully to various problems, this shouldn't be confused with AGI.

AGI is on a totally different playing field. I don't think we are
substantially closer to AGI than we were 50 years ago, and I would be very
interested in anyone arguing the opposite.

I think at this point the only company trying to seriously tackle AGI is:
[https://numenta.com/](https://numenta.com/)

------
amelius
What about the future of _jobs_ in the field of deep learning?

EDIT: I'm thinking deep learning will become much like web development is
today. Everybody can do it, and only a few experts will work at the
technological frontier and develop tools and libraries for everybody else to
use.

Therefore, if one invests time in DL, I suppose it had better be a serious
effort (at the research level) rather than at the level of invoking a library,
because soon everybody will be able to do that.

~~~
droidist2
Everybody can do web development? Then why do so many people complain about
how complicated it is?

~~~
eli_gottlieb
Because it was somehow designed to be as complicated, difficult, and utterly
un-modular as possible. I actually have a more difficult time fully testing a
commit's worth of Rails dev than I ever did with a commit's worth of embedded
firmware.

------
randcraw
I enjoyed part 1 of Chollet's two articles today but am less fond of this one.
It suggests that deep learning will expand from its present capability of
recognizing patterns to one day master logical relations and employ a rich
knowledge base of general facts, growing into a situated general problem
solver that may equal or surpass human cognition. Maybe. But he then proposes
that deep nets will rise to these heights of self-organization and
purposefulness using one of the weakest and slowest forms of AI, namely
evolutionary strategies?

I don't think so.

The many problems bedeviling the expansion of an AI's competence at one
specific task into mastery of more general and more complex tasks are legion.
Alas, neither deep nets nor genetic algorithms have shown any way to address
classic AGI roadblocks like: 1) the enormity of the possible solution space
when synthesizing candidate solutions, 2) the enormous number of training
examples needed to learn the multitude of common-sense facts common to all
problem spaces, and 3) how to translate existing specific problem solutions
into novel general ones. Wait, wait, there's more...

These roadblocks are common to all forms of AI. The prospect of replacing
heuristic strategies with zero knowledge techniques (like GA trial and error)
or curated knowledge bases with only example-based learning is unrealistic and
infeasible. Likewise, the notion that a sufficient number of deep nets can
span all the info and problem spaces that will be needed for AGI is _quite_
implausible. While quite impressive at the lowest levels of AI (pattern
matching), deep learning has yet to address intermediate and high level AI
implementation challenges like these. Until it does, there's little reason to
believe DL will be equally good at implementing executive cognitive functions.

Yes, DeepMind solved Go using AlphaGo's deep nets (and Monte Carlo tree
search). But years before that, IBM Watson solved Jeopardy and IBM Deep Blue
solved chess. At the time, everyone was duly impressed. Yet today nobody is
suggesting that the AI methods at the heart of those game solutions will one
day pave the yellow brick road to AI Oz.

In another 10 years, I predict it's just as likely that AlphaGo's deep nets
will be a bust as a boom, at least when it comes to building deep AI like HAL
9000.

------
therajiv
TLDR is that models will become more abstract (current pattern recognition
will blend with formal reasoning and abstraction), modular (think transfer
learning, but taken to its extreme - every trained model's learned
representations should be applicable to other tasks), and automated (ML
experts will spend less time in the repetitive training/optimization cycle,
instead focusing more on how models apply to their specific domain).

~~~
radarsat1
I think it's true, but I hope this synergy between logic and pattern
recognition actually happens, as I feel like this has been proposed for years
but never really come to fruition. However, with recent work on differentiable
communicating agents, differentiable memory etc., perhaps it now has a chance
to get there.

~~~
eli_gottlieb
The author says not everything _should_ be differentiable. Intuitively, I
agree, but the question is how to do a sufficiently fast search through a
high-dimensional space when you don't have a gradient.

~~~
CuriouslyC
If you don't have a gradient, one tactic is to make the most of the situation.
Give your model the Bayesian treatment, and sample from the posterior using
MCMC. This is slow, but you end up with posteriors on your parameter values,
which is a huge win.
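
As a concrete illustration (a minimal sketch only, not from the thread; the
toy model, flat prior, and step size are all assumptions), here is a
random-walk Metropolis sampler in Python/NumPy that draws posterior samples
without ever needing a gradient:

    import numpy as np

    np.random.seed(0)
    data = np.random.normal(1.5, 1.0, size=50)     # toy observations

    def log_posterior(theta):
        if not (-10 < theta < 10):                 # flat prior on (-10, 10)
            return -np.inf
        return -0.5 * np.sum((data - theta) ** 2)  # Gaussian log-likelihood

    def metropolis(n_samples=20000, step=0.5):
        samples = np.empty(n_samples)
        theta = 0.0
        lp = log_posterior(theta)
        for i in range(n_samples):
            proposal = theta + step * np.random.randn()
            lp_new = log_posterior(proposal)
            # accept with probability min(1, p(proposal) / p(theta))
            if np.log(np.random.rand()) < lp_new - lp:
                theta, lp = proposal, lp_new
            samples[i] = theta
        return samples

    samples = metropolis()[5000:]      # drop burn-in
    print(samples.mean(), samples.std())  # a full posterior, not a point estimate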

~~~
eli_gottlieb
Yeah, I've been a big fan of probabilistic programming for a while. The real
problem is that getting Monte Carlo methods to converge and produce a _large_
sample from the posterior takes orders of magnitude more time than running an
optimizer to descend a gradient. Hey, you can even make it a probabilistic
gradient: variational inference! But then you still have a hard time with
discrete, nondifferentiable structure.
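
For what it's worth, here is a minimal sketch of that "probabilistic gradient"
idea: single-sample reparameterized variational inference for a toy
Gaussian-mean model (the model, learning rate, and iteration count are
assumptions chosen for illustration, and the one-sample gradient estimate is
noisy):

    import numpy as np

    np.random.seed(0)
    x = np.random.normal(2.0, 1.0, size=100)      # observations, likelihood N(z, 1)
    prior_var = 10.0                              # prior z ~ N(0, 10)

    def dlogp_dz(z):
        # gradient of log p(x, z) with respect to z
        return -z / prior_var + np.sum(x - z)

    # variational family q(z) = N(mu, sigma^2), sigma = exp(log_sigma)
    mu, log_sigma, lr = 0.0, 0.0, 1e-3
    for _ in range(5000):
        eps = np.random.randn()
        sigma = np.exp(log_sigma)
        z = mu + sigma * eps                      # reparameterization trick
        g = dlogp_dz(z)
        mu += lr * g                              # one-sample dELBO/dmu
        log_sigma += lr * (g * sigma * eps + 1.0) # one-sample dELBO/dlog_sigma

    post_prec = 1.0 / prior_var + len(x)          # exact conjugate posterior
    print(mu, np.exp(log_sigma))                  # approximate mean and std
    print(x.sum() / post_prec, (1.0 / post_prec) ** 0.5)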

------
toisanji
This is part 2 from the post yesterday:
[https://news.ycombinator.com/item?id=14790251](https://news.ycombinator.com/item?id=14790251)

And the author posted a comment on hn:

"fchollet: Hardly a "made-up" conclusion -- just a teaser for the next post,
which deals with how we can achieve "extreme generalization" via abstraction
and reasoning, and how we can concretely implement those in machine learning
models."

I like the ideas presented in the post, but it's not concrete or new at all.
Basically he writes "everything will get better".

I do agree with the point that we need to move away from strictly
differentiable learning, though. Deep learning only works on systems that have
derivatives, so that we can do backpropagation. I don't think the brain learns
with backpropagation at all.

* AutoML: there are dozens of these types of systems already; he mentions one in the post, HyperOpt. So we will continue to use these systems and they will get smarter? Many of these systems are basically grid search/brute force (see the first sketch after these bullets). Do you think the brain is doing brute force at all? We have to use them now because there are no universally correct hyperparameters for tuning these models. As long as we build AI models the way we do now, we will have to do this hyperparameter tuning. Yes, these tools will get better; again, nothing new here.

* He talks about reusable modules. Everyone in the deep learning community has been talking about this a lot; it's called transfer learning, people are using it now, and they are working on making it better all the time. We currently have "model zoos", which are databases of pretrained models that you can use (see the second sketch below). If you want to see a great sci-fi short piece on what neural-network mini-programs could look like, written by the head of computer vision at Tesla, check out this post: [http://karpathy.github.io/2015/11/14/ai/](http://karpathy.github.io/2015/11/14/ai/)
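
To make the first bullet concrete, here is a minimal random-search sketch for
hyperparameter tuning (the search space, model, and dataset are assumptions
picked for illustration; this is not how HyperOpt itself works internally):

    import random
    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    random.seed(0)

    def sample_config():
        # each trial draws one configuration at random from the search space
        return {
            "hidden_layer_sizes": (random.choice([32, 64, 128]),),
            "alpha": 10 ** random.uniform(-5, -1),
            "learning_rate_init": 10 ** random.uniform(-4, -2),
        }

    best_score, best_config = -1.0, None
    for _ in range(10):                           # 10 random trials
        config = sample_config()
        model = MLPClassifier(max_iter=200, random_state=0, **config)
        score = cross_val_score(model, X, y, cv=3).mean()
        if score > best_score:
            best_score, best_config = score, config

    print(best_score, best_config)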
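
And for the second bullet, a minimal sketch of the "model zoo" /
transfer-learning workflow in Keras (the choice of VGG16, the input size, and
the small classification head are assumptions for illustration):

    from keras.applications import VGG16
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model

    # reuse a pretrained module: convolutional features learned on ImageNet
    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False              # freeze the borrowed representations

    # bolt a small task-specific head on top and train only that part
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(256, activation="relu")(x)
    out = Dense(10, activation="softmax")(x) # e.g. a new 10-class problem
    model = Model(inputs=base.input, outputs=out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(new_images, new_labels, ...)  # fine-tune on the new task's data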

~~~
ipunchghosts
Everyone makes the assumption that computers should get to be as smart as
humans, but in some ways it's the other way around. For example, the human
brain is not a Turing machine; it doesn't have reliable memory (in the sense
that its memory is lossy). You need memory to have a Turing machine, so with
paper and pencil a human is a Turing machine, but a very slow one. Compare how
slow reading and writing on paper is with how fast a computer can access RAM.

I think there will be some kind of meta deep learning (still using deep
learning, but composed of algebras which are augmented compared to today's
standards). We have already started this by using pretrained networks for
tasks. There is no reason RNNs won't go this way (I imagine they already are,
but this isn't my research area specifically); after all, RNNs are
Turing-complete.

------
nzonbi
Interesting article, in a difficult topic. Speculating about the future of
deep learning. The author deserves recognition for writing about this. In my
personal opinion, within the next 10 years, there will be systems exhibiting
basic general intelligence behavior. I am currently doing early hobbist
research on it, and I see it as feasible. These system will not be very
powerful initially. They will exist and work in simpler simulated
environments. Eventually we will be able to make these systems powerful enough
to handle the real world. Although that will probably not be easy.

I somewhat disagree with the author. I don't think that the deep learning
systems of the future are going to generate "programs" composed of programming
primitives. In my speculative view, the key to general intelligence is not
very far from our current knowledge. Deep learning, as we currently have it,
is a good enough basic tool; there are no magic improvements to the current
deep learning algorithms hidden around the corner. Rather, what I think will
enable general intelligence is assembling systems of deep learning networks in
the right setup. Some of the structure of these systems will be similar to
traditional programs, but the models they generate will not resemble computer
programs. They will be more like data graphs.

I expect within 10 years there will be computer agents capable of
communicating in simplified, but functional languages. Full human language
capability will come after that. And within 20 years I expect artificial
general intelligence to exist. At least in a basic form. That is my personal
view. I am currently working on this.

~~~
LrnByTeach
20 year time frame that is around 2040 for AGI Artificial General Intelligence
in the it's BASIC Form seems in line with many experts in this filed.

> I expect within 10 years there will be computer agents capable of
> communicating in simplified, but functional languages. Full human language
> capability will come after that. And within 20 years I expect artificial
> general intelligence to exist. At least in a basic form. That is my personal
> view. I am currently working on this.

~~~
Berobero
> A 20-year time frame ... seems in line with many experts in this field

When has "20 years" _not_ been in line with the predictions of experts for the
advent of AGI?

~~~
fnl
And quantum computing, as well as fusion generators... :-)

~~~
shmageggy
Yup, [https://xkcd.com/678/](https://xkcd.com/678/) and its flavor text

------
jdonaldson
Glad to see Deep Learning "coming down to earth". This is the first
high-profile post I've seen that spells out exactly how DL models will become
reconfigurable, purpose-built tools, and what that workflow might look like.
We're still a long way away from treating them like software components.

~~~
Cacti
I mean, these are topics that have been discussed countless times over the
years and in some cases decades.

It's all well and good to say we need generalizable machines, and something
other than backprop, and something closer to traditional programs, but we all
know this. The issue is that no one knows what this would even mean, never
mind how one would go about implementing it. In the few cases we do know how,
the results are horrible compared to the methods we already use.

We use the methods we do today because they work, not because we think they
are the best, or because we don't understand the limitations of our models.

~~~
jdonaldson
True, there have been discussions, but from what I've seen it's mostly flag
planting or vague pop-eng fodder that project directors dish out to tech
journalists. Having the author of Keras make a statement on this carries far
more weight, because fchollet is not selling a product, pushing an agenda, or
creating a walled garden of some sort.

The only thing that's a bit off about Keras is that it's mostly the effort of
one guy. Sure, there are many other contributors, but they don't seem to be
acknowledged; I've never seen anyone else speak for the project. I'd really
like to see a neutral party emerge for deep learning practice and tooling
before the whole industry gets sucked into a single dominant ecosystem like
AWS.

~~~
droidist2
Do you think with Google's adoption of Keras for TensorFlow it'll get more
resources dedicated to it?

------
primaryobjects
Here are the results of my research into program synthesis using genetic
algorithms.

Using Artificial Intelligence to Write Self-Modifying/Improving Programs

[http://www.primaryobjects.com/2013/01/27/using-artificial-
in...](http://www.primaryobjects.com/2013/01/27/using-artificial-intelligence-
to-write-self-modifying-improving-programs/)

There is also a research paper, if you prefer the sciency format.

BF-Programmer: A Counterintuitive Approach to Autonomously Building Simplistic
Programs Using Genetic Algorithms

[http://www.primaryobjects.com/bf-
programmer-2017.pdf](http://www.primaryobjects.com/bf-programmer-2017.pdf)
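
For readers who haven't seen the approach before, here is a toy
genetic-algorithm sketch in Python (it evolves a target string rather than a
program; the target, population size, and mutation rate are arbitrary
assumptions, and it is far simpler than the BF-Programmer work linked above):

    import random

    random.seed(0)
    TARGET = "hello world"
    ALPHABET = "abcdefghijklmnopqrstuvwxyz "
    POP_SIZE, MUTATION_RATE = 200, 0.05

    def fitness(candidate):
        # number of characters already matching the target
        return sum(a == b for a, b in zip(candidate, TARGET))

    def mutate(candidate):
        return "".join(random.choice(ALPHABET) if random.random() < MUTATION_RATE
                       else c for c in candidate)

    def crossover(a, b):
        cut = random.randrange(len(TARGET))
        return a[:cut] + b[cut:]

    population = ["".join(random.choice(ALPHABET) for _ in TARGET)
                  for _ in range(POP_SIZE)]

    for generation in range(1000):
        population.sort(key=fitness, reverse=True)
        best = population[0]
        if best == TARGET:
            break
        parents = population[:POP_SIZE // 4]      # simple truncation selection
        population = [mutate(crossover(random.choice(parents),
                                       random.choice(parents)))
                      for _ in range(POP_SIZE)]

    print(generation, best)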

------
kirillkh
Seeing how gradient descent is such a cornerstone of deep learning, I can't
help wondering: is this how our brain learns? If not, then what prevents us
from implementing deep learning the way the brain does it?

~~~
rsiqueira
One of the most consistent theory about how our brain learns is described in
HTM (Hierarchical Temporal Memory), a more biologically inspired neural
network. See Jeff Hawkins' "On Intelligence". It is based on:

* Input of continuous unlabeled time-based patterns.

* Associative Hebbian learning (when distinct inputs/patterns occur together, the neurons representing them are wired together; see the sketch after this list). Synapses can be modified through experience. See "Hebbian theory".

* The brain is a prediction machine: it is always trying to predict the future based on past learned patterns. Learning happens when reality does not match the original prediction and we rewire the world model based on new input. See "Bayesian approaches to brain function".

* Input signals are processed by many layers, each one creating more abstraction from the previous one, from sensory neurons to the highest cortex layers.

* Each region of the hierarchy forms invariant memories (what a typical region of cortex learns is sequences of invariant representations).

* There is lots of feedback (from the highest-level neurons back to the lowest levels). In some structures (e.g. the thalamus, which is a kind of "hub of information"), connections going backward (toward the input) exceed the connections going forward by almost a factor of ten.

* Brain uses Sparse Distributed Memory (SDM). See SDM by Pentti Kanerva (NASA researcher).

* Neuron models have many more variables/parameters (that can be used to transfer or process information) than the usual nodes/links of artificial neural networks, e.g. long-term potentiation vs. long-term depression, neuronal habituation vs. sensitization, inhibitory vs. excitatory neurons, firing rates, synchronization, neuromodulation, homeostasis and more.

* The backward propagation of errors in artificial neural networks only occurs during the learning phase. But the brain is always learning and updating weights and relationships between patterns, given new inputs.

* During repetitive learning, representations of objects move down the cortical hierarchy (from short-term memory to long-term memory), forming invariant memories.

* The brain needs to replay the memory (memory rehearsal) of a learned stimulus so it can be stored in long-term memory.

* The job of any cortical region is to find out how inputs are related (pattern recognition), to memorize the sequence of correlations between them, and to use this memory to predict how the inputs will behave in the future.

* Predictive coding: the brain is constantly generating and updating hypotheses that predict sensory input at varying levels of abstraction.
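
As a minimal illustration of the Hebbian point above, here is a toy
Hopfield-style associative memory trained with a Hebbian rule (this is a
sketch, not HTM; the network size and number of stored patterns are arbitrary
assumptions):

    import numpy as np

    np.random.seed(0)
    n = 64
    patterns = np.sign(np.random.randn(3, n))  # three random +1/-1 patterns

    # Hebbian learning: units that fire together get a stronger mutual weight
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)

    # recall: start from a corrupted copy of pattern 0 and let the network settle
    state = patterns[0].copy()
    flipped = np.random.choice(n, size=10, replace=False)
    state[flipped] *= -1
    for _ in range(5):
        state = np.sign(W @ state)
        state[state == 0] = 1

    print((state == patterns[0]).mean())       # fraction of bits recovered, ~1.0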

~~~
eli_gottlieb
Jeff Hawkins is kind of a crank when it comes to neuroscience, and his AI
companies have tended not to publish state-of-the-art results on machine
learning problems either.

~~~
mannigfaltig
(Replying here for visibility.) In a different comment branch you mentioned
counterfactuals. I've watched a video lecture by Pearl about counterfactuals
in graphical models, but I'm not exactly seeing their significance as a
"missing piece" in AI. Would you mind explaining a bit what exactly you mean?

Do counterfactuals have something to do with learning from negative examples
and simulations? For example, if one shoots a ball and misses the goal to the
right, one does not 'mindlessly' penalize the circuits that led to the exact
motor decisions involved; instead, one simulates alternative actions and uses
the (in this case roughly linear) relationships between, say, the angle of the
foot or the wind speed and the shooting direction. The next time, one hence
aims slightly to the left.

Or are you referring to a much more fundamental level, and my example is
rather a learning strategy that is more likely acquired by trial & error,
reinforcement learning, meta-learning ("learning how to learn"), and/or via
the shared concept space of language and culture?

Is it maybe related to e.g. prototype-based associative recall and a
counterfactual is basically an alternative way of interpreting the data? "What
error signal would I get, if I had interpreted X as Y?"

Or does it come from the Bayesian approach, where you marginalize over _all_
hypotheses: not just the factual one that corresponds to the state of the
world, but also all counterfactual hypotheses? So including counterfactuals
means going beyond the maximum-likelihood point estimate, e.g. by
communicating confidence intervals or even entire distributions from neurons
to neurons or neuron populations to other neuron populations?

~~~
eli_gottlieb
Counterfactuals in Pearl's sense are what allow particular models to be
_causal_ : to represent cause and effect under intervention, as opposed to
mere correlation. This is an important part of how to build models that think
like people[1].

[1] [https://arxiv.org/abs/1604.00289](https://arxiv.org/abs/1604.00289)

~~~
mannigfaltig
Is it in particular the dot product (correlation) in MLPs that prevents them
from inferring all causal structures in the data? So, instead of template
matching of co-occurrences of features in the layer below, we (also) need to
learn whether and how one feature causes the other?

~~~
eli_gottlieb
Again, it's the lack of _counterfactuals_ : the ability to intervene on a node
and cut it off from its parents, then see what happens, and the ability to
perform inferences over discrete spaces.
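
A tiny simulation of that idea (a sketch with assumed toy structural
equations, nothing to do with neural nets): conditioning on a value is not the
same as intervening on it, i.e. cutting the node off from its parents.

    import numpy as np

    np.random.seed(0)
    N = 200000

    # toy structural causal model: Z -> X, Z -> Y, X -> Y
    Z = np.random.randn(N)
    X = (Z + np.random.randn(N) > 0).astype(float)
    Y = 2 * X + 3 * Z + np.random.randn(N)

    # observational: E[Y | X = 1] is inflated by confounding through Z (~3.7)
    print(Y[X == 1].mean())

    # interventional: do(X = 1) severs X from its parent Z, giving ~2.0,
    # the true effect of setting X = 1 (the structural coefficient on X)
    X_do = np.ones(N)
    Y_do = 2 * X_do + 3 * Z + np.random.randn(N)
    print(Y_do.mean())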

~~~
mannigfaltig
Are there any concrete attempts at transferring this concept to MLPs? E.g. by
overriding the values of particular nodes/features by feedback connections?

~~~
eli_gottlieb
No, because neural nets _do not work that way_ , even when they output
actions. Making things More Neural doesn't make them better, and AFAIK, not
everything good _can_ be made More Neural.

~~~
mannigfaltig
_> Because neural nets do not work that way_

Are there works that expose this limitation of MLPs more formally?

 _> not everything good can be made More Neural._

Neural networks are universal function approximators, so you probably mean not
everything good can be made with MLPs trained by _gradient descent_?

 _> It's the lack of [...] the ability to perform inferences over discrete
spaces._

How would you judge the extent to which AlphaGo has learned to react to
single discrete changes in the input? It seems that it learned very well to
react sharply to whether a single stone is placed at a strategically
significant position.

------
guicho271828
Regarding logic and DL, there is the NeSy (neural-symbolic) workshop in
London: [http://neural-symbolic.org/](http://neural-symbolic.org/)

------
crypticlizard
Are there popular modern libraries that do program synthesis? Although I've
thought about this and read about the concept on HN, I've not heard it
mentioned seriously or frequently as something to pursue, either just for fun
or as a job. This could be a popular way to solve programming problems without
needing programmers. I think it would truly kick off AI as a very personal
experience for the masses, because they would use AI basically the way they
already use a search engine. People would use a virtual editor to design their
software from freely available off-the-shelf parts. The level of program
complexity could really skyrocket, as people would have more control over what
programs they run and how, because they could easily design the programs
themselves. Everyone could design their own personal Facebook or Twitter, and
probably a whole new series of websites that are too complex or for other
reasons not yet invented.

For instance, you want to program the personality of a toy, so you search
around using the AI search engine for parts that might work. Or you want a
relationship advice coach so you put it together using personalities you like,
taking only the parts you want from each personality. Or another example would
be just to make remixes of media you like. Because everything works without
programming, anyone can participate.

~~~
randcraw
Check out Genetic Programming:
[https://en.wikipedia.org/wiki/Genetic_programming](https://en.wikipedia.org/wiki/Genetic_programming)

AFAIK GP remains the primary means of automating the synthesis of software.
Though it was introduced perhaps 30 years ago, it hasn't been an active area
of research for the past 20.

------
lopatin
I'm also interested to see how the worlds of program synthesis (specifically
type directed, proof-checking, dependently typed stuff) can combine with deep
learning. If recent neural nets have such great results on large amounts of
unstructured data, imagine what they can do with a type lattice.

~~~
gtani
Recent baby steps in gradient checking:
[https://news.ycombinator.com/item?id=14739491](https://news.ycombinator.com/item?id=14739491)

------
ipunchghosts
Great work! Glad someone can finally explain this to the masses in an easy to
understand way. Looking forward to the future!

------
Kunix
About libraries of models: it would be useful to have open-source pretrained
models which could be augmented through GitHub-like pull requests of training
data together with label sets.

That would make it possible to maintain versioned, continually improving
models that everyone could update with an `npm update`, `git pull`, or
equivalent.

------
scientist
Self-driving cars are expected to take over the roads, yet no programmer is
able to write code that does this directly, without machine learning.
Programmers have, however, built all kinds of software of great value, from
operating systems to databases to desktop software, and much of it is open
source, so artificial systems can learn from it. It could therefore well be
that, in the end, it is easier to build artificial systems that learn to
develop such software automatically than systems that autonomously drive cars,
if the right methodologies are used. The author is right to say that neural
program synthesis is the next big thing, and this also motivated me to switch
my research to this field. If you have a PhD and are interested in working in
neural program synthesis, please check out these available positions:
[http://rist.ro/job-a3](http://rist.ro/job-a3)

------
amelius
I'm wondering if we will ever figure out how nature performs the equivalent of
backpropagation, and if that will change how we work with artificial neural
networks.

------
nextstar
I'm excited for the easy-to-use tools that have to be coming out relatively
soon. There are a lot right now, but the few I've used weren't as intuitive
as I feel they could be.

------
MR4D
Compression.

That one word undermines his whole point of view. The idea that we need
orders and orders of magnitude more data seems insane. What we need is to
figure out how to be more effective with each layer of data, and to have
compression between the tensor layers.

The brain does a great job of throwing away information, and yet we can
reconstruct pretty detailed memories. Somehow I find it hard to believe that
the data we need is orders of magnitude beyond where we are today. Much more
efficient, yes. And that comes through compression.

~~~
MR4D
Crap. I just realized why this got voted down - I posted my comment on the
wrong article.

I guess that's what I get after walking away for 30 minutes before posting.
Doh!

