
Differentiable Plasticity: A New Method for Learning to Learn - myhrvold
https://ubere.ng/2qlFZHy
======
no_identd
"What if the plasticity of the connections was under the control of the
network itself, as it seems to be in biological brains through the influence
of neuromodulators?"

Anyone who wishes to explore this idea would do well to go back to the basics
of neural nets and read Warren McCulloch's seminal papers on neural nets, from
the 40s:

[http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf](http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf)
A Logical Calculus of the ideas immanent in nervous activity

[http://vordenker.de/ggphilosophy/mcculloch_heterarchy.pdf](http://vordenker.de/ggphilosophy/mcculloch_heterarchy.pdf)
A heterarchy of values determined by the topology of neural nets

(After having read those two papers, one can then try to make sense of Heinz
von Förster's masterpiece,
[http://www.univie.ac.at/constructivism/archive/fulltexts/127...](http://www.univie.ac.at/constructivism/archive/fulltexts/1270.pdf),
Objects: Tokens for (Eigen-)Behaviors, which also bears some relevance to this
matter. However, most people find it incomprehensible.)

~~~
nairboon
Thank you for those papers! Do you have any further suggestions that build
upon Förster's ideas?

~~~
no_identd
Try anything citing von Förster's paper that was written by:

• Louis H. Kauffman if you're mostly interested in the pure math aspect of it.
He wrote numerous papers on the topic of von Förster's paper; you can find one
of them here: [https://arxiv.org/abs/1109.1892](https://arxiv.org/abs/1109.1892)

• JM Stern and CAB Pereira if you're mostly interested in the application to
statistics and fundamental questions of the epistemology of statistics. I
wrote a thread on their works on my Twitter account at some point:
[https://twitter.com/no_identd/status/877883663014400000](https://twitter.com/no_identd/status/877883663014400000)

Try this AMAZING paper by AF Zimpel:
[http://www.emeraldinsight.com/doi/pdf/10.1108/03684920510581...](http://www.emeraldinsight.com/doi/pdf/10.1108/03684920510581701)

Also, this paper by Heinz von Förster (first order author) and Karl H. Müller
(second order editor), where Müller basically took a lot of old papers by von
Förster and rearranged them to give a more coherent view for certain types of
readers:

[http://www.univie.ac.at/constructivism/archive/fulltexts/309...](http://www.univie.ac.at/constructivism/archive/fulltexts/3098.pdf)

------
dpflan
Interesting. Some highlighted links from the writeup:

1\. _Differentiable plasticity: training plastic neural networks with
backpropagation_
([https://arxiv.org/abs/1804.02464](https://arxiv.org/abs/1804.02464))

2\. _Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic
Artificial Neural Networks_
([https://arxiv.org/abs/1703.10371](https://arxiv.org/abs/1703.10371))

3\. Github for the project: [https://github.com/uber-common/differentiable-
plasticity](https://github.com/uber-common/differentiable-plasticity)

4\. _Learning to Learn_ ([http://bair.berkeley.edu/blog/2017/07/18/learning-
to-learn/](http://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/))

5\. Meta-Learning: [http://metalearning.ml/](http://metalearning.ml/)

------
trextrex
Very cool. It's interesting how powerful the recurrent network becomes with
the addition of the learned hebbian term. For context, even without the
Hebbian term, recurrent networks can learn to learn to do quite interesting
things (Hochreiter et al. 2001).

Shameless plug -- our lab recently ported LSTMs to spiking networks without a
significant loss in performance, and showed that learning to learn works quite
well even with spiking networks (Bellec et al. 2018).

So it seems like this method of learning to learn could provide an extremely
biologically realistic and fundamental paradigm for fast learning. The
addition of the Hebbian term neatly fits in with this paradigm too.

Hochreiter et al. 2001:
[http://link.springer.com/chapter/10.1007/3-540-44668-0_13](http://link.springer.com/chapter/10.1007/3-540-44668-0_13)

Bellec et al. 2018:
[https://arxiv.org/abs/1803.09574](https://arxiv.org/abs/1803.09574)

------
dchichkov
It’d be interesting to compare this approach against a simpler baseline:
setting a _different_ (10 – 100 times higher?) learning rate for a _fraction_
(10% ?) of neurons in an LSTM.
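A toy sketch of what that baseline amounts to (plain Python, made-up numbers;
in practice one would use per-parameter-group learning rates in the actual
optimizer): two groups of weights minimize the same loss with plain SGD, one
at a 10x higher learning rate, and the high-rate group adapts much faster.

```python
# Hypothetical sketch of the proposed baseline: give a fraction of the
# parameters a much higher learning rate and compare how quickly each
# group adapts to a target. Toy scalar example, not an actual LSTM.

def adapt(lr, target=1.0, steps=20):
    """Minimize (w - target)^2 from w = 0 with plain SGD at rate lr."""
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)
        w = w - lr * grad
    return w

slow = adapt(lr=0.01)  # the "normal" 90% of neurons
fast = adapt(lr=0.1)   # the 10% with a 10x higher learning rate
print(slow, fast)      # the fast group ends up much closer to the target
```

Whether this recovers the fast-adaptation behavior of the learned Hebbian
term is exactly the open question the comment raises.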

~~~
fizx
Interesting... I mean, knockout already does this (sort of) by setting the
learning rate of 10% of the neurons artificially low. I'm not sure if
inverting the usual approach is useful, but it might be fun to try.

------
letitgo12345
Is the plasticity update guaranteed to reach equilibrium assuming the network
is run on iid data (i.e., do the H_{ij} values reach a fixed point)?

Edit: It seems like it should be reached eventually, since the equilibrium
point is H_{ij} = y_i * y_j and they keep taking a weighted average of the
former with the latter (this is not a proof, of course, since y_i * y_j keeps
changing with each sample).
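That intuition can be checked numerically: with iid activities, the trace
H never hits an exact fixed point sample-by-sample, but it hovers around
E[y_i * y_j] with fluctuations on the order of the decay rate. A small
simulation (eta and the uniform activities are made-up illustrative choices):

```python
import random

random.seed(0)
eta = 0.05   # decay / replenish rate of the Hebbian trace
H = 0.0
samples = []
for t in range(5000):
    # iid pre/post activities, uniform on [0, 1), so E[y_i * y_j] = 0.25
    y_i = random.random()
    y_j = random.random()
    # weighted average of the old trace with the current product
    H = (1.0 - eta) * H + eta * (y_i * y_j)
    samples.append(H)

# H keeps jittering, but its long-run average sits near E[y_i * y_j] = 0.25
tail = samples[-1000:]
print(sum(tail) / len(tail))
```

So "equilibrium" here means a stationary distribution around the mean
product, not a literal fixed point.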

------
adrianratnapala
So the "plastic component" of a connection strength is a thing which decays
away exponentially, but is replenished whenever the two endpoints do the same
thing.
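That reading of the rule can be sketched in a few lines (illustrative numbers;
the effective weight being a fixed part plus alpha times the Hebbian trace
follows the paper's formulation, everything else here is a toy):

```python
# One plastic connection: effective weight = w + alpha * hebb, where the
# trace hebb decays toward zero at rate eta and is replenished whenever
# the two endpoint activities agree ("fire together, wire together").

eta = 0.1    # decay / learning rate of the plastic trace
w = 0.5      # fixed component (the part trained slowly by backprop)
alpha = 1.0  # learned plasticity coefficient
hebb = 0.0   # plastic component, starts neutral

def step(y_pre, y_post, hebb):
    # the trace is a running average pulled toward y_pre * y_post
    return (1.0 - eta) * hebb + eta * (y_pre * y_post)

# endpoints repeatedly active together: the trace builds up ...
for _ in range(30):
    hebb = step(1.0, 1.0, hebb)
strengthened = w + alpha * hebb

# ... then, with no correlated activity, it decays back toward w alone
for _ in range(30):
    hebb = step(0.0, 0.0, hebb)
relaxed = w + alpha * hebb

print(strengthened, relaxed)
```

The twist in the paper is that w, alpha (and optionally eta) are themselves
trained by backpropagation, which is what makes the plasticity
"differentiable".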

I have heard that neuroscientists have an adage: "neurons that fire together
wire together". Is that all that ML people mean by "plasticity"?

------
signa11
very cool stuff! it might also be possible to use this for pruning edges
which are not very plastic.

------
whatever1
Good luck with getting even more suboptimal solutions with this extra
non-linearity.

No wonder that when your autonomous cars plow into people or walls you have
no clue what is going on.

~~~
atomical
Seems like a LIDAR problem.

~~~
whatever1
Teslas do not have lidars. They still drive happily towards stationary
objects.

Because somehow we accepted that very bad probabilistic solutions to a very
tough problem are good enough.

~~~
petters
Holmes: You have a Lidar problem.

Tesla: But we don't use Lidars!

Holmes: That's the problem.

~~~
ben_w
Humans don’t have lidar. If machine vision isn’t good enough to drive a car at
human levels without lidar, _then it isn’t good enough yet_.

(IIUC, it is superhuman overall but with some really dumb edge cases where
humans go “well that was obviously wrong, why didn’t it see the thing?”)

~~~
Firadeoclus
But that's only because our edge cases and the machine vision edge cases don't
match, since the underlying concepts are different.

And frankly, we probably don't want them to match. The main goal should be to
make the cars drive better than humans on average, even if they're not
perfect. And we know that there will always be edge cases.

~~~
ben_w
I agree they should not match; however, I think it is also important that
people trust the tech, which in turn requires it to not make any mistakes we
would consider obvious. People are _really_ bad at estimating risk, a problem
which I think will be easier to overcome this way.

A system which makes exactly the same mistakes as a competent human but never
gets tired, never uses the phone, has 360 degree vision with no blind spots…
even that would be a huge improvement over the status quo. On the other hand,
a system which crashes because it missed something Joe Average calls obvious
when they see the black box pictures on the news… that system will never be
trusted enough to replace human drivers, not even when it has a tenth of the
fatality rate.

