
Theories of Error Back-Propagation in the Brain - nopinsight
https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30012-9
======
GChevalier
> "The relationship of spike-time-dependent plasticity to other models
> requires further clarifying work"

\- I got something for that, in fact I think I discovered that backpropagation
engenders STDP:

[https://github.com/guillaume-chevalier/Spiking-Neural-Network-SNN-with-PyTorch-where-Backpropagation-engenders-STDP](https://github.com/guillaume-chevalier/Spiking-Neural-Network-SNN-with-PyTorch-where-Backpropagation-engenders-STDP)
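
For anyone curious what "backpropagating through spikes" even looks like, here is a generic surrogate-gradient sketch in PyTorch (not the repo's actual code; layer names and constants are illustrative only):

    import torch
    import torch.nn as nn

    class SpikeFn(torch.autograd.Function):
        # hard threshold on the forward pass, smooth surrogate gradient on the backward pass
        @staticmethod
        def forward(ctx, v):
            ctx.save_for_backward(v)
            return (v > 0).float()

        @staticmethod
        def backward(ctx, grad_out):
            v, = ctx.saved_tensors
            surrogate = 1.0 / (1.0 + v.abs()) ** 2   # stand-in for d(spike)/dv
            return grad_out * surrogate

    class LIFLayer(nn.Module):
        def __init__(self, n_in, n_out, decay=0.9, threshold=1.0):
            super().__init__()
            self.fc = nn.Linear(n_in, n_out)
            self.decay, self.threshold = decay, threshold

        def forward(self, x_seq):                    # x_seq: (time, batch, n_in)
            v = torch.zeros(x_seq.shape[1], self.fc.out_features)
            spikes = []
            for x in x_seq:
                v = self.decay * v + self.fc(x)      # leaky integration of inputs
                s = SpikeFn.apply(v - self.threshold)
                v = v * (1 - s)                      # reset the units that fired
                spikes.append(s)
            return torch.stack(spikes)               # gradients flow through every step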

~~~
marmaduke
It was interesting to read how you did a spiking neural network in PyTorch, but
it seems your neurons' states are coupled continuously in time, whereas in the
brain it would be the opposite, i.e., the spike timing carries the information,
not the state values.

> backprop engenders STDP

This is backwards I think, but definitely an interesting association to make

------
rdlecler1
It’s interesting that most AI models focus on learning, but in biology the
heavy lifting was all done by developmental evolution and neurogenesis. You’re
not going to teach your dog relativity with backpropagation, yet we treat BP as
if it were going to solve all the problems in AI. Until we start focusing more
on neural architecture, I have no worries about AI taking over the world.

~~~
olooney
> You’re not going to teach your dog relativity with back propagation

What do you mean? RNNs can be taught to do math using BP[1]. Heck, you can do a
simple version yourself in less than an hour[2]. They can't do tensor calculus
yet (AFAIK), but there doesn't seem to be any reason why BP in particular would
be the stumbling block; the difficulty is finding the right representation.
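
As a toy illustration of the general claim (not the setup in [1] or [2]; all sizes and numbers here are made up), a tiny LSTM trained with plain backprop can learn to output the sum of two numbers fed to it as a sequence:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    class Adder(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):                  # x: (batch, 2, 1) -- the two addends
            out, _ = self.lstm(x)
            return self.head(out[:, -1])       # read the sum off the last time step

    model = Adder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(2000):
        a, b = torch.rand(256, 1), torch.rand(256, 1)   # addends in [0, 1)
        x = torch.stack([a, b], dim=1)                  # a length-2 input sequence
        loss = ((model(x) - (a + b)) ** 2).mean()
        opt.zero_grad()
        loss.backward()                                 # plain backpropagation
        opt.step()

    print(model(torch.tensor([[[0.3], [0.4]]])).item())  # close to 0.7 after training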

> yet we treat BP like it was going to solve all the problems in AI

Except for reinforcement learning, which does not use BP to solve the Bellman
equation[3], AlphaZero, which uses minimax[4], AutoML based on non-gradient
methods such as Bayesian optimization[5], etc. Research into CNNs and RNNs for
image, text, and voice processing does tend to focus on BP, but not
inappropriately so, considering that approach continues to make rapid progress
in those domains.
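
For instance, tabular Q-learning (roughly the simplest of the algorithms the [3] link covers) applies the Bellman update directly to a lookup table, with no backprop anywhere. A toy sketch on a made-up 5-state chain environment:

    import random

    random.seed(0)
    n_states, n_actions = 5, 2          # toy chain; actions: 0 = left, 1 = right
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma = 0.1, 0.9

    def env_step(s, a):
        # reward 1 only for stepping right out of the last state, then reset
        if a == 1:
            return (0, 1.0) if s == n_states - 1 else (s + 1, 0.0)
        return max(s - 1, 0), 0.0

    s = 0
    for _ in range(20000):
        a = random.randrange(n_actions)   # random exploration (Q-learning is off-policy)
        s2, r = env_step(s, a)
        # the Bellman update, applied directly to the table -- no gradients involved
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

    print(Q)   # right-moving actions should end up with the higher values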

> Until we start focusing more on neural architecture I have no worries about
> AI taking over the world.

I can't tell if you're arguing for or against. Are you suggesting that we
_should_ focus more on "neural architecture" _so that_ AI can take over the
world? Or are you approving of the current narrow focus for reasons of safety?

[1] [https://arxiv.org/pdf/1809.08590.pdf](https://arxiv.org/pdf/1809.08590.pdf)

[2] [https://machinelearningmastery.com/learn-add-numbers-seq2seq-recurrent-neural-networks/](https://machinelearningmastery.com/learn-add-numbers-seq2seq-recurrent-neural-networks/)

[3] [https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287](https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287)

[4] [https://www.depthfirstlearning.com/2018/AlphaGoZero](https://www.depthfirstlearning.com/2018/AlphaGoZero)

[5] [https://en.wikipedia.org/wiki/Bayesian_optimization](https://en.wikipedia.org/wiki/Bayesian_optimization)

~~~
YeGoblynQueenne
>> RNN's can be taught to do math using BP

The paper you cite, and every single paper on artificial neural nets learning
to "do maths" or "do arithmetic" etc. that has ever been published, only shows
neural nets learning[1] the results of specific operations between numbers up
to a certain value. There is no work that shows neural nets learning to do
arbitrary arithmetic operations on arbitrary numbers.

To put it very clearly: neural nets can't learn to "do maths" in any general
sense.

Neural nets are infamously incapable of generalising beyond their training
dataset and the OP is very reasonably skeptical of comparisons between
artificial and natural neural networks, given that only the latter are known
to be able to generalise from few examples seen very few times.

______________

[1] As in overfitting to.

~~~
kahnjw
>Neural nets are infamously incapable of generalising beyond their training
dataset

In what sense? DNNs can certainly generalize to examples outside the training
set. I agree that you get no guarantees on performance for samples drawn from a
distribution different from the training/test set, but not having guaranteed
expected performance isn't the same as being "incapable" of generalizing to a
new distribution.

~~~
YeGoblynQueenne
Sorry for the late reply.

To clarify, when I say "training dataset" I mean the entire dataset used for
training. Not the training partition in a cross-validation training/test
split.

Neural nets can interpolate between the data points in their training dataset,
but cannot extrapolate to cover data points outside this dataset. This is not
a matter of architecture. DNNs are no exception. Neural nets are just very bad
at generalising.
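
A quick toy illustration (made-up numbers, nothing to do with the blog post linked below): fit a small MLP to y = x² on [-1, 1], then query it well outside that range:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)

    for _ in range(2000):
        x = torch.rand(256, 1) * 2 - 1           # training data only covers [-1, 1]
        loss = ((net(x) - x ** 2) ** 2).mean()   # target function: y = x^2
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(net(torch.tensor([[0.5]])).item())     # close to 0.25 (interpolation)
    print(net(torch.tensor([[5.0]])).item())     # nowhere near 25 (extrapolation)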

This lack of generalisation ability is why neural nets need to be trained with
huge amounts of data, the more the better. Because they can't generalise, the
only way to get them to recognise more instances of a class is to show them
more examples of that class.

Here's a longer discussion of this:

[https://blog.keras.io/the-limitations-of-deep-learning.html](https://blog.keras.io/the-limitations-of-deep-learning.html)

------
mjpuser
I have been on similar kicks drawing similarities between the two, but it never
pans out to anything useful, i.e., how does this move the science forward?

I can play devil's advocate (aka classic Hacker News commenter) and ask: what
about inhibitory neurons? Tonic neurons? What about the brain's ability to
recall a name after hearing it once? Procedural memory? Does the relationship
you point out help you understand any of these things?

------
subroutine
As a neurobiologist, this interests me. I'm leaving this comment because I
don't have time to read the article right now, but I want to come back and see
what HN thinks about it. Briefly, and naive to the content of the article, here
is my concern. Consider this network architecture...

You have 3000 input neurons, each of which makes ~10 synaptic connections onto
many downstream neurons, but let's focus on just one of those downstream
neurons. This downstream neuron thus has ~30k synaptic connections along its
dendrites. At some time t its cell body (specifically the axon hillock area)
receives enough graded electrical potential from some number of those synapses
to fire an 'action potential' (a single electrical impulse). Some of those
inputs contributed bursts of weak signals, some may have contributed several
strong inputs; some of the inputs were from synapses on very distal regions of
the dendrite (therefore the graded signal was significantly weakened by the
time it reached the cell body), some were from synapses very near the cell
body. But they all integrate at the cell body, at which point it doesn't matter
where they came from - that they've arrived is all that matters now - and
together their strength is enough to evoke an action potential.
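
To put toy numbers on that integration step (a rough sketch, not a biophysical model; all constants are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    n_synapses = 30_000
    weights = rng.exponential(0.01, n_synapses)       # synaptic strengths
    attenuation = rng.uniform(0.2, 1.0, n_synapses)   # distal synapses arrive weakened

    v_rest, v_threshold, leak = -70.0, -55.0, 0.9     # toy membrane parameters
    v = v_rest

    for t in range(200):
        fired = rng.random(n_synapses) < 0.01          # which inputs are active this step
        drive = np.sum(weights * attenuation * fired)  # once summed, origin no longer matters
        v = v_rest + leak * (v - v_rest) + drive       # leaky integration at the soma
        if v >= v_threshold:
            print(f"t={t}: action potential")
            v = v_rest                                 # fire and reset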

Let's say this action potential impulse resulted in an error of some sort,
somewhere downstream in the network. If you're playing by the rules of
supervised learning with backprop, the synapses that evoked the signal
producing the error impulse should be made weaker. How? In biological NNs this
is the impossible question. Signals that could target individual synapses do
not propagate back up axons, through cell bodies, up dendrites, and out to the
synapses (there is slow signaling for homeostatic scaling, but that is thought
to simply scale input up or down across all synapses).

This means you would need another group of neurons to send error signals back.
Maybe such recurrent projections exist; say they did, they would need to (1)
form a third-party connection at every synapse and (2) know which synapses were
to blame for the error. Neither seems like a trivial phenomenon from a
neurophysiology perspective. I've read theories based on local molecular tags
or peptide synthesis, but they all seemed very hand-wavy. Anything that has
gained traction on those hypotheses hasn't stood up to scrutiny.

The simplest theory to me is that cortical networks at the scale of human
brains don't explicitly 'forget'. That is, they only learn. When you call
someone Bob and they correct you, saying their name is Mike, your neural nets
don't erase Bob and replace it with Mike. They remember the old name, the new
name, the embarrassing situation, all of it. Biological neural nets don't need
to explicitly forget because forgetting is (unfortunately?) a byproduct of
being a biological entity. Also, neurons have finite resources (they are
essentially zero-sum), so you simply will not see infinite run-up of synaptic
strengths (which you might in some artificial NNs without setting certain
hyperparameters). Together, this suggests that recency is a fairly dominant
factor when it comes to associative learning (and what gets forgotten) in
biological NNs.

~~~
terminalhealth
> recurrent projections would need to (1) form a 3rd party connection at every
> synapse and (2) know which synapses were to blame for the error

The most striking result in this regard is that one can get backprop-like
learning with random backprojections. It works because, on average, the
backprojected error vector ends up less than 90° away from the true error
vector (which is good enough for hill climbing), and the dynamics play out such
that the learned weights adjust to the random projections:

[https://www.nature.com/articles/ncomms13276](https://www.nature.com/articles/ncomms13276)
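
Roughly, the mechanism looks like this (a toy two-layer sketch with made-up sizes, not the paper's actual experiments): the backward pass uses a fixed random matrix B where exact backprop would use the transpose of the forward weights:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 20, 64, 5
    W1 = rng.normal(0, 0.1, (n_hidden, n_in))
    W2 = rng.normal(0, 0.1, (n_out, n_hidden))
    B = rng.normal(0, 0.1, (n_hidden, n_out))      # fixed random backprojection

    T = rng.normal(0, 1.0, (n_out, n_in))          # random linear "teacher" to learn

    lr = 0.02
    for step in range(5000):
        x = rng.normal(0, 1.0, (n_in, 1))
        h = np.tanh(W1 @ x)
        y = W2 @ h
        e = y - T @ x                              # output error
        # exact backprop would use W2.T @ e; feedback alignment uses B @ e instead
        delta_h = (B @ e) * (1 - h ** 2)
        W2 -= lr * e @ h.T
        W1 -= lr * delta_h @ x.T
        if step % 1000 == 0:
            print(step, float(np.mean(e ** 2)))    # error still shrinks over training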

That being said, it does seem to be the case that the brain simply memorizes an
awful lot, which must work by some mechanism other than backprop, because
backprop cannot do one-shot learning. I think one-shot learning is how the
brain gets past large discontinuities in the model fitness landscape: it can
learn linguistic and logical rules, fragments of general computations, and
facts that are discovered by exploration (which includes learning whether it
was Bob or Mike) and passed on culturally. The brain basically outsources the
problem of tunneling through large discontinuities to cultural/individual
trial-and-error and episodic memory. The greatest consequence is that these
bodies of knowledge can concern the improvement of the organization of
knowledge itself, resulting in a positive feedback loop in model fitness
(especially science/Bayesian updating).

Such bodies of knowledge still evolve under a soft constraint of
learnability-by-hillclimbing, which implies they often form a neat latent space
where similar codes are organized to belong to similar
meanings/representations. This can easily be learned by stochastic hillclimbing
(repetition), because each time the brain processes related information, it is
nudged towards the latent space that is meant to be learned. Many parts of the
world happen to be learnable in this way because everything is kinda smooth and
continuous: small causes tend to have small effects, as everything consists of
a myriad of small particles that affect each other in smooth ways if you squint
at them. Though obviously not everything can be learned this way (implying
large discontinuities), which is where brute memorization based on reward and
punishment comes in handy.

~~~
marmaduke
> brain simply memorizes an awful lot which must work by a different mechanism
> besides backprop because backprop cannot do one-shot learning

If you look through a neuroscience textbook section on memory systems, it's
commonly suggested that the hippocampus does the one shot learning and
transfers that over time to the cortex. This is backed up by clinical case
studies.

> The brain basically outsources the problem of tunneling through large
> discontinuities, to cultural/individual trial-and-error and episodic memory

That seems like a good strategy. It also reminds me of AlphaGo's Monte Carlo
search + neural network training setup. Since the search is non-differentiable,
you do lots of simulations and fit a differentiable DL model to the results to
approximate a possibly discontinuous landscape.
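
Loosely, in code (the "search" below is just a made-up noisy simulator standing in for MCTS): run non-differentiable simulations to get value estimates, then fit a differentiable model to those results:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    def simulated_value(state, n_rollouts=100):
        # stand-in for the non-differentiable search: noisy rollouts of a
        # discontinuous underlying value function
        returns = torch.sign(torch.sin(5 * state)) + 0.1 * torch.randn(n_rollouts)
        return returns.mean().item()

    states = torch.rand(512, 1) * 2 - 1
    values = torch.tensor([[simulated_value(s)] for s in states])   # search results

    net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(2000):
        loss = ((net(states) - values) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    # net is now a smooth, differentiable stand-in for the landscape the simulations explored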

~~~
terminalhealth
> If you look through a neuroscience textbook section on memory systems, it's
> commonly suggested that the hippocampus does the one shot learning and
> transfers that over time to the cortex. This is backed up by clinical case
> studies.

HC's role in episodic memory and consolidation via dreams seems kinda
plausible, though I would not put much weight on it. I think dreams are a way
of training a GAN-like discrimination between reality and imagination:

[http://gershmanlab.webfactional.com/pubs/GenerativeAdversari...](http://gershmanlab.webfactional.com/pubs/GenerativeAdversarialBrain.pdf)
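
Very loosely, the analogy in code (a generic minimal GAN on 1-D data, not the model from the linked paper; all names and numbers made up): a "generator" plays the role of dreaming/imagination and a discriminator learns to tell its samples from real experience:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))   # "imagination"
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # "reality check"
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(3000):
        real = torch.randn(64, 1) * 0.5 + 2.0        # "experience": samples from N(2, 0.5)
        fake = G(torch.randn(64, 4))                 # "dreamed" samples

        # discriminator learns to tell experience from imagination
        d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # generator learns to make its samples pass for experience
        g_loss = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()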

Repetition of any kind likely does improve the model, even if it's merely
simulation/dreaming.

> AlphaGo's Monte Carlo search + neural network

I think, in effect, MCTS amounts to something like bagging/boosting/mixture of
experts, as it computes a weighted average of the predictions when exploring
different branches. But sure, the search mechanism implements a function which
a recurrent neural network could probably not discover, as it hides behind
substantial discontinuities in the fitness landscape (it's not a structure you
can uncover step by step; you immediately need a tree structure, a search
recursion, etc.). The RNN would likely need to conceptualize the search process
(subvocally but) linguistically, like humans do, which requires structure for
the sequential composition of stable prototypes (symbols), which likely
requires a one-shot sequential memory. I think even the human mind does not
literally do MCTS (that would require an overhead the brain is just not capable
of), but some heuristic approximation thereof. The brain can simulate MCTS by
linguistic means, though, even if it's just words of wisdom like "take counsel
with your pillow", which literally means: explore the hypothesis space some
more and let temporal-difference backups improve the value estimates.

------
londons_explore
"We're guessing at plausible ways the brain might work that resemble the
machine learning world, but have no data to back it up."

~~~
ahartmetz
Backpropagation feels like a pretty gross hack to me anyway. It is a powerful
component that still needs a large amount of additional hand-engineering to do
more than recognize shapes it has seen before. And it's quite mathematical,
which seems kind of difficult to implement in a brain.

Edit: The article and code try to show how backpropagation might work given
neuron behavior. I'll leave discussing the quality of the model to biologists.

~~~
marmaduke
> And it's quite mathematical, which seems kind of difficult to implement in a
> brain.

Another commenter pointed out that you don't need perfect backprop; some random
backprojections are sufficient.

------
marmaduke
The problem with the brain, is that you find what you go looking for.

