
A Theory of Sequence Memory in Neocortex - eaguyhn
https://blog.acolyer.org/2017/10/23/why-neurons-have-thousands-of-synapses-a-theory-of-sequence-memory-in-neocortex/
======
iandanforth
One of the key points of this theory is that its learning rule is
_Hebbian_ - a variation of "what fires together wires together."
Connections at individual synapses are made or broken based on whether
the input neuron's firing immediately precedes the firing of the next
neuron in the chain.

This is distinctly different from, say, a recurrent neural network, where
all neurons produce outputs at each timestep and weights are updated based
on each weight's derived contribution to the quality of a final output.

The first rule is very local, in both time and space, and the second is
in some sense "global."

Backprop has obviously had _much_ more success in ML than any Hebbian
learning rule. However, we're pretty sure biological learning is
essentially Hebbian. Reconciling these two rules is one of the most
interesting areas of research right now.
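
To make the contrast concrete, here's a toy sketch of the two update
styles (my own illustration, not code from the paper; shapes and learning
rates are made up):

    import numpy as np

    # Local, Hebbian-style rule: a synapse strengthens when its presynaptic
    # cell was active on the previous timestep and its postsynaptic cell
    # fires now. Nothing beyond the two cells is consulted.
    def hebbian_update(w, pre_prev, post_now, lr=0.01):
        return w + lr * np.outer(post_now, pre_prev)

    # "Global", backprop-style rule: every weight moves according to the
    # derivative of a final output loss with respect to that weight.
    def backprop_update(w, x, target, lr=0.01):
        y = np.tanh(w @ x)
        grad = np.outer((y - target) * (1 - y ** 2), x)  # chain rule through tanh
        return w - lr * grad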

~~~
tom_wilde
Sounds interesting - any papers that I/we should read?

~~~
iandanforth
I might start with this paper from Bengio et al. and then read the more
recent papers that cite it. The problem hasn't been solved, though, so
don't be surprised if you're not fully convinced by the arguments :)

[https://arxiv.org/pdf/1502.04156.pdf](https://arxiv.org/pdf/1502.04156.pdf)

------
zackmorris
To distill down the article:

Most of what we're seeing now with neural networks was pretty well understood
20+ years ago. The hard part is forming hierarchies or clusters of networks
that are only active in certain scenarios, so that we can scale learning
beyond simple pattern recognition (which is effectively a solved problem
today).

I've read very few satisfactory articles on how to teach large neural
nets multiple distinct patterns and have them know which sub-networks to
engage in different scenarios. This article suggests that dendrites form
that orchestration layer: a dendrite waits for enough specific inputs to
arrive in close enough proximity that the neuron becomes engaged, and the
neuron goes to work when the rest of the pattern arrives. This gives the
network some ability to predict what is coming, whereas the simple neural
networks we model only fire when the whole pattern matches.
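
A minimal sketch of how I read that mechanism (the names and the
threshold are mine, not the paper's):

    # A cell becomes "predictive" when any of its distal dendritic segments
    # sees enough active presynaptic cells at once; predictive cells then
    # get a head start when the rest of the pattern arrives.
    SEGMENT_THRESHOLD = 8

    def is_predictive(cell_segments, active_cells):
        # cell_segments: list of sets of presynaptic cell ids, one per segment
        return any(len(segment & active_cells) >= SEGMENT_THRESHOLD
                   for segment in cell_segments)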

There is a bit of hand-waving here and I probably missed something, so
please add any insights of your own - thanks!

------
amthewiz
I am working on a hierarchical predictive spatio-spectro-temporal pattern
model, which uses convolution-like detectors over the time and frequency
of spikes, where each spike indicates a pattern 'match'. Patterns evolve
over time from weak preferences that are non-specific (in space, time,
and frequency) to strong, specific ones. I think this approach has the
potential to become the substrate on which we could build general AI.

I am very curious to know what labs/companies are using this approach.

~~~
nairboon
Could you point to some papers about that model?

~~~
amthewiz
No papers yet - still in the (early) implementation/experimentation
phase. I'm currently surveying research on the minimal
sentience/consciousness exhibited by animals and the corresponding
neural correlates. If there is enough data to isolate the functional
basics of minimal consciousness, I would like to implement that as the
first "function" of the HTM model.

See this blog post for why I chose consciousness as the primitive to
build (instead of, say, demoing the model's characteristics on common
tasks like MNIST digit recognition):
[https://medium.com/creating-artificial-consciousness/the-case-for-artificial-consciousness-1aed97ba1670](https://medium.com/creating-artificial-consciousness/the-case-for-artificial-consciousness-1aed97ba1670)

~~~
tmzt
Something like a small rodent or sufficiently intelligent cockroach
wandering around in OpenAI?

------
rand_r
> The neuronal model used in most artificial intelligence networks contains
> few synapses and no dendrites.

Does anyone understand why this is true?

In a typical multi-layer network, doesn't each node in a lower layer
connect to every node in a higher layer? All the {L(i, n-1), L(a, n)}
edges going from the nodes in layer n-1 to a particular node (a) in layer
n would constitute a dendrite.

~~~
ewjordan
In this article they're using the word "dendrite" to mean a bit more than
just an input to a neuron (which does exist in NN models) - one of the
often-criticized simplifications of most neural net models is that they
don't account for the structure and function of the entire dendritic
tree. Specifically, in the image you're referencing, the article points
out that synapses very near the soma directly cause action potentials,
whereas synapses further out cause some depolarization but don't actually
make the cell fire.

The way some authors explain it is that you can think of a neuron as
_really_ having an entire neural net embedded inside of it (based on the
structure and function of the dendritic tree), one that does a
non-trivial amount of information processing even before you consider the
connectivity of neurons to each other. Exactly how non-trivial that is is
a deeper question, but it's worth noting that these structures are
extremely plastic, changing over timescales of seconds to minutes, so
it's not hard to imagine that these details are significant.
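
The "net within a neuron" view can be sketched in a few lines (an
illustration of the idea, not a biophysical model): each dendritic branch
applies its own nonlinearity before the soma combines the branch outputs.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Each branch is a nonlinear subunit; the soma sums the subunit outputs.
    # A single cell therefore already behaves like a small two-layer network.
    def two_layer_neuron(branch_weights, branch_inputs, soma_weights):
        branch_out = np.array([sigmoid(w @ x)
                               for w, x in zip(branch_weights, branch_inputs)])
        return sigmoid(soma_weights @ branch_out)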

~~~
rand_r
Ah, thanks. So another way of wording it would be to say that the activation
function of nodes in a neural network is a gross oversimplification of the
amount of logic that happens via dendrites in actual neurons.

------
bobosha
This theory is very intriguing and intuitively makes sense. I read
Hawkins' book way back in 2004, but there haven't been any successful
implementations by Numenta - mostly toy examples AFAICT. I would love to
be wrong and see some great applications come out of this (much like the
AI spring from 2011 on, using conv nets etc.)

~~~
shepardrtc
I think cortical.io has had success with Numenta.

I also read Hawkins' book and loved it. I'm still hoping that they'll get
something going.

~~~
godelmachine
Do we need a background in neurobiology to understand Hawkins' "On
Intelligence"?

~~~
boothead
No - it's actually pretty straightforward. I think I should read it again :-)

------
RingwormOne
A very interesting model. It aligns with the general principle of biology
that biological systems should do no more work than necessary. The
robustness to cell death exhibited in figure B near the end is
reminiscent of our own brain's robustness.

~~~
Gibbon1
Knowing not much, I wonder if randomly disconnecting neurons during training
would make the resulting network more robust.
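
For what it's worth, that idea exists in deep learning under the name
"dropout" - a minimal sketch, assuming a plain activation vector:

    import numpy as np

    # Silence a random subset of units on every training pass; the network
    # can't lean on any single unit, which tends to improve robustness.
    def dropout(activations, p=0.5, training=True):
        if not training:
            return activations
        mask = np.random.rand(*activations.shape) >= p
        return activations * mask / (1.0 - p)  # rescale to keep the expected value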

------
canjobear
I'm not convinced that the function of the brain is prediction: it seems that
prediction is subsidiary to defining an action policy.

Prediction is only useful to an organism insofar as it allows it to
calculate the expected future utility of actions. And those actions are
the only things that lead to different outcomes in terms of evolutionary
fitness; thus the actions ultimately output by the brain are the only
things that determined its evolution.

If the purpose of prediction is to calculate expected future utility of
different actions, then it does not follow that a _general-purpose_ prediction
device will be useful, because the prediction device might use all its energy
predicting aspects of the environment along dimensions that are irrelevant to
utility. A useful prediction device would only predict along useful
dimensions, and may be very different in behavior from a generalized
autoencoder. As an example of this difference as it comes up in the field of
deep learning, consider the case of machine translation: you could either
train a sequence model (LSTM or whatever) to autoencode sequences---i.e., to
be able to predict them---and then use the resulting representations to do
translation, or you could train end-to-end where the objective function is
translation quality. It turns out the latter yields better results.
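
To make that contrast concrete, a toy version of the two objectives
(shapes, modules, and losses here are placeholders, not either real
system):

    import torch
    import torch.nn as nn

    enc = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
    dec_recon = nn.Linear(64, 32)  # reconstruct the source (autoencoding)
    dec_trans = nn.Linear(64, 32)  # produce the target (translation)

    src = torch.randn(8, 10, 32)   # batch of source sequences
    tgt = torch.randn(8, 10, 32)   # batch of target sequences
    h, _ = enc(src)

    # (1) general-purpose prediction: learn to reproduce the input,
    #     then reuse the frozen representation for translation
    loss_autoencode = ((dec_recon(h) - src) ** 2).mean()
    loss_reuse = ((dec_trans(h.detach()) - tgt) ** 2).mean()

    # (2) end-to-end: gradients shape the encoder for translation only
    loss_end2end = ((dec_trans(h) - tgt) ** 2).mean()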

Maybe the brain, or part of the brain, is a prediction engine and another part
does action selection based on the predictions. But then why would we identify
intelligence with the prediction engine part rather than the two parts
combined? Searching for an optimal action is a very different task from
prediction; it seems you need both to have what anyone would call
intelligence.

~~~
jamesrcole
There's a danger here in getting too caught up in words (where what
they're trying to capture is what matters), but can't "searching for an
optimal action" accurately be seen as a kind of prediction?

------
daveguy
I think that Hawkins has a great summary of some things that are
necessary for intelligence [0]. Personally, I think the tie between motor
actuation and sensory detection is one of the most important - I did a
project on that in 2010. The motor-sensory link is essentially how all
intelligence evolved. Even C. elegans has neurons to determine whether it
should wiggle one way or another.

On the surface, he seems to be cramming a whole lot of "solutions" into
his HTM (hierarchical temporal memory). HTM is an interesting
implementation and the sparse coding is definitely a benefit. However, I
think he is focused too much on his baby and not on other techniques that
may fulfill the necessary components more efficiently.

That is just on the surface. Given the product/research balance, maybe we
just aren't seeing all the cool things going on underneath in research,
but it does seem like that research will be shoe-horned into HTM whether
or not it is the best architecture.

[0]
[https://www.youtube.com/watch?v=4y43qwS8fl4&app=desktop](https://www.youtube.com/watch?v=4y43qwS8fl4&app=desktop)

Starting at ~8m20s

------
mannigfaltig
_> By firing earlier it inhibits neighboring cells, creating highly sparse
patterns of activity for correctly predicted inputs._

This part is so vague. It seems to lack an explanation of how
interneurons inhibit other neurons nearby. Also, wouldn't sparsity occur
even without the early firing enabled by distal pattern matching?

 _> When relatively few neurons are active relative to the population, then
such pattern recognition is robust._

Why?

~~~
scottmp10
It's true that the columnar inhibition is not the best-supported part of
the theory, but it seems a very strong deduction from many papers that
don't test it directly but show the same behavior this part of the theory
describes. We know that minicolumns exist and that cells in minicolumns
share receptive fields. But they aren't always active together, and HTM
theory provides a hypothesis for why. This very recent preprint out of
Michael Berry's lab at Princeton shows almost exactly the behavior that
HTM sequence memory would predict, and I'm not aware of other theories
that would have predicted it:

[https://www.biorxiv.org/content/early/2017/10/03/197608](https://www.biorxiv.org/content/early/2017/10/03/197608)

As for sparsity supporting robust pattern recognition, this paper details
the math behind it:

[https://arxiv.org/abs/1503.07469](https://arxiv.org/abs/1503.07469)
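
The gist of that math, as I understand it (my own toy version of the
calculation, with made-up sizes): with only a handful of cells active out
of thousands, the chance that a random pattern overlaps a stored one in
enough places to count as a match is vanishingly small.

    from math import comb

    # Probability that a random pattern of w active cells (out of n)
    # overlaps a stored pattern of a active cells in at least theta places.
    def false_match_prob(n, a, w, theta):
        return sum(comb(a, b) * comb(n - a, w - b)
                   for b in range(theta, min(a, w) + 1)) / comb(n, w)

    # e.g. 2048 cells, 40 active, match threshold 20 -> on the order of 1e-26
    print(false_match_prob(2048, 40, 40, 20))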

