
Attention and Memory in Deep Learning and NLP - dennybritz
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
======
MrQuincle
Early work on attention models was done by Itti, Koch, and Niebur [1,2]. It's
called "saliency," and I think Denny would consider this more along the lines
of what the concept of "attention" should be (considering his own
words/reservations about using the term). Koch is currently studying neural
correlates, Itti is still working on this topic, and Niebur is into the
neuroscience part of it (he's a nematode expert).

There is a lot of neuroscientific work on attention, really a lot! Overt and
covert attention. Microsaccades, very small eye movements, already have a
bunch of possible functional roles. Almost everything we know about the brains
of little kids comes from studying where they look and what they pay attention
to.

Structure-wise, attention models can be quite simple. The structure that is
often seen is a WTA (winner-take-all) network with subsequent serial
inhibition: the first winner is inhibited, so the next winner can come on
stage. This is the same system Baars has in his global workspace theory
[3]. It is also the same method used in plain RANSAC [4], a workhorse of
computer vision in which a consensus/voting scheme lets data points vote for
higher-level structures. When one structure is detected, votes for it are
removed, and the next most salient structure can be voted for. A minimal
sketch of that loop is below.
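
Here is that WTA-plus-serial-inhibition loop, assuming a precomputed saliency
map (the function name, grid size, and inhibition radius are my own toy
choices, not from any of the cited papers):

    import numpy as np

    def wta_with_serial_inhibition(saliency, n_winners=3, inhibit_radius=1):
        """Select the most salient locations one at a time.

        Winner-take-all picks the global maximum; serial inhibition then
        suppresses the winner's neighborhood so the runner-up can win next.
        The suppression step plays the same role as removing a detected
        structure's votes in a RANSAC-style pipeline.
        """
        s = saliency.astype(float).copy()
        winners = []
        for _ in range(n_winners):
            r, c = np.unravel_index(np.argmax(s), s.shape)  # winner-take-all
            winners.append((r, c))
            # serial inhibition: knock out the winner and its neighborhood
            s[max(r - inhibit_radius, 0):r + inhibit_radius + 1,
              max(c - inhibit_radius, 0):c + inhibit_radius + 1] = -np.inf
        return winners

    rng = np.random.default_rng(0)
    print(wta_with_serial_inhibition(rng.random((8, 8))))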

[1] http://ilab.usc.edu/bu/

[2] http://cns-alumni.bu.edu/~yazdan/pdf/Itti_etal98pami.pdf

[3] https://en.wikipedia.org/wiki/Global_Workspace_Theory

[4] https://en.wikipedia.org/wiki/RANSAC

------
andreyk
"I consider the approach of reversing a sentence a “hack”. It makes things
work better in practice, but it’s not a principled solution."

I had the same feeling about the bounding-box proposals (region guesses) that
were used to speed up object recognition with deep learning fairly recently.
Like the sliding-window approach, it is intuitive and works, but it also seems
quite inelegant, and a better approach should be possible. Visual attention
seems like it should work much better in the long term, so it is exciting
that the field has come to a point where it is being developed.
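
For concreteness, the sentence-reversal "hack" quoted above is just feeding
the source tokens to the encoder in reverse order, as in Sutskever et al.'s
seq2seq paper; a toy sketch (the function name is mine):

    def encode_source(tokens, reverse=True):
        """Order source tokens for a seq2seq encoder.

        Reversing the source shortens the distance between the first
        source words and the first target words, which empirically
        makes optimization easier -- that is the whole trick.
        """
        return list(reversed(tokens)) if reverse else list(tokens)

    print(encode_source(["the", "cat", "sat"]))  # ['sat', 'cat', 'the']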

------
zappo2938
I'm curious which people on Hacker News are interested in this field.

~~~
har777
Why wouldn't they be excited? It's an exciting field. I'm sure plenty of
people are learning about ML in their free time.

~~~
zappo2938
This post had 75 upvotes and only 1 comment. Not a lot of bikeshedding -- that
suggests many people are excited about this field but very few are actively
involved. I was curious whether there are people here working with deep
learning and NLP, either doing research professionally or in their spare time.
That tells me there might be a lot of opportunity in this field.

~~~
har777
I agree. People are really excited, but it takes some knowledge to make an
intelligent comment on this. I for one read a lot of NIPS papers but don't
understand most of them well enough.

