
Inferring Algorithmic Patterns with Stack - antman
https://research.facebook.com/blog/1642778845966521/inferring-algorithmic-patterns-with-stack/
======
andrewtbham
Here is a recent Facebook research paper that poses the problem I believe
this paper is intended to solve.

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
[http://arxiv.org/pdf/1502.05698v6.pdf](http://arxiv.org/pdf/1502.05698v6.pdf)

The paper posted here describes an algorithm that could be applied to simple
question answering from the context of a conversation, like the tasks in the
bAbI paper above. For instance:

John is in the playground.

Bob is in the office.

Where is John? A:playground
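In the raw bAbI files, each line is numbered and question lines carry a
tab-separated answer; a minimal parser sketch for that format (the
`parse_babi` helper below is illustrative, not code from either paper):

```python
def parse_babi(lines):
    """Parse bAbI-format lines into (facts, question, answer) triples."""
    story, samples = [], []
    for line in lines:
        idx, text = line.split(" ", 1)
        if int(idx) == 1:
            story = []                      # index 1 starts a new story
        if "\t" in text:                    # question lines are tab-separated
            question, answer = text.split("\t")[:2]
            samples.append((list(story), question.strip(), answer.strip()))
        else:
            story.append(text.strip())
    return samples

lines = [
    "1 John is in the playground.",
    "2 Bob is in the office.",
    "3 Where is John?\tplayground\t1",
]
samples = parse_babi(lines)
print(samples[0][2])    # → playground
```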

I believe a traditional LSTM RNN would likely have cleared the information
about John from its long short-term memory vector, whereas with a stack you
would still have access to the information about John.

fta: "Since the machine can essentially learn how to operate its memory, it
can learn how to program itself by producing sequences of instructions."

~~~
Smerity
If people are interested in playing with the dataset from "Towards AI-Complete
Question Answering: A Set of Prerequisite Toy Tasks", I added an example to
Keras that uses recurrent neural networks[1]. It achieves similar accuracy to
the baseline LSTM approach in the paper. I also did a write-up that explains
the task and architecture[2].
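The Keras example boils down to turning each story and question into padded
sequences of word indices before the RNNs see them; a rough sketch of that
preprocessing step (function and variable names are my own, not from the
example):

```python
def vectorize(tokens_list, word_idx, maxlen):
    """Map token lists to fixed-length index sequences, left-padded with 0."""
    out = []
    for tokens in tokens_list:
        ids = [word_idx[w] for w in tokens]
        out.append([0] * (maxlen - len(ids)) + ids)  # pad on the left
    return out

vocab = ["John", "Bob", "is", "in", "the", "playground", "office", "Where", "?", "."]
word_idx = {w: i + 1 for i, w in enumerate(vocab)}   # 0 is reserved for padding

stories = [["John", "is", "in", "the", "playground", "."]]
vecs = vectorize(stories, word_idx, maxlen=8)
print(vecs)    # → [[0, 0, 1, 3, 4, 5, 6, 10]]
```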

The dataset (bAbi tasks[3]) released by Facebook is interesting but has a
number of limitations. The code used to produce it is also not open source. If
someone has the interest and/or past experience in MUD style environments,
creating an open source dataset would be amazing!

[1]: [https://github.com/fchollet/keras/blob/master/examples/babi_rnn.py](https://github.com/fchollet/keras/blob/master/examples/babi_rnn.py)

[2]: [http://smerity.com/articles/2015/keras_qa.html](http://smerity.com/articles/2015/keras_qa.html)

[3]: [https://research.facebook.com/researchers/1543934539189348](https://research.facebook.com/researchers/1543934539189348)

~~~
nl
Oh!! This is something I must try.

I remember you've mentioned issues with the bAbi question set previously. Have
you (or anyone else) tried using the same approach on a non-artificial
dataset?

I'm thinking something like blending this with the approach Richard Socher
used on the Quizbowl code[1].

[1] [http://cs.umd.edu/~miyyer/qblearn/](http://cs.umd.edu/~miyyer/qblearn/)

------
reader5000
I wonder why gradient descent (gating all compute/memory operations with
parameters, updating the parameter vectors along the sample loss gradient)
ended up dominant over genetic programming (representing algorithms as parse
trees and mutating/recombining them to minimize sample loss). Both seem
equally unjustified theoretically, although I guess SGD is more GPU-friendly.

I also wonder if all programming should just be done by showing an SGD solver
example I/O pairs. Of course this notion has been around since the '80s with
Prolog, but apparently representing "concepts" as 1000-dimensional feature
vectors is more robust than single boolean variables.

~~~
lars
With SGD, you know how to take steps in the direction of a better solution.
You don't have that with genetic programming; there you only know whether a
candidate solution happened to be better.
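A toy illustration of the difference, assuming a simple 1-D quadratic loss:
gradient descent follows a known downhill direction, while a mutation-style
search proposes blind changes and keeps only those that happen to help:

```python
import random

def f(w):                      # toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad_f(w):                 # its analytic gradient
    return 2.0 * (w - 3.0)

# Gradient descent: every step moves in a known improving direction.
w = 0.0
for _ in range(100):
    w -= 0.1 * grad_f(w)

# Random mutation (genetic-programming flavour): propose blindly,
# keep a change only if it happens to lower the loss.
random.seed(0)
v = 0.0
for _ in range(100):
    cand = v + random.uniform(-0.5, 0.5)
    if f(cand) < f(v):
        v = cand

print(round(w, 3), round(v, 3))
```

Both can reach the minimum here, but only the gradient step knew where it was
going; the mutation loop discarded every proposal that didn't pan out.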

~~~
p1esk
There's a weight perturbation algorithm, which is somewhat similar to genetic
programming. The reason it's not as popular as SGD/backprop is that it's
computationally intensive and hard to parallelize.
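The core idea can be sketched with a finite-difference variant: perturb one
weight at a time and measure the loss change, which costs one extra forward
pass per weight (hence the computational expense). The toy loss below is
illustrative:

```python
def loss(w):
    # toy quadratic loss over a weight vector, minimized at all-ones
    return sum((wi - 1.0) ** 2 for wi in w)

def perturb_grad(w, eps=1e-4):
    """Estimate the gradient by nudging one weight at a time.

    Each weight needs its own extra loss evaluation, which is why this
    scales poorly compared to a single backprop pass.
    """
    base = loss(w)
    g = []
    for i in range(len(w)):
        nudged = list(w)
        nudged[i] += eps
        g.append((loss(nudged) - base) / eps)
    return g

w = [0.0, 2.0]
g = perturb_grad(w)
print([round(x, 3) for x in g])   # → [-2.0, 2.0], matching the analytic gradient
```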

------
waterlesscloud
Can someone knowledgeable in the field elaborate on the differences between
this and other NN approaches that also use additional memory structures (LSTM,
etc)?

~~~
andrewtbham
I will hazard a guess as to why this might be better than an LSTM. An LSTM
maintains a vector that represents the long short-term memory, but once
information is cleared from that vector, there is no way to get it back,
whereas with a stack you can keep going further down if needed.

[http://colah.github.io/posts/2015-08-Understanding-LSTMs/](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
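For concreteness, the Stack-RNN paper's stack update is a soft blend of push
and pop actions, so old values shift down rather than being overwritten; a
rough sketch (simplified, omitting the sigmoid on the pushed value and the
no-op action from the paper):

```python
def stack_step(stack, a_push, a_pop, new_val):
    """One step of a differentiable stack (after Joulin & Mikolov, 2015).

    Action weights are soft (a_push + a_pop = 1), so the update blends
    'push new_val' and 'pop the top' rather than making a hard choice.
    """
    n = len(stack)
    new = [0.0] * n
    new[0] = a_push * new_val + a_pop * stack[1]
    for i in range(1, n):
        below = stack[i + 1] if i + 1 < n else 0.0  # finite stack bottom
        new[i] = a_push * stack[i - 1] + a_pop * below
    return new

s = [0.0, 0.0, 0.0, 0.0]
s = stack_step(s, a_push=1.0, a_pop=0.0, new_val=0.7)  # hard push
s = stack_step(s, a_push=1.0, a_pop=0.0, new_val=0.2)  # push again
print(s)   # → [0.2, 0.7, 0.0, 0.0]: earlier values stay reachable below the top
```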

~~~
sxyuan
One related work that came to mind was Neural Turing Machines, which augments
the LSTM network with a (fixed size) memory [1]. I did see a brief reference
to the Graves et al. paper in the paper linked to in the original post, but
I'm also wondering if someone more knowledgeable could elaborate on the
difference.

My own guess would be that the models are similar (both using NNs with
external memory, trained through backprop), but this has a simpler memory
structure and so could be trained more easily. It seems like the choice of
regular RNNs vs. an LSTM network doesn't matter much since the Graves paper
also compared LSTM vs. feedforward controllers and the results were about the
same, with the feedforward controllers actually learning a bit faster. The key
idea in both works seems to be making the system continuous so that it can be
trained, through backprop, to use external memory.

[1] [http://arxiv.org/abs/1410.5401](http://arxiv.org/abs/1410.5401)
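One common thread is that the memory read itself is differentiable: instead
of indexing a single slot, the controller takes a softmax-weighted sum over
all slots, so gradients flow back through the addressing. A minimal sketch of
such a soft read (not the NTM's full content/location addressing):

```python
import math

def soft_read(memory, scores):
    """Differentiable read: softmax the scores into weights, then take a
    weighted sum of memory rows instead of picking one row discretely."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(dim)]

memory = [[1.0, 0.0], [0.0, 1.0]]            # two memory slots
r = soft_read(memory, scores=[10.0, -10.0])  # near-hard attention on slot 0
print([round(x, 3) for x in r])              # → [1.0, 0.0]
```

Sharp scores make the read nearly discrete, but it never stops being a smooth
function of the scores, which is what keeps backprop applicable.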

------
discardorama
"It should be noted that similar line of research has been pursued by several
research groups already in the 90's"

I chuckled at that, because I bet it's a reference to Schmidhuber. :D

------
skybrian
I wonder if this sort of thing would be useful in american fuzzy lop? (It
already uses genetic algorithms.)

------
andrewtbham
Reasons this has a high probability of credibility: the research was probably
used in Facebook's digital assistant M; one of the authors is Tomas Mikolov,
the author of word2vec; and the paper comes _with_ source code.

[http://arxiv.org/pdf/1503.01007.pdf](http://arxiv.org/pdf/1503.01007.pdf)

[https://github.com/facebook/Stack-RNN](https://github.com/facebook/Stack-RNN)

~~~
ching_wow_ka
I doubt that it's used in M.

"Why not just build M with neural nets from the beginning? Without the right
data, neural nets couldn’t provide a service much more powerful than, well,
Siri, and Wit.ai’s tech can get things started with relatively little data.
“This is a good way to bootstrap. With a few thousand data-points, you can
start to build a model,” Lebrun says. “Then, using this model, you get more
data, and once you have about a million data points, you go to Yann and get
some deep learning.”" [1]

[1] [http://www.wired.com/2015/08/how-facebook-m-works/](http://www.wired.com/2015/08/how-facebook-m-works/)

~~~
andrewtbham
That was a month ago... I'm sure they have some data by now, or at least a
plan for what to do once they get more.

