
Towards Neural Network-based Reasoning - lucidrains
http://arxiv.org/abs/1508.05508
======
nl
So this is really a subset of question answering.

A month ago I said[1] I thought the Memory Network approach to these tasks was
some of the most important neural network work being done atm. Now this comes
along and jumps from 33.5% to 87% on the path finding question answering task,
and to 97.9% on positional reasoning.

I don't know what the human benchmarks are, but I'd guess the positional
reasoning rate is close to human levels.

[1]
[https://news.ycombinator.com/item?id=9960852](https://news.ycombinator.com/item?id=9960852)

~~~
ma2rten
I should point out that this is an artificial dataset generated by computers
according to some simple rules. Doing well on these doesn't necessarily mean
anything.

Also, they didn't report the accuracy on any of the other 18 tasks. That makes
me believe that they did pretty badly on those.

~~~
336f5
The /r/machinelearning discussion
([https://www.reddit.com/r/MachineLearning/comments/3iayyz/neu...](https://www.reddit.com/r/MachineLearning/comments/3iayyz/neural_reasoner_achieves_90_accuracies_on/))
makes an interesting point about the 18 tasks thing:

> That was my first instinct too. Going back to the dynamic memory networks
> paper though, it turns out (§4.1) that these are the only two tasks it
> didn't solve almost-perfectly.

------
hellofunk
So, a logic system using neural nets rather than a search database à la
Prolog. This could be exciting stuff if it works well.

------
Smerity
If people are interested in getting started with question answering on the
bAbi dataset, I wrote a baseline RNN example that's now in Keras[1].

Whilst I love the theory behind the bAbi dataset, there are issues with the
data itself that still need to be fixed, particularly that there is
duplication in the datasets[2]. This duplication leads to less unique data
than expected and even overlaps between training and testing data. The problem
gets worse as you get to the 10k subsets[3]. Most worryingly, authors who I've
spoken to aren't aware of these issues.

As an example, QA17 (Positional Reasoning) which is one of the focus areas of
this paper has two major issues. First, instead of 10,000 / 1,000 unique train
/ test questions, duplication means it only has 5,812 / 632. Second, 15% of
the exact training questions in the 10k subset are repeated in the test set.
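
If you want to check this yourself, here's a rough sketch (it assumes the file
layout of the bAbI v1.2 download, so adjust the paths to taste):

    # count unique stories and train/test overlap in a bAbI task file
    def stories(path):
        """Split a bAbI file into stories; a new story starts at line ID 1."""
        story, out = [], []
        for line in open(path):
            idx, _, text = line.strip().partition(' ')
            if idx == '1' and story:
                out.append(tuple(story))
                story = []
            story.append(text)
        if story:
            out.append(tuple(story))
        return out

    base = 'tasks_1-20_v1-2/en-10k/qa17_positional-reasoning_'  # path: assumption
    train, test = stories(base + 'train.txt'), stories(base + 'test.txt')
    print(len(train), 'train stories,', len(set(train)), 'unique')
    print(len(set(train) & set(test)), 'train stories reappear in the test set')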

I've contacted the creators of the dataset at Facebook but they don't seem to
think the issue is that extreme. The authors of papers using the bAbi dataset
are also not noting which version of the data they've used, which is
problematic if there are future updates...

[1]:
[https://github.com/fchollet/keras/blob/master/examples/babi_...](https://github.com/fchollet/keras/blob/master/examples/babi_rnn.py)

[2]:
[http://smerity.com/articles/2015/keras_qa.html#dataset-issue...](http://smerity.com/articles/2015/keras_qa.html#dataset-issues)

[3]:
[https://gist.github.com/Smerity/8ceb539c125cbe648bfe](https://gist.github.com/Smerity/8ceb539c125cbe648bfe)

------
fpgaminer
If I understand the paper correctly, the architecture is very odd. As far as I
can tell, it's equivalent to a deep CNN, with the facts being fed into every
couple layers. Why did they break it up into parallel invocations of DNNs, and
several layers of DNNs? Very strange.

I imagine a simplified architecture of RNN->Highway CNN->Answerer would
achieve better results while being more homogeneous.

~~~
jbaiter
Can you explain what you mean by "Highway CNN"? It's the first time I've heard
the term and Google doesn't help much either.

~~~
336f5
[http://arxiv.org/abs/1505.00387](http://arxiv.org/abs/1505.00387) Very deep
NNs, which try to bypass the difficulty of propagating error all the way from
the bottom to the top during training by adding special gated shortcuts
between nodes in distant layers.
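
For intuition, each highway layer computes y = t * H(x) + (1 - t) * x, where
the gate t decides how much of the input is carried through unchanged. A toy
numpy sketch (mine, not from the paper):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def highway_layer(x, W_h, b_h, W_t, b_t):
        """y = t * H(x) + (1 - t) * x: the carry path lets error
        flow straight through very deep stacks."""
        h = np.tanh(W_h @ x + b_h)     # candidate transform H(x)
        t = sigmoid(W_t @ x + b_t)     # transform gate T(x)
        return t * h + (1.0 - t) * x   # carry the rest of x unchanged

    # toy usage: 50 layers deep, with gate biases set negative so the
    # network starts out mostly carrying its input (as the paper suggests)
    d, x = 8, np.random.randn(8)
    for _ in range(50):
        x = highway_layer(x, np.random.randn(d, d) * 0.1, np.zeros(d),
                          np.random.randn(d, d) * 0.1, np.full(d, -2.0))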

------
jsf666
I would like to see how a system like this would fare against something like
the Einstein puzzle, and how it would fare against a human when it comes to
real-world tasks and descriptions. Still, it's pretty amazing and shows real
progress.

------
amelius
I'm interested in some examples of what this technique would be capable of.

~~~
jbaiter
It's right there in the paper, table 1, page 9:

    Task I: path finding
    1. The office is east of the hallway.
    2. The kitchen is north of the office.
    3. The garden is west of the bedroom.
    4. The office is west of the garden.
    5. The bathroom is north of the garden.
    How do you go from the kitchen to the garden?
    -> south, east, relies on 2 and 4
    How do you go from the office to the bathroom?
    -> east, north, relies on 4 and 5

    Task II: positional reasoning
    1. The triangle is above the pink rectangle.
    2. The blue square is to the left of the triangle.
    Is the pink rectangle to the right of the blue square?
    -> Yes, relies on 1 and 2
    Is the blue square below the pink rectangle?
    -> No, relies on 1 and 2

~~~
amelius
Thanks for digging this up :) But it seems it would also be quite simple to
solve such puzzles using conventional (non-neural) programming. So my next
question would be: what is the advantage of using neural networks? Or, how far
can this be pushed?
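
For instance, once the facts are parsed, the path finding task above is just a
breadth-first search over a tiny graph. A rough sketch (with the parsing
hard-coded to the example):

    from collections import deque

    # facts from the example as (room_a, direction, room_b) = "a is d of b"
    FACTS = [('office', 'east', 'hallway'), ('kitchen', 'north', 'office'),
             ('garden', 'west', 'bedroom'), ('office', 'west', 'garden'),
             ('bathroom', 'north', 'garden')]
    OPPOSITE = {'north': 'south', 'south': 'north', 'east': 'west', 'west': 'east'}

    # adjacency map: "a is d of b" means you go d from b to a, and back the other way
    moves = {}
    for a, d, b in FACTS:
        moves.setdefault(b, []).append((d, a))
        moves.setdefault(a, []).append((OPPOSITE[d], b))

    def path(start, goal):
        """Breadth-first search for a direction sequence from start to goal."""
        queue, seen = deque([(start, [])]), {start}
        while queue:
            room, steps = queue.popleft()
            if room == goal:
                return steps
            for d, nxt in moves.get(room, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [d]))

    print(path('kitchen', 'garden'))   # ['south', 'east']
    print(path('office', 'bathroom'))  # ['east', 'north']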

~~~
jbaiter
Well, usually you have to write quite a lot of code to be able to answer such
natural language questions.

For university I'm currently writing something similar (with Prolog); it's
basically a pipeline that goes something like this: (1) natural language ->
(2) syntax tree -> (3) annotate the syntax tree with logical (in my case,
lambda) terms that represent the semantics -> (4) use the semantics to query a
knowledge base.

(2) requires you to write a grammar, which is a lot of work even for a very
limited subset of your target language. (3) is even harder: for every term and
grammatical construction that you want to be able to parse, you have to write
rules that translate the semantics into something the computer understands.
And (4) is quite time-consuming as well, since you have to translate your
knowledge into something the computer can work with (in my case, logical
terms).

From what I can understand, this system allows you to skip (2), (3) and (4),
i.e. you just feed the system your facts and questions and it performs the
reasoning by itself, with no more hand-written rules. Basically, you feed it
only natural language (easy to do, even for untrained people!) and it does the
rest on its own. Which is a huge timesaver, obviously :-)
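
To give a flavour of how much hand-written machinery even a toy version of
steps (2)-(4) takes, here's a deliberately minimal Python sketch (regexes
standing in for a real grammar, tuples for lambda terms):

    import re

    # (2)+(3): one hand-written pattern and translation rule per construction
    FACT_RULE = re.compile(r'The (\w+) is (north|south|east|west) of the (\w+)\.')
    QUERY_RULE = re.compile(r'Is the (\w+) (north|south|east|west) of the (\w+)\?')

    kb = set()  # (4): the knowledge base, holding terms of the form rel(a, d, b)

    def tell(sentence):
        a, d, b = FACT_RULE.match(sentence).groups()
        kb.add((a, d, b))

    def ask(question):
        a, d, b = QUERY_RULE.match(question).groups()
        return (a, d, b) in kb  # no inference yet: even transitivity needs more rules

    tell('The kitchen is north of the office.')
    print(ask('Is the kitchen north of the office?'))  # True
    # Every new verb, tense or construction means another pattern plus another
    # translation rule; the neural approach learns that mapping from examples.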

~~~
amelius
Okay this sounds interesting.

Do you know how they manage to solve arbitrary-length paths using a finite-
depth neural network? Do they use an iterative approach?

And what if I want the system to learn relations inside an arbitrarily large
corpus of text, does this mean that the neural network also needs to be of
increasing complexity?

~~~
336f5
> Do you know how they manage to solve arbitrary-length paths using a finite-
> depth neural network? Do they use an iterative approach?

This is not an RNN overall; they only use an RNN in the initial layer to
encode the questions & facts (p. 3), and the result is then fed into a regular
feedforward deep network. I think any specific implementation of this would
have a limit; for example, figure 2 implies that each 'fact' gets its own
stack of DNNs, so for k facts you need k stacks of DNNs. And each stack can
only do so much computation based on how many layers it has, so it can only
solve problems up to a certain length before forgetting/running out of time.

So I would guess the answers to your questions are 'it doesn't', 'no', and
'yes'. (None of which is necessarily bad. It's not like humans can solve
arbitrary-length problems in our heads either.)
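
To make that concrete, here's a crude numpy sketch of my reading of figure 2
(one interaction per fact per layer, pooled back into the question; the sizes,
weights and choice of max-pooling are my guesses, not the paper's):

    import numpy as np

    d, L = 16, 3                      # embedding size, number of reasoning layers
    rng = np.random.default_rng(0)
    W = [rng.normal(0, 0.1, (d, 2 * d)) for _ in range(L)]  # one DNN per layer

    def reason(question, facts):
        """question, facts: fixed-size vectors from the RNN encoder (not shown)."""
        q = question
        for l in range(L):
            # each of the k facts interacts with the current question in its own stack ...
            interactions = [np.tanh(W[l] @ np.concatenate([q, f])) for f in facts]
            # ... and pooling merges the k results into an updated question
            q = np.max(interactions, axis=0)
        return q  # the paper feeds this to a softmax answerer

    q = reason(rng.normal(size=d), [rng.normal(size=d) for _ in range(5)])

Note the depth L is fixed up front, which is why a given trained network can
only chain so many reasoning steps.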

------
anirul
Got a 403 on the link?

