
Highest results yet for machine reading comprehension on two benchmark tests - rahulmehrotra
http://www.theverge.com/2016/6/8/11876760/maluuba-epireader-machine-learning-comprehension-reading-text
======
cromwellian
Is this really reading comprehension, being able to fill in a template by
finding facts in the story? I mean, I get that it's able to parse the query
and find the best fact within the story to fit that slot, but this is more
like a better search indexer/Google Factbook than true comprehension.

My daughter is in 3rd grade, and 3rd grade reading comprehension requires an
ability to make inferences about the text and reasons behind people's acts
even if the verbatim text isn't present.

How could a pure deep learning approach work here, when to be able to make
inferences, you'd need knowledge about an enormous number of entities.

Would this system be able to answer a question like "Where did Snow White's
friends or housemates go" if the word "friend" or "housemate" weren't in the
text? Or perhaps an easy question for a child: "How do you think Snow White
felt after the dwarves left?"

~~~
andrewljohnson
_Would this system be able to answer a question like "Where did Snow White's
friends or housemates go" if the word "friend" or "housemate" weren't in the
text?_

A compound system would. This fill-in-the-blank neural net, combined with
another neural net to do synonyms (match the word from the text that would
align with friend/housemate)... that's like Word Vectors:
[https://www.tensorflow.org/versions/r0.9/tutorials/word2vec/...](https://www.tensorflow.org/versions/r0.9/tutorials/word2vec/index.html).
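
The synonym-matching idea amounts to picking the candidate whose vector is closest to the query word's vector. A minimal sketch with made-up toy vectors (real word2vec vectors are learned from large corpora and have hundreds of dimensions; these values are purely illustrative):

```python
import numpy as np

# Toy 4-dimensional "word vectors" -- the values are made up for
# illustration, not learned embeddings.
vectors = {
    "friend":    np.array([0.9, 0.1, 0.3, 0.0]),
    "housemate": np.array([0.8, 0.2, 0.4, 0.1]),
    "dwarf":     np.array([0.7, 0.3, 0.5, 0.1]),
    "mine":      np.array([0.1, 0.9, 0.1, 0.8]),
}

def cosine(a, b):
    # Cosine similarity: the standard closeness measure for word vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest(query, candidates):
    # Pick the candidate word whose vector is most similar to the query.
    return max(candidates, key=lambda w: cosine(vectors[query], vectors[w]))

print(closest("friend", ["dwarf", "mine"]))  # -> dwarf
```

With real embeddings, "friend" would land near "housemate" and "dwarf" (in a story where the dwarfs are the friends) the same way, which is what would let a compound system bridge the missing word.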

I think your daughter has learned these sorts of things and more. Also, if we give
a computer/robot a bunch of these neural nets (reading, word-veccing, etc),
and it is also programmed as a chatbot, we're going to start having a lot more
existential questions about consciousness.

This convergence of simple systems to form "consciousness" seems to match some
recent evolutionary science too:
([http://www.theatlantic.com/science/archive/2016/06/how-consc...](http://www.theatlantic.com/science/archive/2016/06/how-consciousness-evolved/485558/)).
Though maybe I'm not making a fair leap.

------
rahulmehrotra
Here is the paper:
[http://arxiv.org/abs/1606.02270](http://arxiv.org/abs/1606.02270)

"EpiReader does this using two neural networks, a type of AI inspired by how
neurons work in the human brain. The first neural net picks a set of likely
answers based on its understanding of the paragraph. The second evaluates the
reasoning used by the first to come up with the right answer."
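
A schematic of the two-stage idea in that quote, with trivial stand-in scoring functions where the real EpiReader uses neural networks (every name and heuristic below is illustrative, not from the paper):

```python
STOP = {"the", "The", "to", "with", "."}  # toy stop-word list

def extract_candidates(passage_words, k=3):
    # Stage 1 ("Extractor"): rank passage words as possible answers
    # (here simply by frequency) and keep the top k.
    words = [w for w in dict.fromkeys(passage_words) if w not in STOP]
    return sorted(words, key=passage_words.count, reverse=True)[:k]

def rerank(question, candidates, entail):
    # Stage 2 ("Reasoner"): fill the blank with each candidate and score
    # how well the completed statement is supported by the passage.
    return max(candidates, key=lambda c: entail(question.replace("_____", c)))

passage = "Snow White lived with the dwarfs . The dwarfs went to the mine".split()
question = "The _____ went to the mine"

def bigram_overlap(hypothesis):
    # Toy "entailment" score: adjacent word pairs shared with the passage.
    hw = hypothesis.split()
    pairs = set(zip(passage, passage[1:]))
    return sum(1 for p in zip(hw, hw[1:]) if p in pairs)

print(rerank(question, extract_candidates(passage), entail=bigram_overlap))
# -> dwarfs
```

The interesting part of the paper is that both the candidate scorer and the hypothesis tester are learned networks rather than heuristics like these.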

------
YeGoblynQueenne
Well, that's a nice tnetennba. Their results are actually more impressive than
what's reported, because the human baseline is in the 80% range, so they're
getting close (but only on this test, so don't expect Her just yet).

On the flip side, the complete test includes different categories of word:
principal noun ("named entity"), common noun, verb and preposition. This team,
like others, only reports results on the two noun categories, because LSTM
RNNs already get human-like results on verbs and prepositions (that's in the
paper). Facebook's original paper noted that nouns and principal nouns are
hard for statistical methods. I'd say it's probably because there aren't
enough of them to go around in text; Facebook says it's because of local-only
context.

The point is that this is a state-of-the-art result only on named entities and
nouns. If you want to process verbs and prepositions, good old LSTMs are your
friends. If you want both, it's going to hurt.

Also, Facebook used Memory Networks and reported that they do much better on
nouns than LSTMs. This paper uses... er, it's complicated. There are GRUs and
bidirectional RNNs, and ConvNets; beyond that it's a bit fuzzy. In any case it
looks like the complex architectures have the edge now.

But I don't want to read too much into this, because this is not a
competition like ImageNet: the teams all have the entire dataset, and there's
nothing protecting them from overfitting to the test set.

~~~
akirajimbo
AFAIK some splits of the Children's Book Test dataset are actually not that
well-formed. For example, in the Verb split about 1/4 of the correct answers
are 'said', which means you can get ~25% accuracy by just always choosing
'said'. So maybe it makes less sense to put too much effort into those ones?
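
That's the classic majority-class baseline: always predict the most frequent answer. A tiny illustration with made-up answers (the real Verb split is of course much larger):

```python
from collections import Counter

# Made-up answer key where 2 of 8 answers are "said", mirroring the
# ~1/4 skew described for the Verb split.
answers = ["said", "ran", "asked", "said", "went", "cried", "jumped", "replied"]

# The baseline: always guess the single most common answer.
majority, count = Counter(answers).most_common(1)[0]
accuracy = count / len(answers)
print(majority, accuracy)  # -> said 0.25
```

Any model on such a split should at least be compared against this number, since it requires no comprehension at all.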

------
svantana
Interestingly, another research team published slightly better results across
the board on the same data sets, on the same day [1]. Also, both teams are
Canadian, I believe.

[1] [http://arxiv.org/abs/1606.02245](http://arxiv.org/abs/1606.02245)

~~~
rahulmehrotra
You are correct - both teams are Canadian and part of the Maluuba Research Lab
in Montreal ([www.maluuba.com](http://www.maluuba.com)).

Here is the difference between the two:

The EpiReader model uses a two-stage process to arrive at an answer. The first
stage extracts a small set of candidate answers from the text passage and the
second stage turns these into hypotheses and then tests them through neural
inference.
[[http://arxiv.org/abs/1606.02270](http://arxiv.org/abs/1606.02270)]

The Alternating Iterative Attention model goes through a dynamic procedure of
focusing on different parts of the question and the document in turn.
[[http://arxiv.org/abs/1606.02245](http://arxiv.org/abs/1606.02245)]
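
A very loose numeric sketch of that alternating procedure: repeatedly attend over the query, then use the resulting summary to attend over the document. Everything here (dimensions, random values, number of steps, softmax attention) is made up for illustration and is not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
query_states = rng.normal(size=(5, d))   # 5 query token states
doc_states = rng.normal(size=(20, d))    # 20 document token states

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

state = np.zeros(d)
for _ in range(3):  # a few alternating inference steps
    # Attend over the query conditioned on the current state...
    q_summary = softmax(query_states @ state) @ query_states
    # ...then attend over the document conditioned on the query summary,
    # and fold the result back into the state for the next step.
    d_attn = softmax(doc_states @ q_summary)
    state = d_attn @ doc_states

# d_attn is a distribution over document tokens; the predicted answer
# would be read off from where the attention mass accumulates.
print(int(d_attn.argmax()))
```

The point of the alternation is that each pass over the document can refocus on a different aspect of the question, rather than committing to one fixed query representation.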

Both models achieve fairly similar accuracy averaged across the two datasets,
and both beat the existing state-of-the-art results by Google DeepMind,
Facebook, and IBM.

------
crypto5
"Both corpora use Cloze-style questions (Taylor, 1953), which are formulated
by replacing a word or phrase in a given sentence with a placeholder token."

Looks like it is more a task of text search than "reading comprehension"...

~~~
rahulmehrotra
Cloze-style is just the technical term for a fill-in-the-blanks question, and
fill-in-the-blanks questions can be mapped to standard questions:

For example: _____ really missed the dwarfs -> Who really missed the dwarfs?
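
That mapping is mechanical enough to sketch in a couple of lines (crudely assuming the blank is the subject and "Who" is the right question word; a real mapper would pick "Who"/"What"/"Where" from the answer category):

```python
def cloze_to_question(cloze, blank="_____", wh="Who"):
    # Swap the placeholder for a question word and add a question mark.
    return cloze.replace(blank, wh).rstrip(".") + "?"

print(cloze_to_question("_____ really missed the dwarfs"))
# -> Who really missed the dwarfs?
```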

