
A Simple AI Capable of Basic Reading Comprehension - youngprogrammer
http://blog.ayoungprogrammer.com/2015/09/a-simple-artificial-intelligence.html
======
kajecounterhack
The difference between this graph and propositional logic is only that the
predicates joining concepts are arbitrary instead of logic operators. In that
sense, this is like Google Knowledge Graph / Freebase.

[https://en.wikipedia.org/wiki/Propositional_calculus#Solvers](https://en.wikipedia.org/wiki/Propositional_calculus#Solvers)

Solving a set of propositional logic statements is NP-Complete. I'd argue
"reading comprehension" is actually knowing the state of the world after a
piece of text, which requires solving how these predicates interact. For
example, if the paragraph is

    
    
      Bobby picked up the toy. Then he put down the toy.
    

This "semantic memory" does not "comprehend" where the toy is, and this is a
relatively simple example. I think the title "basic reading comprehension" is
thus inaccurate. Perhaps a better title would be "A simple knowledge graph" or
"A simple semantic memory".
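To make that concrete, here is a minimal, hypothetical sketch of such a semantic
memory as a triple store (the class and its API are invented for illustration).
It happily records both facts, but nothing in it models that the second action
undid the first:

```python
class SemanticMemory:
    """A minimal triple store of (subject, predicate, object) facts."""

    def __init__(self):
        self.facts = []

    def add(self, subject, predicate, obj):
        self.facts.append((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        # Return every stored fact matching the non-None fields.
        return [f for f in self.facts
                if (subject is None or f[0] == subject)
                and (predicate is None or f[1] == predicate)
                and (obj is None or f[2] == obj)]

memory = SemanticMemory()
memory.add("Bobby", "picked up", "toy")
memory.add("Bobby", "put down", "toy")

# Both facts are retrieved with equal standing; nothing tracks that
# "put down" undid "picked up", so "where is the toy?" is unanswerable.
print(memory.query(subject="Bobby", obj="toy"))
```

Answering "where is the toy?" would require a state-update step on top of this
structure, which is exactly what plain fact storage lacks.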

~~~
aamar
Classic example:

    
    
        The iron ball fell on the glass table, and it shattered.
        The glass ball fell on the iron table, and it shattered.
    

Human readers will pick the correct antecedent for "it" in each case, so it's
not ambiguous. But correct interpretation depends on knowing something about
how likely glass and iron are to shatter.

I love these kinds of examples, but I don't think they ought to significantly
deter the kind of work shown here; it depends on the application, but basic
interpretation could be very useful even if it can't handle every case.

~~~
kajecounterhack
Yeah, it's a bit of a nitpick over the definition of "comprehension", but the
program itself is obviously useful; it's a poor man's version of Freebase.
Freebase/Google Knowledge Graph doesn't claim to "comprehend" anything; it is
just a large graph data structure with an efficient querying mechanism. That's
what this is (...though given that this is a poor man's version, the efficient
querying mechanism may be lacking for larger graphs).

~~~
aamar
We don't have to worry about the philosophical questions of comprehension to
see that cases like this present challenges. These types of example (better
articulated in mrec's link) are cases where we can imagine straightforward,
plausible queries that the computer would be unable to answer: "what
shattered?" or "what remains?" Some of the examples are such that simple web
searches wouldn't be helpful in resolving the ambiguity; you'd need some
"common knowledge." (Again: not that that's an insurmountable or _necessarily_
relevant problem.)

------
borkabrak
I love the ambition. But when you stated your next goals, I'm afraid I thought
of this:

[http://xkcd.com/1425/](http://xkcd.com/1425/)

I may very well be wrong, and I'd love it if you showed me I am. Good luck. As
I said, I really would be excited to see this go farther.

~~~
nyamhap
When was that xkcd posted? I couldn't find a date.

Since machine vision is such a fast-moving field, I think the date is relevant
for understanding the context of this xkcd. Is image recognition of a bird
still so inconceivable at the moment? Perhaps the joke now would be: "I'll need
one researcher and one year."

~~~
wslh
You are taking the xkcd strip very literally instead of understanding the
central concept behind it.

Your comment reminds me of this:
[http://www.uh.edu/engines/epi879.htm](http://www.uh.edu/engines/epi879.htm)

~~~
blazespin
I think the central concept is: believe a smart engineer when he says something
can be done, but not always when he says it can't be done.

~~~
aqwwe
I think it means that thing #1 and thing #2 might appear to be just as
difficult to untrained eyes, while one would require very little work because
of what's already available (libraries, data, or whatever) and the other would
be a very large task because most of the work hasn't been done yet (though of
course a different engineer might be able to find more efficient ways to do
things).

------
mratzloff
Some thoughts about creating a system like this:

Any successful implementation of comprehension must progressively enhance the
world model based on additional information. Furthermore it must understand
some basic rules, such as, "Any subject, set of subjects, or actions can be
represented multiple ways."

So if you said, "Mary's brother is Sam," or "Mary has a brother named Sam," or
"Mary's brother is named Sam," the world model must collocate the meanings
"Sam" and "brother" for Mary and be able to respond to queries about either.
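A minimal sketch of that collocation, with hypothetical surface patterns
invented purely for illustration, could normalize all three phrasings to one
canonical relation:

```python
import re

# Invented surface patterns that all normalize to one canonical
# relation: ("has_brother", person, name). Order matters: the more
# specific "is named" pattern must come before the bare "is" pattern.
PATTERNS = [
    r"(\w+)'s brother is named (\w+)",
    r"(\w+)'s brother is (\w+)",
    r"(\w+) has a brother named (\w+)",
]

def extract_relation(sentence):
    for pattern in PATTERNS:
        match = re.search(pattern, sentence)
        if match:
            return ("has_brother", match.group(1), match.group(2))
    return None

for s in ["Mary's brother is Sam.",
          "Mary has a brother named Sam.",
          "Mary's brother is named Sam."]:
    print(extract_relation(s))   # same relation each time
```

A real system would use parse trees rather than regexes, but the key idea is
the same: many surface forms, one stored meaning, queryable from either side.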

Further, if you mention that John also has a brother named Sam, and then you
mention Sam in an ambiguous context, the program should be smart enough to
ask, "Which Sam? Mary's brother or John's?" Infocom games did this; you are
building a more flexible world model builder, but the parser would operate
similarly.

There is also the concept of recency. If I talk about "John's brother Sam",
ignoring the fact that pronoun references to "he" should be contextually
mapped correctly, and then mention Sam, the program should not need to ask
which Sam I mean. It would be like talking to someone who wasn't paying
attention.

Finally, there is also the concept of confidence. In the face of ambiguity
that can't be resolved, a confidence rating should be assigned based on
available information and future answers should be based on that confidence.
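A toy sketch of how recency and a clarifying question might interact (the data
structure, names, and recency limit are all invented for illustration):

```python
def resolve(name, mentions, recency_limit=2):
    """Pick a referent for an ambiguous name using recency.

    `mentions` is a list of (name, referent) pairs in discourse order,
    most recent last. If the most recent matching mention is close
    enough, resolve to it silently; if several distinct referents exist
    and none is recent, ask for clarification instead. The recency
    limit is an illustrative stand-in for a real confidence score.
    """
    matches = [(i, ref) for i, (n, ref) in enumerate(mentions) if n == name]
    if not matches:
        return None
    last_index, last_ref = matches[-1]
    distance = len(mentions) - 1 - last_index
    referents = sorted({ref for _, ref in matches})
    if len(referents) > 1 and distance > recency_limit:
        return "Which %s? %s?" % (name, " or ".join(referents))
    return last_ref

# "John's brother Sam" was mentioned last, so "Sam" resolves silently.
history = [("Sam", "Mary's brother"), ("Sam", "John's brother")]
print(resolve("Sam", history))          # John's brother

# After several unrelated mentions, the ambiguity resurfaces.
history += [("Bobby", "the boy")] * 3
print(resolve("Sam", history))          # asks which Sam is meant
```

In a fuller system the hard cutoff would be replaced by a graded confidence,
with the clarifying question triggered whenever confidence falls below a
threshold.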

I suspect that if someone were to create a language parser that could create a
mostly-accurate world model AND modify itself based on new rules it read
(e.g., "When someone says 'he' after referring to someone's name, they are almost
certainly talking about the man they previously referred to"), you would be
90% of the way to creating a useful virtual intelligence. It would of course
not be able to reason or have opinions of its own, but it would be extremely
useful as a virtual assistant that could learn your preferences over time.

~~~
phkahler
>> Finally, there is also the concept of confidence. In the face of ambiguity
that can't be resolved, a confidence rating should be assigned based on
available information and future answers should be based on that confidence.

That should probably be the first thing ;-) Every word has a probable meaning,
and when all of them fit together in a coherent way, including context, then
the meaning is correct (probably). This also encompasses the use of pronouns,
which you touch on. An inability to resolve ambiguity should also point to the
appropriate question to ask for clarification, which you also mentioned. I
think your points are all great, just wanted to point out that I think the
notion of confidence should be more central.

I'd also go as far as saying part of the internal model of the world should
also have a confidence. Any failure to understand a sentence may actually be a
problem with its world view. But now I'm rambling.

~~~
mratzloff
_I'd also go as far as saying part of the internal model of the world should
also have a confidence. Any failure to understand a sentence may actually be a
problem with its world view._

Yes, but that happens automatically by virtue of learning through natural
language teaching. Put the VI through school.

------
creyer
I think the AI should do more than just correlate some verbs. It should also
be capable of understanding concepts. For example, if I give the following:
"John and I are brothers. My mother has a brother named James." And we ask:
"What is the name of my uncle?" The initial results are great, but in my humble
opinion the big quest is to make computers learn concepts.

~~~
veb
You obviously sound like you're interested! Why not fork the repo, contribute?
:) It's a great demo project, and I love seeing these on HN.

OP - Instead of linking directly to en/Stanford Parser etc, you should get
together a list of dependencies people need to run your application. Usually
as easy as 'pip install pattern' (for the 'ImportError: No module named en')
which is `import pattern.en` :-)

I like it! It's nearly 2am so better catch some sleep, but I'm definitely
going to have a look further tomorrow.

~~~
youngprogrammer
> It's a great demo project, and I love seeing these on HN.

Thanks!

> OP - Instead of linking directly to en/Stanford Parser etc, you should get
> together a list of dependencies people need to run your application. Usually
> as easy as 'pip install pattern' (for the 'ImportError: No module named en')
> which is `import pattern.en` :-)

I didn't actually pip install anything for my project, I just downloaded and
extracted the Stanford Parser, and Nodebox Linguistics libraries. The setup
should be in the readme. I'll try to see if I can find the pip dependencies
and update the readme.

------
vonnik
This is a really neat, rules-based, Chomskyian NLP (as opposed to the
statistical kind represented by Word2vec). It's an old division...

[http://norvig.com/chomsky.html](http://norvig.com/chomsky.html)

The essential question is: can it go from basic reading comprehension to
advanced just by adding more rules? Is intelligence simply 10 million rules?
If so, how do we go about creating new rules as language evolves? By
hard-coding them, as in the example code?

The real test for general AI and NLP is how well it, well, generalizes; i.e.
how well does it deal with situations we have not explicitly anticipated?

In my opinion, the fuzzy, statistical methods @davesullivan mentions have a
better chance at generalizing (although they may well be augmented by rules-
based AI).

If an AI doesn't have a good way of transferring what it knows to novel
problems, then it is severely limited. It's treating the world like a canned
problem with a finite number of possibilities, like chess or checkers, when in
fact the world is much more complex.

The way DeepMind combines deep learning and reinforcement learning is one way
of acknowledging that complexity.

Deep learning learns patterns in raw sensory data, which means it can ingest
and handle the new. Reinforcement learning learns to perform actions over a
series of unknown states, improving its choices by monitoring the rewards it
receives for those actions. They both maximize within uncertainty, and I think
that's our best bet going forward.

Because the world, and language, cannot be known in their entirety. The
number, motion and interrelation of the atoms of air in the room where I'm
typing this are all too large and complex to be computable. Their fluid
dynamics can only be vaguely guessed at, not deterministically predicted in a
few lines of code.

The trick will be to bridge the gap between the hard-coded, limited rules and
the unlimited recombinations of language, which is inventing new rules and
words all the time.

~~~
youngprogrammer
> The essential question is: can it go from basic reading comprehension to
> advanced just by adding more rules? Is intelligence simply 10 million rules?
> If so, how do we go about creating new rules as language evolves? By
> hard-coding them, as in the example code?

I believe that it can go from basic reading comprehension to more advanced by
adding many rules, but of course manually adding them is not very feasible or
scalable.

> In my opinion, the fuzzy, statistical methods @davesullivan mentions have a
> better chance at generalizing (although they may well be augmented by rules-
> based AI).

I agree that a statistical model would be better since it will be able to
handle more complexity. It would be much easier to train the rules from a
dataset instead of hard coding all of them and it would be able to adapt to
new rules as well. However, I could not find a good data set for the task I
wanted.

------
dave_sullivan
Not to be discouraging, but I think research along these lines
[http://arxiv.org/abs/1503.08895](http://arxiv.org/abs/1503.08895) stands an
exponentially better chance of leading towards what OP is talking about. For
anyone interested in "building AI", read that paper and all its references.

~~~
kaffeemitsahne
Just because it could be done with the current hip thing (neural nets) doesn't
mean all other approaches should be disregarded.

~~~
dave_sullivan
They're the current hip thing because they work really well and keep working
better. They take a fundamentally different (I think better) approach than
he's taking.

My recommendation comes from a more informed place than "Dur, neural
networks!" Of course, this is my opinion, and you are welcome to have a
different one.

What is your recommendation on the topic of most promising research areas for
teaching reading comprehension to computers? Skip deep learning and read what
instead? Or the OP has it figured out?

~~~
kaffeemitsahne
I think the strength lies in combining NNs with less fuzzy approaches (like
the OP, or more explicit pattern matching). Coulda made that a bit clearer, I
have to admit. Cuz who would want to spend an hour training their net on a big
GPU for every new command they add? :P

To take a concrete example: we trust neural networks to do the handwriting
recognition at the post office, but once the address is digitized we use a
simple database.

~~~
dave_sullivan
>> I think the strength lies in combining NNs with less fuzzy approaches

Haha, I think you'll find the deep learning camp agrees. Read the paper I
posted, that's what the research is about (going from fuzzy knowledge to more
specific/discrete knowledge.)

~~~
kaffeemitsahne
It's interesting stuff. Don't really see why you were being so dismissive
towards OP tho.

------
ClintEhrlich
Thanks for sharing your work. As a hobby, I have spent years working on
extracting semantic ontologies from natural language, so it was fun to see
someone else's take on the problem.

As others have mentioned, you will make progress more efficiently if you survey
the linguistics literature, where a tremendous number of very smart people
have spent decades grappling with the same essential problems.

Heterodox linguistics is a veritable goldmine of ideas that can be implemented
in AI. My favorite approach is Richard Hudson's "word grammar," which you can
read about here:
[http://www.phon.ucl.ac.uk/home/dick/wg.htm](http://www.phon.ucl.ac.uk/home/dick/wg.htm)

Word grammar is particularly well suited for coding, because it strips away
lots of arbitrary linguistic formalisms in favor of a flexible, network-
centric framework. Some of the core principles, like default inheritance, were
actually taken directly from computer science.

------
anigbrowl
Nice work, but isn't this way of breaking down sentence structure already
a standard thing? I get the impression that the writer didn't find much on
this in the AI literature, but reinvented a wheel that has been long-
established in linguistics departments.

[https://en.wikipedia.org/wiki/Sentence_diagram](https://en.wikipedia.org/wiki/Sentence_diagram)

I don't mean that in a dismissive way - even if it was a reinvention of the
wheel it's still an elegant and useful one. It would be interesting to work up
to larger chunks of text, and also to encapsulate ambiguities in some way, such
that if presented with a sentence that admits of two meanings the program
could honestly say 'I don't know, tell me more.'

------
antome
I always find it interesting how many things can be represented by finite
state automata, and related concepts. I wonder what languages/libraries
specialise specifically in handling machines of a directed-graph style? I
would imagine VHDL and co. have functionality for it.
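As a minimal illustration of the directed-graph machinery in question (a
stdlib-only sketch; a library like networkx provides the same idea with real
tooling for traversal and queries):

```python
class DirectedGraph:
    """A minimal labelled directed graph -- the structure underlying
    both finite state automata and semantic networks."""

    def __init__(self):
        self.edges = {}   # node -> {edge label: next node}

    def add_edge(self, src, label, dst):
        self.edges.setdefault(src, {})[label] = dst

    def walk(self, start, labels):
        """Follow a sequence of edge labels, like running an FSA."""
        node = start
        for label in labels:
            node = self.edges.get(node, {}).get(label)
            if node is None:
                return None   # no such transition: the machine rejects
        return node

# A trivial automaton: door states driven by events.
g = DirectedGraph()
g.add_edge("closed", "open", "open")
g.add_edge("open", "close", "closed")
print(g.walk("closed", ["open", "close", "open"]))   # ends in "open"
```

The same structure serves as an FSA (nodes are states, labels are inputs) or a
knowledge graph (nodes are entities, labels are relations), which is why the
two keep turning up together.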

------
logicallee
I was intrigued by the initial transcript, but disappointed that it was edited
(lightly). For the question "Why did Mary cheer?" the screenshot showed that
the simple AI literally answered "because IT be HER FIRST TIME WINNING". This
is essentially correct, but the author edited it into a grammatical sentence
for us.

I think it is too much to call this "capable of basic reading comprehension".
Surely, "simple sentence parser can answer reading comprehension questions"
would be more correct?

I mean it is quite an accomplishment, but there is no understanding here. In
some natural languages with less inflection or change in word order, for
example, you could answer any "why" questions with the regex
/($question_string) because (.+?)\\./ against some source corpus, and then $2
will contain your answer. It doesn't work with English due to slight changes
in word order, but surely it would be too much to state that this regex is
capable of basic reading comprehension in languages it does work in.... If I'm
allowed to massage the question the way the author massaged the output, look
at this fine result:

[http://ideone.com/84QSHC](http://ideone.com/84QSHC) (output at bottom)

Would you say that 14-line Perl program is capable of basic reading
comprehension? I wouldn't!
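For reference, the regex trick described above can be rendered in Python
roughly like this (this is a sketch of the idea, not the linked Perl program;
the corpus and question-stripping rule are invented for illustration):

```python
import re

def answer_why(question, corpus):
    """Answer a 'why' question by pure pattern matching: find the
    question's clause in the corpus and parrot back whatever follows
    'because'. No parsing, no understanding of any kind."""
    # Strip the interrogative wrapper to recover the bare clause,
    # e.g. "Why did Mary cheer?" -> "Mary cheer".
    clause = re.sub(r"^Why did\s+|\?$", "", question)
    # Allow a trailing inflection ("cheer" -> "cheered"), then capture
    # everything between "because" and the next period.
    pattern = re.escape(clause) + r"\w*\s+because\s+(.+?)\."
    match = re.search(pattern, corpus, re.IGNORECASE)
    return match.group(1) if match else None

corpus = ("Mary won the race. Mary cheered because it was her "
          "first time winning.")
print(answer_why("Why did Mary cheer?", corpus))
# -> it was her first time winning
```

Which is the point: a few lines of string matching can "answer" the question
while comprehending nothing.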

~~~
youngprogrammer
> I was intrigued by the initial transcript, but disappointed that it was
> edited (lightly). For the question "Why did Mary cheer?" the screenshot
> showed that the simple AI literally answered "because IT be HER FIRST TIME
> WINNING". This is correct, but the author edited it into a correct sentence
> for us.

For some reason, the library I used returns "be" as the present tense of
"was". I had a manual fix for this but accidentally removed it when I was
cleaning up the code. Sorry if I disappointed you.

> I think it is too much to call this "capable of basic reading
> comprehension". Surely, "simple sentence parser can answer reading
> comprehension questions" would be more correct?

I would say it is capable of basic reading comprehension because it attempts
to build relationships between different objects, although the relationships
are weak. When humans do reading comprehension, we do what my program does,
which is try to parse the sentence and understand the relationships. But
brains are also able to bring in a lot more information and thus be more
flexible with understanding.

> Would you say that 14-line Perl program is capable of basic reading
> comprehension? I wouldn't!

I would say it is not, because it does not understand the relations between
objects; it only understands that sentences starting with "why" should be
answered with everything after the string "because". Also, in your program, if
there are two instances of "because" in the source, it will only choose the
first one.

~~~
logicallee
Thanks for the reply! Thanks for sharing your work with us as well. It's
impressive.

It's not disappointing, it's quite a feat. I suppose what I was trying to say
is that "parsing", while impressive, is not a large part of "understanding" in
my opinion. To use an analogy: almost by definition,
compilers parse languages like C++ much better than humans do (basically
perfectly, unless there's literally a bug in the compiler or it doesn't follow
the standard due to some error).

But that doesn't mean they _understand_ the programs (at all.) A compiler has
no idea on an algorithmic level what a program might be doing (and if you
remove comments, maybe a person won't understand it either, if they're not
familiar with the algorithm.)

So my basic objection is that you're really calling this reading
comprehension, but I don't think anything is actually being "understood"; just
parsed. A better title would be as I suggested: simple AI correctly answers
reading comprehension questions.

The reason that I object to "comprehension" is that these days there really
are a few "deep learning" systems, that can possibly synthesize information.
(I don't know that much about them.) I don't think it's fair to elevate
semantic parsing to the level of comprehension.

However the comment by tariqali34 makes a good point, that perhaps this is a
criticism of reading comprehension tests. I know in multiple-choice tests from
standardized exams, I've been able to correctly answer reading comprehension
questions about texts that I didn't even read (by finding just the sentence
that talks about it), or in other cases, texts that were too technical and
that I didn't understand.

I would say that I would be able to answer a question about some biomedical
excerpt that I can't understand a word of, I just can't make heads or tails of
it, let's say:

 _In order to study the physiological roles of AGC kinases, a commonly used
approach has been to over-express the active forms in cells. However, due to
the overlapping substrate specificities of many AGC kinases, it is likely that
the over-expression of one member of this kinase subfamily will result in the
phosphorylation of substrates that are normally phosphorylated by another AGC
kinase. Another strategy has been to over-express catalytically inactive
‘dominant negative’ mutants of AGC kinases in cells. However, such mutants are
likely to interact with and inhibit the upstream protein kinase(s) that they
are activated by, and thus prevent the ‘upstream’ kinase(s) from
phosphorylation of other cellular substrates. For example, a dominant negative
RSK may interact with ERK1 /ERK2 preventing the activation of MSK isoforms and
hence the phosphorylation of CREB (cAMP-response-element-binding protein) [9].
Furthermore, in Saccharomyces cerevisiae, over-expression of catalytically
inactive Rck2p, a kinase that binds to and is activated by the Hog1P MAPK,
sequestered the substrate-docking site of the Hog1P kinase, thereby preventing
Hog1P from interacting with other substrates. Thus catalytically inactive
Rck2P is acting as a dominant negative mutant of Hog1P and not Rck2P._

I couldn't answer a REAL reading-comprehension test about this: I just have no
idea what it's REALLY talking about, I don't actually understand it.
(Obviously on a syntactic level, it's not hard to parse.) I don't know what
over-expression is, I don't know what a kinase is, I don't know what
phosphorylation is. I don't understand the text. But if the questions are
simple, perhaps I could answer some reading comprehension questions about this
by parroting back quotations from it. Syntactically, there's nothing difficult
here. I just don't understand it.

So I think it's unfair to call sentence parsing real reading comprehension,
even if sometimes reading comprehension tests fail to differentiate between
the two. You can parse sentences perfectly while understanding nothing. For
example, I could answer the question "what is wrong with studying the
physiological role of AGC kinases by overexpressing the active forms in
cells?" which the first sentence refers to. I can just quote the second
sentence "Due to the overlapping substrate specificities of many AGC kinases,
it is likely that the over-expression of one member of this kinase subfamily
will result in the phosphorylation of substrates that are normally
phosphorylated by another AGC kinase". I don't understand, but I think I
correctly parroted.

So there are real issues in determining comprehension. The higher the level of
the question that is asked, the harder it is to answer without actually
understanding the text.

I did find your work very interesting, thank you.

~~~
youngprogrammer
> So my basic objection is that you're really calling this reading
> comprehension, but I don't think anything is actually being "understood";
> just parsed. A better title would be as I suggested: simple AI correctly
> answers reading comprehension questions.

I would argue that my program can understand the relationship between
different objects but I agree that it does not understand the meaning of the
relationships.

> I couldn't answer a REAL reading-comprehension test about this: I just have
> no idea what it's REALLY talking about, I don't actually understand it.
> (Obviously on a syntactic level, it's not hard to parse.) I don't know what
> over-expression is, I don't know what a kinase is, I don't know what
> phosphorylation is. I don't understand the text. But if the questions are
> simple, perhaps I could answer some reading comprehension questions about
> this by parroting back quotations from it. Syntactically, there's nothing
> difficult here. I just don't understand it.

I would also argue that you are doing very basic reading comprehension here.
You might not know the meaning of individual objects, but you understand that
"what is wrong" is "the overlapping substrate specificities of many AGC
kinases, it is likely that the over-expression of one member of this kinase
subfamily will result in the phosphorylation of substrates that are normally
phosphorylated by another AGC kinase". You might not know what that whole
phrase means, but you understand that it's related to "what's wrong" with
"studying the physiological role of AGC kinases by overexpressing the active
forms in cells". I agree that my program is unable to do full comprehension,
in that it does not understand the meaning of objects and relationships, but
it can do very basic comprehension in understanding what the relationships
are.

> So I think it's unfair to call sentence parsing real reading comprehension,
> even if sometimes reading comprehension tests fail to differentiate between
> the two. You can parse sentences perfectly while understanding nothing.

I did not really call it "real" reading comprehension, but "basic" reading
comprehension. But I suppose "basic reading comprehension" is still a bit of a
stretch. I think the real question here is: how can you really determine
whether a program understands something? What does understanding something
really mean? It is difficult to define, and it seems we need some kind of
"Turing test" for understanding.

> I did find your work very interesting, thank you.

Thanks!

------
markjspivey
this post and many of the comments here don't necessarily have a "semiotic" or
"usage-based language acquisition" or "emergent-grammar" nature about them ...

which is fine ... just different results ...

specifically:

1\. meaning is usage.

2\. structure emerges from usage.

this post and many of the comments have a world view akin to:

1\. meaning is structure.

2\. usage emerges from structure.

what i mean by this is that the "analysis" of the text doesn't exist in the
same "world" as the text.

meaning that it's nothing like "real" (natural) language.

what i mean further by this is simple:

humans don't "use" natural language... it is an emergent property of other
systems of externalized behaviors and such by individual humans.

and such emergent properties and systems are also evident in any development
of this actual system and many of the comments here.

producing the comprehension introduces incomprehensible things ... or at best
just divorces systems (discontinuous) ... at which point any thing can be any
thing, so debating it as such here doesn't even matter (have value).

I'm not exactly sure what I'm saying here, but it is akin to:

1\. nth-order cybernetics (mostly like 3rd and 4th and such)

2\. autopoiesis (Humberto Maturana and Francisco Varela)

I'm going through the same type of analysis regarding "activity stream" type
APIs which use an "actor verb object" type form ... (usage from structure) ...

------
sylphiae
Really impressive program! I'm a beginner as well, and I'd like to know how
you learned to program. Your code looks advanced to me :)

------
Tobu
This rules-based framework is way too rigid for “reading comprehension”. It
would fail to recognize much from naturally written sentences.

------
acd
Very cool project. It would be good to have an AI similar to this that could
read and comprehend lots of research articles. I dream of an AI capable of
reading all the research articles on the latest battery tech and then being
able to understand and make recommendations from that. The question I would
like us to ask the AI: how would you create the world's most efficient
battery?

------
ThomPete
It's thanks to people like you that things move forward, and for that I thank
you.

Of course it's going to be challenging, but who cares – you are going to learn
a lot of things even if this attempt fails, and you are going to allow other
people to stand on your shoulders.

------
sabujp
this is first year ai stuff

