
Show HN: A Natural Language Query Engine Without Machine Learning - youngprogrammer
http://blog.ayoungprogrammer.com/2016/10/natural-lang-query-engine.html/
======
charlieegan3
I think you might get better results in the first stage using the dependency
parse from CoreNLP - rather than the phrasal parse. Online demo at
[http://corenlp.run](http://corenlp.run)

If you're willing to drop CoreNLP there's also
[https://demos.explosion.ai/displacy/](https://demos.explosion.ai/displacy/)
that's worth checking out.
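To illustrate the suggestion: once you have a dependency parse, pulling out a subject/verb/object triple is a direct lookup on arc labels rather than a walk over a phrase-structure tree. A toy sketch, with the parse hand-coded in roughly the (token, head, relation) form CoreNLP or spaCy would emit:

```python
# Toy sketch: subject/verb/object extraction from a dependency parse.
# The parse is hand-coded here; a real pipeline would get it from
# CoreNLP or spaCy for the sentence "IBM announced a new computer".

def extract_svo(parse):
    """Return (subject, verb, object) from (word, head, relation) arcs."""
    subj = verb = obj = None
    for word, _head, rel in parse:
        if rel == "root":
            verb = word
        elif rel == "nsubj":
            subj = word
        elif rel in ("dobj", "obj"):
            obj = word
    return subj, verb, obj

parse = [
    ("IBM",       1, "nsubj"),
    ("announced", -1, "root"),
    ("a",         4, "det"),
    ("new",       4, "amod"),
    ("computer",  1, "obj"),
]

print(extract_svo(parse))  # ('IBM', 'announced', 'computer')
```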

~~~
alanlit
Nice work by the O.P.

Amusingly a year or so ago I took the Stanford Dependency parser and fed its
output tree into a Prolog system to try to pull out the semantics. It was used
to analyze business news (getting at the who's, what's and why's).

The easiest approach was to wrap a very simple DSL around Prolog (which, BTW,
Prolog is great at). Then in the DSL (which still retained logical variables
and backtracking) you could write things like:

    %% Simple statements -- root is an announcement word whose subject and object tell the story.
    %% 'IBM announced a new computer today'
    announce(Who, About, What) ==>
        s+root(['announc', 'releas', 'introduc', 'launch', 'unveil', 'reveal', 'agre']),
        #Dep1,
        subject(Who),
        Dep1 >> object(About, What).

    %% 'IBM has announced a partnership ...' is caught by the above. But 'IBM has entered into a partnership ...' needs
    %% a little more work
    announce(Who, About, announcement) ==>
        s+root(['enter']),
        #Obj,
        subject(Who),
        Obj >> prep_pobj_chain(PPC),
        {PPC = [Prep|About]}.

I think a Prolog-based query planner as a front end to Sparql on Wikidata
could be quite interesting.

Alanl

~~~
alanlit
Bah -- try again so it is readable !!

    
    
        %% Simple statements -- root is an announcement word whose subject and object tell the story.
        %% 'IBM announced a new computer today'
        announce(Who, About, What) ==> 
            s+root(['announc', 'releas', 'introduc', 'launch', 'unveil', 'reveal', 'agre']),
            #Dep1,	
            subject(Who), 
            Dep1 >> object(About, What).
    	
        %% 'IBM has announced a partnership ...' is caught by the above. But 'IBM has entered into a partnership ...' needs
        %% a little more work
        announce(Who, About, announcement) ==> 
            s+root(['enter']),
            #Obj,
            subject(Who),
            Obj >> prep_pobj_chain(PPC),
            {PPC = [Prep|About]}.
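As a sketch of the "Prolog query planner as a front end to SPARQL on Wikidata" idea: a matched rule like `announce(Who, About, What)` yields a relation plus arguments, and a planner maps the relation to a Wikidata property and emits a SPARQL query. P31 ("instance of") is a real Wikidata property; the relation table and the QID below are illustrative only.

```python
# Sketch of a rule-to-SPARQL planner. P31 is a real Wikidata property;
# the relation mapping and QID are made up for illustration.

RELATION_TO_PROPERTY = {
    "instance_of": "P31",  # Wikidata: "instance of"
}

def plan_sparql(relation, entity_qid):
    prop = RELATION_TO_PROPERTY[relation]
    return f"SELECT ?value WHERE {{ wd:{entity_qid} wdt:{prop} ?value . }}"

query = plan_sparql("instance_of", "Q1")  # illustrative QID
print(query)
```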

------
steinsgate
Nice work! You said that you avoided machine learning because labeled data is
hard to find. What about unsupervised approaches?

Frankly speaking, I am a bit skeptical about pattern matching algorithms for
answering questions. It would help if you showed some kind of stats about your
algorithm's performance on a diverse question set. For example, you can scrape
simple quiz questions (and answers) from quiz sites [1] and report back on the
performance.

[1] [http://www.quiz-zone.co.uk/questionsbydifficulty/1/0/answers.html](http://www.quiz-zone.co.uk/questionsbydifficulty/1/0/answers.html)
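One minimal shape such an evaluation could take: run the engine over a question/answer set and report accuracy. Here `ask` is a stub standing in for the real query engine.

```python
# Minimal evaluation harness sketch. `ask` is a stub; a real run would
# call the query engine and compare against the scraped answers.

def ask(question):
    canned = {"What is the boiling point of water?": "100 degrees Celsius"}
    return canned.get(question, "")

def accuracy(qa_pairs):
    correct = sum(1 for q, a in qa_pairs if ask(q).lower() == a.lower())
    return correct / len(qa_pairs)

qa = [
    ("What is the boiling point of water?", "100 degrees Celsius"),
    ("What species of whale was Moby Dick?", "sperm whale"),
]
print(accuracy(qa))  # 0.5
```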

------
drdeca
In addition to the questions it does answer well, it also has these answers:

Q: "What is purpose" A: "Justin Bieber album"

Q: "What is a car?" A: "country in Africa"

Q: "What is a male?" A: "capital of Maldives"

Q: "What is a female?" A: "human who is female (use with Property:P21 sex or gender). For groups of females use with ''subclass of (P279)''"

My point in this comment is just that when it does give an odd answer, the
answer can be funny; I'm not saying that it usually gives odd answers.

------
mrob
This seems almost completely useless. I tried ten questions, and only one was
answered, incorrectly (Moby Dick question misunderstood, answered as "novel by
Herman Melville"). I think even Ask Jeeves back in the 90s had better
performance than this. Questions tried:

how many lines of resolution are there in an ntsc television signal?

what is the melting point of tin/lead eutectic solder?

what species of whale was moby dick?

what grain is most often used to make beer?

what is the boiling point of water?

how many chromosomes does a normal human have?

what animal is known as "man's best friend"?

what fps did id software release in 1993?

what is the largest known prime number?

what is the clock rate of the arduino uno?

As a comparison, Google gives 8 correct answers directly (either as a special
info box, or as highlighted part of a web page), 1 correct answer as the 2nd
search result (Doom), and 1 incorrect answer (largest known prime).

~~~
azpoliak1
"This seems almost completely useless" - seems pretty harsh. Someone making a
cool project, open sourcing it, and documenting it really well is something
that should be praised. Of course Google is going to do much better; it's a
company focused on search.

------
imh
These things are always so interesting in their totally inhuman failure cases.
It can tell me George Washington was born in 1732, but doesn't know which
planet America is on (much less which planet George Washington was born on).

Also, it seems to have issues formatting dates before 1900 (for the bday one,
the answer it returns is more of an error message than an answer: "year=1732
is before 1900; the datetime strftime() methods require year >= 1900")
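The quoted message matches the strftime limitation in older Pythons, which rejected years before 1900. A sketch of a workaround that sidesteps strftime entirely (the full birth date is filled in here for the example):

```python
from datetime import date

# Older Pythons' strftime rejected years before 1900 (matching the quoted
# error). Formatting the fields manually, or using isoformat(), avoids it.

def format_day(d):
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"

birthday = date(1732, 2, 22)  # George Washington's birth date
print(format_day(birthday))   # 1732-02-22
print(birthday.isoformat())   # 1732-02-22
```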

------
ecesena
Partially related: has anyone worked on natural language queries with time
expressions in them? Imagine analytics queries, where you want to count the
number of events/unique users, given certain conditions, in a certain time
window. I'm particularly interested in the time aspect of it.
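A toy sketch of the time-resolution step such a system needs, turning a relative expression into a concrete (start, end) window before the query runs. The grammar here is deliberately tiny and made up; real systems use something like SUTime (which ships with CoreNLP) or dateparser.

```python
import re
from datetime import datetime, timedelta

# Toy resolver: map "last N days/weeks" to a concrete time window.
def resolve_window(expr, now):
    m = re.fullmatch(r"last (\d+) (day|week)s?", expr)
    if not m:
        raise ValueError(f"unsupported expression: {expr!r}")
    n, unit = int(m.group(1)), m.group(2)
    days = n * (7 if unit == "week" else 1)
    return now - timedelta(days=days), now

now = datetime(2016, 10, 15)
print(resolve_window("last 7 days", now))
print(resolve_window("last 2 weeks", now))
```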

------
fspeech
Have you studied Prolog? Its matching (logical unification) capability may
give you some more ideas.
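For illustration, a toy version of Prolog-style unification in Python, enough to show why it suits question templates (the `?var` convention is made up here; real Prolog variables are uppercase atoms):

```python
# Toy unification: strings starting with '?' are variables; matching two
# terms either extends the bindings or fails with None.

def unify(a, b, bindings=None):
    bindings = dict(bindings or {})
    if isinstance(a, str) and a.startswith("?"):
        a = bindings.get(a, a)
    if isinstance(b, str) and b.startswith("?"):
        b = bindings.get(b, b)
    if a == b:
        return bindings
    if isinstance(a, str) and a.startswith("?"):
        bindings[a] = b
        return bindings
    if isinstance(b, str) and b.startswith("?"):
        bindings[b] = a
        return bindings
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            bindings = unify(x, y, bindings)
            if bindings is None:
                return None
        return bindings
    return None

# Pattern for "Who wrote X?" matched against a stored fact:
print(unify(("wrote", "?who", "moby_dick"),
            ("wrote", "herman_melville", "moby_dick")))
# {'?who': 'herman_melville'}
```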

------
greglindahl
Very interesting! Nice to see how little code it is. I wonder how much work it
would be to get it to answer questions like "What is the biggest planet?", or
to fix the fact that "Who was Prime Minister of Canada in 1945" drops the "of
Canada"?

------
atoko
This is cool! I like how you've iterated on a central concept (NLP) with
different codebases.

Tip: The link to the source is pointing to github pages, which hasn't been set
up.

~~~
youngprogrammer
Thanks! Fixed the link.

------
mrcabada
This is nice! Would it be possible to run the code with other language models?
(Spanish, German, and any other CoreNLP language model)

~~~
youngprogrammer
Yes it should be possible! You would need to add the grammar matching rules
for those languages though.

------
youngprogrammer
Demo should be working now. The Stanford parser kept dying from running out
of memory, so I moved it to another box.

------
billconan
This is cool! Is it easy to convert a MediaWiki to the graph store your system
reads?

~~~
smsm42
While the other commenter's note is correct (Wikidata is not your regular
MediaWiki), you can also look at DBpedia, which does pretty much what you
suggested. The TL;DR answer would be: "possible, but harder than it
seems".

------
alexcaps
Couldn't tell me who the CEO of Apple is... :(

