
Show HN: An NLP Library for Matching Parse Trees - youngprogrammer
https://github.com/ayoungprogrammer/Lango
======
thesoonerdev
If you clicked through to the links from laretluval and brudgers, you can see
why natural language processing as a field is struggling to gain adoption as
quickly as better-understood concepts do. Look at laretluval's links: the
people doing the hardcore research are doing a really poor job of explaining
what exactly they are trying to accomplish. Can someone who is good at
programming but not familiar with computer science concepts actually figure
out what exactly tregex does, even after reading the page a few times? Do you
seriously expect someone to download a PPT file (yes, PPT, not a PDF) to
understand the basics?

Contrast that with brudgers' link: it is actually a readable summary, even
though I personally think the person who posted that blog entry still needs to
learn more concepts in NLP/English grammar/hierarchical data structures to
scale the project: all his examples are in the active voice; using regex will
fail as the sentence becomes more run-on, like the one you are currently
reading; and hand-crafting rules for English grammar is actually super hard,
because even trained linguists sometimes disagree on the parse tree produced
by fairly short sentences (I think I learnt that from watching a YouTube video
by Chris Manning; unfortunately I don't have the reference right now).

I don't understand how the NLP community seems so oblivious to this issue.

~~~
laretluval
I agree with your criticism in general, but the lack of outsider-friendly
explanation here seems justified because something like a parse tree matcher
is more of a tool that's useful inside the NLP research community than for end
users. When does an end user ever need to find trees with a particular
syntactic structure? On the other hand, it is very useful for debugging
parsers, verifying annotation standards in corpora, etc.: things that NLP
researchers have to do.
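
For readers outside the field, here is a minimal sketch of what "finding
trees with a particular syntactic structure" can mean. It is not the Tregex,
tgrep2, or Lango API: the nested-tuple tree format and the helper names
(`subtrees`, `find_parent_with_children`, `leaves`) are assumptions made up
for this illustration, and the example tree is written by hand rather than
produced by a parser.

```python
# Hypothetical tree format: (label, child, child, ...); leaves are plain strings.
TREE = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "a"), ("NN", "cat"))))


def label(node):
    """Return the label of a non-leaf node, or None for a leaf (word)."""
    return node[0] if isinstance(node, tuple) else None


def subtrees(node):
    """Yield every non-leaf node in the tree, top-down, left to right."""
    if isinstance(node, tuple):
        yield node
        for child in node[1:]:
            yield from subtrees(child)


def find_parent_with_children(tree, parent, child_labels):
    """Return subtrees labeled `parent` whose immediate children carry
    exactly the labels in `child_labels`, in that order."""
    return [n for n in subtrees(tree)
            if n[0] == parent
            and [label(c) for c in n[1:]] == list(child_labels)]


def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


# Query: every verb phrase made of a past-tense verb followed by a noun phrase.
matches = find_parent_with_children(TREE, "VP", ["VBD", "NP"])
print([" ".join(leaves(m)) for m in matches])  # ['chased a cat']
```

A real matcher supports far richer relations (dominance, sisterhood,
negation, and so on); this sketch only checks one parent against its
immediate children, which is roughly the kind of query you would run when
spot-checking an annotation convention in a corpus.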

There is some NLP software that does a great job of explaining what it does,
how it does it, and why this is useful. [http://spacy.io/](http://spacy.io/)
comes to mind. Maybe that's the happy exception.

~~~
thesoonerdev
Spacy's website looks good. Thanks for the heads up.

~~~
syllogism
Unfortunately the two demos 404 at the moment. I hope we can have everything
back online soon.

------
laretluval
Similar software for matching parse trees:

Tregex:
[http://nlp.stanford.edu/software/tregex.shtml](http://nlp.stanford.edu/software/tregex.shtml)
tgrep2: [http://tedlab.mit.edu/~dr/Tgrep2/](http://tedlab.mit.edu/~dr/Tgrep2/)

------
brudgers
Related blog post:
[http://blog.ayoungprogrammer.com/2016/07/natural-language-un...](http://blog.ayoungprogrammer.com/2016/07/natural-language-understanding-by.html/)

------
chatmasta
This would be really cool to apply to programming languages. That is, matching
abstract syntax trees together.

This way you could identify similar chunks of code. I had an idea related to
this for identifying security vulnerabilities:
[https://news.ycombinator.com/item?id=11573547](https://news.ycombinator.com/item?id=11573547)
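
A rough sketch of the idea, using Python's standard `ast` module: reduce each
snippet to the "shape" of its syntax tree (node types only, ignoring variable
names and constant values) and compare the shapes. The helper names `shape`
and `same_shape` are made up for this illustration, and exact shape equality
is only the simplest possible notion of "similar"; it is not how Lango or any
particular vulnerability scanner works.

```python
import ast


def shape(node):
    """Reduce an AST node to a nested tuple of node type names.

    Identifiers and constant values are dropped, so two snippets that differ
    only in naming produce the same shape.
    """
    return (type(node).__name__,) + tuple(
        shape(child) for child in ast.iter_child_nodes(node)
    )


def same_shape(src_a, src_b):
    """True if both source strings parse to structurally identical ASTs."""
    return shape(ast.parse(src_a)) == shape(ast.parse(src_b))


a = "total = price * qty + tax"
b = "area = width * height + margin"    # same structure, different names
c = "area = width * (height + margin)"  # different nesting of the operators

print(same_shape(a, b))  # True
print(same_shape(a, c))  # False
```

Finding similar chunks inside larger programs would mean comparing subtrees
rather than whole modules, and probably tolerating small differences, but the
comparison step would look much like this.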

------
bpodgursky
Nice. I started on an impl in Java several years ago but never got far
([https://github.com/bpodgursky/nlpstore/blob/master/src/test/...](https://github.com/bpodgursky/nlpstore/blob/master/src/test/java/com/bpodgursky/nlpstore/graph/query/TestQuerier.java)).

