
Stanford Natural Language Parser - aphextron
http://nlp.stanford.edu:8080/parser/index.jsp
======
jackfoxy
Thanks to Sergey Tihon in Minsk, the Stanford NLP library is made available
as a .NET library about as fast as new releases come out:
[http://sergey-tihon.github.io/Stanford.NLP.NET/](http://sergey-tihon.github.io/Stanford.NLP.NET/)

~~~
fowlerpower
Just checked that out. It looks fantastic, big thanks for sharing.

I haven't seen too many things for NLP on .NET. I have run across a ton of
things in Python or C++/C. Would love to see more in .NET.

------
YeGoblynQueenne
My favourite example phrase to mess with NL parsers:

 _Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo_

Stanford fails miserably on this one, even at the tagging step:

      Buffalo/NNP
      buffalo/NNP
      Buffalo/NNP
      buffalo/JJ
      buffalo/JJ
      buffalo/NN
      Buffalo/NNP
      buffalo/NNP

... but then, most humans also find that sentence really hard to parse.

Check it out:

[https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal...](https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo)

~~~
xtiansimon
That's great. But let me ask, technically, how is the sentence "a
grammatically correct sentence in American English" without punctuation? For
NLP, shouldn't this first be regarded as a 'special case' of unpunctuated
sentences? (and BTW, the parser doesn't handle the punctuated version any
better).

~~~
jcranmer
That sentence has virtually the same parse tree as "The man I saw crossed the
street," which is a totally reasonable sentence to most English speakers. The
difference in the parsing is that the "I" in this case corresponds to "Buffalo
buffalo", a noun phrase--and the sentence is confusing because we have a
tendency to add a "that" after noun phrases (as opposed to single-word
nouns/pronouns), and this sentence is missing it.

~~~
YeGoblynQueenne
For me it was confusing the first time I read it because I didn't know that
"buffalo" can mean "to bully". Without that information it doesn't make sense
at all.

So I never thought there was something inherently difficult in the _syntax_ of
the sentence--I thought it was just that unexpected third use of "buffalo"
that makes it so strange.

Of course, now that I think of it, what you say makes sense too. For an NL
parser, the variant use of "buffalo" shouldn't be an issue. You could even say
that, lacking the context of what a "buffalo" really is, a parser should find
it easier to parse the sentence than a human, who must pause to try and
understand what it _means_.

Maybe it's not such a bad sentence to test parsers after all.

~~~
WildUtah
'Buffalo' doesn't mean 'bully' in English. It means 'con.'

~~~
eriknstr
The Wikipedia article above says that though uncommon in usage, the buffalo
verbs in that sentence mean "to bully, harass, or intimidate" or "to baffle".

~~~
jholman
AIUI, it's only "to bully" in the sense of verbally overwhelming someone into
believing what you want them to believe. It's like the word "to snow" (in the
sense that means "to con").

~~~
douche
I would read "buffaloing" somebody as a less vulgar version of bullshitting
them, which is also a way of lying about something, mostly to get them to buy
something they don't actually need.

------
brudgers
Parser home: [http://nlp.stanford.edu/software/lex-parser.shtml](http://nlp.stanford.edu/software/lex-parser.shtml)

------
0x54MUR41
My undergraduate research used Stanford Parser for getting word tags in a
sentence before calculating its similarity with another sentence. Stanford
Parser results are good, but it fails on some sentences. If you need a better
alternative to it (for JVM languages), I would recommend NLP4J [1].

[1]: [https://emorynlp.github.io/nlp4j/](https://emorynlp.github.io/nlp4j/)

------
wiradikusuma
Not sure if this is the right thread to ask, wanted to ask in
[https://news.ycombinator.com/item?id=13445255](https://news.ycombinator.com/item?id=13445255)
but I missed the boat:

How difficult is it to use NLP to do keyword extraction for non-English
languages? Any pointer to get started? (not NLP in general, but how to tailor
it for non-English)

~~~
gattilorenz
As long as you have a tokenizer, i.e. you can split a sentence into words,
language doesn't matter for some basic techniques.

Two examples that easily come to mind: TF-IDF [1] and TextRank [2,3].

These are good to get started, but if you want to know more about the state of
the art, searching on Google Scholar for "keyword extraction $language" is
your best bet (and maybe "keyword extraction overview", also in Scholar).

[1] [http://www.joyofdata.de/blog/tf-idf-statistic-keyword-extraction/](http://www.joyofdata.de/blog/tf-idf-statistic-keyword-extraction/)

[2] [http://digital.library.unt.edu/ark%3A/67531/metadc30962/m2/1/high_res_d/Mihalcea-2004-TextRank-Bringing_Order_into_Texts.pdf](http://digital.library.unt.edu/ark%3A/67531/metadc30962/m2/1/high_res_d/Mihalcea-2004-TextRank-Bringing_Order_into_Texts.pdf)

[3]
[https://github.com/davidadamojr/TextRank](https://github.com/davidadamojr/TextRank)
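
TF-IDF in particular is simple enough to sketch in a few lines once tokenization is done. A toy version (function and variable names here are made up, and there's no stopword removal, so function words can still rank):

```python
import math
from collections import Counter

def tf_idf_keywords(docs, doc_index, top_k=3):
    """Rank words in docs[doc_index] by TF-IDF against the other docs."""
    n_docs = len(docs)
    # Document frequency: in how many docs does each word appear at all?
    df = Counter(word for doc in docs for word in set(doc))
    tf = Counter(docs[doc_index])
    n_words = len(docs[doc_index])
    scores = {
        word: (count / n_words) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_k]]

# Tokenization is the only language-specific step; whitespace split for demo.
docs = [s.lower().split() for s in [
    "the parser tags every word in the sentence",
    "the buffalo sentence confuses the parser",
    "keyword extraction works in any language",
]]
print(tf_idf_keywords(docs, 1))  # -> ['buffalo', 'confuses', 'the']
```

The tokenizer is the only language-specific piece, which is the point: swap in a proper tokenizer for your language and the scoring stays the same.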

------
mark_l_watson
It is a nice statistical parser. I did note that it does not do anaphora
resolution (mapping pronouns back to the nouns they originally refer to, etc.)

~~~
gtani
Did you try
[http://nlp.stanford.edu/projects/coref.shtml](http://nlp.stanford.edu/projects/coref.shtml)?

~~~
mark_l_watson
Thanks, I have tried that online but it is not available as a library. My
kbsportal library does collocation, but not well.

------
charlieegan3
I always thought [http://corenlp.run](http://corenlp.run) was a better demo of
the tool. Sadly it's no longer available:
[https://github.com/stanfordnlp/CoreNLP/issues/273](https://github.com/stanfordnlp/CoreNLP/issues/273)

You could use this Dockerfile to run a local copy if you're interested:
[https://gist.github.com/charlieegan3/910276eef0f8658b44b42af...](https://gist.github.com/charlieegan3/910276eef0f8658b44b42af268599931)

------
m0th87
Looks like it can even handle garden path sentences [1] like "the complex
houses married and single soldiers and their families."

1:
[https://en.wikipedia.org/wiki/Garden_path_sentence](https://en.wikipedia.org/wiki/Garden_path_sentence)

~~~
ma2rten
I don't think this parser parses left-to-right, like humans and other parsers
like Parsey McParseface do.

~~~
kuschku
I’m currently writing a paper on improving natural language interfaces for
databases, and I’ve run both Parsey McParseface and Stanford’s CoreNLP for
that, and both have horrible performance.

Their results for even the simplest English sentences are horrible, often
with multiple mistagged words.

As soon as you add relative clauses, they break down entirely.

~~~
YeGoblynQueenne
Well, it's not really a solved problem, is it? Saying the performance is
"horrible" does some injustice to it. It's actually the state of the art in
solutions to a very difficult problem.

I've used the Stanford parser for university work (at Masters level). It was
part of the pipeline for a sentiment extractor. I didn't get the feeling it
performed badly, quite the contrary- the parses it generated were quite useful
as features in my extractor. It had this rare feeling of a tool that you can
actually use to do something interesting and useful.

Then again, I do have my expectations set very low for this sort of thing,
especially after completing my Masters thesis (on grammar induction). Language
learning is _hard_.

~~~
kuschku
I mean, it's surprising how well it works, but if you come in expecting to
get a perfect parse tree out of everything, so you can then easily pattern
match on top of it, well, you'll be in for a rough surprise.

Almost every slightly more complicated sentence was completely misparsed by
CoreNLP and Parsey McParseface. As I mentioned before, as soon as you
introduce relative clauses they break down.

It's hard when your research was supposed to discuss how to better handle the
common knowledge problem of NLIDBs, but you have to spend a lot of time just
to get a kinda useful parse out in the first place.

------
ma2rten
Does anyone here use parsers and know of practical applications for them?

~~~
victor9000
My main use case is unsupervised feature extraction from blobs of text. One
application might be to extract all noun entities from a news article, so you
can automatically figure out who the article is about.
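
As a sketch of that idea: given the Penn Treebank-style tags a parser emits (NN, NNS, NNP, NNPS for nouns), grouping consecutive noun tokens gives you candidate entities. Function and variable names below are made up:

```python
def noun_phrases(tagged):
    """Group runs of consecutive noun-tagged tokens (Penn Treebank NN*)
    from a POS-tagged sentence into candidate entities."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:  # flush a run that ends the sentence
        phrases.append(" ".join(current))
    return phrases

# Tagged output in the shape a parser like Stanford's produces.
tagged = [("President", "NNP"), ("Trump", "NNP"), ("dispensed", "VBD"),
          ("with", "IN"), ("appeals", "NNS"), ("to", "TO"), ("unity", "NN")]
print(noun_phrases(tagged))  # -> ['President Trump', 'appeals', 'unity']
```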

~~~
ma2rten
Good idea! However, chunking (shallow parsing) might be enough for that
purpose.

------
bpodgursky
Taking the opportunity to plug my own D3 visualization using CoreNLP : )

[http://nlpviz.bpodgursky.com/](http://nlpviz.bpodgursky.com/)

------
congerous
The more assumptions we make, the more performance goes down in tasks like
machine translation--and that includes the assumption that parts of speech
matter to machines.

------
EternalData
Ah, but can it tell when I'm trying to be witty?

~~~
kobeya
That's not its job.

~~~
EternalData
I understand that -- as a parser, of course it's not its job to understand
sarcasm or wit.

Just referencing the old Chomskian bit about statistical correlates and how
relevant it is to understanding the fundamental principles behind the
expression of language (rather than merely its syntax).

I guess the joke didn't land. Ah well.

------
douche
I really wanted to use the .NET port of this, but unfortunately it is GPL...
dead in the water for commercial software...

------
amelius
Previous discussion:
[https://news.ycombinator.com/item?id=11273240](https://news.ycombinator.com/item?id=11273240)

Unfortunately with zero comments.

------
aminorex
This software kinda sucks. I guess you get what you pay for, sometimes.

~~~
aminorex
It is crazy slow and uses absurd amounts of memory, as well as being
simultaneously an object lesson in both over-engineering and under-
engineering. Built by academics, so it's not unexpected.

Nice example of the application of recent NLP thesis topics to parsing
problems, though. Quite suitable as documentation and reference for, e.g.,
future work on spaCy, which is much more usable in production contexts.

~~~
stephenr
Yes, it does use a reasonable chunk of memory (I wouldn't run it with less
than 3GB available).

However, speed-wise it's only slow, in my experience, if you try to integrate
via a shell script--if you run it as a service and make HTTP calls, it's been
amazingly fast in my usage.

I believe they include a "service" mode as part of the package now, but for a
project a little while ago I forked & extended a project that provides a
JSON/XML HTTP service:
[https://github.com/Koalephant/StanfordCoreNLPHTTPServer](https://github.com/Koalephant/StanfordCoreNLPHTTPServer)
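
For what it's worth, a minimal sketch of calling the bundled server over HTTP from Python; it assumes a CoreNLP server is already running on localhost:9000 (its default port), and the helper names are my own:

```python
import json
import urllib.parse
import urllib.request

def corenlp_url(base="http://localhost:9000", annotators="tokenize,ssplit,pos"):
    """Build the request URL for a running CoreNLP server at `base`.
    The server reads its config from a JSON `properties` URL parameter."""
    props = json.dumps({"annotators": annotators, "outputFormat": "json"})
    return base + "/?" + urllib.parse.urlencode({"properties": props})

def annotate(text):
    """POST raw text to the server and return the parsed JSON annotation."""
    req = urllib.request.Request(corenlp_url(), data=text.encode("utf-8"))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(corenlp_url())
# annotate("The parser runs as a service.")  # needs the server running
```

Keeping the JVM warm this way avoids paying the model-loading cost on every call, which is exactly where the shell-script approach loses.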

------
mrcactu5
A generic sentence from a newspaper article. I heard Stanford uses Java?

      (ROOT
        (S
          (PP (IN In)
            (S
              (VP (VBG keeping)
                (PP (IN with)
                  (NP (PRP$ his) (JJ insurgent) (NN campaign))))))
          (, ,)
          (NP (NNP President) (NNP Trump))
          (VP
            (VP (VBD dispensed)
              (PP (IN with)
                (NP (NNS appeals)))
              (PP (TO to)
                (NP (NN unity))))
            (CC or)
            (VP (VBZ attempts)
              (S
                (VP (TO to)
                  (VP (VB build)
                    (NP (NNS bridges))
                    (PP (TO to)
                      (NP
                        (NP (PRP$ his) (NNS opponents))
                        (PP (IN in)
                          (NP (PRP$ his) (JJ inaugural) (NN address))))))))))
          (. .)))

