

Dependency parse tree visualization - altro
http://spacy.io/displacy/

======
grandpa
It parses "fruit flies like a banana" the same way as "Time flies like an
arrow".

[https://en.wikipedia.org/wiki/Time_flies_like_an_arrow;_frui...](https://en.wikipedia.org/wiki/Time_flies_like_an_arrow;_fruit_flies_like_a_banana)

~~~
Phemist
Similarly, it seems to fail on "The old man the boat", marking "man" as a
noun. In this case, however, the meaning of the sentence is fairly
unambiguous, but parsing it can be tricky. See also:
[https://en.wikipedia.org/wiki/Garden_path_sentence](https://en.wikipedia.org/wiki/Garden_path_sentence)

~~~
amperser
In some sense, it's a mark of success for an AI system to fail in the same way
that humans do. "The old man the boat" is a terrible sentence, essentially
ungrammatical.

~~~
nsajko
Can a sentence be terrible? In what way is it ungrammatical?

------
syllogism
Please report any performance problems. I have this running on a pretty modest
server, but it should be no trouble to throw up an EC2 instance if the traffic
gets too much.

I did some simple stress testing that said it should handle a couple of
hundred concurrent users, but I didn't put the time in to get a very realistic
simulation. I worry it was too optimistic.

------
nxb
Nice, and has a built-in parser and annotation system too. Very nice.

See also:
[http://brat.nlplab.org/examples.html](http://brat.nlplab.org/examples.html)

Any others like this?

~~~
ninjin
Full disclosure, brat author. We were inspired by things like "What's Wrong
With My NLP?" [1] and TikZ-dependency [2]. But, truth be told, there are not
all that many great visualisation tools out there for NLP data. The same goes
for good, freely available NLP toolkits; things like spaCy are very much the
exception rather than the rule.

[1]: [https://code.google.com/p/whatswrong/](https://code.google.com/p/whatswrong/)

[2]: [http://sourceforge.net/projects/tikz-dependency/](http://sourceforge.net/projects/tikz-dependency/)

~~~
michaelmachine
Brat is awesome! I just started using it last week for a project. For people
who don't know what Brat is, there is a nice demo here using Brat to visualize
the output of Stanford CoreNLP:

[http://nlp.stanford.edu:8080/corenlp/](http://nlp.stanford.edu:8080/corenlp/)

------
LoSboccacc
Super interesting!

I'm feeding it Shakespeare, just for fun, but I'm having trouble
understanding what it means to have two CCOMP arcs on the verb "make" in this
line:

"Our doubts are traitors, and make us lose the good we oft might win, by
fearing to attempt"[1]

[1]
[http://spacy.io/displacy/?full=Our%20doubts%20are%20traitors...](http://spacy.io/displacy/?full=Our%20doubts%20are%20traitors%2C%20and%20make%20us%20lose%20the%20good%20we%20oft%20might%20win%2C%20by%20fearing%20to%20attempt)
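The displaCy links in this thread carry the whole sentence, percent-encoded, in a `full` query parameter. A minimal sketch of building such a link with Python's standard library, assuming `full` is all the demo reads (the parameter name is inferred from the links here, and the old demo may no longer be online):

```python
from urllib.parse import quote

# Base URL of the displaCy demo, as it appears in the links in this thread.
DISPLACY_DEMO = "http://spacy.io/displacy/"


def displacy_link(sentence: str) -> str:
    """Build a demo link with the sentence percent-encoded into `full`."""
    # safe="" encodes everything except letters, digits and "_.-~",
    # so spaces become %20, commas %2C and double quotes %22.
    return DISPLACY_DEMO + "?full=" + quote(sentence, safe="")


print(displacy_link("Our doubts are traitors, and make us lose the good "
                    "we oft might win, by fearing to attempt"))
```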

~~~
syllogism
Well, the parse is wrong.

In linguistic terms this is a case of "over-generation": the parser has
proposed an interpretation that's not "licensed" by the language in general.
(In contrast, consider a sentence like "I shot an elephant in my trousers." A
reading like "An elephant in my trousers was shot" is licensed but unlikely.)

To see the point of error, step through the parser until the focus is on
"win", with 4 words on the stack. (Deep linking into particular states will
come in a future version. For now, just press forward...).

At that state, the parser should attach "we might win" to "good", as a reduced
relative clause. Instead it opts to pop "good" from the stack. It then ends up
in a bad situation, and essentially attaches to "make" as a way to concede
defeat on the arc, and get on with the rest of the sentence.

The parser can actually generate any projective tree over its input; it's
constrained only by its statistics. On benchmark evaluations, leaving it
unconstrained like this tends to score better. I'm interested in trying out
ways to incorporate syntactic restrictions, particularly for verb valencies,
into the parser.
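To make "projective" concrete: a dependency tree is projective when every head dominates all the words lying between itself and its dependent, so no arcs cross. A small checker, sketched from that textbook definition (the list-of-heads encoding and the hand-built trees are illustrative assumptions, not spaCy's data structures), shows that both readings of the elephant sentence above are projective. Nothing structural stops the parser from producing either one; only its statistics decide.

```python
def dominates(heads: list[int], head: int, word: int) -> bool:
    """True if `head` is an ancestor of `word` (assumes `heads` forms a tree)."""
    while word != -1:
        if word == head:
            return True
        word = heads[word]
    return False


def is_projective(heads: list[int]) -> bool:
    """heads[i] is the index of word i's head, or -1 for the root word.

    Projective: for every arc, the head dominates every word strictly between
    head and dependent (equivalently, no arcs cross once an artificial root
    is placed before the sentence).
    """
    for dep, head in enumerate(heads):
        if head == -1:
            continue
        lo, hi = min(head, dep), max(head, dep)
        if not all(dominates(heads, head, k) for k in range(lo + 1, hi)):
            return False
    return True


# "I shot an elephant in my trousers", words indexed 0..6.
# Reading 1: "in" (4) attaches to "shot" (1).
verb_attached = [1, -1, 3, 1, 1, 6, 4]
# Reading 2: "in" (4) attaches to "elephant" (3).
noun_attached = [1, -1, 3, 1, 3, 6, 4]
# Toy non-projective tree: word 0 hangs off word 2, but word 1 between them
# is the root, so it is not a descendant of word 2.
non_projective = [2, -1, 1]

print(is_projective(verb_attached), is_projective(noun_attached),
      is_projective(non_projective))  # True True False
```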

------
bpodgursky
I put up a similar demo a few years ago if anyone is interested:
[http://nlpviz.bpodgursky.com](http://nlpviz.bpodgursky.com)

------
bshimmin
Interesting and quite neat! I tried this with the famous openings of two
novels, "Pride and Prejudice" and "Ulysses"; it did well with the former but
struggled a bit with the latter. I guess that's par for the course for most
humans with those two texts, though.

~~~
maze-le
I think it depends on the level of complexity and the common structure of the
sentences. If a writer uses more elaborate language, it will fail.

I tried the famous opening from the "Commentarii de Bello Gallico".

"All Gaul is divided into three parts, one of which the Belgae inhabit, the
Aquitani another, those who in their own language are called Celts, the
third."

It failed to produce a correct tree. On the other hand, it did quite well with
simple sentences (picked at random from Wikipedia).

PS: I also had a hard time understanding Ulysses.

~~~
syllogism
Interesting example. The Latin is much easier to parse than the English
"translation" for this sentence. Actually, I often tell people that a classics
class is probably a better introduction to parsing than most linguistics 101
classes I've seen, which are usually a little bit airy.

Theory is definitely good, but it can't replace stepping through a lot of
examples. Classics is probably the best place to get that.

Anyway, the English is harder to parse because the translation here is hardly
natural. I'd suggest that Classics translation has a particular tradition of
being faithful to the syntax of the original.

Compare the parses for the original:

[http://spacy.io/displacy/?full=All%20Gaul%20is%20divided%20i...](http://spacy.io/displacy/?full=All%20Gaul%20is%20divided%20into%20three%20parts%2C%20one%20of%20which%20the%20Belgae%20inhabit%2C%20the%20Aquitani%20another%2C%20those%20who%20in%20their%20own%20language%20are%20called%20Celts%2C%20the%20third)

And what I would say is a more English-like version:

[http://spacy.io/displacy/?full=Gaul%20is%20divided%20into%20...](http://spacy.io/displacy/?full=Gaul%20is%20divided%20into%20three%20parts%2C%20inhabited%20by%20the%20Belgae%2C%20the%20Aquitani%2C%20and%20a%20tribe%20who%20call%20themselves%20the%20%22Celts%22)

The English-like version is still wrong, and I'm interested in digging into
where it goes astray. But it's a much better parse than the tool was capable
of producing for the Latinglish original.

------
S4M
I tried "I am tired of being in front of my computer" and it treats "front"
like a noun, and the tree is not quite right. The dependencies shown form a
chain, I -> am -> tired -> of -> being -> in -> front -> of -> my computer,
whereas I'd expect something like I -> am -> tired -> of -> (being -> in front
of -> my computer).

The UI is quite nice though.

------
donatj
Huh. I work for a book company and feel we could do some interesting things
with a parser like this...

~~~
andreasvc
I'd be interested to know what you have in mind.

