Some sentences are just totally ambiguous without context. "Fruit flies like a banana." isn't even good English. Is the sentence trying to say "Some particular fruit flies like a particular banana"? Or "All fruit flies like any banana"?
By the way, spaCy creator - how's the NER coming along?
It is possible to do incremental dependency parsing with a beam, but all the copying of beam "states" is expensive, and there's no guarantee that the n complete parses in the beam are really the n best parses w.r.t. the model.
By the way, the beam isn't slow because of the copying - that's really not so important. What matters is simply that you're evaluating, say, 8 times as many decisions. This makes the parser 6-7x slower (you can do a little bit of memoisation).
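To make the cost concrete, here's a rough sketch of beam search over parser states. `initial_state`, `valid_actions`, `score`, and `apply` are hypothetical helpers standing in for the transition system and the model - not any real library's API:

    import heapq

    def beam_parse(words, k=8):
        # Each beam entry is (cumulative model score, parser state).
        beam = [(0.0, initial_state(words))]
        while not all(state.is_final for _, state in beam):
            candidates = []
            for total, state in beam:
                if state.is_final:
                    candidates.append((total, state))
                    continue
                # This inner loop is where the time goes: with a beam of
                # width k you score roughly k times as many actions as a
                # greedy parser does.
                for action in valid_actions(state):
                    new_state = apply(state, action)  # copies the state
                    candidates.append((total + score(state, action), new_state))
            # Keep only the k highest-scoring partial parses.
            beam = heapq.nlargest(k, candidates, key=lambda c: c[0])
        return max(beam, key=lambda c: c[0])[1]  # best complete parse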
Also, maybe a dumb question - is there any library or best-practice method for ensembling taggers / chunkers? Or do I have to build it myself from scratch?
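For what it's worth, the from-scratch version can be quite small. A minimal sketch of per-token majority voting, assuming each tagger is a callable that maps a token list to an equally long tag list (a hypothetical interface, not any particular library's):

    from collections import Counter

    def ensemble_tag(tokens, taggers):
        # Run every tagger over the same tokens.
        predictions = [tagger(tokens) for tagger in taggers]
        tags = []
        for i in range(len(tokens)):
            # Plurality vote per token; ties break arbitrarily.
            votes = Counter(pred[i] for pred in predictions)
            tags.append(votes.most_common(1)[0][0])
        return tags

Weighting each tagger's vote by its held-out accuracy is a common refinement on this.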
I did some simple stress testing that suggested it should handle a couple of hundred concurrent users, but I didn't put the time in to get a very realistic simulation, so I worry that was too optimistic.
See also: http://brat.nlplab.org/examples.html
Any others like this?
I'm feeding it Shakespeare, just for fun, but I'm having trouble understanding what it means to have two ccomp arcs on the verb "make" in this:
"Our doubts are traitors, and make us lose the good we oft might win, by fearing to attempt"
In linguistic terms this is a case of "over-generation": the parser has proposed an interpretation that's not "licensed" by the language in general. (In contrast, consider a sentence like "I shot an elephant in my trousers." A reading like "An elephant in my trousers was shot" is licensed but unlikely.)
To see the point of error, step through the parser until the focus is on "win", with 4 words on the stack. (Deep linking into particular states will come in a future version. For now, just press forward...).
At that state, the parser should attach "we might win" to "good", as a reduced relative clause. Instead it opts to pop "good" from the stack. It then ends up in a bad situation, and essentially attaches "win" to "make" as a way to concede defeat on the arc and get on with the rest of the sentence.
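If you want to check the attachments without stepping through the demo, you can print the arcs with spaCy directly (a sketch; "en_core_web_sm" is just an example model name, substitute whichever English model you have installed):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Our doubts are traitors, and make us lose the good "
              "we oft might win, by fearing to attempt")
    for token in doc:
        # Each word with its dependency label and its head, so you can
        # see what "win" ends up attached to.
        print(f"{token.text:10} {token.dep_:10} -> {token.head.text}")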
The parser could actually generate any projective tree over its input; it's only constrained by its statistics. On the benchmark evaluations this unconstrained approach tends to perform better. I'm interested in trying out ways to incorporate syntactic restrictions, particularly for verb valencies, into the parser.
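"Projective" here just means no two arcs cross when you draw the tree above the sentence. A quick sketch of the check, assuming a heads array where heads[i] is the index of word i's head and the root points at itself (this representation is my assumption):

    def is_projective(heads):
        # Normalise each dependency to a (left, right) span; the root's
        # self-arc is zero-length and can never cross anything.
        arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads)]
        for (l1, r1) in arcs:
            for (l2, r2) in arcs:
                # Two arcs cross if one starts strictly inside the
                # other and ends strictly outside it.
                if l1 < l2 < r1 < r2:
                    return False
        return True

    is_projective([1, 1, 1, 2])   # True: no arcs cross
    is_projective([2, 3, 2, 2])   # False: arc (0,2) crosses arc (1,3)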
I tried the famous opening from the "Commentarii de Bello Gallico".
"All Gaul is divided into three parts, one of which the Belgae inhabit, the Aquitani another, those who in their own language are called Celts, the third."
It failed to parse the sentence correctly. On the other hand, it did quite well with simple sentences (randomly picked from Wikipedia).
PS: I also had a hard time understanding Ulysses.
Theory is definitely good, but it can't replace stepping through a lot of examples. Classics is probably the best place to get that.
Anyway. The Latin is probably easier to parse than the English "translation", since the translation here is hardly natural. I'd suggest that Classics translations have a particular tradition of being faithful to the syntax of the original.
Compare the parses for the original:
And what I would say is a more English-like version:
The English-like version is still wrong, and I'm interested to dig through what's gone wrong with it. But it's a much better parse than the tool was capable of producing for the Latinglish original.
It's not curing cancer, but it's about as close to solving war as it gets.
The UI is quite nice though.