
Show HN: SpaCy – Build tomorrow's language technology products with Python - syllogism
http://spacy.io
======
Smerity
If anyone is interested in NLP and hasn't had the chance to use spaCy yet,
give it a go now - it's just two commands: "pip install spacy" and "python -m
spacy.en.download" =]

spaCy is perfect for web-scale NLP (a term I don't use lightly, considering I
crawl billions of pages a month) and is AGPLv3 for hobbyists / academics /
open source developers. It comes with an existing language model for English,
a concise API, and many other goodies baked in by default (e.g. word vector
representations). Even though it's in Python and near state of the art in
terms of accuracy, it's far faster than just about any other parser out there.
Finally, the demos and documentation are consistently strong and continuously
improving.

I'd highly recommend checking out the "marking adverbs" tutorial, which shows
how to develop features you might use in a proofreading tool in a few dozen
lines of Python.

[http://spacy.io/tutorials/mark-adverbs/](http://spacy.io/tutorials/mark-adverbs/)
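
To give a rough feel for the idea (this is not the tutorial's code), here is a minimal sketch that assumes the tokens have already been tagged with Penn Treebank part-of-speech tags. The `mark_adverbs` function, the `*` markers, and the hand-tagged sentence are all hypothetical; in practice the tags would come from the tagger rather than being typed in by hand.

```python
# Penn Treebank tags for adverbs: base (RB), comparative (RBR), superlative (RBS).
ADVERB_TAGS = {"RB", "RBR", "RBS"}


def mark_adverbs(tagged_tokens, open_mark="*", close_mark="*"):
    """Join (word, tag) pairs into a string, wrapping each adverb in markers."""
    marked = []
    for word, tag in tagged_tokens:
        if tag in ADVERB_TAGS:
            marked.append(f"{open_mark}{word}{close_mark}")
        else:
            marked.append(word)
    return " ".join(marked)


# Hand-tagged example sentence (an assumption; a real tagger would supply the tags).
sentence = [
    ("She", "PRP"), ("quickly", "RB"), ("read", "VBD"), ("the", "DT"),
    ("very", "RB"), ("long", "JJ"), ("report", "NN"),
]

print(mark_adverbs(sentence))
```

A proofreading tool would presumably highlight the marked words in the original text instead of rebuilding the string, but the filtering step is the same.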

n.b. I was in the same NLP research group as Matthew Honnibal whilst he was
obtaining his PhD at the University of Sydney, so I'm biased, but only as I've
seen his work in person!

------
ldng
Are languages other than English supported? I suppose so, but then how do
I add another language? How will it perform?

~~~
rocko06
It only supports English at the moment [0].

[0] [http://spacy.io/#comparisons](http://spacy.io/#comparisons)

~~~
ldng
Yes, I saw it was English only, hence my question. How should I go about
building support for another language? I skimmed the docs but didn't see
anything related to that question. Also, it is AGPL, so I wonder whether you
can use a treebank for training.

------
Beltiras
This is great! The web demo parses ordinary sentences with pretty good
accuracy. I had to resort to twisters to confuse it:

The old man the boat. (The elderly staff positions on the boat.)

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. (See
[https://simple.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_bu...](https://simple.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo);
it's not a fair test, but it illuminates how the underlying engine works.)

I'm going to point this project out to a teacher I had, to see how it does
with other corpora.

------
jMyles
It's interesting, for sure, but it doesn't take trick sentences to confuse it.
This somewhat opaque but totally realistic sentence:

[http://spacy.io/displacy/?full=super%20llamas%20find%20a%20w...](http://spacy.io/displacy/?full=super%20llamas%20find%20a%20way%2C%20generally%2C%20to%20illumate%20dingos%20but%20not%2C%20if%20I%20may%20say%2C%20the%20coins%20of%20yore)

caused it to get all the parts of speech correct, but miss pretty hard on how
they are applied (i.e., the arrows are pointing to the wrong objects or in
some cases even in the wrong direction).
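
To put a number on "arrows pointing to the wrong objects", one common approach is to store a parse as one (head index, label) pair per token and count how many pairs match a reference parse. The sketch below uses the garden-path sentence "The old man the boat" from elsewhere in this thread. The `attachment_scores` helper, the label names, and the misparse are all hypothetical, chosen for illustration; the misparse is not the output of any real parser.

```python
def attachment_scores(gold, predicted):
    """Return (unlabeled, labeled) attachment scores for two parses.

    Each parse is a list of (head_index, label) pairs, one per token,
    with head_index -1 marking the root.
    """
    if len(gold) != len(predicted):
        raise ValueError("both parses must cover the same tokens")
    n = len(gold)
    # Unlabeled: the arrow comes from the right head, whatever its label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # Labeled: the head and the label must both match.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / n, full_hits / n


tokens = ["The", "old", "man", "the", "boat"]

# Intended reading: "man" is the verb and "the old" is its subject.
gold = [(1, "det"), (2, "nsubj"), (-1, "ROOT"), (4, "det"), (2, "dobj")]

# Hypothetical misparse: "man" is read as a noun modified by "old".
predicted = [(2, "det"), (2, "amod"), (-1, "ROOT"), (4, "det"), (2, "appos")]

uas, las = attachment_scores(gold, predicted)
print(f"unlabeled: {uas:.2f}, labeled: {las:.2f}")
```

In this toy case only one arrow has the wrong head, yet three of the five tokens are wrong once labels are counted, which is one way a parse can look mostly right in a diagram while still missing how the words relate.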

------
bobwaycott
The "Under Construction" theme of the dependency parse tree visualization
tool[0] is amazing.

[0] [http://spacy.io/displacy/](http://spacy.io/displacy/)

------
sanxiyn
Previously here:
[https://news.ycombinator.com/item?id=8942783](https://news.ycombinator.com/item?id=8942783)

------
jMyles
Is there a glossary of annotations? For example, what is "RELCL"?

~~~
bdchauvette
I can't be certain without the sentence that generated the tag, but coming
from a linguistics background, RELCL is probably 'relative clause'.

