
SpaCy 2.0 released - nl
https://github.com/explosion/spaCy/releases/tag/v2.0.0
======
syllogism
Demos: [https://demos.explosion.ai/displacy-ent/](https://demos.explosion.ai/displacy-ent/)

The English neural network named entity model is a huge improvement over the
v1 model. However, the training data is still all from 2010, so it makes some
notable errors. We're working on improved training data, using our annotation
tool Prodigy ([https://prodi.gy](https://prodi.gy)).

The NER for the other languages is trained on "silver standard" data from
Wikipedia, so the quality is much less consistent, especially if you're
working with social media text or "chat bot"-type inputs.

~~~
wodenokoto
I'm a bit confused as to what spaCy's revenue model is. Didn't it use to be a
free community edition / paid enterprise support model?

~~~
syllogism
The very first release was under a dual AGPL/commercial license. This was bad.
It prevents open-source developers from building on top of it, and it
discourages people from getting in touch.

We bootstrapped the company by doing consulting, and now we're releasing
products adjacent to spaCy. We've had a great response to our annotation tool
Prodigy, which is currently in free beta: [https://prodi.gy](https://prodi.gy).

The license model for Prodigy is pretty simple: permanent per-seat licenses,
with pricing that compares pretty favourably to other developer tools.

We're looking forward to releasing some other offerings alongside spaCy. We
don't like to say too much because timelines are tough --- we don't want to
release something half-finished to stick to a schedule. The Explosion AI
mailing list is the best way to stay in the loop.

~~~
JustFinishedBSG
> The license model for Prodigy is pretty simple: permanent per-seat licenses,
> with pricing that compares pretty favourably to other developer tools.

Can you be more precise? I don't want to invest time in a tool only to
discover months later that I can't afford it (I'm on a research student
budget, i.e. my "budget" is my own money).

:(

~~~
syllogism
The license for an individual developer will be a few hundred dollars ---
sorry for the vagueness. We'll be ready to release official pricing soon.

For research students, we think your institution should be covering you! We'll
be offering an academic subscription, so research institutions can pay a
yearly flat fee to have all staff and students covered.

------
kamac
For me, the most important thing about this version is the reduced memory
usage. Previously the smallest English model took 1 GB of RAM, making it
troublesome to run on cloud instances. If v2 takes ~200 MB instead, that's a
huge improvement.

~~~
Vaskivo
Does that mean that it can run on a Raspberry Pi?

~~~
kamac
Unless RAM usage has increased significantly beyond 200 MB since the alpha, it
should run.

------
ashish01
This is great. I really really hope they have a stable and big enough source
of revenue to keep the development going.

------
arrmn
Thank you for providing such a great tool. I'm excited to try version 2.0, and
I've also played around with Prodigy. SpaCy was my start in NLP, and I really
hope it is going to stay around.

We've developed a great product for our customer with SpaCy. It wouldn't have
been possible without it.

------
nl
Thanks to @syllogism for spaCy. It's one of those tools that make Python the
go-to language for NLP.

------
danso
Congrats! Have been following SpaCy since it was first discussed/argued here
on HN. I haven't had much reason/imagination to use NLP in work but I
frequently recommend it to students as most of their curriculum is centered
around old versions of NLTK.

------
halfdan
This is awesome! I've been meaning to get into NLP / Computational Linguistics
for a while now.

Can anybody share what kind of projects you're doing that benefit from SpaCy?
Do you use it as-is or do you build on top of it?

------
pqwEfkvjs
Kudos to Matthew, Ines and others making this possible.

I haven't checked it out myself yet, so I wanted to ask: have the performance
issues that were haunting the 2.0 alpha version been fixed?

~~~
pqwEfkvjs
Found the answer myself in the release docs:

> The Language.pipe method allows spaCy to batch documents, which brings a
> significant performance advantage in v2.0. The new neural networks introduce
> some overhead per batch, so if you're processing a number of documents in a
> row, you should use nlp.pipe and process the texts as a stream.

So if you have an event-based system that can only process a single document
at a time, it does not make sense to upgrade yet: in the single-document case,
runtime performance was 10x-100x slower, at least with the 2.0 alpha version.
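
For anyone wondering what "process the texts as a stream" looks like in practice, here is a minimal sketch. It assumes only what the quoted docs describe: an `nlp` object whose `pipe` method accepts an iterable of texts plus a `batch_size` argument, and yields documents with `ents`. The `extract_entities` helper and the `StandInPipeline` class are hypothetical names made up for illustration; the stand-in exists only so the snippet runs without spaCy or a model installed, and its "entity" rule is not how spaCy works.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, List, Tuple


def extract_entities(nlp, texts: Iterable[str],
                     batch_size: int = 64) -> Iterator[List[Tuple[str, str]]]:
    """Stream texts through nlp.pipe and yield (text, label) pairs per document."""
    # nlp.pipe consumes the iterable lazily and works on batches, so the
    # per-batch overhead is shared across many documents instead of being
    # paid once per call, as it would be with nlp(text) in a loop.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(ent.text, ent.label_) for ent in doc.ents]


class StandInPipeline:
    """Hypothetical stand-in for a loaded spaCy pipeline (NOT spaCy itself)."""

    def pipe(self, texts, batch_size=64):
        for text in texts:
            # Toy rule: treat every capitalised token as an entity labelled "X".
            ents = [SimpleNamespace(text=tok, label_="X")
                    for tok in text.split() if tok[:1].isupper()]
            yield SimpleNamespace(text=text, ents=ents)


texts = ["Matthew and Ines released spaCy", "no entities here"]
results = list(extract_entities(StandInPipeline(), texts, batch_size=2))
print(results)
```

With spaCy installed, you would pass a real pipeline (for example the object returned by `spacy.load(...)` for whichever model you have downloaded) in place of `StandInPipeline()`. The right `batch_size` depends on your documents and hardware, so treat the default here as a placeholder.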

~~~
syllogism
But with a nice caveat: In an event-based system, you can run spaCy 2 with AWS
Lambda :). This will be _much_ cheaper than keeping a server warm.

------
mark_l_watson
Really nice work. Is there a bitcoin or PayPal donation page for the spaCy
project?

~~~
syllogism
We actually don't believe in soliciting donations. Ines explains our thinking
here: [https://ines.io/blog/spacy-commercial-open-source-nlp#moneti...](https://ines.io/blog/spacy-commercial-open-source-nlp#monetising)

Basically: donations can only be made from personal funds, but most of the
benefits from the software will go to commercial users. That's a pretty
lopsided dynamic.

~~~
wyldfire
Aside: Ines appears to be working on something called Prodigy [1] which seems
close to what I imagined would be a "Killer App" after playing around with
SpaCy. I look forward to hearing more about it as it matures.

[1] [https://prodi.gy/](https://prodi.gy/)

------
alexcnwy
Great work - big fan! :)

------
sho_hn
Do you have any plans for Korean support?

~~~
syllogism
We're definitely interested in Korean support. I hope we can get some
contributions for this in the next few months.

My understanding is that there are actually some very good Python libraries
for Korean NLP? It's now much easier to provide annotations via another
library. This is how the Chinese and Japanese support is working at the
moment. We'll add "native" models for all of these languages, but for now you
might want to wrap some of these resources:
[https://github.com/datanada/Awesome-Korean-NLP](https://github.com/datanada/Awesome-Korean-NLP)

~~~
est
See also:

[https://github.com/crownpku/Awesome-Chinese-NLP](https://github.com/crownpku/Awesome-Chinese-NLP)

Very much looking forward to Chinese support in SpaCy.

------
ajohnclark
Excellent update, great work.

