
Advanced NLP with SpaCy - jonbaer
https://course.spacy.io/
======
ZeroCool2u
I can't recommend spaCy enough. We use their Prodigy[1] app here at work, and
it's outstanding.

After experimenting with gensim, nltk, and most everything else under the sun,
we primarily rely on spaCy now with some TensorFlow for specific models.

1\. [https://prodi.gy/](https://prodi.gy/)

~~~
syllogism
Glad to hear it! Actually we noticed the order come through for your group and
felt it was a shame we didn't have a proper contact email. We're especially
pleased to support public institutions like the FRB. I also did a little bit
of consulting for the Bank of Canada. Could you email me? matt@explosion.ai

~~~
ZeroCool2u
Haha, hi Matt! Sure, just messaged you.

------
growlist
Mordecai uses spaCy and is worth a look for extracting place names:
[https://github.com/openeventdata/mordecai](https://github.com/openeventdata/mordecai)

I wasn't too successful running it against tweets (low hit rate/false
positives, low spatial resolution) but geolocating tweets is a hard problem
and I'm sure it would work better against more structured text.

~~~
rpedela
Are you using the pre-trained NER models or your own? If the former, I
wouldn't expect it to work well on tweets since it wasn't trained on them.

~~~
amrrs
Do you have any recommendations for building a custom language model for
business-specific NER?

~~~
rpedela
I recommend Prodigy to label your examples and train a spaCy model. Prodigy is
the best tool I have ever used for NLP labeling. Most likely starting with a
blank model will work better, but you can try starting with one of spaCy's
pre-trained models.

[https://prodi.gy/](https://prodi.gy/)
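
For anyone wondering what "starting with a blank model" looks like in code, here is a minimal sketch assuming spaCy v3. The `PRODUCT` label, the sample sentence, and the hand-written character offsets are made up for illustration; in practice the annotations would come from a labeling tool such as Prodigy.

```python
import spacy
from spacy.training import Example

# Blank English pipeline: tokenizer only, no pre-trained weights.
nlp = spacy.blank("en")

# Add an untrained NER component and register a custom label.
# "PRODUCT" is a hypothetical business-specific label.
ner = nlp.add_pipe("ner")
ner.add_label("PRODUCT")

# One hand-labeled example: (start_char, end_char, label) offsets.
text = "Acme Turbo 3000 ships in May."
annotations = {"entities": [(0, 15, "PRODUCT")]}
example = Example.from_dict(nlp.make_doc(text), annotations)

# The reference doc now carries the gold entity spans.
gold_entities = [(ent.text, ent.label_) for ent in example.reference.ents]
print(gold_entities)
```

From here, training would typically go through `nlp.initialize` and repeated `nlp.update` calls, or the `spacy train` command with a config file. A real model needs far more than one example.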

~~~
timkpaine
I also recommend looking at ipyannotate if most of your workflow is in jupyter

------
mwexler
spaCy is under the MIT License, though you'd never know it from the home page
of the site. It is really frustrating to have such a nice tool with such a
beautiful site, yet no mention of the word "License" anywhere on it. Yes, it's
in the code, but if one goes to the trouble to make such a great site, well...

~~~
ausjke
Sorry, but why is the MIT license bad here?

~~~
cwyers
I don't think parent is complaining about the MIT license, they are
complaining about how hard it is to find out what license is being used.

------
acconrad
I'm using NLP with spaCy for my Slack bot and it's awesome! Glad to see
they're offering a more advanced course on using their library.

~~~
avmich
Does it handle dialogs?

------
dfischer
Curious how people are using spaCy in production with constant improvement of
the model, and how that workflow happens from user input & review.

------
topicseed
spaCy's updated Pattern Matcher is pretty amazing and we use it extensively in
our textual content analyses to help SEOs.
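
As a rough illustration of the rule-based matcher mentioned above, here is a minimal sketch assuming spaCy v3. The pattern and the `SEO_TERM` name are invented for the example and are not taken from the commenter's setup.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline is enough here: these patterns only use token text.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical pattern: "search engine optimization" in either spelling,
# matched case-insensitively via the LOWER attribute.
pattern = [
    {"LOWER": "search"},
    {"LOWER": "engine"},
    {"LOWER": {"IN": ["optimization", "optimisation"]}},
]
matcher.add("SEO_TERM", [pattern])


def find_terms(text):
    """Return the text of every span that matches a registered pattern."""
    doc = nlp(text)
    return [doc[start:end].text for _match_id, start, end in matcher(doc)]


print(find_terms("We offer Search Engine Optimization audits."))
```

Patterns can also test attributes such as part-of-speech tags or lemmas, but those need a trained pipeline rather than a blank one.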

------
wodenokoto
How good is spaCy for exploratory analysis of a corpus?

I'm thinking, questions like, top adjectives applied to men and women, or top
word usage across document classes.

The best I've come across for doing this easily is in R, where they have
tidytext[1], which is nice and very straightforward to understand and work
with. However, the data model stores each token and all its metadata (page
number, sentence number, document id / title, document class, word type, etc.)
in its own row, causing the in-memory size of whatever corpus you are working
on to explode.

[1] [https://www.tidytextmining.com](https://www.tidytextmining.com)
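
One way to sidestep the one-row-per-token memory cost for a question like "top word usage across document classes" is to aggregate counts as you stream through the corpus. This is a plain-Python sketch, not spaCy; the whitespace tokenizer, the function name, and the sample documents are stand-ins for illustration.

```python
from collections import Counter, defaultdict


def top_words_by_class(docs, n=3):
    """docs: iterable of (doc_class, text) pairs.

    Keeps one Counter per class instead of one row per token, so memory
    grows with vocabulary size rather than corpus length.
    """
    counts = defaultdict(Counter)
    for doc_class, text in docs:
        # Naive tokenization: split on whitespace, trim punctuation, lowercase.
        words = (w.strip(".,!?;:\"'").lower() for w in text.split())
        counts[doc_class].update(w for w in words if w)
    return {c: counter.most_common(n) for c, counter in counts.items()}


# Hypothetical mini-corpus.
corpus = [
    ("sports", "The team won. The team celebrated."),
    ("finance", "The bank raised rates."),
]
result = top_words_by_class(corpus, n=2)
print(result)
```

With spaCy, the same loop could iterate over the tokens of each `Doc` and filter on `token.pos_` to keep only adjectives, which requires a trained pipeline with a tagger.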

------
evrydayhustling
spaCy in production for some grammar-based tokenization (frame.ai)! And their
model for span annotations is awesome for our research pipeline; very
convenient to be able to traverse pipeline objects in both directions to
explore results.

------
anentropic
This is very nicely presented. Kudos!

------
syllogism
GitHub repo: [https://github.com/ines/spacy-course](https://github.com/ines/spacy-course)

Might also be interesting to others who have DataCamp courses they'd like to
release free.

~~~
kyllo
> So should I not take your DataCamp course anymore? Probably not, no.

Here's some context for anyone who's wondering why:
[https://noamross.github.io/datacamp-sexual-assault/](https://noamross.github.io/datacamp-sexual-assault/)

~~~
jpdus
Unpopular opinion here:

I followed the outrage about DataCamp loosely on Twitter. At first I thought
it was about an actual assault, but so far, afaik, the only thing publicly
known is "uninvited contact" on a dance floor, which the victim reported
several months later.

Sexual assault or abuse is a crime and should be punished and prosecuted.
However, if we treat every situation where two people interact physically and
have a different understanding of their relationship as sexual assault, there
won't be any sexuality without a written consent form in the future anymore
and real victims get marginalized at the same time. I don't know if that is
really what we should be aiming for.

I don't see DataCamp's reaction as inappropriate in this case, at least if
there is nothing more to the story behind the curtain. Humans make errors, and
in most cases everybody should get a second chance. One moment of misjudgment
can be enough today to get fired, your name gets burned forever on the
Internet, and your life can get turned upside down. And I don't know if that
helps the victim in any way.

I will probably be downvoted for that, but I want to make clear again that I
don't tolerate this behaviour in any way and I think the industry needs to be
more inclusive and diverse. I just think that you have to differentiate and
that public shaming is not in all cases a good solution.

~~~
madenine
I don’t see firing two employees for “poor performance” after they voiced
concerns internally as an ‘appropriate’ reaction.

I don’t see responding to a letter from ~100 of your content producers with a
legalese blog post purposely hidden from search engines as ‘appropriate’.

People do shitty things; that doesn’t have to reflect on the organization that
employs them unless the org decides to do shitty things as well.

