
Natural Language Basics with TextBlob - sloria
http://rwet.decontextualize.com/book/textblob/
======
languagehacker
TextBlob is just an easy-to-use wrapper for a number of more involved
libraries, including NLTK and Pattern.

As with most things like it, this is a good fit if you're looking to hand off
extremely unsophisticated NLP work to a junior developer.

If you're an engineer focused on the NLP space, using this API would be like
tying one hand behind your back. It introduces its own performance problems,
and obscures a number of configurations that the APIs of the libraries it
wraps expose. I also find its attitude towards object-orientation tends to
obscure performance bottlenecks by hiding how much just-in-time computation
occurs for a given string.
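
To make the point about hidden just-in-time computation concrete, here is a minimal sketch. `LazyBlob` is a hypothetical stand-in written for illustration, not TextBlob's actual implementation; it only shows how attribute-style access can hide real work behind what looks like a plain field.

```python
from functools import cached_property


class LazyBlob:
    """Hypothetical text wrapper (an assumption for illustration, not TextBlob's code)."""

    def __init__(self, text):
        self.text = text
        self.tokenize_calls = 0  # counts how often the "expensive" step runs

    @cached_property
    def words(self):
        # Looks like a plain attribute, but runs tokenization on first access.
        self.tokenize_calls += 1
        return self.text.split()

    @property
    def word_counts(self):
        # Not cached: the counts are rebuilt from scratch on every access.
        counts = {}
        for word in self.words:
            key = word.lower()
            counts[key] = counts.get(key, 0) + 1
        return counts


blob = LazyBlob("the cat sat on the mat")
first = blob.words    # tokenization happens here, not in __init__
second = blob.words   # served from the cache, no second tokenization
```

Nothing at the call site tells you that `blob.words` is cheap on the second access while `blob.word_counts` is recomputed every time, which is the kind of bottleneck that is easy to miss when profiling by eye.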

Also, I hate to admit this, but the Java/Scala NLP stack is beating out most
Python NLP libraries these days. NLTK _just_ got Stanford CoreNLP's
best-in-class dependency parser. It's been available in Java for years.

~~~
syllogism
If you're doing NLP in Python, there's no reason to use CoreNLP's parser from
the NLTK wrapper. Communicating with the Java process over the file system or
a socket introduces a tonne of unnecessary complications and slow-downs,
invites encoding problems, etc.
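
A rough sketch of the encoding concern, using only the standard library. The child process here is a second Python interpreter standing in for the external parser (an assumption for the sketch; no Java or CoreNLP is involved), and `round_trip` is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in for an external parser process (assumption for this sketch):
# a child interpreter that upper-cases whatever it reads on stdin.
CHILD_CODE = (
    "import sys; "
    "data = sys.stdin.buffer.read().decode('utf-8'); "
    "sys.stdout.buffer.write(data.upper().encode('utf-8'))"
)


def round_trip(text):
    """Send text to the child process and return its reply, using UTF-8 both ways."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=text.encode("utf-8"),  # encode explicitly instead of trusting locale defaults
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


reply = round_trip("naïve café")
```

Every string pays for process start-up, two encode/decode steps, and a pipe. If either side falls back to a platform default encoding, non-ASCII text is where it breaks first.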

spaCy's native Cython dependency parser is both faster and more accurate than
CoreNLP.

The NP chunks example from the post:

    
    
        >>> from spacy.en import English
        >>> nlp = English()
        >>> doc = nlp(u'ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.')
        >>> for np in doc.noun_chunks:
        ...   print(np.text)
        ... 
        ITP
        a two-year graduate program
        the Tisch School
        the Arts
        the best way
        us
        a Center

~~~
mark_l_watson
spaCy looks like a great product, but it is expensive.

edit: sorry, I just noticed that it is available for free under the AGPL 3
license.

~~~
elyase
Now 100% free under the MIT license. Things change by the hour in the spaCy
world :-).

~~~
mark_l_watson
It looks like a lot of good work went into spaCy - I hope that you are
successful monetizing it with the MIT license.

------
imh
I think NLP is really cool, but it seems to be moving so quickly. If I wanted
to get a decent overview, are there some review papers or textbooks with good
coverage that aren't too out of date?

~~~
nl
Just learn Deep Learning instead.

I'm an NLP person, and I think the Wit.ai people said it best:

 _Many papers were kind of “the state of the art for X was Y. We replaced the
hand-crafted, manually hacked, heavily engineered Z by a RNN. It improved
state of the art by 5 points.” The poor guys who presented deep learning-free
papers invariably got the question: “did you also try with a [insert deep net
technique here]?”_ [1]

The only downside with this is that traditional NLP tools are still probably
easier to use, and you'll usually still need to understand the traditional
vocabulary to be able to talk to other people about your problems.

[1]
[https://wit.ai/blog/2015/09/23/emnlp](https://wit.ai/blog/2015/09/23/emnlp)

~~~
imh
Wouldn't this require larger datasets? That isn't always an option. I'm
imagining that a smaller, more computationally efficient network could learn
nearly as well with fewer data points given these heavily engineered features.
Is that off base?

~~~
nl
Basically, no. See
[http://karpathy.github.io/2015/05/21/rnn-effectiveness/](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

He gets pretty amazing results with a corpus size around 10M.

~~~
imh
But that takes ages to train!

~~~
nl
So something like Jason Weston's state-of-the-art attention-NN based sentence
summarizer took ~4 days to train.

You'd easily spend that time doing manual feature engineering just to build a
baseline system.
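
For readers new to the term, "manual feature engineering" usually means writing functions like the one below by hand, then tuning them for weeks. This is only an illustrative sketch: the feature names and the `token_features` helper are assumptions made up for the example, not taken from any particular system.

```python
def token_features(tokens, i):
    """Return hand-written features for tokens[i] as a dict of name -> value."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),        # crude morphology hint
        "is_capitalized": word[:1].isupper(),
        "is_digit": word.isdigit(),
        # Context window of one token on each side, with boundary markers.
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


tokens = "ITP is a two-year graduate program".split()
features = token_features(tokens, 0)
```

Each dict would then be fed to a linear model or similar. The argument above is that a neural network learns comparable representations from raw text, so the days spent training it replace the days spent writing and debugging functions like this one.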

------
alextk
Absolutely love the TextBlob API. Much easier to get started with than NLTK.
We currently use a similar design in our NLP toolkit for Estonian:
[http://estnltk.github.io/estnltk/1.3/index.html](http://estnltk.github.io/estnltk/1.3/index.html)

------
ma2rten
Also have a look at [http://spacy.io/](http://spacy.io/)

It's better than textblob / nltk in many ways.

~~~
fizzbatter
Can you _(or someone)_ comment on some of the important differences, in your
mind? I'm quite new to NLP, so a knowledgeable comparison of the two would be
appreciated :)

~~~
elyase
Essentially spacy is better and faster at almost everything it supports but is
not free for commercial use. I consider spacy a blessing for the Python and
the NLP community in general. They have a great comparison with existing
libraries at [http://spacy.io/](http://spacy.io/).

EDIT: From today on spacy is free for commercial use! (MIT license).

------
fizzbatter
Can't wait for a "batteries included" NLP solution to come to Golang

------
huckyaus
"There is no such thing as a sentence, or a phrase, or a part of speech, or
even a "word" — these are all pareidolic fantasies occasioned by glints of
sunlight we see on reflected on the surface of the ocean of language;
fantasies that we comfort ourselves with when faced with language’s infinite
and unknowable variability."

This is a particularly beautiful articulation of the complexity of English
(and language in general).

------
lewisl9029
Can anyone recommend some JS (or compiles-to-JS) NLP libraries that can be
used for strictly client-side NLP in a privacy-conscious web app?

------
BinaryIdiot
I haven't used TextBlob but I found this interesting regardless (I'm very
interested in NLP). Thanks for the read! Slightly off topic, but I don't see
as many NLP libraries for C++ as I do for Python; are there any notable ones?

~~~
amirouche
There is
[http://www.abisource.com/projects/link-grammar/](http://www.abisource.com/projects/link-grammar/)
and [https://code.google.com/p/mate-tools/](https://code.google.com/p/mate-tools/).
A lot of tools are written in Java.

~~~
BinaryIdiot
Cool, thanks! I have more reading to do :)

