
Deep Learning for NLP Best Practices - tim_sw
http://ruder.io/deep-learning-nlp-best-practices/
======
peteratt
Are there any libraries out there that implement some/most of these best
practices and approaches to NLP? From what I've seen, the existing ones
(Stanford NLP, OpenNLP) are getting somewhat dated. Many non-PhD people
(including me) would find such a library incredibly useful.

~~~
syllogism
(I'm the lead author of spaCy)

spaCy 2 is reconfigured for deep learning. It's still in alpha, but there's
already a lot there:

[http://alpha.spacy.io/docs/](http://alpha.spacy.io/docs/)

I wrote a whole neural network library to get this done, because Tensorflow
and Theano are terrible for the kinds of models NLP needs. The joke was sort of
on me, because PyTorch came out just as I was finishing :). But it's actually
very good to own the dependencies of the library anyway, since it's such an
important part of things. Having our own NN library has made it easy to make
lots of small innovations along the way. The best one is hash-kernel powered
embeddings, which have just been published as "bloom embeddings". I've been
using these for the last six months, with great results.
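
The trick, in a minimal sketch (names and sizes here are mine, not spaCy's
actual implementation): hash each word into a few rows of a deliberately small
table and sum those rows, so a collision on any one hash is papered over by
the others.

    import hashlib

    import numpy as np

    N_ROWS, DIM, N_HASHES = 5000, 64, 4           # hypothetical sizes
    table = np.random.normal(scale=0.1, size=(N_ROWS, DIM))

    def _row(seed: int, word: str) -> int:
        # Deterministic stand-in for a proper hash kernel.
        digest = hashlib.md5(f"{seed}:{word}".encode()).hexdigest()
        return int(digest, 16) % N_ROWS

    def bloom_embed(word: str) -> np.ndarray:
        # Sum N_HASHES rows. Two words are unlikely to collide on *all*
        # hashes, so each word still gets a near-unique vector out of a
        # table far smaller than the vocabulary.
        return sum(table[_row(seed, word)] for seed in range(N_HASHES))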

You can read my thoughts about NLP best-practices here:

[https://explosion.ai/blog/deep-learning-formula-nlp](https://explosion.ai/blog/deep-learning-formula-nlp)

Like Sebastian (and pretty much anyone else), I think the two improvements to
emphasise in NLP are sequence models like LSTM, and transfer learning. That's
what's better now with deep learning: what we used to call semi-supervised
approaches and domain adaptation now work much better than they did before.
Incidentally, the sluice networks paper by Ruder et al. (2017) is an important
recent one on this:
[https://arxiv.org/abs/1705.08142](https://arxiv.org/abs/1705.08142)

Going forward I think it's important that we get past just using word2vec to
pre-train vectors, and start making it easier to use pre-trained LSTM and CNN
models. Side-objectives in multi-task learning are also very important.
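
To make "side-objectives" concrete, here's a hard parameter-sharing sketch in
PyTorch (every name and size below is hypothetical, not spaCy's API): one
shared encoder, a main head, and an auxiliary head whose loss is mixed in with
a small weight.

    import torch
    import torch.nn as nn

    class SharedEncoder(nn.Module):
        def __init__(self, vocab=10000, dim=128, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.lstm = nn.LSTM(dim, hidden, batch_first=True)

        def forward(self, tokens):                  # tokens: (batch, seq)
            states, _ = self.lstm(self.embed(tokens))
            return states                           # (batch, seq, hidden)

    encoder = SharedEncoder()
    main_head = nn.Linear(128, 5)     # e.g. entity labels (main task)
    aux_head = nn.Linear(128, 17)     # e.g. POS tags (side-objective)

    params = (list(encoder.parameters()) + list(main_head.parameters())
              + list(aux_head.parameters()))
    opt = torch.optim.Adam(params)
    xent = nn.CrossEntropyLoss()

    def train_step(tokens, main_gold, aux_gold, aux_weight=0.3):
        states = encoder(tokens)
        # CrossEntropyLoss wants (batch, classes, seq), hence the transpose.
        loss = (xent(main_head(states).transpose(1, 2), main_gold)
                + aux_weight * xent(aux_head(states).transpose(1, 2), aux_gold))
        opt.zero_grad()
        loss.backward()      # gradients from both tasks hit the shared encoder
        opt.step()
        return loss.item()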

I don't think the APIs in spaCy around these things are quite right yet. There
are also lots of trade-offs in sharing weights that make things complicated
for people. Sometimes weight-sharing gets in the way, because you want to just
train this one part, and it's really weird that your updates are affecting
other models you don't think you're touching.
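
Continuing the sketch above, the blunt fix when sharing gets in the way is to
freeze the shared weights and hand the optimizer only the part you actually
want to train:

    # Freeze the shared encoder so updates can't leak into other models
    # that use the same weights.
    for p in encoder.parameters():
        p.requires_grad = False

    opt = torch.optim.Adam(main_head.parameters())  # encoder, aux_head untouched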

For more of an idea of where Ines and I are going with all of this, you can
read this: [https://explosion.ai/blog/supervised-learning-data-collection](https://explosion.ai/blog/supervised-learning-data-collection)

Basically I think the main problem people are having with NLP is that they
don't want to commit to a problem and create training and evaluation data for
it. Teams that don't bite the bullet and commit to their problem thrash around
and don't get anything done. Even if you're using unsupervised techniques, you
need repeatable evaluations.
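
"Repeatable" can be as simple as this (hypothetical scaffolding, not a real
tool): freeze one dev file once, and score every run against exactly that file
with the same function.

    import json

    def evaluate(predict, dev_path="dev.jsonl"):
        # dev.jsonl is created once and never edited afterwards, so the
        # numbers are comparable across runs.
        correct = total = 0
        with open(dev_path) as f:
            for line in f:
                ex = json.loads(line)              # {"text": ..., "label": ...}
                correct += predict(ex["text"]) == ex["label"]
                total += 1
        return correct / total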

We're preparing to launch an evaluation tool to address this problem. You can
subscribe to our mailing list, RSS or Twitter to get the announcement when the
beta is ready:
[https://twitter.com/explosion_ai](https://twitter.com/explosion_ai)

~~~
arrmn
I've used spaCy for a customer project without any NLP or deep learning
experience; it was really easy to use, and it's fast enough for our use case.
So thanks for providing such an awesome library.

------
orthoganol
This feels a little hit or miss, for example:

> One way to decrease the risk of vanishing gradients is to clip their maximum
> value

(Clipping caps gradients that blow up; it's the standard remedy for exploding
gradients, not vanishing ones.) But probably helpful for a general picture if
you're new to this stuff.
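
For anyone following along, a minimal PyTorch sketch of the usual clipping
call (none of this is from the post):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                       # stand-in model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    loss = model(torch.randn(4, 10)).pow(2).mean()
    loss.backward()
    # Caps the global gradient norm, guarding against *exploding*
    # gradients; vanishing gradients need other remedies (LSTM gates,
    # residual connections, careful initialisation).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    opt.step()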

------
avarun
Looks cool, but how do I know that _this_ list of best practices is the
definitive list of best practices, given how many lists claim that title?

~~~
cleansy
"Best practices" are pretty much always according to the author. From the
article:

> Disclaimer: Treating something as best practice is notoriously difficult:
> Best according to what? What if there are better alternatives? This post is
> based on my (necessarily incomplete) understanding and experience. In the
> following, I will only discuss practices that have been reported to be
> beneficial independently by at least two different groups. I will try to
> give at least two references for each best practice.

