
The State of Transfer Learning in NLP - amrrs
http://ruder.io/state-of-transfer-learning-in-nlp/
======
v4dok
What I have seen is that one of the main issues in NLP is actually finding
annotated text. Especially for more specialized tasks, annotation can be a
very costly process: it is not as easy to annotate, say, an argument as it
is to label a bus or a person. I believe that unsupervised pre-training can
help a lot with this issue. It is just not feasible to find labelled corpora
big enough to train a model from scratch for every NLP task.
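
A minimal sketch of what that workflow looks like, assuming the Hugging
Face transformers library; the texts, labels, and label meanings below are
hypothetical toy data:

    # Fine-tune a pretrained encoder on a small labeled set; the unsupervised
    # pretraining replaces the need for a huge task-specific corpus.
    import torch
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    # Hypothetical tiny labeled set: 0 = not-an-argument, 1 = argument.
    texts = ["the sky is blue", "we should ban cars because they pollute"]
    labels = [0, 1]
    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

    class TinyDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(labels)
        def __getitem__(self, i):
            item = {k: v[i] for k, v in enc.items()}
            item["labels"] = torch.tensor(labels[i])
            return item

    Trainer(model=model,
            args=TrainingArguments(output_dir="out", num_train_epochs=3),
            train_dataset=TinyDataset()).train()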

Another issue I've seen is that there is huge inertia in the academic fields
where (deep) NLP is really needed (e.g. law or education): most academics in
those fields just cannot follow the developments, and a lot of the
quantitative folks who can seem stuck in the SVM-plus-a-heckton-of-features
approach. I am glad that the trend is towards tools like BERT that can act
as the initial go-to for NLP development, even if it's not exactly transfer
learning as done by CV people.

~~~
PaulHoule
Creating annotations is the main issue. It is what will separate those who
commercialize NLP from those who just talk about it.

As a practitioner, I don't think BERT and friends are that great. They get
you from 88% accuracy on some of those tasks to 93% accuracy. They will get
better over time, but they are approaching an asymptote -- the article
points out several major limitations of the approach.

Thus "simple classifier" vs "heckton of features" can be effective for that
work, particularly because you don't have to train and retrain complex models
and use the effort to build up training sets and do feature engineering.
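
For concreteness, the kind of baseline being defended here might look like
this scikit-learn sketch (toy documents and labels, with TF-IDF n-grams
standing in for the hand-built features):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    docs = ["cheap pills online", "meeting moved to 3pm",
            "win a free prize now", "quarterly report attached"]
    labels = [1, 0, 1, 0]  # hypothetical spam/ham labels

    # Real systems add many task-specific features (lexicons, POS tags,
    # gazetteers, ...); swapping those in and out is the cheap iteration
    # loop being described above.
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(docs, labels)
    print(clf.predict(["free pills now"]))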

~~~
v4dok
Maybe I don't understand correctly, but my point is that BERT and friends
just let you skip a lot of feature engineering, which is actually pretty
useless once you shift the corpus a bit. That is, for me, the biggest
contribution. I don't care about a couple of percentage points of better
results in the end, but I do care that I don't need to spend all my time
devising new features, which almost always requires an expert on that
particular topic.
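
A hedged sketch of the alternative being described: replace the hand-built
features with embeddings from a frozen pretrained encoder. This assumes the
sentence-transformers library; the model name and toy data are illustrative:

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # frozen; no feature engineering

    docs = ["the defendant appealed", "solve for x",
            "the verdict was upheld", "integrate by parts"]
    labels = [0, 1, 0, 1]  # hypothetical: 0 = law, 1 = education

    X = encoder.encode(docs)  # dense features come from pretraining, not experts
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(encoder.encode(["the judge ruled"])))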

~~~
rpedela
Yeah, I agree. Deep learning mostly eliminates feature engineering, which is
a huge win. However, BERT and other transformer models take too damn long to
train. I still prefer using FastText for transfer learning because it can be
trained on large corpora quickly. Combine that with supervised deep
learning, and you often get good enough accuracy in practice.
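
A sketch of that FastText workflow, assuming the official fasttext Python
bindings; the corpus and training file names are hypothetical, and the
supervised file uses FastText's "__label__X text" line format:

    import fasttext

    # Unsupervised pretraining: runs in minutes on large corpora, CPU only.
    vec_model = fasttext.train_unsupervised("corpus.txt", model="skipgram",
                                            dim=100)

    # FastText's own fast supervised mode; alternatively, feed the learned
    # vectors into a separate deep model as described above.
    sup_model = fasttext.train_supervised("train.txt", epoch=10)
    print(sup_model.predict("quick and good enough in practice"))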

------
s_Hogg
As an NLP person who recently had to work on some object detection stuff for
an unrelated matter, I can say it's amazing what a difference transfer
learning makes. The whole task becomes brainless.

You could argue that BERT was a first go at it, but until transfer learning
stops equating to "throw compute at it because we're Google/OpenAI", we're
nowhere near having solved this.
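
For reference, the CV recipe being alluded to is roughly this torchvision
sketch: freeze a pretrained backbone and retrain only the final layer (the
5-class head here is hypothetical):

    import torch
    import torchvision.models as models

    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in net.parameters():
        p.requires_grad = False  # freeze the pretrained backbone

    # Replace the classifier head for a hypothetical 5-class task.
    net.fc = torch.nn.Linear(net.fc.in_features, 5)
    optimizer = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
    # ...then train only net.fc on the small task-specific dataset.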

~~~
The_rationalist
I would like to know: yes, it takes a lot of compute and time to train e.g.
BERT (probably 64 V100s and 3 days of training). BUT once it is trained and
you want to use it in your application -- inference usually takes a few ms;
is it far longer with BERT? And can a _modern_ smartphone easily run such
inference? I've heard about MobileNets, but they sacrifice too much
accuracy, so I really hope BERT can run today on a 7nm Snapdragon plus its
mini TPU. I can't find such data on the web, but this is an elementary
question, necessary for the complete success of NLP.
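
One rough way to answer the latency part empirically, assuming PyTorch and
the transformers library on CPU (numbers vary wildly by hardware, sequence
length, and quantization; a distilled model is used here for speed):

    import time
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased")
    model.eval()

    inputs = tok("is this fast enough for a phone?", return_tensors="pt")
    with torch.no_grad():
        model(**inputs)  # warm-up run
        start = time.perf_counter()
        model(**inputs)
        print(f"{(time.perf_counter() - start) * 1000:.1f} ms")  # tens of ms on a laptop CPU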

 _throw compute at it because we're Google/OpenAI_ Sorry, but as far as
training time goes, deep learning is far from a "smart" paradigm. It is
essentially statistical brute force plus a few clever math tricks. That is
part of the answer to how to create an artificial general intelligence. But
where is the research on creating a causal reasoning system that understands
natural language? It mostly died in the AI winter and, except for a few
hipsters like me, or OpenCog, or Cyc, it is dead. I wonder how many decades
will be needed for firms like Google to realize such an obvious thing (that
real intelligence is statistical AND causal).

------
mark_l_watson
Great write-up. Sebastian has a deep learning NLP email newsletter that I
highly recommend. I read two articles this morning that were mentioned in
the newsletter he sent out last night. Good stuff.

