What I have seen is that one of the main issues in NLP is actually finding annotated text. Especially for more specialized tasks, annotation can be a very costly process: it is not easy to annotate, e.g., an argument, versus labelling a bus or a person. I believe that unsupervised pre-training can help a lot with this issue. It is just not feasible to find big enough labelled corpora to train a model from scratch for every NLP task.
Another issue I've seen is the huge inertia in academia in the fields where (deep) NLP is really needed (e.g. law or education): most of the academics in these fields just cannot follow the developments, and a lot of the quantitative folks who can seem stuck in the SVM + heckton of features approach. I am glad the trend is towards tools like BERT that can act as the initial go-to for NLP development, even if it's not exactly transfer learning as done by the CV people.
Academic incentives are awkward. Computer scientists and other quants find it difficult to build careers in this stuff as it's often either too derivative to be cutting-edge ML ('you applied last year's tools to an education problem we don't care about, yay?') or too reductive to be useful in the target discipline ('you've got an automated system to tell us how newspaper articles are framed? Lovely, but we've been studying this with grad students reading them for 30 years and already have a much more nuanced view than your experimental system can give us').
That, and it's difficult to sell results to people who don't understand your methods, which by definition is practically everybody when making the first new applications in a field.
I'm a social scientist who can code, and even stuff like SVMs can be a hard sell outside the community of methodologists.
You're also spot on about annotated training data. The framing example above is my current problem, and there is one (1) existing annotated dataset, annotated with a version of frames which is useless for my substantive interest. ImageNet this is not.
Creating annotations is the main issue. It is what will separate those who commercialize NLP from those who just talk about it.
As a practitioner I don't think BERT and friends are that great. They get you from 88% accuracy on some of those tasks to 93% accuracy. They will get better over time, but they are approaching an asymptote -- that article points out several major limitations of the approach.
Thus the "simple classifier" + "heckton of features" approach can be effective for that work, particularly because you don't have to train and retrain complex models and can instead put the effort into building up training sets and doing feature engineering.
Maybe I don't understand correctly, but my point is that BERT and friends just let you skip a lot of feature engineering, which is actually pretty useless once you shift the corpus a bit. That is for me the biggest contribution: I don't care about a couple of percentage points of better results at the end, but I do care that I don't need to spend all my time devising new features, which almost always requires an expert on that particular topic.
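To be concrete, this is roughly the kind of setup I mean: a minimal sketch where a pretrained BERT (via the Hugging Face transformers library) stands in for all the hand-built features, and the checkpoint name, texts, and labels are just placeholders, not anyone's real pipeline.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

# Pretrained checkpoint; swap in whatever model you actually use.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    # Mean-pool the last hidden layer into one fixed-size vector per text,
    # so the downstream classifier needs no hand-crafted features at all.
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    return out.last_hidden_state.mean(dim=1).numpy()

# Toy labelled examples standing in for a real annotated corpus.
texts = ["the contract was breached by the vendor", "lovely weather for a picnic"]
labels = [1, 0]

clf = LogisticRegression().fit(embed(texts), labels)
print(clf.predict(embed(["they violated the agreement"])))
```

Shift the corpus to a new domain and nothing above has to be redesigned; with hand-crafted features you would be back to consulting a domain expert.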
Yeah, I agree. Deep learning mostly eliminates feature engineering, which is a huge win. However, BERT and other transformer models take too damn long to train. I still prefer using FastText for transfer learning because it can be trained on large corpora quickly. Combine that with supervised deep learning, and you often get good enough accuracy in practice.
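Something like this is the workflow I mean, just a sketch: gensim's FastText implementation trained on a toy unlabelled corpus, with a logistic regression standing in for whatever supervised model you'd actually train on top; every corpus line and label here is a placeholder.

```python
import numpy as np
from gensim.models import FastText
from sklearn.linear_model import LogisticRegression

# Unlabelled corpus (toy): training FastText vectors here is cheap
# compared to training or fine-tuning a transformer.
corpus = [
    ["the", "court", "dismissed", "the", "claim"],
    ["students", "submitted", "their", "essays", "late"],
]
ft = FastText(sentences=corpus, vector_size=100, window=3, min_count=1, epochs=10)

def doc_vector(tokens):
    # Average the word vectors; subword information covers unseen tokens.
    return np.mean([ft.wv[t] for t in tokens], axis=0)

# Small labelled set plugged into a supervised model. Logistic regression
# stands in for the deep classifier you would actually use.
X = np.stack([doc_vector(toks) for toks in corpus])
y = [1, 0]
clf = LogisticRegression().fit(X, y)
print(clf.predict([doc_vector(["the", "appeal", "was", "rejected"])]))
```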
Sure, but the feature engineer could escape the asymptote that BERT is converging towards and get better accuracy, which could make the product-market fit work.