Creating annotations is the main issue. It is what will separate those who commercialize NLP vs those who talk about it.
As a practitioner I don't think BERT and friends are that great. On some of those tasks they get you from 88% accuracy to 93%. They will get better over time, but they are approaching an asymptote -- that article points out several major limitations of the approach.
Thus "simple classifier" plus "heckton of features" can be effective for that work, particularly because you don't have to train and retrain complex models and can instead put the effort into building up training sets and doing feature engineering.
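To make "simple classifier, heckton of features" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the names `extract_features` and `Perceptron`, the particular feature set (word unigrams, word bigrams, character trigrams, all hashed into buckets), and the four-sentence toy training set. A real system would use far more data and far more features.

```python
import zlib


def extract_features(text, n_buckets=2**18):
    """Hash word unigrams, word bigrams and character trigrams into bucket ids."""
    tokens = text.lower().split()
    feats = [f"w={t}" for t in tokens]
    feats += [f"b={a}_{b}" for a, b in zip(tokens, tokens[1:])]
    padded = f" {' '.join(tokens)} "
    feats += [f"c={padded[i:i + 3]}" for i in range(len(padded) - 2)]
    # crc32 is stable across runs, unlike the built-in hash() on strings.
    return [zlib.crc32(f.encode("utf-8")) % n_buckets for f in feats]


class Perceptron:
    """Binary perceptron over sparse hashed features (labels are 0 or 1)."""

    def __init__(self):
        self.weights = {}

    def score(self, feats):
        return sum(self.weights.get(f, 0.0) for f in feats)

    def predict(self, text):
        return 1 if self.score(extract_features(text)) > 0 else 0

    def fit(self, examples, epochs=10):
        for _ in range(epochs):
            for text, label in examples:
                feats = extract_features(text)
                pred = 1 if self.score(feats) > 0 else 0
                if pred != label:
                    # Move the weights toward the correct label.
                    delta = 1.0 if label == 1 else -1.0
                    for f in feats:
                        self.weights[f] = self.weights.get(f, 0.0) + delta
        return self


# Toy data, invented for this example only.
train = [
    ("great product, works well", 1),
    ("terrible, broke after a day", 0),
    ("really great value", 1),
    ("awful and terrible support", 0),
]
model = Perceptron().fit(train)
print([model.predict(text) for text, _ in train])
```

Retraining is a few passes over the data, so the effort goes into growing `train` and adding feature templates to `extract_features` rather than into the model.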
Maybe I don't understand correctly, but my point is that BERT and friends just let you skip a lot of feature engineering, which is actually pretty useless once you shift the corpus a bit. For me that is the biggest contribution: I don't care about a couple of percentage points of better results in the end, but I do care that I don't need to spend all my time devising new features, which almost always requires an expert on that particular topic.
Yeah, I agree. Deep learning mostly eliminates feature engineering, which is a huge win. However, BERT and other transformer models take too damn long to train. I still prefer using FastText for transfer learning because it can be trained on large corpora quickly. Combine that with supervised deep learning and you often get good enough accuracy in practice.
Sure, but the feature engineer could escape the asymptote that BERT is converging towards and get better accuracy, which could make the product-market fit work.