Another issue I've seen is that there is huge inertia in academia in the fields where (deep) NLP is really needed (e.g. law or education). Most of the academics in these fields just cannot follow the developments, and a lot of the quantitative folks who can seem stuck in the SVM + heckton of features approach. I am glad that the trend is towards tools like BERT that can act as the initial go-to for NLP development, even if it's not exactly transfer learning as done by the CV people.
That, and it's difficult to sell results to people who don't understand your methods, which by definition is practically everybody when making the first new applications in a field.
I'm a social scientist who can code, and even stuff like SVMs can be a hard sell outside the community of methodologists.
You're also spot on about annotated training data. The framing example above is my current problem, and there is one (1) existing annotated dataset, annotated with a version of frames that is useless for my substantive interest. ImageNet this is not.
As a practitioner, I don't think BERT and friends are that great. They get you from 88% accuracy on some of those tasks to 93% accuracy. They will get better over time, but they are approaching an asymptote -- that article points out several major limitations of the approach.
Thus "simple classifier" vs "heckton of features" can be effective for that work, particularly because you don't have to train and retrain complex models and use the effort to build up training sets and do feature engineering.
You could argue that BERT was a first go at it, but until transfer learning no longer equates to "throw compute at it because we're Google/OpenAI", we're nowhere near having solved this.
throw compute at it because we're Google/OpenAI
Sorry, but in terms of training, deep learning with neural networks is far from a "smart" paradigm.
It is essentially statistical brute force + a few clever math tricks.
It is part of the answer to how to create an artificial general intelligence.
But where is the research on creating a causal reasoning system that understands natural language?
It mostly died in the AI winter, and except for a few hipsters like me, or projects like OpenCog and Cyc, it is dead.
I wonder how many decades will be needed for firms like Google to realize such an obvious thing (that real intelligence is statistical AND causal).
Thus, we've seen real value out of transfer learning that doesn't require much compute power (and could, I think, even be run on free Colab instances).
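For concreteness, this is roughly the recipe meant by "transfer learning that doesn't require much compute": fine-tune a small pretrained checkpoint on your own labelled data. The checkpoint, dataset, and hyperparameters below are illustrative assumptions, not a recommendation; the code follows the standard Hugging Face pattern.

```python
# Hedged sketch: fine-tuning a small pretrained transformer for classification.
# Checkpoint, dataset, and hyperparameters are placeholders; swap in your own.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # small enough for a free Colab GPU
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")  # stand-in for a domain-specific annotated corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small subset
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()
```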
That said, I agree that the problem is still very far from being "solved". In particular, I fear that most recent advances might be traced back to gigantic models memorizing things (instead of doing anything that could even vaguely be seen as understanding text) to slightly improve GLUE scores.
Still, I am highly optimistic about transfer learning for NLP in general
Most people also actually distill BERT to reduce the computation cost.
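To unpack that a bit: "distilling BERT" usually means training a smaller student model to mimic the larger teacher's output distribution, keeping most of the accuracy at a fraction of the inference cost. Here is a hedged sketch of the standard distillation loss; the temperature, weighting, and dummy tensors are illustrative, not a specific published recipe.

```python
# Minimal knowledge-distillation loss: softened teacher logits + hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL divergence with ordinary cross-entropy."""
    soft_targets = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_preds = F.log_softmax(student_logits / temperature, dim=-1)
    # KL between softened distributions, scaled by T^2 as in Hinton et al. (2015).
    kd = F.kl_div(soft_preds, soft_targets, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Dummy usage: batch of 4 examples, 2 classes.
student_logits = torch.randn(4, 2, requires_grad=True)
teacher_logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
distillation_loss(student_logits, teacher_logits, labels).backward()
```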