I think when you shift into pure research, yes, a deep background in probability, information theory, linear algebra, and calculus is needed. But at that level you're rarely writing code and more likely working at a theoretical level.
1. Most of your time is spent transforming data. Very little is spent building models.
2. Most of the eye-grabbing stuff that makes headlines is inapplicable. My application involves decisions that are expensive and can be safety critical. The models themselves have to be simple enough to be reasoned about, or they're no use.
You might argue that this means what I'm actually doing is statistics.
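(To make "simple enough to be reasoned about" concrete, here's a rough sketch of the kind of model I mean; the feature names and data are made up for illustration, but the point is that every coefficient can be read off and sanity-checked against domain knowledge:)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data standing in for a real safety-critical dataset (hypothetical features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # e.g. pressure, temperature, flow_rate
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)

# Each coefficient is directly inspectable, unlike a deep net's weights.
for name, coef in zip(["pressure", "temperature", "flow_rate"],
                      clf.named_steps["logisticregression"].coef_[0]):
    print(f"{name}: {coef:+.2f}")
```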
It's also one critique I have of academia. When learning ML in academia, 9 times out of 10 you work with clean, neat toy datasets.
Then you go out into the "real world" and instantly get hit with reality: you're gonna spend 80% of your time fixing data.
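(To give a flavour of what that 80% looks like, a minimal sketch; the file and column names are made up, not from any particular project:)

```python
import pandas as pd

# Hypothetical raw export: inconsistent types, duplicates, missing values.
df = pd.read_csv("raw_export.csv")

# Timestamps arrive as strings; coerce unparseable rows to NaT instead of crashing.
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")

# Normalise a categorical column with inconsistent casing and whitespace.
df["region"] = df["region"].str.strip().str.lower()

# Drop exact duplicates and rows missing the label we want to predict.
df = df.drop_duplicates().dropna(subset=["label"])

# Fill remaining numeric gaps with per-region medians rather than one global constant.
df["pressure"] = df.groupby("region")["pressure"].transform(lambda s: s.fillna(s.median()))
```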
With that said, I think that 10 years from now, ML is going to be almost exclusively SaaS with very high levels of abstraction, with very little coding for the average user. Maybe some light scripting here and there, but mostly just drag-and-drop stuff.
What's the difference?
Also, most folks I know that are making practical deep learning contributions are doing so by combining their pre-existing domain expertise with their new deep learning skills. E.g. a journalist analyzing a large corpus of text for a story, or an oil&gas analyst building models from well plots, etc.
Also, I really appreciated that one of the training goals for ULMFiT was to be trainable on a single GPU. With these large-capacity models, training is getting crazy expensive and out of hand. Any chance that your future work will still keep the single-GPU training goal?
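(For context, a common recipe for keeping training on one GPU is mixed precision plus gradient accumulation. This is a generic PyTorch sketch with toy stand-ins for the model and data, not ULMFiT's actual training loop:)

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-ins just so the loop runs; swap in a real model and dataset.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2)).to(device)
loader = DataLoader(TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,))),
                    batch_size=32)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 8                                          # effective batch 8x what fits in memory
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # fp16 roughly halves activation memory

opt.zero_grad()
for step, (xb, yb) in enumerate(loader):
    xb, yb = xb.to(device), yb.to(device)
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = loss_fn(model(xb), yb) / accum_steps      # scale so accumulated grads average out
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:                    # optimizer step only every accum_steps batches
        scaler.step(opt)
        scaler.update()
        opt.zero_grad()
```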