Is there a gulf between the release of cutting-edge research and its commercialization? Couldn't many of these models be commercialized faster and made available as enterprise products?
1. Most advances do not produce large enough gains to justify translating them into industry. 99.9% of research papers propose techniques that yield small gains in the optimization metric (accuracy, ROC AUC, BLEU score, etc.). That comes at the cost of added complexity, more expensive training, model instability, harder code maintenance, and so on. For the vast majority of companies, unless you are Google AdWords or Google Translate, a tiny gain in metric X is not worth those costs. You're much better off using proven off-the-shelf models that have stood the test of time, are fast to train, and are easy to maintain — even if they are 1% worse.
2. Research tends to focus on model improvements, and you are not allowed to touch your train/test data. That makes sense, as otherwise competing approaches would not be comparable. In the real world, however, you have the freedom to collect more training data, clean your data, select more appropriate validation/test data, and so on. The vast majority of the time, getting better/cleaner/more data beats getting a slightly better model, and it's much easier to implement. So for industry it often makes more sense to focus on that.
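A toy sketch of that point (everything here is synthetic and illustrative — the 1-nearest-neighbour classifier is just a stand-in for any fixed, simple model): hold the model constant and only vary how much training data it gets.

```python
# Hypothetical illustration: same simple model, more training data.
# Labels follow a clean threshold rule, so training-set size is the
# only thing that changes between runs.
import random

random.seed(0)

def sample(n):
    """n points in [0, 1); label is 1 if x > 0.5."""
    return [(x, int(x > 0.5)) for x in (random.random() for _ in range(n))]

def predict(train, x):
    # 1-nearest-neighbour: copy the label of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

test = sample(2000)
accs = []
for n in (10, 100, 1000):
    train = sample(n)
    acc = sum(predict(train, x) == y for x, y in test) / len(test)
    accs.append(acc)
    print(f"train size {n:4d} -> test accuracy {acc:.3f}")
```

With the model frozen, test accuracy climbs as the training set grows — no new architecture required, which is exactly the trade-off the point above describes.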
3. Metrics optimized in research papers rarely translate into real-world business metrics, yet many research ideas are overfit to those metrics and/or datasets. For example, translation papers optimize the BLEU score, but in the real world what matters is user satisfaction and human evaluation, which cannot easily be optimized in research. Similarly, no business sells "ImageNet recognition accuracy". Research overfits to this metric on this dataset (because that's how papers are evaluated), but it's not obvious that a model doing better on it will also do better on some other metric or dataset, even a similar one. In fact, even datasets known to contain errors are still used as-is, because they have always been used.
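For concreteness, BLEU is essentially n-gram overlap with a reference translation plus a brevity penalty — nothing in it measures user satisfaction. A hand-rolled sketch (the sentences are made up, and the add-one smoothing is a simplification of the smoothing schemes real toolkits use):

```python
# Simplified sentence-level BLEU: geometric mean of smoothed n-gram
# precisions against a reference, scaled by a brevity penalty.
import math
from collections import Counter

def bleu(reference, hypothesis, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        hyp_ngrams = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    # Brevity penalty discourages artificially short hypotheses.
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
hyp = "the cat is on the mat".split()
print(f"BLEU: {bleu(ref, hyp):.3f}")  # n-gram overlap, not human judgment
```

A hypothesis can score well here while reading badly to a human, and vice versa — which is why a paper's BLEU gain doesn't automatically translate into a business metric.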
If you were starting a PhD in CS/ML right now, and you wanted to be as useful as possible to industry (while still being impactful academically), would you focus on the theoretical aspects of the weaknesses you mentioned (e.g. model maintainability, complexity, etc.)?
Finding problems that are of interest to academia wasn't easy.
Taking the algorithms built on them and turning them into a product will be hard.
We explained why textbook machine learning fails on these problems, which is what makes them interesting academically.
We also provide data for academic use, which enables academic research.
So far the response has been positive, so I hope we've found a path to academia-industry cooperation.
Say your model only increases performance by 1%. It's unproven within the business and hasn't stood the test of time. Not to mention someone generally needs to know how it works and maintain it. Someone needs to be responsible for the change and be able to explain why the new model is better (and will continue to stay better).
And generally businesses buy solutions, not releases of models. There's a lot more work in commercialising a model as a product than in building the model itself. I know because I've tried to do it before.
We are working on software that lets us be more agile with ML and rapidly release new models to compete with existing ones, which helps us learn about the effectiveness of new modelling techniques and build business cases.
Enterprises should perhaps have a team/group that tries to fill the gap between research and adoption.