Is there a gulf between the release of cutting-edge research and its commercialization? Couldn't many of these models be commercialized faster and made available as enterprise products?
1. Most advances do not produce large enough gains to justify translating them into industry. 99.9% of research papers propose techniques that yield small gains in the optimization metric (accuracy, ROC AUC, BLEU score, etc.). That comes at the cost of added complexity, more expensive training, model instability, harder code maintenance, and so on. For the vast majority of companies, unless you are Google AdWords or Google Translate, a tiny gain in metric X is not worth those costs. You're much better off using proven off-the-shelf models that have stood the test of time, are fast to train, and are easy to maintain — even if they are 1% worse.
2. Research tends to focus on model improvements, and you are not allowed to touch your train/test data. That makes sense, as otherwise competing approaches would not be comparable. In the real world, however, you have the freedom to collect more training data, clean your data, select more appropriate validation/test data, and so on. The vast majority of the time, getting better/cleaner/more data beats getting a slightly better model, and it's much easier to implement. So for industry it often makes more sense to focus on that.
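A toy sketch of that point (everything here is synthetic and illustrative — the 1-nearest-neighbour classifier is just a stand-in for any fixed, simple model): hold the model constant and only vary how much training data it gets.

```python
# Hypothetical illustration: same simple model, more training data.
# Labels follow a clean threshold rule, so training-set size is the
# only thing that changes between runs.
import random

random.seed(0)

def sample(n):
    """n points in [0, 1); label is 1 if x > 0.5."""
    return [(x, int(x > 0.5)) for x in (random.random() for _ in range(n))]

def predict(train, x):
    # 1-nearest-neighbour: copy the label of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

test = sample(2000)
accs = []
for n in (10, 100, 1000):
    train = sample(n)
    acc = sum(predict(train, x) == y for x, y in test) / len(test)
    accs.append(acc)
    print(f"train size {n:4d} -> test accuracy {acc:.3f}")
```

With the model frozen, test accuracy climbs as the training set grows — no new architecture required, which is exactly the trade-off the point above describes.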
3. Metrics optimized in research papers rarely translate into real-world business metrics, yet many research ideas are overfit to those metrics and/or datasets. For example, translation papers optimize the BLEU score, but in the real world what matters is user satisfaction and human evaluation, which cannot easily be optimized in research. Similarly, no business sells "ImageNet recognition accuracy". Research overfits to this metric on this dataset (because that's how papers are evaluated), but it's not obvious that a model doing better on it will also do better on some other metric or dataset, even a similar one. In fact, even datasets known to contain errors are still used as-is, because they have always been used.
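For concreteness, BLEU is essentially n-gram overlap with a reference translation plus a brevity penalty — nothing in it measures user satisfaction. A hand-rolled sketch (the sentences are made up, and the add-one smoothing is a simplification of the smoothing schemes real toolkits use):

```python
# Simplified sentence-level BLEU: geometric mean of smoothed n-gram
# precisions against a reference, scaled by a brevity penalty.
import math
from collections import Counter

def bleu(reference, hypothesis, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        hyp_ngrams = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    # Brevity penalty discourages artificially short hypotheses.
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
hyp = "the cat is on the mat".split()
print(f"BLEU: {bleu(ref, hyp):.3f}")  # n-gram overlap, not human judgment
```

A hypothesis can score well here while reading badly to a human, and vice versa — which is why a paper's BLEU gain doesn't automatically translate into a business metric.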
If you were starting a PhD in CS/ML right now, and you wanted to be as useful as possible to industry (while still being impactful academically), would you focus on the theoretical aspects of the weaknesses you mentioned (e.g. model maintainability, complexity, etc.)?
Finding problems that are of interest to academia wasn't easy.
Taking the algorithms built on them and turning them into a product will be hard.
We explained why textbook machine learning fails on these problems, which is what makes them interesting academically.
We also provide data for academic use, which enables academic research.
So far the response has been positive, so I hope we've found a path to academia-industry cooperation.
Say your model only increases performance by 1%. It's unproven within the business and hasn't stood the test of time. Not to mention someone generally needs to know how it works and maintain it. Someone needs to be responsible for the change and be able to explain why the new model is better (and will continue to stay better).
And generally businesses buy solutions, not releases of models. There's a lot more work in commercialising a model as a product than in building the model itself. I know because I've tried to do it before.
We are working on software that lets us be more agile with ML and rapidly release new models to compete with existing ones, which helps us learn about the effectiveness of new modelling techniques and build business cases.
Enterprises should perhaps have a team/group that tries to fill the gap between research and adoption.