Ask HN: Does ML research ever get translated to industry?
26 points by hsikka on Jan 6, 2019 | 14 comments
It seems like industry doesn’t really use any of the recent advancements that come fresh from research, such as capsule networks or advances in neural architecture search.

Is there a gulf between the release of cutting edge research and its commercialization? Couldn’t many of these models be commercialized faster and be made available as enterprise products?

Some reasons that come to mind:

1. Most of the advances do not result in large enough gains to justify them being translated into industry. 99.9% of research papers propose techniques that result in small gains in the optimization metric (accuracy, ROC AUC, BLEU score, etc). However, this comes at the expense of added cost in complexity, more expensive training, model instability, challenges in code maintainability, and so on. For the vast majority of companies, unless you are Google AdWords or Google Translate, a tiny gain in metric X is not worth the costs mentioned above. You're much better off using proven off-the-shelf models that have stood the test of time, are fast to train and easy to maintain. Even if they are 1% worse.

2. Research tends to focus on model improvements, and you are not allowed to touch your train/test data. That makes sense, as otherwise competing approaches would not be comparable. However, in the real world you have the freedom of collecting more training data, cleaning your data, selecting more appropriate validation/test data, and so on. The vast majority of the time, getting better/cleaner/more data beats getting a slightly better model. And it's much easier to implement. So for industry it often makes more sense to focus on that.

3. Metrics optimized in research papers rarely translate into real world business metrics, but many research ideas are overfit to those metrics and/or datasets. For example, translation papers optimize something called BLEU score, but in the real world the thing that matters is user satisfaction and "human evaluations", which cannot easily be optimized in research. Similarly, no business sells "ImageNet recognition accuracy". Research overfits to this metric on this dataset (because that's how papers are evaluated) but it's not obvious that a model doing better on this metric will also do better on some other metric or dataset, even if they are similar. In fact, even datasets that are known to contain errors are still used as-is, because they have always been used.
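
To make point 3 concrete, here is a minimal sketch of how a benchmark metric and a business metric can rank two models differently. Everything in it is an assumption for illustration: the confusion-matrix counts, the two cost values, and the names `Confusion`, `accuracy`, and `business_cost` are hypothetical and not taken from any real system or paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def accuracy(c: Confusion) -> float:
    """Fraction of all examples classified correctly (the 'paper' metric)."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def business_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    """Total cost of errors when misses and false alarms are priced differently."""
    return c.fp * cost_fp + c.fn * cost_fn


# Hypothetical counts on 1,000 held-out examples; illustrative only.
model_a = Confusion(tp=40, fp=10, fn=60, tn=890)
model_b = Confusion(tp=80, fp=70, fn=20, tn=830)

# Assumed prices: a miss costs 20x as much as a false alarm.
COST_FP, COST_FN = 1.0, 20.0

for name, m in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(m), 3), business_cost(m, COST_FP, COST_FN))
```

With these made-up numbers, model A wins on accuracy (0.93 vs 0.91) while model B has the lower error cost (470 vs 1210), so the "better" model depends on which metric the business actually cares about.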

Interesting - I replied to your twitter post about this but I’ll go ahead and say it here as well:

If you were starting a PhD in CS/ML right now, and you wanted to be as useful as possible primarily to industry (while still being impactful academically), would you focus on the theoretical aspects of those weaknesses you mentioned (e.g. model maintainability, complexity, etc.)?

I’d also love to hear the answer to this :)

I think an important obstacle is that industry and academia have different goals and therefore different motivations. We (researchers from Palo Alto Networks and Shodan) recently published a paper on machine learning challenges in cyber security, aimed at academia: https://arxiv.org/abs/1812.07858

Finding problems that would be of interest to academia wasn't easy. Taking algorithms based on them and turning them into a product will be hard.

We explained why textbook machine learning will fail on these problems, which is what makes them interesting academically. We also provide data for academic use, which enables academic research.

So far the response is positive, so I hope we have found a way for academia-industry cooperation.

When we talk about commercialisation, it almost always has to do with improving business performance. When you’re thinking about making a change to improve the performance of the business, you have to consider the benefit/risk ratio of your change (in this case, swapping in a better model).

Say your model only increases performance by 1%. It’s unproven within the business and against the test of time. Not to mention someone generally needs to know how it works and maintain it. Someone needs to be responsible for the change and be able to explain why the new model is better (and will continue to stay better).

And generally businesses buy solutions, not releases of models. There’s a lot more work that goes into commercialising a model as a product than into the actual model itself. I know because I’ve tried to do it before.

We are working on developing software that will let us be more agile with ML and rapidly release new models to compete with existing models, which will help us learn about the effectiveness of new modelling techniques and help us build business cases.
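
As a rough illustration of what "releasing new models to compete with existing models" could look like, here is a minimal champion/challenger sketch. It is not the commenter's software; the function names, the promotion rule, and the use of a paired bootstrap are all assumptions. The idea is to score both models on the same held-out examples and only promote the challenger when the lower end of a 95% bootstrap interval on the accuracy gain clears a threshold.

```python
import random


def paired_bootstrap_gain(champion_correct, challenger_correct,
                          n_resamples=2000, seed=0):
    """Return (gain, lower, upper): observed accuracy gain of the challenger
    and a 95% paired-bootstrap interval around it.

    Both inputs are per-example booleans (True = prediction was correct),
    aligned on the same held-out examples.
    """
    if len(champion_correct) != len(challenger_correct):
        raise ValueError("both models must be scored on the same examples")
    n = len(champion_correct)
    if n == 0:
        raise ValueError("need at least one example")

    # Per-example difference: +1 challenger wins, -1 champion wins, 0 tie.
    diffs = [int(b) - int(a) for a, b in zip(champion_correct, challenger_correct)]

    rng = random.Random(seed)
    gains = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))

    # Integer arithmetic avoids float rounding when picking percentile indices.
    lower = gains[(n_resamples * 25) // 1000]
    upper = gains[max(0, (n_resamples * 975) // 1000 - 1)]
    return sum(diffs) / n, lower, upper


def should_promote(champion_correct, challenger_correct, min_gain=0.0):
    """Hypothetical rule: promote only if the interval's lower bound beats min_gain."""
    _, lower, _ = paired_bootstrap_gain(champion_correct, challenger_correct)
    return lower > min_gain
```

A rule like this also speaks to the 1% case above: on a small test set, a 1% observed gain will often have an interval that includes zero, which gives whoever owns the change a concrete reason to hold off.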

Capsule networks don’t work very well. Why would anyone want to use them in industry?

and Neural Architecture search is (currently) too computationally expensive for non-trivial problems

Large enterprises need to have a faster research adoption cycle too. They need to do their bit. It’s not all the researchers’ fault. At least in my org (a very large conventional co), our DL adoption is hardly beyond transfer learning, using CNNs as feature detectors. We lament the lack of labeled data, and there is tangible research in few-shot learning and semi-supervised learning to at least get us started. But we still don’t do it. I do understand that it is sometimes hard to conceptualize a product around some advances in ML, e.g. GANs or RL, but there are way too many others that we can adopt without killing ourselves.

Enterprises should perhaps have a team / group that tries to fill the gap between research and adoption.

I beg to differ. Some ideas related to training methods for deep neural networks, e.g. binarized neural networks and asynchronous SGD, are used in datacenter infrastructure; see https://research.fb.com/wp-content/uploads/2017/12/hpca-2018.... Some of the companies that I built in the past in Montreal use SOTA research for sound event detection, and a dropout method that friends and I co-authored was used in production at an NLP company (that I cannot disclose).

Have a look at http://cKnowledge.org/rpi-crowd-tuning and https://portalparts.acm.org/3230000/3229762/fm/frontmatter.p... - they also discuss a growing gap between academia and industry, and suggest some solutions.

Because researchers have little incentive to commercialize their research, especially in academia. They take pride in this insulation from the 'real world'. That's the root cause of why research metrics are different from industry metrics.

As someone who works in applied ML but tries to stay heavily on top of research, I think it’s this. I am always stunned when I try to recruit people, talk with folks at conferences, or even look for jobs myself how few people care about commercialization. I’ve seen this attitude grow a lot in corporate R&D in recent years too as they pull in more academics with promises of independence from the “business grind”.

I'm not sure that it is just a lack of interest in commercialisation, though that does exist. I think that academia and industry work on different scales. In industry, low-risk short-term projects have high value. Academia aims for novelty, which usually means higher risk and longer terms. Companies that take on high-risk, longer-term work are usually either startups for which that project is their core, or large companies that can afford to fail at 9 out of 10 projects and rely on the benefit from the others to return the investment in 5 years.
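
The "9 out of 10" arithmetic can be spelled out with a small sketch. The function names and the unit costs are hypothetical, and it assumes every project costs the same and every success pays the same, which real portfolios do not.

```python
def break_even_multiple(success_rate: float) -> float:
    """Payoff each success must return, as a multiple of one project's cost,
    for the whole portfolio to break even."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def portfolio_net(n_projects: int, success_rate: float,
                  cost_per_project: float, payoff_per_success: float) -> float:
    """Expected net return: payoff from the successes minus the cost of all projects."""
    expected_successes = n_projects * success_rate
    return expected_successes * payoff_per_success - n_projects * cost_per_project


# Assumed example: 10 projects, 1 in 10 succeeds, each project costs 1 unit.
print(break_even_multiple(0.1))
print(portfolio_net(10, 0.1, 1.0, 10.0))
```

Under those assumptions, a 10% success rate means each success has to return about ten times the cost of a single project just to break even, which is why only startups betting on one project or large companies with deep pockets tend to take that on.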

What are you working on?
