Shrinking Machine Learning Models for Offline Use (amazon.com)
123 points by georgecarlyle76 on Aug 13, 2018 | 10 comments

Can anyone with more knowledge in the area point me to some resources/surveys regarding state of the art techniques for compressing machine learning models? I'd be particularly interested to see experiments exploring what a plot of model size reduction vs. accuracy cost looks like for different techniques. For example is there usually a graceful degradation in terms of accuracy loss as you compress more, or is there often some kind of tipping point where accuracy plummets?

There are low-bit networks as well: https://arxiv.org/abs/1603.05279

That technology is now a spinoff of AI2 and UW: https://www.xnor.ai/

State of the art is basically 8-bit weights. Anything below that doesn't really work. You will see lots of benchmarks and figures saying that it does, but nearly all of those neglect absolute accuracy: they compare deeply quantized models against the _shitty_ variants of full-precision or half-precision models, which are not useful in practice. Another trick is to deeply quantize an overly redundant model that's no longer state of the art, and show a few percent degradation in accuracy on top of an already barely acceptable number.

IMO, we need to pay attention to absolute accuracy if any of this is to become actually practical. I.e. I don't care how fast or small your compressed network is if its top5 accuracy on ImageNet is below 80%, or some other such criterion. Now granted, this is not perfect, because such models might still be useful for a smaller number of classes, but then maybe come up with a separate metric for that, too. A pedestrian detector is not very useful if it misses or misplaces 30% of pedestrians.
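For anyone unfamiliar with what "8-bit weights" means mechanically, here is a minimal sketch of symmetric per-tensor quantization in plain Python. The function names and the example weights are made up for illustration; real toolchains typically use per-channel scales and calibration data, so treat this as a toy and not as any framework's actual method.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor (an assumption; per-channel is common too).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.82, -1.27, 0.003, 0.45, -0.6]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Note how the small weight 0.003 collapses to zero: with fewer bits the step size grows, and more of the small weights disappear the same way, which is one reason accuracy can fall off quickly below 8 bits.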

This WWDC video shows the effects of quantization at various levels: https://developer.apple.com/videos/play/wwdc2018/708/

A bit Apple specific, but the main ideas carry over to any ML model. There's also a part 2, which I haven't watched.

The most effective "shrinking" of ML models that I've seen (very limited experience, YMMV) is through "pruning". Searching for "arxiv pruning" is an excellent starting point, and a couple of those papers include metrics for accuracy vs size and the tradeoffs therein.

I came to the comments to say the same thing. Quantization and hashing tricks for embeddings are cool and all, but not really important for model compression.

Rather, training companion models to prune away whole subnetworks of weight and layer combinations can allow you to remove tens of thousands of parameters from the model entirely, so you don't waste space on their quantized weights when they end up not being a contributing pathway to predictions.
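To make "pruning" concrete, here is a rough sketch of the simplest variant, magnitude pruning, in plain Python. This is not the companion-model approach described above; it just zeroes the smallest weights, and the function name, weights, and sparsity level are all hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights for illustration only.
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.4, -0.03]
pruned = magnitude_prune(weights, 0.5)
nonzero = sum(1 for w in pruned if w != 0.0)
print(pruned, nonzero)
```

In practice the zeros only save space if the model is stored in a sparse format or whole channels/layers are removed, and the network is usually fine-tuned after pruning to recover accuracy.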

In my limited experience with CNNs/MLPs, it is more of a tipping point. There is a very narrow knee in the tradeoff curve: below it you get no accuracy, within it there are some tradeoffs, and above it you get very little increase in accuracy for more compression.

Is there really a need to shrink models?

As far as I know, most machine learning models can be very compact, often well under 1 GB. Even high-res vision CNNs aren't anywhere close to being fully connected. They might have millions of weights, but that's just in the megabyte range.
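The back-of-the-envelope math behind "millions of weights is just megabytes" is parameters times bytes per weight. A quick sketch, where the 25 million parameter count is a made-up example and not a figure from the article:

```python
def model_size_mb(num_params, bits_per_weight):
    """Raw weight storage in megabytes (10**6 bytes), ignoring file-format overhead."""
    return num_params * bits_per_weight / 8 / 1e6


# Hypothetical model with 25 million parameters.
num_params = 25_000_000
size_fp32 = model_size_mb(num_params, 32)  # 32-bit floats
size_int8 = model_size_mb(num_params, 8)   # 8-bit quantized weights
size_1bit = model_size_mb(num_params, 1)   # binary weights
print(size_fp32, size_int8, size_1bit)
```

So for a model of that size the on-disk footprint is tens to a hundred megabytes, not gigabytes, before any compression.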

My understanding is that the real problem is obfuscating the machine learning model. If they decide to put their model in local memory, they'd be giving away their well-guarded trade secret. Also, they'd be giving away their justification for collecting all user data.

Is anyone around here working on production ML software? Am I really wrong?

For a security product, deployment of the model over the internet is a significant cost. Hundreds of thousands of machines in an enterprise may need to be updated at once. These machines may not have an easy way to accelerate the evaluation of the model, either, making feature reduction important for CPU work as well.

