
Shrinking Machine Learning Models for Offline Use - georgecarlyle76
https://developer.amazon.com/blogs/alexa/post/09bacbdd-c089-4b02-863d-6761728102ed/shrinking-machine-learning-models-for-offline-use
======
anonymousDan
Can anyone with more knowledge in the area point me to some resources/surveys
regarding state of the art techniques for compressing machine learning models?
I'd be particularly interested to see experiments exploring what a plot of
model size reduction vs. accuracy cost looks like for different techniques.
For example is there usually a graceful degradation in terms of accuracy loss
as you compress more, or is there often some kind of tipping point where
accuracy plummets?

~~~
web007
The most effective "shrinking" of ML models that I've seen (very limited
experience, YMMV) is through "pruning". Searching for "arxiv pruning" is an
excellent starting point, and a couple of those papers include metrics for
accuracy vs size and the tradeoffs therein.
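To make the idea concrete, here's a minimal NumPy sketch of unstructured magnitude pruning, the simplest variant those papers benchmark. The function name and the "zero the smallest k weights" threshold rule are my own illustration, not taken from any specific paper:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until `sparsity`
    fraction of the weights are zero (unstructured magnitude pruning)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across all weights
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.random.randn(256, 256)
pruned = magnitude_prune(w, 0.9)
print(1.0 - np.count_nonzero(pruned) / pruned.size)  # ~0.9
```

Note this only makes the tensor sparse; you only save space on disk if you then store it in a sparse or compressed format, which is part of the size/accuracy tradeoff those papers measure.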

~~~
mlthoughts2018
I came to the comments to say the same thing. Quantization and hashing tricks
for embeddings are cool and all, but not really important for model
compression.
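For reference, the quantization being dismissed here is roughly this: store int8 codes plus one float scale instead of float32 weights, for ~4x smaller storage. This is a generic symmetric-quantization sketch of my own, not any particular framework's implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric linear quantization: map float32 weights to int8,
    returning the int8 tensor plus the scale needed to dequantize."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1000).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()  # worst-case rounding error <= scale/2
```

The catch, as the comment says, is that every parameter is still stored; quantization shrinks each weight but removes none of them.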

Rather, training companion models to prune away whole subnetworks (combinations of weights and layers) can let you remove tens of thousands of parameters from the model entirely, not wasting space even on their quantized weights when they turn out not to be a contributing pathway to predictions.
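A toy sketch of that structured idea, assuming a two-layer MLP (y = W2 @ relu(W1 @ x)): score each hidden unit, keep the top fraction, and slice both weight matrices, so the pruned model is genuinely smaller rather than merely sparse. The norm-product importance score here is a common heuristic stand-in for the "companion model" the comment describes:

```python
import numpy as np

def prune_neurons(W1: np.ndarray, W2: np.ndarray, keep_frac: float):
    """Drop whole hidden units of a two-layer MLP by lowest importance,
    removing their rows from W1 and columns from W2 entirely."""
    n_hidden = W1.shape[0]
    keep = max(1, int(n_hidden * keep_frac))
    # Importance of unit i: L2 norm of its incoming times outgoing weights
    importance = np.linalg.norm(W1, axis=1) * np.linalg.norm(W2, axis=0)
    kept = np.sort(np.argsort(importance)[-keep:])
    return W1[kept, :], W2[:, kept]

W1 = np.random.randn(512, 128)   # hidden x input
W2 = np.random.randn(10, 512)    # output x hidden
W1p, W2p = prune_neurons(W1, W2, 0.25)
print(W1p.shape, W2p.shape)  # (128, 128) (10, 128)
```

Unlike zeroing individual weights, the removed parameters vanish from storage and compute with no sparse format needed.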

------
John_KZ
Is there really a need to shrink models?

As far as I know, most machine learning models can be very compact, often well under 1 GB. Even high-res vision CNNs aren't anywhere close to being fully connected. They might have millions of weights, but at a few bytes each that's just in the megabyte range.

My understanding is that the real problem is _obfuscating_ the machine learning model. If they put their model in local storage, they'd be giving away their well-guarded trade secret. They'd also be giving away their justification for collecting all that user data.

Is anyone around here working on production ML software? Am I really wrong?

~~~
chowyuncat
For a security product, deploying the model over the internet is a significant cost: hundreds of thousands of machines in an enterprise may need to be updated at once. Those machines may not have an easy way to accelerate model evaluation either, which makes feature reduction important for CPU inference as well.

