That technology is now a spinoff of AI2 and UW: https://www.xnor.ai/
IMO, we need to pay attention to absolute accuracy if any of this is to become actually practical. I.e. I don't care how fast or small your compressed network is if its top-5 accuracy on ImageNet is below 80%, or some other such criterion. Granted, this is not perfect, because such models might still be useful for a smaller number of classes, but then maybe come up with a separate metric for that, too. A pedestrian detector is not very useful if it misses or misplaces 30% of pedestrians.
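For what it's worth, the gate I have in mind is trivial to implement. A minimal sketch, assuming you have model logits and integer labels as NumPy arrays (the names and the 80% threshold are just illustrative):

```python
import numpy as np

def top5_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """logits: (N, num_classes), labels: (N,) integer class ids."""
    # Indices of the 5 highest-scoring classes per example.
    top5 = np.argsort(logits, axis=1)[:, -5:]
    # A hit if the true label appears anywhere in the top 5.
    hits = (top5 == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Gate a compressed model on an absolute threshold, not just a
# compression ratio or a relative drop:
# assert top5_accuracy(logits, labels) >= 0.80
```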
A bit Apple-specific, but the main ideas carry over to any ML model. There's also a part 2, which I haven't watched.
Rather, training companion models to prune away whole subnetworks of weights and layers can let you remove tens of thousands of parameters from the model entirely, instead of wasting space on quantized weights that end up not being a contributing pathway to predictions.
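To make that concrete, here's a rough sketch of structured pruning using PyTorch's built-in `torch.nn.utils.prune` utilities. It zeroes whole output channels by L2 norm; the 30% amount is an arbitrary example, and actually shrinking the stored tensors afterwards is a separate export/optimizer step:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(64, 128, kernel_size=3)

# Zero out the 30% of output channels (dim=0) with the smallest L2 norm.
prune.ln_structured(conv, name="weight", amount=0.3, n=2, dim=0)

# Make the pruning permanent: fold the mask into the weight tensor.
prune.remove(conv, "weight")

# The tensor shape is unchanged, but entire channels are now zero and
# can be dropped by a graph-level optimizer before deployment.
zeroed = (conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum().item()
print(zeroed, "of 128 output channels zeroed")
```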
As far as I know, most machine learning models can be very compact, often well under 1 GB. Even high-res vision CNNs aren't anywhere close to being fully connected. They might have millions of weights, but that's still only in the megabyte range.
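As a sanity check on the arithmetic, take torchvision's ResNet-50 as an example; the numbers below are roughly what you should see:

```python
import torchvision.models as models

# Weights don't matter for counting parameters, so no download needed.
model = models.resnet50()
n_params = sum(p.numel() for p in model.parameters())
size_mb = n_params * 4 / 1e6  # float32 = 4 bytes per weight

print(f"{n_params / 1e6:.1f}M params ≈ {size_mb:.0f} MB at fp32")
# ~25.6M params ≈ ~102 MB; int8 quantization would cut that to ~26 MB.
```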
My understanding is that the real problem is obfuscating the machine learning model. If they put their model in local storage on the device, they'd be giving away a well-guarded trade secret. They'd also be giving away their justification for collecting all that user data.
Is anyone around here working on production ML software? Am I really wrong?