So I assume it will be similar to this? It sounds like they would keep this interface but redesign how it uses the specialty chips. https://www.tensorflow.org/mobile/
It'd be interesting to compare the power drain of running the model on-device against transmitting the data to a server, waiting for the response, and then acting on it. Even if on-device is still worse, it might be worth it for the responsiveness and the reduction in data usage.
Not to mention that training can be done somewhat "out of band": you could have your phone train on data while charging overnight to get better predictions the next day.
If you found a way to do efficient distributed training, you could have each device do a few training runs and the user probably wouldn't even notice.
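The distributed idea above could look something like federated averaging: each device takes a few gradient steps on its own data, and only the averaged weights leave the device. A minimal sketch in numpy, assuming a simple linear model (the data, `local_step`, and all parameters here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_step(w, X, y, lr=0.1):
    """One least-squares gradient step on a device's local data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Global model shared by all devices.
w_global = np.zeros(3)

# Each "device" holds a small private dataset that never leaves it.
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]

for _ in range(10):
    # Every device trains briefly on its own data...
    local_weights = [local_step(w_global, X, y) for X, y in devices]
    # ...and only the averaged model is sent back to the server.
    w_global = np.mean(local_weights, axis=0)
```

Each round only costs a device a handful of matrix multiplies, which is the kind of work that could plausibly be hidden in overnight charging.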
They show an example of ML when selecting text in Android: it predicts how much text should be selected depending on what it is (a full address, an email, etc.). I don't know how they built it, but it was almost instant; I'm sure users won't care.
Simple models should run performantly. We're all thinking of CNNs and RNNs, but plain logistic regression will probably be the most ubiquitous application.
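To the point about logistic regression being cheap: inference is just a dot product and a sigmoid, trivial for any phone CPU. A tiny sketch (the weights here are invented, not from any real model):

```python
import math

# Made-up model: 3 features, pretrained weights and bias.
weights = [0.8, -1.2, 0.5]
bias = 0.1

def predict(features):
    """Logistic regression inference: dot product + sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # probability of the positive class

p = predict([1.0, 0.5, 2.0])
```

That's a handful of multiply-adds per prediction, so latency and battery cost are negligible compared to shipping the features to a server.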
https://developer.qualcomm.com/software/snapdragon-neural-pr...