
Machine learning on mobile: on the device or in the cloud? - putnam
http://machinethink.net/blog/machine-learning-device-or-cloud/
======
deepnotderp
If you're interested in fast inference models, be sure to check out QuickNet:
[https://arxiv.org/abs/1701.02291](https://arxiv.org/abs/1701.02291)

Disclaimer: I'm the author of this paper.

It shares many similarities with Google's production MobileNets, although
MobileNets is a family of models, and QuickNet uses PReLU, which has helped a
lot in terms of parameter efficiency. There's also a slight difference in the
implementation of the separable convolution: QuickNet has no activation
function between the depthwise and pointwise convolutions.
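
The depthwise-then-pointwise factorization without an intermediate
nonlinearity can be sketched in plain NumPy (a toy illustration with made-up
shapes, not the paper's actual code):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Toy depthwise-separable convolution (stride 1, 'valid' padding).

    x          : (H, W, C_in) input feature map
    dw_kernels : (k, k, C_in) one spatial filter per input channel
    pw_weights : (C_in, C_out) 1x1 convolution mixing channels

    No activation is applied between the depthwise and pointwise
    steps, matching the variant described above; an activation such
    as PReLU would come only after the pointwise step.
    """
    H, W, C = x.shape
    k = dw_kernels.shape[0]
    out_h, out_w = H - k + 1, W - k + 1

    # Depthwise step: each channel is convolved with its own k x k filter.
    dw = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + k, j:j + k, :]            # (k, k, C)
            dw[i, j, :] = np.sum(patch * dw_kernels, axis=(0, 1))

    # Pointwise step: a 1x1 convolution mixes the channels.
    return dw @ pw_weights                            # (out_h, out_w, C_out)

x = np.random.randn(8, 8, 4)
out = depthwise_separable_conv(x, np.random.randn(3, 3, 4), np.random.randn(4, 6))
print(out.shape)  # (6, 6, 6)
```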

~~~
albertzeyer
Have you published this paper at a conference or anywhere similar? I guess
not, since it's not really in the common format, but from a quick look it
actually seems interesting, so why not try to submit it somewhere, e.g. an
image/vision conference?

You really should work on the formatting, though. I assume this was not done
with LaTeX? Then, in the table of your experimental results, you should also
compare the number of parameters, and maybe training time, inference time,
and memory consumption during inference. There should also be one or two
figures with small sketches of your model's blocks, which would help the
reader understand the differences from e.g. Xception.

Also, as I understand it, QuickNet is mostly based on Xception? But Xception
is not in your table of experimental comparisons, and neither is MobileNets,
which you mention. You should add both. And if you additionally apply
compression methods like 8-bit quantization, you should have a separate table
showing how much less memory it needs and how much it degrades performance.

Finally, to really boost interest in your paper, it would be really nice if
you published some code. It sounds like you've already implemented this in
Lasagne or Keras, so just publish that code.
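
For reference, the kind of post-training 8-bit quantization suggested above
can be sketched as a generic affine scheme (an illustration only, not tied to
any particular framework; the weight tensor here is a random placeholder):

```python
import numpy as np

def quantize_8bit(w):
    """Affine per-tensor 8-bit quantization of a float weight tensor.

    Maps floats to uint8 with a scale and zero point, cutting memory
    4x versus float32 at some cost in accuracy.
    """
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate case: all weights equal
    zero_point = int(np.clip(np.round(-lo / scale), 0, 255))
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the uint8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(256).astype(np.float32)
q, s, z = quantize_8bit(w)
err = float(np.max(np.abs(dequantize(q, s, z) - w)))
print(q.nbytes, w.nbytes)  # 256 1024
```

The degradation table suggested above would then report accuracy before and
after replacing each layer's weights with their dequantized values.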

~~~
deepnotderp
The issue with LaTeX is that diagrams and tables are a nightmare to deal
with, but I've been meaning to get to that once I have some time (swamped
atm).

Xception is not compared because Xception doesn't have any results on CIFAR,
and this was meant to be a fast inference model, which Xception is not.

I do provide links to full resolution images of the network topology.

A comparison of parameter count, FLOPs, and other performance figures would
probably be useful; you're right, and I'll add that as soon as I have time.

MobileNets came out after this, and I haven't updated it in the meantime.

I used Keras but there are some internal tools, namely a data loader, a
visualization tool, a replication environment and an optimizer that I'm not
allowed to share externally.

Hope you found it interesting, and thanks for the feedback!!

~~~
auggierose
Tables are really easy in LaTeX, and as for diagrams, just export them as
PDFs or something and then include them in LaTeX.
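
For instance, a minimal comparison table and an included PDF diagram might
look like this (placeholder model names, numbers, and filename):

```latex
\begin{table}[h]
  \centering
  \begin{tabular}{lrr}
    \hline
    Model   & Params (M) & CIFAR-10 acc. \\
    \hline
    Model A & 1.2        & 93.1\%        \\
    Model B & 4.8        & 94.0\%        \\
    \hline
  \end{tabular}
  \caption{Example comparison table (placeholder numbers).}
\end{table}

% Diagrams exported as PDF can be included via the graphicx package
% (\usepackage{graphicx} in the preamble):
\includegraphics[width=\linewidth]{network-topology.pdf}
```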

------
zitterbewegung
This is a really good breakdown of data engineering for mobile. Usually you
read something really theoretical or a complex ML solution, but here is a
great breakdown of the pros and cons of where to do training and inference.

------
candiodari
When using the Google or Amazon assistant, I wish they would do the machine
learning on the device. I understand they want to protect their models, but
just this morning:

Ok Google, what's the time ?

* 2 seconds or so, nothing *

Hey Google, what's the time ?

* Another 2 seconds or so, nothing *

Alexa, what's the time ?

* 2 seconds or so, again *

"Here it currently are the is 9:42 search results for"

What happened? The network had a hiccup (okay, it sucks), with the result
that Google's connection was blocked for a while. This caused me to repeat
the question, which caused Google Assistant to resend the query (in other
words, send a cancel down a TCP connection that hadn't gotten through its
buffer yet, resulting in even more delay). More delay led me to ask Alexa
instead. Alexa got lucky on the network.

This resulted in Alexa and Google answering at the same time, Alexa with the
time, Google Assistant with the search results for "what's the time hey
google what's the time alexa what's the time". I was ready to throw things
out of the window.

But the root cause of this problem was the delay due to the network.

So PLEASE get local voice transcription going, please! Save Alexa and my
phone from getting thrown out the window.

------
deepnotderp
I think that custom deep learning chips will be the best enabler of
edge-device deep learning. It's just too difficult to deploy anything useful
onto most smartphones or other computationally constrained devices, and to
compound this you often have to use the _CPU_ , because the GPU is
unavailable due either to the framework or to missing drivers.

~~~
zitterbewegung
I think that custom chips won't be made, just better graphics processors that
can do inference even faster. Gamers who want more performance in their
mobile games will push mobile graphics to the point where you will be able to
do GPGPU.

~~~
deepnotderp
The problem is that even the Titan X drowns when doing real-time inference,
so a mobile GPU beating it without low precision is very unlikely.

~~~
argonaut
That's news to me! I'm pretty darn sure the Titan X has no problems with
inference on the amounts of data that a single user would want inference for.

~~~
deepnotderp
Well, I was talking about real-time object detection on 640x480 video.
Perhaps most users would be okay with a delay of 5 seconds or so when
processing an image, and perhaps you could use Facebook's trick of fast,
low-quality style transfer followed by better-quality style transfer once the
image is on the servers. But the point is that the current paradigm is very
restrictive in terms of deep learning applications.

~~~
putnam
I think computing optical flow remains a major time bottleneck. Any attempt
at temporal coherence would be great, and I'm sure there is some attempt on
Messenger, but it really only works well for the last style transfer filter,
all the way to the right in the app, and that one only really looks great in
well-lit scenes. Also, the phone seems to heat up a lot.

------
intrasight
Both - just like the brain works

~~~
adrianN
Your brain comes with a cloud connection?

~~~
mamon
If you believe in God, then... yes :)

