
TensorFlow machine learning now optimized for the Snapdragon 835 and Hexagon 682 - rahulchowdhury
https://www.qualcomm.com/news/snapdragon/2017/01/09/tensorflow-machine-learning-now-optimized-snapdragon-835-and-hexagon-682
======
Scaevolus
Qualcomm's Snapdragon chips ship with a Hexagon DSP core, which is optimized
for high-throughput numerical calculations -- not the branch-heavy code you'll
see in most general-purpose applications.

TensorFlow does lots of matrix multiplies. The Hexagon chip can do 8
multiplies each cycle, and runs multiple threads on each core. The benchmark
isn't clear, but it's likely that _one_ Hexagon instruction can replace
multiple normal ARM instructions for the inner loop.
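To make the instruction-count point concrete, here is a hedged sketch (plain Python, not Hexagon assembly; the 8-lane width is taken from the figure above, and real HVX lane counts differ by generation) of how one vector multiply-accumulate stands in for a whole scalar inner loop:

```python
def scalar_dot(a, b):
    """Plain inner loop: one multiply and one add per element."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # two scalar ops per element on a general-purpose CPU
    return acc

def simd_dot(a, b, lanes=8):
    """Same result, but conceptually processing `lanes` elements per
    'instruction', the way a vector multiply-accumulate unit would."""
    acc = 0
    for i in range(0, len(a), lanes):
        # One vector multiply-accumulate covering `lanes` elements at once.
        acc += sum(x * y for x, y in zip(a[i:i+lanes], b[i:i+lanes]))
    return acc

a = list(range(16))
b = list(range(16))
assert scalar_dot(a, b) == simd_dot(a, b)
```

The vector version issues roughly one "instruction" per 8 elements instead of two per element, which is the sense in which one Hexagon instruction can replace multiple ARM instructions in a matmul inner loop.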

You can see some more on how the Hexagon DSP works here:
[http://pages.cs.wisc.edu/~danav/pubs/qcom/hexagon_hotchips20...](http://pages.cs.wisc.edu/~danav/pubs/qcom/hexagon_hotchips2013.pdf)

~~~
pjscott
Looks like those slides are for an older version. Since then they've bumped up
the size of the vector execution units from 64 bits to 1024 bits and
quadrupled the number of them, if I'm reading this right.

[http://www.anandtech.com/show/10948/qualcomm-snapdragon-835-...](http://www.anandtech.com/show/10948/qualcomm-snapdragon-835-kryo-280-adreno-540/2)

------
fithisux
Unfortunately, this DSP is not FOSS; you need an SDK with binary components.
Hopefully some day we'll have a cross-DSP standard, or at least documentation,
so we can use these chips. OpenCL could also acquire a DSP profile.

~~~
pawadu
A DSP is significantly simpler than a CPU, especially this particular brand.

The hard part is implementing it efficiently (power, area, speed). So in
theory we could have an open-source design, and vendors could still compete
with each other by providing the most efficient implementation.

------
dharma1
Nice, looks like about a 10x speed-up for this classification task.

I think there are big gains to be made in lower precision inference too. Lots
of people doing interesting work in that area, check out these guys -
[https://xnor.ai/](https://xnor.ai/)
[https://arxiv.org/abs/1603.05279](https://arxiv.org/abs/1603.05279)
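The XNOR-Net idea from the linked paper can be sketched in a few lines: binarize weights and activations to ±1, and the dot product at the heart of each layer reduces to an XNOR plus a popcount (illustrative Python, not an optimized kernel):

```python
def binarize(v):
    """Map real values to {-1, +1} by sign (the XNOR-Net binarization)."""
    return [1 if x >= 0 else -1 for x in v]

def float_dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def xnor_dot(a_bits, b_bits):
    """Dot product of two {-1, +1} vectors using only bit operations.
    Pack signs as bits (1 for +1, 0 for -1); XNOR counts agreements."""
    n = len(a_bits)
    a_packed = sum((1 << i) for i, x in enumerate(a_bits) if x > 0)
    b_packed = sum((1 << i) for i, x in enumerate(b_bits) if x > 0)
    agree = bin(~(a_packed ^ b_packed) & ((1 << n) - 1)).count("1")
    return 2 * agree - n  # agreements minus disagreements

w = binarize([0.7, -1.2, 0.1, -0.4])
x = binarize([0.9, 0.3, -0.2, -0.8])
assert xnor_dot(w, x) == float_dot(w, x)
```

That substitution is why binarized nets promise large speed and memory savings: a 64-element chunk of the dot product becomes one XNOR and one popcount instruction on typical hardware.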

------
nharada
Are the two devices running the same model? The article claims the DSP has
higher confidence, but I don't see why that would be the case. I suppose one
could work at a higher precision but that wouldn't make sense if they're
comparing performance.

~~~
zo7
I've talked a little bit with some engineers at Qualcomm who worked on
projects like this. My impression was that they make a lot of compromises when
they optimize a computer vision algorithm for their hardware which slightly
alters the result, but can run extremely fast with comparable performance.
It's likely they're doing something similar here, which might explain the
difference in confidences, but I highly doubt that it objectively classifies
images better than the one running on the CPU. If anything the better
performance is an illusion just because the model running on the DSP reacts
more quickly.

------
apadmarao
I have a basic understanding of machine learning and absolutely no
understanding of TensorFlow.

Can someone help me understand what is going on here?

Are we just doing prediction for a model on a mobile device instead of in the
cloud? If so, for what kinds of scenarios is this useful?

~~~
pjscott
Sure! They have a neural net that they pre-trained for image recognition as a
demo. They ran it on the mobile device both times -- no cloud involved -- but
the one on the left is running on the CPU, while the one on the right is
running on a DSP located on the same chip. The DSP is specialized for
workloads that have very regular control flow and involve a lot of fixed-point
arithmetic. Running the neural network is such a workload, so they get
impressive speed and power improvements by using the DSP instead of the CPU.
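To illustrate the fixed-point part, here is a hedged sketch of affine 8-bit quantization, one common way to map a net's float weights onto integer DSP arithmetic (Qualcomm's actual pipeline is not public, so the scheme and numbers here are assumptions):

```python
def quantize(values, bits=8):
    """Affine-quantize floats to unsigned integers: real = (q - zero) * scale.
    Illustrative only; production schemes add per-layer/per-channel details."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (2**bits - 1) or 1.0  # guard against all-equal input
    zero = round(-lo / scale)                  # integer representing 0.0
    q = [round(v / scale) + zero for v in values]
    return q, scale, zero

def dequantize(q, scale, zero):
    return [(x - zero) * scale for x in q]

weights = [0.5, -1.25, 0.0, 2.0]
q, scale, zero = quantize(weights)
restored = dequantize(q, scale, zero)
# Rounding introduces a small error (bounded by one quantization step).
# The model still works, but outputs such as confidences shift slightly
# relative to the float version running on the CPU.
assert all(abs(a - b) < scale for a, b in zip(weights, restored))
```

The bulk of the compute then happens on the integer `q` values, which is exactly the kind of regular fixed-point workload the DSP is built for.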

~~~
monk_e_boy
How does training the brain compare to running a trained brain?

Can the pre-trained brain (the one in the phone) flip to training mode? Can
you teach it something and upload that new training result to the original?

Or for things it doesn't recognise, do you need to add the images and
classification to the training data and create a 'new brain' and download it
to the phone?

Is there one super-organism (cloud-based learning) that gives birth to
millions of mini-minds? Each mini-mind asking its parent to help it with
things it doesn't understand. In 20 years' time what will this say about
consciousness? Where would it live? Is this a new way to think about minds,
those that are distributed in many physical devices?

And the precision of the hardware changing thought processes in subtle ways is
very interesting. Upgrading a neural net to a new hardware platform would
change how it works, how it thinks and makes decisions.

~~~
IanCal
> How does training the brain compare to running a trained brain?

Harder operations, and you need to do a lot more of them. Far more suited to
having a single massive training system that then sends out the trained model
just for inference.

Another thing that can be done is to train a large neural net then figure out
which bits you can cut out without sacrificing much accuracy. The newer,
smaller net is then faster to run and more likely to actually fit neatly into
the RAM on your phone.
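That shrinking step can be sketched as simple magnitude pruning, one common heuristic (real pipelines typically retrain the net afterwards to recover accuracy):

```python
def prune_by_magnitude(weights, keep_fraction=0.5):
    """Zero out the smallest-magnitude weights, keeping roughly
    `keep_fraction` of them. Zeroed weights can be stored sparsely,
    shrinking the model and speeding up inference."""
    ranked = sorted(weights, key=abs, reverse=True)
    k = int(len(weights) * keep_fraction)
    threshold = abs(ranked[k - 1]) if k > 0 else float("inf")
    return [w if abs(w) >= threshold else 0.0 for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = prune_by_magnitude(w, keep_fraction=0.5)
# The large weights survive; the near-zero ones are removed.
assert pruned == [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```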

> Can the pre-trained brain (the one in the phone) flip to training mode? Can
> you teach it something and upload that new training result to the original?

Technically you probably could, but practically the answer is no for the types
of nets used in this kind of thing. You'd want to be training the net on
millions of images, and even if it were as fast as the inference on the phones
that'd still take way too long.

[edit - interestingly this is not only technically possible but pretty much
what is often done but on more powerful machines. You can start with a pre-
trained network or model and then "fine tune" it with your own data:
[http://cs231n.github.io/transfer-learning/](http://cs231n.github.io/transfer-learning/)]
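The fine-tuning idea can be sketched with a toy model: freeze a "pre-trained" feature extractor and retrain only a small head on new data (everything here is made up for illustration; the cs231n notes cover how this is done with real conv nets):

```python
# Toy transfer-learning sketch: the feature layer stands in for a frozen
# pre-trained network; only the head weights `w` are updated.

def features(x):
    """Pretend pre-trained feature extractor -- frozen, never updated."""
    return [x, x * x]

def train_head(data, lr=0.01, steps=2000):
    """Fit head weights on top of the frozen features by plain SGD."""
    w = [0.0, 0.0]
    for _ in range(steps):
        for x, target in data:
            f = features(x)
            pred = w[0] * f[0] + w[1] * f[1]
            err = pred - target
            # Gradient step on the head only; `features` is untouched.
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

# New task: target = x + 2*x^2, learnable exactly from the frozen features.
data = [(x, x + 2 * x * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w = train_head(data)
assert abs(w[0] - 1.0) < 1e-3 and abs(w[1] - 2.0) < 1e-3
```

The point is that training touches only a tiny fraction of the parameters, which is why fine-tuning is feasible on modest hardware while full training is not.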

> Or for things it doesn't recognise, do you need to add the images and
> classification to the training data and create a 'new brain' and download it
> to the phone?

This is generally the approach, yes. It has other advantages though: the
performance can be checked and compared once, then re-used lots of times.

> Is there one super-organism (cloud-based learning) that gives birth to
> millions of mini-minds? Each mini-mind asking its parent to help it with
> things it doesn't understand. In 20 years' time what will this say about
> consciousness? Where would it live? Is this a new way to think about minds,
> those that are distributed in many physical devices?

In many ways, it sounds similar to delegating work to more junior / less
well-trained staff.

------
mcintyre1994
In some of these examples the Hexagon DSP one detects it first but with a low
confidence, and then the CPU detects it later with a higher confidence than
the Hexagon DSP one has yet obtained.

If you were using this for a real purpose, would you only consider an object
identified once it passes some confidence threshold? If so, the CPU one is
surprisingly more performant in some of these examples, despite taking longer
to get to the object at all.
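The thresholding point can be made concrete with a tiny sketch (the timestamps and confidences below are invented for illustration, not taken from the demo):

```python
def time_to_threshold(trace, threshold):
    """First timestamp at which the reported confidence crosses the
    threshold, or None if it never does."""
    for t, conf in trace:
        if conf >= threshold:
            return t
    return None

# Hypothetical traces: the DSP reports earlier but at low confidence,
# while the CPU reports later but jumps straight to high confidence.
dsp = [(0.2, 0.35), (0.5, 0.55), (0.9, 0.80)]
cpu = [(0.6, 0.90)]
assert time_to_threshold(dsp, 0.85) is None
assert time_to_threshold(cpu, 0.85) == 0.6
```

Under a strict threshold, "first to say something" and "first to a usable answer" can be different devices, which is the ambiguity in the demo.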

~~~
IanCal
They appear to be seeing slightly different scenes. I think the phones are
next to each other, and this might explain the difference in what they're
reporting.

I'd be very interested to know if there's any difference in the processing
that should be taken into account however.

------
marclave
This is absolutely crazy... The response time is unbelievable.

------
sliken
What kinds of "AI" are likely to be viable to run on a Snapdragon 835+682?

Recognizing faces? Voice? Handwriting? Captions for photos? Natural-language
queries (like Google's AI assistant)? Positioning by recognizing landmarks?
Simple autonomous driving (say, RC cars)? Flying (quadrotors or RC planes)?
Cars?

Or I guess a better question... will this change anything except decrease your
need for a good network?

~~~
z92
Being able to do all of this work on a cell phone, with no network, should be
a big gain.

~~~
lightedman
Cell phones were already powerful enough for most of this to begin with. Face
recognition? Windows 98 on a 180MHz Evergreen Overclocked processor and 48MB
RAM did that just fine. Voice? Ditto. Handwriting recognition? Palm Pilots
with far less power could do it.

I think nobody's bothered to code that stuff in, because, well, despite trying
time and time again to make these hot-shit features for a couple of decades,
these features end up unused. Those that pay attention to history see this,
and figure "Probably not worth trying, even in this day and age."

~~~
tsomctl
Maybe they are unused because the implementation was poor. And the
implementation was poor because a Pentium 2 processor is shit and deep
learning has only been practical in the last couple of years.

More specifically, I had a Palm Pilot, and you had to write using weird letter
shapes for it to work.

~~~
lightedman
The implementation worked great. You walked up to your computer, turned it on,
and looked at the camera. Bam! You were logged in, assuming you had enough
proper light on your face for the software to make out your facial shape.

It was just bloody annoying because you had to wait about 15 seconds for it to
figure everything out. It was far faster to just use the keyboard.

------
visarga
This is the new trend: dedicated AI coprocessors. Fast and less power-hungry.

~~~
jumasheff
There is speculation [1] that the Snapdragon 835 will be used in the Samsung
Galaxy S8, HTC 11, OnePlus 4, and LG G6.

[1] [http://www.pcadvisor.co.uk/new-product/mobile-phone/snapdrag...](http://www.pcadvisor.co.uk/new-product/mobile-phone/snapdragon-835-specs-features-meet-chip-that-will-power-2017-flagships-3649321/)

~~~
wapz
Wow, if Samsung uses it, that will show how powerful it is. I'm eyeing an OP4
for my next phone, but I hope they don't jump the price again.

~~~
msh
Eh, Samsung normally uses Qualcomm flagship CPUs for their North American
handsets (except for the Snapdragon 810, with its heat issues).

Generally (yes, there are exceptions) Qualcomm produces the best flagship ARM
CPUs outside of Apple.

------
ant6n
I for one am curious how large the image classification neural net is (in
MB). I've come across an image classifier (VGG16) in an ML course that was a
500MB file, although the format may have been very inefficient.

If it's a 100MB file, you'd basically have to ship it with the operating
system.
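For a rough sense of scale, the back-of-the-envelope arithmetic works out (the ~138M parameter count for VGG16 is the commonly cited figure, used here as an assumption):

```python
# VGG16 is commonly cited at roughly 138 million parameters.
params = 138_000_000
fp32_mb = params * 4 / 1e6   # 4 bytes per float32 weight
int8_mb = params * 1 / 1e6   # 1 byte per 8-bit quantized weight

# ~552 MB as raw float32, consistent with the ~500MB file mentioned above;
# 8-bit quantization alone would cut that roughly 4x.
assert round(fp32_mb) == 552
assert round(int8_mb) == 138
```

So the format probably wasn't especially inefficient; VGG16 is just huge, and mobile deployments lean on smaller architectures plus quantization and pruning to get under a shippable size.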

------
nswanberg
Is this available now or just announced? I've searched their site and forums
but can't find anything that's been released, including for the 820, aside
from some lower-level SDKs (comma.ai's openpilot uses these lower-level SDKs
in their closed-source portion).

------
sandGorgon
Can someone explain what Qualcomm built here? Is this CUDA for ARM?

~~~
Symmetry
More like a CUDA-ish backend for a specific application, and for Hexagon
rather than ARM. Hexagon and ARM are both architectures: ARM is a RISC for
application processing, and Hexagon is a VLIW for digital signal processing.

------
nojvek
I wonder how this compares to Apple's GPU on the iPhone 7.

Having Siri do local voice and image recognition would be killer. I hate the
current latency of the AI agents.

------
ferongr
Hopefully the SoC will run with a recent kernel.

