
Movidius launches a $79 deep-learning USB stick - rajeevk
https://techcrunch.com/2017/07/20/movidius-launches-a-79-deep-learning-usb-stick/
======
oelmekki
It took me a while to figure out how it interfaces with the system (a driver?
a dedicated application? just dropping the model and data into a directory on a
mounted USB key?), so I'll post it here.

To access the device, you need to install an SDK that contains Python scripts
for manipulating it (so it seems to be a driver embedded in utility programs).
Source: [https://developer.movidius.com/getting-
started](https://developer.movidius.com/getting-started)
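
For reference, a minimal sketch of what driving it from Python looks like,
pieced together from those getting-started docs (API names are from the NCSDK
as documented, but treat the details as approximate):

    import numpy as np
    from mvnc import mvncapi as mvnc  # ships with the Movidius NCSDK

    # Find and open the first attached Neural Compute Stick.
    devices = mvnc.EnumerateDevices()
    device = mvnc.Device(devices[0])
    device.OpenDevice()

    # Load a pre-compiled graph binary onto the stick.
    with open('graph', 'rb') as f:
        graph = device.AllocateGraph(f.read())

    # Push one fp16 input tensor, read back the inference result.
    graph.LoadTensor(np.zeros((224, 224, 3), dtype=np.float16), None)
    output, _ = graph.GetResult()

    graph.DeallocateGraph()
    device.CloseDevice()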

------
legolassexyman
> Movidius's NCS is powered by their Myriad 2 vision processing unit (VPU),
> and, according to the company, can reach over 100 GFLOPs of performance
> within a nominal 1W of power consumption. Under the hood, the Movidius NCS
> works by translating a standard, trained Caffe-based convolutional neural
> network (CNN) into an embedded neural network that then runs on the VPU.

This is sure to save me money on my power bill after marathon sessions of "Not
Hotdog."

~~~
joshvm
Ignoring the price tag, this is about half the performance per watt of the
Jetson TX2, which manages around 1.5 TFLOPS at 7.5 W (roughly 200 GFLOPS/W
versus the NCS's 100 GFLOPS/W).

Interesting that you could use this to accelerate systems like the Raspberry
Pi. The Jetson is a pain in the backside to deploy (at a production level)
because you need to make your own breakout board, or buy an overpriced
carrier.

EDIT: I use the Pi as an example because it's readily available and cheap.
There are lots of other embedded platforms, but the Pi wins on ecosystem.

~~~
MacsHeadroom
1.5 TFLOPS would have made the TOP500 supercomputer list 12 years ago. That's
amazing.

~~~
Dylan16807
Keep in mind that supercomputers are a lot less specialized than circuits for
running neural nets.

12 years ago you could have gotten a stack of 5-8 7800 GTX cards and had
1.5 TFLOPS of single precision. 11 years ago you could have had a stack of 5
cards with unified shaders. It's not fair to compare against the significantly
more complicated route of getting 100 CPU cores working together with only 1-4
cores per chip.

~~~
amelius
But can't you configure the device to do, e.g., fast matrix-vector
multiplications instead of inference? I could be wrong, but I suspect that's
mostly what people do on supercomputers anyway.
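
In principle the mapping is direct: a bias-free fully-connected layer computes
exactly a matrix-vector product, so anything that runs inference could serve
as a matmul unit. A toy numpy illustration of the equivalence (fp16 is an
assumption about what the stick uses internally):

    import numpy as np

    # A matrix-vector product phrased as a one-layer "network":
    # the forward pass of a bias-free fully-connected layer is y = W @ x.
    W = np.random.rand(128, 256).astype(np.float16)  # layer weights = your matrix
    x = np.random.rand(256).astype(np.float16)       # input tensor = your vector
    y = W @ x                                        # "inference" = the multiply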

------
sillysaurus3
So what can you do with a deep-learning stick of truth?

EDIT: Looks like the explanation is in a linked article:
[https://techcrunch.com/2016/04/28/plug-the-fathom-neural-
com...](https://techcrunch.com/2016/04/28/plug-the-fathom-neural-compute-
stick-into-any-usb-device-to-make-it-smarter/)

 _How the Fathom Neural Compute Stick figures into this is that the
algorithmic computing power of the learning system can be optimized and output
(using the Fathom software framework) into a binary that can run on the Fathom
stick itself. In this way, any device that the Fathom is plugged into can have
instant access to a complete neural network because a version of that network is
running locally on the Fathom and thus the device._

This reminds me of physics co-processors. Anyone remember AGEIA? They were
touting "physics cards" similar to video cards. Had they not been acquired by
Nvidia, they would've been steamrolled by consumer GPUs / CPUs, since they were
essentially designing their own GPU-like hardware anyway.

The $79 price point is attractive. I wonder how much power can be packed into
such a small form factor? It's surprising that a lot of power isn't necessary
for deep learning applications.

~~~
legolassexyman
> The $79 price point is attractive. I wonder how much power can be packed
> into such a small form factor? It's surprising that a lot of power isn't
> necessary for deep learning applications.

It runs pretrained NNs, which is the cheap part. So this is a chip optimized
to perform floating-point multiplication, and that's it.
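
For a sense of scale, a back-of-envelope count of the multiply-accumulates in
a single convolutional layer (layer sizes below are made up for illustration):

    # FLOPs for one conv layer ~= 2 * K*K*Cin * Cout * Hout * Wout
    # (one multiply + one add per weight per output element).
    def conv_flops(k, c_in, c_out, h_out, w_out):
        return 2 * k * k * c_in * c_out * h_out * w_out

    # e.g. a 3x3, 256->256 convolution on a 56x56 feature map:
    print(conv_flops(3, 256, 256, 56, 56) / 1e9)  # ~3.7 GFLOPs for one layer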

~~~
shams93
Yeah, I can run pretrained models on my Pi 3; that's not that exciting. It's
more exciting that secondhand graphics cards are being dumped onto the market.

~~~
shams93
If you could use a set of, say, 3 of these running in parallel on a Pi 3 with
TensorFlow to train models from scratch, then this would be more interesting.

~~~
RBerenguel
No training, only evaluation:
[https://ncsforum.movidius.com/discussion/104/can-i-use-
this-...](https://ncsforum.movidius.com/discussion/104/can-i-use-this-for-
training-of-neural-network-with-tensorflow)

------
nl
It's surprising how much attention this has had over the last few days,
without any discussion of the downside: it's slow.

It's true that it is fast for the power it consumes, but it is way (way!) too
slow to use for any form of training, which seems to be what many people think
they can use it for.

According to Anandtech[1], it will do 10 GoogLeNet inferences per second. By
_very_ rough comparison, Inception in TensorFlow on a Raspberry Pi does about
2 inferences per second[2], and I think I saw AlexNet on an i7 doing about
60/second. Any desktop GPU will do orders of magnitude more.

[1] [http://www.anandtech.com/show/11649/intel-launches-
movidius-...](http://www.anandtech.com/show/11649/intel-launches-movidius-
neural-compute-stick)

[2] [https://github.com/samjabrahams/tensorflow-on-raspberry-
pi/t...](https://github.com/samjabrahams/tensorflow-on-raspberry-
pi/tree/master/benchmarks/inceptionv3) ("Running the TensorFlow benchmark tool
shows sub-second (~500-600ms) average run times for the Raspberry Pi")
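
Putting those figures side by side (keeping in mind GoogLeNet and Inception v3
are different networks, so this is only directional):

    ncs_rate = 10              # GoogLeNet inferences/sec on the NCS [1]
    pi_latency = 0.55          # ~550 ms per Inception v3 inference on a Pi [2]
    pi_rate = 1 / pi_latency   # ~1.8 inferences/sec
    print(ncs_rate / pi_rate)  # ~5.5x a Raspberry Pi, far below a desktop GPU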

~~~
olegkikin
But all those other solutions will consume orders of magnitude more power,
especially the GPU. It's actually impressive what can be achieved on 1W of
power.

~~~
corysama
Yep. I think the niche here is battery-powered AI. Train on the desktop,
deploy to the field on a USB stick.

------
visarga
Interesting applications for drones and robots. The small form factor and low
energy requirements are the key.

------
tuxracer
Really disappointing there doesn't appear to be a USB-C option

~~~
skrebbel
Or a blue bike shed option, for that matter.

~~~
make3
I know it may be surprising, but bandwidth is a really important factor for
speed in deep learning, and USB-C would help with that.

~~~
anoother
USB-C is a connector, and has no effect on speed.

I think you're referring to USB 3.1 gen2, which would double the theoretical
bandwidth to 10Gbps.
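
For scale, a rough estimate of the link bandwidth the stick actually needs at
its quoted throughput (the input shape is an assumption):

    # Assuming 224x224x3 fp16 inputs at the ~10 inferences/sec quoted elsewhere:
    bytes_per_input = 224 * 224 * 3 * 2       # fp16 = 2 bytes/element, ~0.3 MB
    mb_per_sec = bytes_per_input * 10 / 1e6   # ~3 MB/s
    # Well within even USB 2.0's practical ~35 MB/s, so at this inference rate
    # the connector/spec generation is unlikely to be the bottleneck.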

~~~
Dylan16807
If so I'm amazed, because I have never seen someone conflate those two before.
I've seen plenty of people conflate USB-C and USB 3 in general, but not
specifically thinking USB-C implies the 10gbps mode.

------
j_s
Currently out of stock as best I can tell.

