Hacker News
Movidius launches a $79 deep-learning USB stick (techcrunch.com)
118 points by rajeevk on July 22, 2017 | 39 comments

It took me a while to find out how it interfaces with the system (a driver? a dedicated application? just drop the model and data into a directory that appears when the stick is mounted?), so I'll post it here.

To access the device, you need to install an SDK that contains Python scripts for manipulating it (so it seems to be a driver embedded in utility programs). Source: https://developer.movidius.com/getting-started
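For anyone curious what that looks like in practice, here's a rough sketch of the workflow based on the NCSDK's Python module (`mvnc`). Treat the call names and the `graph_path` argument as illustrative, not gospel:

```python
# Sketch of the NCS workflow via the Movidius NCSDK Python API ("mvnc").
# Call names follow the NCSDK v1 docs; treat them as illustrative.

def top_k(probs, k=2):
    """Pure-Python helper: indices of the k largest class probabilities."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def classify(graph_path, input_tensor):
    from mvnc import mvncapi as mvnc   # ships with the Movidius SDK
    devices = mvnc.EnumerateDevices()
    if not devices:
        raise RuntimeError("No Neural Compute Stick found")
    device = mvnc.Device(devices[0])
    device.OpenDevice()
    with open(graph_path, "rb") as f:          # compiled Caffe model blob
        graph = device.AllocateGraph(f.read())
    graph.LoadTensor(input_tensor, None)       # push the input to the VPU
    output, _ = graph.GetResult()              # blocks until inference is done
    graph.DeallocateGraph()
    device.CloseDevice()
    return top_k(output)
```

So "driver embedded in utility programs" is about right: the SDK compiles the Caffe model to a binary graph offline, and the Python API just shuttles tensors over USB.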

> Movidius's NCS is powered by their Myriad 2 vision processing unit (VPU), and, according to the company, can reach over 100 GFLOPs of performance within a nominal 1W of power consumption. Under the hood, the Movidius NCS works by translating a standard, trained Caffe-based convolutional neural network (CNN) into an embedded neural network that then runs on the VPU.

This is sure to save me money on my power bill after marathon sessions of "Not Hotdog."

Ignoring the price tag, this is about half the performance of the Jetson TX2, which can manage around 1.5 TFLOPS at 7.5W.
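A quick sanity check on the per-watt numbers quoted above (trivial arithmetic, nothing more):

```python
# Back-of-the-envelope efficiency comparison from the quoted specs.
ncs_gflops, ncs_watts = 100, 1.0     # Movidius NCS: ~100 GFLOPS at ~1W
tx2_gflops, tx2_watts = 1500, 7.5    # Jetson TX2: ~1.5 TFLOPS at 7.5W

ncs_eff = ncs_gflops / ncs_watts     # 100 GFLOPS per watt
tx2_eff = tx2_gflops / tx2_watts     # 200 GFLOPS per watt
print(ncs_eff, tx2_eff)
```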

Interesting that you could use this to accelerate systems like the Raspberry Pi. The Jetson is a pain in the backside to deploy (at a production level) because you need to make your own breakout board, or buy an overpriced carrier.

EDIT: I use the Pi as an example because it's readily available and cheap. There are lots of other embedded platforms, but the Pi wins on ecosystem.

1.5TFLOPS would have made the supercomputer top500 12 years ago. That's amazing.

Keep in mind that supercomputers are a lot less specialized than circuits for running neural nets.

12 years ago you could have gotten a stack of 5-8 7800 GTX cards and had 1.5TFLOPS of single precision. 11 years ago you could have had a stack of 5 cards with unified shaders. It's not fair to compare against the significantly more complicated route of getting 100 CPU cores working together with only 1-4 per chip.

But can't you configure the device to do, e.g., fast matrix-vector multiplications instead of inference? I could be wrong, but I suspect that's mostly what people do on supercomputers anyway.

That 1.5 TFLOPs for TX2 is FP16, while TOP500 is FP64.
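That difference isn't cosmetic: FP16 carries an 11-bit significand versus FP64's 53 bits, so FP16 can't even represent every integer above 2048. A quick illustration (assuming NumPy is available):

```python
import numpy as np

# FP16 has an 11-bit significand: above 2048 the spacing between
# representable values is 2, so adding 1 rounds away entirely.
assert np.float16(2048) + np.float16(1) == np.float16(2048)

# FP64 (the precision TOP500 Linpack is measured in) has 53 bits
# of significand and handles this without breaking a sweat.
assert np.float64(2048) + np.float64(1) == np.float64(2049)
```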

But you can do training on a Jetson, whereas the stick is inference-only for pre-trained networks.

You can't really do any reasonable training on a Jetson.

Thanks, worth knowing (I was thinking of getting one in a few months).

So what can you do with a deep-learning stick of truth?

EDIT: Looks like the explanation is in a linked article: https://techcrunch.com/2016/04/28/plug-the-fathom-neural-com...

How the Fathom Neural Compute Stick figures into this is that the algorithmic computing power of the learning system can be optimized and output (using the Fathom software framework) into a binary that can run on the Fathom stick itself. In this way, any device that the Fathom is plugged into can have instant access to a complete neural network, because a version of that network is running locally on the Fathom and thus the device.

This reminds me of physics co-processors. Anyone remember AGEIA? They were touting "physics cards" similar to video cards. Had they not been acquired by Nvidia, they would've been steamrolled by consumer GPUs/CPUs, since they were essentially designing their own silicon.

The $79 price point is attractive. I wonder how much power can be packed into such a small form factor? It's surprising that a lot of power isn't necessary for deep learning applications.

> The $79 price point is attractive. I wonder how much power can be packed into such a small form factor? It's surprising that a lot of power isn't necessary for deep learning applications.

It runs pretrained NNs, which is the cheap part. So this is a chip optimized to perform floating-point multiplication, and that's it.
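"Runs pretrained NNs" really does boil down to multiply-accumulate: a dense layer at inference time is just a matrix-vector product plus a fixed nonlinearity. A toy sketch with made-up weights (nothing here is learned):

```python
# Inference through one dense layer: the weights are fixed (pretrained),
# so all the chip has to do is multiply-accumulate and apply ReLU.
def dense_relu(weights, bias, x):
    out = []
    for row, b in zip(weights, bias):
        acc = b + sum(w * xi for w, xi in zip(row, x))
        out.append(max(0.0, acc))   # ReLU nonlinearity
    return out

# Made-up 2x3 "pretrained" weight matrix and bias vector.
W = [[0.5, -1.0, 2.0],
     [1.0,  0.0, -0.5]]
b = [0.1, -0.2]
print(dense_relu(W, b, [1.0, 2.0, 3.0]))   # approximately [4.6, 0.0]
```

Stack a few hundred of these (mostly convolutions rather than dense layers) and that's a CNN forward pass; no gradients, no weight updates, just arithmetic.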

Yeah, I can run pretrained models on my Pi 3; that's not that exciting. It's more exciting that second-hand graphics cards are being dumped onto the market.

If you could use a set of, say, three of these running in parallel on a Pi 3 with TensorFlow to train models from scratch, then this would be more interesting.

I wonder how it stacks up against the snapdragon 410e. You can buy one on a dragonboard for roughly the same price ~$80 [1]. The dragonboard has four ARM cores, a GPU, plus a DSP. You could run OpenCV/FastCV on any or all three.

[1] https://www.arrow.com/en/products/dragonboard410c/arrow-deve...

Why not both? Plug in the deep-learning USB stick. Use the Snapdragon to do ETL and/or download models to the Movidius so that it can perform inference. I am waiting for the day that we can do some nontrivial training on mobile hardware.

It's surprising how much attention this has had over the last few days, without any discussion of the downside: it's slow.

It's true that it is fast for the power it consumes, but it is way (way!) too slow to use for any form of training, which seems to be what many people think they can use it for.

According to Anandtech[1], it will do 10 GoogLeNet inferences per second. By very rough comparison, Inception in TensorFlow on a Raspberry Pi does about 2 inferences per second[2], and I think I saw AlexNet on an i7 doing about 60/second. Any desktop GPU will do orders of magnitude more.
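The Pi figure follows directly from the benchmark latency in [2]: throughput is just the reciprocal of the average run time. Roughly:

```python
# Convert a benchmark latency (ms per inference) into inferences per second.
def throughput(latency_ms):
    return 1000.0 / latency_ms

pi_inceptions = throughput(550)   # ~550ms per run on a Pi -> ~1.8/s
ncs_googlenet = 10                # Anandtech's figure for the NCS
print(round(pi_inceptions, 1), ncs_googlenet)
```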

[1] http://www.anandtech.com/show/11649/intel-launches-movidius-...

[2] https://github.com/samjabrahams/tensorflow-on-raspberry-pi/t... ("Running the TensorFlow benchmark tool shows sub-second (~500-600ms) average run times for the Raspberry Pi")

But all those other solutions will consume orders of magnitude more power, especially the GPU. It's actually impressive what can be achieved on 1W of power.

Yep. I think the niche here is battery-powered AI. Train on the desktop, deploy to the field on a USB stick.

The Raspberry Pi 3 uses around 4W, so that is a lot less than an order of magnitude more. You need a host machine to use this, too.
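Put in energy terms, though, using the figures quoted above (~10 inferences/s at ~1W for the NCS, ~2 inferences/s at ~4W for the Pi 3), the per-joule gap is much bigger than the raw power gap:

```python
# Inferences per joule = (inferences per second) / watts.
ncs_per_joule = 10 / 1.0   # ~10 inferences per joule
pi_per_joule = 2 / 4.0     # ~0.5 inferences per joule
print(ncs_per_joule / pi_per_joule)   # NCS does ~20x more work per joule
```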

Yes the low power is great, though.

Interesting applications for drones and robots. The small form factor and low energy requirements are the key.

Really disappointing there doesn't appear to be a USB-C option

Or a blue bike shed option, for that matter.

I know it may be surprising, but bandwidth is a really important factor for speed in deep learning, and USB-C would help with that.

USB-C is a connector, and has no effect on speed.

I think you're referring to USB 3.1 gen2, which would double the theoretical bandwidth to 10Gbps.
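For a sense of scale (rough numbers, ignoring protocol overhead, and assuming a 224x224 RGB FP16 input tensor like GoogLeNet's): even USB 2.0 moves a single input in a few milliseconds, so the link speed mostly matters at high frame rates or with larger tensors.

```python
# Optimistic lower bound on the time to move one 224x224x3 FP16
# input tensor over each link (protocol overhead ignored).
tensor_bits = 224 * 224 * 3 * 16   # ~2.4 Mbit

for name, gbps in [("USB 2.0", 0.48), ("USB 3.0", 5.0), ("USB 3.1 gen2", 10.0)]:
    ms = tensor_bits / (gbps * 1e9) * 1000
    print(f"{name}: {ms:.2f} ms")
```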

If so, I'm amazed, because I have never seen someone conflate those two before. I've seen plenty of people conflate USB-C and USB 3 in general, but not specifically thinking USB-C implies the 10Gbps mode.

My assumption is that they would never build more capacity into a device whose only interface was USB 2.0 than USB 2.0 can actually handle.

I thought the stick was USB 3.0?

O.K. yes.

I went back and looked, and yes, it does support USB 3.0. Actually, given that the chip itself apparently also supports GigE, it's a shame there isn't an option with that brought out.

As make3 said, bandwidth can be important, but it's also increasingly the case that people have computers that only come with USB-C. Obviously not a dealbreaker, since people can just buy an adapter, but it's an issue worth bringing up.

Really? I'm not a fan of USB-C being yet another connector that's easy to snap off. I break a USB cable every few days, and sometimes a socket just gets torqued off a PCB. The other day I had a phone on the edge of my table and I bumped the cable from the top with my elbow. Snap! The big-old A connectors are much, much harder to break, especially if the case has a solid metal rectangle that holds the USB connector in place.

I still wish USB connectors had some kind of rubber padding around them, like the computer IEC power connectors -- no matter what you do, it's virtually impossible to break the cable or socket.

Even DB9 connectors were better. I never broke one in the many years I used them. Rock solid and you could even screw them in.

> I'm not a fan of USB-C being yet another connector that's easy to snap off.

I initially had the same concern, but after using USB-C heavily for over a year now, I've not had one instance of a connector failing.

> I break a USB cable every few days

Either you're doing it wrong (and are really careless / buy really cheap cables), or you're doing some highly specialized thing, in which case the feedback should be caveated: "I'm in xyz field, which means I break far more USB cables than most ever will."

"doing it wrong"

I never had this problem before USB. I also break a lot of USB cables in the field, e.g. hiking, and that never used to be a problem with barrel connectors. USB connectors just are not designed for people who don't sit in an office all day.

The "connector" has nothing to do with it compared to older cables imho. The wires in something like a DB9/PS2 cable were heavier grade, but the durability of the connector wasn't measurable better (at least not with a good USB-A cable). If you're continually breaking them, I recommend upgrading to braided cables. I abuse the heck out of them and still haven't had one fail that I can remember.

It's mostly the connector that breaks, not the cable itself. I've tried all that reinforced stuff to no avail. One broke yesterday when I put my phone in my pocket while charging, and sat down on a chair. Seems like a pretty common use case to me.

Did you use to sit on DB9 connectors that were plugged in and in your pocket? That might be why you see more breakage now.

A USB-C to Type-A adapter should be fine in the early stages. I'm sure USB-C will be on its way soon. Even with the backing of Intel, they still need to factor in development time and support for different hardware, etc. They would gain little from supporting USB-C at this stage.

Currently out of stock as best I can tell.
