Vision Kit – An image recognition device that can see and identify objects (withgoogle.com)
258 points by modeless on Nov 30, 2017 | 83 comments

I've been ruminating over how to use TensorFlow to recognize squirrels on my birdfeeder on a Raspberry Pi, and hook up something to activate a squirt gun. Now the squirrels don't stand a chance.

Makes me wonder what the military is doing with these technologies. And a bit afraid to think about their versions of “squirrel” and “squirt gun”.

Why people think that an AI with a gun is scarier than a soldier with a gun, I will never understand.

Remember the video where a helicopter shot reporters because their cameras looked like an RPG? A soldier with a gun, facing an uncertain situation where he or his friends risk death, will lean heavily toward the "caution" side, shooting at everything that moves. An AI with a gun, however, can accept its own death as an acceptable outcome in a similar situation.

> Why people think that an AI with a gun is scarier than a soldier with a gun, I will never understand.

In two words: scale and miniaturisation. A rifleman has inherent limitations -- he cannot move unaided beyond his walking speed, cannot be made to weigh on the order of a kilogramme, and he must sleep and eat and shit. He cannot lie in wait indefinitely, and he cannot fly either. He has, in the godawful vernacular of the defence-contracting industry, SWaP (size, weight, and power) issues. His face is as vulnerable to bullets as yours or mine, and a 12.7mm (.50 BMG) round will walk through his body armour anyway. He is human, and the harm that men with guns can do is thus limited.

Stuart Russell uses the example of micro-UAVs with AI-based targeting software and each armed with a single-use shaped charge (for anti-personnel use or breaching doors) -- 10^6 of them will devastate a city, with extremely little human/logistical support needed. A million riflemen could do a bunch of killing, but they will be slower, easier to stop, easier to detect, and will require a lot more support and infrastructure to remain effective.

What do we call weapons that allow very few men to kill millions without placing themselves in any hazard, again? Russell (rightfully, in my judgement) classes this sort of use of AI as "scalable WMDs". Lethal autonomous weapons shouldn't be compared to a "soldier with a gun"; appropriate comparisons are more along the lines of "flying landmines with face recognition".

A million riflemen are unstoppable by Neanderthals with sticks; a million micro-UAVs are unstoppable by areas that aren't ready for them. The technology to detect UAVs is the same one we're talking about, though, so we are definitely not Neanderthals.

If a human being does the shooting there is a human element involved: someone to surrender to, a possibility for mercy, a chance at accountability in court, someone to write a book about what happened 20 years later.

All of those things are important, but none of them are a priority for the people who have the "AI with a gun" programmed.

> none of them are a priority for the people who have the "AI with a gun" programmed

Aren't they, really? Why do you think so? Soldiers have exactly the same incentives as the designers and engineers of those devices: accepting an enemy's surrender can be a rational tactical choice (so that more of them surrender instead of fighting to the end), they are just as accountable in the eyes of the law (which may or may not be important to them, exactly as with ordinary soldiers), etc.

The only difference is that an AI will make choices rationally and be less influenced by the emotions of the battlefield. Do you really think the net result of an average soldier's emotions brings him closer to "merciful"? As far as I can tell, it's the opposite: the most powerful emotion on the battlefield is usually fear, and it doesn't make people merciful at all.

Sure both can happen. I think the real fear is an army of AI bots gone wrong. I don't think we've had an army of humans turn on each other.

Humans will always make mistakes, and not that two humans can't make the same mistake, but each mistake is individual. A bunch of bots stamped with the same code using the same hardware will be capable of making the same mistake for each bot due to the same bug.

> A bunch of bots stamped with the same code using the same hardware will be capable of making the same mistake for each bot due to the same bug.

It's only true for the current, logic-style programming. I don't think it will hold for neural network-based decision systems.

> I don't think we've had an army of humans turn on each other.

It happened only last week in Zimbabwe. An army of humans designed to keep the ruling powers safe turned on them and took control, and it's hardly the first time.

You clearly need to enable cloud auto-updates, duh.


And the problems of adversarial images - https://www.theverge.com/2017/11/2/16597276/google-ai-image-...

There are already military versions of what you fear in production today that track motion.

I found it! Samsung SGR-A1


Skip to 0:40 for human tracking demo

Should see the autonomous boats that work together as a swarm. Stuff is super cool/scary!

You can use my pre-trained model: https://github.com/secretbatcave/Uk-Bird-Classifier Whilst the majority of the objects it can classify are birds, it does know what cats and squirrels are.

It looks promising, but seems to sneakily require that you have already soldered a 40-pin header onto your RPi Zero W (I have gotten lots of practice doing it by now, but that doesn't mean I really love to...), which doesn't seem to be included in the parts list.

My Adabox 5 came with a Pi Zero and the Pimoroni Hammer Headers [1]. Not cheap, but it does get one out of soldering a lot of pins.

[1] https://shop.pimoroni.com/products/gpio-hammer-header

Nice, thank you, this is a US option: https://www.adafruit.com/product/3413 I've never soldered, but I can use a hammer!

First thing I noticed too. Quite sneaky how they missed that. Speaking of headers, really want to use this one next: https://www.adafruit.com/product/2823.

So you are connected to the raspberry pi wirelessly and it sends the identified object(s) data through some form like HTTP or something? Hmm

If it can differentiate between strangers and yourself/friends/family that would be (additionally) interesting.

Hi, I'm the co-founder of https://snips.ai, we are building a 100% on-device Voice AI platform which runs on Raspberry Pi, if you are looking to add voice interaction to your cool image recognition, you can use it for free!

We plan to open-source it over time

Looks amazing; I am super interested in learning the technologies required to build something like that. Where would you suggest I start?

This is something I've been looking into working on.

My focus is on capturing the health of plants and extracting meaning using a Deep Learning Video Camera.

Does anyone have any information on where to start? Such as whether there's a database of different plant diseases that I can feed into the ML engine?

NDVI is a direction a lot of drone people are going, and probably a good place to start to get a sense of techniques and work being done in plant monitoring:



More ideas: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4600171/

Thank you :)

There are some people doing something similar with drones. Just saw a presentation on this a few weeks back. https://support.dronedeploy.com/v1.0/docs/ndvi-algorithms. Measures the difference between blue and near infrared to infer things about plant health, although I don't believe it indicates for specific diseases. Good luck with your work!
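For anyone who wants to try the index itself, here is a minimal sketch. Standard NDVI is (NIR - Red) / (NIR + Red); as far as I know, some modified-camera setups substitute the blue band for red, which may be what the linked page describes. The helper name `ndvi` and the 2x2 band values are made up for illustration, not real measurements.

```python
import numpy as np

def ndvi(nir, visible):
    """Normalized difference index: (NIR - visible) / (NIR + visible).

    `visible` is the red band for standard NDVI; pass a blue band
    for the modified-camera variants. Pixels where both bands are
    zero are returned as 0 instead of dividing by zero.
    """
    nir = np.asarray(nir, dtype=np.float64)
    visible = np.asarray(visible, dtype=np.float64)
    denom = nir + visible
    out = np.zeros_like(denom)
    np.divide(nir - visible, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectance values (assumption, for illustration only).
nir_band = np.array([[0.8, 0.6], [0.3, 0.0]])
red_band = np.array([[0.1, 0.2], [0.3, 0.0]])

index = ndvi(nir_band, red_band)
print(index)
```

Values near 1 suggest dense, healthy vegetation and values near 0 or below suggest bare soil, water, or stressed plants. It says nothing about which disease is involved, so it is only a first filter.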

Thank you :)

What do you mean by "extracting meaning"?

Notifying you that your plant needs water, has mites, needs more nutrients, not enough light, etc.

Lots more technical info available here: https://blog.hackster.io/announcing-the-aiy-projects-vision-...

One has to be skeptical about the motivation behind the timing of this release, especially coming so soon after AWS released DeepLens yesterday at re:Invent 2017.


Actually, there's an AI/ML-specific Google conference coming up next week or so.


If you want to play with something similar, but at a different level of abstraction (software, interfacing with the Google Cloud Vision API in a way that's accessible to non-technical folks -- e.g., teachers are using this pattern to create "scavenger hunts" in their classrooms [0]), check out how it's being done in Metaverse[1].

[0] - https://twitter.com/search?src=typd&q=metaverse%20app

[1] - https://www.youtube.com/watch?v=zWuZVM46qa0

I see this uses a VisionBonnet board based on a Movidius chip... I wonder if such an accelerator is really needed. Could someone elaborate on the resources needed for applying (as opposed to training) a typical deep neural network? I am only familiar with "old school" neural nets, which are relatively cheap to evaluate. What is the bottleneck: memory or the parallel computation? I would think that you could use GPU shaders to some extent, but the mixing after each layer and the number of weights you have to save would make it hard on an RPi, right?

Neural Net accuracy scales with the amount of compute you have available at inference time, and for vision models it is largely the amount of compute you need (NLP models often require a lot of memory too), so while you could probably squeeze something onto a raspberry pi, you would probably have to sacrifice frames per second and overall performance.

I think for a hobby project bundling an accelerator is the right choice so that hobbyists don't have to worry so much about performance.

The same way that a raspberry pi really is overkill for almost everything people do with them since you could do the same with a microcontroller with no OS, someone could probably squeeze something onto a raspberry pi without the accelerator, but that's going to be far harder than just getting started with the high level APIs.
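To put rough numbers on the "memory or compute" question above, here is a back-of-the-envelope sketch comparing a fully connected layer with a convolutional one. The layer sizes are assumptions picked for illustration (a 3x3 convolution producing 32 channels at 112x112 from an RGB input, and a 1024-to-1000 dense layer); `dense_cost` and `conv2d_cost` are hypothetical helper names, and bias terms are ignored.

```python
def dense_cost(n_in, n_out):
    """Weights and multiply-accumulates (MACs) for a fully connected layer."""
    weights = n_in * n_out
    # Each weight is used exactly once per input vector.
    return weights, weights

def conv2d_cost(h_out, w_out, c_in, c_out, k):
    """Weights and MACs for a k x k convolution over an h_out x w_out output."""
    weights = k * k * c_in * c_out
    # The same weights are reused at every output position.
    macs = weights * h_out * w_out
    return weights, macs

dense_weights, dense_macs = dense_cost(1024, 1000)
conv_weights, conv_macs = conv2d_cost(h_out=112, w_out=112, c_in=3, c_out=32, k=3)

print("dense:", dense_weights, "weights,", dense_macs, "MACs")
print("conv: ", conv_weights, "weights,", conv_macs, "MACs")
```

The dense layer does one MAC per stored weight, while the convolution reuses each weight at every output position, so it stores very few weights but does millions of MACs. That is one way to see why vision models tend to be limited by compute rather than by weight storage.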

Am I the only one who finds it difficult to figure out what this device actually does?

What are the inputs, and outputs?

How do you train it?

I only see a hardware assembly guide, but nothing on the software.

EDIT: found more information here: https://developers.googleblog.com/2017/11/introducing-aiy-vi...

No you aren't the only one.

SD image - coming soon

Android app - coming soon

SDK - no links or search hits

It's an unfinished project that's been rushed to the press with little documentation.

A bit more info from another site[1]:

"It’s called the AIY Vision Kit, and it’s up for pre-order from Micro Center for $45, with an expected ship date of December 31st."

[1] https://liliputing.com/2017/11/google-introduces-45-aiy-visi...

Is this a response/competitor against AWS DeepLens?

That is a lot of steps to set up a little camera without giving you the core software that powers the whole shebang. If you don't know TensorFlow or how to set it up, that seems to be the most important part. Unless, of course, someone knows of a good starting place for setting it up yourself.

Crazy how different the approach here is from AWS DeepLens!

Karl Marx vs Adam Smith

The communist vs. the guy that warned against the political power of the business-owning class?

Not sure that's the contrast you are looking for...

Communist vs capitalist

Any word on what the daughterboard does? I presume it's a TPU? Will it work on a full-size RPi? Given that Pi Zeros are still really hard to get hold of (especially outside the US)...

If you look closely at the screenshots, it comes with a Movidius Chip on the VisionBonnet board.

I'm guessing this VisionBonnet accessory is simply another spin on the Intel "Movidius Neural Compute Stick" with the Movidius Chip wired directly to the CSI Camera port and the GPIOs on the Zero used to talk with it. So you probably develop on it using the same Movidius Toolchain you use for the Neural Stick: http://developer.movidius.com

Their SDK recently had a major release with TensorFlow support included, which I bet drives this. (Even with TensorFlow Lite optimizations, the RPi Zero is probably just too weak to drive inferences for any non-toy model.)

For a while I’ve wanted to build a hat that automatically takes a picture of every dog that I see. This looks like it could be a great start for that project.

Can we source the parts for one of these now? The pre-order page states they will be available on Dec 31st. Is there an equivalent?

The one new looking item is the VisionBonnet (with a low power Intel Movidius chip [1]). I've been pounding away on a low power / low cost NN vision device and now in the last two days we got Amazon DeepLens and this Google AIY Vision kit. Exciting and frustrating at the same time.

[1] https://www.movidius.com/solutions/vision-processing-unit

I'll go with exciting. I'm looking at doing some computer vision (at least background segmentation for motion detection) as part of a security camera NVR project. I was eyeing the Hexagon DSP 680 included in the newest Qualcomm SoCs but couldn't find a cheap SBC that included it. At first glance, the VisionBonnet seems to do similar things as part of a $45 kit. As a bonus, they say it's supported by TensorFlow. That will be helpful if I ever actually get into machine learning...

On second glance, I think it's a pretty different device than the Hexagon DSP 680 I mentioned.

For one thing, this board in particular apparently has a direct connection to the camera. I'm not sure if you can do anything but live video from the directly-connected camera (in my case, I want prerecorded video / video from IP cameras). Maybe it can but it's not immediately obvious anyway.

The $75 "Movidius Neural Compute Stick" uses the same chip and does everything via USB so that's more promising. But it's a binary-only API that's totally focused on neural networks (and only available for Ubuntu/x86_64 and Raspbian/arm7). In contrast, I believe you can easily send Hexagon arbitrary code. Its assembly format is documented and upstream llvm appears to support it. So if I want to do background subtraction via more old-school approaches, the Hexagon is probably useful where the Movidius stuff is not. And I have yet to learn anything about neural networks so that's a significant factor for me at least.

Really neat hardware but I wish it were more open.
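For reference, the old-school background subtraction mentioned above can be as small as a running-average model. This is a minimal NumPy sketch, not anything tied to the Hexagon or Movidius toolchains; the class name, `alpha`, and `threshold` are assumptions chosen for illustration, and frames are assumed to be grayscale arrays.

```python
import numpy as np

class RunningAverageBackground:
    """Flags pixels that differ from a slowly updated background estimate."""

    def __init__(self, alpha=0.05, threshold=25.0):
        self.alpha = alpha          # how quickly the background adapts
        self.threshold = threshold  # minimum absolute difference to count as foreground
        self.background = None

    def apply(self, frame):
        frame = np.asarray(frame, dtype=np.float64)
        if self.background is None:
            # The first frame seeds the model, so nothing is foreground yet.
            self.background = frame.copy()
            return np.zeros(frame.shape, dtype=bool)
        mask = np.abs(frame - self.background) > self.threshold
        # Blend the new frame into the background estimate.
        self.background = (1.0 - self.alpha) * self.background + self.alpha * frame
        return mask

# Tiny synthetic example: an empty scene, then one bright pixel appears.
subtractor = RunningAverageBackground()
empty = np.zeros((4, 4))
moved = empty.copy()
moved[1, 1] = 200.0

first_mask = subtractor.apply(empty)
second_mask = subtractor.apply(moved)
print(second_mask.astype(int))
```

It runs fine on a plain CPU, but anything that changes pixel values gets flagged, which is where a learned classifier on top of the mask could help.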

If I was going to do some embedded image processing I would choose a Tegra. You can get a Shield TV for not too much money, and although I haven't done it myself it looks pretty hackable with both Android and Ubuntu (and if you don't want to hack it you can just buy the devkit). CUDA is a decent toolkit and of course NVIDIA's support for neural networks is by far better than anyone else's.

> All you need is a Raspberry Pi Zero W, a Raspberry Pi Camera 2, and a blank SD card.

Looks like Google's own Google Clips -- https://store.google.com/us/product/google_clips?hl=en-US

That the free Android app is "coming soon" does seem to suggest that this was rushed out as a response to Amazon's DeepLens. That said, I'd say it's a pretty good response.

How is this better than using a typical smartphone?

It can be left in a specific location, for one thing.

Security and game cameras are a massively-unsolved problem, for instance. I'd like to capture footage of bears, coyotes, and other wildlife as it travels through my back yard, not to mention keeping an eye out for larger bipedal visitors. But it's almost impossible to convince the naive motion detection algorithms in my surveillance cameras not to respond to trees swaying back and forth in the wind, or to the resulting rapid movement of patches of dappled sunlight. Or to spiders crawling back and forth in front of the lens, building a web. Or to moths that seem to be attracted to the IR illuminator at dusk. Or to any number of other things that any human would instantly recognize as a false alert, but that are very difficult for software to reject without frequent mistakes in sensitivity, specificity or both.

It's hard to believe that anyone with an outdoor security camera hasn't had to deal with similar hassles. I'm sure there are other applications for a camera like this, but if I were an investor, I'd be very interested in the intersection of ML and security in general. I'm definitely interested as a homeowner.

Same here. I like to set up trail cameras in the mountains to watch animals and it's really hard to find a spot where no piece of grass or some leaf in a spider web triggers the camera all the time when it's windy. I would also like the camera to stay on the whole time the animal is in view, not only a fixed interval like 10 secs.
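The "stay on while the animal is in view" part could be sketched as a small state machine driven by a per-frame detection score. This assumes some classifier already produces a confidence score for each frame; `RecordingController`, `min_score`, and `grace_frames` are hypothetical names, and the scores below are invented.

```python
class RecordingController:
    """Keeps recording while detections continue, plus a short grace period."""

    def __init__(self, min_score=0.6, grace_frames=30):
        self.min_score = min_score
        self.grace_frames = grace_frames
        self.frames_since_seen = None  # None means "not recording"

    def update(self, score):
        """Feed one frame's detection score; return True if recording should be on."""
        if score >= self.min_score:
            self.frames_since_seen = 0
        elif self.frames_since_seen is not None:
            self.frames_since_seen += 1
            if self.frames_since_seen > self.grace_frames:
                self.frames_since_seen = None
        return self.frames_since_seen is not None

# Invented scores: nothing, an animal appears, then it leaves.
controller = RecordingController(min_score=0.6, grace_frames=2)
scores = [0.1, 0.9, 0.2, 0.2, 0.2, 0.1]
states = [controller.update(s) for s in scores]
print(states)
```

The grace period keeps the clip from being chopped up when the detector misses the animal for a frame or two, and a wind-blown leaf never starts a recording unless the classifier scores it above the threshold.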

OTOH a 2-year-old smartphone has a decent camera, CPU, and an inbuilt battery. Powering the things is probably the trickiest part for remote surveillance (even home surveillance if you don't want to rewire your house); a battery helps with that.

Compared to smartphones, outdoor cameras need to be built to very different specifications and standards. I don't think my iPhone would last very long duct-taped to the side of my house.

Of course, neither would a camera that's made out of cardboard and runs on a Raspberry Pi. But for prototyping, this seems like it could offer a good start.

This is a reasonable question with a real answer that has not yet been mentioned. The Movidius accelerator chip in this kit does 150 GFLOPS at around 1 watt of power consumption [1] making it much faster (for this specific application) than a budget phone, or even a flagship most likely, with lower power consumption.

[1] http://www.tomshardware.com/news/movidius-fathom-neural-comp...
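As a rough sanity check on what 150 GFLOPS buys, here is a sketch that turns a peak throughput figure into a ceiling on frames per second. The per-frame cost is an assumption: about 569 million multiply-adds, the figure commonly cited for MobileNet v1 at 224x224. The 25% utilization is an arbitrary illustrative value, not a measurement of this chip, and the function name is hypothetical.

```python
def max_frames_per_second(peak_flops, macs_per_frame, utilization=1.0):
    """Upper bound on inference rate from raw arithmetic throughput alone."""
    flops_per_frame = 2 * macs_per_frame  # one multiply plus one add per MAC
    return peak_flops * utilization / flops_per_frame

PEAK_FLOPS = 150e9      # figure quoted in the comment above
MACS_PER_FRAME = 569e6  # assumed model cost, see the note above

upper_bound = max_frames_per_second(PEAK_FLOPS, MACS_PER_FRAME)
derated = max_frames_per_second(PEAK_FLOPS, MACS_PER_FRAME, utilization=0.25)

print(f"theoretical ceiling: {upper_bound:.1f} fps")
print(f"at 25% utilization:  {derated:.1f} fps")
```

Real throughput will be lower because of memory bandwidth, data transfer from the camera, and layers that don't map well onto the hardware, so treat the result as a ceiling rather than a prediction.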

It costs $45 (+ Pi Zero W + Camera + SD card = approx. $90 total) and could be left somewhere analyzing video frames and communicating results.

There are quite a few budget smartphones out there for under $90, without even touching the used market. Could the software run on an Android device?

Like what?

Redmi 4A. There are some very interesting low end phones coming out of China.

Do you know of any ML/Deep learning apps for Android that I can apply as I would with this? I want to run this on my fire engine in traffic.

You probably wouldn't want to train models on it, but there are several examples of using models on phones. https://github.com/tensorflow/tensorflow/blob/master/tensorf...

Well, it's cheaper than a typical smartphone. And probably easier to customize, hardware and software-wise, than a typical smartphone.

Where's the deep learning/tensorflow part? It shows how to assemble the box with the components only.

I'm guessing they just added the deep learning buzzwords after the DeepLens announcement from AWS.

I can only assume that's the daughterboard; might be a TPU.

It's a Movidius Myriad 2 computer vision chip, model MA2450. This chip can accelerate TensorFlow neural net models.


Is the VPU on the Vision Bonnet the same as the one on the Intel Neural Stick we saw earlier this year?

Has anyone else noticed the "privacy LED"? What is it for, and how does it work?

Pretty sure it's just on when the camera is on, as an indicator.

This is really neat, I like the open hardware and software approach.

There's no YouTube demo link? Didn't find it...

That lens mount looks super dodgy.
