
Vision Kit – An image recognition device that can see and identify objects - modeless
https://aiyprojects.withgoogle.com/vision
======
ralphc
I've been ruminating over how to use TensorFlow to recognize squirrels on my
bird feeder on a Raspberry Pi, and hook up something to activate a squirt gun.
Now the squirrels don't stand a chance.
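The gating logic is simple enough to sketch; `should_fire` and the threshold
are invented names, and the real TensorFlow classifier and GPIO call are left
as stubs:

```python
# Hypothetical sketch of the trigger gating only. A real version would
# replace the stub with a TensorFlow classifier and a GPIO write.

CONFIDENCE_THRESHOLD = 0.8  # only fire on a confident detection

def should_fire(label, confidence, threshold=CONFIDENCE_THRESHOLD):
    """Fire only when the classifier is confident it saw a squirrel."""
    return label == "squirrel" and confidence >= threshold

# Example gating decisions:
print(should_fire("squirrel", 0.93))  # True: confident squirrel
print(should_fire("squirrel", 0.42))  # False: too uncertain
print(should_fire("bird", 0.99))      # False: birds are welcome
```

You'd probably also want a cooldown timer so one squirrel doesn't drain the
water tank.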

~~~
andrei_says_
Makes me wonder what the military is doing with these technologies. And a bit
afraid to think about their versions of “squirrel” and “squirt gun”

~~~
golergka
Why people think that an AI with a gun is scarier than a soldier with a gun, I
will never understand.

Remember the video where a helicopter shot reporters because their cameras
looked like an RPG? A soldier with a gun, facing an uncertain situation where
he or his friends risk death, will weigh heavily on the "caution" side,
shooting at everything that moves. An AI with a gun, however, can accept its
own death as an acceptable outcome in a similar situation.

~~~
chrisan
Sure, both can happen. I think the real fear is an army of AI bots gone wrong.
I don't think we've had an army of humans all turn on each other at once.

Humans will always make mistakes, and it's not that two humans can't make the
same mistake, but each mistake is individual. A bunch of bots stamped with the
same code, running on the same hardware, will all be capable of making the
same mistake due to the same bug.

~~~
golergka
> A bunch of bots stamped with the same code, running on the same hardware,
> will all be capable of making the same mistake due to the same bug.

That's only true for current, logic-style programming. I don't think it will
hold for neural-network-based decision systems.

------
blacksmith_tb
It looks promising, but it seems to sneakily require that you have already
soldered a 40-pin header onto your RPi Zero W (I have gotten lots of practice
doing that by now, but that doesn't mean I really love to...), which doesn't
seem to be included in the parts list.

~~~
rootbear
My Adabox 5 came with a Pi Zero and the Pimoroni Hammer Headers [1]. Not
cheap, but it does get one out of soldering a lot of pins.

[1] [https://shop.pimoroni.com/products/gpio-hammer-header](https://shop.pimoroni.com/products/gpio-hammer-header)

~~~
exhilaration
Nice, thank you, this is a US option:
[https://www.adafruit.com/product/3413](https://www.adafruit.com/product/3413)
I've never soldered, but I can use a hammer!

------
oulipo
Hi, I'm the co-founder of [https://snips.ai](https://snips.ai). We are
building a 100% on-device voice AI platform that runs on Raspberry Pi; if you
are looking to add voice interaction to your cool image recognition project,
you can use it for free!

We plan to open-source it over time.

~~~
neebz
Looks amazing; I am super interested in learning the technologies behind
building something like that. Where would you suggest I start?

------
antoniuschan99
This is something I've been looking into working on.

My focus is on capturing the health of plants and extracting meaning using a
Deep Learning Video Camera.

Does anyone have any information on where to start? For example, is there a
dataset of plant diseases that I could feed into an ML engine?

~~~
jsmthrowaway
NDVI is a direction a lot of drone people are going, and probably a good place
to start to get a sense of techniques and work being done in plant monitoring:

[https://agribotix.com/blog/2014/06/10/misconceptions-about-u...](https://agribotix.com/blog/2014/06/10/misconceptions-about-uav-collected-ndvi-imagery-and-the-agribotix-experience-in-ground-truthing-these-images-for-agriculture/)

[https://en.wikipedia.org/wiki/Normalized_difference_vegetati...](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index)

More ideas:
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4600171/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4600171/)
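The index itself is just arithmetic on two bands, NDVI = (NIR - Red) /
(NIR + Red). A toy sketch with made-up reflectance values:

```python
# NDVI (normalized difference vegetation index) per pixel:
# NDVI = (NIR - Red) / (NIR + Red), ranging from -1 to 1.
# Healthy vegetation reflects strongly in near-infrared, so it scores
# high; bare soil scores low and water usually goes negative.

def ndvi(nir, red):
    """Compute NDVI for paired NIR and red reflectance values."""
    return [(n - r) / (n + r) if (n + r) else 0.0
            for n, r in zip(nir, red)]

# Illustrative reflectance values: vegetation, bare soil, water.
nir_band = [0.50, 0.30, 0.05]
red_band = [0.08, 0.20, 0.10]
print([round(v, 2) for v in ndvi(nir_band, red_band)])  # [0.72, 0.2, -0.33]
```

The catch, as the Agribotix post explains, is that consumer cameras don't
capture a clean NIR band, so the hardware side matters as much as the math.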

~~~
antoniuschan99
Thank you :)

------
pat2man
Lots more technical info available here: [https://blog.hackster.io/announcing-the-aiy-projects-vision-...](https://blog.hackster.io/announcing-the-aiy-projects-vision-kit-234505bc6eef)

~~~
john_teller02
One has to be skeptical about the motivation behind the timing of this
release, especially coming so soon after AWS released DeepLens yesterday at
re:Invent 2017.

[https://aws.amazon.com/deeplens/](https://aws.amazon.com/deeplens/)

~~~
dguaraglia
Actually, there's an AI/ML-specific Google conference coming up in the next
week or so.

------
sthielen
If you want to play with something similar, but at a different level of
abstraction (software interfacing with the Google Cloud Vision API in a way
that's accessible to non-technical folks; e.g., teachers are using this
pattern to create "scavenger hunts" in their classrooms [0]), check out how
it's being done in Metaverse [1].

[0] -
[https://twitter.com/search?src=typd&q=metaverse%20app](https://twitter.com/search?src=typd&q=metaverse%20app)

[1] -
[https://www.youtube.com/watch?v=zWuZVM46qa0](https://www.youtube.com/watch?v=zWuZVM46qa0)

------
captainmuon
I see this uses a VisionBonnet board based on a Movidius chip... I wonder if
such an accelerator is really needed. Could someone elaborate on the resources
needed for _applying_ (as opposed to training) a typical deep neural network?
I am only familiar with "old school" neural nets, which are relatively cheap
to evaluate. What is the bottleneck: memory, or the parallel computation? I
would think that you could use GPU shaders to some extent, but the mixing
after each layer and the number of weights you have to store would make it
hard on an RPi, right?

~~~
Eridrus
Neural net accuracy scales with the amount of compute you have available at
inference time, and for vision models compute is largely what you need (NLP
models often require a lot of memory too). So while you could probably
squeeze something onto a Raspberry Pi, you would probably have to sacrifice
frames per second and overall performance.

I think for a hobby project, bundling an accelerator is the right choice, so
that hobbyists don't have to worry so much about performance.

In the same way that a Raspberry Pi really is overkill for almost everything
people do with it (you could do the same with a microcontroller and no OS),
someone could probably squeeze something onto a Raspberry Pi without the
accelerator, but that's going to be far harder than just getting started with
the high-level APIs.
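To put rough numbers on the inference cost: a standard 2D convolution layer
costs H_out * W_out * C_out * K * K * C_in multiply-accumulates. The layer
shape below is illustrative, not taken from any particular model:

```python
# Back-of-the-envelope cost of *applying* (not training) one conv layer:
# each output element needs K*K*C_in multiply-accumulates (MACs).

def conv_macs(h_out, w_out, k, c_in, c_out):
    """Multiply-accumulate count for a standard 2D convolution layer."""
    return h_out * w_out * c_out * k * k * c_in

# A single 3x3 conv over a 112x112x64 feature map producing 64 channels:
macs = conv_macs(112, 112, 3, 64, 64)
print(macs)  # 462422016, i.e. ~462 million MACs for one layer
```

A typical vision model stacks dozens of such layers per frame, while a Pi
Zero's single ARM11 core manages on the order of hundreds of MFLOPS, which is
why offloading inference to a dedicated chip makes sense here.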

------
amelius
Am I the only one who finds it difficult to figure out what this device
actually does?

What are the inputs, and outputs?

How do you train it?

I only see a hardware assembly guide, but nothing on the software.

EDIT: found more information here:
[https://developers.googleblog.com/2017/11/introducing-aiy-vi...](https://developers.googleblog.com/2017/11/introducing-aiy-vision-kit-add-computer.html)

~~~
lovelearning
No, you aren't the only one.

SD image - coming soon

Android app - coming soon

SDK - no links or search hits

It's an unfinished project that's been rushed to the press with little
documentation.

~~~
jackhack
A bit more info from another site[1]:

"It’s called the AIY Vision Kit, and it’s up for pre-order from Micro Center
for $45, with an expected ship date of December 31st."

[1] [https://liliputing.com/2017/11/google-introduces-45-aiy-visi...](https://liliputing.com/2017/11/google-introduces-45-aiy-vision-kit-diy-computer-vision-hardware-projects.html)

------
fredliu
Is this a response/competitor against AWS DeepLens?

------
verifex
That is a lot of steps to set up a little camera without giving you the core
software that powers the whole shebang. If you don't know TensorFlow or how
to set it up, that seems like the most important part. Unless, of course,
someone knows of a good starting place for setting it up yourself.

------
advisedwang
Crazy how different the approach here is from AWS Deeplens!

~~~
btian
Karl Marx vs Adam Smith

~~~
dragonwriter
The communist vs. the guy that warned against the political power of the
business-owning class?

Not sure that's the contrast you are looking for...

~~~
btian
Communist vs capitalist

------
askvictor
Any word on what the daughterboard does? I presume it's a TPU? Will it work on
a full-size RPi? Given that Pi Zeros are still really hard to get hold of
(especially outside the US)...

~~~
aseipp
If you look closely at the screenshots, it comes with a Movidius chip on the
VisionBonnet board.

I'm guessing this VisionBonnet accessory is simply another spin on the Intel
"Movidius Neural Compute Stick", with the Movidius chip wired directly to the
CSI camera port and the GPIOs on the Zero used to talk to it. So you probably
develop on it using the same Movidius toolchain you use for the Neural
Compute Stick: [http://developer.movidius.com](http://developer.movidius.com)

Their SDK recently had a major release with TensorFlow support included, which
I bet drives this. (Even with TensorFlow Lite optimizations, the RPi Zero is
probably just too weak to drive inference for any non-toy model.)

------
josephpmay
For a while I’ve wanted to build a hat that automatically takes a picture of
every dog that I see. This looks like it could be a great start for that
project.

------
monkmartinez
Can we source the parts for one of these now? The pre-order page states they
will be available on Dec 31st. Is there an equivalent?

~~~
brittohalloran
The one new looking item is the VisionBonnet (with a low power Intel Movidius
chip [1]). I've been pounding away on a low power / low cost NN vision device
and now in the last two days we got Amazon DeepLens and this Google AIY Vision
kit. Exciting and frustrating at the same time.

[1] [https://www.movidius.com/solutions/vision-processing-unit](https://www.movidius.com/solutions/vision-processing-unit)

~~~
scottlamb
I'll go with exciting. I'm looking at doing some computer vision (at least
background segmentation for motion detection) as part of a security camera NVR
project. I was eyeing the Hexagon DSP 680 included in the newest Qualcomm SoCs
but couldn't find a cheap SBC that included it. At first glance, the
VisionBonnet seems to do similar things as part of a $45 kit. As a bonus, they
say it's supported by TensorFlow. That will be helpful if I ever actually get
into machine learning...

~~~
scottlamb
On second glance, I think it's a pretty different device than the Hexagon DSP
680 I mentioned.

For one thing, this board apparently has a direct connection to the camera.
I'm not sure if you can do anything but live video from the directly
connected camera (in my case, I want prerecorded video / video from IP
cameras). Maybe it can, but it's not immediately obvious.

The $75 "Movidius Neural Compute Stick" uses the same chip and does everything
via USB so that's more promising. But it's a binary-only API that's totally
focused on neural networks (and only available for Ubuntu/x86_64 and
Raspbian/arm7). In contrast, I believe you can easily send Hexagon arbitrary
code. Its assembly format is documented and upstream llvm appears to support
it. So if I want to do background subtraction via more old-school approaches,
the Hexagon is probably useful where the Movidius stuff is not. And I have yet
to learn anything about neural networks so that's a significant factor for me
at least.

Really neat hardware but I wish it were more open.

~~~
modeless
If I were going to do some embedded image processing, I would choose a Tegra.
You can get a Shield TV for not too much money, and although I haven't done
it myself, it looks pretty hackable with both Android and Ubuntu (and if you
don't want to hack it, you can just buy the devkit). CUDA is a decent toolkit
and of course NVIDIA's support for neural networks is far better than anyone
else's.

------
wiradikusuma
Looks like Google's own Google Clips --
[https://store.google.com/us/product/google_clips?hl=en-US](https://store.google.com/us/product/google_clips?hl=en-US)

------
fatjokes
That the free Android app is "coming soon" does seem to suggest that this was
rushed out as a response to Amazon's DeepLens. That said, I'd say it's a
pretty good response.

------
Willson50
How is this better than using a typical smartphone?

~~~
CamperBob2
It can be left in a specific location, for one thing.

Security and game cameras are a massively unsolved problem, for instance. I'd
like to capture footage of bears, coyotes, and other wildlife as it travels
through my back yard, not to mention keeping an eye out for larger bipedal
visitors. But it's almost impossible to convince the naive motion detection
algorithms in my surveillance cameras not to respond to trees swaying back
and forth in the wind, or to the resulting rapid movement of patches of
dappled sunlight. Or to spiders crawling back and forth in front of the lens,
building a web. Or to moths that seem to be attracted to the IR illuminator
at dusk. Or to any number of other things that any human would instantly
recognize as a false alert, but that are very difficult for software to
reject without frequent mistakes in sensitivity, specificity, or both.

It's hard to believe that anyone with an outdoor security camera hasn't had to
deal with similar hassles. I'm sure there are other applications for a camera
like this, but if I were an investor, I'd be very interested in the
intersection of ML and security in general. I'm definitely interested as a
homeowner.
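For the curious: the naive algorithms I'm complaining about mostly amount to
thresholded frame differencing. A toy sketch (thresholds invented) shows why
any large pixel change, swaying branches included, trips the alert:

```python
# Naive motion detection by frame differencing -- the kind of algorithm
# that wind-blown leaves and dappled light defeat. Frames are flat lists
# of grayscale values (0-255); both thresholds are illustrative.

PIXEL_DELTA = 25        # per-pixel change needed to count as "motion"
MOTION_FRACTION = 0.05  # alert if more than 5% of pixels changed

def motion_detected(prev, curr,
                    pixel_delta=PIXEL_DELTA,
                    motion_fraction=MOTION_FRACTION):
    """Alert when enough pixels changed between consecutive frames."""
    changed = sum(abs(a - b) > pixel_delta for a, b in zip(prev, curr))
    return changed / len(prev) > motion_fraction

# A static scene vs. one where a quarter of the pixels shifted
# (a swaying branch looks exactly like this to the camera):
still = [100] * 100
windy = [100] * 75 + [160] * 25
print(motion_detected(still, still))  # False
print(motion_detected(still, windy))  # True
```

The algorithm has no notion of *what* moved, which is exactly the gap an
ML classifier on a device like this could fill.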

~~~
askvictor
OTOH, a two-year-old smartphone has a decent camera, CPU, and built-in
battery. Powering these things is probably the trickiest part of remote
surveillance (even home surveillance, if you don't want to rewire your
house); a battery helps with that.

~~~
CamperBob2
Compared to smartphones, outdoor cameras need to be built to very different
specifications and standards. I don't think my iPhone would last very long
duct-taped to the side of my house.

Of course, neither would a camera that's made out of cardboard and runs on a
Raspberry Pi. But for prototyping, this seems like it could offer a good
start.

------
guiomie
Where's the deep learning/TensorFlow part? It only shows how to assemble the
box with the components.

~~~
askvictor
I can only assume that's the daughterboard; it might be a TPU.

~~~
modeless
It's a Movidius Myriad 2 computer vision chip, model MA2450. This chip can
accelerate TensorFlow neural net models.

[https://uploads.movidius.com/1463156689-2016-04-29_VPU_Produ...](https://uploads.movidius.com/1463156689-2016-04-29_VPU_ProductBrief.pdf)

------
abdullahi1
Is the VPU on the Vision Bonnet the same as the one on the Intel Neural Stick
we saw earlier this year?

------
mirap
Has anyone else noticed the "privacy LED"? What is it for, and how does it work?

~~~
j_s
Pretty sure it's just on when the camera is on, as an indicator.

------
shurcooL
This is really neat, I like the open hardware and software approach.

------
bruno2223
There's no YouTube demo link? Didn't find it...

------
ohazi
That lens mount looks super dodgy.

