
Supercharge your Computer Vision models with the TensorFlow Object Detection API - janober
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
======
yamaneko
Their repository is pretty neat! It includes three state-of-the-art
architectures in object detection: Faster-RCNN, RFCN, and SSD. It is missing
YOLO [1][2], though, which shares some similarities with SSD. Another detector
is the recently released Mask-RCNN [3], which of course couldn't have been
included in this publication, as we can't travel through time yet.

[1]: [https://arxiv.org/abs/1506.02640](https://arxiv.org/abs/1506.02640)

[2]: [https://arxiv.org/abs/1612.08242](https://arxiv.org/abs/1612.08242)

[3]: [https://arxiv.org/abs/1703.06870](https://arxiv.org/abs/1703.06870)

~~~
mrplank
There are already newer versions, YOLOv2 and DSSD. See
[http://github.com/sbrugman/deep-learning-papers](http://github.com/sbrugman/deep-learning-papers)

In practice Faster R-CNN worked better for me than YOLOv2 as it, in contrast
to what is reported in the paper, had a higher recall for the detection task I
used it for.
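
For anyone wanting to run the same kind of recall comparison on their own data, here is a minimal sketch in plain Python. It assumes boxes are `(x1, y1, x2, y2)` tuples and that a detection counts as a hit when its IoU with a ground-truth box is at least 0.5; the function names, the threshold, and the sample boxes are illustrative assumptions, not anything taken from the papers or the repository.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, detections, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by a not-yet-used detection."""
    unused = list(detections)
    matched = 0
    for gt in ground_truth:
        # Greedy matching: take the best remaining detection for this box.
        best = max(unused, key=lambda d: iou(gt, d), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            matched += 1
            unused.remove(best)
    return matched / len(ground_truth) if ground_truth else 0.0


# Hypothetical sample data: two labelled objects, one found, one missed.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
det_boxes = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(recall(gt_boxes, det_boxes))
```

Running both detectors over the same labelled images and comparing this number is a rough way to check whether the paper's ranking holds for a specific task. Greedy matching is a simplification; benchmark tooling usually matches in order of detection confidence.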

~~~
pveierland
"Speed/accuracy trade-offs for modern convolutional object detectors" seems to
establish that Faster R-CNN beats R-FCN and SSD-type architectures in
accuracy. However, YOLOv2 can beat Faster R-CNN and R-FCN in speed while
retaining high accuracy.

------
elliottcarlson
So, could you use this to solve the image recognition captchas that ask you
to select all images that contain [object]?

~~~
glup
Maybe they will make you do a Captcha before you access the API?

~~~
EGreg
So then you bootstrap that using another api key :)

~~~
ijidak
It's captchas all the way down. :)

~~~
fooker
Mutual recursion ;)

------
polskibus
Is this a new Google API for use through their cloud offering or is it a set
of tensorflow artifacts one can download and use freely without ever
contacting Google Cloud?

~~~
azernik
Clicking through the two layers of links, it is a GitHub repository containing
pre-trained models, training scripts, and scripts for running the models on
Google Cloud:
[https://github.com/tensorflow/models/tree/master/object_dete...](https://github.com/tensorflow/models/tree/master/object_detection)

~~~
radarsat1
Holy moly, I can't believe I didn't know about
[https://github.com/tensorflow/models](https://github.com/tensorflow/models)

------
zitterbewegung
So they are launching all of these frameworks targeted to mobile, but what's
happening to TensorFlow Lite? I'm beginning to think that these things that
they are releasing are scaffolding for it. I really hope it's not going to
be vaporware from Google I/O.

~~~
wyldfire
I missed I/O -- what's particular to Tensorflow Lite? Is that distinct from
the CPU target?

~~~
kyrra
Mobile focused version of tensorflow.

~~~
haimez
Lol. Parallel data computations across resource (including battery)
constrained devices? Good news, the owner of the device is now the product.
The device is also the product. Can't wait.

~~~
oh_sigh
I'm going to guess that Google knows a thing or two about mobile devices and
their performance characteristics. Also, feeding something through an already
trained NN can be pretty darn performant. I'll wait and see what this ends up
looking like, but I am hopeful.

------
matt4077
Finally I'm getting the results for all those traffic sign CAPTCHAS I've been
solving.

(And I just noticed I should not have included the post as part of the
sign. Sorry for any inaccuracies I may have caused.)

------
koolba
Anyone know of a sample app that uses this?

Say to detect if something is or isn't a hot dog?

~~~
fosk
Yes, here you go:
[https://itunes.apple.com/us/app/not-hotdog/id1212457521](https://itunes.apple.com/us/app/not-hotdog/id1212457521)

------
accountyaccount
This would be great to run a security camera still feed through. It could
completely eliminate false positives.
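
A minimal sketch of that filtering step, assuming the detector's output has already been turned into a list of dicts with `label`, `score`, and `box` keys. That layout, the function name, and the 0.6 cutoff are assumptions made for illustration, not the API's actual output format.

```python
def person_alerts(detections, min_score=0.6, wanted_label="person"):
    """Keep only confident detections of the class we care about."""
    return [
        d for d in detections
        if d["label"] == wanted_label and d["score"] >= min_score
    ]


# Hypothetical detections for one still frame from the camera.
frame = [
    {"label": "person", "score": 0.92, "box": (40, 10, 120, 200)},
    {"label": "cat", "score": 0.88, "box": (200, 150, 260, 200)},
    {"label": "person", "score": 0.31, "box": (300, 20, 330, 90)},
]

alerts = person_alerts(frame)
print(len(alerts))
```

Raising `min_score` trades missed people for fewer false alarms, so in practice this would reduce false positives rather than remove them entirely.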

~~~
kyrra
I wonder if Nest is using it with their new cam[0], as it has person alerts
now (with face detection).

[0] [https://nest.com/camera/meet-nest-cam-iq/](https://nest.com/camera/meet-nest-cam-iq/)

~~~
halflings
They would probably use FaceNet[0] then, if they only want to detect faces, as
that should give better results.

[0] [https://arxiv.org/abs/1503.03832](https://arxiv.org/abs/1503.03832)

~~~
odbol
Except they wouldn't be able to detect people wearing masks, which is probably
an important thing for a security camera to do...

------
monkeydust
I recently came across a company that's built a ML model to track feet (for
footfall observations). It seems that if you had an appropriate training set
(labelled feet) you could re-create what they have done with this technology.
Perhaps not achieving state of the art, but close. Thoughts?

------
sharemywin
They need some kind of context input.

-GPS position, intent/goal, domain etc.

I'm at a dog show I would want breed etc.

I'm on the street I just want it come back dog maybe dangerous dog, friendly
dog.

Also, would be cool/scary to just get back movable object 1, person 1, living
movable object 3 etc. and if I give it multiple scenes from a video it knows
person 1 is the same person 1 and if I name (them) Tony it keeps tracking
tony.
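
The "person 1 stays person 1" part can be sketched on top of any per-frame detector. The toy tracker below links boxes between consecutive frames by overlap (IoU) and lets you attach a name to a track. The class, its methods, the 0.3 threshold, and the sample boxes are all hypothetical, and real trackers also use appearance and motion cues.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class GreedyTracker:
    """Toy tracker: a box keeps its id if it overlaps a box from the last frame."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track id -> box seen in the previous frame
        self.names = {}    # track id -> user-assigned name
        self._ids = count(1)

    def name(self, track_id, label):
        self.names[track_id] = label

    def update(self, boxes):
        """Match this frame's boxes to existing tracks or start new ones."""
        free = dict(self.tracks)  # tracks not yet claimed in this frame
        assigned = {}
        for box in boxes:
            best = max(free, key=lambda t: iou(free[t], box), default=None)
            if best is not None and iou(free[best], box) >= self.iou_threshold:
                del free[best]
            else:
                best = next(self._ids)
            assigned[best] = box
        self.tracks = assigned  # tracks with no match this frame are dropped
        return {self.names.get(t, f"person {t}"): b for t, b in assigned.items()}


tracker = GreedyTracker()
first = tracker.update([(0, 0, 10, 10)])
tracker.name(1, "Tony")
second = tracker.update([(1, 0, 11, 10), (50, 50, 60, 60)])
print(first)
print(second)
```

In the second frame the slightly shifted box keeps track id 1 and is reported as Tony, while the new box becomes person 2. A track that goes unmatched for a single frame is forgotten, which is the main limitation of this sketch.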

~~~
asciimo
> I'm on the street I just want it come back dog maybe dangerous dog, friendly
> dog.

Most autonomous humans ship with this capability.

~~~
bbrian
But not all.

[https://www.google.com/search?q=blind+man+bitten+by+dog](https://www.google.com/search?q=blind+man+bitten+by+dog)

------
Omnipresent
Would it be able to detect textual regions in an image, the way it detects
the kites/persons in the example image?

~~~
vivekrathod
Yes, if you train those models using a dataset with box annotations. A more
relevant model if you want to transcribe the text:
[https://github.com/tensorflow/models/tree/master/attention_o...](https://github.com/tensorflow/models/tree/master/attention_ocr)

------
mlaretallack
Just spent the last 6 months making an ANPR camera. Now just need to put Python
on it. Fun times.

~~~
TuringNYC
My sentiment exactly. For my full-time startup, we've been trying, testing
(many), and productionizing (one) object detection networks for the past nine
months. It was a tedious effort of implementing papers from last year's CVPR
conference. This makes some of our mojo go away, but in the scheme of things
we can focus more closely on our business. Mixed bag.

------
nzjrs
What's the hype here? It's a curated model zoo, or?

~~~
pveierland
The researchers have created a framework for object detection such that one
can easily experiment with using different feature extraction networks,
separated from the "meta-architecture" such as Faster R-CNN, R-FCN, or SSD,
used to handle the object detection task. They compare many models using this
framework, described in
[https://arxiv.org/abs/1611.10012](https://arxiv.org/abs/1611.10012), and
they were able to construct the winning entry of the COCO 2016 detection
challenge based on this research.
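
To make the separation concrete, here is a toy sketch of the idea only, not the repository's code: the detector takes the feature extractor as a parameter, so swapping extractors needs no change to the detection logic. Every name below (`mean_rows`, `max_rows`, `ToyDetector`) is made up for illustration, and the "image" is just a small grid of numbers.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Features = List[float]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def mean_rows(image: Image) -> Features:
    """Toy feature extractor A: mean intensity of each row."""
    return [sum(row) / len(row) for row in image]


def max_rows(image: Image) -> Features:
    """Toy feature extractor B: maximum intensity of each row."""
    return [max(row) for row in image]


class ToyDetector:
    """Toy 'meta-architecture': boxes the rows whose feature passes a threshold."""

    def __init__(self, feature_extractor: Callable[[Image], Features],
                 threshold: float = 0.5):
        self.feature_extractor = feature_extractor
        self.threshold = threshold

    def detect(self, image: Image) -> List[Box]:
        features = self.feature_extractor(image)
        rows = [i for i, f in enumerate(features) if f > self.threshold]
        if not rows:
            return []
        # One box spanning the full width, from the first to the last active row.
        return [(0, rows[0], len(image[0]), rows[-1] + 1)]


image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Same detector logic, two different extractors plugged in.
print(ToyDetector(mean_rows).detect(image))
print(ToyDetector(max_rows).detect(image))
```

With the same threshold, the mean-based extractor finds nothing and the max-based one finds a box. That is a miniature version of comparing extractors while holding the meta-architecture fixed.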

------
throwaway321373
This doesn't seem to include training scripts?

------
Drdrdrq
I can't find the license, anyone have better luck?

~~~
advisedwang
The root of the repo has Apache license 2.0.

~~~
nostrademons
Basically everything Google releases is Apache 2.0. It was company policy when
I was there.

------
Joboman555
Anyone know what license this is under?

------
aw3c2
Direct link: [https://research.googleblog.com/2017/06/supercharge-your-com...](https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html)

~~~
artursapek
Admins update submission please

~~~
impish19
[https://news.ycombinator.com/item?id=14562314](https://news.ycombinator.com/item?id=14562314)

