Hacker News
Supercharge your Computer Vision models with the TensorFlow Object Detection API (googleblog.com)
350 points by janober on June 16, 2017 | 59 comments



Their repository is pretty neat! It includes three state-of-the-art object detection architectures: Faster R-CNN, R-FCN, and SSD. It is missing YOLO [1][2], though, which shares some similarities with SSD. Another detector is the recently released Mask R-CNN [3], which of course couldn't have been included in this publication, since we can't travel through time yet.

[1]: https://arxiv.org/abs/1506.02640

[2]: https://arxiv.org/abs/1612.08242

[3]: https://arxiv.org/abs/1703.06870


There are already newer versions: YOLOv2 and DSSD. See http://github.com/sbrugman/deep-learning-papers

In practice, Faster R-CNN worked better for me than YOLOv2: contrary to what the paper reports, it had higher recall on the detection task I used it for.
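Detection recall is usually measured by matching predicted boxes to ground-truth boxes at an intersection-over-union (IoU) threshold. A minimal sketch, assuming boxes are `(x_min, y_min, x_max, y_max)` tuples and a 0.5 IoU threshold (both assumptions, not taken from either paper). The greedy matching here is similar in spirit to, but simpler than, the official COCO/VOC evaluation code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(preds: List[Tuple[Box, float]], truths: List[Box],
           iou_thresh: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by some prediction.

    Predictions are visited in descending score order; each one claims the
    still-unmatched ground-truth box it overlaps most, if IoU >= iou_thresh.
    """
    if not truths:
        return 1.0  # convention: nothing to find means nothing was missed
    matched = [False] * len(truths)
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thresh
        for j, gt in enumerate(truths):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
    return sum(matched) / len(truths)
```

Running both detectors over the same held-out images with the same threshold is what makes a recall comparison between them meaningful.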


"Speed/accuracy trade-offs for modern convolutional object detectors" seems to establish that Faster R-CNN beats R-FCN and SSD-type architectures in accuracy; however, YOLOv2 can beat Faster R-CNN and R-FCN in speed while retaining high accuracy.


So, could you use this to solve the image-recognition CAPTCHAs that ask you to select all images containing [object]?


Maybe they will make you do a Captcha before you access the API?


LOL. The API requires you to first identify all of the objects in a different picture.


That's how they train the API!


No, other people ARE the API!


It's a ponzapi scheme.


You're saying it's a kind of Soylent Green of APIs?


So then you bootstrap that using another API key :)


It's captchas all the way down. :)


Mutual recursion ;)


They can make you (or your customers) solve a captcha before each API call.


At that point the data will likely have gone full circle. So, maybe.

-- Edit: Strike that. They're not actually providing any model data, AFAICT. I assumed this was comparable to AWS's offering.


They include weights pretrained on the COCO dataset with the open-source models.

https://github.com/tensorflow/models/tree/master/object_dete...


Is this a new Google API for use through their cloud offering, or is it a set of TensorFlow artifacts one can download and use freely without ever contacting Google Cloud?


It has been added to the TensorFlow github repository like Inception. You can use it completely independently from Google.

https://github.com/tensorflow/models/tree/master/object_dete...


Clicking through the two layers of links, it is a GitHub repository containing pre-trained models, training scripts, and scripts for running the models on Google Cloud: https://github.com/tensorflow/models/tree/master/object_dete...


Holy moly, I can't believe I didn't know about https://github.com/tensorflow/models


Yes. Both.


So they are launching all of these frameworks targeted at mobile, but what's happening with TensorFlow Lite? I'm beginning to think that the things they are releasing are scaffolding for it. I really hope it's not going to be vaporware from Google I/O.


I missed I/O -- what's particular to TensorFlow Lite? Is that distinct from the CPU target?


A mobile-focused version of TensorFlow.


Lol. Parallel data computations across resource-constrained (including battery-constrained) devices? Good news: the owner of the device is now the product. The device is also the product. Can't wait.


I'm going to guess that Google knows a thing or two about mobile devices and their performance characteristics. Also, feeding something through an already trained NN can be pretty darn performant. I'll wait and see what this ends up looking like, but I am hopeful.


Many SoCs have underutilized DSPs that can be used for TensorFlow.

E.g. https://www.qualcomm.com/news/onq/2017/01/09/tensorflow-mach...


It's not vaporware. (It's not released yet, but it's not vaporware.) (blah blah this is not an official statement blah blah)


Finally I'm getting the results for all those traffic sign CAPTCHAs I've been solving.

(And I just noticed I should not have included the post as part of the sign; sorry for any inaccuracies I may have caused.)


Anyone know of a sample app that uses this?

Say, to detect whether something is or isn't a hot dog?




It would be great to run a security camera's still-image feed through this. It could completely eliminate false positives.


I wonder if Nest is using it with their new camera [0], as it has person alerts now (with face detection).

[0] https://nest.com/camera/meet-nest-cam-iq/


In the research blog entry they do say they are using it in the new Nest cams.

https://research.googleblog.com/2017/06/supercharge-your-com...


They would probably use FaceNet[0] then, if they only want to detect faces, as that should give better results.

[0] https://arxiv.org/abs/1503.03832


Except they wouldn't be able to detect people wearing masks, which is probably an important thing for a security camera to do...


We still have a ways to go before we completely eliminate false positives, but these tools will help us get there. For example, you can recognize different types of objects now, but we still need to figure out which ones are meaningful (like a person or animal versus a tree blowing in the wind). Even within a single class, some instances are benign while others are not: for example, pedestrians walking past the front of my house versus a guy in a ski mask fiddling with my window. They're both people, but their behavior is what separates them.


Even a confidence level would go a long way. If I get a notification that says "motion detected" I have to look at it, but if it said "motion detected, person with 75% confidence" that suddenly becomes much more valuable.
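A minimal sketch of that notification idea, assuming the detector hands back a list of `(label, score)` pairs per frame. That interface and the 0.5 cutoff are assumptions for illustration, not the Object Detection API's actual output format; `describe_motion` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def describe_motion(detections: List[Tuple[str, float]],
                    min_score: float = 0.5) -> Optional[str]:
    """Turn raw (label, score) detections into an alert string, or None.

    Keeps only the highest-scoring detection per label at or above
    min_score, and lists labels in descending score order.
    """
    best: Dict[str, float] = {}
    for label, score in detections:
        if score >= min_score and (label not in best or score > best[label]):
            best[label] = score
    if not best:
        return None  # nothing confident enough: suppress the notification
    parts = [f"{label} with {round(score * 100)}% confidence"
             for label, score in sorted(best.items(), key=lambda kv: -kv[1])]
    return "motion detected, " + ", ".join(parts)
```

For example, `describe_motion([("person", 0.75), ("tree", 0.3)])` returns `"motion detected, person with 75% confidence"`, while a frame containing only a low-scoring tree produces no alert at all.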


Digital cameras pick up different parts of the spectrum. That makes me curious: could that be used to increase confidence levels?


I recently came across a company that has built an ML model to track feet (for footfall observations). It seems that if you had an appropriate training set (labelled feet), you could re-create what they have done with this technology. Perhaps not state of the art, but close. Thoughts?


They need some kind of context input:

- GPS position, intent/goal, domain, etc.

If I'm at a dog show, I want the breed, etc.

If I'm on the street, I just want it to come back with "dog", maybe "dangerous dog" or "friendly dog".

Also, it would be cool (or scary) to just get back "movable object 1", "person 1", "living movable object 3", etc., and, if I give it multiple scenes from a video, have it know that person 1 is the same person 1; and if I name them Tony, it keeps tracking Tony.


> If I'm on the street, I just want it to come back with "dog", maybe "dangerous dog" or "friendly dog".

Most autonomous humans ship with this capability.
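Keeping "person 1" as the same person 1 across video frames is usually called multi-object tracking. A minimal sketch of the simplest variant: greedy IoU matching against the previous frame, processed in detection order, assuming each frame's detections arrive as `(label, box)` pairs. The `SimpleTracker` class and its `name` method are hypothetical, not part of the Object Detection API; real trackers add motion models and appearance features to survive occlusion.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class SimpleTracker:
    """Assigns persistent IDs to detections across frames (hypothetical)."""

    def __init__(self, iou_thresh: float = 0.3):
        self.iou_thresh = iou_thresh
        self.tracks: Dict[int, Tuple[str, Box]] = {}  # id -> (label, last box)
        self.names: Dict[int, str] = {}               # id -> user-given name
        self._ids = count(1)

    def update(self, detections: List[Tuple[str, Box]]) -> List[str]:
        """Match this frame's detections to existing tracks; return display names."""
        unmatched = dict(self.tracks)  # tracks from the previous frame
        result = []
        for label, box in detections:
            # Pick the unclaimed same-label track with the largest overlap.
            best_id: Optional[int] = None
            best_iou = self.iou_thresh
            for tid, (tlabel, tbox) in unmatched.items():
                overlap = iou(box, tbox)
                if tlabel == label and overlap >= best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:
                best_id = next(self._ids)  # new object: allocate a fresh ID
            else:
                del unmatched[best_id]
            self.tracks[best_id] = (label, box)
            result.append(self.names.get(best_id, f"{label} {best_id}"))
        # Tracks not seen this frame are dropped (no occlusion handling).
        for tid in unmatched:
            del self.tracks[tid]
        return result

    def name(self, track_id: int, name: str) -> None:
        """Attach a human-readable name to a track, e.g. "Tony"."""
        self.names[track_id] = name
```

Calling `tracker.name(1, "Tony")` after the first frame makes later frames report "Tony" instead of "person 1", for as long as the box keeps overlapping its previous position.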



I imagine you could use the confidence-value output of the object-detection API as input into a separate system that would also incorporate the other inputs you mention.
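One simple way to do that is to treat the detector's class scores as if they were probabilities under a uniform class prior, multiply them by a context-dependent prior, and renormalize (Bayes' rule, under the strong assumption that image and context are independent given the class). A minimal sketch; `rescore` and the prior values are hypothetical, and real detector scores are not guaranteed to be calibrated, so treat this as a heuristic.

```python
from typing import Dict


def rescore(scores: Dict[str, float], prior: Dict[str, float],
            default_prior: float = 1.0) -> Dict[str, float]:
    """Reweight class scores by a context prior and renormalize to sum to 1.

    Classes missing from `prior` get weight `default_prior`. After
    renormalization a constant weight has no effect, so the default only
    matters relative to the priors that are given.
    """
    weighted = {c: s * prior.get(c, default_prior) for c, s in scores.items()}
    total = sum(weighted.values())
    if total == 0:
        return dict(scores)  # context rules everything out: fall back to raw scores
    return {c: w / total for c, w in weighted.items()}
```

With hypothetical numbers, an ambiguous detection at a dog show, `rescore({"wolf": 0.5, "dog": 0.5}, {"wolf": 0.1, "dog": 0.9})`, shifts to roughly 0.9 for "dog" and 0.1 for "wolf".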


Would it be able to detect text regions in an image, the way it detects kites and people in the example image?


Yes, if you train those models using a dataset with box annotations. A more relevant model, if you want to transcribe the text: https://github.com/tensorflow/models/tree/master/attention_o...


Just spent the last 6 months making an ANPR camera. Now I just need to put Python on it. Fun times.


My sentiment exactly. For my full-time startup, we've been trying, testing (many), and productionizing (one) object detection networks for the past nine months. It was a tedious effort of implementing papers from last year's CVPR conference. This takes away some of our mojo, but in the grand scheme of things it lets us focus more closely on our business. Mixed bag.


What's the hype here? Is it a curated model zoo, or something more?


The researchers have created a framework for object detection such that one can easily experiment with different feature extraction networks, separated from the "meta-architecture" (such as Faster R-CNN, R-FCN, or SSD) used to handle the object detection task. They compare many models using this framework, as described in https://arxiv.org/abs/1611.10012, and they were able to construct the winning entry of the COCO 2016 detection challenge based on this research.


This doesn't seem to include training scripts?


I can't find the license, anyone have better luck?


The root of the repo has Apache license 2.0.


Basically everything Google releases is Apache 2.0. It was company policy when I was there.


Anyone know what license this is under?



Admins, please update the submission.




