

Show HN: API for detecting people, cars, and everyday objects in images - jluan
https://www.dextrorobotics.com

======
simonsquiff
This reminds me of how my visual psychology professor was attempting to help
those with poor vision 15 years ago, but didn't appear to get anywhere with it
at the time.

The idea was a simple (but clever) one - use virtual reality to segment the
world into solid blocks of identified objects. The solid blocks are
identifiable to those with poor vision in a way that the real world is not.

Essentially this meant processing an image, identifying items e.g. cars,
fences, roads etc and then colouring them solid. So instead of a confusing
scene of blur, you have a blurred but still identifiable scene of a solid
strip of grey for the road, a solid blob of red for the car, another solid
yellow strip for a fence etc. A poorly sighted person could still identify from
this something that made sense in a way that they couldn't in the real world.
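
A minimal sketch of the colouring step only, assuming some other component has already produced a per-pixel map of class ids (the hard part, not shown). The class ids and the palette are made up for illustration.

```python
# Hypothetical class ids and colours; a real system would get the label map
# from a segmentation model, which is not shown here.
PALETTE = {
    0: (0, 0, 0),        # background
    1: (128, 128, 128),  # road  -> solid grey
    2: (255, 0, 0),      # car   -> solid red
    3: (255, 255, 0),    # fence -> solid yellow
}

def colour_by_class(label_map, palette=PALETTE, unknown=(0, 0, 0)):
    """Replace every pixel's class id with that class's solid RGB colour."""
    return [[palette.get(cls, unknown) for cls in row] for row in label_map]

# A tiny 3x4 "scene": fence on top, a car in the middle, road at the bottom.
labels = [
    [3, 3, 3, 3],
    [0, 2, 2, 0],
    [1, 1, 1, 1],
]
flat = colour_by_class(labels)
```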

What was required was an input, real time visual processing and then display
back to the user - all of which was fantasy 15 years ago.

However, attempt this today with a visual feed, real time processing like
this, and then near instantaneous display of the results back to the person
with e.g. google glass, and you might have a viable way to show the world
categorised in a visual way that will help those with poor vision. Interesting
times.

~~~
opminion
_However, attempt this today_

According to <http://news.ycombinator.com/item?id=4985100> it won't be today,
perhaps tomorrow.

~~~
dbaupp
I'm guessing that many severely sight-impaired people would be willing to take
the latency to have vision that is significantly more useful.

------
apu
Looks like this is using training data from the PASCAL VOC object detection
challenge [1], which is the standard benchmark for evaluating object detection
performance in computer vision.

Object detection is an extremely tough problem (some would say it is _the_
computer vision problem ;-)), and while we've made a lot of progress in the
past decade, the best methods are still terrible [2] -- average detection
precision between 30-50%. For reference, most consumer applications require an
AP of 90+% to be considered usable.
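
For anyone unfamiliar with the metric, here is a rough sketch of how an average precision figure can be computed for one class. This is the plain non-interpolated form, not the exact interpolated variant the VOC evaluation uses, and the sample detections are invented.

```python
def average_precision(detections, num_ground_truth):
    """Non-interpolated AP for one object class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real object instances in the test set.
    """
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            hits += 1
            # Precision measured at the rank where this object was found.
            precision_sum += hits / rank
    # Objects that were never found contribute zero.
    return precision_sum / num_ground_truth if num_ground_truth else 0.0

# Invented example: 4 detections, 2 correct, 4 real objects in total.
dets = [(0.9, True), (0.8, False), (0.7, True), (0.3, False)]
ap = average_precision(dets, num_ground_truth=4)
```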

So if this is a completely automated solution, it's not going to be able to do
much better, unless the creators can make _massive_ (I mean orders-of-
magnitude) improvements on the state-of-the-art.

But that being said, there are some applications where lower performance is
acceptable. And if you add some manual verification, you could conceivably
make this much better (with an increase in latency, though). Another
possibility is to specialize on a certain type of input image (e.g., if you're
a company taking photos in your warehouse, where all your photos look very
similar and/or you can control the lighting and environment).

Still, I'm excited to see companies attempting to take object detection out to
the real world. All the best to these guys!

[1] <http://pascallin.ecs.soton.ac.uk/challenges/VOC/>

[2]
[http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2011/resu...](http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2011/results/index.html)

~~~
EwanG
One of my main hobbies is photography. I do mainly outdoor shots, and really
enjoy macros of flowers. The problem being that "oh last weekend I took an
amazing shot of a purple flower" isn't all that helpful for someone who is
trying to find a picture of an iris. When someone comes up with an algorithm
that can take my shot, compare it to a library, and tell me what wildflower it
is, I will be a happy camper. I suspect Flickr and 500px will also become more
valuable places since it would be possible to correlate geotagged shots with
flora to document what seems to be there.

~~~
apu
It's not quite what you want, but I worked on Leafsnap [1], which
automatically identifies trees by their leaves, using computer vision
techniques. We focused on leaves since they are present throughout much more
of the year than flowers. Our free apps also include high-resolution, high-
quality photos of all aspects of the species we cover -- leaves, flowers,
fruits, bark, etc. So you can at least browse through and compare the flowers
you're looking at with those in the app.

Our current coverage is of the trees of the northeast US (about 200 species),
but we are working on expanding that.

[1] <http://leafsnap.com>

------
ank286
This is not Pedro Felzenszwalb's discriminative part-model algorithm. This is
simple AdaBoost. The authors have labeled a bunch of datasets (1000s of them)
and are able to detect whatever object. AdaBoost (Viola/Jones) is the most
popular Yes/No detector, and there is an OpenCV API for it. It is used for
detecting faces and license plates in commercial applications. A full person
detector is nothing but an SVM+HOG descriptor.
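
To illustrate the boosting idea only, here is a toy AdaBoost over one-dimensional threshold stumps. It is a sketch under heavy simplification: real Viola/Jones detectors boost Haar-like image features and chain the results into a cascade, none of which is shown, and nothing here is claimed to be what this service runs.

```python
import math

def train_adaboost(xs, ys, rounds=3):
    """Toy AdaBoost. xs: floats, ys: +1/-1 labels.

    Returns a list of (alpha, threshold, polarity) weak classifiers.
    """
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Pick the stump with the lowest weighted error.
        for t in sorted(set(xs)):
            for polarity in (1, -1):
                preds = [polarity if x >= t else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, polarity))
        # Up-weight the samples this stump got wrong, then renormalise.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Weighted vote of all stumps: +1 (object) or -1 (no object)."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in model)
    return 1 if score >= 0 else -1

# Invented feature values: small means "no object", large means "object".
model = train_adaboost([1, 2, 3, 6, 7, 8], [-1, -1, -1, 1, 1, 1])
```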

As a computer vision researcher, I am not impressed by this. It is primarily
an api for smartphone app makers who want a binary result for detection. It
does not help with scene context analysis. For instance, if I have a big
picture of an airplane on a wall, it will detect the airplane. Does it know
that this airplane is in the sky? Or on a wall? There are 1000 failure
cases.

~~~
cynwoody
It got zero of six airplanes for the link below, even though the images are
not overlapping and are against a blue sky background:

[http://www.keithcarter.com/wp-
content/uploads/2009/10/blue-a...](http://www.keithcarter.com/wp-
content/uploads/2009/10/blue-angels-formation-02.jpg)

~~~
martininmelb
But it did find one potted plant for that image. I could not see it (bottom
left hand corner).

~~~
tripzilch
_"Not again."_

------
forrestthewoods
Failed completely for me across a half dozen tries. I wonder how cheaply you
could get results via Mechanical Turk. I bet you could get much more accurate
results for a very low price but with some added latency.

~~~
aantix
One to two cents a task. Anytime you have a language agnostic task
(identifying/classifying objects, etc), the tasks can be done very cheaply.
Just make sure you do triplicate validation.
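
Triplicate validation boils down to a majority vote over three workers' answers. A minimal sketch, assuming the answers arrive as plain strings (the function name and the normalisation are my own choices, not anything Mechanical Turk provides):

```python
from collections import Counter

def majority_label(answers, min_agreement=2):
    """Return the label at least `min_agreement` workers agree on, else None."""
    if not answers:
        return None
    # Normalise trivially different spellings such as "Car " and "car".
    counts = Counter(a.strip().lower() for a in answers)
    label, count = counts.most_common(1)[0]
    return label if count >= min_agreement else None

agreed = majority_label(["car", "Car ", "truck"])
disputed = majority_label(["car", "truck", "bus"])
```

A task that comes back as `None` can simply be resubmitted to more workers.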

Language dependent/creative tasks run much higher (smaller worker pool, more
brain power needed).

~~~
benmccann
I've never used mechanical turk before and don't understand what you mean by
language agnostic. I'd want someone to tell me that it's a "car" and not a 汽車.
And I'd want to give the instructions for the task in English.

~~~
trentmb
If I had to guess what they meant: A car is a car, regardless if you call it
'car' or 'das auto.'

------
jluan
Hey everybody, OP here. Thanks for the great feedback! We're really happy that
so many people have checked this out.

One thing that I want to mention: our service was built favoring Precision
over Recall; we reasoned that we'd rather have a low number of false positives
and make sure that when we do report a detection, that it actually is one.
Thus, our service may occasionally miss instances.
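
For readers who want the trade-off spelled out, a small sketch of the two metrics computed from raw counts. The counts in the example are illustrative, not measurements of our service.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of reported detections that are real.
    Recall: share of real objects that were reported."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# Illustrative: 4 objects found, none of them wrong, 14 objects missed.
precision, recall = precision_recall(4, 0, 14)
```

With these numbers precision is perfect while recall is about 22%, which is the kind of behaviour a precision-biased detector shows.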

I'm going to implement a button on the Experiment page that lets you flag a
detection as something that we need to work on; we will use your feedback to
improve the accuracy.

~~~
rjdagost
You might want to let the user decide if it is more important to have a false
positive or a false negative. For some applications a false alarm is a minor
nuisance but a false negative is catastrophic, but for some applications it is
flipped. In the past I have let the end user define the balance (i.e. "a false
negative is 10X as bad as a false positive") and the decision results were
scaled by their decision rule. It's not always easy to do as many machine
learning algorithms are nonlinear but at least you can cast a wider net of
potential customers.
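
A sketch of that kind of decision rule, assuming the detector outputs a reasonably calibrated probability (often not true in practice, which is the nonlinearity caveat above). Reporting a detection is worthwhile when the expected cost of missing it exceeds the expected cost of a false alarm, i.e. when p > C_fp / (C_fp + C_fn).

```python
def cost_threshold(cost_false_positive, cost_false_negative):
    """Probability above which reporting a detection has lower expected cost."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def decide(probability, cost_false_positive=1.0, cost_false_negative=10.0):
    """True means: report the detection."""
    return probability > cost_threshold(cost_false_positive, cost_false_negative)

# "A false negative is 10X as bad as a false positive" -> threshold of 1/11.
threshold = cost_threshold(1.0, 10.0)
```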

------
bluishgreen
Can you give me a bit more technical background? Tell me how this is better
than, e.g., out-of-the-box OpenCV filters.

~~~
dchichkov
It is probably based on "Object Detection with Discriminatively Trained Part
Based Models" by Pedro F. Felzenszwalb... Somebody took
<http://people.cs.uchicago.edu/~rbg/latent/>, made a REST API and hooked up a
payment system.

------
senthilnayagam
[http://www.dauntless-
soft.com/products/freebies/airbus380/a3...](http://www.dauntless-
soft.com/products/freebies/airbus380/a380_5.jpg) detected 0 planes, there
should be at least 5

when used
[http://www.airbus.com/fileadmin/media_gallery/aircraft_pages...](http://www.airbus.com/fileadmin/media_gallery/aircraft_pages_photo_galleries/a380-gallery/A380_On_Ground.JPG)
it detected 2 planes, there was 1 only

but hope with additional training images, it would improve.

------
steeve
As a long time CV enthusiast, I applaud the tech and the way you guys make it
"just work". However for any serious application I feel a few things are
missing:

\- your pricing won't work for video (even at only 5fps)

\- I can't really use the data without a confidence level of detection.
Because for some applications I'd rather discard a bounding box that is below a
threshold I set.
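
What I have in mind on the client side is roughly this. The response shape (a list of dicts with `label`, `confidence` and `box` keys) is hypothetical, since the public API does not currently return a confidence value.

```python
def filter_detections(detections, min_confidence):
    """Keep only the detections whose confidence meets the caller's threshold."""
    return [d for d in detections if d["confidence"] >= min_confidence]

# Hypothetical response; boxes are (x, y, width, height) in pixels.
response = [
    {"label": "car", "confidence": 0.92, "box": (10, 20, 200, 120)},
    {"label": "car", "confidence": 0.35, "box": (300, 40, 60, 30)},
    {"label": "person", "confidence": 0.71, "box": (150, 10, 40, 110)},
]
kept = filter_detections(response, min_confidence=0.7)
```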

Other than that, congrats for the great work :)

~~~
jluan
Hey steeve, thanks for the really kind feedback! We're aware of the video
pricing issue, and we're thinking hard about a solution for makers and
developers.

In the meantime, if you want to experiment with Dextro for video, shoot us an
email at team@dextrorobotics.com and we will hook you up!

With regard to confidence level, that's something that we provide the
enterprise-class service with; if this is a critical feature, we can
potentially offer it to everyone as well.

------
fchollet
Over a dozen experiments, the recognition rate for faces seems to be about
70%. Example of failure: only 2 faces detected here (in particular NOT the one
in focus)
[http://iamdaveknockles.files.wordpress.com/2011/03/meeting_j...](http://iamdaveknockles.files.wordpress.com/2011/03/meeting_jpg.jpg)

This is worse than OpenCV (I thought you were using OpenCV but apparently
aren't?)

~~~
cynwoody
Similar result with the image below. It got four of six faces, missing the
most important of the faces.

[http://www.bostonglobe.com/rw//Boston/2011-2020/WebGraphics/...](http://www.bostonglobe.com/rw//Boston/2011-2020/WebGraphics/National/BostonGlobe.com/2012/06/21healthcare/images/2006Romney.jpg)

~~~
bsenftner
If you read the description on the site, you'd know that the face detection
stops after finding 4 faces.

~~~
cynwoody
How would I know that? I don't read documentation until absolutely necessary.
If it claims to find faces, well, then let's see it work! And then we'll count
the faces found.

In any case, the documentation is wrong if it says that. E.g., the software
found all seven SEGs in the photo below:

[http://www.bagnewsnotes.com/files/2011/10/Romney-Bain-
Capita...](http://www.bagnewsnotes.com/files/2011/10/Romney-Bain-Capital-
money-shot.jpg)

------
limejuice
Tried detecting Airplanes on this image with 18 airplanes, but it only
detected 4 of them.

[https://lh3.ggpht.com/-GbPgbhUtmnE/UH0p3VmMWoI/AAAAAAAAApM/u...](https://lh3.ggpht.com/-GbPgbhUtmnE/UH0p3VmMWoI/AAAAAAAAApM/uuP6VHzyZ44/s1600/all-
planes_800.jpg)

~~~
tunnuz
The browser demo page says "Dextro supports up to 4 objects to be detected
concurrently when used via the full API.", I think that's the problem.

~~~
jluan
Hey tunnuz and limejuice, sorry to hear we only picked up on 4 of the planes.
We've biased our service towards precision rather than recall; thus, we try to
be wrong about detected objects as little of the time as possible at the
expense of perhaps missing a few object instances.

I want to clarify: the 4 object concurrent detection refers to 4 classes of
objects. On the Experiment page, you can only choose one class to detect on
(whether that is person, bottles, cars, etc). However, by using the API, you
can simultaneously search for cars, planes, people, and motorcycles, for
example.

~~~
tunnuz
Ok, so the two 4s have nothing to do with each other. Thanks for the
clarification.

------
limejuice
Tried to detect Cats in a whole room full of cats, and it detected zero cats.

[http://englishrussia.com/wp-
content/uploads/2007/08/130-cats...](http://englishrussia.com/wp-
content/uploads/2007/08/130-cats-1.jpg)

------
zopticity
Wow, I just tried this image of faces:

<http://3rdarm.biz/images/2010/02/faces.jpg>

It got almost all of them but with so many errors. It can't detect sheep either.

I was really impressed at first, but as I tried out more and more images, it
became apparent that the API isn't mature enough to be worth one or two cents.
There is a 90% chance that the algorithm detects the image correctly, but
sometimes it doesn't detect the entire object. For example, I used another
image of two jets, but it only found one of them even though the jets were
identical apart from one being smaller than the other.

------
afhof
Concerning the API: (On page <https://www.dextrorobotics.com/api>)

* The documentation is pretty weak.

* I am not sure what a classID is, and I don't see any links to where the numbers come from.

* The example request is posting to an insecure http address, but the secret api key is required?

* The example request doesn't fit on one line? It took me a while to see it was in the "GET / HTTP/1.1" style.

* How do errors work? Having clearly specified error responses would be really useful.

If you're trying to sell me on your API, show it to me.

------
makeee
Did a few tests and it works pretty well! No false positives at least.

Any plans to increase the number of objects you can search for at once? Very
interested in using this but I'd want to be able to scan for ~20 objects.

~~~
mrtbld
It detects a false positive in this picture for "full-body person":
[http://farm9.staticflickr.com/8232/8366217251_972624d84b_b.j...](http://farm9.staticflickr.com/8232/8366217251_972624d84b_b.jpg)

The man is detected, but also a shape above the umbrella.

Edit: direct link to the result picture:
[https://s3.amazonaws.com/dextro_detection_results/debug13579...](https://s3.amazonaws.com/dextro_detection_results/debug1357948979.8611.jpg)

------
agotterer
Interesting technology. It got a couple correct for me. But failed on a bunch
as well. Here's a few horses it failed to find correctly.

2 horses / detected 0: [http://images4.fanpop.com/image/photos/23500000/horse-
horses...](http://images4.fanpop.com/image/photos/23500000/horse-
horses-23582505-1024-768.jpg)

4 horses / detected all as 1:
[http://4.bp.blogspot.com/-Rso9vw4BmSE/TqZU6vHl3kI/AAAAAAAACL...](http://4.bp.blogspot.com/-Rso9vw4BmSE/TqZU6vHl3kI/AAAAAAAACLk/NBBmDDLC9uY/s1600/slaughter%2Bof%2Bhorses%2Be%2BOctober%2B25%2B2011%2B3.jpg)

------
cabalamat
It didn't recognise the cat in this picture (<http://i.imgur.com/TgFaJh.jpg>)
so I'm doubtful of its practicality.

------
bbayer
Very interesting application, but I can't see a real-life use for it via a web
API. As far as I know, this kind of thing is for real-time applications, and a
web-based approach might not serve the purpose.

BTW, it can find only two airplanes in this photo
<http://www.q8.com/SiteCollectionImages/Gatwick%20Airport.jpg>

------
philhippus
I have been tinkering with a similar side project which you can read about
here:

[http://artificial-intelligence-projects.com/augmented-
realit...](http://artificial-intelligence-projects.com/augmented-reality/)

It's still in the development stage because I can only fiddle with it when I
have the time and impetus to do so. Criticisms/comments welcome.

------
gesman
Hey - I found a bug - no cat was detected here: <http://i.imgur.com/wGxWy.jpg>

------
jschmitz28
It seems to have some trouble finding cats: <http://i.imgur.com/ONFis.jpg>

------
yesimahuman
Works well! Found a few it didn't work on. For example, it didn't detect an
airplane in this image (but it's a fighter jet, so maybe not part of the
training set): [http://cdn-www.airliners.net/aviation-
photos/photos/2/8/0/20...](http://cdn-www.airliners.net/aviation-
photos/photos/2/8/0/2043082.jpg)

------
stormen
Hmm.. It's a fantastic idea and really great website, but the actual algorithm
is very imprecise.

See this: <http://i.imgur.com/ulith.png?1>

You need to get a higher percentage of actual matches before you can use this
for anything.

------
treelovinhippie
Didn't work for me. That said, image recognition via an API will be huge once
things mature a little more.

I've been searching lately for a post-face.com API and have been following a
few for a while, but they seem to have similar issues with poor results.

------
ahc
It'd be great if you could use this to detect nudity. Any plans for that? I'm
assuming the balls on the "in the works" list are of the sport variety? ;)

In the works: Shoes, Balls, Smartphones and tablets, Dogs, Keyboards, Cups and
glasses, Doors, Keys

~~~
cookingrobot
<https://github.com/pa7/nude.js>

------
liuliu
Shameless plug: libccv supports a RESTful API as of version 0.4; it is open
source and free: <http://libccv.org/doc/doc-http/>. Trained pedestrian / car
/ face detectors are included.

------
mephi5t0
[http://rumors.automobilemag.com/files/2012/11/2013-VW-Eos-
fr...](http://rumors.automobilemag.com/files/2012/11/2013-VW-Eos-front-three-
quarter-with-plane.jpg)

detected 3 planes... there is only 1 plane and a car

------
captaincrunch
Failed with this :(

[http://i.dailymail.co.uk/i/pix/2012/11/06/article-2228752-15...](http://i.dailymail.co.uk/i/pix/2012/11/06/article-2228752-15E0DC91000005DC-176_634x286.jpg)

------
IceyEC
That's great! I like that you guys are offering a small free usage tier!

------
gesman
It didn't detect face here: <http://i.imgur.com/c34dX.jpg> The algorithm
probably got distracted and raised an exception.

~~~
keeran
UnexpectedHairsOnBreastException

------
piercebot
Is there a good way for submitting recommendations for improvements?
<http://i.imgur.com/yLSHW.jpg>

------
TommyDANGerous
That is amazingly awesome. Glad it can integrate with Ruby and Python. I
haven't even read the whole info and I already signed up.

------
MasterScrat
Isn't there a risk for this service to be used as an image proxy? The analyzed
images are rehosted on their S3...

------
blaze33
Over/underdetection of bicycles: <http://imgur.com/a/tIS6c>

------
gourneau
Very nice. Would y'all consider offering some embeddable solution that does
not need to be run on the net?

~~~
thechut
Try OpenCV: <http://opencv.willowgarage.com/wiki/>

------
misleading_name
nice!

does a good job with painting too, but it did find the phantom neighbor
peeping in as well:

[http://img822.imageshack.us/img822/347/screenshot20130111at5...](http://img822.imageshack.us/img822/347/screenshot20130111at524.png)

------
danielharan
Are you willing to pay for this service but need higher accuracy? I'd love to
hear from you.

------
mephi5t0
detected 2 planes, there are 7
[http://iskin.co.uk/wallpapers/imagecache/1280x800/jet_plane_...](http://iskin.co.uk/wallpapers/imagecache/1280x800/jet_plane_formation.jpg)

seems too buggy to pay just yet

------
pkrein
seems pretty good, but my first test found a potted plant in the aeroplane
demo picture -- a 100 story potted plant :P very cool idea, super hard problem
so mad respect regardless!

------
leoplct
I tried to search for a cow in an image of a horse, but it failed

------
sinzone
Hi, would love to have this API on Mashape.com

