The idea was a simple (but clever) one - use virtual reality to segment the world into solid blocks of identified objects. The solid blocks are identifiable to those with poor vision in a way that the real world is not.
Essentially this meant processing an image, identifying items (e.g. cars, fences, roads) and then colouring them solid. So instead of a confusing scene of blur, you have a blurred but still identifiable scene: a solid strip of grey for the road, a solid blob of red for the car, a solid yellow strip for the fence, and so on. A poorly sighted person could still make sense of this in a way that they couldn't of the real world.
What was required was an input, real-time visual processing, and then display back to the user - all of which was fantasy 15 years ago.
However, attempt this today - with a visual feed, real-time processing like this, and near-instantaneous display of the results back to the person via e.g. Google Glass - and you might have a viable way to show the world categorised visually in a way that helps those with poor vision. Interesting times.
According to http://news.ycombinator.com/item?id=4985100 it won't be today, perhaps tomorrow.
Object detection is an extremely tough problem (some would say it is the computer vision problem ;-)), and while we've made a lot of progress in the past decade, the best methods are still terrible -- average detection precision between 30-50%.
For reference, most consumer applications require an AP of 90+% to be considered usable.
So if this is a completely automated solution, it's not going to be able to do
much better, unless the creators can make massive (I mean orders-of-magnitude)
improvements on the state-of-the-art.
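The 30-50% figure refers to average precision (AP), which summarizes a ranked list of detections by averaging precision at each point where a true positive is found. A minimal sketch of the idea, without the interpolation used in benchmarks like PASCAL VOC, and with made-up scores for illustration:

```python
def average_precision(scores_and_labels):
    """Compute a simplified AP for one object class.

    scores_and_labels: list of (confidence, is_true_positive) pairs,
    assumed to include a detection for every ground-truth object.
    """
    ranked = sorted(scores_and_labels, key=lambda p: -p[0])
    total_positives = sum(1 for _, tp in ranked if tp)
    tp_seen = 0
    precisions = []
    for rank, (_, tp) in enumerate(ranked, start=1):
        if tp:
            tp_seen += 1
            precisions.append(tp_seen / rank)  # precision at this recall step
    return sum(precisions) / total_positives if total_positives else 0.0

# A detector that ranks one false positive between two hits:
# precisions at the hits are 1/1 and 2/3, so AP = (1 + 2/3) / 2 ≈ 0.83.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)])
```

Real benchmarks also match detections to ground truth by bounding-box overlap and penalize duplicate detections, which this sketch skips.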
But that being said, there are some applications where lower performance is
acceptable. And if you add some manual verification, you could conceivably
make this much better (with an increase in latency, though). Another possibility
is to specialize on a certain type of input image (e.g., if you're a company
taking photos in your warehouse, where all your photos look very similar and/or
you can control the lighting and environment).
Still, I'm excited to see companies attempting to take object detection out to
the real world. All the best to these guys!
Our current coverage is of the trees of the northeast US (about 200 species), but we are working on expanding that.
In this case, you have to tell it which object you're looking for.
As a computer vision researcher, I am not impressed by this. It is primarily an API for smartphone app makers who want a binary result for detection. It does not help with scene context analysis. For instance, if I have a big picture of an airplane on a wall, it will detect the airplane. Does it know that this airplane is in the sky, or on a wall?
There are a thousand failure cases.
Language dependent/creative tasks run much higher (smaller worker pool, more brain power needed).
We've developed RTFM at CrowdFlower to handle the similar task of moderating images and providing detailed reasons for why they are flagged. It's a common problem that the computers can't solve well enough yet.
Sheep were detected as horses and faces, but not as cats or cars. This seems to be the current state of the art for general-purpose classification. I haven't seen anything better yet (unless you specialize in sheep detection).
One thing that I want to mention: our service was built favoring Precision over Recall; we reasoned that we'd rather have a low number of false positives and make sure that when we do report a detection, that it actually is one. Thus, our service may occasionally miss instances.
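Favoring precision over recall usually comes down to where you set the score threshold for reporting a detection. A toy sketch of that trade-off (the detection scores and labels here are invented for illustration):

```python
def precision_recall(detections, threshold):
    """detections: list of (score, is_correct) pairs.

    Returns (precision, recall) when only detections scoring at or
    above `threshold` are reported.
    """
    reported = [ok for score, ok in detections if score >= threshold]
    total_true = sum(1 for _, ok in detections if ok)
    if not reported:
        return 1.0, 0.0  # nothing reported: no false positives, zero recall
    true_positives = sum(reported)
    return true_positives / len(reported), true_positives / total_true

dets = [(0.95, True), (0.9, True), (0.6, False), (0.5, True), (0.4, False)]
# High threshold: perfect precision, but one real object is missed.
p_hi, r_hi = precision_recall(dets, 0.8)   # precision 1.0, recall 2/3
# Low threshold: everything is reported, recall 1.0 but precision drops.
p_lo, r_lo = precision_recall(dets, 0.0)   # precision 0.6, recall 1.0
```

So "occasionally missing instances" is the expected cost of tuning the threshold upward.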
I'm going to implement a button on the Experiment page that lets you flag a detection as something that we need to work on; we will use your feedback to improve the accuracy.
Your application detects none of them... Is it because my ancient phone camera's pictures are too grainy? Or do the bikes need to be in profile to be detected properly? Or maybe it's trained to detect bikes with people on them, instead of bikes parked in the street?
When I used http://www.airbus.com/fileadmin/media_gallery/aircraft_pages... it detected 2 planes, but there was only 1. I hope that with additional training images it will improve.
This is worse than OpenCV (I thought you were using OpenCV but apparently aren't?)
In any case, the documentation is wrong if it says that. E.g., the software found all seven SEGs in the photo below:
- your pricing won't work for video (even at only 5fps)
- I can't really use the data without a confidence level for each detection, because for some applications I'd rather discard a bounding box that is below a threshold I set.
Other than that, congrats for the great work :)
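If the API did return a per-detection confidence, applying your own threshold client-side would be trivial. A sketch, assuming a hypothetical response shape (the field names below are illustrations, not Dextro's actual schema):

```python
def filter_boxes(detections, min_confidence):
    """Keep only bounding boxes meeting the caller's confidence threshold.

    Each detection is assumed to look like
        {"box": [x, y, w, h], "confidence": 0.87}
    which is a hypothetical shape, not the real API's response format.
    """
    return [d for d in detections if d["confidence"] >= min_confidence]

results = [
    {"box": [10, 20, 50, 40], "confidence": 0.92},
    {"box": [200, 15, 30, 30], "confidence": 0.41},
]
confident = filter_boxes(results, 0.5)  # keeps only the first box
```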
In the meantime, if you want to experiment with Dextro for video, shoot us an email at email@example.com and we will hook you up!
With regard to confidence level, that's something that we provide the enterprise-class service with; if this is a critical feature, we can potentially offer it to everyone as well.
I want to clarify: the 4 object concurrent detection refers to 4 classes of objects. On the Experiment page, you can only choose one class to detect on (whether that is person, bottles, cars, etc). However, by using the API, you can simultaneously search for cars, planes, people, and motorcycles, for example.
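A multi-class request like that might be expressed as a single JSON payload listing the classes to search for. A sketch of assembling such a payload; the endpoint, field names, and class strings are all hypothetical, not Dextro's documented schema, and only the 4-class cap comes from the comment above:

```python
import json

def build_detection_request(image_url, classes, api_key):
    """Assemble a JSON payload requesting several object classes at once.

    All field names here are hypothetical illustrations; consult the
    real API documentation for the actual request format.
    """
    if len(classes) > 4:
        raise ValueError("the service caps concurrent detection at 4 classes")
    return json.dumps({
        "api_key": api_key,
        "image_url": image_url,
        "classes": classes,  # e.g. ["car", "plane", "person", "motorcycle"]
    })

payload = build_detection_request(
    "http://example.com/street.jpg", ["car", "plane"], "MY_KEY")
```

Sending it would then be an ordinary HTTPS POST with that body.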
It got almost all of them, but with so many errors. It can't detect sheep either.
I was really impressed at first, but as I tried out more and more images, it became apparent that the API isn't mature enough to be worth one or two cents per image. Around 90% of the time the algorithm detects correctly, but sometimes it doesn't detect the entire object. For example, I used another image of two jets, and it only found one of them, even though the jets were identical except that one was smaller than the other.
2 horses / detected 0: http://images4.fanpop.com/image/photos/23500000/horse-horses...
4 horses / detected all as 1: http://4.bp.blogspot.com/-Rso9vw4BmSE/TqZU6vHl3kI/AAAAAAAACL...
Any plans to increase the number of objects you can search for at once? Very interested in using this but I'd want to be able to scan for ~20 objects.
The man is detected, but also a shape above the umbrella.
Edit: direct link to the result picture: https://s3.amazonaws.com/dextro_detection_results/debug13579...
* The documentation is pretty weak.
* I am not sure what a classID is, and I don't see any links to where the numbers come from.
* The example request is posting to an insecure http address, yet the secret API key is required?
* The example request doesn't fit on one line? It took me a while to see it was in the "GET / HTTP/1.1" style.
* How do errors work? Having clearly specified error responses would be really useful.
If you're trying to sell me on your API, show it to me.
BTW, it can find only two airplanes in this photo.
It's still in the development stage because I can only fiddle with it when I have the time and impetus to do so. Criticisms/comments welcome.
You need to get a higher percentage of actual matches before you can use this for anything.
I've been searching lately for a post-face.com API and have been following a few for a while, but they seem to have similar issues with poor results.
In the works:
Smartphones and tablets
Cups and glasses
It detected 3 planes... but there is only 1 plane and a car.
It does a good job with paintings too, but it did find the phantom neighbor peeping in as well:
Seems too buggy to pay for just yet.