
IBM Watson's Visual Recognition Demo - jonbaer
https://visual-recognition-demo.mybluemix.net/
======
dlau
After trialling several image recognition and categorization API services, I
found Watson by far the least impressive when using the default classifier.

For a project I've been building, I have used Clarifai, Google Vision API,
Watson, and Imagga. In my empirical tests, Watson has consistently produced
poor, if not hilariously nonsensical, results. As an example, I use this
image of a baby in a stroller
([https://dl.dropboxusercontent.com/u/898689/IMG_2948.jpg](https://dl.dropboxusercontent.com/u/898689/IMG_2948.jpg)).
Here are the results in classification tags from the 4 services:

Clarifai: people, child, vehicle, woman, one, man, outdoors, emergency,
accident, protest, adult, transportation system, carriage, portrait,
wheelchair, safety, road, bike, leisure, wheel

Google: baby carriage, car seat, child, vehicle, diving equipment

Watson: performing, escalator, repairing, indoors, celebration, dancing,
human, amusement arcade, bottle, baggage claim, group of people, appliance,
tiger, people, big group, child, mixed color

Imagga: people portraits

These results obviously may vary depending on image subject, composition,
etc., but I've basically dismissed Watson as a viable off-the-shelf visual
recognition API. That said, if you have a specific dataset of images that
positively and negatively exemplify a concept, Watson's custom classifiers
may be of interest, although I haven't tried them.

~~~
dlau
By the way, I'm on the lookout for other image classification services or
open-source projects. I know I can (and will) do a deep dive into TensorFlow
deep learning, but any other suggestions are very welcome!

~~~
pierre
I regularly benchmark all the image captioning approaches, but human
captioning is still WAY better (see
[http://www.cloudsightapi.com/api](http://www.cloudsightapi.com/api), scroll
down for the demo)

~~~
rahmaniacc
And probably way more expensive? But yeah, I have never seen more accurate
captioning for images yet

------
nfriedly
Hey folks, I work on the Watson team at IBM, and wanted to clarify a couple of
things about this demo. The default classifier set is good for getting started
and getting an idea of how the service works, but it's only been trained on a
relatively small assortment of images, hence the sometimes bad results for
arbitrary images off the internet.

However, the real power of the service is the ability to train it for your
specific domain - after setting up one or more custom classifiers, you can get
far more accurate results on the tags that matter for your use case. You can
see a few examples of this and even try it out yourself on the "Train" tab.
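
For anyone curious what setting up such a custom classifier might look like
programmatically, here is a rough Python sketch. The endpoint URL, the
"positive_examples"/"negative_examples" field names, and the api_key query
parameter are all assumptions for illustration, not a transcription of the
documented API:

```python
# Hypothetical sketch: training a Watson-style custom classifier over HTTP.
# The URL and field names below are guesses; check the service docs.
API_URL = "https://gateway-a.watsonplatform.net/visual-recognition/api/v2/classifiers"

def build_training_fields(name, positive_zip, negative_zip):
    """Assemble multipart form fields: a classifier name plus two zip
    archives of example images (raw bytes or file-like objects)."""
    return {
        "name": (None, name),
        "positive_examples": ("positive.zip", positive_zip),
        "negative_examples": ("negative.zip", negative_zip),
    }

def train_classifier(api_key, name, positive_zip, negative_zip):
    import requests  # imported lazily so the sketch loads without it
    resp = requests.post(
        API_URL,
        params={"api_key": api_key},
        files=build_training_fields(name, positive_zip, negative_zip),
    )
    resp.raise_for_status()
    return resp.json()  # the new classifier's id and training status
```

The key idea either way: you upload one zip of positive examples and one of
negative examples, and the service trains a binary classifier from them.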

------
mattygug
I have to say, I was trying out several services to train my own image
aesthetics classifiers, and for that this service is pretty neat. I trained
it on 50 macro images in nature and 50 non-macro images in nature; 7/10
served images were classified correctly. For comparison, with Caffe or other
open-source libraries I need to train with more than 10k images to get the
same results, plus set up the whole server myself. Assembling 10k samples can
be quite tiring.
[https://www.dropbox.com/s/5gfhbrftu4z8xr4/Screenshot%202016-...](https://www.dropbox.com/s/5gfhbrftu4z8xr4/Screenshot%202016-03-20%2016.58.10.png?dl=0)

------
stewbrew
An image of the sea (e.g.,
[https://upload.wikimedia.org/wikipedia/commons/8/8e/Lines_of...](https://upload.wikimedia.org/wikipedia/commons/8/8e/Lines_of_sargassum_Sargasso_Sea.jpg))
returns whale or dolphin. I'm still waiting for one to show up.

This isn't actual visual "recognition" but rather pattern matching, I guess.

------
mrloop
Can anybody offer an explanation as to why an image of a tiger can have a
confidence score of 99% as a tiger but only an 84% confidence score as an
animal? I guess the system has no concept of taxonomy.

~~~
GrantS
The classifiers were probably trained independently and are running
independently here as well, not in a hierarchical fashion.

Another way of saying this is that while the tiger training data consisted
entirely of tigers (for positive training examples), the animal training
data _might_ not have had any tigers at all (however unlikely) -- i.e. it
could be recognizing the tiger image as similar enough to the dogs, cats,
and lions it did see to trigger the animal classifier, but with only 84%
confidence.
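
As a toy illustration of why independent classifiers can disagree with
taxonomy, here are two logistic classifiers over the same feature vector.
Every number is invented for the example; the point is that nothing couples
the two models, so nothing forces P(animal) >= P(tiger):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up features for a tiger image and independently trained weights.
features = [1.0, 0.8, 0.3]          # e.g. stripes, fur, outdoor scene
tiger_weights = [4.0, 1.5, 0.2]     # trained on tiger vs. non-tiger
animal_weights = [0.5, 1.5, 0.1]    # trained on animal vs. non-animal

def confidence(weights, feats):
    return sigmoid(sum(w * f for w, f in zip(weights, feats)))

p_tiger = confidence(tiger_weights, features)
p_animal = confidence(animal_weights, features)
# The specific label scores higher than the general one (~0.99 vs ~0.85),
# mirroring the 99% tiger / 84% animal result from the demo.
```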

------
femto113
Less than amazing. This pancake got "95% moon" and nothing else:
[http://www.hoborecipes.com/images/pancake.png](http://www.hoborecipes.com/images/pancake.png)

~~~
femto113
It does recognize these however: [https://yurielkaim.com/wp-
content/uploads/2015/09/Why-You-Sh...](https://yurielkaim.com/wp-
content/uploads/2015/09/Why-You-Shouldnt-Eat-Pancakes-for-Breakfast.jpg)

This is the biggest gap I find in existing classifiers: the inability to make
leaps from trained images (a stack of pancakes) to others (a single pancake)
that are trivial for humans. I suspect this is because most scoring
algorithms for rating classifiers don't appropriately penalize absurdly wrong
answers, so what we wind up with are systems that are simply very good at
matching a new image to a similar reference image, rather than really
"figuring out" what the picture is of.

------
ambirex
This appears to be the rebranded AlchemyVision product from AlchemyAPI (bought
by IBM last year).

~~~
pesenti
No. There is added functionality: it allows you to build your own classifier.

------
danso
I've used this in teaching a class before as an example of a computer vision
API...it's important to understand what it does _not_ do...it doesn't have a
huge vocabulary of trained words...and just a couple of months ago, what it
was trained on was...not very optimal for showing off the product.

For example, this image of President Obama on the phone at his desk:

[https://farm2.staticflickr.com/1445/24129414022_f89da8ea52_b...](https://farm2.staticflickr.com/1445/24129414022_f89da8ea52_b_d.jpg)

\--- currently is interpreted by Watson's API as "90% person"...a month ago,
when I was demoing the product, it also returned "person"...but a whole
variety of other things too...most notably, "Flag Burning"...it was only when
you dove into the API and requested the list of default classifiers that you
saw the API contained a lot of specific terms but was by no means
comprehensive...so "Flag Burning" had been trained, but not plain "Flag". It
seems they've cut down the vocabulary at this point, which is good, because
it shouldn't be judged against APIs that purport to do face/celebrity
recognition and have been trained for that.

I think the key advantage of this API is that it's the only one that I had
seen that allowed you to build your own classifier. Here's an example I found
in the wild, of someone building a classifier to differentiate between M1A1
Abrams tanks and...non-Abrams tanks (you upload a set of positive and negative
images to the API):

[http://cmadison.me/2015/12/03/classifying-tanks-with-the-
ibm...](http://cmadison.me/2015/12/03/classifying-tanks-with-the-ibm-visual-
recognition-service/)
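
Once a classifier like that exists, querying it is just another multipart
POST. A rough Python sketch, with the endpoint and field names guessed for
illustration (not taken from the documented API):

```python
# Hypothetical sketch: scoring an image against chosen classifiers.
# The URL and field names below are illustrative guesses only.
CLASSIFY_URL = "https://gateway-a.watsonplatform.net/visual-recognition/api/v2/classify"

def build_classify_fields(image_bytes, classifier_ids=()):
    """Multipart fields: the image, plus an optional comma-separated
    list of custom classifier ids (empty means the default set)."""
    fields = {"images_file": ("image.jpg", image_bytes)}
    if classifier_ids:
        fields["classifier_ids"] = (None, ",".join(classifier_ids))
    return fields

def classify(api_key, image_bytes, classifier_ids=()):
    import requests  # imported lazily so the sketch loads without it
    resp = requests.post(
        CLASSIFY_URL,
        params={"api_key": api_key},
        files=build_classify_fields(image_bytes, classifier_ids),
    )
    resp.raise_for_status()
    return resp.json()  # labels with per-classifier confidence scores
```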

Here are some quick notes I wrote for my class...I have no idea if they apply
exactly today:

[https://github.com/compciv/watson-preview](https://github.com/compciv/watson-
preview)

And here's an example of the Watson classifiers performing on sample White
House images:

[http://stash.compciv.org/samples/watson-
preview/printout.htm...](http://stash.compciv.org/samples/watson-
preview/printout.html)

My favorite is the pic of Obama and the Pope, of which the top classifier is
"Coffee_Maker -- 0.681709"

[http://stash.compciv.org/samples/watson-
preview/pics/popehou...](http://stash.compciv.org/samples/watson-
preview/pics/popehouse.jpg)

Now...the image is classified as: 99% Person, and 80% wedding...which,
actually, isn't the worst guess :)

------
Scryptonite
It classified this Labrador-"I HAVE NO IDEA WHAT I AM DOING"-meme:
[http://i.imgur.com/ZQaM2D7.jpg](http://i.imgur.com/ZQaM2D7.jpg) as vip: 45%,
moustache: 45%, person: 26%

[http://i.imgur.com/FYkxCNS.png](http://i.imgur.com/FYkxCNS.png)

~~~
AnkhMorporkian
Well, if it said VID instead it'd definitely be accurate, as that's a Very
Important Doggy.

~~~
sildur
It's a Very Important Perro

------
jordache
Epic fail... bicycles are pretty easy things to recognize:

[http://builtbyswift.com/wp-
content/uploads/2015/10/2015_Swif...](http://builtbyswift.com/wp-
content/uploads/2015/10/2015_Swift_Industries_St_Helens_Ride_08_22_0071-2000x800.jpg)

[http://theradavist.com/wp-
content/uploads/2016/03/DRIVESIDE_...](http://theradavist.com/wp-
content/uploads/2016/03/DRIVESIDE_FINAL-1200x798.jpg)

[http://theradavist.com/wp-content/uploads/2016/02/morgan-
tay...](http://theradavist.com/wp-content/uploads/2016/02/morgan-taylor-kona-
sutra-ltd-1-1200x800.jpg)

------
ttrbls
Fail

[https://s3.amazonaws.com/uploads.hipchat.com/55839/380188/4O...](https://s3.amazonaws.com/uploads.hipchat.com/55839/380188/4OkflaNuiW5VnGc/Screen%20Shot%202016-03-22%20at%202.51.53%20PM.png)

------
thecopy
Not really impressed. It did not recognize Naruto's selfie as a monkey:
[http://i.imgur.com/Pv72ZEZ.jpg](http://i.imgur.com/Pv72ZEZ.jpg)

It thought "bird" was the most probable label.

------
rocky1138
Doesn't do nudes.

~~~
kev009
Some system will inevitably do this, and the ethics will be interesting.
Revenge porn, accidental sharing, "anonymous" sharing, etc. might today have
some level of implicit anonymity due to the sheer volume of content, but if a
machine can precisely tag the media to a person, decisions made before the
technology existed will have a retroactive impact on people's reputations,
embarrassment, and privacy. It will probably be easy and popular to ridicule
someone for poor judgment when this happens, but that is quaint with the
benefit of hindsight once it becomes an all-encompassing dragnet.

I've heard from teachers that children these days have an interesting new
adversary in social media. Conversely, I've heard from people a bit older
than me that they are grateful things like Facebook didn't exist when they
were teens. I'm in my mid 20s, so I was perhaps in the first WWW-native
cohort (I started using Netscape in early 1995), but social media was
originally text (AIM) when I was particularly stupid, and then Myspace and
Facebook caught on at high-school age, so I kind of straddled the two
generations.

------
Quasimoto3000
Has IBM been able to do anything with Watson besides Jeopardy yet?

~~~
th0ma5
It has been aggressively marketed, does that count? It is essentially, IMHO,
a sales funnel into data science consulting services. They have some
interactive demos you can play with, but the real deal is a whole suite of
ML techniques that they've either already built or can custom-tailor to your
needs. I believe it is managed by Apache UIMA. But if you're already doing
this stuff, you may have a leg up in-house.

Anyway...you could probably think of it the way you might think of web
development solutions: there isn't one thing, but a suite of all kinds of
things, and that, IMHO, is Watson today. I think they have a big cloud
offering as well to run the models they have or that you come up with, but
it isn't as if Watson is the single computer that won Jeopardy and they're
now letting everyone use that exact computer... I mean it is, but it also
isn't.

------
znpy
Meh.
[http://oi65.tinypic.com/11uagc6.jpg](http://oi65.tinypic.com/11uagc6.jpg)

------
guilamu
Very bad. Google Photos is really far more powerful and accurate.

~~~
propogandist
Hundreds of millions of reCAPTCHA challenges are solved daily by humans,
training Google's machine learning algorithms.

This obviously requires a lot of training.

