
Google Cloud Vision API changes the way applications understand images - ingve
http://googlecloudplatform.blogspot.com/2015/12/Google-Cloud-Vision-API-changes-the-way-applications-understand-images.html
======
nicklo
I wonder what this means for computer vision startups like
[http://clarifai.com](http://clarifai.com)?

It's hard to compete with Google on a task like image classification when
Google has immense computational resources, tons of data, and hordes of top
researchers.

~~~
akshayB
Forget new services; even a small change by Google to its existing products
can destroy a lot of companies. Everyone remembers how they ruined email
marketing by caching Gmail images.

~~~
gohrt
Caching images prevents knowing the open rate; it doesn't prevent knowing the
response rate on clicks. If people are reading your emails but not clicking,
how is that valuable?

~~~
duderific
It's also valuable to know the open rate. If people aren't opening your
emails, maybe your subject line isn't "zingy" enough. At least you could A/B
test different subject lines and see whether the open rate changes.
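
For illustration, a quick back-of-the-envelope sketch (with made-up numbers)
of checking whether an open-rate difference between two subject lines is
statistically meaningful, using a standard two-proportion z-test:

```python
import math

def open_rate_z(opens_a, sends_a, opens_b, sends_b):
    """Two-proportion z-test comparing open rates of subject lines A and B."""
    rate_a, rate_b = opens_a / sends_a, opens_b / sends_b
    pooled = (opens_a + opens_b) / (sends_a + sends_b)  # pooled open rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    return (rate_a - rate_b) / se

# |z| > 1.96 roughly means the difference is significant at the 5% level
print(open_rate_z(480, 4000, 520, 4000))
```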

------
abtinf
It's exciting to see more of these services come to market.

IBM Watson has a suite of vision APIs available that have some similar
features.

For example, the demo at
[http://vision.alchemy.ai/#demo](http://vision.alchemy.ai/#demo) has example
images that demonstrate facial detection and identification, label extraction,
object identification, and so on.

Another demo at [http://visual-insights-demo.mybluemix.net/](http://visual-insights-demo.mybluemix.net/)
uses the Visual Insights [1] API to identify a set of relevant tags.

And the recently released Visual Recognition [2] API allows you to train the
model with your own dataset. Demo:
[http://visual-recognition-demo.mybluemix.net/](http://visual-recognition-demo.mybluemix.net/)

Disclosure: I am an evangelist for the Watson Developer Cloud suite of
services at IBM.

[1]: [https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...](https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-insights.html)

[2]: [https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...](https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-recognition.html)

~~~
Veratyr
Wow, that looks fantastic! I've been looking for something like this for ages.

Some feedback though:

- As an independent developer, pricing is important to me. It was very
difficult to find pricing for the Watson APIs (apparently it's in Bluemix?),
and if I hadn't been a little more determined (thanks to the ability to train
my own classifier), I wouldn't have persevered.

- If I already have a wealth of labelled data (I do), it seems difficult to
train a new classifier for the Visual Recognition service. If, for example, I
have 200,000 images each with an average of 20 labels (from a set of ~2,000
labels), gathering positive and negative samples per label is very time- and
bandwidth-consuming: I'd have to train ~2,000 classifiers using ~5,000 images
per classifier (to have plenty of training data), for a total of ~10,000,000
uploads. It'd be far nicer to be able to upload a folder of JPEGs with a JSON
blob per file containing labels (or a classifier name) and have Watson derive
positive and negative samples from it.
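
To make the suggestion concrete, here's a minimal sketch of the kind of
preprocessing I mean, assuming a hypothetical `labels.json` mapping each JPEG
to its labels; it derives per-label positive and negative sets locally, which
the service could just as well do server-side:

```python
import json
import random
from collections import defaultdict

# Hypothetical format: {"img001.jpg": ["dog", "grass"], ...}
with open("labels.json") as f:
    labels_by_file = json.load(f)

# Invert the mapping: every file carrying a label is a positive for it
positives = defaultdict(set)
for filename, labels in labels_by_file.items():
    for label in labels:
        positives[label].add(filename)

all_files = set(labels_by_file)
samples = {}
for label, pos in positives.items():
    # Negatives: images not carrying this label, capped to balance the classes
    neg_pool = sorted(all_files - pos)
    negatives = random.sample(neg_pool, min(len(pos), len(neg_pool)))
    samples[label] = {"positive": sorted(pos), "negative": negatives}

print(f"derived training sets for {len(samples)} labels")
```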

~~~
abtinf
Thank you for the solid feedback!

As a workaround for mass uploading, I might suggest signing up for a 30-day
free Bluemix trial [1]. You could then upload your data to a container and
script the creation of sample archives and uploads from there (a rough sketch
follows).

[1]:
[https://console.ng.bluemix.net/registration/](https://console.ng.bluemix.net/registration/)
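
For instance (the folder layout and per-label sets here are hypothetical,
e.g. the `samples` dict from the comment above), you could build one positive
and one negative zip archive per label with nothing but the standard library:

```python
import zipfile
from pathlib import Path

IMAGE_DIR = Path("images")    # hypothetical local folder of JPEGs
OUT_DIR = Path("archives")
OUT_DIR.mkdir(exist_ok=True)

def build_archives(label, positive, negative):
    """Write <label>_positive.zip and <label>_negative.zip, ready for upload."""
    for kind, files in (("positive", positive), ("negative", negative)):
        archive = OUT_DIR / f"{label}_{kind}.zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for name in files:
                zf.write(IMAGE_DIR / name, arcname=name)

build_archives("dog", ["img001.jpg", "img007.jpg"], ["img042.jpg"])
```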

------
Veratyr
I wonder what this'll mean for startups like
[https://imagga.com/](https://imagga.com/) and how pricing will change.

I also wonder whether they'll offer the ability to train your own classifiers
using their networks...

~~~
GeorgiKadrev
We kind of expected this; it was just a matter of time once they'd announced
the Google Photos capabilities and, more recently, TensorFlow.

Funnily enough, I answered a similar question just a few days ago here:
[http://kaptur.co/10-questions-for-a-founder-imagga/](http://kaptur.co/10-questions-for-a-founder-imagga/)

The bottom line is: we believe we can provide a much better API service with
a competitive level of precision in the underlying technology.

At the same time, we put a lot of emphasis on specific things like custom
categorization training and enterprise/on-premises installations (both
distinct from custom software).

Actually, we don't plan to retreat into a niche market, though some people
suggest that as the proper strategy. We'll give them a good run for their
money on the broad use case. I believe the "hacker" culture works in our
favour here, and I hope you'll help us prove it :)

------
bcherny
I wonder how long it will take for someone to integrate this with Google
recaptcha...

~~~
dubcanada
Using Google Vision API to break Google Recaptcha? Interesting...

~~~
thedangler
They may already have had the foresight to train it on what their reCAPTCHA
images look like.

~~~
bcherny
Sort of like how the old CAPTCHA helped digitize books, people solving
reCAPTCHAs provide free tagging to train Vision's net.

~~~
kuschku
Which is why the data from reCAPTCHA and NoCaptcha should be public domain:
it was created by society, so it should be usable by society.

~~~
checker
If the data were public, the black hats would use the reCAPTCHA and NoCaptcha
data to defeat *Captcha and we'd be back to square one.

~~~
kuschku
Black hats have far easier ways to get at it.

But this data shouldn't belong to a corporation.

It's unacceptable that Google can use this dataset, a product of its monopoly
in this online market, as a competitive advantage in the self-driving-car
market.

By the way, black hats have little demand for trained CAPTCHA solvers;
hiring people in Bangladesh is far cheaper and faster.

------
meirelles
Google is finally releasing its best dogfood as a cloud service. That's
better than copying AWS.

------
obulpathi
I'm wondering how this could be used for domain-specific detection. Say I
want to take pictures of a crop using a drone and analyse the pictures for
pests, diseases, and so on...
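
For what it's worth, here's a minimal sketch of sending one such picture to
the Vision API's label-detection endpoint (the API key and filename are
placeholders); the stock models return generic labels, so pest/disease
detection would presumably still need custom training:

```python
import base64
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; key-based access is an assumption
URL = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

with open("crop.jpg", "rb") as f:  # placeholder drone photo
    content = base64.b64encode(f.read()).decode("utf-8")

body = json.dumps({
    "requests": [{
        "image": {"content": content},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
    }]
}).encode("utf-8")

request = urllib.request.Request(
    URL, data=body, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(request) as response:
    print(json.load(response))  # labelAnnotations with descriptions and scores
```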

~~~
GeorgiKadrev
One of our key features at imagga.com is exactly this: being able to train
your own classifier with our help. Let's talk more if you're interested.

------
davidbarker
I've been going round in circles for the last 15 minutes, trying to sign up
for access. I've tried every ID I can find in my account for the "Google Cloud
Platform user account ID", but none seem to be working, and it won't allow me
to submit the form without a valid ID.

Is anyone else having the same issue/know where I can find this ID?

~~~
crb
As it's for whitelisting, it is generally an e-mail address - the ID you are
logged into Google with when accessing the Cloud console. If that doesn't work
for you, let me know - I work on GCP and I can ping the people who can fix it.

~~~
davidbarker
It worked — thanks!

------
joefkelley
I wonder if there is an API to train this on a custom dataset.

They have a few built-in models, but those seem pretty limiting in terms of
use cases. I can think of a lot more where you'd want to be able to train on
your own specialized images and labels.

~~~
GeorgiKadrev
There are multiple open-source libraries for this; however, optimizing and
iterating is the key. Preparing a dataset that is representative enough and
has no overlap between categories/labels is also quite important.
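
As an illustration of that open-source route, a minimal transfer-learning
sketch with TensorFlow/Keras (a modern API shown purely as an example; the
one-folder-per-label layout under `data/train` is an assumption):

```python
import tensorflow as tf

# Assumes data/train/<label>/*.jpg, one sub-folder per category
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(160, 160), batch_size=32)
num_classes = len(train_ds.class_names)

# Frozen ImageNet backbone plus a small trainable classification head
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```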

This custom categorization is actually one of our most requested services at
Imagga. Here's some info if you're interested:
[http://imagga.com/solutions/custom-categorization.html](http://imagga.com/solutions/custom-categorization.html)

------
datashovel
The default assumption with cloud APIs is that the company offering the
service won't use your data for internal consumption (even when it comes to
training ML models?). I'm assuming that will be the case with this one too?

------
mrfusion
Where can I see the couple hundred lines of Python code that powers the robot?

~~~
crb
You jest, but [https://tensorflow.org](https://tensorflow.org)

~~~
mrfusion
I'm not jesting. They mention it on the page.

------
datashovel
I know this may not be useful early on, since Google may not yet have much of
the kind of data that would make it useful, but I think it makes sense to
have a plan for eventually making it possible for developers to integrate
depth information, for example from an Xbox Kinect or, more recently, a
"Project Tango" tablet.

------
tianlin-shi
This seems like a warning sign for all horizontal companies that claim to
solve AI/CV/ML in X years. If you don't focus on a specific vertical, Google
will beat you.

------
rugdobe
Worth mentioning:
[https://www.projectoxford.ai/](https://www.projectoxford.ai/) from Microsoft

------
Narkov
What about a speech recognition API? That would be awesome!

------
it_learnses
Can this tell you approximate volume or quantity?

------
anindyabd
Great! The recognition seems fairly accurate, based on the examples they
provided (I haven't used Google Photos much myself, though). I'm still wary,
though; I really hope we won't see a repeat of the fiasco of black people
being labeled as gorillas, which happened as recently as earlier this year:
[http://mashable.com/2015/07/01/google-photos-black-people-go...](http://mashable.com/2015/07/01/google-photos-black-people-gorillas/#wk4MIeWqCGqI).
The article mentions that Google was looking into how these mistakes could be
prevented... I wonder what they did/are doing?

~~~
danso
They likely didn't purposefully "do" anything wrong -- at least in the sense
of some racist engineer tampering with the system to have it come out with
those results.

But this is the nature of machine-learning algorithms: the outcome depends on
the process of supervision, on the ability to see what impact feedback has on
the algorithm, and on the quality of the training set it is given. At a
lesser company, the problem could be as simple as very few black people being
represented in the training set, so that when the algorithm sees a
dark-colored, human-like shape, that shape is more "likely" to be a gorilla
(which is human-like and almost always has dark fur) than a human, because
the algorithm was trained mostly on light-skinned humans. The Google Photos
algorithm obviously takes in more kinds of input than visual composition
alone, so there was probably more to it than this.

Or maybe not... who knows? I'm not interested in reviving a discussion about
the importance of diversity in the engineering workforce, but this is one
kind of problem that can slip by the most competent and well-intentioned
engineers simply because they're less aware of how disenfranchisement can
propagate into technical problems, no matter how correct and powerful the
math behind the algorithm.

Another example from a few years back was when HP released an auto-tracking
webcam that became infamous after a black retail employee uploaded a YouTube
video of how the camera ignored him but not his white co-worker:

[http://www.cnn.com/2009/TECH/12/22/hp.webcams/](http://www.cnn.com/2009/TECH/12/22/hp.webcams/)

I'm in 100% agreement that this was likely not HP's intentional fault, and
also that face detection of darker complexions is computationally more complex
than it is for lighter complexions because of how contrast is used by the
algorithm...but I most definitely know that if I were an HP engineer, and if
the CEO and/or my direct boss were black and tried out a prototype that
behaved as it does in the aforementioned YouTube video, there is almost no
fucking way that the product would be released as-is, with my excuse being
"Well, accurately detecting black faces requires a much more complicated
training set -- that's just how math works!"

~~~
anindyabd
I have no doubt that neither Google nor HP made those errors maliciously. I
was just curious as to whether it's possible to incorporate some sort of...
tact?... into these recognition algorithms to avoid labeling people (or other
things) offensively. Is it just a matter of a larger training set? It would
be hard to cover all sorts of people in all sorts of poses, under all sorts
of lighting conditions, etc.

~~~
tyrust
It's not about tact, it's just the algorithm doing its best. In order for the
algorithm to be capable of "tact", it needs to recognize that it's looking at
a person (or whatever). And if it recognized a person, then there wouldn't
have been this problem because it would just label it correctly.

~~~
IanCal
You can certainly include tact. The algorithm might think it's a 51%-49%
gorilla/person split, but a layer above that chooses "person" as the answer
because, even though it'll be wrong more often, the _impact of the error_ is
lower.

This is why you shouldn't just train your system to hit higher accuracy
figures; you should also investigate the kinds of errors it makes, while
thinking about your specific use case and domain.
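
A minimal sketch of that idea, with hypothetical probabilities and a
hand-tuned cost matrix: instead of taking the arg-max class, pick the label
with the lowest expected cost, so a near-50/50 split resolves to the
low-impact answer:

```python
# Hypothetical per-class probabilities from the classifier
probs = {"gorilla": 0.51, "person": 0.49}

# cost[(predicted, actual)]: an offensive mislabel is weighted far more
# heavily than a bland one
cost = {
    ("gorilla", "person"): 100.0,  # high-impact error
    ("person", "gorilla"): 1.0,    # low-impact error
    ("gorilla", "gorilla"): 0.0,
    ("person", "person"): 0.0,
}

def expected_cost(prediction):
    return sum(cost[(prediction, actual)] * p for actual, p in probs.items())

decision = min(probs, key=expected_cost)
print(decision)  # "person": arg-max says gorilla, expected cost says person
```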

