
Google Cloud Vision API enters Beta - axelfontaine
http://googlecloudplatform.blogspot.com/2016/02/Google-Cloud-Vision-API-enters-beta-open-to-all-to-try.html
======
abtinf
Disclosure: I am an evangelist for the Watson Developer Cloud suite of
services at IBM.

The new wave of vision services is amazing. There are a lot of players in
this field, including IBM Watson, which has a suite of vision APIs available
with similar features.

One key differentiator of the Watson offering is that we have a _trainable_
API called Visual Recognition [2]. The pre-trained APIs are excellent and have
broad uses, but it's amazing to see the results from even basic training to
identify image tags directly relevant to your use case. There is a demo [3]
that allows you to try it out by creating a new classifier right in the web
page.

You can find some demos at:

[http://vision.alchemy.ai/#demo](http://vision.alchemy.ai/#demo) \- example
images that demonstrate facial detection and identification, label extraction,
object identification, and so on.

Another demo at [http://visual-insights-demo.mybluemix.net/](http://visual-insights-demo.mybluemix.net/)
uses the Visual Insights [1] API to identify a set of relevant tags.

[1]:
[https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...](https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-insights.html)

[2]:
[https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...](https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-recognition.html)

[3]: [https://visual-recognition-demo.mybluemix.net/](https://visual-recognition-demo.mybluemix.net/)

~~~
teh
Do you have pricing information available anywhere? All I can see is [1] but
that's not really interesting. Compare with [2] which makes it quite easy to
guess how much I'd pay.

[1] [http://www.alchemyapi.com/products/contact-sales](http://www.alchemyapi.com/products/contact-sales)

[2]
[https://cloud.google.com/vision/pricing](https://cloud.google.com/vision/pricing)

~~~
yrezgui
Hi Teh, I'm an IBM Watson Developer Evangelist. The pricing is available here
when you click on the Standard plan:
[https://console.ng.bluemix.net/catalog/alchemy_api/](https://console.ng.bluemix.net/catalog/alchemy_api/)

~~~
philip142au
You make it too inconvenient to get started compared to the others; it's
about convenience and simplicity for onboarding people.

~~~
yrezgui
Hi Philip, can you drop me a mail at yrezgui [at] uk.ibm.com, I would be happy
to have a chat with you :)

------
tegansnyder
I've been using this since the private beta to enrich my eCommerce crawlers
with product identifiers not found on the content of an eCommerce product
page, but found in the product image itself. Imagine a part number or UPC
displayed on a product box, but nowhere in the HTML content of the product
page. Using the Google CV OCR feature I can extract meaningful product data
from an image to complement my existing crawl data. It works great.
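A minimal sketch of the kind of post-processing this implies, assuming the OCR text has already come back from the API: find 12-digit runs and keep only those whose UPC-A check digit is valid. The helper names `find_upcs` and `upc_a_is_valid` are hypothetical, not part of any Google library, and this is not a description of the commenter's actual crawler.

```python
import re

# 12 consecutive digits not embedded in a longer digit run.
UPC_RE = re.compile(r"(?<!\d)\d{12}(?!\d)")


def upc_a_is_valid(code: str) -> bool:
    """Return True if a 12-digit string has a correct UPC-A check digit."""
    if len(code) != 12 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    # Positions 1, 3, ..., 11 (indices 0, 2, ..., 10) are weighted by 3.
    odd_sum = sum(digits[0:11:2])
    # Positions 2, 4, ..., 10 (indices 1, 3, ..., 9) are weighted by 1.
    even_sum = sum(digits[1:10:2])
    check = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return check == digits[11]


def find_upcs(ocr_text: str) -> list[str]:
    """Return valid UPC-A codes found in OCR text, in order of appearance."""
    # Join digit groups split by a space or hyphen, e.g. "0 36000 29145 2".
    # Caveat: this can also glue a UPC to an adjacent, unrelated number.
    compact = re.sub(r"(?<=\d)[ \-](?=\d)", "", ocr_text)
    return [m for m in UPC_RE.findall(compact) if upc_a_is_valid(m)]


if __name__ == "__main__":
    print(find_upcs("UPC 0 36000 29145 2 printed on the box"))
```

The check digit does most of the filtering: a random 12-digit part number has only about a 1 in 10 chance of validating by accident.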

~~~
Laaw
Out of curiosity, why are there _so many_ people crawling sites for prices
like what you're talking about?

All of the freelance sites are... well, crawling with jobs involving web
scrapers. What's the business here that I'm missing?

~~~
tegansnyder
I'm not crawling for price. Knowing where our products are being sold online
is important to understanding the landscape for a large manufacturer.
Understanding opportunities to develop sales relationships we may or may not
have directly, and knowing category expansion opportunities, are also important.

------
jfoster
I feel like this is some really compelling tech. It would be so amazing to
build stuff with this in mind. I wouldn't be comfortable doing it, though.
This sort of API is available only until Google decide that they don't want it
to be available. There's not really anything close to equivalent that you
could drop in to replace it if it were being shut down, the price were being
hiked, or you had some sort of other issue with it.

I'm not trying to pick on Google for shutting things down; I would feel
similarly if this API were from Microsoft or Facebook. It's not the first time
there's been an API that I think is really cool, but that I was very
apprehensive about actually using for anything serious.

~~~
nl
TensorFlow + TensorFlow Serving + Google ReCeption model, plus optionally an SVM
on ReCeption features for your custom detection. All that code and the
pretrained model are open source. There's some engineering to glue it together,
and some extra work for the easier, non-image classification parts.

There is also [http://www.deepdetect.com](http://www.deepdetect.com)

~~~
misiti3780
Can you provide some links?

~~~
nl
TensorFlow: [https://www.tensorflow.org/](https://www.tensorflow.org/)

TensorFlow Serving:
[https://github.com/tensorflow/serving](https://github.com/tensorflow/serving)

ReCeption (actually they call it Inception v3. Not sure where I got the
ReCeption name, though I'm sure I read it somewhere?):
[https://www.tensorflow.org/versions/r0.7/tutorials/image_rec...](https://www.tensorflow.org/versions/r0.7/tutorials/image_recognition/index.html)

Using an SVM on neural network extracted features:
[http://blog.christianperone.com/2015/08/convolutional-neural...](http://blog.christianperone.com/2015/08/convolutional-neural-networks-and-feature-extraction-with-python/)

If you want a quick and dirty version here's some Python to create a web
service that calls a Caffe based Image recognizer:
[https://gist.github.com/nlothian/c3519adb81b3452c1938](https://gist.github.com/nlothian/c3519adb81b3452c1938)
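The "SVM on neural network extracted features" step can be sketched without any framework, assuming the feature vectors (for example, activations from a pretrained Inception v3 layer) have already been extracted. This uses Pegasos-style sub-gradient descent on the regularised hinge loss, which is one common way to train a linear SVM and not necessarily what the linked post does; in practice you would more likely use a library such as scikit-learn's `LinearSVC`. `train_linear_svm` and `predict` are hypothetical names.

```python
def train_linear_svm(features, labels, lam=0.01, epochs=200):
    """Train a linear SVM with Pegasos-style sub-gradient descent on the hinge loss.

    features: equal-length lists of floats (e.g. pre-extracted CNN activations).
    labels:   +1 / -1 class labels, one per feature vector.
    A constant 1.0 is appended to every vector so the last weight acts as a bias;
    it is regularised along with the other weights, a common simplification.
    """
    xs = [list(x) + [1.0] for x in features]
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        # Fixed visiting order (instead of random sampling) keeps the sketch deterministic.
        for x, y in zip(xs, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink step from the L2 regulariser.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Point is inside the margin: step along the hinge-loss sub-gradient.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 for one feature vector using weights from train_linear_svm."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy stand-in for extracted features: two well-separated clusters.
    feats = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    labels = [1, 1, -1, -1]
    w = train_linear_svm(feats, labels)
    print([predict(w, x) for x in feats])
```

Real CNN features have thousands of dimensions, where pure-Python loops get slow; the structure of the update stays the same with a vectorised library.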

~~~
misiti3780
thanks!

------
netinstructions
I was looking into label detection APIs (and Google's offerings as well) for a
silly game/website I was thinking of writing, but $5 per 1000 images is way
too steep, especially if each user is submitting 1-5 images per interaction
with the website. The $2 per 1000 images price they mention on the blog post
is only if you're doing 5+ million images a month.

I played with IBM Watson visual recognition API and it didn't look like it did
what I needed it to (recognize a hand drawn image of a cat for example -- it
just kept labeling it only as a 'cartoon').

Bummer. At least the first 1000 images are free so I can prototype it out of
curiosity.

~~~
tommoor
Agreed, this is an exciting API, but the pricing puts it out of reach for a lot
of applications.

~~~
spoiler
I'd like to preface this by saying that I have next to zero experience with
pricing/charging for services like this, so what I say might sound naïve,
but…

Is it really _that_ expensive, though? I can see it being expensive if
you use it inside a product you offer for free, but if it's a commercial
product, I imagine the pricing isn't that much, especially if you consider that
using your own infrastructure for this type of machine learning and image
scanning would be much, much more expensive.

~~~
tommoor
Once the networks are trained it's probably very cheap to run the service.

But I guess that's my point: the pricing puts it out of reach of free or
cheaper subscription products. Even in the robot example they give in the
video, achieving that functionality would mean taking a picture every couple
of seconds or so and uploading it to the cloud; at that rate it would cost
almost $5 for 15 mins of use!
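The arithmetic behind these estimates is easy to sketch. This assumes a flat $5 per 1,000 images (the rate quoted earlier in the thread), ignores volume tiers, and takes the free allowance as a parameter; the function names are hypothetical. At that rate, one image per second for 15 minutes is 900 images, or $4.50, while one image every two seconds is 450 images, or $2.25.

```python
def vision_cost_usd(images: int, price_per_1000: float = 5.0, free_images: int = 0) -> float:
    """Estimated bill for `images` requests at a flat per-1000 rate (volume tiers ignored)."""
    billable = max(0, images - free_images)
    # Multiply before dividing so whole-image counts stay exact where possible.
    return billable * price_per_1000 / 1000


def images_per_session(minutes: float, seconds_per_image: float) -> int:
    """Number of images captured in a session at a fixed capture interval."""
    return int(minutes * 60 // seconds_per_image)


if __name__ == "__main__":
    for interval in (1, 2):
        n = images_per_session(15, interval)
        print(f"one image every {interval}s for 15 min: {n} images, ${vision_cost_usd(n):.2f}")
    # Per-interaction cost for a site where each user submits up to 5 images.
    print(f"5 images: ${vision_cost_usd(5):.3f}")
```

The same function covers the game/website case above: an interaction with 1 to 5 images costs between half a cent and two and a half cents at the flat rate.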

------
chippy
I couldn't find a specific legal SLA for this new service. Does anyone know
if:

1) by using the service, you grant Google use of the uploaded images (e.g.
they can use your image to grow their corpus, improve the service, use it
for advertising, or extract street numbers for their maps), or whether it's
always private and never stored.

2) What the copyright status of the returned data is. If you were to build
a database based on the results, what license or copyright status would it
have? Would all rights belong to me, or would Google claim rights over the
results?

~~~
SEMW
If there's no service specific T&Cs for this, it falls under their general
cloud T&Cs:
[https://cloud.google.com/terms/](https://cloud.google.com/terms/). s.5:

> 5.1 Intellectual Property Rights. Except as expressly set forth in this
> Agreement, this Agreement does not grant either party any rights, implied or
> otherwise, to the other’s content or any of the other’s intellectual
> property. As between the parties, Customer owns all Intellectual Property
> Rights in Customer Data and the Application or Project (if applicable), and
> Google owns all Intellectual Property Rights in the Services and Software.

> 5.2 Use of Customer Data. Google will not access or use Customer Data,
> except as necessary to provide the Services to Customer.

~~~
chippy
Thanks! I also note that the API pages each state "This is a Beta release of
Google Cloud Vision. This API is not covered by any SLA"

~~~
rajivm
SLA is not the same as T&C. No SLA means there is no uptime guarantee.

------
ig1
If the OCR is good, then they're totally burying the lede: its pricing is 100x
cheaper than commercial OCR APIs.

It's potentially a game changer; plenty of industries have piles of scanned
documents. Cheap OCR means this data suddenly becomes accessible even if the
value per individual document is low (e.g. for input into machine learning).

~~~
mgpc
Paper from a few years ago comparing Google's OCR system to commercially
available benchmarks:

[http://www.educatingsilicon.com/wp-content/uploads/2013/10/p...](http://www.educatingsilicon.com/wp-content/uploads/2013/10/photoocr_iccv_paper.pdf)

A lot better for text in photographs. Comparison might be different on dense
document text though.

------
jfoster
I don't know for certain, but I suspect that Google utilized images from the
web in training this system. Even if they didn't, suppose they had. I think
this can raise an interesting question around copyright.

In training an AI system with hundreds/thousands of bits of data, no single
piece of training data makes much of a difference. If one of my images on the
web that I had captioned with the keyword 'dog' was used to train this system
about what a dog looks like, is the model they end up with a derivative work
of my captioned image? Yes, but my data would make up an infinitesimally small
part of that model. Yet, in aggregate, the trained model might almost wholly
rely on lots of copyrighted, rights-reserved images.

Would the resulting model be a copyright infringement? It would seem as though
no rights owner would have a substantial enough claim. Yet, without all of the
copyrighted works, perhaps the model would be ineffective.

~~~
jaxondu
By using this API, we're effectively training Google's system to be more and
more accurate. Shouldn't Google pay us for using it? Just saying :-)

~~~
bduerst
Not quite. The terms and conditions say that you own the IP and that it isn't
used beyond providing the immediate service, meaning they aren't using it to
train future models.

Besides, it's a call without any feedback, so it's not that valuable as far as
training goes.

------
stevesearer
I would definitely pay for a WordPress plugin that uses this as I manually tag
photos on my site with a lot of standard things this could probably just knock
out in a flash.

~~~
arondeparon
Hi, noticed your comment and just wanted to let you know that, during the
alpha testing phase, I actually made this. It's somewhat rough and unreleased
now, but I am wondering what you would expect from such a plugin. Could you
answer some of these questions?

\- Would you primarily use the plugin to auto-tag uploads in the admin area?
\- Would you have problems with getting your own Google API key (and thus
billing via Google), or would you expect the plugin to take care of this as
well?
\- Any other expectations?

As said, what I have now is not really release-worthy, but I can get it to
that point in a matter of days.

Thanks.

\- Aron

------
jrnkntl
Just tried a couple of images with digits to test out the OCR w/ the
TEXT_DETECTION setting; unfortunately it assumes what it reads is a defined
language with words. I am looking into using this for digit recognition, and
only digits, but that doesn't seem to be a supported use case (as it is now).
Does anybody know of another service/API that can do reliable digits-only OCR
on images that aren't of the finest quality?
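One workaround worth trying before switching services is to post-filter the word-level results. This sketch assumes the v1 JSON response shape, where the first entry in `textAnnotations` holds the full detected text and later entries hold individual words. The optional letter-to-digit substitutions (O to 0, I/l to 1) are a hypothetical heuristic, not an API feature, and they cannot fix digits the OCR misread as other digits. `digit_tokens` is a hypothetical helper name.

```python
import re

# Hypothetical heuristic: letters that OCR commonly confuses with digits.
CONFUSABLES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})
DIGITS_ONLY = re.compile(r"[0-9]+")


def digit_tokens(image_response: dict, fix_confusables: bool = False) -> list[str]:
    """Return word-level tokens from one TEXT_DETECTION result that are all digits.

    `image_response` is one element of the `responses` list returned by
    images:annotate. Element 0 of `textAnnotations` is the whole text block,
    so only the per-word entries after it are examined.
    """
    annotations = image_response.get("textAnnotations", [])
    words = [a.get("description", "") for a in annotations[1:]]
    tokens = []
    for word in words:
        candidate = word.translate(CONFUSABLES) if fix_confusables else word
        if DIGITS_ONLY.fullmatch(candidate):
            tokens.append(candidate)
    return tokens
```

Substituting only when the whole token then becomes digits keeps ordinary words (like "Total") from being mangled into the output.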

~~~
mikevin
Weird, their example at [https://github.com/GoogleCloudPlatform/cloud-vision/tree/mas...](https://github.com/GoogleCloudPlatform/cloud-vision/tree/master/python/text)
shows digits. Your best bet is probably ABBYY if you want something else.
if you want something else.

------
swampthinker
"joyLikelihood:VERY_LIKELY"

While this technology is fascinating, I can't help but feel a little unsettled
reading that.

------
mayank
Interesting that this is released in closed-source, API-only form, rather than
the open-code model taken by TensorFlow. I wonder how far you could
approximate the model by training a learner on the API responses.

------
Jabbles
Is the best way of sending an image still base64 encoding it in JSON?

~~~
dsp1234
If 'best' means most compatible with varying types of systems large and small,
then yes.
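For reference, a minimal sketch of that base64-in-JSON body, assuming the v1 `images:annotate` REST method with its `requests` / `image.content` / `features` layout. It only builds the JSON and does not send it; `build_annotate_body` is a hypothetical helper name.

```python
import base64
import json


def build_annotate_body(image_bytes: bytes, feature_type: str = "LABEL_DETECTION",
                        max_results: int = 10) -> str:
    """Build a JSON body for the v1 images:annotate method with the image inlined as base64."""
    content = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "requests": [
            {
                "image": {"content": content},
                "features": [{"type": feature_type, "maxResults": max_results}],
            }
        ]
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Placeholder bytes; read a real file with open(path, "rb").read().
    print(build_annotate_body(b"not really a jpeg", "TEXT_DETECTION"))
```

The body would then be POSTed to `https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY` (placeholder key). The trade-off is size: base64 turns every 3 bytes into 4 characters, so payloads grow by about a third. The current v1 reference also documents `image.source.gcsImageUri` for images already stored in Cloud Storage.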

------
dzhiurgis
Is there any website where I can just upload a pic and see how it works
without having to figure out how to access their API?

~~~
ahamino
Try this app:

[https://play.google.com/store/apps/details?id=com.affectiva....](https://play.google.com/store/apps/details?id=com.affectiva.affdexme&hl=en)

[https://itunes.apple.com/us/app/affdexme/id971529011?mt=8](https://itunes.apple.com/us/app/affdexme/id971529011?mt=8)

------
miltonmoura
This is great news. I have been working on a Swift framework for using this
API in OSX and iOS
([https://github.com/mgcm/CloudVisionKit](https://github.com/mgcm/CloudVisionKit))
and I was wondering when it (the API) would become available for public use.

~~~
ramramanathan1
With the Beta release, it is available for public use.

------
misiti3780
When I hit "go to api console" i get the following:
[https://www.dropbox.com/s/xsysabgywa4t5mm/Screenshot%202016-...](https://www.dropbox.com/s/xsysabgywa4t5mm/Screenshot%202016-02-18%2017.09.14.png?dl=0)

------
dk8996
At Cortex ([http://www.meetcortex.com/](http://www.meetcortex.com/)) we are
using this and technology like it to help brands be smarter about marketing
content on social media. Really cool stuff.

------
ahamino
Affectiva offers SDKs for facial expression and emotion analysis from images
that work in realtime and offline without having to send images to the cloud.

[http://developer.affectiva.com](http://developer.affectiva.com)

disclaimer: I work for them.

------
SuperPaintMan
There is mention of GCV being able to calculate various image properties
(dominant colour being the example), yet there is no reference to what it
actually returns in the API docs.

Can someone who has this active shed some light?
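For reference, the current v1 API reference documents the `IMAGE_PROPERTIES` feature as returning `imagePropertiesAnnotation.dominantColors.colors`, a list of entries each with a `color` (red/green/blue channels), a `score`, and a `pixelFraction`; the beta-era response may have differed. A minimal sketch that ranks those entries by score and converts them to hex codes, assuming the channels arrive as 0 to 255 values (as in the response examples in Google's docs) and that zero-valued channels may be omitted from the JSON. `dominant_colors` is a hypothetical helper name.

```python
def dominant_colors(image_response: dict, top: int = 3) -> list[tuple[str, float]]:
    """Return (hex colour, pixel fraction) pairs for the highest-scoring dominant colours."""
    colors = (
        image_response.get("imagePropertiesAnnotation", {})
        .get("dominantColors", {})
        .get("colors", [])
    )
    ranked = sorted(colors, key=lambda c: c.get("score", 0.0), reverse=True)
    result = []
    for entry in ranked[:top]:
        rgb = entry.get("color", {})
        # Missing channels are treated as 0; values are clamped to 0..255 before formatting.
        channels = [min(255, max(0, round(rgb.get(k, 0)))) for k in ("red", "green", "blue")]
        hex_code = "#{:02x}{:02x}{:02x}".format(*channels)
        result.append((hex_code, entry.get("pixelFraction", 0.0)))
    return result
```

`score` and `pixelFraction` are reported separately, so a colour can rank highly while covering only a small part of the image; this sketch returns both so the caller can decide.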

------
kevando
Google is scary good at releasing scary technology in a friendly box.

------
kauegimenes
Anyone tried using the API to solve captchas?

~~~
tommoor
"Google Cloud Captcha API"

------
chenster
Would it be possible to do product recognition, such as brand and model, from
images without labels?

------
afro88
Does something like this exist for sound? Any open source projects worth
looking at?

~~~
killahpriest
[http://clarify.io/](http://clarify.io/)

------
piyushmakhija
Can we find the dimensions of things in a photo using this api?

------
misiti3780
Am I the only one who signed up but can't get access to it?

------
nchiring
High price. May be suitable for MVP.

------
alwaysdoit
[http://xkcd.com/1425/](http://xkcd.com/1425/)

Errata: I'll need a research team and a year and a half.

