Google Cloud Vision API changes the way applications understand images (googlecloudplatform.blogspot.com)
262 points by ingve on Dec 2, 2015 | 73 comments



I wonder what this means for computer vision startups like http://clarifai.com?

It's hard to compete with Google on a task like image classification when Google has immense computational resources, tons of data, and hordes of top researchers.


Google has a long history of failing to productize otherwise brilliant technology. Google Analytics is free and yet Mixpanel, Kissmetrics, Intercom, and a whole raft of analytics startups appear to be thriving by solving specific problems. AI and Computer Vision is a big space.


I think you're forgetting the reason why Google Analytics is free. It's not to generate profit, it's to collect data.


His point still stands. With data as the focus, GA should be built in such a way that users don't see the need to use Mixpanel and the others.

But that is not the case, which suggests Google isn't that good at productizing stuff.


They can pivot to selling their technology to companies that don't want to expose their data to public cloud providers, or become more like a consulting company for customized vision technology, though.


Hi Nick! I think every startup considers "what if Google launches into direct competition with me" at some point. We've definitely been preparing for this day (oh, I work at Clarifai). From a sheer technology standpoint, Google is the big gorilla in the room. But, computational resources and data are basically commodities. What matters is what you do with the data - how you train the models (just like different media companies, each machine learning company has its own editorial voice when it comes to tagging), custom training and domain models, the level of service you provide to developers, the integrations you have with other services, etc. These are all things we feel like we do better, and we'll continue to make our API the most developer-friendly one out there. In the immortal words of Coach Taylor, "Clear eyes, full hearts, can't lose." :D


Oh and shameless plug here, if you want to try video recognition, the Clarifai API already does it: http://clarifai.com/


Forget new services, even a small change in their existing products by Google can destroy lot of companies. Everyone remembers how they ruined email marketing by caching gmail images.


From what I know, it's still possible to track Gmail opens and the change Google made in fact made it easier by making image loading default to on.

When you open an email with images, Google proxies the request for the tracking image and then caches it. If each recipient gets a unique tracking image, you know when it was opened: Google doesn't fetch the image until the first open, so that first proxied request still tells you the message was opened.

What you possibly lose is repeat opens which might end up with the cached image.

That seems like a loss, but with this change, Google turned on images by default. So you get loads of Gmail users loading images by default rather than the old way where many more people would be loading your message with images off.

MailChimp has a little write-up about it: https://blog.mailchimp.com/how-gmails-image-caching-affects-...

tl;dr: Google's change didn't stop marketers from knowing you opened the email, but did potentially block cookie sending, IP address information, referrer information, etc.
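In (hypothetical) code, the per-recipient pixel scheme described above looks something like this. All names and URLs here are made up for illustration; the point is that the first request (even via Google's proxy) still reveals the open, while repeat opens may be absorbed by the cache:

```python
import uuid

# Sketch of per-recipient open tracking. Each recipient gets a unique
# pixel URL; the first request to it records an open. Gmail's proxy
# fetches the pixel once and caches it, so the first open is still
# visible, but repeat opens may be served from cache and never reach
# this server.

opens = {}  # token -> has this pixel ever been requested?

def make_tracking_url(recipient_email, base="https://track.example.com/px"):
    """Generate a unique pixel URL for one recipient (hypothetical domain)."""
    token = uuid.uuid4().hex
    opens[token] = False
    return f"{base}/{token}.gif", token

def record_hit(token):
    """Called when the pixel URL is requested (possibly by Google's proxy).
    Returns True only on the first request for this token."""
    first_open = not opens.get(token, True)
    opens[token] = True
    return first_open
```

What the proxy strips away is everything else the request used to carry: the reader's IP address, cookies, and user agent all become Google's, not the recipient's.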


It was arguably a net-win for user privacy. Admittedly - it doesn't hurt Google to weaken other marketing channels but it's disingenuous to frame it this way without also pointing out that it's a lot more user-friendly than 'images in this email are blocked. click to enable'.


Similarly to https://xkcd.com/1172/, every change breaks someone's business model.


There are still plenty of successful email marketing companies.


Caching images prevents knowing the open rate; it doesn't prevent knowing the response rate on click actions. If people are reading your emails but not clicking, how is that valuable?


It's also valuable to know the open rate. If people aren't opening your email, maybe it means your email subject line isn't "zingy" enough. At least you could A/B test with different subject lines and see if it changed the open rate.
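The subject-line A/B test mentioned above boils down to comparing two proportions. A minimal sketch (pure stdlib, using the standard two-proportion z-test; the numbers in the usage line are invented):

```python
import math

def open_rate_ab_test(opens_a, sends_a, opens_b, sends_b):
    """Two-proportion z-test comparing open rates of subject lines A and B."""
    p_a, p_b = opens_a / sends_a, opens_b / sends_b
    # pooled proportion under the null hypothesis (no difference)
    p_pool = (opens_a + opens_b) / (sends_a + sends_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

# e.g. subject A: 300 opens of 1000 sends, subject B: 250 of 1000
p_a, p_b, z, p_value = open_rate_ab_test(300, 1000, 250, 1000)
```

Without open tracking you lose exactly this kind of measurement, which is the commenter's point.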


One annoying-as-crap thing is that it prevents images from working against my dev VPN.


Even a change in the pricing of, say, the Google Places API (previously part of Maps, now billed differently) can have drastic effects. I wonder if anyone at Google realises how much power they wield.


While this is a great example of monopoly done right and I'm very happy they did so, it is also a lesson about being too dependent on third party solutions.


They won't be the only ones with this tech forever. In so much as a solution depends on "cleverness" rather than raw computing power, it'll have room for competition. They put a lot of effort figuring out what works and what doesn't but now that they have a product, the lesson is out and will be recreated by others. Google is leading, but they have also shown the direction to go and provided something to benchmark against.


> In so much as a solution depends on "cleverness" rather than raw computing power, it'll have room for competition.

Unfortunately, it's looking more and more like it's going to be a competition on training data and raw computational power, and it's hard to compete with Google's corpora from the web, gmail, captchas, maps, etc. -- not to mention Google's tremendous number crunching resources.


What other data sets exist that would be on par with Google's web/gmail/maps/etc. data sets for training models? Is there a chance Google would ever make a version of their data sets available to the public?


At least right now there aren't many other places that have as much data flowing through them, or at least as much almost unrestricted access to data (for example, Amazon has plenty of data but very little legal access).

And possibly, though for privacy reasons in a much more filtered down form (see the word2vec Google News dataset). I find it unlikely for private data though.


The solution for those startups may be to target more specific solutions and smaller niches instead of aiming for "general" classification and image tagging solutions.

Could be any kind of niche, from specific industries (utilities, media, transport...) to specific use cases (I don't have many in mind, one could be https://sightengine.com). Ideas welcome :)


They can focus on selling their product to EU companies, with the guarantee that the data never gets into the hands of NSA, GCHQ, etc?


[deleted]


You're betting on a startup's product to stick around longer than a tech giant's?


I don't think it's right to consider all tech giant products equally. You have to consider things like whether the product or service is a significant source of revenue relative to the company's overall business.

For example, Google probably could've made Reader profitable, but it was not on track to be used by a billion people. Relative to Google's overall business, it was uninteresting to them. It might've been a great startup, though.


It's exciting to see more of these services come to market.

IBM Watson has a suite of vision APIs available that have some similar features.

For example, the demo at http://vision.alchemy.ai/#demo has example images that demonstrate facial detection and identification, label extraction, object identification, and so on.

Another demo at http://visual-insights-demo.mybluemix.net/ uses the Visual Insights [1] API to identify a set of relevant tags.

And the recently released Visual Recognition [2] API allows you to train the model with your own dataset. Demo: http://visual-recognition-demo.mybluemix.net/

Disclosure: I am an evangelist for the Watson Developer Cloud suite of services at IBM.

[1]: https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...

[2]: https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...


Wow, that looks fantastic! I've been looking for something like this for ages.

Some feedback though:

- As an independent developer, pricing is important to me. It was very difficult to find pricing for the Watson APIs (apparently it was in Bluemix?) and if I wasn't a little more determined (thanks to the ability to train my own classifier), I wouldn't have persevered.

- If I already have a wealth of labelled data (I do), it seems difficult to train a new classifier for the Visual Recognition service. If I have 200 000 images each with an average of 20 labels (from a set of ~2000 labels) for example, the positive + negative sample per label is very time and bandwidth consuming, as I have to train 2000 classifiers using ~ 5000 images per classifier (for plenty of training data), for a total of ~10 000 000 uploads. It'd be far nicer to be able to upload a folder of JPEGs with a JSON blob per file containing labels (or a classifier name) and have Watson derive positive and negative samples from it.
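The upload format the comment above wishes for could be sketched like this: one record per image listing its labels, with positive and negative sets per classifier derived locally instead of re-uploading each image once per label. This is purely hypothetical, not an existing Watson API:

```python
# Hypothetical bulk-labeling format: one record per image with its label
# list; positives/negatives for any one classifier are derived from it,
# instead of uploading ~5000 images per classifier * ~2000 classifiers.

def derive_samples(records, label):
    """Split images into positive and negative examples for one label."""
    positives = [r["file"] for r in records if label in r["labels"]]
    negatives = [r["file"] for r in records if label not in r["labels"]]
    return positives, negatives

records = [
    {"file": "img1.jpg", "labels": ["cat", "outdoor"]},
    {"file": "img2.jpg", "labels": ["dog"]},
    {"file": "img3.jpg", "labels": ["cat"]},
]

cats_pos, cats_neg = derive_samples(records, "cat")
```

Each image is then uploaded once, and the service slices the same pool 2000 ways server-side.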


Thank you for the solid feedback!

As a work-around for mass uploading, I might suggest signing up for a 30 day free Bluemix trial [1]. You could then upload your data to a container and script the creation of sample archives and uploads from there.

[1]: https://console.ng.bluemix.net/registration/


One of the various open source alternatives, http://www.deepdetect.com/applications/model/


Thank you for posting this. I had seen Watson's stuff before whenever it was originally released, but have recently stumbled upon it again and am just gobsmacked at the APIs available, such as speech/text translation. I don't want to insinuate that you could be doing a better job at evangelizing ;)...but the more you just remind people of Watson, the more happiness you'll bring to a good number of developers purely through introducing them to what they might have ignored because they didn't think to look at IBM. I mean I see far more buzz around WolframAlpha stuff and yet not much of what they've offered seems as accessible or purely useful as what Watson purports to have.

(I would also say the exact same thing for Microsoft and its Project Oxford APIs, whoever here is working as its evangelist)


I would be interested in seeing a comparison - how the different vision APIs react to the same 10-15 images.


There is an ad-hoc comparison of a few of the top existing services here:

http://imagga.com/blog/imagga-and-6-alternative-image-recogn...

and here: http://imagga.com/blog/some-more-image-recognition-tests-som...

Disclaimer: I'm CEO & CTO of Imagga.


It would be very interesting if the two looped together, each using the other as a reference to improve their own parameters ;).



I wonder what this'll mean for startups like https://imagga.com/ and how pricing will change.

Also wonder if they'll bring the ability to train your own classifiers using their networks...


We kind of expected that; it was just a matter of time after they announced Google Photos' capabilities and, recently, TensorFlow.

Funny enough I just answered a similar question a few days ago here - http://kaptur.co/10-questions-for-a-founder-imagga/

The bottom line is - we believe we can provide much better API service with competing level of technology precision.

At the same time we also put a lot of emphasis on specific things like custom categorization training and enterprise/on-premise installations (both of which differ from custom software).

Actually we don't plan to retreat into a niche market, though some people suggest that as the proper strategy. We'll give them a good run for their money on the broad use case. I believe the "hacker" culture works in our favour here, and I hope you help us prove it :)


I wonder how long it will take for someone to integrate this with Google recaptcha...


Using Google Vision API to break Google Recaptcha? Interesting...


They may have already had the foresight to train on what their reCAPTCHA images look like.


Sort of like how the old captcha helped digitize books: people entering reCAPTCHAs provide free tagging to train Vision's net.


Which is why the data from ReCaptcha and NoCaptcha should be public domain – it was created by society, it should be usable by society.


If the data was public the black hats would use the ReCaptcha and NoCaptcha data to defeat *Captcha and we'd be back to Square 1.


The black hats have far easier ways to use it.

But this data shouldn't belong to a corporation.

It's unacceptable that Google gets to use this dataset, gained through its monopoly position in this online market, as a competitive advantage in the self-driving-car market.

Btw, black hats have no demand at all for captcha-solving solutions built by training captcha solvers; hiring people from Bangladesh is far cheaper and faster.


Isn't recaptcha just a checkbox now?


Yes, but the fallback is a more traditional image-tagging captcha. See https://googleonlinesecurity.blogspot.com/2014/12/are-you-ro...


Finally Google is releasing its best dogfood as a cloud service. That's better than copying AWS.


Wondering how this can be used for specific content detection! Say I want to take pictures of a crop using a drone and analyse the pictures for pests and diseases, etc.


One of our key features at imagga.com is exactly this: being able to train your own classifier with our help. Let's talk more if you are interested.


I've been going round in circles for the last 15 minutes, trying to sign up for access. I've tried every ID I can find in my account for the "Google Cloud Platform user account ID", but none seem to be working, and it won't allow me to submit the form without a valid ID.

Is anyone else having the same issue/know where I can find this ID?


As it's for whitelisting, it is generally an e-mail address - the ID you are logged into Google with when accessing the Cloud console. If that doesn't work for you, let me know - I work on GCP and I can ping the people who can fix it.


It worked — thanks!


I already quit the signup process once because I got frustrated trying to find the ID and couldn't, then came back later, still no success.

Anyone who has successfully signed up, where is the user account ID?


Apologies for the difficulty in entering your information. We just need the email address that you use to log into the GCP console. As for your project ID, you can find that in the developer console: click on Dashboard, and the upper left corner shows the project name and project ID.


I was confused by this too; I've never seen a "Google Cloud Platform user account ID" before, and I've been using App Engine for years. I just used the email address tied to my App Engine account, and that seemed to work just fine.


Indeed, it does. Thanks.


I wonder if there is an API to train this on a custom dataset.

They have a few built-in models, but those seem pretty limiting in terms of use cases. I can think of a lot more where you would want to be able to train on your own specialized images and labels.
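For reference, the built-in models are exposed through the v1 `images:annotate` REST endpoint. A request for label detection looks roughly like this (payload shape per the public v1 REST API; the image bytes and API key here are placeholders):

```python
import base64

def label_detection_request(image_bytes, max_results=10):
    """Build the JSON body for a Cloud Vision v1 images:annotate call.
    The image is sent inline as base64; features select which built-in
    model to run (LABEL_DETECTION, FACE_DETECTION, TEXT_DETECTION, ...)."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }

body = label_detection_request(b"<raw image bytes>", max_results=5)
# POST this body as JSON to:
#   https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
```

There's no field in that schema for supplying your own training data, which is exactly the gap the parent comment is pointing at.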


There are multiple open source libraries for this; however, optimising and iterating is the key. Preparing a dataset that is representative enough and has no overlap of categories/labels is also quite important.

This custom categorization is actually one of our most requested services at Imagga - here is some info if you are interested - http://imagga.com/solutions/custom-categorization.html


You can use TensorFlow to train your own networks and setup your own ML pipeline.
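The core of the training loop a framework like TensorFlow automates (compute predictions, compute gradients of a loss, update parameters) can be sketched by hand in a few lines. This is a toy logistic-regression example on made-up 2-D data, not TensorFlow itself:

```python
import numpy as np

# Toy data: two well-separated Gaussian clusters, labeled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Logistic regression trained by plain gradient descent on the log-loss.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)      # gradient of log-loss w.r.t. weights
    grad_b = (p - y).mean()              # gradient w.r.t. bias
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

accuracy = ((p > 0.5) == y).mean()
```

TensorFlow's contribution is doing the gradient computation automatically and scaling this loop to deep networks across GPUs, but the pipeline shape is the same.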


The default assumption with cloud APIs is that the company offering the service will not use your data for internal consumption (even when it comes to training ML models?). I'm assuming this will be the case with this one too?


Where can I see the couple hundred lines of Python code that powers the robot?


You jest, but https://tensorflow.org


I'm not jesting. They mention it on the page.


I know this may not be useful early on, since Google may not have a lot of this kind of data that would make image recognition more useful, but I think it makes sense to have a plan for eventually making it possible for developers to integrate depth information. For example for those who have an XBox Kinect, or more recently a "Project Tango" tablet.


This seems like a warning sign for all horizontal companies that claim to solve AI/CV/ML in X years. If you don't focus on a specific vertical, Google will beat you.


worth mentioning: https://www.projectoxford.ai/ from Microsoft


What about a speech recognition API? That would be awesome!


Can this tell you approximate volume or quantity?


Great! The recognition seems fairly accurate, based on the examples they provided (haven't used Google Photos myself much, though). I'm still wary, though; I really hope we won't see a repeat of the labeling black people as gorillas fiasco, which happened as recently as earlier this year: http://mashable.com/2015/07/01/google-photos-black-people-go.... The article mentions that Google was looking into how these mistakes can be prevented... I wonder what they did/are doing?


They likely didn't purposefully "do" anything wrong -- at least in the sense of some racist engineer tampering with the system to have it come out with those results.

But this is the nature of machine learning algorithms: the outcome depends both on the process of supervision (including the ability to see how feedback shapes the algorithm) and on the quality of the training set it is given. At a lesser company, the problem could be as simple as very few black people being represented in the training set, so that when the algorithm sees a dark-colored human-like shape, it judges that shape more "likely" to be a gorilla (which is human-like and pretty much always has dark fur) than a human, because the algorithm was trained mostly on light-colored humans. The Google Photos algorithm obviously takes in more kinds of input besides visual composition, so there was probably more to it than this.

Or maybe not...who knows? I'm not interested in reviving a discussion about importance of diversity in the engineering workforce, but this is one kind of problem that can slip by the most competent and well-intentioned of engineers simply because they're less aware of how disenfranchisement can propagate into technical problems, no matter how correct and powerful the math behind the algorithm.
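The training-set effect described above can be shown with a toy Bayes calculation. Every number here is invented purely for illustration; the point is only that the class frequencies baked into training data directly shift which label wins:

```python
# Toy posterior calculation: P(class | feature) is proportional to
# P(feature | class) * P(class), where P(class) reflects how often each
# class appeared in training. All numbers are made up.

def posterior(likelihood, prior):
    """Normalize likelihood * prior into a posterior over classes."""
    joint = {c: likelihood[c] * prior[c] for c in prior}
    z = sum(joint.values())
    return {c: joint[c] / z for c in joint}

# Suppose the model sees a "dark human-like shape" feature.
likelihood = {"person": 0.3, "gorilla": 0.9}  # P(feature | class)

# Training set with classes equally represented: "gorilla" wins.
skewed = posterior(likelihood, {"person": 0.50, "gorilla": 0.50})

# Training set reflecting that people vastly outnumber gorillas
# in photos: "person" wins for the exact same feature.
balanced = posterior(likelihood, {"person": 0.95, "gorilla": 0.05})
```

Same model, same input; only the composition of the training data changed the answer.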

Another example from a few years back was when HP released an auto-tracking webcam that became infamous after a black retail employee uploaded a YouTube video of how the camera ignored him but not his white co-worker:

http://www.cnn.com/2009/TECH/12/22/hp.webcams/

I'm in 100% agreement that this was likely not HP's intentional fault, and also that face detection of darker complexions is computationally more complex than it is for lighter complexions because of how contrast is used by the algorithm...but I most definitely know that if I were an HP engineer, and if the CEO and/or my direct boss were black and tried out a prototype that behaved as it does in the aforementioned YouTube video, there is almost no fucking way that the product would be released as-is, with my excuse being "Well, accurately detecting black faces requires a much more complicated training set -- that's just how math works!"


I have no doubt that neither Google nor HP made those error maliciously. I was just curious as to whether or not it's possible to incorporate some sort of... tact?... into these recognition algorithms to avoid labeling people (or other things) offensively. Is it just a matter of a larger training set? It would be hard to cover all sorts of people in all sorts of poses, with all sorts of lighting conditions, etc.


It's not about tact, it's just the algorithm doing its best. In order for the algorithm to be capable of "tact", it needs to recognize that it's looking at a person (or whatever). And if it recognized a person, then there wouldn't have been this problem because it would just label it correctly.


You can certainly include tact. The algorithm may think it's a 51%-49% gorilla/person split, but a layer above it chooses "person" as the answer: even though it will be wrong more often, the impact of the error is lower.

This is why you shouldn't just train your system to hit higher accuracy figures but also investigate the type of errors it's making. This needs to be done while thinking about your specific use case and domain.
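That decision layer is standard cost-sensitive classification: pick the label minimizing expected cost rather than the arg-max probability. A minimal sketch (the class names and cost values are illustrative, not Google's):

```python
# Instead of arg-max over class probabilities, pick the label that
# minimizes expected cost, where offensive mislabels carry a high cost.

def min_cost_label(probs, cost):
    """probs: {label: probability}.
    cost[true][predicted]: penalty for predicting `predicted`
    when the truth is `true`."""
    def expected_cost(pred):
        return sum(p * cost[true][pred] for true, p in probs.items())
    return min(probs, key=expected_cost)

probs = {"gorilla": 0.51, "person": 0.49}
cost = {
    "gorilla": {"gorilla": 0, "person": 1},    # mild, harmless error
    "person":  {"gorilla": 100, "person": 0},  # offensive error
}

label = min_cost_label(probs, cost)
```

Here predicting "gorilla" has an expected cost of 0.49 * 100 = 49, while predicting "person" costs only 0.51 * 1, so "person" wins despite the lower raw probability, which is exactly the parent's 51/49 point.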


I guess this is where diversity is needed.

If they'd had an African-American engineer working on the product, they might have caught this during development.



