
I feel like this is some really compelling tech. It would be so amazing to build stuff with this in mind. I wouldn't be comfortable doing it, though. This sort of API is available only until Google decide that they don't want it to be available. There's not really anything close to equivalent that you could drop in to replace it if it were being shut down, the price were being hiked, or you had some other sort of issue with it.

I'm not trying to pick on Google for shutting things down; I would feel similarly if this API were from Microsoft or Facebook. It's not the first time there's been an API that I think is really cool, but that I was very apprehensive about actually using for anything serious.




You are creating a double bind for yourself and others by posting these types of comments. Your comment indicates that you want the experience of building things with it, which would be amazing, but you are also speculating (based on historical evidence) that Google may "take away" the amazing thing from you later.

> I'm not trying to pick on Google for shutting things down

People usually say what they are thinking. In this case, I can certainly appreciate and respect questioning what Google's actions will be in the future, but the problem at hand is that none of us can tell the future. By attempting to do so, and creating a situation where we literally believe two conflicting things at once, we get mired down in illogical arguments that end up making zero sense. Worse, when we post those illogical arguments, others get dragged into the dissonance and end up making similar arguments that make no sense and the result of that is others get lost in the mess we made. For example:

> I think Google understands how big this could be (well, why else do they do anything I suppose?)

Do we really know Google understands how big this could be, or is that just us wishing they wouldn't "shut it down later"? Both thoughts are speculative, at best.

As for me, I have no way to know whether Google will leave these APIs up for as long as I need them, or whether they will change the APIs' methods at some point and break the code I've written against them. The latter happens to me all the time!

I realize this comment may stir some emotional responses. All I would ask is that we consider alternative ways of thinking about these illogically binding "feelings" and expand our awareness to the fact that what we really want is our software, regardless of who wrote it, to be open, transparent, trustworthy and capable of running wherever and whenever we want it, regardless of when that is.

Obviously we have a long ways to go before that statement can be a reality. One can hope though!


TensorFlow + TensorFlow Serving + Google's ReCeption model, plus optionally an SVM on ReCeption features for your custom detection. All that code and the pretrained model is Open Source. There's some engineering to glue it together, and some extra work for the easier, non-image-classification parts.
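To make the "SVM on extracted features" step concrete, here's a minimal plain-Python sketch. It assumes the feature vectors have already been pulled out of a pretrained network (for Inception v3 that would typically be the 2048-dimensional penultimate layer; here they're tiny made-up 2-D vectors) and trains a linear SVM with the Pegasos subgradient method. In practice you'd more likely reach for a library implementation; the function names below are hypothetical.

```python
import random


def train_linear_svm(features, labels, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos (stochastic subgradient descent on the
    L2-regularized hinge loss).

    features: list of equal-length float lists, e.g. CNN embeddings
    labels:   list of +1 / -1 class labels
    Returns a weight list whose last entry acts as the bias term.
    """
    rng = random.Random(seed)
    # Append a constant 1.0 to every vector so the bias is learned as a weight
    data = [(x + [1.0], y) for x, y in zip(features, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step for the regularizer shrinks every weight
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move toward classifying x correctly
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 for a single (un-augmented) feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1
```

For custom detection with several tags, you'd train one such classifier per tag (one-vs-rest) on the same extracted features; the expensive network stays frozen.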

There is also http://www.deepdetect.com


+this. I've implemented a subset of this kind of pipeline before on AWS (image tagging + face identification) using the building blocks that existed last year (it was AlexNet at the time, with a pre-release version of MXNet, because Google hadn't released the trained Inception model). Getting this functionality working at a basic level, given the tools Google has released, isn't impossible.

Now, making it production-quality, efficient, scalable, and the rest -- well, y'know. That's why people use cloud-based services in the first place.

But I think there's less fundamental lock-in than you think. Cloudinary, for example, will let you upload an image and get a tag out. ABBYY and OmniPage/Nuance and others offer cloud-based OCR.

I'm biased - I'm at Google this year - so take this with a grain of salt, but while I have the feeling that Google can do it better and more affordably than a small team could do it on their own, I don't think that Google pulling the API would leave people up a creek without a paddle.


> the pretrained model is Open Source.

Google's face/landmark/label/text/logo detection models are open source? Or there exist open source pretrained models?

The quality and size of the training set is (at least) as important as the machine learning tools. I imagine Google has access to a pretty big data set, along with the computing resources to process it.


> Google's face/landmark/label/text/logo detection models are open source? Or there exist open source pretrained models?

Google's Inception v3 pre-trained image recognition model is open source: https://www.tensorflow.org/versions/r0.7/tutorials/image_rec...

That's the hard part because, as you note, training is computationally intensive. (The training data is actually available, though: it's the ImageNet dataset.)

There is existing code for the other parts that performs pretty adequately (with the possible exception of landmark detection).

E.g.:

Face detection: http://docs.opencv.org/master/d7/d8b/tutorial_py_face_detect...

Logo Detection: http://www.pyimagesearch.com/2015/01/26/multi-scale-template...
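The logo-detection link describes template matching: slide a known logo image over the photo and keep the position where the pixels agree best. Here's a minimal single-scale sketch in plain Python using sum of squared differences on grayscale images stored as 2-D lists (an assumption for illustration). OpenCV's `cv2.matchTemplate` does the same sliding comparison far faster, and the multi-scale variant repeats it over resized copies of the image.

```python
def match_template(image, template):
    """Slide `template` over `image` (both 2-D lists of grayscale values) and
    return ((row, col), score) for the top-left corner with the lowest sum of
    squared differences. A score of 0 means an exact pixel match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos = float("inf"), None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = 0.0
            for i in range(th):
                for j in range(tw):
                    d = image[r + i][c + j] - template[i][j]
                    score += d * d
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

A real logo detector would also threshold the score so that images without the logo return no match.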


Can you provide some links?


TensorFlow: https://www.tensorflow.org/

TensorFlow Serving: https://github.com/tensorflow/serving

ReCeption (they actually call it Inception v3; not sure where I got the ReCeption name, though I'm sure I read it somewhere): https://www.tensorflow.org/versions/r0.7/tutorials/image_rec...

Using an SVM on neural-network-extracted features: http://blog.christianperone.com/2015/08/convolutional-neural...

If you want a quick and dirty version, here's some Python to create a web service that calls a Caffe-based image recognizer: https://gist.github.com/nlothian/c3519adb81b3452c1938
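For a rough idea of what such a service looks like without any framework, here's a standard-library-only sketch (not the linked gist). It accepts raw image bytes in a POST body and returns JSON tags. `classify` is a hypothetical placeholder where the real Caffe or TensorFlow model call would go.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(image_bytes):
    """Hypothetical stand-in for the real recognizer (Caffe, TensorFlow, ...).
    Returns a list of (label, confidence) pairs."""
    return [("placeholder", 1.0)] if image_bytes else []


class TagHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw image bytes sent as the request body
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        tags = [{"label": label, "confidence": conf}
                for label, conf in classify(image_bytes)]
        body = json.dumps(tags).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_server(host="127.0.0.1", port=8000):
    """Start the service on a background thread and return the server."""
    server = HTTPServer((host, port), TagHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Something like `curl --data-binary @photo.jpg http://127.0.0.1:8000/` would then return the tags as JSON. Production concerns (batching, GPU sharing, model versioning) are what TensorFlow Serving is for.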


thanks!



The question is, is this your core business? If you want to roll your own ML / CV API and your investors / customers will value you based on it, great. If it's not your core business, then SaaS / API interfaces save you time / money / ability to get into the market. Composability is what we do nowadays and you're always going to be exposed to risk. Recognize that risk and move on or go out of business trying to do it all yourself.


If I were you, I'd build an MVP with this API (or IBM Watson's API) and see how it goes. If your product/service starts to take off, you could start looking into implementing your own machine learning / computer vision algorithms, hoping one day they get good enough to replace whatever API you bootstrapped the product/service with.
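That swap is much cheaper if application code never talks to the vendor directly. A minimal sketch of the idea: a small interface plus one adapter per backend. The class names and the `{"label", "score"}` response shape are hypothetical; each real provider returns its own format, which the adapter is responsible for normalizing.

```python
from typing import Callable, Dict, Iterable, List, Protocol, Tuple


class ImageTagger(Protocol):
    def tag(self, image_bytes: bytes) -> Dict[str, float]:
        """Return {label: confidence} for one image."""
        ...


class HostedApiTagger:
    """Hypothetical adapter around a hosted API (Cloud Vision, Watson, ...).
    The HTTP call itself is injected so it can be replaced or faked."""

    def __init__(self, call_api: Callable[[bytes], List[dict]]):
        self._call_api = call_api

    def tag(self, image_bytes: bytes) -> Dict[str, float]:
        # Normalize the provider's response into {label: confidence}
        return {item["label"]: float(item["score"])
                for item in self._call_api(image_bytes)}


class LocalModelTagger:
    """Hypothetical in-house model, swapped in once the product takes off."""

    def __init__(self, model: Callable[[bytes], Iterable[Tuple[str, float]]]):
        self._model = model

    def tag(self, image_bytes: bytes) -> Dict[str, float]:
        return dict(self._model(image_bytes))


def tags_above(tagger: ImageTagger, image_bytes: bytes,
               threshold: float = 0.5) -> List[str]:
    """Application code depends only on the ImageTagger interface."""
    return sorted(label for label, conf in tagger.tag(image_bytes).items()
                  if conf >= threshold)
```

Moving from the API to your own model then means writing one new adapter rather than touching every call site.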


IBM's Watson service, provided via Bluemix, does this:

https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...


If you want an alternative you'd be more comfortable with, one really cool API to check out is Clarifai (someone else already mentioned it in this thread, but I'll bring it up again anyway). They're highly regarded for classification, and the API is pretty simple to use, too. For example, if you just make a quick cURL request:

  curl -H "Authorization: Bearer {access_token}" --data-urlencode "url=YOUR_IMAGE_URL.jpg"  "https://api.clarifai.com/v1/tag/"
...you get the top 20 tags for that image along with their confidence levels. Their homepage at Clarifai.com has a pretty good demo that shows it more visually.
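The same request can be built from Python's standard library. This sketch only constructs the request; the endpoint and the form-encoded `url` field are copied from the cURL command, and Clarifai's API versions may well have changed since, so treat them as assumptions. `build_clarifai_tag_request` is a hypothetical helper name.

```python
import urllib.parse
import urllib.request


def build_clarifai_tag_request(access_token: str, image_url: str) -> urllib.request.Request:
    """Build (but don't send) the POST from the cURL example: a
    form-encoded url= field plus a Bearer token header."""
    body = urllib.parse.urlencode({"url": image_url}).encode("ascii")
    return urllib.request.Request(
        "https://api.clarifai.com/v1/tag/",
        data=body,  # supplying data makes this a POST, like curl --data-urlencode
        headers={"Authorization": f"Bearer {access_token}"},
    )
```

Sending it is `urllib.request.urlopen(req)` followed by `json.loads` on the response body; urllib adds the `application/x-www-form-urlencoded` content type automatically when data is supplied.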


There actually are other APIs, though with a smaller scope. For text extraction, for example, there is the Haven OnDemand API, https://dev.havenondemand.com/apis/ocrdocument#overview, backed by HP. They also offer logo detection. I'd be surprised if no replacement for category detection existed.

Though I admit I also hesitate to replace that API with the Google offering in the one app where I actually use it. The results would probably be better, but I just got burned again by stuff shutting down, and I remember Google Reader.

But the Vision API looks cool.


There's also this API by Microsoft Research. https://www.projectoxford.ai/


I will throw ours into the ring as well.

If you need to understand emotional reaction on video sources, our API can fill in the gaps not currently filled by Google's Cloud Vision API: https://www.kairos.com/emotion-analysis-api

Disclosure: I'm CTO of Kairos.com


I feel like there's a lot of scope for big companies to make commitments to do the right thing by developers building on top of their services. Stripe, for example, have a data portability clause where (subject to some conditions) they'll move customer data to other payment processors at your request. Funnily enough, that's the type of commitment that will make me never want to leave Stripe.


Clarifai are quite a big name in the classification field and that's pretty much all their API does.


I understand your apprehension because I feel the same way, to some degree. But I'm excited about the new kinds of things I can already imagine building with this API. There are so many fascinating ideas that come to mind, and I'm usually not that creative of a person.

I think Google understands how big this could be (well, why else do they do anything I suppose?). I guess you'd just have to put faith in them. I imagine those that do will build some amazing projects from this.


You're being paranoid for several reasons. First, you will presumably be keeping the classifications you get from the API (if for no other reason than $2/1000 can really add up and why would you do that to yourself?). Second, Google has historically always had very generous end-of-life announcements. Third, there are already a number of competitors, including the trained CNNs Google has already released. Fourth, even if there were no competitors nor pre-trained models, deep learning is increasingly accessible and you could learn to imitate a past ImageNet-winning CNN in Torch/Theano/TensorFlow/etc and train it within a few weeks. Fifth, paid Google services tend to hang around longer. Sixth, machine learning is a major part of Google and they are increasingly rolling it out to their services and may well be using this API themselves, making closing it not such a great idea.
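Keeping the classifications can be as simple as keying each result by a hash of the image bytes, so the same image is never paid for twice. A minimal sketch, assuming a JSON file is an adequate store at your scale; `call_api` is a hypothetical stand-in for whichever provider you use, and the price default is the $2 per 1,000 figure quoted above.

```python
import hashlib
import json
import os


class CachedTagger:
    """Call a paid tagging API at most once per distinct image.

    Results are keyed by the SHA-256 of the image bytes and persisted as
    JSON, so re-processing a known image costs nothing. `call_api` must
    return something JSON-serializable."""

    def __init__(self, call_api, cache_path):
        self._call_api = call_api
        self._cache_path = cache_path
        self.api_calls = 0
        if os.path.exists(cache_path):
            with open(cache_path) as f:
                self._cache = json.load(f)
        else:
            self._cache = {}

    def tag(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._cache:
            # Cache miss: pay for one API call and persist the result
            self._cache[key] = self._call_api(image_bytes)
            self.api_calls += 1
            with open(self._cache_path, "w") as f:
                json.dump(self._cache, f)
        return self._cache[key]


def estimated_cost(api_calls, price_per_1000=2.0):
    """Dollar cost of a number of API calls at a per-1,000-images price."""
    return api_calls * price_per_1000 / 1000
```

A side benefit: if the API ever does go away, the stored results remain, and they double as labeled training data for a replacement model.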

So you're passing up what could be substantial benefits: Google isn't going to close it anytime soon, and if they do, you will have months of warning and can easily replace it with a competitor or your own.


That makes sense. Cheers!


There's a nice opinion on this from CloudTP:

http://www.cloudtp.com/2016/01/21/dont-let-vendor-lock-in-fe...

TL;DR: You don't get the girl if you don't ask for a dance :)


Well, we've actually been using AlchemyAPI for this exact function for some time now. They were recently acquired by the IBM Watson team, so it's kind of a pain to get set up through them now (you have to go in through the Bluemix control panel to create an account and set up billing). They're probably not as accurate as Google, which is why I'm going to be spending tomorrow implementing this and comparing the results, but they are pretty darn good.

http://alchemyapi.com


What about Clarifai.com?


Here is an analysis of Google projects that were later shut down: http://www.gwern.net/Google%20shutdowns


I have all kinds of product ideas for this API, but I have the same fears as you: you can't rely on an API.

So for now, my best idea is to use it to build something fun with my kids.

If only Google would adopt some sort of policy where they would let you download the dataset+code when they shut something off...


Yeah, if they could make some sort of commitment to open it up in the event that they no longer want to run it, problem solved. (mostly, at least; pricing adjustments could still be disruptive for certain applications)


Want to share some of those ideas? :)


I like how the announcement post already prepares for the end of "our incredible journey" by saying "Google Cloud Vision API is our first step on the journey".


Yes, a journey has an end, but hopefully it leaves us at a desirable destination.



