
Ask HN: Microsoft Computer Vision API or Google Cloud Vision API? - jharohit
Hi HN community!

I am trying to decide between Microsoft's CV offering and Google's CV offering for my B2B startup. Any recommendations from people who have tried both?

Background: we are trying to use images of models uploaded by agencies to derive labels & image properties. Face detection would be an added bonus if possible.
======
buro9
[http://www.clarifai.com/](http://www.clarifai.com/)

Pricing is more friendly than the other two. The API is nice.

I did like Google's a lot, but the price just wasn't there for me. Especially
if you want most processing options.

Microsoft have been in this game a lot longer, but surprisingly a lot of their
cool stuff isn't in their APIs, e.g. the ability to spot similar images and
seamlessly stitch them. This stuff was in their maps products a long time ago,
and you can download tools of theirs:
[http://research.microsoft.com/en-us/um/redmond/projects/ice/](http://research.microsoft.com/en-us/um/redmond/projects/ice/)
but no APIs. Their basic APIs are just basic...
so why not save the dime and go with a smaller player offering just the basics
but very well instead.

~~~
jharohit
Thanks for the suggestion. Tried Clarifai's demo - their API & Pricing is nice
as you said but they just don't have much to offer in terms of labels and
feature detection when you compare to what Microsoft's or Google's version has
to offer. Plus, they also don't seem to offer confidence numbers for labels.

~~~
jasonnovack
Disclosure: I work at Clarifai.

Thanks for checking us out! Our demo may not fully represent what we have to
offer, but you can see a sample response here with confidence numbers included
(we call them probs):
[https://developer.clarifai.com/guide/tag#tag](https://developer.clarifai.com/guide/tag#tag)

In terms of labels, our "general" model has over 11,000 labels, and we also
have specialized models with labels tailored for other purposes, including
NSFW, travel, and food, among others.

Hope this helps

------
malux85
Depends on the complexity of what you require. I know I might get down-voted
for this, but if your task is relatively simple, then roll your own using deep
learning. Message me if you want help with this.

I wouldn't rely on either for my own startup, because I don't think these APIs
will have broad appeal; as a result they won't get traction and will be shut
down with little warning.

I could help you if you want - my email is in my profile

~~~
Eridrus
I've had a surprising amount of success rolling my own deep learning vision
system too since the community is so open, but I wanted to learn about the
field and I've sunk a few weeks of spare time into it at this point so I'm not
sure I'd really recommend it to a startup unless the API offerings just don't
work for them for performance or cost reasons.

------
JoshTriplett
OpenCV includes face detection, and given a reasonably limited corpus of
faces, it performs quite well and quite reliably.

(Whether you want to use that or use a service depends on how close to your
core business this is.)

~~~
rememberlenny
I would add to this comment. I recently thought face detection was a difficult
service, but it's actually really easy to implement yourself.

I found these really useful:
[http://www.pyimagesearch.com/](http://www.pyimagesearch.com/)

------
wstrange
Interesting looking at Google's Vision API overview, where they explicitly
state that facial recognition is not available.

The technology to do this clearly exists, but I gather they are concerned
about the potential for abuse. Which makes sense. You could build some very
creepy apps with this.

~~~
Negative1
I won't directly speculate but I doubt the Face Detection functionality is
absent due to some ethical quandary.

~~~
obmelvin
Why do you say that?
[http://megaface.cs.washington.edu/results/](http://megaface.cs.washington.edu/results/)
shows that Google has one of the best Face Detection models.

~~~
Spooky23
Sounds like a competitive advantage to me.

------
chdir
A request to those suggesting "why not X" or "consider X" : If you could
mention a reason or two favoring X over Y, that'll help OP & future visitors.

------
ANaimi
[http://algorithmia.com](http://algorithmia.com)

Pay-as-you-go, many APIs, supportive community.

For your use case you might want to check the Computer Vision tag,
specifically the "Illustration Tagger" algorithm.

[https://algorithmia.com/tags/computer%20vision](https://algorithmia.com/tags/computer%20vision)

------
beagle3
I'm interested in good OCR, preferably local, but I'm close to giving up and
using Google Cloud Vision API --- It works well for text that's not perfectly
aligned and laid out - unlike e.g. Tesseract or any other local OCR I've used.

As far as I can tell, clarifai.com doesn't have OCR, and neither does anyone
else except MS and G.

~~~
megalodon
Hey, I made this [1]. Based on a neural net trained with generated character
sets with intentional distortions and/or several font variations.

I would not recommend it for use in production, but maybe you're interested in
looking at the code and customizing it to your liking. Could perhaps be
combined with OpenCV.

[1]
[https://github.com/mateogianolio/ocr](https://github.com/mateogianolio/ocr)

~~~
beagle3
Thanks!

------
shireboy
Develop your code so that the API is pluggable. Try both and decide which
works best for you.
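
A minimal sketch of what "pluggable" could look like in Python. All names here (`Label`, `VisionBackend`, `FakeBackend`, `top_labels`) are hypothetical, and no real vendor call is shown; each provider would get its own adapter that maps its response into the shared `Label` type.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Label:
    name: str
    confidence: float  # normalized to the range 0.0 to 1.0


class VisionBackend(Protocol):
    """Hypothetical interface; one adapter per vendor implements it."""

    def label_image(self, image_bytes: bytes) -> List[Label]:
        ...


class FakeBackend:
    """Stand-in adapter for tests; a real one would call the vendor's API."""

    def __init__(self, canned: List[Label]):
        self._canned = canned

    def label_image(self, image_bytes: bytes) -> List[Label]:
        return list(self._canned)


def top_labels(backend: VisionBackend, image_bytes: bytes,
               min_confidence: float = 0.5) -> List[str]:
    """Application code depends only on the interface, never on a vendor."""
    labels = backend.label_image(image_bytes)
    kept = [label for label in labels if label.confidence >= min_confidence]
    kept.sort(key=lambda label: label.confidence, reverse=True)
    return [label.name for label in kept]
```

With this shape, trying a second provider means writing one more adapter class and running the same images through `top_labels` for both.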

------
danielmorozoff
It really depends what you are attempting to accomplish, and what you wish to
detect in the images.

As you mentioned faces:

Are you looking for face detection or recognition? Face detection was
robustly solved before the advent of DL with Haar cascades / face models, and
is now being pushed a bit further with DL.

([http://docs.opencv.org/master/d7/d8b/tutorial_py_face_detect...](http://docs.opencv.org/master/d7/d8b/tutorial_py_face_detection.html#gsc.tab=0))
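
For context on why Haar cascades are cheap to run: they are built from rectangle features evaluated on an integral image (summed-area table), so any rectangle sum costs four lookups. The following is a pure-Python sketch of that building block only, not of OpenCV's detector; the function names are made up for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: upper half minus lower half (h even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


if __name__ == "__main__":
    # Dark band over a bright band, e.g. an eye region above a cheek region.
    image = [[1, 1, 1, 1],
             [1, 1, 1, 1],
             [5, 5, 5, 5],
             [5, 5, 5, 5]]
    table = integral_image(image)
    print(two_rect_feature(table, 0, 0, 4, 4))
```

A cascade thresholds many such features in stages and rejects most windows early, which is where the speed comes from.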

Current cutting edge face recognition systems rely on DL, and the top
performing models are one out of Russia (NTechLAB, facenx_large) and one from
Google (FaceNet v8). These were the top two performers in the MegaFace
challenge - identification with 1M distractors. Truly remarkable results.
[http://megaface.cs.washington.edu/results/](http://megaface.cs.washington.edu/results/)

As with most DL systems you will need a massive corpus of labeled faces (aka,
google or vkontakte - which the NTechLab group used)

~~~
wcrichton
Note from personal experience: Haar cascades only work really well on frontal
faces in high quality and good light. It sounds like OP has these kinds of
photos so it will work well, but if you want to detect faces in any other kind
of image/video, you'll need something more powerful. I still haven't found
anything that works well.

~~~
ryangal_msft
You may want to check out the face detection approach described in this paper:
Joint Cascade Face Detection and Alignment. ECCV
2014 ([http://www.jiansun.org/papers/ECCV14_JointCascade.pdf](http://www.jiansun.org/papers/ECCV14_JointCascade.pdf))

I'd also encourage you to try out the Face API from Microsoft (full
disclosure, I work on it). One of the focuses is on improving detection when
challenging lighting and occlusions are present:
[https://www.microsoft.com/cognitive-services/en-us/face-api](https://www.microsoft.com/cognitive-services/en-us/face-api)

~~~
danielmorozoff
Very interesting paper. Thank you for sharing it.

I had a question after reading it. The parameter rho, that is used to either
classify/regress depending on the t cascade stage{1..T}, is said to be set
empirically. This, I would assume, could change across data sets, how were you
able to decide when classification switched to regression in your tree and how
was this adapted when testing on other data?

Also forgot to attach the paper I was referring to:
[http://arxiv.org/pdf/1502.02766v3.pdf](http://arxiv.org/pdf/1502.02766v3.pdf)

------
mitbal
I tried both some time ago for an OCR task. In my brief experience, GCV
performs better than Microsoft. Also, the last time I tried, I sometimes got
random server errors from Microsoft, so I guess Google's infrastructure is
more ready. The downside is that GCV is a bit pricier. Also, neither provides
a parameter to set language models, so that's a minus in my eyes.
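
One way to put a number on "performs better" for OCR is character error rate against a hand-typed reference: edit distance divided by reference length. This is a generic sketch, not the comparison described above, and the sample strings in the usage note are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # insert char_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length; lower is better."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Running each service's output for the same image through `character_error_rate(reference, output)` and averaging over a sample set gives a like-for-like figure, e.g. `character_error_rate("abcd", "abxd")` is one substitution over four characters.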

------
kevinSuttle
What about Watson?
[http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl...](http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/visual-recognition.html)

~~~
IshKebab
I don't know about face recognition but I've quantitatively analysed their
speech recognition and it came dead last after the 5 or so others that I
tested.

~~~
burger
Any chance you could share this?

------
jimmcslim
Is there a publicly accessible API that can geocode photos, to a degree of
accuracy? I'd like to be able to decorate digital photos taken before
geocoding was a thing with geo data. I figure photos I have taken of St.
Mark's Square in Venice have probably been taken a million times by other
people, some of whom have probably added GPS coordinates to theirs, so a smart
CV offering should be able to figure it out to a sufficient degree of accuracy
(for reasonably well photographed and unchanging locations of the earth).

EDIT: I see Google Cloud Vision has landmark detection, that might be useful
if the API returns the GPS coordinates of the landmark.
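
If a landmark lookup does hand back decimal coordinates, storing them in a photo means converting to the degrees/minutes/seconds rationals plus an N/S or E/W reference that EXIF GPS tags use. A sketch of just that conversion, with a hypothetical function name; it does not call any vision API or write to a file.

```python
from fractions import Fraction


def to_exif_dms(decimal_degrees: float, is_latitude: bool):
    """Return (ref, (degrees, minutes, seconds)) with rational components."""
    if is_latitude:
        ref = "N" if decimal_degrees >= 0 else "S"
    else:
        ref = "E" if decimal_degrees >= 0 else "W"
    # Work in exact fractions so minutes and seconds do not pick up float noise.
    value = Fraction(abs(decimal_degrees)).limit_denominator(10 ** 7)
    degrees = int(value)
    minutes_full = (value - degrees) * 60
    minutes = int(minutes_full)
    seconds = ((minutes_full - minutes) * 60).limit_denominator(10 ** 4)
    return ref, (Fraction(degrees), Fraction(minutes), seconds)
```

For example, `to_exif_dms(45.5, True)` yields the reference `"N"` with 45 degrees, 30 minutes and 0 seconds; an EXIF library of your choice would then write those values into the GPS tags.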

~~~
disillusioned
Google Photos actually makes a reasonable estimation of your photo's location
based on the content of the image (and the context of the photo, if it was
available.)

For example, if I shoot with my non-GPS-enabled DSLR, those images are
uploaded to Google Photos, which will reconcile my location history to apply a
location to those shots. It'll also do that if it sees DSLR shots in between
geotagged cameraphone shots.
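
I don't know how Google implements this. One simple way to get a similar effect yourself is to match each untagged photo's timestamp to the nearest point in a location track (phone history or geotagged shots) and give up if the gap is too large. The function name and the 30-minute default are assumptions for illustration.

```python
from bisect import bisect_left


def nearest_location(track, timestamp, max_gap_seconds=1800):
    """track: list of (unix_time, lat, lon) sorted by time.

    Returns (lat, lon) of the closest point in time, or None when the track
    is empty or the closest point is further away than max_gap_seconds.
    """
    times = [point[0] for point in track]
    i = bisect_left(times, timestamp)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = track[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda point: abs(point[0] - timestamp))
    if abs(best[0] - timestamp) > max_gap_seconds:
        return None
    return best[1], best[2]
```

This relies on the camera clock and the phone clock agreeing, so a fixed clock offset would need correcting before matching.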

But more to your use case, GPhotos will actually recognize landmarks and other
information to tag photos, I believe, with a rough location (such that it'll
match a location like "Paris" or "Eiffel Tower," but perhaps not lat/long...
yet.)

Even more impressive, they're very nearly able to do exactly what you're
describing, though my understanding is that it isn't in use in GPhotos yet:
[https://www.technologyreview.com/s/600889/google-unveils-neu...](https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/)

------
adityapatadia
We build customised image labelling solutions where you can label many more
things, like the type of neck on a garment or the pattern of a label on a mug,
and many other such things which are not supported by Google or Microsoft.

We also offer finding similar images as well as image search capabilities
apart from finding tags from images. Please connect at
[https://twitter.com/adityapatadia](https://twitter.com/adityapatadia) to
discuss further.

------
moofight
[https://sightengine.com](https://sightengine.com)

Alternative solution for image moderation and nudity detection. Simple API and
simple pricing.

------
ismdubey
When I last checked, Google's API does not let you identify specific faces. It
can detect faces but that's it. Clarifai or Microsoft do. Pricing-wise almost
all are the same. In my view, Watson is a complete no-no.

~~~
chris_st
Just curious, what makes you say that about Watson?

------
NEDM64
I think Microsoft's works better, try also IBM's if you can.

~~~
jharohit
So I tried Microsoft's CV & IBM's AlchemyVision on the same image (since they
both have an online demo sandbox). Microsoft's just gave me back more labels
and stronger sentiment figures for the same labels. Hence I narrowed it down
to these 2.

~~~
jwp729
If you tried the Watson vision offering when it was "AlchemyVision" then you
may have tried a now out-of-date version of the service. The AlchemyVision and
Visual Recognition tiles on Bluemix have recently been combined in a way that
utilizes their complementary strengths. Consider retrying the updated service
if you'd like!

Disclosure: I work at IBM Watson.

~~~
jharohit
ok - my bad. I just saw the AlchemyVision has been merged into Visual
Recognition starting May 20th. We will definitely check it out to see what
extra features have been added.

qq - Is the API stabilized, by which I mean will there be further
changes/merges?

~~~
kognate
The API has stabilized. There will be future changes, and I would expect to
see those as the service gets new features added (like retraining).

------
swframe
Do you know of a human pose estimation from 2d images service/library? I've
seen papers about it. I would like to try it out.

------
rocky1138
Consider Imagga.

[http://imagga.com/](http://imagga.com/)

~~~
spullara
They explicitly didn't allow it for Google Glass apps so I wouldn't be
surprised.

~~~
rocky1138
Can you explain this a bit better? I'm curious.

------
ma2rten
I think it depends on your dataset and application. It should be easy enough
to try both.

------
rajeshr
What works well for face detection in low light images?

------
staticmalloc
Microsoft's CV API worked better for me.

~~~
singularity2001
The last time I checked WolframAlpha worked surprisingly well

