
Microsoft Cognitive Services - igravious
http://projectoxford.ai/
======
thr0waway1239
I have thought about this many times. What kind of startups or companies would
actually pay for ML as a service at the initial stages knowing very well that

a) you are providing a lot of training data and instead of being paid for the
service, you are actually paying for the privilege?

b) if your product/service takes off, there is a higher chance you will be
competing not against similar startups, but rather a feature of the same
company to which you just supplied all this training data (after all, you just
helped them - in fact you paid them - to identify new markets for their
service)

c) and hey, that data can also be combined with other data you do not have
access to (in the worst case, it is the remaining data, supplied by your
direct competitor, which effectively completes the picture for the big company)

In other words, why would anyone invest in ML as a service with so many
potential forces acting against your continued success?

To be clear, I don't refer to using the cloud services provided by these
companies, which are often a lot more useful, because there are explicit
clauses to prevent them from using your data for training (as I understand it).

~~~
PJDK
I think this comes down to keeping your advantage in-house and outsourcing
the other stuff.

For the vast number of companies where ML isn't in any way core, this is
great. For example we've got a little internal app that includes a profile
picture.

Currently we pre-crop photos before uploading. Obviously not the best
solution. Putting together a little JavaScript thingy to allow the user to
crop is somewhere deep in the backlog. Now that this exists I'm very tempted to
implement a little feature to auto crop around the face on upload. Nothing
there to steal, definitely not worth rolling our own ML system to do it but a
very nice thing to now be available in our toolkit.
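
For illustration, the cropping step could look roughly like the sketch below. It assumes a face detector (a cloud API or a local library) has already returned a face rectangle as left, top, width and height in pixels. The function name `square_crop_around_face` and the `margin` parameter are hypothetical and not part of any service mentioned in this thread.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def square_crop_around_face(img_w, img_h, face_left, face_top, face_w, face_h,
                            margin=0.5):
    """Return a square crop box centred on the face and clamped to the image.

    `margin` is the extra space added on each side, as a fraction of the
    larger face dimension (an arbitrary choice for this sketch).
    """
    # Side length: face size plus margin on both sides, capped by the image.
    side = int(max(face_w, face_h) * (1 + 2 * margin))
    side = min(side, img_w, img_h)

    # Centre of the detected face rectangle.
    cx = face_left + face_w // 2
    cy = face_top + face_h // 2

    # Shift the box back inside the image if it would spill over an edge.
    left = min(max(cx - side // 2, 0), img_w - side)
    top = min(max(cy - side // 2, 0), img_h - side)
    return Box(left, top, left + side, top + side)
```

The resulting `(left, top, right, bottom)` tuple follows the box convention that many imaging libraries use for cropping, so it should be usable more or less directly once a face rectangle is available.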

~~~
adrianN
Face recognition is built into OpenCV. It should be very easy to implement
this feature yourself without relying on any third party.

[http://docs.opencv.org/2.4/modules/contrib/doc/facerec/tutor...](http://docs.opencv.org/2.4/modules/contrib/doc/facerec/tutorial/facerec_video_recognition.html)

~~~
alvarosm
Pre-processing and alignment are going to be way off. Even if they weren't,
you then have (for face recognition itself) eigenfaces or fisherfaces (bad) or
maybe LBP (maybe barely acceptable). Your performance is going to be abysmal
compared to serious face recognition technology.

~~~
trapperkeeper79
My experience is that OpenCV face detection works relatively well if you are
facing the camera directly. We used an open-source face recognizer built on top
of it, and it was okay at classifying between a small number of known faces.

I saw some demos of M$FT's APIs at a recent tech event and it wasn't obvious
that the results with their system were significantly better. The biggest hole
seemed to be that camera frames were being sent to the cloud, and there were
issues with the presenter's quota (I forget the exact problem).

But overall, I think the Cognitive Services are a step in the right direction.
They generally make sense for businesses who don't have a full-time team of
ML/CV experts.

------
timanglade
The Cloud AI wars are heating up, with everyone now offering GPUs specifically
for HPC and Machine Learning use-cases, or advanced Machine Learning APIs like
Microsoft’s Cognitive Services.

That said, I really like Google Cloud’s new Machine Learning APIs [0], which
go the extra mile and let you train & run your own models, in a NoOps kind of
way. This is greatly useful because there is relatively little value, in my
research so far, in just using pre-packaged models like Google Cloud Vision or
Microsoft Cognitive Face Detection — most cases I’ve run into require building
your own model, or transfer-learning to your own use-case. Kudos to Google
Cloud for offering that in beta early, and I’m hoping Microsoft will follow
suit!

[0]: [https://cloud.google.com/ml/](https://cloud.google.com/ml/)

~~~
shirleysaurus
Hey @timanglade! Check out Clarifai. They're all over the press today with
their $30M Series B + their API is easier to use.
[https://clarifai.com](https://clarifai.com)

------
pkd
So I recently had a chance to use these APIs for a Microsoft hackathon project
and wow are these APIs unstable right now.

I was trying to use their Linguistic Analysis API for POS tagging some text,
and even for text as short as 500 characters, the API took over a minute to
respond. If I bumped it up to 1000 characters, the API would just error out
and give a 500 status code without any explanation after keeping me hanging
for over 5 minutes.

The same was the case with the speech to text API and some other ones as well.

As much as I appreciate there being something like this available for public
use, these projects are effectively in alpha state and are not yet fit to be
used in serious apps.

~~~
botexpert
Weird because POS tagging takes linear time in the number of words.

~~~
gok
That's assuming a lot... there are many, many ways to do part of speech
tagging. You could imagine a slow implementation that exhaustively considered
every possible POS tagging for an input sequence (that would be O(k^N), where k
is the number of parts of speech and N is the number of words).

~~~
botexpert
There's not really that much assumption. Given that good and simple
linear-time algorithms have existed for decades, it's very unlikely that they
use some ridiculous permutation enumeration algorithm.

Even a Microsoft Research project has a fairly complex reduction-based
implementation of a POS tagger that is blazingly fast and production ready
(learning to search interface). [1]

[1]:
[https://github.com/JohnLangford/vowpal_wabbit/wiki](https://github.com/JohnLangford/vowpal_wabbit/wiki)
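
To make the complexity argument in this subthread concrete, here is a minimal sketch that tags a sentence two ways: by scoring every possible tag sequence, which is O(k^N), and with the classic Viterbi dynamic program, which is O(N * k^2) and therefore linear in the number of words. The tiny tag set and all probabilities are made up for illustration, the input is assumed to be non-empty, and nothing here is meant to describe how the Linguistic Analysis API or Vowpal Wabbit actually works.

```python
import itertools
import math

# Toy hidden Markov model: every number below is invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "walks": 0.2, "park": 0.3},
    "VERB": {"walks": 0.7, "dog": 0.1},
}
UNSEEN = 1e-6  # crude smoothing for word/tag pairs missing from EMIT


def log_emit(tag, word):
    return math.log(EMIT[tag].get(word, UNSEEN))


def sequence_score(words, tags):
    """Log-probability of one complete tag sequence."""
    score = math.log(START[tags[0]]) + log_emit(tags[0], words[0])
    for prev, tag, word in zip(tags, tags[1:], words[1:]):
        score += math.log(TRANS[prev][tag]) + log_emit(tag, word)
    return score


def exhaustive_tag(words):
    """O(k^N): enumerate and score every possible tag sequence."""
    candidates = itertools.product(TAGS, repeat=len(words))
    return list(max(candidates, key=lambda tags: sequence_score(words, tags)))


def viterbi_tag(words):
    """O(N * k^2): keep only the best score per tag at each position."""
    scores = {t: math.log(START[t]) + log_emit(t, words[0]) for t in TAGS}
    backpointers = []
    for word in words[1:]:
        new_scores, pointers = {}, {}
        for t in TAGS:
            # Best previous tag to transition from into tag t.
            prev = max(TAGS, key=lambda p: scores[p] + math.log(TRANS[p][t]))
            pointers[t] = prev
            new_scores[t] = (scores[prev] + math.log(TRANS[prev][t])
                             + log_emit(t, word))
        scores = new_scores
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag to recover the path.
    tag = max(TAGS, key=lambda t: scores[t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]
```

Both functions agree on the best sequence for this toy model; only the amount of work differs. With 3 tags and 20 words, the exhaustive version would score about 3.5 billion sequences, while Viterbi does on the order of 20 * 9 transition comparisons.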

------
conjectures
Computer vision API seems pretty shaky outside of very basic object
recognition:

Royal Coat of Arms, "Two giraffes with a book."

Astronaut above earth, "A motorcycle mirror."

Twin towers behind statue of liberty: "A tall clock tower towering over the
city of london."

Che Guevara: "A woman wearing a hat."

The emotion API doesn't seem as shaky. Presumably because it's a better-defined
problem with a smaller search space.

~~~
CorneliaKara
[disclaimer: I work on the Computer Vision API] Good examples! It looks like
you were using the Image Captioning operation in the Computer Vision API. I
would think that, for the cases where the output was not correct, the API
returns a low confidence score; it really depends on your scenarios, but, in
my own testing, a caption with <40% confidence score is likely to have
incorrect info. Now, to explain what's going on a bit better: the vision
models behind the API were trained with a large body of images; you can
imagine that coats of arms or images of astronauts weren't very prevalent
(while images of giraffes or motorcycles were). We continue to improve the
vision models over time, so seeing feedback like this on HN (or StackOverflow,
or the User Voice forum on Microsoft.com/cognitive) helps!

~~~
conjectures
Thanks for the response Cornelia.

I can see that the Computer Vision API does return some useful information.
E.g. it appears to discriminate well between abstract images and photos. I
appreciate the inclusion of scores with the returned information.

However, the captioning reliably produces odd results. I Googled "Italian guy
eating pizza" to fit the "person verbing a common noun" pattern. This was the
first non-cartoon image for me:

[https://s-media-cache-ak0.pinimg.com/564x/68/c6/cf/68c6cf87b...](https://s-media-cache-ak0.pinimg.com/564x/68/c6/cf/68c6cf87b979562caba9acd9441c35a7.jpg)

And the caption:

    {
      "type": 0,
      "captions": [
        {
          "text": "a man and a woman eating a plate of food",
          "confidence": 0.44831967045071774
        }
      ]
    }

The woman in question is, I presume, the small statue of the Virgin Mary stood
next to the pizza.

There were also a few things I thought would fail but didn't. E.g.
distinguishing preparing food from eating it. This was nice.
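
As a rough illustration of the confidence rule of thumb mentioned upthread (captions under about 40% are likely to be wrong), a client could drop low-confidence captions before showing them. The sketch below assumes the response has the shape quoted in this comment; the function name `confident_captions` and the default threshold are illustrative choices, not part of the API.

```python
import json

# Rule-of-thumb cutoff taken from the comment upthread; tune per scenario.
LOW_CONFIDENCE = 0.40


def confident_captions(response_text, threshold=LOW_CONFIDENCE):
    """Return caption texts whose confidence is at or above the threshold."""
    payload = json.loads(response_text)
    return [
        caption["text"]
        for caption in payload.get("captions", [])
        # Treat a missing confidence field as zero confidence.
        if caption.get("confidence", 0.0) >= threshold
    ]
```

Note that the pizza caption above scores roughly 0.448, so it would still pass a 0.40 cutoff. A threshold like this reduces obviously wrong captions but does not catch every odd one.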

------
alaskamiller
AI is the new PC market.

Integrating various AI components to formulate something a user wants, uses,
and values is going to be the foundation of this third tech wave.

Cogsci services will also be the biggest drivers behind cloud computing usage
and thus one of the biggest drivers of revenue. That is why Microsoft has
doubled down on its Azure efforts, why Amazon is working hard to keep its AWS
lead, and why other big corps like Alibaba and Baidu are jumping into the
game.

Which brings up a bigger contextual question: will hardware makers in
the end matter?

As in, that iPhone you're clutching in your hand, will it go from smart to
dumb once we move on to a software distribution model whereby the device is
but a high-res screen with a nice camera, running everything else remotely
in the cloud somewhere?

Because that's the only way to get to the point where we have universal computing,
the kind of stuff in the movies where you can walk up to any glass anywhere
and compute.

~~~
eli_gottlieb
>As in, that iphone you're clutching in your hand, will it go from smart to
dumb once we move on to a software distribution model where by the device is
but a high-res screen with a nice camera but running everything else remotely
in the cloud somewhere?

That sounds like it's going to have absolutely killer latency problems.

>Because that's the only way to get to point where we have universal
computing, the kind of stuff in the movies where you can walk up to any glass
anywhere and compute.

This sounds like it's going to have absolutely killer privacy and
personalization problems.

I will pay a lot of money to buy hardware and software that actually shifts
back towards programmable stuff that I actually own and can do what I want
with.

~~~
aangjie
>I will pay a lot of money to buy hardware and software that actually shifts
back towards programmable stuff that I actually own and can do what I want
with.

So that's exactly what would happen. Those who have the bucks will buy
hardware and do customizations, while the rest will buy standard display, and
camera and suffer the latency, privacy and personalization problems.

I don't see it as difficult. The only thing is that right now hardware prices
have been moving down. But I see that more as a result of long periods of
monopoly breaking down and innovation (even if only in manufacturing methods).

------
philip142au
I would pay money for image-to-text software (text recognition) if it was
as good as Microsoft's and I could run it in batch on my images.

~~~
RandomBookmarks
Good news: You don't need to pay, because Microsoft OCR is available for free,
nicely packaged as a DLL. The caveat: it currently supports only UWP
(Universal Windows Platform) apps:
[https://blogs.windows.com/buildingapps/2016/02/08/optical-ch...](https://blogs.windows.com/buildingapps/2016/02/08/optical-character-recognition-ocr-for-windows-10/)

For a quick test, try [https://ocr.space](https://ocr.space) - a free OCR API
that uses Microsoft OCR on the backend.

------
Yeroniomus
Check out this useful article comparing the quality of different text
analysis APIs:
[https://www.linkedin.com/pulse/comparison-linguistic-apis-na...](https://www.linkedin.com/pulse/comparison-linguistic-apis-named-entity-recognition-yuri-kitin)

------
vmp
I remember seeing a comic[1] about this and thinking it was funny; it's eerie
seeing it for real and in action.

[1] [http://i.imgur.com/ctvpPyu.png](http://i.imgur.com/ctvpPyu.png)

------
umaar
You can use the Google Cloud Vision Label Detection API (or the Microsoft
Computer Vision API) to have a webpage recognise and audibly announce what it
sees through the camera, all using web technologies.

I wrote a post on how I did this here:
[https://umaar.com/dev-tips/118-cloud-vision-image-detect-jav...](https://umaar.com/dev-tips/118-cloud-vision-image-detect-javascript/)

------
gok
Just looking at 15-second speech recognition, the prices at Google are 50%
higher and at IBM about 25% higher. So they're all in roughly the same
ballpark. Interesting that all these APIs launched at about the same time and
picked very similar prices.

~~~
adamnemecek
I'm guessing that the underlying algos and tech are all roughly the same so
the costs are probably also very similar.

------
robmoorman
The scores seem to be very poor; look at the happiness and anger results. They
totally don't match the faces. A bad online presentation of these services.

------
doczoidberg
Does anyone know if I can use this service in the new German datacenter so
that no data is sent to the US?

Also, has anybody worked with their video APIs? They say it works 'near
realtime'. What can I expect from that? Does it work with a livestream?

------
neves
The correct link is here:
[https://www.microsoft.com/cognitive-services](https://www.microsoft.com/cognitive-services)
The link above is just a Bing search.

~~~
igravious
I thought it was interesting that something called "Project Oxford" on the .ai
TLD redirected to Microsoft Cognitive Services. :)

By the way, the entire list of .ai domains makes for interesting reading:
[http://anguilla-ai.com/list.html](http://anguilla-ai.com/list.html)

Here are the top 30 entities (inc. unknown) because I have no life. Make of it
what you will.

    
    
    USA             2871    Japan        66    Singapore      27
    China            636    Netherlands  53    Spain          27
    Anguilla         338    Russia       48    Austria        26
    United Kingdom   314    Brazil       46    Norway         22
    (unknown)        229    France       44    Denmark        19
    Canada           135    Hong Kong    41    Ireland        17
    Ukraine          126    Sweden       41    Italy          17
    Australia        115    Israel       36    U.A. Emirates  16
    Germany          113    Switzerland  30    Belgium        15
    India             89    Korea        27    Finland        15

~~~
CorneliaKara
Microsoft Cognitive Services is the new name for Project Oxford. [disclosure,
I work on the team]

~~~
igravious
I see. Thank you. Cool name. Keep up the good work. :) In other news:
[http://blogs.microsoft.com/next/2016/10/25/microsoft-release...](http://blogs.microsoft.com/next/2016/10/25/microsoft-releases-beta-microsoft-cognitive-toolkit-deep-learning-advances/)

