
New services expand IBM Watson capabilities to images, speech, and more - jsstylos
https://developer.ibm.com/watson/blog/2015/02/04/new-watson-services-available/
======
pesenti
Some context on the new services. They are built on technology that comes from
IBM Research and was moved into the Watson group in 2014. Some, like
speech, have been in development for more than 50 years. None of these
technologies overlap with the Watson Jeopardy stack (except for the Watson
voice). We will release that stack later this year as a series of services
allowing you to build a full Q&A/dialog application.

All the Watson services are still in beta but will start going GA very soon
(the first one next month). If you have any questions, please fire away; the
Watson team is ready to answer.

~~~
jcfrei
Do any of the Watson services allow for feedback to train them?

~~~
keelyw
Yes, all services include a feedback API, and the demos also include a
mechanism for providing feedback. As an example, see 4th paragraph in this
doc, which also includes a link to the API docs:
[http://ibm.co/1yNfztF](http://ibm.co/1yNfztF) And here's a link to the demo,
see the "Give us feedback" link:
[http://bit.ly/1EJllDF](http://bit.ly/1EJllDF)

------
qeorge
I've been uploading the easiest photos I can find to the visual recognition
demo[1], and it's yet to get one right.

For example, I searched Google for "photo of girl", and found this image which
seems _very_ easy:

[http://www.wagggsworld.org/shared/uploads/img/rachel-s-p-
pho...](http://www.wagggsworld.org/shared/uploads/img/rachel-s-p-photo.jpg)

Watson says:

    
    
        Color              71%
        Human              67%
        Photo              65%
        Dog                59%
        Person             57%
        Placental_Mammal   56%
        Animal             50%
        Long_Jump          50%
    

Huh?

This isn't me cherry picking bad results; aside from their demos I'm not
finding any photos that are accurately classified. I even tried a headshot of
a person isolated on a white background, and Watson told me I uploaded a photo
of "shoes".

Seriously - how is this data useful? What could I build with this level of
accuracy?

Watson team - do you agree? Is this product about to get a lot better, soon,
or is this considered "pretty good"?

[1] [http://visual-recognition-demo.mybluemix.net/](http://visual-recognition-
demo.mybluemix.net/)

~~~
pesenti
The top 3 classes in your example are actually correct - it is a color photo
of a human. But we expect it to get much better over time. Only real world
usage will allow us to make real improvement - and that's why we are eager to
release early.

We also believe that the first applications (e.g., classifying animals or
plants or landmarks in dedicated apps) will have narrower use cases that give
better accuracy.

~~~
qeorge
The top 3 may be correct, but they aren't very useful. What could I do with
this information? What feature could I build?

Also, the other results are very wrong. (i.e., Watson is more confident that
this is a dog than a person. And I have no idea where it got "Long Jump"
from). This makes it hard for me to trust Watson.

Is the recommendation that I incorporate a "confidence in Watson" metric, and
ignore most of the results?

What confidence from Watson would you say indicates an answer that is probably
accurate? And how confident are you that Watson's self-reported confidence is
accurate?
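One rough way to check the calibration question yourself is to hand-label a small sample of results and compare the stated confidence with the observed accuracy per confidence bucket. A minimal sketch, assuming you have collected (confidence percent, was it correct) pairs; `calibration_table` is a hypothetical helper, not part of any Watson API, and the correct/incorrect labels below are my own reading of the example output earlier in the thread:

```python
from collections import defaultdict

def calibration_table(samples, bucket_size=10):
    """Map each confidence bucket (in percent) to its observed accuracy.

    samples: iterable of (confidence_percent, was_correct) pairs,
    labelled by hand. Hypothetical helper, not a Watson API call.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket start -> [correct, total]
    for percent, was_correct in samples:
        # Clamp 100% into the top bucket, then round down to the bucket start.
        bucket = min(percent, 99) // bucket_size * bucket_size
        counts[bucket][0] += int(was_correct)
        counts[bucket][1] += 1
    return {b: correct / total for b, (correct, total) in sorted(counts.items())}

# Confidences from the "photo of girl" output; True/False judged by hand
# (Dog and Long_Jump counted as wrong).
samples = [(71, True), (67, True), (65, True), (59, False),
           (57, True), (56, True), (50, True), (50, False)]
table = calibration_table(samples)
```

If the scores were well calibrated, the accuracy in each bucket would sit near the bucket's confidence range. Eight labels from one photo are far too few to conclude anything; a few hundred hand-labelled results would be a more reasonable starting point.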

~~~
ChuckMcM
I tend to disagree. Assuming they are correct on a larger corpus you can start
doing things like "only do face matching on pictures with people in them" and
weed out photos in a batch that don't have those three properties.

Watson is a training API rather than, say, the more fanciful emergent-AI type
of API. The more data it gets, the better it gets. Likewise, Google's voice
recognition isn't good because someone coded the magic constants for various
accents; it is good because Google fed it millions of samples of spoken words
and corrected it when it got them wrong.

~~~
qeorge
Thanks for your comment. This makes sense - I would use Watson to determine
which photos have humans at all, and then run those through, e.g., my facial
recognition software. But Watson would keep me from having to waste resources
looking for faces in photos of trees, for example.
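That pre-filtering step could be sketched roughly as follows. This assumes the classifier output has already been collected into a dict of label scores per photo; the function name, the 0.6 threshold, and the `tree.jpg` scores are all made up for illustration, and only the label names come from the output quoted earlier in the thread:

```python
def photos_with_people(results, min_confidence=0.6, labels=("Human", "Person")):
    """Return names of photos whose classifier output includes a
    person-like label at or above min_confidence.

    results: dict mapping photo name -> {label: confidence in [0, 1]}.
    Hypothetical data shape, not the actual Watson response format.
    """
    return [name for name, scores in results.items()
            if any(scores.get(label, 0.0) >= min_confidence for label in labels)]

results = {
    "girl.jpg": {"Color": 0.71, "Human": 0.67, "Dog": 0.59, "Person": 0.57},
    "tree.jpg": {"Color": 0.80, "Plant": 0.75},  # made-up scores
}
candidates = photos_with_people(results)
# Only the candidates would go on to the more expensive face-matching step.
```

The threshold trades missed people against wasted face-matching runs, so it would need tuning on real photos.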

I'm not in this field, so I'm having trouble understanding what use cases /
consumer facing features this API unlocks. Your comment is very helpful in
that regard.

------
SlipperySlope
Compare the Watson text-to-speech voices with Nuance ...

Watson [http://text-to-speech-demo.mybluemix.net/](http://text-to-speech-
demo.mybluemix.net/)

Nuance [http://www.nuance.com/for-business/text-to-
speech/vocalizer/...](http://www.nuance.com/for-business/text-to-
speech/vocalizer/index.htm#demo)

I prefer the Watson version voicing a sample paragraph. Both are good enough
for an application that selects on price. For a voice-first application, maybe
Watson is better for TTS.

For speech to text, Nuance has been the leader, e.g. Apple's Siri. Has anyone
compared IBM speech recognition to Nuance, Microsoft & Google?

~~~
picheny
We know we have strong core speech technology based on various comparisons we
have done in the context of competitive evaluations done in conjunction with
various government funded speech programs. However, our service is still very
new. We could have waited for months to tune it, but our primary goal here is
to solicit feedback from the community for how to make our services easier to
use, especially in the context of our other platform services. We don't want
to wait till the design is so mature that it is impossible to change - so any
and all feedback is very welcome!

~~~
taf2
Yes, but how do you sign up for either service?

~~~
nfriedly
The Watson services are only accessible through Bluemix for the moment. Create
an account on [https://console.ng.bluemix.net](https://console.ng.bluemix.net)
and then add a service instance.

The idea was that you'll also host your application in bluemix, although I
think the services are actually accessible elsewhere once you create the
instance in bluemix.
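For what it's worth, Bluemix is built on Cloud Foundry, which passes the credentials of bound services to an app through the `VCAP_SERVICES` environment variable as JSON. A minimal sketch of reading them, assuming the usual layout of service label mapped to a list of instances; the `text_to_speech` label and the `url`/`username`/`password` keys below are assumptions for illustration, so check what your own instance actually exposes:

```python
import json
import os

def watson_credentials(service_label, environ=os.environ):
    """Return the credentials dict of the first bound instance of
    service_label, or None if nothing is bound under that label."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    instances = services.get(service_label, [])
    return instances[0].get("credentials") if instances else None

# Simulated environment; the label and credential keys are assumptions.
fake_env = {"VCAP_SERVICES": json.dumps({
    "text_to_speech": [{"name": "my-tts",
                        "credentials": {"url": "https://example.invalid/api",
                                        "username": "user",
                                        "password": "secret"}}]
})}
creds = watson_credentials("text_to_speech", fake_env)
```

Inside a deployed app you would call `watson_credentials(...)` without the second argument so it reads the real environment.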

------
Kronopath
The text-to-speech is surprisingly good, but I'm amazed at one thing, and not
in a good way: the Spanish voice can't pronounce the word "Español". It
pronounces it as "Espanol" with a hard "n" sound. In fact, it seems to
pronounce all "ñ"s as "n"s. How that kind of an oversight got into the system,
I'll never know. Did no one think to check?

Edit: And to add insult to injury, the _English_ voices _do_ pronounce
"Español" correctly!

~~~
vaibhava72
Fixed.

------
bkeroack
Pricing page (which they don't make easy to find):
[https://console.ng.bluemix.net/#/pricing](https://console.ng.bluemix.net/#/pricing)

When this was first announced I remember reading about their pricing model
where they would take a percentage of app revenue. I'm glad to see they offer
flat pay-as-you-go pricing now. Some of the Watson services are intriguing.

------
jsstylos
I'm on the Watson team and we're interested in learning from developers to
make our APIs and documentation easier to use. Have feedback? We'd love to
hear it. jsstylos@us.ibm.com Twitter: @jsstylos

------
bhuga
The text-to-speech is actually a little nicer than Siri or Cortana, but not
groundbreaking. This was the only one of the 5 that I thought did well. The
rest might have been better without demo pages.

For visual recognition, I used a picture of a snowmobile from
[http://www.1888goodwin.com/2013/11/14/what-do-you-need-to-
do...](http://www.1888goodwin.com/2013/11/14/what-do-you-need-to-do-to-drive-
a-snowmobile-in-michigan/), which it identified with 73% confidence as
"Invertebrate".

Speech to text is a parody twitter account waiting to happen. Here's me asking
it how it does with technical transcription:

How do you doing technical words.

If you were going to have to talk about get an jute cushion pull.

And you wanted to discuss the impact on a file server memory.

Issues that cross processes talk about home forks rivers slowed difficult.

~~~
jp8000
Make sure you use a headset, not your laptop's microphone.

~~~
iandanforth
That's not a reasonable requirement. It only sounds like one to you because
the technology has been so bad for so long.

~~~
pesenti
Smartphone microphones are much better than laptop microphones and pretty much
on par with using headsets on a laptop - they represent our primary use case.

------
humanfromearth
I tried using Watson a month ago without much success. I wanted to do a
classification of some random text, and say that this text for example is this
category. But as far as I could understand it only allows using their own
datasets.

It's not possible to train their service with your data, unlike wit.ai for
example. Seems obvious to me that people would want to train with their own
data.

~~~
pesenti
Pretty much all the services that we are releasing will have some adaptation
capabilities - allowing you to provide your own data, create your own models,
etc - at some point. Stay posted.

------
FrankenPC
Text to speech is pretty good. [http://text-to-speech-
demo.mybluemix.net/?cm_mmc=developerWo...](http://text-to-speech-
demo.mybluemix.net/?cm_mmc=developerWorks-_-dWdevcenter-_-watson-_-lp)

I decided to test it a little. I copied phoneme challenges and nonsensical
phrasing from the web. Then I added some stuff that I know has problems from
past experience.

----- Let's explore some complicated conversions, shall we? The old corn cost
the blood. The wrong shot led the farm. The short arm sent the cow. How can I
intimate this to my most intimate friend? Don't desert me here in the desert!.
They were too close to the door to close it. The buck does funny things when
does are present. Today is 1/1/2015. Today is Jan 5th, 1992. It's currently
half past 12. Or 12:30PM. Twenty thousand dollars. 20,000 dollars. 20 thousand
dollars. 2^5 = 32. NASA is an acronym. This ... is a pause.
EmailAddress@somedomain.com.

------
Poiesis
Two things that jumped out at me:

1. No "special characters" allowed in passwords when creating an account.
2. ...where's the REST API? I've "added a service" (TTS), but I have to write a
webapp to expose it over HTTP? It sure is a different experience than your
typical API documentation.

~~~
aroopPandya
Take a look at
[https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc...](https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc).
There is also a doc link once you click on the service you have bound. There
are samples in Java and Node.js (samples coming soon on GitHub).

~~~
Poiesis
I did indeed take a look at the docs. That's why I commented. If there's
documentation of REST end points, it's not obvious to me. Maybe someone else
will point out what I'm missing.

As far as I can tell, most of the documentation essentially begins, "First,
deploy a web app on our platform". Which is fine I guess, but isn't nearly as
simple as the HTTP APIs you see from many other recent SaaS providers. At
least for me, I'm pretty unlikely to jump through those hoops. Maybe others
will be different.

Edit: All the way down on the bottom of the documentation page, past the
research references, there's a link to HTTP API documentation--literally the
last link on the page.

------
karmacondon
So the gist of what I'm seeing in this thread is, "Watson's API services
aren't very good yet, but they will get better as it collects and processes
more data".

So basically, IBM is charging us to provide it with training data to make
Watson useful for practical applications. Makes sense, but I can't help but
feel that it would be a smarter move to skip charging entirely for now, or to
use drastically reduced pricing tiers that exist only for the purpose of
preventing abuse. The idea of releasing a product like this with less than
impressive demos is a bit of a risk. It's not going to encourage people to use
it if the demos aren't compelling, and the demos won't be compelling until a
lot of people are using it. I'd err on the side of optimism here, it'll
probably work out for the best, but it will be interesting to see how this
goes and provide a good case study.

My other thought is that if IBM can't get sufficient training data on their
own, what hope do the rest of us have? Performing classification on arbitrary
data is a herculean task. People could throw literally _anything_ at this api
and will expect to get common sense results, it's nearly impossible and
pushing the boundaries of what even cutting edge software can do. But if a
company like IBM spends billions of dollars and their demos still end up
generating mostly confusion and complaints... This kind of open ended "AI"
might be more difficult than even the most conservative experts thought.

EDIT: As an after thought, the real value here isn't so much software as it is
pooled training data. Facebook has been able to identify human faces in photos
for years, speech-to-text and concept modelling have all been around for a
long time. What's difficult is getting the labelled data necessary to
distinguish between "is this a picture of a person or a picture of a cat?".
Watson is great and it seems like IBM has made an investment in acquiring and
collecting the data necessary to do that. But their big play here might be to
build a consumer friendly enough product that their users contribute the rest
of that data for them over the next several years, building an aggregate data
set that is worth as much or more than the software itself. Again, will be
interesting to see how it plays out.

~~~
jsstylos
All of the Watson services are free in beta. (Bluemix, through which the
services are accessed, requires a credit card after 30 days, but doesn't
charge you for use of the beta Watson services.)

We wanted to get the services into people's hands early, even though we're
still working on them, rather than wait until we had a perfect product.
There's a tradeoff here, but we figure that we can improve the services faster
and better with public usage and feedback than we could in private isolation.

Since they're free, hopefully people will be able to have some fun playing
around with the services, also!

------
corin_
@IBM people: Is there any information available yet either regarding future
pricing, or regarding timeline for getting pricing information?

~~~
johnward
Someone else posted this but here is the bluemix pricing page:
[https://console.ng.bluemix.net/#/pricing](https://console.ng.bluemix.net/#/pricing)

~~~
_delirium
That gives the pricing for running compute instances on BlueMix, but at the
moment there's no pricing for these Watson services, since they're free-while-
in-beta. Presumably post-beta there will be some kind of charge per N queries,
like the other out-of-beta services (e.g. the Business Rules service charges
$1.00 per 1000 API calls), but there's not currently an indication of when
that's likely to happen and/or the likely price range.
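To get a feel for what a per-call rate would mean, here is a back-of-the-envelope sketch. It borrows the Business Rules rate of $1.00 per 1000 calls purely as a placeholder; nothing says the Watson services will be priced this way, and the function name and pro-rata billing are assumptions:

```python
def monthly_cost(calls_per_month, price_per_block=1.00, block_size=1000):
    """Estimate the monthly cost of a flat per-call rate, billed pro rata.

    Defaults mirror the Business Rules rate ($1.00 per 1000 calls);
    applying it to Watson services is hypothetical.
    """
    return calls_per_month / block_size * price_per_block

estimate = monthly_cost(250_000)  # 250,000 calls at $1.00 per 1000 -> 250.0
```

So an app making 250,000 calls a month would pay about $250 at that placeholder rate.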

------
cabirum
Visual recognition has some room for improvement

[http://i.imgur.com/V59IeQH.png](http://i.imgur.com/V59IeQH.png)

~~~
aroopPandya
Hey, try changing the classifier from "All" to "Scene". It does much better.
And stay tuned: we will release some more APIs on top of visual recognition to
allow for image labeling.

------
harisamin
This is great! There was a startup JetPacCity (acquired by Google) that was
doing some CNN for image recognition, mostly on the mobile client side. They
had open sourced their lib:
[https://github.com/jetpacapp/DeepBeliefSDK](https://github.com/jetpacapp/DeepBeliefSDK)

------
davegolland
Interesting! We over at Prismatic released our interest tagging API just
yesterday ( [http://blog.getprismatic.com/interest-graph-
api/](http://blog.getprismatic.com/interest-graph-api/) ). Seems like there's
a lot of opening up APIs going around.

------
enricobruschini
I've been developing a product with Watson from within the Partner Ecosystem,
some of those capabilities are pretty useful. Others, sometimes, are kind of
confusing, creating a broad overpopulated constellation of Watson-based APIs
inside Bluemix.

------
ConfuciusSay
Now you can buy back stock algorithmically in the cloud!

------
jcoffland
Don't pay a company to do what can be done with a library.

------
z3phyr
This

>>Speech to Text : This application only works in recent versions of Chrome
supporting HTML5 audio capture

~~~
picheny
Yeah, Chrome currently seems to have the best support for audio capture.....

------
anonbanker
Can we all just drop the charade and start calling Watson SkyNet already?

------
taf2
How do I sign up and pay them money?

~~~
tparikh
Watson services on Bluemix are currently in beta. You can use the beta
services at no charge, even after your 30 day Bluemix trial, although you will
need to provide a credit card to Bluemix. You will not incur any charges
unless you use any of the production services.

------
X-combinator
The future is HERE!

------
niels_olson
In other news, Watson will be RA'd at the end of the month.

~~~
johnward
If anyone is not going to be RA'd it's Watson group. There is a lot riding on
the success of Watson.

~~~
CrazyCatDog
Agreed, it's hard to find any other marketable IP in IBM's portfolio.

~~~
mmf
Care to provide some backing info on your statement?

[http://en.wikipedia.org/wiki/List_of_top_United_States_paten...](http://en.wikipedia.org/wiki/List_of_top_United_States_patent_recipients)

~~~
warkdarrior
He said "marketable IP" (i.e., "useful IP") not "patentable IP".

