

Watson now brings cognitive speech capabilities to developers - pesenti
https://developer.ibm.com/watson/blog/2015/02/09/ibm-watson-now-brings-cognitive-speech-capabilities-developers/

======
mind_heist
Hmm ... a couple of friends and I spent the last two days at the
DeveloperWeek hackathon trying to explore Watson's capabilities. IBM's PaaS
solution is called BlueMix, and all of Watson's capabilities are available as
services for you to use.

We tried using the "Tradeoff Analytics" service for the project, and I must
say the tools and help available around it, the API, and its documentation are
pretty bad, convoluted and unusable. The same is true of the other services
available through Watson.

We looked into IoT (Internet of Things) as well, and once again ran into
a ton of dead ends without being able to proceed. The API documentation and
examples just suck. If you are used to playing around with well-documented
APIs / tools / languages, this is going to be frustrating.

If the OP is the person who actually wrote the article, please please please
go back to Watson Dev Cloud or BlueMix and try out your services and APIs
as a consumer.

~~~
picheny
We really appreciate the comments and will try to fix the problems. To be
honest, sometimes developers can't see even the most obvious flaws in
documentation. If you can highlight even one incomprehensible point it would
help a lot to accelerate the revision process.

~~~
sk5t
Overall the BlueMix and Watson documentation is very green and fair to call
alpha-stage. This problem extends in all directions, although if a specific
example would be helpful, trying to discover what load balancing options exist
within BlueMix was something I tried and failed to learn last week.

For a Watson-specific example, at least a few months ago it was the case that
putting together a full-featured client implementation for Q&A required poking
around several obscure webpages and then plenty of runtime experimentation on
top.

------
frik
In 1999 IBM released a free version of ViaVoice
([http://en.wikipedia.org/wiki/IBM_ViaVoice](http://en.wikipedia.org/wiki/IBM_ViaVoice)).
IBM sold ViaVoice in 2003 and all distribution functions passed to ScanSoft,
now called Nuance
([http://www-01.ibm.com/software/pervasive/viavoice.html](http://www-01.ibm.com/software/pervasive/viavoice.html)).
Does IBM still own the whole stack, or is it based on Nuance code?

Are there plans to open up parts of the older voice technology and contribute
it e.g. to CMU Sphinx?

~~~
picheny
Sorry, I missed the question at the bottom. We were very proud of ViaVoice at
the time but to make an obvious point, the technology has moved on a lot over
the past ten or so years...

~~~
frik
The old ViaVoice can't compete with Watson Voice & Nuance but would be a good
alternative to existing open source voice technology that is years behind.
It's highly unlikely that IBM would release such tech, nevertheless it would
be appreciated.

~~~
picheny
We are flattered by the interest and will look into it; obviously given no one
has looked at this for years, it is not a likely possibility.

------
oomkiller
The ability to create your own models is very important to use this, as your
existing ones do a bad job of processing my normal speech. To test I just
tried reading a few simple phrases and the error rate is pretty high. The Web
Speech API did a great job with the same phrases
[https://www.google.com/intl/en/chrome/demos/speech.html](https://www.google.com/intl/en/chrome/demos/speech.html)

~~~
vaibhava72
Just to rule out a known issue with some laptop built-in microphones (e.g.
[https://developer.ibm.com/answers/questions/174176/speech-to...](https://developer.ibm.com/answers/questions/174176/speech-to-text-audio-problems/?smartspace=watson)),
could you let us know if you tried with
an external close-talking microphone, and if that mattered at all?

~~~
oomkiller
It seemed a bit better with my Bose headset, but was still quite lacking.

------
reledi
Live demos that you can play with:

Speech to Text: [https://speech-to-text-demo.mybluemix.net](https://speech-to-text-demo.mybluemix.net)

Text to Speech: [http://text-to-speech-demo.mybluemix.net](http://text-to-speech-demo.mybluemix.net)

~~~
sho_hn
The speech synthesis is impressive. It's still clearly a computer, but the
prosody is a step up from Google and Bing. I threw some random comments I've
written at the English Female voice model. It seems capable of contrasting
clauses via rising and falling (and handles patterns like "On the other hand,
..." and "She was either ..., or .... when ..." well), makes little dramatic
pauses after noun clusters to allow the listener to catch up, inserted a
little mental comma into stuff like "greater than x [,] half the time", put
emphasis on an "and" after a comma ("[...] something I still haven't gotten
used, and am not sure I want to") etc., lots of traits of an aware speaker.
Heck, I almost felt like it picked up speed and layered in an ounce of
incredulity when it was reading a rant I wrote, but it might just be good
enough that I can project into it on that one.

~~~
visarga
I copy pasted your paragraph into the TTS. I like it very much, but it might
just be on the level of Alex from Mac OS.

> it might just be good enough that I can <project> into it on that one.

Funny. It doesn't stress <project> as a verb, but as a noun, making the whole
phrase mean something else.

Alex has a list of words like that too: live (to live) and live (live
concert), progress (both verb and noun), record, suspect and a bunch of
other words that have multiple pronunciations based on the surrounding words
(they are called homographs, or more precisely heteronyms).

They should add homograph disambiguation - probably solvable with a classifier
based on features extracted from surrounding words and POS tagging.
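
For the curious, here is a rough sketch of what such a classifier could look like for words that are spelled alike but stressed differently. It is a tiny Naive Bayes over the previous and next word only. The training sentences, the labels and the function names are all made up for illustration, and a real system would presumably add the POS tags of the neighbors as further features and train on far more data. Nothing here reflects how Watson or Alex actually work.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (sentence, target word, label).
TRAIN = [
    ("we can project the costs", "project", "verb"),
    ("I will project my voice", "project", "verb"),
    ("they project into it", "project", "verb"),
    ("the project is late", "project", "noun"),
    ("a project was cancelled", "project", "noun"),
    ("this project needs funding", "project", "noun"),
]

# In English the noun is stressed on the first syllable, the verb on the second.
STRESS = {"noun": "PRO-ject", "verb": "pro-JECT"}


def features(sentence, target):
    """Context features: the words immediately before and after the target."""
    words = sentence.lower().split()
    i = words.index(target)
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return [f"prev={prev_word}", f"next={next_word}"]


def train(examples):
    """Count labels and per-label feature occurrences."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for sentence, target, label in examples:
        label_counts[label] += 1
        for f in features(sentence, target):
            feature_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feature_counts, vocab


def classify(model, sentence, target):
    """Pick the label with the highest add-one-smoothed log probability."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        # +1 reserves probability mass for features never seen in training.
        denom = sum(feature_counts[label].values()) + len(vocab) + 1
        for f in features(sentence, target):
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
label = classify(model, "I can project into it", "project")
print(label, STRESS[label])
```

The TTS front end would then look up the pronunciation for the predicted label instead of always using one default reading.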

------
Yhippa
Has anyone used Bluemix past the 30-day trial? It looks like you get 375
GB-hours. That sounds like quite a lot of time. It sounds like as a developer I
can mess around with their beta services and not worry about paying anything.

~~~
mind_heist
You should give it a try. I have a 30-day trial account and spun up a couple
of services. The UI is cool, and there are a lot of templates (boilerplates)
that you could use to get your app up and running. But you might
struggle with the documentation, depending on what services you use. A
lot of the Watson stuff is in beta (you can see it when you log in to Bluemix),
and you might have trouble with it.

~~~
picheny
Beta is certainly beta but if you find problems we will try to fix them as
quickly as we can. Real users of technologies tend to find issues with the
technology much faster than the actual developers...

~~~
ilyaeck
The ASR seems to be a bit immature, but the TTS sounds very nice. Any plans to
add more voices?

------
ilyaeck
It looks like IBM is getting serious about Watson, but still not serious
enough. To create an ecosystem and incentivize developers to work through all
the issues, IBM should probably create an investment fund for startups who
build their products based on Watson. Any such plans?

~~~
skadamat
They do have an investment fund for Watson products / startups --
[http://www.tefunds.com/](http://www.tefunds.com/)

------
mind_heist
There is something else I would like to point out as well; IBM folks on this
thread can answer the question. This is regarding the "User Modeling" service
of Watson. I spoke to a couple of IBM folks and asked about some of the
coolest apps they had seen built using Watson, and someone mentioned
the following MSNBC article. It's Watson's perception of the State of the
Union speech. What the User Modeling service does is take text as input
and analyze its sentiment (and give outputs around it).

[http://www.msnbc.com/msnbc/how-supercomputer-sees-the-state-...](http://www.msnbc.com/msnbc/how-supercomputer-sees-the-state-the-union)

What the folks @ MSNBC did was to pass the last 10 SOTU speeches to Watson and
collate the results over a graph.

But here is why I have trouble believing Watson's perception. Try passing the
following input to Watson (or any other gibberish):

"jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg
g;fkgljsdfgk;gjldfg dgkjldfgdhfgkjdfhjg fkldskf;ksdlf;ksdlfks
jkdhfkhsdjkfhksdhj ljfsdjfhdsjkfjskdhfkjsdhf sdfkls;dkfl; dkfl;sd;fsk
roweruoweuroiweuroiwe uweoruweoruweo ruweuro kjgsfgjkldfsjgs klfgjfdsl gjfdlkg
fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg
g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdslgjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g
dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs
klfgjfdsl gjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg dfsgj
df gfgd flg;dfg g;fkglj sdfg kjgs fgjk ldfsj gs klfgjfds lgjfdlkg fd jkldsjg
lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg fsgj df gf gdflg;dfg g;fkglj sdfg
kjgsfgjkldfsjgs klfgjfd slgjfdlkg fd jkldsjglk fdsjgdfls kg;jsf g dsfg fdg
jsdfjgdf skg fsgj df gfgdflg;dfg g;fkgl jsdfg kjgsfgjk ldfsjgs
klfgjfdslgjfdlkg fd jkldsjg lkfdsj dfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj f
gfgd flg;dfg g;fkgljsdfg kjgsfgjkldfsjgs

fkldskf;ksdlf;ksdlfks jkdhfkhsdjkfhksdhj ljfsdjfhdsjkfjskdhfkjsdhf
sdfkls;dkfl; dkfl;sd;fsk roweruoweuroiweuroiwe uweoruweoruweo ruweuro
kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg
jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdslgjfdlkg
fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg
g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g
dsfg fdg jsdfjg dfskg dfsgj df gfgd flg;dfg g;fkglj sdfg kjgs fgjk ldfsj gs
klfgjfds lgjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg fsgj
df gf gdflg;dfg g;fkglj sdfg kjgsfgjkldfsjgs klfgjfd slgjfdlkg fd jkldsjglk
fdsjgdfls kg;jsf g dsfg fdg jsdfjgdf skg fsgj df gfgdflg;dfg g;fkgl jsdfg
kjgsfgjk ....

=====

And Watson rates it as the following.

Big 5

Openness 100%: Adventurousness 100%, Artistic interests 2%, Emotionality 1%, Imagination 100%, Intellect 100%, Authority-challenging 100%
Conscientiousness 93%: Achievement striving 94%, Cautiousness 57%, Dutifulness 1%, Orderliness 1%, Self-discipline 81%, Self-efficacy 3%
Extraversion 1%: Activity level 1%, Assertiveness 1%, Cheerfulness 1%, Excitement-seeking 2%, Outgoing 1%, Gregariousness 1%
Agreeableness 1%: Altruism 1%, Cooperation 1%, Modesty 1%, Uncompromising 1%, Sympathy 1%, Trust 1%
Emotional range 11%: Fiery 1%, Prone to worry 10%, Melancholy 34%, Immoderation 24%, Self-consciousness 6%, Susceptible to stress 9%

Needs

Challenge 61%, Closeness 84%, Curiosity 51%, Excitement 66%, Harmony 65%,
Ideal 54%, Liberty 75%, Love 23%, Practicality 86%, Self-expression 25%,
Stability 60%, Structure 57%

Values

Conservation 78%, Openness to change 5%, Hedonism 15%, Self-enhancement 76%,
Self-transcendence 11%

========

It's plain gibberish, and you still get some results. I tried passing other
text transliterated to English and Watson still gives results like this. I
would expect it to at least call it out as gibberish text.

~~~
jschoudt
Sorry for the slow response, it's been internet years 8-)

We have an update coming for User Modeling (to be announced soon). After that
update, such a gibberish post will return an error.

User Modeling is based on word counting. Users should ensure that their input
is actually from a human and intelligible. The service looks for certain words
in the input, and will reject input that doesn't have enough of those words
for the service to estimate characteristics. In the upcoming release, the
documentation will explain how this works and what the relevant words are.
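
To make the word-counting idea concrete, here is a minimal sketch of that kind of input check. The lexicon, the threshold and the names are placeholders invented for this example, since the actual word list and cutoff are not given here; it only illustrates counting known words and rejecting input that has too few of them.

```python
import re

# Hypothetical lexicon and cutoff, chosen only for this illustration.
RELEVANT_WORDS = {
    "i", "we", "you", "think", "feel", "love",
    "work", "family", "always", "never",
}
MIN_RELEVANT = 5


class InsufficientInputError(ValueError):
    """Raised when the text has too few recognizable words to analyze."""


def count_relevant(text):
    """Count tokens that appear in the lexicon (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in RELEVANT_WORDS)


def check_input(text):
    """Return the relevant-word count, or raise if it is below the cutoff."""
    n = count_relevant(text)
    if n < MIN_RELEVANT:
        raise InsufficientInputError(
            f"only {n} relevant words found; need at least {MIN_RELEVANT}"
        )
    return n
```

With this sketch, a string like "jkldsjglkfdsjgdfls kg;jsf g dsfg fdg" matches nothing in the lexicon and is rejected, while ordinary English sentences pass once they contain enough known words.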

Also, we will provide a measurement of how accurate our results are based on
the number of words that are in the input. This should allow users to
understand the reliability of the results in the context of their application
(e.g. a casual movie recommender app might be ok with very low confidence,
while an application that makes more critical recommendations might require
higher confidence).
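
On the application side, using such a measurement could be as simple as the sketch below. The application names and thresholds are invented for illustration, and the confidence value is assumed to be a number between 0 and 1 reported alongside the results; the actual format has not been announced.

```python
# Hypothetical minimum confidence each kind of application is willing to accept.
MIN_CONFIDENCE = {
    "movie_recommender": 0.2,     # casual use, low confidence is acceptable
    "critical_recommender": 0.9,  # higher-stakes use, demand high confidence
}


def usable(app, confidence):
    """Return True if a result with this confidence is acceptable for the app."""
    return confidence >= MIN_CONFIDENCE[app]


print(usable("movie_recommender", 0.3))
print(usable("critical_recommender", 0.3))
```

The same result can be good enough for one application and discarded by another, which is the point of exposing the measurement rather than hiding it.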

------
z3phyr
By cognitive speech, do they mean 'understanding' the natural language? Can
somebody please explain how it is different from other solutions?

------
npalli
What are the plans to release native versions of the API that can be plugged
into iOS and Android apps?

~~~
picheny
It's an obvious extension of what we put out; keep watching the Watson
Developer Cloud announcements.

------
grimborg
Most developers I work with already have quite good cognitive speech
abilities...

