
Why computer voices still don't sound human - soundsop
http://www.slate.com/id/2212800/pagenum/all/#p2
======
wallflower
We are still far from the HAL 9000 in Stanley Kubrick's "2001". However, once
software can understand us emotionally and speak to us with human-like
emotion, the line between personal and impersonal software will become
greyer.

Check out this online demo of the near-current state of the art in real-time
speech synthesis (in many languages). For one thing, it does not do natural
pauses, and there is a noticeable lack of tone and emotion. However, once
real-time speech synthesis gets to the level of National Public Radio's
"Selected Shorts" or books-on-tape, we'll be talking to our AI psychologists.

TTS demo (Flash-based)

<http://www.acapela-group.com/text-to-speech-interactive-demo.html>

NPR's Selected Shorts

<http://www.symphonyspace.org/shorts>

------
whughes
The article doesn't mention this, but there is a potential problem: The Da
Vinci Code doesn't have <goodnews> tags. For a text-to-speech system to truly
read books naturally, it has to be able to parse the emotion out of a text and
figure out how to read each passage based on the situation. We could have
people annotate books, but then the TTS would only read well whatever someone
had already annotated.
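To make the annotation approach concrete, here is a minimal sketch, assuming a
hypothetical markup where a human wraps passages in tags like <goodnews>...</goodnews>
(the tag name comes from the article; any other tag names are made up). It just
splits annotated text into (style, text) segments that a TTS engine could then
voice differently; untagged text falls back to "neutral", which is exactly the
limitation described above.

```python
from __future__ import annotations

import re

# Matches <tag>...</tag> pairs with the same name on both ends.
# The tag vocabulary is hypothetical, not a real TTS markup standard.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def segment(annotated: str) -> list[tuple[str, str]]:
    """Split annotated text into (style, text) segments.

    Text outside any tag gets the style "neutral".
    """
    segments: list[tuple[str, str]] = []
    pos = 0
    for m in TAG_RE.finditer(annotated):
        before = annotated[pos:m.start()].strip()
        if before:
            segments.append(("neutral", before))
        segments.append((m.group(1), m.group(2).strip()))
        pos = m.end()
    rest = annotated[pos:].strip()
    if rest:
        segments.append(("neutral", rest))
    return segments


if __name__ == "__main__":
    text = "He opened the box. <goodnews>These cookies are delicious</goodnews>"
    for style, words in segment(text):
        print(style, "->", words)
```

Anything a human didn't tag comes out "neutral", so the hard part (inferring the
style automatically) is left entirely to the annotator.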

------
Timothee
"Why is Amazon's text-to-speech system so bad?"

I find that harsh. Judging from the sample of "The Da Vinci Code", I think it's
pretty good. It sounds a little bit like an old radio recording, but it could
be mistaken for a human voice if it were not for the flat intonation.

In any case, it sounds way better than the current text-to-speech on a Mac,
which, even though it has improved since 1984, doesn't sound that much
different from the original Macintosh's (it still has that synthesized sound).

So no, it's not perfect, and it sounds odd, especially in dialog, but I was
expecting something more like the Mac's voice and was pleasantly surprised.

------
TrevorJ
They should go back and study Alexander Graham Bell's work on human physiology
and the voice. I wonder if it would be possible to use virtual models of human
vocal anatomy to produce more accurate sounds in realtime?

~~~
ciscoriordan
Even if you just modeled a lung and rhythmic breathing, that could be really
useful in figuring out when to put in more natural pauses.
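A very crude stand-in for that idea, as a minimal sketch: instead of a real
physiological lung model, assume a hypothetical fixed "air budget" of words per
breath (max_words) and break the text into breath groups, preferring to breathe
at punctuation once a group is at least min_words long. Both parameters are made
up for illustration; the pauses would go between the returned groups.

```python
CLAUSE_END = ",;:.!?"  # punctuation treated as a natural place to breathe


def breath_groups(text: str, max_words: int = 12, min_words: int = 4) -> list[str]:
    """Split text into breath groups.

    A group ends at a clause-ending word once it has at least min_words
    words, or is forced to end when it reaches max_words (the hypothetical
    "lung capacity").
    """
    groups: list[str] = []
    current: list[str] = []
    for word in text.split():
        current.append(word)
        at_clause_end = word[-1] in CLAUSE_END
        if (at_clause_end and len(current) >= min_words) or len(current) >= max_words:
            groups.append(" ".join(current))
            current = []
    if current:  # whatever is left is the final breath
        groups.append(" ".join(current))
    return groups


if __name__ == "__main__":
    line = ("It does not do natural pauses, for one thing, and there is "
            "a noticeable lack of tone and emotion.")
    print(" / ".join(breath_groups(line)))
```

A fixed word count is obviously not how lungs work; an actual model would track
airflow over time, but even this shows pauses falling out of a breathing
constraint rather than only from punctuation.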

------
diN0bot
the last recording of <good news>These cookies are delicious</good news> is
amazing. 'delicious' sounds almost human.

------
siong1987
In fact, this article missed a very strong research team in this area. Take a
look at what AT&T Labs has achieved so far.

<http://www.research.att.com/~ttsweb/tts/demo.php>

Try the demo.

