

Roger Ebert: Hello, this is me speaking - absconditus
http://rogerebert.suntimes.com/apps/pbcs.dll/article?AID=/20100226/PEOPLE/100229986

======
rriepe
I wonder if someone will develop a "The quick brown fox jumps over the lazy
dog" for English pronunciation. Something you could read aloud that would
cover all the sounds needed to build something like this.

It'd be a cool graduate project... kinda wish I was into linguistics right
now.
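
A rough sketch of what such a coverage checker might look like. The tiny
pronunciation table below is a hypothetical stand-in for a real pronouncing
dictionary (something like CMUdict), and the phoneme inventory is the standard
ARPAbet set; the idea is just to report which target phonemes a candidate
sentence hits and which it misses:

```python
# Toy stand-in for a real pronouncing dictionary such as CMUdict.
PRONUNCIATIONS = {
    "the":   ["DH", "AH"],
    "quick": ["K", "W", "IH", "K"],
    "brown": ["B", "R", "AW", "N"],
    "fox":   ["F", "AA", "K", "S"],
    "jumps": ["JH", "AH", "M", "P", "S"],
    "over":  ["OW", "V", "ER"],
    "lazy":  ["L", "EY", "Z", "IY"],
    "dog":   ["D", "AO", "G"],
}

# The ~39 ARPAbet phonemes a full tool would need to cover.
TARGET_PHONEMES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
}

def phoneme_coverage(sentence):
    """Return (covered, missing) phoneme sets for a candidate sentence."""
    covered = set()
    for word in sentence.lower().split():
        covered.update(PRONUNCIATIONS.get(word, []))
    return covered, TARGET_PHONEMES - covered

covered, missing = phoneme_coverage(
    "the quick brown fox jumps over the lazy dog")
```

The classic pangram covers every letter but not every sound, which is exactly
why the checker reports phonemes like TH and SH as missing.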

~~~
DLWormwood
It would probably be several paragraphs long, at the shortest. Depending on
accent and cultural upbringing, people vary how they pronounce a phoneme based
on nearby sounds, words, or even whole sentences.

~~~
jimmyjim
I am very annoyed by the current brute-force, heuristic approaches to human-
sound acoustics. I wish the sounds were dynamically computerized by way of
mechanical simulations of the anatomical parts involved in human speech
articulation.

~~~
DLWormwood
> I wish the sounds were dynamically computerized by way of mechanical
> simulations of the anatomical parts involved in human speech articulation.

Actually, I think that's what was attempted in the first place, in the early
80's. I remember seeing TV shows and museum exhibits that demonstrated this
approach. One I especially remember, and it clearly dates the effort, used a
vector imaging display (think of the original Tempest and Asteroids arcade
games) to project a silhouette of the tongue and vocal cavity, showing
listeners how the current phoneme was generated.

Of course, back then, such simulacra were limited by the lack of parallel
processing power and an inadequate understanding of biophysics. This led to
today's brute-force "sound sampling" approach as memory became cheaper and
audio capture hardware matured. I do wonder if it's time to return to vocal
anatomy modeling, now that we have a better understanding of how to perform
biometric and physics modeling via massive computational parallelism.
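
To make the idea concrete, one classic formulation of vocal anatomy modeling
is the Kelly-Lochbaum tube model: the vocal tract is approximated as a chain
of cylindrical sections, and sound waves scatter at each junction according to
the change in cross-sectional area. Here's a minimal sketch; the area profile
and reflection constants are made-up illustrative numbers, not measured
anatomy:

```python
import numpy as np

def kelly_lochbaum(areas, excitation, glottal_refl=0.99, lip_refl=-0.9):
    """Pass a glottal excitation through a chain of acoustic tube sections.

    areas: cross-sectional area of each tube section (arbitrary units).
    Returns the pressure signal radiated at the lips.
    """
    n = len(areas)
    # Reflection coefficient at the junction between adjacent sections.
    k = [(areas[i + 1] - areas[i]) / (areas[i + 1] + areas[i])
         for i in range(n - 1)]
    fwd = np.zeros(n)  # right-going (toward lips) wave per section
    bwd = np.zeros(n)  # left-going (toward glottis) wave per section
    out = np.zeros(len(excitation))
    for t, x in enumerate(excitation):
        new_fwd = np.zeros(n)
        new_bwd = np.zeros(n)
        # Glottis end: inject the source plus a partial reflection.
        new_fwd[0] = x + glottal_refl * bwd[0]
        # Kelly-Lochbaum scattering at each junction.
        for i in range(n - 1):
            new_fwd[i + 1] = (1 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
            new_bwd[i] = k[i] * fwd[i] + (1 - k[i]) * bwd[i + 1]
        # Lip end: most energy radiates out, a little reflects back.
        new_bwd[n - 1] = lip_refl * new_fwd[n - 1]
        out[t] = (1 + lip_refl) * new_fwd[n - 1]
        fwd, bwd = new_fwd, new_bwd
    return out

# Illustrative area profile, widening toward the lips like an open vowel.
areas = [1.0, 0.8, 0.6, 0.5, 0.9, 1.5, 2.2, 2.6]
impulse = np.zeros(512)
impulse[0] = 1.0  # an impulse excitation exposes the tract's resonances
response = kelly_lochbaum(areas, impulse)
```

Changing the area profile over time is what moves the "tongue" and produces
different phonemes; the resonances of the tube chain are the formants.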

~~~
jimmyjim
I imagine progress on this model has been slow because the task is extremely
challenging. It would require solid knowledge of linguistics, physics,
computer programming, etc. The sampling model, in contrast, is a piece of
cake.

The anatomical model does indeed sound very interesting. Each phoneme would be
treated as a single unit, onto which intonation and dynamic effects could be
applied algorithmically; and much of the advancement in this area would be
usable by speech recognition models, probably increasing their accuracy
considerably.

I really hope some serious contenders step up to the plate for this.
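
The phoneme-as-unit idea with algorithmically applied intonation could be
sketched like this. Everything here (the class, the declination rate, the
pitch numbers) is hypothetical, just to show the shape of the approach:

```python
from dataclasses import dataclass

@dataclass
class PhonemeUnit:
    symbol: str          # ARPAbet-style phoneme label
    duration_ms: int
    base_pitch_hz: float

def apply_declination(units, drop_per_unit_hz=4.0, final_rise=False):
    """Apply a simple falling intonation contour across an utterance.

    English declaratives tend to drift downward in pitch over a sentence;
    a final upward movement can mark a question instead.
    """
    shaped = []
    for i, u in enumerate(units):
        pitch = u.base_pitch_hz - i * drop_per_unit_hz
        if final_rise and i == len(units) - 1:
            pitch += 20.0  # question-style uptick on the last unit
        shaped.append(PhonemeUnit(u.symbol, u.duration_ms, pitch))
    return shaped

# "Hello" as four phoneme units at a flat 120 Hz base pitch.
hello = [PhonemeUnit("HH", 60, 120.0), PhonemeUnit("AH", 90, 120.0),
         PhonemeUnit("L", 70, 120.0), PhonemeUnit("OW", 120, 120.0)]
statement = apply_declination(hello)
question = apply_declination(hello, final_rise=True)
```

The appeal is that the contour is a separate, swappable layer on top of the
units, rather than being baked into recorded samples.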

------
AngryParsley
NPR's All Things Considered has a short interview with the CTO of CereProc:
[http://www.npr.org/templates/story/story.php?storyId=1240872...](http://www.npr.org/templates/story/story.php?storyId=124087291)
The voices aren't perfect but they're definitely better than anything else
I've heard. Ebert's voice isn't demoed in the interview. I'm guessing Oprah
wants to be the first to show it.

This combined with improved subvocal stuff like
<http://www.youtube.com/watch?v=xyN4ViZ21N0> would make silent, covert voice
communication possible. No more annoying one-sided conversations from cell
phones.

~~~
whughes
Isn't this mainly an aesthetic thing, though? Consumer text-to-speech is
certainly adequate for vocalizing almost any conversation. One could just use
that to carry out "covert voice communication."

~~~
bricestacey
At the end of the YouTube video, he mentions being able to think "nearest bus"
and having it query the internet and speak the results to you. This would
allow you to augment reality without pulling out your phone, unlocking it,
launching Google Maps, selecting your location, and so on. Sure, it sounds
like the flying-car dreams of the last century, but given that they can
recognize 150 words now, it has a lot of future potential.

------
mortenjorck
Right after Ebert mentioned Alex, I stopped reading, selected the text of the
article, went to OS X's Services menu, and listened to Alex read the rest of
it.
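
For anyone who wants to try the same thing from a terminal, OS X ships a
`say` command that drives the same voices as the Services menu (macOS-only;
the sample text here is just a placeholder):

```shell
# Speak a sentence aloud with the Alex voice.
say -v Alex "Consumer voice synthesis still has a way to go."

# Or copy an article's text and pipe the clipboard through it.
pbpaste | say -v Alex
```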

It reminded me of how far consumer voice synthesis has yet to come, but it did
give me a better appreciation of some of the more subtle things the Alex voice
does with intonation. Despite still sounding unmistakably synthetic, it's
clearly doing quite a bit of analysis of the sentence structure to vary the
pitch in a natural way.

But that makes me wonder: Why, when complex things like structural intonation
are already in consumer TTS products, do (deceptively) simple things like
consonant sounds and pacing still sound so stilted?

------
bmalicoat
Very cool technology; I can't wait to hear what it sounds like compared to his
real voice. The only downside, of course, is that normal folks don't have
isolated audio of themselves speaking. The company really needs to figure out
how to do the isolation themselves, so that home movies and voicemails could
be used without the background noise affecting quality.
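
Pulling a voice out of noisy home recordings is a source-separation problem,
and one classic baseline is spectral subtraction: estimate the noise's average
magnitude spectrum from a speech-free stretch of the recording, then subtract
it frame by frame. A minimal sketch on synthetic data (non-overlapping frames
for brevity; real systems use overlapping windows and smarter noise
estimates):

```python
import numpy as np

def spectral_subtraction(signal, noise_sample, frame=256):
    """Denoise `signal` by subtracting an average noise magnitude spectrum.

    noise_sample: a stretch of the recording containing only background noise.
    """
    # Average magnitude spectrum of the noise, frame by frame.
    noise_frames = noise_sample[:len(noise_sample) // frame * frame]
    noise_mag = np.abs(
        np.fft.rfft(noise_frames.reshape(-1, frame), axis=1)).mean(axis=0)

    out = np.zeros(len(signal) // frame * frame)
    for start in range(0, len(out), frame):
        spec = np.fft.rfft(signal[start:start + frame])
        # Subtract the noise magnitude, flooring at zero; keep the phase.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        phase = np.angle(spec)
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * phase), frame)
    return out

# Synthetic demo: a sine-wave "voice" buried in white noise.
rng = np.random.default_rng(0)
t = np.arange(8192)
voice = 0.8 * np.sin(2 * np.pi * 440 * t / 8000)
noise = 0.3 * rng.standard_normal(t.size)
denoised = spectral_subtraction(voice + noise, noise)
```

It's crude (the magnitude flooring introduces "musical noise" artifacts), but
it shows why a noise-only sample, not just the noisy speech, is the key input.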

------
ctingom
"Shakespeare used more than 25,000 [words in his vocabulary], but he was
making up a lot of them as he went along."

------
drinian
Also see: <http://tombakersays.com/>

------
paul9290
For me, although it sounds better than in previous years, the text-to-speech
offered by this CereProc company still sounds robotic. This is an area of
interest for my start-up, as we chose to use real actors rather than a
text-to-speech engine.

~~~
nazgulnarsil
It appears to me that the robotic sound is exacerbated by the fact that we
modify the way we pronounce words based on what other words bookend them. How
are you escaping this with live actors?

------
muffins
This service is unavailable. I get it.

