
Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis - daisystanton
https://google.github.io/tacotron/publications/text_predicting_global_style_tokens/index.html
======
buss
Wow, these audio samples are incredible. I'm surprised to hear the model
actually outputting natural-sounding breathing between and inside sentences.
Most TTS systems explicitly remove things like that, but the addition of
breathing makes it sound so much more natural.

The style tokens result in pretty incredible and realistic audio.

~~~
nmstoker
If you want to see some more research samples, check out this link:
[https://google.github.io/tacotron/](https://google.github.io/tacotron/)
What's especially impressive is how fast they're moving along with new ideas
(see the dates). Bear in mind that the WaveNet outputs are likely to be pretty
slow to generate (but they do yield remarkable quality!).

------
zestyping
Good heavens. These give me the shivers. In some of these samples you can hear
breathing, emphasis, and even what sounds like genuine emotion.

------
sgillen
This seems like it could be great for automatically generating audio books.
Personally, I would one day like to have a program that can read arbitrary text
to me in a more or less human way; that would let me read papers for work
while driving.

~~~
ChuckMcM
You could, except you would get sued. The Kindle 2 was announced with a feature
that would read the book to you, and Amazon landed in court.
[https://sunsteinlaw.com/read-it-aloud-and-weep-controversy-s...](https://sunsteinlaw.com/read-it-aloud-and-weep-controversy-surrounds-text-to-speech-feature-of-amazons-kindle-reader/)

~~~
jscholes
> You could, except you would get sued. The Kindle 2 was announced with a
> feature that would read the book to you, and Amazon landed in court.

Yet here we are, 9 years later, and the Kindle apps on Android, Windows
and iOS support screen reader access to books. Those screen readers can use an
array of voices, undoubtedly including the speech engines used in the original
TTS feature written about here.

