
Expressive Speech Synthesis with Tacotron - daisystanton
https://research.googleblog.com/2018/03/expressive-speech-synthesis-with.html
======
coryfklein
This is the audio equivalent of the Face2Face algorithm that takes one
person's face and places it onto the character in a video, matching the latter
subject's expressions.

This means we now live in a world where you can create a recording of Donald
Trump saying, "I colluded with the Russians to rig the election," and not only
have the voice sound like Trump but also bring along his personal expressive
style so that it becomes indistinguishable from Trump himself.

Would love to see these two combined - make an audio-video recording of an
actor confessing to election fraud, then use Face2Face to swap in Trump's face
_and_ use Tacotron to swap in his voice.

~~~
vbezhenar
This also means that no recorded confession can be trusted as evidence.

------
modeless
Note that this is separate from the other front page post about Google Cloud
TTS powered by WaveNet. That's a product, while this is exciting new research
(which will hopefully become part of a product).

------
visarga
This technology has been around for a year but we only got a few samples. I'm
very excited. I use TTS to read back all the text I consume on PC.

This web demo allows you to enter your own text:

[https://cloud.google.com/text-to-speech/](https://cloud.google.com/text-to-speech/)

(select US English and WaveNet)

~~~
rryan
> This technology has been around for a year but we only got a few samples.
> I'm very excited. I use TTS to read back all the text I consume on PC.

Nope! These two papers are fresh work on prosody modeling. You can follow the
evolution of this team's published work here:

[https://google.github.io/tacotron/](https://google.github.io/tacotron/)

> This web demo allows you to enter your own text:

That web demo is unrelated to this. It showcases the Google Cloud TTS API, which
uses WaveNet but not Tacotron.

[https://cloudplatform.googleblog.com/2018/03/introducing-Clo...](https://cloudplatform.googleblog.com/2018/03/introducing-Cloud-Text-to-Speech-powered-by-Deepmind-WaveNet-technology.html)

------
polishTar
[https://google.github.io/tacotron/publications/global_style_...](https://google.github.io/tacotron/publications/global_style_tokens/demos/gstwn/gstwn_vs_g_2.wav)

Ha!

~~~
aero142
A bunch of RPGs are going to have much better character voices after
this.

------
John_KZ
Well, I guess it's time for authenticated phone calls.

------
aaronharnly
Congratulations Daisy! This work is really impressive (and quite fun).

