
Can modern Text to Speech voices replace professional voice over actors? - h99
https://play.ht/blog/can-modern-text-to-speech-voices-replace-professional-voice-over-actors/
======
stubish
I'm always surprised that the computer games industry hasn't latched onto
speech synthesis. Modern speech generation coupled with some sort of emotional
+ accent markup and a library of base voices would allow going from script
writer to voice over, without the time and expense of traditional approaches
and often poor results. You could even go procedural, dynamically generating
speech. The main character can finally get a name and stop being a silent
protagonist. Mods & DLC. Include novels' worth of content without the
immersion reduction of text boxes.

~~~
paulryanrogers
After recently playing Unavowed I kind of hope there'll always be a place for
human acting. That said, for games with no voiceover it would be nice to have
an accessible TTS option without monotonous tones.

------
mikekchar
The answer to the question: no. Not yet. The examples in the article are very
impressive, but note that they are also very short. Listen to them more than
once. There is a shape to the sentences: they start at a low pitch, rise at
the end on clauses and then finally fall at the end of the sentence.

Now imagine those same tones in this text: "Stop. I don't want any of your
damn potatoes. If you try coming on to my property again, I will set loose the
dogs." It would be perfectly understandable, but comical.

We won't get full text to speech replacement until the computer understands
what's in the text and that won't happen until we get GI. We can probably
refine it a bit, but it's always going to be a bit odd. With enough text you
will always get that uncanny valley effect.

Having said that, probably we can build systems that will allow a single voice
actor to play all parts in a production. That could have many advantages and I
expect it to happen relatively soon.

~~~
wjnc
I would disagree. Sentiment analysis is already there. Good text to speech
too. Basically one needs to use them together to give the speech engine a clue
towards the required sentiment and the speech engine enough samples to use
different intonations for different sentiments. Won't beat a skilled speech
artist, but should be more than sufficient for any mediocre quality cartoon.
The big upside is instant translation to all the languages supported by the
speech engine. Make mediocre cartoons, add speech, upload to YouTube for ten+
different locales, profit.
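
To make the pairing concrete, here is a rough sketch of what I mean, not a real pipeline. The word lists stand in for an actual sentiment model, and the `PROSODY` table and the `classify` / `to_ssml` helpers are names made up for illustration. The only standard piece is the SSML `<prosody>` element with its `pitch` and `rate` attributes, which most speech engines accept in some form.

```python
from xml.sax.saxutils import escape

# Toy word lists standing in for a real sentiment model (assumption).
NEGATIVE = {"stop", "damn", "dogs"}
POSITIVE = {"great", "thanks", "love"}

# Hypothetical mapping from a sentiment label to SSML prosody settings.
PROSODY = {
    "negative": {"pitch": "low", "rate": "slow"},
    "positive": {"pitch": "high", "rate": "medium"},
    "neutral": {"pitch": "medium", "rate": "medium"},
}


def classify(sentence):
    """Label a sentence by counting hits in the toy word lists."""
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    neg = len(words & NEGATIVE)
    pos = len(words & POSITIVE)
    if neg > pos:
        return "negative"
    if pos > neg:
        return "positive"
    return "neutral"


def to_ssml(sentences):
    """Wrap each sentence in a <prosody> tag chosen by its sentiment label."""
    parts = []
    for s in sentences:
        p = PROSODY[classify(s)]
        parts.append(
            '<prosody pitch="{pitch}" rate="{rate}">{text}</prosody>'.format(
                text=escape(s), **p
            )
        )
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml(["Stop.", "I don't want any of your damn potatoes."]))
```

The resulting string would then go to whatever speech engine you use. How well the engine honours the hints depends on how many intonation samples it was trained on, which is the other half of the argument.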

------
aaron695
> Can modern Text to Speech voices replace professional voice over actors?

It's a bit like asking whether a superintelligent AI could create better Sonic
the Hedgehog games.

Once you get nice Text To Speech the fabric of society will change
dramatically.

I guess in a video game you can spend hours choosing and tweaking parts of
the speech, so a professional could get great results from text to speech in a
way that wouldn't work in everyday use. But that doesn't seem to be what they
are talking about.

------
terrycody
This seems promising, though I think the price is a bit too high.

I also hope there is a service that can automatically sync the voice track
while making a YouTube video. Are there any existing services?

Btw, how do you think this service compares with Amazon Polly?

------
sytelus
WaveGlow from NVIDIA, released at the end of 2018, is much more impressive:
[https://nv-adlr.github.io/WaveGlow](https://nv-adlr.github.io/WaveGlow).

Are there any state of the art improvements after this one?

------
loosetypes
Is there a best way to go the other way, that is, from speech to text?

I’ve found the transcription tools to have difficulty with at least the
stock computer voices.

