
Ask HN: When will text-to-speech replace narrators - jdmoreira
When do you think text-to-speech technology will become good enough that human narrators will be replaced by software? Narration is big business nowadays (audiobooks for example) but for how long?
======
PaulHoule
A good Q.

You can steal the voice of somebody like Lester Holt or Tanaka Rie with about
20 hours of audio, manually splicing segments to compose the sound.
Sometimes you can lift a whole sentence; sometimes you compose paired phonemes.
It is a lot of work, but it was possible long before deepfakes.
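
A minimal Python sketch of that splicing idea, under loud assumptions: the
`inventory` of recorded clips, the clip ids, and the function name
`plan_splices` are all hypothetical, and the smallest unit here is a word
rather than a phoneme pair, purely to keep the example short. It only plans
which clips to concatenate; it does no audio processing.

```python
def plan_splices(words, inventory):
    """Cover `words` with the longest recorded phrases available (greedy).

    `inventory` maps a tuple of words to a hypothetical clip id.
    Returns the clip ids in playback order.
    Raises KeyError if some word is not covered by any recording.
    """
    plan = []
    i = 0
    while i < len(words):
        # Try the longest remaining phrase first, then shrink to one word.
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in inventory:
                plan.append(inventory[key])
                i = j
                break
        else:
            # No recording starts at this word; a real system would fall
            # back to smaller units (e.g. phoneme pairs) at this point.
            raise KeyError(f"no recorded unit covers {words[i]!r}")
    return plan


# Hypothetical inventory: one lifted phrase plus a few single words.
inventory = {
    ("good", "evening"): "clip_17",
    ("good",): "clip_3",
    ("evening",): "clip_4",
    ("everyone",): "clip_9",
}
print(plan_splices(["good", "evening", "everyone"], inventory))
```

The greedy longest-match mirrors "lift a sentence when you can, compose
smaller pieces when you can't"; it says nothing about smoothing the joins,
which is where most of the manual work goes.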

No matter what, you will need to listen to the voice and give feedback. Pro
narrators have directors just as actors do, and that is what makes them sound
so good.

------
sethammons
Narration is a performing art. Without true AI, you could program a given text
to have dramatic pauses and inflection and tonal changes. You could probably
do that with today's tech, though it would be very labor intensive to pull
off. But for general narrate($book), I think it is as far off as real, general
AI. This is something I don't expect in my lifetime.

------
nmstoker
I believe that the "narrator" on Superintelligence by Nick Bostrom is done
with text-to-speech. It's ironic given the subject matter, but the intonation
and pronunciation are just too consistent and repetitive. A lot of reviews
comment on the speaker (the underlying voice is an upper class English accent
with a rather actorly demeanor), but I think they're being thrown off or maybe
find it convincing enough not to consider this possibility. With normal human
narrators there's always a bit of variety whereas with this audiobook it was
just identical, like a machine. I ended up returning the book as it was
tiresome and distracting to listen to, but it shows the potential.

As others have said, to an extent you could program this without AI using some
current techniques, but it would be impractical. One area that might help is
the work on global style tokens (GST), which should allow more variation.
Clearly more work needs to be done to make it acceptable, but there are some
examples here:
https://google.github.io/tacotron/publications/global_style_tokens/

------
sp332
When you can program inflection into narrator software: tell it where to slow
down, when to emphasize a point, and when to wait so that the next line exactly
matches the next scene transition. For audiobooks it will be a lot more
involved because narrators there are more like actors. You have to program
different voices, a broader range and more nuance in emotions, and probably the
ability to interpret more creative formatting in the book layout.
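
A rough sketch of what "programming inflection" can look like today, using
SSML (the W3C Speech Synthesis Markup Language) tags `<prosody>`,
`<emphasis>`, and `<break>`. The segment format and the function name
`to_ssml` are assumptions made up for this example, and the bare `<speak>`
root omits the version and namespace attributes that a strict SSML parser
may require. How well a given engine honors each tag varies.

```python
from xml.sax.saxutils import escape


def to_ssml(segments):
    """Turn hand-annotated text segments into an SSML string.

    Each segment is a dict with a required "text" key and optional keys
    (all hypothetical names for this sketch):
      "rate"     -> SSML prosody rate, e.g. "slow" or "fast"
      "emphasis" -> SSML emphasis level, e.g. "strong" or "reduced"
      "pause_ms" -> silence to insert after the segment, in milliseconds
    """
    parts = []
    for seg in segments:
        text = escape(seg["text"])  # keep &, <, > from breaking the markup
        if seg.get("emphasis"):
            text = f'<emphasis level="{seg["emphasis"]}">{text}</emphasis>'
        if seg.get("rate"):
            text = f'<prosody rate="{seg["rate"]}">{text}</prosody>'
        parts.append(text)
        if seg.get("pause_ms"):
            parts.append(f'<break time="{seg["pause_ms"]}ms"/>')
    return "<speak>" + "".join(parts) + "</speak>"


print(to_ssml([
    {"text": "It was a dark night.", "rate": "slow", "pause_ms": 800},
    {"text": "Run!", "emphasis": "strong"},
]))
```

Every pause and emphasis here still has to be chosen by a person, which is
the labor-intensive part; the markup only gives you a place to write the
direction down.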

On the other hand,
https://www.theverge.com/2019/8/23/20830057/amazon-audible-speech-to-text-feature-lawsuit-major-book-publishers

------
flanbiscuit
Not an answer, but I've been thinking about how, at some point in the future,
we'll be able to use text-to-speech and the tech being developed today, which
can synthesize the sound and cadence of a real person from just a few
recordings, to replace voice actors in video games.

If a console integrates the system into its OS then maybe that can help reduce
file size and also allow indie devs to add a lot of voice to a game without
having to record a ton.

------
kleer001
A little before digital actors replace real-life actors. And I don't mean
de-aging or bringing actors back from the dead using previous footage, but a
system you can give a script and some direction to, and that will output an
authentic performance.

Well after we get a best selling book series that's completely written by a
bit of software.

So, not very soon.

------
nstaller
I've seen some really good tech on this in Seattle. They had made a version
with Obama's voice, and it could then narrate any book. There were still some
things that needed to be done to make it sound more human, but it was 90
percent there.

~~~
lonelappde
They can make it sound like Obama, yes, but can they make it sound like Obama
reading the material with inflection that does it justice, or just a monotone
with Obama's speech tics mixed in?

------
jbr901
Hmm

------
yipman888
kop

------
solehshow
How?

------
cac22
Good

