Ask HN: When will text-to-speech replace narrators
24 points by jdmoreira 9 days ago | 13 comments
When do you think text-to-speech technology will become good enough that human narrators will be replaced by software? Narration is big business nowadays (audiobooks, for example), but for how long?





A good Q.

You can steal the voice of somebody like Lester Holt or Tanaka Rie with about 20 hours of audio, manually splicing segments to compose the sound. Sometimes you can lift a whole sentence, sometimes you compose paired phonemes. It is a lot of work, but it was possible long before deepfakes.
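A rough sketch of that splicing idea, for the curious: cover the target text with the longest recorded spans you have, and fall back to shorter ones. Everything here is made up for illustration (the `plan_splices` name, the word-level inventory, the timestamps). A real system would work on phoneme pairs and would also have to smooth the joins.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory: a recorded span of words mapped to the
# (start_sec, end_sec) of that clip in the source audio.
Inventory = Dict[Tuple[str, ...], Tuple[float, float]]


def plan_splices(words: List[str], inventory: Inventory) -> List[Tuple[float, float]]:
    """Greedily cover `words` with the longest recorded spans available."""
    clips: List[Tuple[float, float]] = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first, then shrink it.
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in inventory:
                clips.append(inventory[key])
                i = j
                break
        else:
            # Nothing in the recordings covers this word at all.
            raise KeyError(f"no recording covers {words[i]!r}")
    return clips


inventory: Inventory = {
    ("good", "evening"): (0.0, 0.8),  # a whole phrase we can lift as-is
    ("good",): (5.0, 5.3),
    ("night",): (9.1, 9.5),
}

print(plan_splices(["good", "night"], inventory))
```

Greedy longest-match is only the simplest choice; it says nothing about whether two clips actually sound right next to each other, which is where the manual work goes.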

No matter what, you will need to listen to the voice and give feedback. Pro narrators have directors, just as actors do, and that is what makes them sound so good.


Narration is a performing art. Without true AI, you could program a given text to have dramatic pauses, inflection, and tonal changes. You could probably do that with today's tech, though it would be very labor intensive to pull off. But for general narrate($book), I think it is as far off as real, general AI. This is something I don't expect in my lifetime.

I believe that the "narrator" on Superintelligence by Nick Bostrom is done with text-to-speech. It's ironic given the subject matter, but the intonation and pronunciation are just too consistent and repetitive. A lot of reviews comment on the speaker (the underlying voice is an upper-class English accent with a rather actorly demeanor), but I think they're being thrown off, or maybe they find it convincing enough not to consider this possibility. With normal human narrators there's always a bit of variety, whereas with this audiobook it was just identical, like a machine. I ended up returning the book as it was tiresome and distracting to listen to, but it shows the potential.

As others have said, to an extent you could program this without AI using some current techniques, but it would be impractical. One area that might help here is the work on global style tokens (GST), which should allow more variation. Clearly more work is needed to make the results acceptable, but there are some examples here: https://google.github.io/tacotron/publications/global_style_...




When you can program inflection into narrator software. Tell it where to slow down, when to emphasize a point, and wait for the next line to exactly match the next scene transition. For audiobooks it will be a lot more involved because they're more like actors. You have to program different voices, broader range and more nuance in emotions, and probably the ability to interpret more creative formatting in the book layout.
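For what it's worth, the "tell it where to slow down" part already has a standard notation: SSML, the W3C speech markup that many TTS engines accept, has `<prosody>`, `<emphasis>`, and `<break>` elements. Here is a minimal sketch of generating it. The `to_ssml` helper and its direction keys (`rate`, `emphasis`, `pause_ms`) are my own invention, and how well any given engine honors the tags is a separate question.

```python
from xml.sax.saxutils import escape


def to_ssml(segments):
    """Build an SSML string from (text, directions) pairs.

    `directions` is a dict with optional, hypothetical keys:
      rate      -> value for <prosody rate="...">, e.g. "slow"
      emphasis  -> value for <emphasis level="...">, e.g. "strong"
      pause_ms  -> pause inserted after the text, in milliseconds
    """
    parts = ["<speak>"]
    for text, d in segments:
        body = escape(text)  # escape &, < and > so the markup stays valid
        if "emphasis" in d:
            body = f'<emphasis level="{d["emphasis"]}">{body}</emphasis>'
        if "rate" in d:
            body = f'<prosody rate="{d["rate"]}">{body}</prosody>'
        parts.append(body)
        if "pause_ms" in d:
            parts.append(f'<break time="{d["pause_ms"]}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


script = [
    ("It was over.", {"rate": "slow", "pause_ms": 800}),
    ("Or was it?", {"emphasis": "strong"}),
]
print(to_ssml(script))
```

The markup is the easy part. Someone still has to decide, line by line, what the directions should be, which is the director's job mentioned elsewhere in the thread.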

On the other hand, https://www.theverge.com/2019/8/23/20830057/amazon-audible-s...


Not an answer, but I've been thinking that at some point we'll be able to combine text-to-speech with the tech being developed today, which can synthesize the sound and cadence of a real person from just a few recordings, to replace voice actors in video games.

If a console integrates the system into its OS, then maybe that can help reduce file size and also allow indie devs to add a lot of voice to a game without having to record a ton.


A little before digital actors replace real-life actors. And I don't mean de-aging or bringing actors back from the dead using previous footage, but a system that you can give a script and some direction to and that will output an authentic performance.

Well after we get a best-selling book series that's completely written by a bit of software.

So, not very soon.


I've seen some really good tech on this in Seattle. They had made a version with Obama's voice, and then he could narrate any book. There are still some things that need to be done to make it sound more human, but it's 90 percent there.

They can make it sound like Obama, yes, but can they make it sound like Obama reading the material with inflective justice, or just a monotone with Obama speech tics mixed in?

Hmm

kop

How?

Good


