
Now we can have audiobooks read by anyone we like!

They can direct us to our destination!

They can speak at our funeral, being long dead themselves (as long as there is sufficient training material recorded).

The future is awesome.




I legitimately think this could be huge for self-published authors. It takes a skilled professional about forty hours of work to produce an audiobook from a novel-length manuscript. Tacotron could do it in minutes.


I don't see that coming soon. The voice is one thing, but the performance goes far, far beyond that. Without understanding the text, you can't get good prosody out of a single sentence, much less develop a character over a whole performance.

You'd have to "direct" this on a word-by-word basis: "Put the emphasis here. Speed up 10% here. Decrease vocal intensity 25%". You'd end up producing a whole "score", and it would take at least as long as the human actor puts into it.

Having done that, it would be amusing to switch it from voice to voice, as a party trick. But the result would still be much poorer than you'd get out of an actor. Really solving the work of an actor is strong-AI-complete.
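
To make the word-by-word "score" idea above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `Direction` fields, the `to_markup` helper, and the bracketed annotation format are assumptions, not part of Tacotron or any real TTS system. It only shows how per-word directions ("emphasis here, speed up 10% here, decrease intensity 25%") could be stored separately from the voice, which is what would let the same score be replayed from voice to voice.

```python
from dataclasses import dataclass


@dataclass
class Direction:
    """One entry in a hypothetical per-word performance score."""
    word: str
    emphasis: bool = False
    rate: float = 1.0       # relative speaking rate; 1.1 means 10% faster
    intensity: float = 1.0  # relative vocal intensity; 0.75 means 25% quieter


def to_markup(score):
    """Render the score as annotated text (a made-up format, not a real TTS input)."""
    parts = []
    for d in score:
        # Emphasized words are shown in upper case.
        text = d.word.upper() if d.emphasis else d.word
        tags = []
        if d.rate != 1.0:
            tags.append(f"rate={d.rate:.2f}")
        if d.intensity != 1.0:
            tags.append(f"intensity={d.intensity:.2f}")
        parts.append(f"{text}[{','.join(tags)}]" if tags else text)
    return " ".join(parts)


score = [
    Direction("The"),
    Direction("future", emphasis=True),
    Direction("is", rate=1.1),
    Direction("awesome", intensity=0.75),
]
print(to_markup(score))
```

Even in this toy form, every word needs its own hand-written entry, which illustrates why producing a full score for a novel could take as long as the human performance.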


What about using a tablet to direct the piece by drawing? You can get values for the intensity, speed and volume (up/down) pretty easily and intuitively.

Even better if it's linked to the voice generation system in real time; then you can save/redo sentences etc. as you go along.


Audio books with genuinely good performances seem rare to me. There are a handful of voice actors that stand out, but many of the titles I've listened to have very flat delivery; the first sample in the original article has as much inflection as they do.


Feed in enough "good" audio books, and you would probably get something passable for smaller titles.


AI is separating the talent from the looks across the board. As it is now, one has to be able to act and also look good, but AI will enable those who can act to be re-skinned, literally, into whatever the client needs.


I did something like this before my grandmother passed. She was a teacher and loved reading books to kids. I recorded her reading Dr. Seuss and The Giving Tree to my cousins so I could give my future children a glimpse of that wonderful woman.

It seems that we aren’t far from being able to take those recordings and spin them into a reading of anything. Fascinating. It’s kind of scary though. Grandma’s voice can read anything. Anything.


The emphasis on a lot of these sentences is all wrong; I wouldn't want to listen to an audiobook by this engine. It's still super impressive/terrifying though.


I am currently working on an audiobook project, called Odiobooks.com. I hope to release something soon.

If anyone's interested in the project, feel free to contact me at iamjsonkim@gmail.com.


Imagine when they can also generate the visuals and show you the book as an auto-generated video.



