This is astonishing. I can type anything I want into the "vibe" box and it does it for the given text. Accents, attitudes, personality types... I'm amazed.
The level of intelligent "prosody" here -- the rhythm and intonation, the pauses and personality -- I wasn't expecting anything like this so soon. This is truly remarkable. It understands both the text and the prompt for how the speaker should sound.
Like, we're getting much closer to the point where nobody except celebrities is going to record audiobooks. Everyone's just going to pick whatever voice they're in the mood for.
Some fun ones I just came up with:
> Imposing villain with an upper class British accent, speaking threateningly and with menace.
> Helpful customer support assistant with a Southern drawl who's very enthusiastic.
> Woman with a Boston accent who talks incredibly slowly and sounds like she's about to fall asleep at any minute.
I don't see how a strike will do anything but accelerate the profession's inevitable demise. Can anyone explain how this could ever end in favor of the human laborers striking?
I am not affiliated with the strikers, but I think the idea is that, for now, the companies still want to use at least some human voice acting. So if they want to hire them, they either have to negotiate with the guild or try to find an individual scab willing to cross the picket line and get hired despite the strike. In some industries, there are enough non-union workers that finding replacements is easy. I guess the voice actors are sufficiently unionized that it's not so easy there, and it seems to have caused some delays in production and also some games being shipped without all their voice lines.
But as you surmise, this is at best a stalling tactic. Once the tech gets good enough, fewer companies will want to pay for human voice acting labor. Unions can help powerless individuals negotiate better through collective bargaining, but they can't altogether stop technological change. Jobs, theirs and ours, eventually become obsolete...
I don't necessarily think we should artificially protect jobs against technology, but I sure wish we had a better social safety net and retraining and placement programs for people needing to change careers due to factors outside their control.
> Everyone's just going to pick whatever voice they're in the mood for.
I can't say I've ever had this impulse. Also, to point out the obvious, there's little reason to pay for an audiobook if there's no human reading it. Especially if you already bought the physical text.
Didn’t look closely, but is there a way to clone a voice from a few seconds of recording and then use that sample to generate the text in the same voice?
I am always listening to audio books, but after playing with this for 2 minutes they are no good anymore.
I am never really in the mood for a different voice. I am going to dial in the voice I want and only going to want to listen with that voice.
This is so awesome. So many audio books have been ruined by the voice actor for me. What sticks out in my head is The Book of Why by Judea Pearl read by Mel Foster. Brutal.
So many books I want as audio books too that no one would bother to record.
The ElevenReader app from ElevenLabs has been able to do that for a while now and they’ve licensed some celebrity voices like Burt Reynolds. You can use the browser share function to send it a webpage to read or upload a PDF or epub of a book.
It’s far from perfect though. I’m listening to Shattered Sword (about the Battle of Midway), which has lots of academic-style citations, so every other sentence or paragraph ends with it spelling out the citation number, like “end of sentence dot one zero”. It’ll often mangle numbers, so “1,000 pound bomb” becomes “one zero zero zero pound bomb”. And it tries way too hard to expand abbreviations, so “Operation AL” becomes “Operation Alabama” when it’s really short for Aleutian Islands.
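For what it's worth, some of those glitches can probably be worked around by cleaning the text before handing it to whatever reads it aloud. Below is a rough sketch, not anything ElevenReader actually does: `clean_for_tts` and its `glossary` parameter are hypothetical names made up for illustration, the regexes are heuristics (they would also strip something like "No.1"), and it assumes the engine reads plain digits like "1000" better than "1,000", which I haven't verified.

```python
import re

# Hypothetical pre-processing helper; not part of any TTS product's API.

# A 1-3 digit citation marker glued to sentence-ending punctuation, e.g. "bomb.10".
# The extra non-digit character in the lookbehind keeps decimals like "3.14" intact.
_CITATION = re.compile(r"(?<=[^\d\s][.!?])\d{1,3}(?=\s|$)")

# A comma used as a thousands separator, e.g. the comma in "1,000".
_THOUSANDS = re.compile(r"(?<=\d),(?=\d{3}\b)")


def clean_for_tts(text, glossary=None):
    """Strip citation markers, drop thousands separators, apply a user glossary."""
    text = _CITATION.sub("", text)
    text = _THOUSANDS.sub("", text)
    # The glossary is supplied by the user; the code never guesses expansions itself.
    for phrase, spoken in (glossary or {}).items():
        text = text.replace(phrase, spoken)
    return text


if __name__ == "__main__":
    sample = "Operation AL began.12 The ship was hit by a 1,000 pound bomb.10 It sank."
    print(clean_for_tts(sample, {"Operation AL": "Operation A-L"}))
```

The glossary is deliberately manual: the abbreviation problem comes from the reader guessing an expansion, so the safer fix is to pin the spoken form yourself instead of guessing again.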