Ned Flanders is shockingly good. I'm from Australia so I don't know half these characters it has listed, but it has all the major Simpsons characters, and that's enough for me.
If I ever end up paralysed and unable to speak, you bet your ass I'm talking like Ned Flanders to everyone.
>If I ever end up paralysed and unable to speak, you bet your ass I'm talking like Ned Flanders to everyone.
That's funny. What was the sentence you typed, if you don't mind me asking? I had subpar results with 50 Cent (chosen randomly, with very NSFW language), and Obama saying "My fellow Americans" was also strange.
I wonder how they're training and improving the models. I work on a product that could help them train/track/package/deploy/monitor models effectively. Maybe they're not up to date. There's also an issue with the language.
This got me thinking that it would be nice to have some sort of recursive processing: if you say that Flanders is impersonating Obama, then when you type "My fellow Americans", you should hear Flanders attempting to say it the way Obama would. A way to detect fake voices, I guess :P
What would be sweet is a voice changer, where you can record yourself talking and it changes your recording to sound like one of those characters. That way you can control the intonation and sound less robotic.
I think AI is now capable of answering "how likely is it that saying X this way implies Y" -- and that applies to text, pictures, videos, whatever, you name it. It's not intelligent in any way, but it still gives meaningful responses back, which is useful for the use cases it's applied to: search, games, etc. That being said, I have no doubt that, with time, it will reach an understanding capable of resolving almost any problem. But we always have doubts, and you can't just divine the future. Look at the James Webb Telescope: we launched it to get some answers, so no matter how intelligent a system may be, we would still need more research, and the system would need it too, even if it's an AI (because it needs to know things in order to learn from them, in case that wasn't obvious).
In case some moron from the west comes after you, please accept my permission as an Indian to use Apu's voice. Pathetic puritans killed off my favourite character.
15.ai was the original, in a lot of regards. I used to chat with the dude on 4chan, he was a genuinely intelligent (albeit rather socially awkward) guy, and a bit of a perfectionist at that. The downside was that there was always downtime and a few hidden servers floating around. The upside was... almost perfect voice synthesis, even a year ago.
> I used to chat with the dude on 4chan, he was a genuinely intelligent (albeit rather socially awkward) guy, and a bit of a perfectionist at that.
Funny, I just wrote about this type of encounter with exceptionally talented people, with similar results. While I didn't detail it in the response, several of them were people I met on 4chan from 2005-2015 (I checked out after Gamergate, as it got toxic and less fun).
I hadn't seen 15.ai before, but it's pretty accurate from what I played with. I wonder if he'd be open to showing how he trains his algo (deepthroat) with the data sets he gets. He seems not to care about making money from this or about IP, and he hates NFTs, so that's a good indication he might be up for it.
Sidenote: Also, did you just out yourself as pony*** (brony)?
> Also, did you just out yourself as pony*** (brony)?
In all fairness, if you looked up my Mastodon account I'd be outed in moments :p
It's not that big of a deal anyways. I still consider it less degenerate than the folks who burn a decade of their life working on a SaaS that they hate. At least I got to go to a few cool conventions.
The modding community also has xVASynth, an app you can download that has some voices from popular games, if you don’t want Another Web Service (tm).
> you must not mix 15.ai voices with other services in the same project
Good luck enforcing that. If someone makes a fandom crossover video and the voices are only available from separate services, that rule can only be enforced retroactively, if the video gets popular on YouTube and someone then tattles to the 15.ai C&D team. Even then, that doesn’t take down the mirrors.
The code repos used are listed in their credits section, and it looks like a mixture of (customised?) Tacotron2, Glow-TTS, HiFi-GAN, and others. Videos are generated using Wav2Lip.
Text-To-Speech (TTS) has improved greatly over the past several years, but there's still a lot of metallic sounds in "pure" TTS implementations. I've started exploring voice style conversion, otherwise known as "voice cloning", and there are some interesting repos out there with decent results. These work differently from TTS, in that you don't type out the text to be spoken, but rather pass in an audio file of what you want the cloned speaker to say, and the system outputs an audio file with the same sounds (words, intonation) but with a different speaker identity.
This may make it easier to get the right cadence and emotion in the generated audio, since text doesn't capture proper emotion and intonation. I suspect game character audio will use more voice-style conversion instead of pure TTS, simply to get the right emotional cadence in the delivered lines.
Some interesting voice style conversion repos (in no order, just a random selection if anyone is interested in exploring):
Thanks a lot for posting this. I've been meaning to take some audio of my grandpa and resurrect his voice before my grandma dies, so maybe I'll finally get around to doing it now.
Yes, intonation seems to be critical. I wanted to see how well it can recreate the training data, so I picked Kyle from South Park and put in "they killed Kenny". I assume this should be in the training set, it being a running theme in the show. The inference puts it in a question-like tone, unlike the original.
If it gets good enough, you can bet games will eventually do this at their core too, rather than re-record lines whenever anything new or different is needed. Then mods just need to add the new script.
I know at least one studio that's already using AWS Polly, (IIRC) for at least prototyping voice lines. I'm not positive that they end up in production, but I've heard samples and IMO they could fly as-is for at least informational lines. I've not yet heard TTS even attempt lines with strong emotion, though.
I have a sudden urge to deepfake Stephen Hawking's voice. I wonder how the deepfake tech would handle Stephen Hawking as an input, and I'm just realizing that he just missed the survival threshold for being able to use this tech.
My naive understanding is that he considered the voice he used to be his voice, and had no desire to change it.
I also read some weird teardown of all of the chips and technology used to synthesize that voice a while back and it was very interesting. I'd like to read it again and archive it.
That one is "just" a French accent on English words, and relatively few unique ones at that. It probably wouldn't be hard to train, or to just brute-force the most frequent number and duration words.
Assuming there are 10 × 6 combinations (1-10; seconds/minutes/hours/days/weeks/months), it'd probably be less than 250 words, and you could use Fiverr to get a native French speaker who knows English (and can apply a heavy accent) to record them for you, for less than 10 EUR.
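To sanity-check the arithmetic, here's a quick sketch that enumerates every phrase under the assumed "<number> <unit> later" timecard template (the numbers 1-10 and six units are the assumptions from above, not an actual list from the show):

```python
# Enumerate every "<number> <unit> later" timecard phrase and count the words
# a voice actor would need to record. Assumes 10 numbers x 6 units as above.
numbers = ["one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"]
units = ["seconds", "minutes", "hours", "days", "weeks", "months"]

phrases = [f"{n} {u} later" for n in numbers for u in units]
total_words = sum(len(p.split()) for p in phrases)

print(len(phrases))   # -> 60 distinct timecards
print(total_words)    # -> 180 words, comfortably under 250
```

So it's 60 phrases of three words each, 180 words total -- and even fewer unique words (10 numbers + 6 units + "later" = 17) if the recordings could be spliced.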
I think seconds and minutes are too short, right? The idea is that some significant time has elapsed. Unless the context is a normally very short event, those are too short to be useful.
You would need more than just ten -- you wouldn't want it to read digits, you would probably prefer 'sixteen hours later' over 'one-six hours later'.
It was a long while ago I last watched SpongeBob but I seem to remember that some of the timecards were comically short, like "2 seconds later", hence I included that. But I might be wrong.
An interesting site, with amazing tech once again let down by a hopeless UI. Seriously, a dropdown with several hundred options is not a great user experience. Still not as bad as the NVIDIA image generator posted a few weeks back.
Fictional characters, sure. Real people? That's crossing a line. I believe I own myself, and manipulating my face and voice to say what I did not and never would say is a misuse of my property. I can't fully justify that belief, but I would certainly be enraged beyond reason to see a representation of my principled father speaking against his values.
There are plenty of ethical uses for this with real people: entertainment, language interpretation, helping disabled people communicate, etc. The lines don't change based on what technology is available; technology can, however, make it easier for people to cross them.
Much as with other things, the problem isn't the creation of the lie, it's the publishing of it. I can claim your dad spoke against his values, publish a fake document, or draw or photoshop a photo showing him committing a horrendous crime, etc. Publishing something anonymously as a fake primary document is vaguely new, but probably not going to be too common.
It's pretty dumb to suggest that I shouldn't be allowed to create something on my computer privately that I can just imagine on my own, imo.
Notice that I didn't say it shouldn't be allowed; I said it was wrong and would make me angry. That's an important distinction that is rarely drawn today: between what people shouldn't do and what they shouldn't be allowed to do.
Not really. You can't force people to say stuff they don't want to say, and it was always possible (although with a higher barrier to entry) to come up with recordings of people who sound like other real people. Impersonators aren't a new concept.
Just because you have a recording that sounds like person x saying thing y doesn't mean that person x actually said thing y and, critically, it never did. Nothing has changed here.
> manipulating my face and voice to say what I did not and never would say is a misuse of my property
Your appearance and the way that you sound a) aren't property, and b) aren't yours.
I want to point out that there are people who sound or look like famous characters/actors all their lives and are, in some cases, mocked for it.
Now imagine people getting sued for being themselves.
I think there should be, at most, a protection covered by libel laws as written today and nothing more. Otherwise we risk getting into that mess.
I think that some real people (public figures) are prepared for someone to mock their voice or appearance, and the rest are not ready for such attention. It’s interesting that our voices and appearances are not unique: I've known people who look or sound like me or like someone else, but when it becomes "and" -- both look and sound -- we read it as identity theft or something like that. We are not far from calls and recordings being signed with personal identity keys, and fingerprints being included in ID documents, because otherwise it would be hard to recognize who is who.
I agree. Now, who's going to be the one to step up to the plate and stop these rascals from running open-source code on their GPUs that allows for this kind of nonsense?
You've read about how, before the health consequences were widely understood, people would eat radium, or paint household objects with radium paint? This technology feels like the epistemological equivalent of making a toy out of radium.
I’ve noticed that on none of these deepfake sites can you find Dwayne "The Rock" Johnson. I wonder if there is a legal reason, because surely he is super famous and would be there. I’ve searched multiple of them; if there is one that has him, I missed it.
As tech like this nears perfection, how long before models trained on movie audio are treated as unauthorized use of content, by classifying them as derivative copyrighted works?
Super interesting. Bernie Sanders' voice is very good (way better than some of the others I tried such as D.Va and Richard Ayoade), but it sounds very flat. You can fake a eulogy with this, but not a political message.
To be clear, in Sassy Justice they had actors doing the voices and body movements, the only part that was deepfaked was the faces. That’s why the results were so good.