Hacker News
Use deep fake tech to say stuff with your favorite characters (fakeyou.com)
382 points by Jugurtha on Dec 25, 2021 | 105 comments



Ned Flanders is shockingly good. I'm from Australia so I don't know half these characters it has listed, but it has all the major Simpsons characters, and that's enough for me.

If I ever end up paralysed and unable to speak, you bet your ass I'm talking like Ned Flanders to everyone.


>If I ever end up paralysed and unable to speak, you bet your ass I'm talking like Ned Flanders to everyone.

That's funny. What was the sentence you typed, if you don't mind me asking? I had subpar results with 50 Cent (chosen randomly, with very NSFW language), and Obama saying "My fellow Americans" was strange too.

I wonder how they're training and improving the models. I work on a product that could help them train/track/package/deploy/monitor models effectively. Maybe they're not up to date. There also is an issue with the language.


This got me thinking that it would be nice to have some sort of recursive processing: if you say that Flanders is impersonating Obama, then when you quote "My fellow Americans", you should hear Flanders attempting to say it Obama's way. A way to detect fake voices, I guess :P


What would be sweet is a voice changer, where you can record yourself talking and it changes it to sound like one of those characters. That way you can intonate and say things less robotically.


Nice! What about Flanders doing an impression of Obama doing an impression of Flanders?


I’m guessing what you basically want is two switches:

- simulate tone and timbre

- simulate cadence and mannerisms.

Now I wonder if AI can do this or if this is a perfect example of where AI falls short?


Disclaimer: I know nothing about AI.

I think AI is now capable of answering "how likely is it that this means this other thing if you say it this way", and that applies to text, pictures, videos, you name it. It is not intelligent in any way, but it still gives meaningful responses, which is useful for the use cases it is applied to: search, games, etc. That being said, I have no doubt that, with time, it will reach an understanding that resolves almost any problem... but we always have doubts, and you can't just divine the future. Look at the James Webb Telescope: we launched it to get some answers. So no matter how intelligent a system may be, we would need more research, and the system will need it too, even if it's an AI (because it needs to know things in order to learn from them, in case that wasn't obvious).


Yes, AI can do this. It's called Audio Style Transfer.


JukeBox did that for music so it's not impossible ... https://openai.com/blog/jukebox/


Wow that's creepy and fascinating. It reminds me of the way I can never quite remember lyrics and also how music happens in dreams.


Obama saying "Don't have a cow man" sounds like an Australian who's lived in South Africa for many years.


In case some moron from the west comes after you, please accept my permission as an Indian to use Apu's voice. Pathetic puritans killed off my favourite character.


It's worth mentioning that one similar site is 15.ai[1]

[1] https://15.ai


15.ai was the original, in a lot of regards. I used to chat with the dude on 4chan, he was a genuinely intelligent (albeit rather socially awkward) guy, and a bit of a perfectionist at that. The downside was that there was always downtime and a few hidden servers floating around. The upside was... almost perfect voice synthesis, even a year ago.


> I used to chat with the dude on 4chan, he was a genuinely intelligent (albeit rather socially awkward) guy, and a bit of a perfectionist at that.

Funny, I just wrote about this type of encounter with exceptionally talented people with similar results, and while I didn't detail it in the response, several of them were people I met on 4chan from 2005-2015 (I checked out after gamergate as it got toxic and less fun).

I hadn't seen 15.ai but it's pretty accurate from what I played with. I wonder if he'd be open to showing how he trains his algo (deepthroat) with the data sets he gets. He seems not to care about making money from this, or about IP, and he hates NFTs, so that would be a good indication he might be up for it.

Sidenote: Also, did you just out yourself as pony*** (brony)?


> Also, did you just out yourself as pony** (brony)?

In all fairness, if you looked up my Mastodon account I'd be outed in moments :p

It's not that big of a deal anyways. I still consider it less degenerate than the folks who burn a decade of their life working on an SAAS that they hate. At least I got to go to a few cool conventions.


>even a year ago.

I must be getting old.


I love how if you reject their ToS they Rick Roll you. Fair play.


Uberduck[0] has a great selection of voices

[0] https://uberduck.ai


"Sign up to synthesize speech".


The modding community also has xvasynth, which is an app you can download. It has some voices from popular games. If you don’t want Another Web Service (tm)


My ancestors are smiling at me, imperial, can you say the same?


That one has way less content but the quality is perfect.


Yeah, OP's does sound as robotic as Ubuntu text to speech; this one is almost completely clear.


> you must not mix 15.ai voices with other services in the same project

Good luck enforcing that. If someone makes a fandom crossover video and the voices are only available from separate services, that’s only good retroactively if it gets popular on YouTube and someone then tattles to the 15.ai C&D team. Even then, that doesn’t take down the mirrors.


There's no comparison between 15.ai and other TTS.


The Chell voice is on point.


She sounds suspiciously like Gordon Freeman.


This looks like it was previously known as Vocodes, made by echelon who is here on HN:

https://news.ycombinator.com/item?id=23965787

The code repos used are listed in their credits section, and it looks like a mixture of (customised?) Tacotron 2, Glow-TTS, HiFi-GAN, and others. Videos are generated using Wav2Lip.

Text-To-Speech (TTS) has improved greatly over the past several years, but there's still a lot of metallic sounds in "pure" TTS implementations. I've started exploring voice style conversion, otherwise known as "voice cloning", and there are some interesting repos out there with decent results. These work differently from TTS, in that you don't type out the text to be spoken, but rather pass in an audio file of what you want the cloned speaker to say, and the system outputs an audio file with the same sounds (words, intonation) but with a different speaker identity.

This may be easier to get the right cadence and emotion in the generated audio, as text doesn't capture proper emotion and intonation. I suspect game character audio will use more of voice-style conversion instead of pure TTS simply to get the right emotional cadence of the lines being delivered.

Some interesting voice style conversion repos (in no order, just a random selection if anyone is interested in exploring):

https://github.com/yl4579/StarGANv2-VC

https://github.com/ebadawy/voice_conversion

https://github.com/RussellSB/tt-vae-gan

https://github.com/auspicious3000/autovc

https://github.com/edresson/yourtts

Papers With Code has interesting repos there as well: https://paperswithcode.com/task/voice-conversion/latest


Thanks a lot for posting this. I've been meaning to take some audio from my grandpa to resurrect my grandpa's voice before my grandma dies, so maybe I'll finally get around to doing it now.


This list is great, thanks!


Thanks for the links! Interesting reading.


This is amazing!

I wish there was some kind of notation to help generate intended inflection and emphasis. Like <sarcasm> or /s tags

Here is my BoJack. "What are youuuu doing here" is read flat and funny! "suck a D dumb S" sounds decent, though.

https://fakeyou.com/tts/result/TR:n45c3yyjwrg3fqbdcxfpn3xrac...


Yes, notation seems to be critical. I wanted to see how well it can recreate the training data; so I picked Kyle from South Park and put in "they killed Kenny". I assume this should be in the training set, being a running theme in the show. The inference puts it in a question-like tone, not like the original.


One interesting application of tech like this is to produce story mods for games that still sound like they're using the original voice actors.


If it gets good enough eventually you can bet games will do this at their core too rather than re record lines whenever anything new or different is needed. Then mods just need to add the new script.


I know at least one studio that's already using AWS Polly (IIRC) for at least prototyping voice lines. I'm not positive that they end up in production, but I've heard samples and IMO they could fly as-is for at least informational lines. I've not yet heard TTS even attempt lines with strong emotion, though.


Is it possible to create a voice changer with this kind of AI?


In principle this could be done, even with decent results.

It would basically involve a two-step approach where the first model extracts text and intonation and the second model synthesises the target voice.
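To make the two-step idea concrete, here is a minimal sketch of how the stages could be wired together. Everything in it is an assumption for illustration: the `Utterance` fields, the `convert_voice` function, and the `fake_recognize` / `fake_synthesize` stand-ins are hypothetical names, not the API of any real model or of the sites discussed here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    """What the first stage extracts from the source recording (hypothetical fields)."""
    text: str                   # the words that were spoken
    pitch_contour: List[float]  # intonation, e.g. one pitch value per frame
    durations: List[float]      # timing, e.g. seconds per word


# Stage 1: source audio -> text plus intonation.
Recognizer = Callable[[bytes], Utterance]
# Stage 2: text plus intonation plus target speaker -> synthesized audio.
Synthesizer = Callable[[Utterance, str], bytes]


def convert_voice(audio: bytes, target_speaker: str,
                  recognize: Recognizer, synthesize: Synthesizer) -> bytes:
    """Run the two stages back to back: extract, then re-synthesize."""
    utterance = recognize(audio)
    return synthesize(utterance, target_speaker)


# Placeholder stages so the wiring can be exercised without any real model.
def fake_recognize(audio: bytes) -> Utterance:
    # Pretends the audio bytes are already the transcript.
    return Utterance(text=audio.decode("utf-8"), pitch_contour=[1.0], durations=[0.5])


def fake_synthesize(utterance: Utterance, target_speaker: str) -> bytes:
    # Pretends to render audio by tagging the transcript with the speaker name.
    return f"{target_speaker}:{utterance.text}".encode("utf-8")
```

In a real system the two placeholders would be replaced by an actual recognition model and an actual synthesis model; the point of the sketch is only that the intonation data has to travel between the stages alongside the text.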


I have a sudden urge to deepfake Stephen Hawking's voice. I wonder how the deep fake tech would handle Stephen Hawking as an input, and I'm just realizing that he narrowly missed living long enough to be able to use this tech.


Hawking turned down the use of new voices! https://en.m.wikipedia.org/wiki/Dennis_H._Klatt


My naive understanding is that he considered the voice he used to be his voice, and had no desire to change it.

I also read some weird teardown of all of the chips and technology used to synthesize that voice a while back and it was very interesting. I'd like to read it again and archive it.


Could someone make one for the SpongeBob timecard voice? That would be exceptionally helpful for D&D night.


That one is "just" a French accent on English words, and relatively few unique ones at that. It probably wouldn't be hard to train, or to just brute-force the most frequent number and duration words.


Seemingly, assuming there are 10 + 6 combinations (1-10, seconds/minutes/hours/days/weeks/months), it'd probably be less than 250 words and you could use Fiverr to get some native French speaker who knows English (and applies a heavy accent) to do it for you, for less than 10 EUR


I think seconds and minutes are too short, right? The idea is that some significant time elapsed I think. Unless the context is some normally very short event, those are too short to be useful.

You would need more than just ten -- you wouldn't want it to read digits, you would probably prefer 'sixteen hours later' over 'one-six hours later'.

but yeah that's the general idea I'm getting at.
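As a rough sketch of the brute-force idea, the phrases can be enumerated from a small vocabulary and the distinct words counted. The word lists below are my own assumption (numbers one to twenty, units from hours up), and the function names are made up for illustration.

```python
from itertools import product

# Assumed vocabulary: these lists are illustrative, not taken from the show.
NUMBER_WORDS = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
    "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen", "twenty",
]
UNITS = ["hour", "day", "week", "month", "year"]


def timecard_phrases(numbers=NUMBER_WORDS, units=UNITS):
    """Return every '<number> <unit(s)> later' phrase for the given vocabulary."""
    phrases = []
    for (count, word), unit in product(enumerate(numbers, start=1), units):
        # Pluralize the unit for every count except one.
        plural = unit if count == 1 else unit + "s"
        phrases.append(f"{word} {plural} later")
    return phrases


def unique_words(phrases):
    """Return the sorted set of distinct words that would need recording."""
    return sorted({word for phrase in phrases for word in phrase.split()})
```

With these assumed lists that comes to 100 phrases built from 31 distinct words, so recording the words individually and stitching them together stays well within a small job.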


It was a long while ago I last watched SpongeBob but I seem to remember that some of the timecards were comically short, like "2 seconds later", hence I included that. But I might be wrong.


Party gets sunk in saltmarsh:

three weeks later


Liam Neeson's deep fake voice has a weird stutter when he speaks.

https://fakeyou.com/tts/result/TR:twwgqfh2432z2sq1e1k1ek4340...


I found something similar with Mark Zuckerberg's one as well (impressive apart from that) - https://fakeyou.com/tts/result/TR:y47qqrr4kv7b07qncj5cyrdaxe...


It seems longer phrases just simply won't work well. Here is Eric Cartman having a brain aneurysm: https://fakeyou.com/tts/result/TR:81fj66esxpgctxx4s443npz3gf...


And, he can't say "Irish" properly.


An interesting site, with amazing tech once again let down by a hopeless UI. Seriously, a dropdown with several hundred options is not a great user experience. Still not as bad as the NVIDIA image generator posted a few weeks back.


What! No Bender? This is the worst kind of discrimination...


You could make your own speech synthesis site. With blackjack, at least.


And hookers! Don't forget those, meatbag!


What are these sounds at the end? https://fakeyou.com/tts/result/TR:n0vhhc8z8z4a1rfa8vrpn0s1fw... Seems like the number of full stops at the end of a sentence can generate weird mumbo jumbo. It's interesting.
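If the trailing punctuation really is the trigger, one possible workaround is to clean the text before submitting it. This is only a guess at a preprocessing step, not something the site is known to do; `normalize_tts_input` is a hypothetical helper name.

```python
import re


def normalize_tts_input(text: str) -> str:
    """Collapse whitespace and reduce a run of sentence-ending marks to one."""
    # Turn newlines, tabs, and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Keep only the first mark of any trailing run such as "...." or "?!".
    text = re.sub(r"([.!?])[.!?\s]*$", r"\1", text)
    return text
```

Only the end of the input is touched, so an ellipsis in the middle of a sentence is left alone.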


This is really cool.

It could use some smoothing of the AI artifacts (no idea how you’d do that though). It’s like they’re talking over a broken mic.


"The Earth cannot be saved." - Chrisjen Avasarala (Shohreh Aghdashloo)

[1] https://fakeyou.com/tts/result/TR:sxshpsntvje985ymtknpyskr04...




Fictional characters, sure. Real people? That's crossing a line. I believe I own myself, and manipulating my face and voice to say what I did not and never would say is a misuse of my property. I can't fully justify that belief, but I would certainly be enraged beyond reason to see a representation of my principled father speaking against his values.


There are plenty of ethical uses for using this with real people. Entertainment, a language interpreter, helping disabled people communicate, etc. The lines don't change based on what technology is available, it can, however, make it easier for people to cross them.


Much as with other things, the problem isn't the creation of the lie, it's the publishing of it. I can claim your dad spoke against his values, publish a fake document, draw or photoshop a photo showing him committing a horrendous crime, etc. Publishing something anonymously as a fake primary document is vaguely new but probably not going to be too common.

It's pretty dumb to suggest that I shouldn't be allowed to create something on my computer privately that I can just imagine on my own, imo.


Notice that I didn't say it shouldn't be allowed; I said it was wrong and would make me angry. That's an important distinction that is rarely drawn today, between what people shouldn't do and what they shouldn't be allowed to do.


> Real people? That's crossing a line.

Not really. You can't force people to say stuff they don't want to say, and it was always possible (although with a higher barrier to entry) to come up with recordings of people who sound like other real people. Impersonators aren't a new concept.

Just because you have a recording that sounds like person x saying thing y doesn't mean that person x actually said thing y and, critically, it never did. Nothing has changed here.

> manipulating my face and voice to say what I did not and never would say is a misuse of my property

Your appearance and the way that you sound a) aren't property, and b) aren't yours.


I want to point out that there are people that sound or look like famous characters/actors all their life and are being, in some cases, mocked for that.

Now imagine people getting sued for being themselves.

I think there should be, at most, a protection covered by libel laws as written today and nothing more. Otherwise we risk getting into that mess.


I think that some real people (public figures) are ready to see or hear someone mocking them, and the rest are not ready for such attention. It's interesting that our voices and appearances are not unique. I have known people who look or sound like me or like someone else, but when it becomes "and", we read it as identity theft or something like that. We are not far from calls and recordings being signed with personal identity keys, with fingerprints included in ID documents, because otherwise it would be hard to recognize who is who.


I agree. Now, who's going to be the one to step up to the plate and stop these rascals from running open-source code on their GPUs that allows for this kind of nonsense?


I don't want to force them; I want them to refrain.


And if they refuse?


I will be sad.


Precisely.


There are things society allows you to do in private, but not in public.


Not bad, but it seems to ignore newlines in the input. I pasted in newlines in the famous Pulp Fiction quote: https://fakeyou.com/tts/result/TR:d1yw37dftm42ayt34h17scy1n8...


We (Replica Studios) are releasing a major quality update next year. Here's a preview: https://replicastudios.notion.site/Preview-Voice-Quality-Upd...


What are the main differences between you guys and Resemble.ai?


You've read about how, before the health consequences were widely understood, people would eat radium, or paint household objects with radium paint? This technology feels like the epistemological equivalent of making a toy out of radium.


I've noticed that on any of those deep fake sites you can't find Dwayne "The Rock" Johnson. I wonder if there is a legal reason, because surely he is famous enough to be there. I've searched multiple sites; if there is one that has him, I missed it.


From https://fakeyou.com/about

> We'll be happy to remove any of the voices featured here for any reason.

Maybe they've requested to be removed from the site. Impossible to know though, and they are unlikely to acknowledge it if you ask.


I stand corrected: one of the sites linked was Uberduck and it seems to have him, but I was unable to get it to work today.


As tech like this nears perfection, how long before models created using data from movie audio are treated as unauthorized use of content, by classifying them as derivative copyrighted works?



I chose "Batman" and used "this is a test" as the test input. On the word "test" it sounds completely metallic, distorted, and glitchy.

Which characters seem to work well?


I checked out various voices and wasn't impressed by most, with the notable exceptions of Nas and Yennefer. It's also biased towards English.


Majel Barrett-Roddenberry for Alexa's next iteration, for turn-by-turn navigation.

This is the perfect application for this technology.


Looks like a lot of these have been pulled from songs/movies, so their voice pattern follows that of the source media type.


Are those TTS models downloadable?


Too bad I couldn't find the HEV suit voice from Half-Life. At least it has GLaDOS though.


I have always wanted to hear Ayn Rand call for workers to throw off the shackles of bourgeois capitalism. And now I can. Thank you, Internet!


I let (robot) Ayn Rand read a youtube comment I found funny..

https://fakeyou.com/tts/result/TR:ev54r4q5b1txx28k32e1s5nw6t...


These are cute but not even in the same league as Resemble.AI


Getting failed attempts for the Arnold voices


Someone do data


No Steve Jobs. Which got me thinking: would they get sued by Apple if they had it?


Jobs' estate, maybe. Hopefully Apple doesn't own his likeness.

But really it seems like fair use IMO. These short clips seem very benign.


Super interesting. Bernie Sanders' voice is very good (way better than some of the others I tried such as D.Va and Richard Ayoade), but it sounds very flat. You can fake a eulogy with this, but not a political message.


Neat


still no DOOM (MF)...


My favorite deepfake is Sassy Justice by the South Park creators Trey Parker and Matt Stone:

https://www.youtube.com/watch?v=9WfZuNceFDM


To be clear, in Sassy Justice they had actors doing the voices and body movements, the only part that was deepfaked was the faces. That’s why the results were so good.


Yes, and they cherry-picked the results from endless tuning.


oh wow the ad breaks in this caught me off guard in the best way, thank you for sharing this!


This is great. Thank you for sharing. I'd never heard of this!


That's scary. Also, I am missing Cartman in the list of voices. And Ben Shapiro's voice talks way too slowly :D



