Fake Obama created using AI tool to make phoney speeches [video] (bbc.co.uk)
4 days ago | 107 comments

I was really surprised when the original paper for this didn't make it to the top of HN: http://grail.cs.washington.edu/projects/AudioToObama/

Nor did any of the follow-up articles I posted. Given the onslaught of fabricated "news" that spread around the last election, this type of ML technology is almost guaranteed to play a role in the next one.

After watching the video from that link, I'm very fearful for what people are going to do with this technology in the next decade. This combined with Adobe's new "speech photoshop" will make excellent propaganda.

At first I was also fearful of people using this tech to make fake "evidence" of things...

But now that I think about it, the most damage will come from making people even more incredulous... of everything.

It will become even harder to use evidence to prove a point. Uninformed people will just say "yeah sure, it's probably faked".

On the other hand... who could blame them?

I'm not surprised that a forum full of people who work in the information technology industry cannot see the epistemological failure of mass media, but these problems are not new, and the idea that nothing communicated through a screen can be trusted will most definitely cause more good than harm outside of the very near future.

Democratic institutions not only functioned but were put into effect before the advent of recorded or transmitted audio and video. Mass media has already been distorting the truth for over a century.

"Trust nothing you read online" becoming commonplace is something to be celebrated!

There's also another possibility: that media literacy will take off, and children will learn to verify sources. This was the original proposal of the early blogospherists: that ordinary citizens would learn to do journalism so as to keep microcommunities informed.

This is in fact happening, to everyone's benefit. Only in matters of vague mass policy (do we need giant walls? can markets sell health?) has it been largely corrupted by corporate propaganda. Which is probably not terribly worse than before. That scale has always been a big enough target to make it worth the effort to corrupt.

I agree with you in that mass media has been, willingly or otherwise, distorting the truth(s), but the easier and cheaper it is to "create fake information" the higher the verification bar for the public.

Right now, faking an interview via real-time 3D rendering plus capture of facial expressions would cost a ton of money. So anyone trying to do so (e.g. to damage a competitor's reputation) would need both deep pockets and something valuable to gain from it (and would need to factor in the cost of getting caught, etc).

If this tech goes mainstream in the future, the cost would be so low that I can imagine some nasty things happening.

Hopefully there will be ways to detect this as this technique is further explored, so I'm not saying it's a "catastrophic development", but it certainly made me think more than just "oh wow, this is very neat!", which it also did by the way. It's incredible the kind of things researchers are able to achieve.

It'll just be 2020's "I can tell by the pixels" joke.

What we really need is simple, preferably free, tools to do this, so kids can start playing with it at an early age. That way the technology can lead to healthy skepticism.

On the other hand, there's a difference between being doubtful something is real, and being aware that people speaking in public have an agenda, and a large portion will utilize half-truths and lies more often than not.

The difference between "the President lied about WMDs and got thousands killed over it", and "maybe the president didn't actually say that - archive footage might be false".

Very simple: a video or data set that has not been verified with a private key by the issuing party on a blockchain will not be taken seriously and will be assumed to be a fabrication.

That would make some things we currently rely on - like voice recordings as evidence of criminal acts - nearly impossible. No one is going to sign their crimes into a blockchain.

> No one is going to sign their crimes into a blockchain.

Just like (to pick a hopefully safely historical example) no president would record himself conspiring to perform illegal activities?

What's the alternative? If it becomes trivial to fabricate voice recordings in general, surely you wouldn't advocate that we continue allowing them as evidence?

Of course not, I'm just saying throwing verifiable data onto a blockchain doesn't solve all cases we use voice data for.

Yeah, I'll assume you do not have one of 'those' uncles then, the flat-earther-ish types. Logic and numbers aren't working to begin with.

Indeed. Combine this technology with something like the video (https://www.youtube.com/watch?v=2VZ3LGfSMhA) from https://news.ycombinator.com/item?id=14101405. Instant panic.

Well there is such a thing as productive and useful propaganda. Apple knows how to get ppl to line up for blocks to sell a 100 buck phone for 600. Disney's current formula of Star Wars/Marvel/animation is so efficient at getting ppl to shell out cash that last year they basically told analysts they had enough resources and the formula to produce a Star Wars type movie every week, there are just not enough ppl to buy it. So knowing how to manipulate ppl is now quite scientific. It just has to be towards a constructive end. Something beyond religious/celebrity worship and consumerism and consumption.

With such power, if all it takes is one douchebag to cause massive harm very quickly, then you should also believe it is possible for one person to do massive good very quickly.

>With such power, if all it takes is one douchebag to cause massive harm very quickly, then you should also believe it is possible for one person to do massive good very quickly.

I'm not sure that applies for this technology. What massive good could be done by making a fake video of someone talking?

Could undermine terrorist organizations and 3rd world dictatorships with it. Spreading a fake of Kim Jong Un saying something embarrassing in North Korea, for example. Unfortunately I think in the long run this is more likely going to be used against us effectively rather than something we can use to destabilize our enemies.

this is why i made https://TrumpTweets.io - to crowd-source and respin back his tweets in a more honest direction..

That last part doesn't follow at all. It's much easier to destroy than to create. The ability to quickly produce something harmful doesn't imply the ability to quickly produce something beneficial. We have the technology to level an entire city in an instant, but building one still takes decades.

Wow, that's absolutely incredible -- thank you for sharing.

Aside from the negative or comedy use cases, I could see this being used in video games, and probably other similar applications for narratives; based on the methodology it sounds near trivial to sync it up to any audio. Really incredible work -- love seeing this kind of research, which I've been much more exposed to doing while gamedev than I ever had been when doing web/app dev.

This would never work on someone who stumbles through their speeches like Trump.

Max Headroom had perfected this back in the 80's: https://youtu.be/cYdpOjletnc

what is old is new again

What is special about stumbles that makes them impossible to fake?

Obama speaks more clearly, more calmly and more fluidly than Trump, who speaks with more stuttering and filler words and with more intense facial expressions.

Nothing about either is harder in principle to duplicate. You would just take different approaches to designing the algorithms or, if using deep learning, be sure to include the relevant examples in the training data.

When an open source package author releases a pile of source code, that person typically also releases a cryptographic checksum of the code.

In the future, when the President (or CSPAN, or CNN, or Fox News, or whoever) releases a segment (which they do all the time), they'll need to release (in a public, 'timestamped' way) a cryptographic checksum of the content.

I have many of the same fears as people here about future fake news, where the reality of something already comes as a distant second behind the outrage produced. So even if we had this big pile of content and checksums, the outrage echo chambers will still be going nuts.

But it's at least a partial technical solution to these problems.

(And I'm glossing over all kinds of other complications too, such as 'what format' and 'where does it get stored' etc etc)
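
To make the checksum idea concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash and treats the segment as a plain byte string; the function names `content_checksum` and `matches_published` are made up for illustration, and the "public, timestamped" publishing step is not modeled at all.

```python
import hashlib


def content_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a piece of content."""
    return hashlib.sha256(data).hexdigest()


def matches_published(data: bytes, published_checksum: str) -> bool:
    """Check a downloaded copy against the checksum the issuer published."""
    return content_checksum(data) == published_checksum


# Hypothetical segment bytes; a real segment would be read from a file.
original = b"official segment bytes"
published = content_checksum(original)

tampered = b"official segment bytes (edited)"

print(matches_published(original, published))  # True
print(matches_published(tampered, published))  # False
```

Note that this only shows a copy is byte-for-byte what was released. It says nothing about who released it, which is where signatures come in.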

I think it's reasonably certain some alternative network with these kinds of features will emerge. Validating the authenticity of content's origin was a known requirement of theoretical networks, but it's hard, both technologically and, more fundamentally, sociopolitically: the implications (loss of privacy, trust in the state, etc etc) are mind boggling. This will take a while to resolve, but someday people will be looking back and laughing :)

I spend a non-trivial amount of time hoping that people, myself included, in the future will look back at many things and laugh. (:

I wonder if it's easier to make a fake speech of someone who talks the way Trump does? Not making this a political thing, but I've noticed that he's not very eloquent and frequently starts and stops in his speech and changes what he's saying mid-sentence. Might it be easier to synthesize that type of speech pattern since any awkwardness can be hidden in the erratic speech the real person employs?

I've seen claims that translators struggle with Trump because of this - accurately translating what he says comes out as incoherent, which makes people think that the translator is doing a bad job.

Although I think to some extent he relies on fragmented speech. It's all about emotional pattern recognition, rather than something that people engage with rationally. Tony Blair also tended to do this, with long verbless sentences.

To be fair, even in English he's incoherent.

I've written naive markov chains that appear way more coherent than Donald Trump on a bad day.

Politics aside, I think it is amazing that anyone would vote for the guy - I suppose it indicates how great a distance there is between the average voter and the average politician, if most can't seem to tell the difference between those that are able to string together coherent sentences and paragraphs, and those that aren't.

Perhaps a stack-based representation of speech would work well? There was an interesting post on the English language StackExchange that talked about Trump's speaking style and how, due to its more off-the-cuff structure, it ends up nesting various subjects in layers, pushing and popping from the stack:


Seems like the kind of verbal structure that the puzzle-minded among HN might actually enjoy piecing together!

Wow. That makes it easier to understand, but certainly not easy.

Imagine, if you will, a world where Trump speaks for Obamacare.

It makes you wonder just how far fake content--and fake content involving real people--can go.

Imagine a world where a service exists to which you can upload a dozen images of someone, along with a voice clip. In response, it can generate all kinds of videos--from the benign, to the person saying horrible racist things, to the person starring in graphic pornography.

It seems technically feasible in the medium term. But how do we react to it? Strict limits on the production or storage of these pseudo-artifacts? Criminal penalties for distribution? A cultural rejection of pretty much all video and audio evidence?

Oh man, it's gonna get way crazy. Imagine the 'mean girls' in HS with this tech. Imagine trying to be a male HS teacher with this around. We're gonna have to regulate the shit out of the internet just to keep teachers in schools and verify that they did not in fact cuss out a classroom and chug vodka.

I'm truly frightened by this. We are still struggling with how to deal with the "post-truth" world, and that's with the assumption that pictures are hard to fake, video even more so, and audio nearly impossible.

Fake news is going to reach a fever pitch when "speeches" of Obama leak saying, "We have to take all the white people's guns". And conversely, when a genuine "grab 'em by the pussy" leaks again, a huge chunk of people willfully will not believe it.

Seems possible that we could come up with technological and journalistic solutions, given enough time, but it's moving too quickly.

Yes, I think this may mean the end of any kind of political centre in the US. The different sides will simply have entirely different coverage of the same events, and will entirely refuse to believe each other's claims. Only a small distance from the current situation. But I can't see how people would voluntarily move back from that brink.

From the material posted it seems that this is basically a high-tech, convincing version of those late-night sketches where they have a fake version of their mouths moving; it doesn't actually solve the audio problem.

would be pretty easy to use an impersonator or something like lyrebird.ai to create the audio.

Yes but you can also track down the original video to prove something is fake. In order to have something truly convincing you need a video of Obama (or anyone else) that no one else has.

I have been saying for a while now that our current systems are all relying on the inefficiency of an attacker.

Soon, video and audio of an event or speech won't be proof of anything.

The only way to prove identity will be to have a device which can do challenge-response.

Without it, you won't be able to prove you're not a robot over the internet.
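
For anyone who hasn't seen challenge-response spelled out, a rough sketch follows. It assumes the device and the verifier already share a secret key and uses HMAC-SHA256 from the Python standard library; the names `make_challenge`, `respond` and `verify` are hypothetical, and a real device would more likely use an asymmetric key held in secure hardware.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    """Verifier side: a fresh random nonce, never reused."""
    return os.urandom(32)


def respond(secret_key: bytes, challenge: bytes) -> bytes:
    """Device side: prove knowledge of the key without revealing it."""
    return hmac.new(secret_key, challenge, hashlib.sha256).digest()


def verify(secret_key: bytes, challenge: bytes, response: bytes) -> bool:
    """Verifier side: recompute the expected answer and compare in constant time."""
    expected = respond(secret_key, challenge)
    return hmac.compare_digest(expected, response)


device_key = os.urandom(32)    # assumed to be provisioned on the device
impostor_key = os.urandom(32)  # an attacker guessing at the key

challenge = make_challenge()
genuine = respond(device_key, challenge)
forged = respond(impostor_key, challenge)

print(verify(device_key, challenge, genuine))         # True
print(verify(device_key, challenge, forged))          # False
# Replaying an old response against a new challenge fails too.
print(verify(device_key, make_challenge(), genuine))  # False
```

The fresh nonce is the whole point: a recording of a previous exchange is useless, which is exactly the property that recorded video and audio lack.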


Forget "hacking elections". A botnet will be able to hack our trust in one another (see CIA reputational attacks), AI will be used to chat up girls online better than any person (see fb AI sales bots), and so on.

Computers can already beat us at Chess, Go, etc. How much different is humor, honor and reputation once companies add one more breakthrough to deep learning to model them?

An attacker that can make 100,000 jokes a second each of which is excellent? The missing breakthrough is how to automate the "human judging" factor. This is the problem when figuring out diets or treatments etc. Clinical trials take a long time. Same with textbooks.

Once we figure out how to speed that part up, we are going to be able to make AI that knows what's probably going to be funny ahead of time.

Looking ahead 10-20 years, I don't see how anyone born from 2015 onwards has any solid concept of reality the way I feel I know it. Between things like this and AR, I feel like "real" vs "simulated" will seem to the AR natives to be a pedantic distinction... Like how my parents still distinguish between "having met someone" and "someone I have talked to on the internet."

I wonder if eventually the SNR of electronic content will simply be so low that any communication of nontrivial value will simply take place over physical channels (e.g. face-to-face and trust networks), as it has for all of human history but the modern era. Or we truly embrace cryptography as a people and use it to build such networks of trust in the digital realm (e.g. news videos cryptographically signed by reporters and news organizations).

Speaking of which, the notion that one can possess irrefutable audio-visual evidence of an event is also a very recent one. Until the 19th century absolutely every news item traveled by word of mouth (or ink) through trust networks and was absolutely subject to manipulation.

> Like how my parents still distinguish between "having met someone" and "someone I have talked to on the internet."

I am not sure that that distinction is really gone.

Where'd you two meet? On the internet.

I think it's a bit of a stretch to claim that because such an exchange is plausible there is no meaningful distinction between online and in-person communication.

I really don't think there is much of a difference anymore. I remember meeting people online and hanging out with them (talking, working on projects together, playing games, etc) and it honestly didn't feel any different than meeting someone in person and doing the same thing despite the fact that I have never seen some of these people in person.

I find it especially difficult to differentiate a meaningful distinction between the two when my interactions with people go from in person to online or vice versa. We just do the same stuff we did normally, albeit maybe those things occur in different settings such as at a computer instead of face to face at a restaurant or wherever.

If this technology merges with conversational AI improving, I think we could be in trouble. I remember that article that used machine learning on someone's chat messages who passed away and had some interesting conversations based on what the machine learning process could create that wasn't perfect but somewhat believable. Pair that with this tech and we are on the road to passing the Turing Test.

I really recommend the Adobe demonstration of their VoCo speech audio "photoshopping" because of Jordan Peele's reaction: https://www.youtube.com/watch?v=I3l4XLZ59iw.

More technology demos should have unscripted, sincere reactions like that.

> More technology demos should have unscripted, sincere reactions like that.

I'd agree with you if you scratched "like that".

See here for an opinion on this demonstration: https://news.ycombinator.com/item?id=13002787

That's pretty impressive! nice demo too.

The next generation of fake news is going to be fantastic.

I look forward to a time when journalists doing journalism is what's valued in the news.

When technology to fake things is everywhere, maybe we'll pay for accurate newspapers again. With something other than our willingness to be manipulated by advertising.

> When technology to fake things is everywhere, maybe we'll pay for accurate newspapers again.

We didn't pay for them before, how could we do so again? Advertisers have always been the main funding for newspapers and sensationalism (clickbait before clicks) and propaganda have always been major influences on the medium.

Sure, things got worse for decades leading up to the arrival of internet news, and continued to get worse after that, but the golden age of readers footing the bill for accurate newspapers never happened.

This is kind of a slap in the face to anyone who's considered themselves a journalist, as well as anyone who bought a newspaper for the news. It also makes me wonder how the news industry ever got started if it never sold "news", but only something wrapped in that veil.

I don't know anything about the history of news; do you have any good reading material to recommend?

Problem is, there are always moments journalists aren't present for. Think Romney's "47%", Hillary's "deplorable", Trump's "grab em by the pussy". In all of those situations the video evidence is what proved the story. Without that, all a journalist can do is trust the accounts of the people present, which aren't trustworthy at all.

You assume that news has ever been accurate. I would argue that the "news" in the past was not more accurate. The revolution that has happened is that we are much better at detecting fake news.

"A person said something? That's terrible... let me see for myself what he actually said... oh wait, he said that in a completely different context and didn't mean what was implied in the news at all." - Anyone who has checked a primary source after reading a news article

We are no longer at the mercy of "journalists" to tell us what to think about what happened. They can be called out on their lies and misrepresentation, as the same raw videos of what happened can be seen by people first-hand. How about asking the person in question to clarify what they meant? We can do that too.

The news organizations, as a defense, have been publishing the "news" as "opinion pieces", "satire" and "comedy" which they can more easily walk back on when called out.

All this because they have an old population who hasn't figured out that one can actually be their own journalist a lot of the time by following people who are much closer to different sides of a situation than "journalists" will ever be... and the reported news is ALWAYS a completely different version of what actually happened. When this population dies out, the mainstream fake news as we know it will die with them.

My thoughts exactly. We're skipping fake news v2 and going straight to 3.0. Fake news has just entered a rapid release schedule. Brace yourself.

The audio itself can also be faked now.


That would not fool anyone. Impressive nonetheless.

It’ll only get better over time.... by them or someone else.

Just imagine what will happen when state actors start to use this as part of psychological warfare...

Future looks grim!

What worries me is that this is just public research done at a university. What should we expect from secret projects then?!


Basically, we now have the video version of this graph. It came a little later than Photoshop, but as with any technology, anything that is technologically possible will be implemented by somebody at some point in the future.

After the Falklands War, the British anarcho-punk band Crass spliced together and leaked a tape that made it seem like the attack was a false flag and that Reagan wanted to start a war with the Soviets in Europe.


This demonstrates an immediate need for a trustworthy video hosting service--maybe with a companion app that records using some combination of crypto/proprietary formats/trust networks/I don't know what else, this isn't my area of expertise. Things are going to get pretty bad here soon if video evidence loses its street cred.

But how is it considered trustworthy? That's basically the central struggle every media organisation has right now and I don't see an easy answer.

Yours is an interesting idea, though. I dread to think how much it would have to exclude though - probably have no Android client because I'm sure a rooted phone could easily provide a fake video source. Maybe even on a jailbroken iOS device too. And once that becomes possible the entire platform is ruined.

Yeah, it might need to be a custom-made camera, made for news organizations and wealthy vloggers. And each device would need to be audited regularly. It would be a nightmare but not as much of a nightmare as not knowing whether anything you see on video is real or computer-generated.

Why do you need all this? Just authenticate the video file with a digital signature. It is even possible to build it into the video encoding itself, and I am almost certain there are already research papers on this, if not implementations.
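
A rough sketch of what "authenticate the video file with a digital signature" means in practice. To stay within the Python standard library this uses a Lamport one-time signature built from SHA-256, which is a stand-in chosen for illustration; a real deployment would almost certainly use something like Ed25519 or RSA from a vetted crypto library. All function names here are hypothetical, and each Lamport key must only ever sign one message.

```python
import hashlib
import os


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    """The 256 bits of the message's SHA-256 digest, most significant first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    """Reveal one secret from each pair, selected by the digest bits."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Hash each revealed secret and compare with the matching public value."""
    if len(signature) != 256:
        return False
    return all(
        _h(revealed) == public[i][bit]
        for i, (revealed, bit) in enumerate(zip(signature, _bits(message)))
    )


video = b"raw video bytes"        # placeholder for the real file contents
doctored = b"doctored video bytes"

private_key, public_key = generate_keypair()
signature = sign(private_key, video)

print(verify(public_key, video, signature))     # True
print(verify(public_key, doctored, signature))  # False
```

Anyone holding the public key can check the file, but only the holder of the private key could have produced the signature. It still only proves who vouched for the file, not that its contents are true.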

Well...this kind of thing is here to stay, and at least some good may come of it. We can all be reminded that even "true" words are still just words.

An example from all of history: (This is only semi-serious, but felt like a good thought experiment).

> Oligarch to politician - "Make this economic change."
> Politician to team - "Give me post-facto justification for this change I am making." aka "Spin this"
> Team - Applies economics to numbers
> Team to politician - "Here you go."
> Politician to people - "Economics does not lie."
> Vaunted economics publications - "Sold. And thanks for the like."
> Economic failure
> Future politician - "Well, we just didn't know then what we know now."
> Historian of the future - "Their economic calculations lacked the full set of economic forces and incentives. The economists of the time were in effect hand-waving because they ignored a fundamental economic force--the oligarch. Given the size of the oversight, I'd say they were complicit."

Thoughts:
--Economic policy sold without disclosure (or even acknowledgement) of these massive forces is knowingly flawed, and a willing lie to handle people.
--Economic theory is rooted in psychology. When an economic decision is spun to cover hidden motives, the psychological motive basis of that instance of economics is, by definition, false.
--Data can still fool good economists when it is cherry-picked; any data produced by a non-omniscient process is going to be flawed to some extent.

* This is not to say anything good or bad about oligarchs. Merely that they are a tremendous force, and economics, political policy, and civil discussion could greatly improve with a more accurate model of their effect on global systems.

* I'd love to see an economic modeling tool able to place "black boxes" where market distortions are occurring due to probable hidden forces. Captive markets are a real nuisance.

Now we just need some way to digitally sign speeches (and hopefully eventually with quantum cryptography).

I'm not sure how, but perhaps this could actually improve the issue of fake news, or at least its assimilation, if there is a broader realization of how easily things can be faked... probably not though (I'm eternally an optimist).

Signatures only solve half of the problem.

What if someone is filmed doing something wrong?

They wouldn't sign it, and anyone could just say the footage was faked.

I hope they call it Fauxbama.

Time to learn how digital signatures work.

There's been a growing fear of how much dirt is going to get spilled when the current generation with omnipresent camera-phones goes to run for office. But this signals that we could be nearing the end of the age of embarrassing leaked video or audio!

From now on, it could be plausible to deny a video and say someone built a neural net and faked it - 'at least I have no recollection of those events.' (The excuse will work for supporters)

Mitt Romney's '47% are takers' could go down historically as the last great leak where leaks could be believed.

If you've ever gone through the pain of trying to convince someone why some picture or video is an obvious fake then you know just how terrifying this technology is.

Like with other, more easily forgeable content, such as text quotation, we will find ways to provide a chain of trust in order to authenticate what has classically appeared much more difficult to spoof, such as voice or video. The only challenge that will remain is explaining to people that the nonsense they watch on Facebook is not factual just because it is on Facebook.

“Use the force, Harry” – Gandalf

Is the first section of the video badly out of sync for anyone else? Both the real and fake initial segments appear way off to me?

Came here to say the same thing.

Isn't this similar to Adobe's Project VoCo? If not, how are they different? There is also another tool called Face2Face. I think over time people will assume everything to be fake unless it agrees with their pre-conceived notions of truth.

Project VoCo creates speech based off new text for a person; this creates video based off speech. When these technologies get better I assume you'd be able to use Adobe's technology to fake a speech from someone, then use this to create a video for it.

...their pre-conceived notions of truth.

This is the wrong way to think about this, I think. Think of media consumers, instead, as Bayesians. We have prior beliefs, we consume media, and our posterior beliefs are a function of both of those. As the process iterates over time, our priors are updated, but more slowly over time.

This is why children believe everything they hear from authority figures, while those with memories tend to doubt, e.g., the offered justifications for the new war when we consider how similar they are to the previously-offered-but-eventually-disproved justifications for the previous wars.
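
To put numbers on the Bayesian picture, a toy sketch of a single update via Bayes' rule. The probabilities are invented for illustration: the claim is assumed to be reported this way 90% of the time if true and 30% of the time if false, and the two priors stand in for a child with no history and an adult who has been burned before.

```python
def posterior(prior: float, p_report_if_true: float, p_report_if_false: float) -> float:
    """Bayes' rule: P(claim | report) from the prior and the two likelihoods."""
    numerator = prior * p_report_if_true
    evidence = numerator + (1.0 - prior) * p_report_if_false
    return numerator / evidence


# Made-up likelihoods for one news report supporting the claim.
P_REPORT_IF_TRUE = 0.9
P_REPORT_IF_FALSE = 0.3

child_prior = 0.5     # no track record to go on
skeptic_prior = 0.05  # has seen similar claims fall apart before

child_after = posterior(child_prior, P_REPORT_IF_TRUE, P_REPORT_IF_FALSE)
skeptic_after = posterior(skeptic_prior, P_REPORT_IF_TRUE, P_REPORT_IF_FALSE)

print(f"{child_after:.2f}")    # 0.75
print(f"{skeptic_after:.2f}")  # 0.14
```

Same report, same likelihoods, very different conclusions. And if fakes become cheap, the two likelihoods drift toward each other, so the report moves nobody very far.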

Perhaps...but this has been my observation recently. A person's ideological position is always used as a gauge to judge the quality/value of his opinion on a particular topic. If a person is of, or is perceived to be of, an ideological disposition different than mine, then whatever he or she says gets discounted. Your argument about people being Bayesian assumes that they are rational. In practice, you find people sticking to their prior beliefs even more strongly when the evidence to the contrary is strong, the so-called backfire effect. In fact, a lot of people reject the evidence simply based on the ideological disposition of the source that it comes from, terming it an XX wing conspiracy.

I guess that we will start to rely on multiple recordings of an event to consider it proven. Until, of course, 3D recreations start to appear rendering that useless as well.

Seeing is believing, they say, but what did people do years ago when they only ever caught a glimpse, from behind a large crowd, of their future president talking out the back of a caboose on the literal whistle-stop tour? Some people say they saw Abe Lincoln once; some people were even right.

I dread the narcisim this allows to unleash, as people will just select whatever they believed in the first place to be true.

The only solution to this is an education towards knowledge masochism. Destroy what you believe in; only in failure is there truth.

> I dread the narcisim[sic] this allows to unleash, as people will just select whatever they believed in the first place to be true.

I think we've gone past this point already.

This also may allow high quality educational videos to be created. (Think of all those smart people who are bad presenters)

And high quality entertainment.

Ain't all bad.

What's the benefit of a technology like this? Seems like it can only cause harm.

If you were able to make it work in real-time you could do some really cool stuff with characters in multi-user VR spaces. Your avatar could be shown to move its mouth in a realistic way as you spoke to other users, which would go a long way towards building a feeling of "realness".

Well, maybe this doesn't actually answer the question but the obvious "legitimate" use would seem to be military/intelligence psy-ops.

E: maybe dubbing foreign films could work better

"And make no mistake."

and then you have this as well: https://lyrebird.ai/

Proof of provenance.

We're going to need signed videos...

Well it's better than what we have now...

