
Algorithm allows video editors to modify talking-head videos as if editing text - jonbaer
https://news.stanford.edu/2019/06/05/edit-video-editing-text/
======
jalgos_eminator
You know, I thought that deepfake videos would be politically weaponized when
I first heard about them. However, after doing more thinking on this, we have
had photoshop for 30 years already! We see photoshopped images all the time
and while some people can be fooled, many others remain skeptical of an image
and try to verify it hasn't been altered. I don't think photoshopping has
really been a big problem yet, which makes me think that deepfakes won't be
one either, because they are fundamentally the same kind of deception, just in
video form.

~~~
VikingCoder
All of these things make it easier to mass-produce bullshit at low cost.

I'm pretty sure I know people who have been convinced by meme quotes. A
headshot of a politician they don't like, with text overlaid, which they never
said. People are outraged! And never bother to inspect the source.

Anything that makes it easier to lie about what someone said or did, or makes
it harder to disprove... They're all politically weaponized, already.

Look at the "drunk pelosi" video.

~~~
babypuncher
The thing is deepfakes don't really make this easier. They are a lot of work
to produce and ultimately aren't a whole lot more effective than a good old
fashioned photoshop or crappy meme.

~~~
VikingCoder
The research demonstrations I've seen are sufficiently terrifying. I believe
we'll have something like
[http://www.xtranormal.com/](http://www.xtranormal.com/) for major political
figures within two years, producing deepfakes that are sufficiently realistic
that I have several relatives who will be tricked by them. Do you not know
people who will be fooled?

~~~
mrandish
People's eyes and ears may be fooled by a video but as this capability becomes
widespread, which it certainly will, I'm not so sure that many people will be
deceived in the long-run.

Technology is evolving, but not in a vacuum; society's reactions also evolve in
response. Today, many people interpret video to be "evidence" but those same
people can interpret a photo to be a "claim" or perhaps some form of lower-
confidence indication. Before photo manipulation was commonly known, I think
photos were in a similar place as video - more trusted. Based on history, it's
reasonable to expect video may follow a similar trajectory as photos to
becoming less trusted in situations where it matters.

So, what happens when media types which were previously more trusted as
evidence become less trusted? The same things that happened with print, audio
and photos. Viewers will evaluate external cues such as the reputation of the
publisher and corroborating evidence. The leading indicators that we should
suspect deception will likely be similar. For example, how divergent the
behavior depicted is from expectation, how contentious the surrounding context
is and the existence of parties with an interest in creating such a deception.

This effect already happens with manipulation of intent through tricky video
editing, for example deleting the rest of a reply to a question or even
swapping in an alternate question. In the last decade I'd say the typical
person is far more aware this is possible.

So, in the near-term there may be some successful deception but in the long-
term I expect the potential value of creating such deceptions will diminish
and we'll arrive at a new "normal" much like we have now. The biggest long-
term impact may be false claims of "doctored video!" from those who were
actually caught on video doing something they didn't want seen by others. But
as we already see now, those pre-disposed to believe whatever is shown is
false will search for indications it's doctored. Those pre-disposed to believe
whatever is shown is true will search for indications it's just more
confirmation of what they already suspected. Either way, the existing
reputation of the person shown, the distribution source and the pre-existing
knowledge of viewers will likely be more determinative than the media itself.

~~~
bsanr2
What you're saying is that video will cease to be a useful tool for exposing
flawed-yet-entrenched viewpoints for what they are. If you have any idea of
the role expository media has played in civil rights and anti-war efforts,
this should terrify you.

------
ollifi
I find it worrying that people are only now starting to be sceptical about
visual information. There is a huge difference between the real world and the
framed and curated view a photographer or documentarist gives you. If you go
to school to learn about this stuff, it's mostly about how to convey your view
through these tools and the ethical implications of it.

I think people vastly underestimate how much editing and framing change the
perceived truth of what happened. It is more subtle than manipulating the
contents of video, but I think it can be in many ways more effective as most
of this stuff bypasses your cognition and is not straight up lying.

It feels the same as in written news changing the quote vs. changing text
around the quote.

I think we would be better off looking at video as if it were a picture drawn
or text written by someone. It's an artistic rendition of the events.

~~~
_Codemonkeyism
All news is to some degree manipulated.

Together with a friend I was one of the geocaching and confluencing pioneers
in Germany.

Some large papers and TV stations reported about this "phenomenon" and wanted
to make an article/documentary about us. Every one of them came with a story
in their head, which we had to fill with our pictures and quotes. No one was
interested in "reality". For a news clip we had to shoot situations several
times; I remember leaving a house 5 times until the shot was done. Up until
then I thought news would be unstaged.

~~~
DoctorOetker
the moment they essentially ask you to cooperate in staging a scene, why would
you cooperate?

~~~
_Codemonkeyism
I was young and needed the money ;-)

------
keiru
The danger is not in false positives, but false negatives. The very existence
of these kinds of things erodes trust and sows paranoia.

A simple morph cut in a John Pilger interview of Assange made a sizeable
portion of nutjobs believe Assange had long been dead. Don't think this kind
of behaviour can't eventually extend to the mainstream.

~~~
astrodust
It's the slow erosion of video evidence being trustworthy.

~~~
anigbrowl
Cryptographic signing of video footage is a useful blockchain application, but
it will presumably be subject to the same flaws as domain security
certificates.

~~~
NullPrefix
If you can sign a video then you can sign a doctored video. And if it's only
your camera that can sign the video, not you directly, don't be fooled into
thinking it will be possible to protect the private keys in the camera from
extraction.

~~~
tyrust
>If you can sign a video then you can sign a doctored video.

I don't understand your point; you wouldn't sign a doctored video unless you
wanted to do so. It's entirely possible to apply the principles of PGP to
video.

>If it's only your camera that can sign the video, not you directly, don't be
fooled that it will be possible to protect the private keys in the camera from
extraction.

I doubt this would be the solution on which the world settles.

~~~
astrodust
If you can sign anything then what purpose does a signature serve?

~~~
tyrust
You only sign what you want to sign.

When you publish a video of you speaking at a public event, you sign it. When
someone else publishes a doctored video of you, you do not sign it.

This alone doesn't protect you in the case that someone speaks and then later
intentionally doctors and signs the video in order to change what they said.
In this case trusted third parties (e.g. news organizations) could sign videos
as well. A set of signatures taken together can provide trust.
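As a toy sketch of that idea in Python (using HMAC over the file's hash as a
stand-in for a real asymmetric signature scheme such as Ed25519, so the keys
here are shared secrets rather than public/private pairs):

```python
import hashlib
import hmac

def sign(key: bytes, video: bytes) -> bytes:
    # Sign the SHA-256 digest of the video, not the raw bytes.
    digest = hashlib.sha256(video).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()

def verify(key: bytes, video: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(key, video), sig)

video = b"frames of the original speech"
speaker_key = b"speaker-secret"
news_org_key = b"news-org-secret"

# Both the subject and a trusted third party sign the same footage.
sigs = [sign(speaker_key, video), sign(news_org_key, video)]

# A doctored copy fails verification against every published signature.
doctored = b"frames with one phrase swapped"
assert verify(speaker_key, video, sigs[0])
assert verify(news_org_key, video, sigs[1])
assert not verify(speaker_key, doctored, sigs[0])
```

The point being: a signature only vouches for the exact bytes the signer chose
to endorse, and independent signatures from multiple reputations can be checked
together.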

~~~
astrodust
That's not how this works though. If I want to post a video claiming someone
did something, I sign it, but what does this prove? Nothing.

~~~
tyrust
Well yeah, because you aren't the subject of the video and you have no
reputation.

You wouldn't think a letter from your mom is from your mom unless your mom
signed it.

------
mendelbot
The "Ethical concerns" section in the article feels like a punt. The author
quoting "this technology is really about better storytelling" is aspirational
-- the technology's story will be written by those who use it, and you can bet
people will use this maliciously.

~~~
vernie
And that's loads more self-aware than other researchers I've seen who were
completely blindsided by obvious ethical questions at the end of their paper
talks.

------
falcolas
Forget deepfakes and such for a moment...

Think of the impact of this on dubbing movies between languages. This seems
like an incredible tool.

Of course, we can’t just forget about deepfakes and such, but this particular
use case kind of excites me.

~~~
guelo
I've watched many dubbed movies and TV shows and the slightly-off lip movement
never bothered me; you stop noticing it after a bit. It wouldn't be that big
of an improvement.

~~~
bradlys
Well, it bothered me. I never watch dubs - if possible. I watch subs instead.

~~~
osdiab
I prefer the subs tho mainly because the voice acting in different languages
is usually not great - movies’ original languages usually sound much more
natural.

------
imgabe
I suspect in the not too distant future we'll need a way to produce provably
true videos. I'm thinking something like the subject, a politician giving a
press conference for example, carries something that emits a signal that the
cameras encode into the video in a way that any alterations could be detected,
something like a cryptographic signature. I don't really know enough about
cryptography to be sure how / if it would work.

~~~
underwater
More likely that trusted sources become more important. A politician can share
their own stream of a speech.

~~~
britch
Right... What about when someone posts a video of them saying a racial slur
backstage.

The politician says it was doctored.

The poster says it is unedited.

How do you verify who is telling the truth?

------
eatbitseveryday
I had this idea that devices which record content like images or video should
have an unforgeable key internal to their hardware, like we have with PGP /
GPG. Content that comes from the device would be signed, and allow users to
validate whether it originated unmodified from the hardware source.

Granted, derived content will fail validation, but it will motivate tracking
down the original, until validation can be performed. Maybe you can take
pictures of fake imagery printed onto large high-def paper, but at least you
eliminate one stage in the process...

Honestly, we should not trust digital content these days.

~~~
dooglius
> unforgeable key internal to their hardware, like we have with PGP / GPG

PGP involves a private key, and if you have the private key you can "forge"
any message. If you put the key in hardware, it can be read by an adversary
with access to a powerful microscope.

~~~
thefreeman
I mean, how is this any different than the "secure enclave" on iPhones or
other forms of hardware security modules? Yes, a sufficiently advanced
adversary with an electron microscope could possibly extract the key, but it
still greatly raises the bar for 99% of other would-be abusers.

~~~
dooglius
Because the sort of actors who would try to forge videos are precisely the
sort of actors who would have such advanced technology. The secure enclave on
an iphone protects against say someone trying to convert a stolen phone into a
stolen identity, not against nation-states.

~~~
DoctorOetker
What if the camera is connected to the internet, generates its own random key
every 10 seconds, signs the new public key with the previous key, and a quorum
of receiver citizens selected by sortition send their public keys, and the
camera uses threshold cryptography to send each receiver their share of the
secret frames for ~10 seconds?

The adversary would have to extract the key within 10 seconds without damaging
the security envelope of the device, which can't be powered down. If such a
camera is powered down, a replacement camera would need to be manufactured,
sent and installed (again by citizens selected through sortition) at the place
of the malfunctioning / perturbed camera. If the cameras cover each other
(say, cameras along both sides of a street such that a camera sees 2 or more
other cameras), the perturber can be tracked, both where he came from and
where he went to...
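The chaining part of this (each period committing to the previous one) can be
sketched with a plain hash chain; this toy Python example shows only why
tampering with any segment is detectable downstream, leaving out the
signatures, thresholds and sortition entirely:

```python
import hashlib

def chain(segments):
    """Link ~10-second segments: each link hashes its segment plus the previous link."""
    links, prev = [], b"\x00" * 32  # fixed genesis value
    for seg in segments:
        prev = hashlib.sha256(prev + seg).digest()
        links.append(prev)
    return links

segments = [b"seg-0", b"seg-1", b"seg-2"]
links = chain(segments)

# Forging segment 1 changes its link and every later link, so the
# tampering is detectable by anyone holding the honest chain.
tampered = [b"seg-0", b"seg-1-forged", b"seg-2"]
bad = chain(tampered)
assert bad[0] == links[0]
assert bad[1] != links[1] and bad[2] != links[2]
```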

------
mopsi
Everyone seems to think of nefarious uses, but I can't wait for this tech to
appear in video calls, combined with translation. This could enable two people
without a common language to have a conversation while appearing to each other
as native speakers of their respective languages.

~~~
castis
Wow, that would be something else. I assume whoever gets that to market first
would make a killing.

------
scott_s
I imagine the most above-board use for something like this would be in
scripted TV and movies. Basically an enhanced form of ADR
([https://en.wikipedia.org/wiki/Dubbing_(filmmaking)#ADR/post-sync](https://en.wikipedia.org/wiki/Dubbing_\(filmmaking\)#ADR/post-sync)).
Of course, I anticipate plenty of nefarious uses.

A place where I _don't_ think it will be used much is actual facing-the-
camera-talking-head content. Something we have learned from YouTubers is that
audiences don't care if there are discontinuous cuts during a monologue.
YouTubers don't try to pretend they did it all in one take, and will happily
edit their video as if editing text. The cuts are obvious in both the audio
and video. And still it works.

------
nickjj
Seems really cool but I wonder how well it will handle a case where you want
to swap a phrase for a phrase, but have the new phrase have a "human specific"
emphasis or variant to it.

Example: "That was a short trip" vs "That was a reaaaaaalllly long trip".

Language is so much more than words. When you deliver the variant message,
your whole facial expression might change. So much would get lost if that
doesn't carry over. Your facial expression and tone in that context also
completely change the meaning, from having enjoyed the long trip to not
enjoying it, but how can a machine know which one to pick?

------
MichaelEstes
It's strange to me that people are so concerned about these deep fakes when
the National Enquirer has been around for so long. It's been easy to lie to
people en masse for a while now. I don't think this changes the number of
people that are open to these suggestions; I think people in general are
smarter than a lot of people give them credit for.

~~~
meribold
On the flip side, it may become harder to convince people of the truth when
there is such a convenient way to reject unwelcome video evidence. This could
amplify echo chambers.

------
doctoboggan
Science Fiction author Greg Egan wrote a novel called Distress[0] where the
main character is a science journalist who makes documentaries. He uses
software exactly like this. The book was published in 1995. It's a very good
book and I highly recommend it, and basically any other book written by Egan.
(My personal favorite is probably "Diaspora" followed closely by "Permutation
City".)

[0]:
[https://www.goodreads.com/book/show/19328253-distress](https://www.goodreads.com/book/show/19328253-distress)

------
kingkawn
This tech allows the state or corporations to quietly adjust the historical
record of their representatives words and statements to fit their ambitions at
any given point.

~~~
castis
How is that any different than the bullshit they do right now.

~~~
kingkawn
It’s not at all, that’s how you can be pretty sure it’s being done basically
as soon as the tech gets invented.

------
program_whiz
All human evidence rests upon the shaky foundation of "because I believe it's
true", at the bottom of which rests the shaky foundation of your personal
experiences. Don't believe me? Just ask a schizophrenic how hard it is to
disbelieve your own experience.

Reminds me of the first paragraph of the H.P. Lovecraft story "The Call of Cthulhu":

> The most merciful thing in the world, I think, is the inability of the human
> mind to correlate all its contents. We live on a placid island of ignorance
> in the midst of black seas of infinity, and it was not meant that we should
> voyage far. The sciences, each straining in its own direction, have hitherto
> harmed us little; but some day the piecing together of dissociated knowledge
> will open up such terrifying vistas of reality, and of our frightful
> position therein, that we shall either go mad from the revelation or flee
> from the deadly light into the peace and safety of a new dark age.

------
dpau
Interested to see if counter-measures begin to be deployed in order to make a
deepfake more difficult and buy time, such as incorporating dynamic
backgrounds and body gestures like touching one's face while talking.

------
lainon
Project page: [https://www.ohadf.com/projects/text-based-editing/](https://www.ohadf.com/projects/text-based-editing/)

------
linux_devil
Just imagine people using this tool to spread false claims and propaganda. Can
we also determine if the video was actually edited or forged?

~~~
RandallBrown
Their mouth movement wasn't particularly natural in the parts where the speech
was edited.

It's good enough that it will fool some people on Facebook/Twitter, but it's
pretty far from being able to stand up to any scrutiny.

~~~
latexr
I agree that their mouth movement wasn’t particularly natural, but I disagree
that it isn’t enough to fool most of us if we’re unaware. We’re not fixing our
eyes on a person’s mouth while they talk.

Think of the scenario where someone edits a few seconds of a 30-minute
interview. They make the interviewee go from saying they hate drugs to saying
they love drugs. Even if you weren’t expecting that claim from that person,
would you go back and recheck their mouth movement, to be certain if it was
edited or not? Unlikely. Even if _you_ would, I’d wager most wouldn’t,
including most of us that could detect it.

~~~
RandallBrown
For propaganda, this will be great. The thing is, you don't even need
convincing videos for propaganda to work.

For truly scary things, like falsifying evidence, I think it will be a while
before this gets past expert analysis, or even a group of people on Reddit
trying to prove it wrong.

In the long term, video will simply be treated like photos are now. With
disbelief.

~~~
creaghpatr
It can also be used to 'ratfuck': if a video is released of a politician
making a gaffe, they can release a bunch of similar but fake videos and then
claim that 'all of them' are fake. The confusion would sow doubt; the
technique could be used offensively as well as defensively.

------
jonplackett
At least they can’t synthesise the voice automatically. But pair this with
Lyrebird.ai and you can basically just stop trusting all video right now.

~~~
fooey
Adobe was showing off some amazing tech for editing voices several years ago,
called Adobe Voco

[https://www.youtube.com/watch?v=I3l4XLZ59iw](https://www.youtube.com/watch?v=I3l4XLZ59iw)

------
hanniabu
Where is the audio coming from? If that was computer-generated, it was pretty
good, because it sounded very natural.

~~~
chefandy
The video mentions that the audio was recorded separately, and shows a few
other options like text-to-speech (which obviously doesn't match the voice)
and some smarter voice matching audio generation (VoCo) which _could_ pass for
the original voice sent over heavily compressed, low-bandwidth video
conferencing or something like that. I'm guessing that if this is used for
actual disinformation, finding a voice actor/audio engineer to try and match
the speech would be most effective.

------
vinayms
I actually love this ongoing cat and mouse game. I don't follow the events in
this field keenly, so I don't know if it exists, but the challenge is to find
an antidote to this concoction, created by mad scientists just for the sake of
science, that will be weaponized any time now.

------
sbhn
The algorithm demonstrated in this video, which moves the cat's mouth in real
time to match the narration, was written in javascript and html5 audio.

[https://youtu.be/qdN__7C5kl4](https://youtu.be/qdN__7C5kl4)

------
hasahmed
Regarding all the talk of not being able to verify your video source: perhaps
we'll go back to film for things that need to be provably legitimate. Though
perhaps that has the same issues.

------
ionwake
Is there any open source software like this available anywhere ?

------
hkon
Looking forward to the autogenerated youtube videos about various topics.
Perhaps they would be interesting to watch rather than a few images and a
robotic voice.

------
apsharma
This is gonna revolutionize the animation industry.

------
mzs
lead author's page with links to other researchers' pages, the paper itself,
and nearly 200MB of supplementary materials:
[https://www.ohadf.com/projects/text-based-
editing/](https://www.ohadf.com/projects/text-based-editing/)

------
readingnews
“Unfortunately, technologies like this will always attract bad actors,” ---
uhmmm, what are the "good actors"?? I want to take video, delete what you
said, and make you say something else. Uhmmm, I can't actually see the "good
actor" point of that.

~~~
pault
Making edits to scenes in movies during post production, cutting out mistakes
in broadcasts... I'm sure video production professionals could easily come up
with dozens of use cases.

------
ozychhi
Was I only one who thought of Jim Carrey in God almighty?

~~~
mnw21cam
Bruce almighty?

------
polygot
Reality TV is probably going to have a heyday with this

------
sabujp
youtube and other video sites need a framework for verifying that video
content hasn't been digitally manipulated, using steganography or keys
injected into every frame. This software could be embedded in video recording
devices (keys could be updated OTA so hard hacks don't matter). If videos
haven't been "reality" verified, then people can just enjoy them as
fake/fiction works. Video editing/compression software would need to be aware
of the location of the key bits and maintain them within each frame.

------
html5web
S__*. It’s scary

------
devin
People choose to believe things. The kind of people who will be duped want to
be duped, as it serves their own ideology. You can’t change that.

------
visarga
Now we need cameras that watermark (using steganography) original videos so
they can be authenticated and a blockchain solution for registering originals.
Video sharing sites will need to process all their uploads to check for
modifications and serve the original file (not re-encoded) as well, for third
party checking.
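As a toy illustration of the watermarking half, a least-significant-bit scheme
over raw pixel bytes might look like the sketch below (a real camera watermark
would have to survive re-encoding and cropping, which plain LSB does not):

```python
def embed(pixels: list[int], mark: str) -> list[int]:
    # Write each bit of the watermark into the low bit of one pixel byte.
    bits = [int(b) for ch in mark.encode() for b in f"{ch:08b}"]
    out = pixels[:]
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract(pixels: list[int], n_chars: int) -> str:
    # Read the low bits back out and reassemble them into bytes.
    bits = [p & 1 for p in pixels[: n_chars * 8]]
    return bytes(
        int("".join(map(str, bits[i : i + 8])), 2) for i in range(0, len(bits), 8)
    ).decode()

frame = [128] * 64            # a tiny all-gray "frame"
marked = embed(frame, "cam42")  # "cam42" is a made-up device ID
assert extract(marked, 5) == "cam42"

# Editing the marked region corrupts the watermark, flagging modification.
edited = marked[:]
edited[3] = 255
assert extract(edited, 5) != "cam42"
```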

