
The Rise of Synthetic Audio Deepfakes - ajay-d
https://www.nisos.com/white-papers/rise_synthetic_audio_deepfakes
======
eigenvalue
My all-time favorite audio deepfake is Nobel prize winner Milton Friedman
reading the lyrics to the 50 Cent track "PIMP". It really captures Friedman's
tell-tale cadence and idiosyncratic lilt:
[https://www.youtube.com/watch?v=4mUYMvuNIas](https://www.youtube.com/watch?v=4mUYMvuNIas)

------
yalogin
There is already a large problem with political ads cherry-picking and slicing
audio and video to deceive viewers. I really worry that deepfakes will take it
to another level completely. I fully expect the current administration to
eagerly adopt them if available.

~~~
acruns
I just assume that if a politician is speaking, they are lying, or at best
misleading. Add in marketing, and I assume it's a lie, regardless of political
party.

~~~
praptak
You still lose. There are players who don't need you to believe them, as it is
sufficient for them if people cannot trust each other.

------
kharak
This might be paranoid, but I've established a protocol with some people in my
life: should someone with my voice ever contact them and ask for money
(because of some emergency, bla bla), nothing is to be done until a passphrase
is mentioned. It's only a matter of time until someone collects significant
voice data and the related contact numbers, trains a model on those voices,
and then uses that model to fake the original voice in real time in a scam
attempt.

~~~
grumio
That's good practice. Sadly these kinds of scams already happen without the
effort of synthetic voices. Scammers call an older person and say "Hey
Grandma, it's me your grandson. I need $500 for bail right away!" With the
help of Facebook they can learn names and details to sound more convincing.

------
paul7986
Recently a friend changed her number and told me via text. Before adding her
number, I asked her a question that only she and I would know the answer to,
like who sat next to her at the old office.

I think I'm going to keep doing this type of verification. It may annoy
friends and family, but I'm not sure how a hacker could ever know such small
details between you and another person.

~~~
Wistar
Very good idea. A few years ago, I had a "friend" text me asking for emergency
funds. It seemed wonky but within the realm of possibility. Asking a question
about a mutual friend from the past revealed it as a scam.

------
blueblisters
There is an annual challenge for synthetic voice detection, ASVspoof, that
evaluates submissions against different types of attacks on speaker
verification systems: text-to-speech, voice conversion, and replay attacks.

The conclusion from the 2019 evaluation [1]: _known_ synthetic deepfakes are
fairly easy to detect using simple models, with very low error rates (even for
high-fidelity techniques with WaveNet vocoders).

[1]: ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
([https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2249.pdf](https://www.isca-speech.org/archive/Interspeech_2019/pdfs/2249.pdf))

------
nshm
> Deepfake technology is not sophisticated enough to mimic an entire phone
> call with someone.

With modern voice conversion technology it is actually perfectly possible.

~~~
crazygringo
Voice imitation isn't just timbre.

It's prosody, rhythm, accent, word choice, and so on.

By the time you've mastered all those, you're practically halfway to becoming
a professional voiceover artist.

Remember, trained voiceover artists have been mimicking voices for a long,
long time. Their timbre isn't always perfect, but faking voices doesn't need
deepfakes.

~~~
Enginerrrd
If you're looking for it sure. But I'm willing to bet existing technology is
sufficient to catch an awful lot of people off guard. Hearing a familiar voice
is usually quite disarming.

------
phjesusthatguy3
Audio "deepfakes" have been worked on for much longer than video ones,
although video deepfakes have the added challenge of deep-faking synchronized
audio. Today's consumers don't seem to be bothered by video deepfakes as long
as they play to the beliefs of the audience.

------
motohagiography
A useful example is the Joe Biden Burisma phone call that bubbled up through
Russian media, which was fabricated. I pulled it apart with ffmpeg, and there
were a number of artifacts that showed editing and splicing.

If you're handy with ffmpeg and python, you can assess the veracity of such
recordings pretty easily. Of course, if I were on a political ratf'ing team,
I'd use the same tools to add those artifacts to a copy of an offending (real
but off-message) stream and amplify the distribution of that fake-faked
version, with a debunking press release handy, so YMMV. While the Biden thing
wasn't a deepfake directly (shallow fake?), we're going to see tons of actual
deepfakes around the election.
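As a toy illustration of that kind of analysis (this is not the commenter's
actual ffmpeg workflow; the signal, threshold, and splice position below are
all invented for the example), a hard cut with no crossfade often leaves a
sample-level discontinuity that a simple first-difference test can flag:

```python
import numpy as np

def splice_points(signal, threshold=4.0):
    """Flag samples where the first difference jumps far above the
    typical sample-to-sample change -- a crude sign of a hard splice."""
    diff = np.abs(np.diff(signal))
    scale = np.median(diff) + 1e-12  # robust estimate of normal variation
    return np.flatnonzero(diff > threshold * scale)

# Synthetic "call": two tones butted together with no crossfade,
# leaving an amplitude discontinuity at sample 8000.
sr = 16000
t = np.arange(8000) / sr
a = 0.5 * np.sin(2 * np.pi * 220 * t)
b = -0.5 * np.cos(2 * np.pi * 330 * t)
spliced = np.concatenate([a, b])

hits = splice_points(spliced)
print(hits)  # a single hit at the splice boundary
```

Real forensics would also look for spectral seams, noise-floor changes, and
re-encoding artifacts, which a one-line statistic like this cannot catch.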

IMO, elections are no longer between candidates; they are a war on truth for
domination of the narrative, and office is just the effect. A campaign that
focuses on what happens once the war is over is daydreaming about the future,
distracted from the present, and this will lose it key battles. For this
reason, I think deepfakes are going to be the biggest weapon in campaign
arsenals for the near future. Interesting times.

~~~
pantaloony
I’m legitimately not sure democracy will survive modern, sophisticated
propaganda techniques, plus an open, international Web, plus losing the
ability to more-or-less trust audio and video recordings that we’ve grown used
to over the last hundredish years. Between state actors and, eventually if not
already, transnational corporations waging info warfare, I kinda doubt the
institution can take it. Too much info, too fast, from too many sources.

~~~
baconforce
I wonder what encryption and key-based techniques can be used to verify the
authenticity of audio and video recordings in the future.

~~~
ardy42
> I wonder what encryption and key-based techniques can be used to verify the
> authenticity of audio and video recordings in the future.

None, since encryption isn't the answer to this problem. Take Romney's leaked
"47%" comment [1] or Hillary Clinton's leaked "deplorables" comment [2]: how
would encryption have been useful either to verify the recordings'
authenticity or to reject them if they had been deepfakes? It wouldn't have,
as those comments were meant for private audiences, so neither speaker would
have officially signed the recordings. If the encryption could trace the
recording back to the individual that made it, then the leaker might decide
never to release the recording (since they don't want to be outed). And if all
the encryption can do is trace back to a random device, why not just get a
random device to sign your deepfake?

[1]
[https://www.npr.org/sections/itsallpolitics/2012/09/17/161313644/leaked-video-purports-to-show-romney-discuss-dependent-voters](https://www.npr.org/sections/itsallpolitics/2012/09/17/161313644/leaked-video-purports-to-show-romney-discuss-dependent-voters)

[2]
[https://www.npr.org/2016/09/10/493427601/hillary-clintons-basket-of-deplorables-in-full-context-of-this-ugly-campaign](https://www.npr.org/2016/09/10/493427601/hillary-clintons-basket-of-deplorables-in-full-context-of-this-ugly-campaign)

------
sidthekid
The images of spectrogram analysis comparing the real and fake voices seemed
distinguishable by the human eye. Can an image model be trained to detect
fake-voice spectrograms based on pitch and tone choppiness?

~~~
mensetmanusman
The issue is that if you can measure it, you can train an AI to beat the AI
that detects it.

As Pilate said: 'Quid est veritas?' ("What is truth?")

~~~
deepnotderp
Generation is a much harder problem than discrimination though.

------
seesawtron
This makes me wonder how one would go about adding an authentication key to
audio. We have seen encryption for text shared via email and watermarks
embedded in images, but I haven't come across anything similar for audio.
Happy to hear from anyone who has worked in this field.
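For the watermark side, one simple but fragile scheme hides a payload in the
least-significant bits of the PCM samples. The payload string below is a
hypothetical placeholder; a real system would embed a cryptographic signature
computed over the audio content, and would need a watermark that survives
lossy re-encoding:

```python
import numpy as np

def embed(samples, bits):
    """Hide a bit string in the least-significant bits of 16-bit PCM.
    Changes each touched sample by at most 1, which is inaudible."""
    out = samples.copy()
    out[: len(bits)] = (out[: len(bits)] & ~1) | bits
    return out

def extract(samples, n):
    """Read back the first n least-significant bits."""
    return samples[:n] & 1

rng = np.random.default_rng(1)
# One second of fake 16 kHz audio in place of a real recording.
audio = rng.integers(-20000, 20000, size=16000).astype(np.int16)

# Hypothetical payload; a real one would be a signature over the audio.
mark = np.frombuffer(b"signed:alice", dtype=np.uint8)
bits = np.unpackbits(mark).astype(np.int16)

stamped = embed(audio, bits)
recovered = np.packbits(extract(stamped, bits.size).astype(np.uint8))
print(recovered.tobytes())  # b'signed:alice'
```

Any lossy codec or resampling destroys the LSBs, which is exactly why robust
audio watermarking is its own research field.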

~~~
nine_k
> _adding an authentication key to audio_

I'm afraid that it's not going to work, any more than adding an authentication
key to handwriting would.

The voice _itself_ was an authentication key, as was the hand. But now these
keys are too weak and too easy to forge.

I'd say we should completely stop trusting the authenticity of any recorded
voice, and maybe all voice by phone. Trust the content and the style, and have
a set of agreed-upon keywords with your partner and/or your parents to check
that the other side is indeed who it claims to be in an extraordinary
situation. Maybe not today, but tomorrow it may prove useful in thwarting a
scam directed against you, with the scammer perfectly imitating the voice of
your loved one(s).

------
leptoniscool
How long does it take to train the model?

------
inasio
Next milestone I'm waiting for: Trump audio to Sarah Cooper video on the fly

