
DolphinAttack: Inaudible Voice Commands [pdf] - ColanR
https://endchan.xyz/.media/50cf379143925a3926298f881d3c19ab-applicationpdf.pdf
======
anfractuosity
I guess you could also use
[https://en.wikipedia.org/wiki/Sound_from_ultrasound](https://en.wikipedia.org/wiki/Sound_from_ultrasound),
which would mean that unless you're in its line of sight you shouldn't hear it.

I'm specifically thinking of the parametric array type.

I'd love to see how directional that is sometime, as there are cheapish
implementations of hardware for it.

Edit: Originally I thought a human might be able to hear their sound, but
reading a bit more I don't think that's the case, since it's exploiting the
non-linearity of the microphone.

Also it seems ultrasound can be used to affect other MEMS devices:

[https://arstechnica.co.uk/gadgets/2017/07/sounds-bad-
researc...](https://arstechnica.co.uk/gadgets/2017/07/sounds-bad-researchers-
demonstrate-sonic-gun-threat-against-smart-devices/)

I wonder if there's any acoustic metamaterial that you could place over the
mic etc. to filter ultrasound out before it reaches it.

~~~
Animats
Those sound-from-ultrasound systems have been around for a while. The
principle is similar - both exploit nonlinearities in the receiving medium.
The sound-from-ultrasound people seem to have finally brought up the audio
quality to an acceptable level for music. The nonlinear process by which two
ultrasonic signals combine in air to produce an audible signal introduces
distortion. So the signal-generating software has to compensate somehow.

(Why is backspace so slow in this text box? Firefox 54, Ubuntu 16.04.)

------
sjbase
Seems easy to defeat with frequency bounds checks.

But presumably there exist adversarial sounds that to a human seem like music,
or gibberish, or a different phrase, but sound like "pay joe 1000 dollars" to
the machine.

The sonic equivalent to these:
[https://arxiv.org/abs/1703.08603](https://arxiv.org/abs/1703.08603)

~~~
TeMPOraL
Not sure what you mean by "frequency bounds check", but if you mean a bandpass
filter, then it is not going to save you. The attack, as I understand the
paper, works by having the AM-encoded ultrasound demodulate to the audible
frequencies _at the microphone_, before frequency filtering and ADC. This
would imply you could use this method to inject voice even to simple recording
software (or even analog hardware!) - it's not limited to machine-learning
voice recognition systems!

The paper describes possible ways to mitigate the attack, though. One of them
is to _let the whole signal pass_, and then try to detect ultrasonic AM in
software, demodulate it and subtract from the audible frequencies. This is
pretty much correcting for the physical structure of the microphone in
software.
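
To make the demodulation-at-the-microphone point concrete, here is a minimal sketch. It assumes the microphone can be modelled as a linear response plus a small quadratic term; the carrier frequency, the coefficient `a`, and the single 1 kHz tone standing in for a voice command are all illustrative assumptions, not values taken from the paper.

```python
import cmath
import math

FS = 192_000        # sample rate in Hz, high enough to represent the ultrasound
F_CARRIER = 30_000  # ultrasonic carrier in Hz (assumed value)
F_VOICE = 1_000     # stand-in for the voice command: a single audible tone
N = 1_920           # 10 ms of samples, so every tone fits a whole number of cycles


def am_ultrasound(n):
    """Sample n of the carrier, amplitude-modulated by the 'voice' tone."""
    t = n / FS
    voice = 0.5 * math.cos(2 * math.pi * F_VOICE * t)
    return (1.0 + voice) * math.cos(2 * math.pi * F_CARRIER * t)


def microphone(x, a=0.1):
    """Toy microphone: linear response plus a small quadratic term (assumed model)."""
    return x + a * x * x


def tone_amplitude(samples, freq):
    """Amplitude of the component at freq, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


transmitted = [am_ultrasound(n) for n in range(N)]
recorded = [microphone(x) for x in transmitted]

print("1 kHz in the air:      ", round(tone_amplitude(transmitted, F_VOICE), 6))
print("1 kHz after microphone:", round(tone_amplitude(recorded, F_VOICE), 6))
```

In this toy model the transmitted signal has essentially no energy at 1 kHz (it sits at 29, 30 and 31 kHz), while the recorded signal has a 1 kHz component of about `a * 0.5`. Squaring the modulated carrier is what recreates the audible tone, so a filter placed after the microphone has nothing ultrasonic left to remove.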

~~~
sjbase
Wow, yeah you're right. I gave the feasibility analysis a more thorough read
and it seems like they're exploiting the amplifier inside the mic itself to
down-convert the frequency on portions of the signal. I don't fully understand
what nonlinearity is being taken advantage of, but as an acoustics layman
this seems pretty impressive!

~~~
AstralStorm
Which means they're not using the membrane but the crummy amplifier
electronics. Fixed with one or two capacitors.

Alternatively they're using almost deadly signal levels to overdrive a
microphone at a distance.

------
o_____________o
Demonstration on Youtube:

[https://www.youtube.com/watch?v=21HjF4A3WE4](https://www.youtube.com/watch?v=21HjF4A3WE4)

~~~
rayuela
Found a similar system off a recommended video from that one.

[https://www.youtube.com/watch?v=wF-
DuVkQNQQ](https://www.youtube.com/watch?v=wF-DuVkQNQQ)

Accompanying paper:
[https://arxiv.org/pdf/1708.07238.pdf](https://arxiv.org/pdf/1708.07238.pdf)

This stuff is pretty damn cool!

------
Upvoter33
Thus begins the wave of security attacks on machine learning. Buckle up -
we're going to see a lot of papers like this in the coming months and years.

~~~
cool_username
I agree that this and the 'adversarial road signs' work indicate a growing
interest in security attacks on ML.

[https://medium.com/self-driving-cars/adversarial-traffic-
sig...](https://medium.com/self-driving-cars/adversarial-traffic-signs-
fd16b7171906)

However, in this case, it's not just machine learning. It will affect non-ML
audio-processing systems just as much.

------
sailfast
Can this level of signal be broadcast over TVs? Could this be embedded in an
advertisement that triggers a bunch of devices to read from somewhere?

I too, eagerly await the pre-processing software that blocks all inaudible
frequencies. Not sure why this was not done in the first place.

~~~
TeMPOraL
> _I too, eagerly await the pre-processing software that blocks all inaudible
> frequencies. Not sure why this was not done in the first place._

This won't save you. By the time the signal leaves the microphone and reaches
the ADC, it's already in the audible range.

------
nayuki
This is somewhat related to the recent discussion against the distribution of
music at a 192 kHz sample rate, which would allow inaudible ultrasonic
content:
[https://news.ycombinator.com/item?id=15127633](https://news.ycombinator.com/item?id=15127633)

------
rfreytag
"Hidden Voice Commands" prior art (2016): [https://www.georgetown.edu/hidden-
voice-commands-sherr](https://www.georgetown.edu/hidden-voice-commands-sherr)

~~~
burkaman
[https://www.georgetown.edu/sites/www/files/Hidden%20Voice%20...](https://www.georgetown.edu/sites/www/files/Hidden%20Voice%20Commands%20full%20paper.pdf)

This is a little different, it's about audible commands that don't sound like
speech to a human.

------
acd
Clever attack! Then the phone's AI voice command system must start to verify
that the speaker's voice is the original speaker's via user fingerprinting. Then
the attacker will counterattack with a new system that emulates the
original speaker's voice and speech rate via deep learning. You will then, as a
countermeasure, have to press a hardware button to use voice commands. Seems
like microphone proximity sensing and frequency filtering are easy first
countermeasures, as already mentioned in this thread.

------
bsenftner
This is like the altered eyeglass frames that make some facial recognition
systems misidentify people. Carefully constructed 'inaudible' could also mean
'transparent to our perception' for this class of exploit. I could envision an
exploit that monitors the ambient noise in a location and mimics an A/C
starting up, a vacuum running, a blender, or whatever else could be used as
cover for almost anything.

------
powrtoch
I'm surprised that this works. I always assumed Siri etc. would do some bare-
minimum pre-processing of the audio input, if only to reduce noise, of which
the simplest kind would be cutting frequencies that cannot be produced or
registered by humans. Any insight into why this isn't already the case?

~~~
virgil_disgr4ce
This was my question. Seems bizarre to not bandpass everything as the first
stage!

~~~
TeMPOraL
From skimming the paper it seems they're making the signal demodulate itself
_directly on the microphone_ - by the time it hits a low-pass filter and ADC,
the audible frequencies are already injected.

------
_jal
Eagerly awaiting audio firewalls for our phones.

[https://en.wikipedia.org/wiki/Acoustic_coupler#/media/File:A...](https://en.wikipedia.org/wiki/Acoustic_coupler#/media/File:Acoustic_coupler_20041015_175456_1.jpg)

------
greesil
If they are not using a strong enough anti-aliasing filter, the emitted audio
can appear in-band, looking just like normal speech.
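
A rough sketch of the aliasing arithmetic behind this, separate from the microphone-nonlinearity mechanism discussed elsewhere in the thread. It assumes an ideal sampler with no anti-aliasing filter at all; the 44.1 kHz sample rate and the test frequencies are illustrative assumptions.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone at f_hz appears after unfiltered sampling at fs_hz."""
    r = f_hz % fs_hz           # sampling cannot tell f apart from f plus a multiple of fs
    return min(r, fs_hz - r)   # fold the result into the 0..fs/2 band


for f in (1_000, 25_000, 30_000, 40_000):
    print(f, "Hz ->", alias_frequency(f, 44_100), "Hz when sampled at 44.1 kHz")
```

Under these assumptions a 40 kHz tone folds down to 4.1 kHz, well inside the speech band, while a 1 kHz tone is unchanged. A real device has some filtering before the ADC, so how much ultrasound actually folds in depends on how steep that filter is.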

------
yborg
Inaudible to most human beings. Your dog won't like this.

~~~
johnhenry
Just because a dog can hear something doesn't mean that it will be bothered by
it.

~~~
gnicholas
True. But isn't this basically like a dog whistle (which is at high
frequencies and is used as a deterrent)? If this were super low frequencies,
I'd be less likely to make the leap from [inaudible to humans but audible to
dogs] -> [annoying to dogs]. But if we already use high frequencies to
annoy/discourage/train dogs, isn't that evidence of what dogs think of these
frequencies?

~~~
johnhenry
It's noise from the whistle itself that drives dogs crazy, not its pitch. The
reason this isn't basically a dog whistle is that a normal whistle produces
a bunch of random annoying noise. If you continually blow a normal whistle,
most things that can hear it will be annoyed. So, a dog whistle is simply a
normal annoying whistle that humans can't hear. This, on the other hand, isn't
a bunch of random annoying noise. If humans could hear it, it would just sound
like someone speaking. There's no reason to believe that dogs would be
especially annoyed.

~~~
logfromblammo
I'm pretty annoyed by Munchkins, and enraged by Alvin and the Chipmunks, so
maybe there is a reason.

