
Audio Fingerprinting using the AudioContext API - gootel
https://iq.opengenus.org/audio-fingerprinting/
======
randomwalker
Just to clarify, this is not a new finding, but an explainer of a study from
2016.

Note that this fingerprinting technique exploits differences in the behavior
of the AudioContext API, but does not (and cannot) actually record audio.

Paper:
[https://webtransparency.cs.princeton.edu/webcensus/index.htm...](https://webtransparency.cs.princeton.edu/webcensus/index.html)

Demonstration (test your own audio fingerprint):
[https://audiofingerprint.openwpm.com](https://audiofingerprint.openwpm.com)

Discussion from 2016:
[https://news.ycombinator.com/item?id=11729438](https://news.ycombinator.com/item?id=11729438)

Full list of websites where audio fingerprinting scripts were found (in March
2016):
[https://webtransparency.cs.princeton.edu/webcensus/audio_fp_...](https://webtransparency.cs.princeton.edu/webcensus/audio_fp_scripts.html)

Source: I'm an author of the research in question (but unaffiliated with this
blog).

Note to mods: article title is "Audio Fingerprinting using the AudioContext
API". Submitter title is "Sites are using audio (no permissions needed) to
track users", which may violate the site guidelines.

------
Thorrez
Some users on Stack Exchange/Stack Overflow saw that ads were messing with
audio APIs and thought the ads were trying to play audio. It turns out the ads
were actually doing audio fingerprinting.

[https://meta.stackexchange.com/questions/331960/why-is-stack...](https://meta.stackexchange.com/questions/331960/why-is-stack-overflow-trying-to-start-audio)

~~~
OrgNet
do we really need another reason for blocking all ads?

~~~
badrabbit
We can block them, but what is needed is laws. They'll always find a way. I
mean, plenty of ad retargeting can be done using just your IP, and their
hostility has no consequences.

~~~
catalogia
Even if they were legally forbidden from tracking people, ad blocking would
still be necessary. Untargeted ads are only _marginally_ less unethical than
targeted ads. Even untargeted ads exploit people's insecurities to sell them
things they're better off without. (e.g. cola advertisements that depict young
attractive socially active people drinking their product.) That's no less
manipulative on a billboard on the side of the road than it is on your
personalized facebook feed.

~~~
eru
Are you sure you are not proving too much here?

~~~
catalogia
Can you rephrase that question? I did not set out to prove anything, so I'm
not sure what you mean.

~~~
eru
Sorry for the technical jargon, I was referring to
[https://en.wikipedia.org/wiki/Proving_too_much](https://en.wikipedia.org/wiki/Proving_too_much)

Basically, your argument seems fully generalized:

> Even untargeted ads exploit people's insecurities to sell them things
> they're better off without.

Eg I assume you do not want to demonize bakers selling bread to hungry people?
Or more generally, anyone selling any product you do not agree with to someone
who has a need you don't like?

So where do we draw the line, and why?

------
teddyh
For those who might only have read the headline:

“ _This process doesn't require access to the device permissions like
microphone or speakers. No audio is recorded, collected or played by any
means. It gathers the audio signature of a user's device and uses it to create
an identifier to track that user. It simply relies on the difference in the
way these generated signals are processed on each device._”

~~~
ehsankia
I would like to know

1\. How consistent the output is on a given device (is it 100% deterministic
and will always produce the same fingerprint?)

2\. How big is the variance across devices (how good of a fingerprint it is).

The article doesn't really do much at digging into that, which really is the
most important thing if you want to use it as a fingerprint.

I'm not quite sure why different devices would generate different output,
other than differences in floating point computation.

~~~
o-__-o
An AudioContext provides properties of your machine's audio stack to
JavaScript as an API

[https://developer.mozilla.org/en-US/docs/Web/API/AudioContex...](https://developer.mozilla.org/en-US/docs/Web/API/AudioContext)

Everything is different for every version of firmware, hardware, driver
revision, etc. It’s like enumerating fonts: combined with other
fingerprinting techniques, it provides a very unique snapshot of you.

EFF’s Panopticlick provides an audio fingerprinting example. There exist
plugins which will add some entropy to your fingerprint, which then also
changes over time. That helps a little, but any changes you make really make
you stick out like a sore thumb (your audio profile becomes even more unique).

~~~
sriku
I think this is mistaken. The device API supposedly provides this, but the
info you get is always generic due to fingerprinting protection. Also,
AudioContext is expected to be active only when created within a user action
like a button click. Otherwise it doesn't run (implemented in Safari).

This is a pain - on the one hand, the browser vendors and w3c are locking down
API capabilities to prevent fingerprinting and timing based security hacks,
but these are interfering with genuine needs to provide audio functionality.
For example, you can't determine whether the user has 3 audio devices
connected and prompt them to select one for output. So you really can't build
desktop quality audio systems with webaudio and siblings.

~~~
Mirioron
> _For example, you can't determine whether the user has 3 audio devices
> connected and prompt them to select one for output._

Frankly, this really should be handled by the browser. JavaScript really
shouldn't have any business messing with that.

~~~
sriku
That's reasonable for just output selection. But shouldn't you be able to send
audio to all the devices? .. and different audio streams? .. which is all
commonplace in the desktop world.

~~~
Mirioron
Sure, but the browser should handle that logic. Even if the website prompts a
user to select an audio device it should be handled on the browser's side
without feedback to the website.

------
masswerk
Is there sufficient information in this to identify individual machines? E.g.,
consider a family acquiring notebooks of the same type for each family member
and operating them over the same LAN (outward IP address). There may be
differences depending on components picked from a supply pool, but in general,
there isn't much variety in a specific make and model running on the same OS
revision (and the same generation of drivers, according to automatic updates).

Further, is there sufficient isolation in the audio stack so that fingerprints
are independent of the software currently running on the machine? (Another
comment regarding DACs and insufficient isolation from the north bridge and
induced harmonics indicates otherwise.)

I guess, while this may provide some indications, on its own, it provides
insufficient information for identification of a specific hardware device.
(Edit: Which is, of course, bad enough.)

~~~
jacquesm
There doesn't have to be. All it takes is to extract a few more bits. When
combined with all the other bits it isn't all that long before you have 33 of
them, and that is enough to uniquely identify you.
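
As a rough check on the 33-bit figure: it is the smallest number of bits that
can give every person on Earth a distinct value. A minimal Python sketch,
assuming a world population of about 7.7 billion (an illustrative figure, not
one taken from this thread):

```python
import math

# Assumed world population (roughly 7.7 billion); an illustrative figure only.
WORLD_POPULATION = 7_700_000_000

def bits_to_identify(population: int) -> int:
    """Smallest number of bits whose 2**bits distinct values cover the population."""
    return math.ceil(math.log2(population))

print(bits_to_identify(WORLD_POPULATION))  # 33
```

Each independent fingerprinting signal only has to contribute a few of those
bits for the combined total to get there.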

~~~
masswerk
Yes, more like another piece to the puzzle.

------
tinus_hn
Ultimately the only way out of all this tracking is regulation.

~~~
cestith
You have proposed a social solution to a technical problem. Why does
JavaScript offer a way to enumerate things about the client that the server has
no need to know?

~~~
runarberg
I think tracking is pretty much a social problem. The fact that it is
technologically possible is necessary but not sufficient to make the problem
technical.

As a comparison, think about speeding. It is technologically possible because
there are cars that can pass most speed limits. I’m sure we can find a
technological solution that eliminates speeding. But for the most part,
regulations and enforcement are a sufficiently good measure.

~~~
cestith
You can absolutely govern a car's speed electronically.

You can also make JavaScript able to figure out how to play sound without
letting it disclose what it knows about your sound system to a remote machine.

------
rrix2
Is this why Firefox for Android will occasionally have non-dismissable media
control notifications load on web sites which don't have audio? The amount of
times I've had to force-close after opening a NY Times link is infuriating

~~~
daveoc64
No, that's just a bug in Firefox.

------
liveoneggs
is this why I can hear my headphones click when browsing websites?

~~~
o-__-o
No that is because your computer and all parts within must accept
interference. And your motherboard’s $0.07 embedded DAC really wasn’t designed
with isolation in mind. So you “hear” your north bridge bus because as your
processor speed changes, its harmonics induce interference in the DAC.

If this is a laptop, you could disconnect all external power sources (such as
charger, external HD’s etc) and those clicks and noise will be lowered (but
probably still noticeable at high gain). If you purchased a power conditioner
and used a high quality USB3 audio interface (read: isolated DAC) you wouldn’t
hear any clicks.

~~~
liveoneggs
it's a macbook pro with airpods so none of that stuff applies

~~~
o-__-o
You think your brain magically decodes digital signals? No, the DAC in your
AirPods does that. Your AirPods are isolated since they are battery driven, but
I’m sure they’re at the mercy of whatever noise is in your atmosphere.

Also you should lower your audio expectations with AirPods, they are hardly
audiophile quality

~~~
Dylan16807
Wireless interference to headphones isn't anywhere near the 'audible click'
level. Especially when they're supposed to be idling.

~~~
o-__-o
I don't think you wish to hear that your airpods suck because of the price you
paid. However, Apple is well known for taking products and charging a premium
for them. I assume you have this problem on all headphone types (bluetooth or
hardwired) and across multiple USB audio devices? Or is it limited simply to
the airpods which should work noise-free like other headphones of its class
including Sennheiser, Bose, and Dr. Dre Beats.

------
kjlriouee
I don't quite understand how this attack works.

They are constructing a simple audio synthesis graph and rendering it for a
few seconds.

But since all the audio processing happens in the browser, without OS
involvement, how can the fingerprints be different between say 64 bit Firefox
browsers running on 64 bit Windows?

I can understand differences between different browser binaries, since
optimizations can slightly change the output due to floating point order of
operation, but can a particular browser binary generate different
fingerprints?
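
The order-of-operations point can be illustrated without any audio at all. The
sketch below is only a stand-in under stated assumptions: it adds three
made-up sample values in two groupings and hashes the exact bit pattern of
each sum, roughly the way a fingerprinting script might hash a summary of
rendered samples. The `fingerprint` helper is hypothetical and not part of any
browser API.

```python
import hashlib
import struct

def fingerprint(value: float) -> str:
    """Hash the exact bit pattern of a float (hypothetical stand-in for hashing audio output)."""
    return hashlib.sha256(struct.pack("<d", value)).hexdigest()[:16]

# The same three made-up "samples", accumulated with two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)                            # False
print(fingerprint(left_to_right) == fingerprint(right_to_left))  # False
```

A difference in the last bit of a sum is enough to change the hash completely,
which is why builds that reorder floating point work can be told apart.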

~~~
MauranKilom
Example: On Windows, a sin() call usually goes into msvcrt.dll, and can thus
return different results on different machines for the same binary. To name
one possible reason: Different vectorization (due to different SSE support).

Corollary: Don't link to the CRT sin/cos/etc. if you want your binary to give
the same result on different machines.

------
VonGuard
This also happens on TV, and you probably have no idea. There are applications
on iPads and phones that work with ratings firms, like Nielsen and so forth.
These firms work with content producers, and they embed inaudible tones into
TV shows and even advertisements. The phones and iPads pick up this tone and
report back that the user is, indeed, watching said TV show or ad. It's a way
of confirmed tracking of views.

~~~
thenewnewguy
That's a pretty incredible claim, you got a source on that? My searches can't
find anything.

~~~
mikewhy
While not this exact claim, years ago I wrote an app for a TV show to detect
ultrasonic tones embedded in episodes of the show. While not solely for
tracking (it was to sync a clock to the episode you're watching), there were
special tones to trigger ads.

I could easily see this being expanded upon and included in a tracking SDK.

~~~
rimliu
How did you get around the permission to use the microphone restriction? And
if it is in use the status bar turns red on iOS.

~~~
layoutIfNeeded
>How did you get around the permission to use the microphone restriction?

An app can already have recording permissions for legitimate reasons.

>And if it is in use the status bar turns red on iOS.

It only turns red if a background app is recording.

------
neiman
I'm asking out of curiosity, not in order to take a stand. What's the harm of
making fingerprinting illegal? Which good use-cases exist for it?

~~~
pmiller2
I’ve read that financial companies use it for fraud detection. If a known
fingerprinted client tries a bunch of different logins, they can flag that.

That said, I’m not convinced full on fingerprinting is necessary here. I
suspect you could do the same thing using IP addresses and it would work at
least 80% as well.

~~~
mdominguez
I work on fraud prevention for a web company that provides financial services.
My last job was the same thing. On the more technical side, there's a need not
only to understand the business, but also to think about hard problems like these.

An IP address is not a good indicator and wouldn't replace fingerprinting. IPs
may change over time, there may be non-static IP addresses from residential
connections (so, not only data centers) and today, in our mobile world, they
change much more frequently than in the past.

IP is just another marker that can be useful, sometimes. Even the subnet may
be useful. But unfortunately, for fighting fraud we have to rely on techniques
such as a device fingerprinting with the canvas exploit. There's a much
simpler approach, though, but it works only on some occasions: a cookie.

So, you just check if the cookie is present and it matches the previous cookie
from the same user. Done, the device matches and you're good to go (keep in
mind that if someone owns your device and credentials, there's not that much
we can currently do - although the behavioural biometrics proponents would
have you believe otherwise).

But what if there's no cookie because the user logged out or opened their
browser using incognito mode, or just changed browsers? In that case, we would
have a false positive for the user having and using a new device. Which, from
our point of view, highly correlates with fraud. This is industry-wide, from
the fraud prev POV and not just some specific business (like, for example, an
ecommerce website), at least most of people I have spoken with over the years
have mentioned why fingerprinting is really important, and I've seen it first-
hand.

So, we don't sell your data. We're not looking to match you with... whatever
you can come up with in terms of a fingerprinting-data-matching-nightmare. In
most cases, the only people that have the fingerprinted data are from the
fraud prevention team. And we generally hate bad players, both from outside
and inside the company.

What we wanna do (and, again, this is generally) is try to create a better
user experience for our good users. So we may relax some rules if your device
is known. Or we may give you access to some features that other users don't
have (let's say, a beta for a new service that we start offering).

This works by collecting as much data as possible from the device and then
trying to differentiate small changes (let's say, your internal storage free
memory in MBs) from big changes that could in fact mean that the user is using
a new device.

So, for example, we could force you to go to account verification to login to
a new device vs relaxing some rules about login from a good, trusted device
for that user.
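
To make the "small change vs big change" idea concrete, here is a minimal
Python sketch. Everything in it is a hypothetical illustration: the attribute
names, the `VOLATILE` set and the 0.5 threshold are assumptions made for the
example, not a description of any real fraud model.

```python
# Hypothetical attributes that are expected to drift on the same device.
VOLATILE = {"free_storage_mb", "battery_level"}

def looks_like_new_device(known: dict, current: dict, threshold: float = 0.5) -> bool:
    """Return True when enough stable attributes differ from the known snapshot."""
    stable_keys = (known.keys() | current.keys()) - VOLATILE
    if not stable_keys:
        return False
    changed = sum(1 for key in stable_keys if known.get(key) != current.get(key))
    return changed / len(stable_keys) >= threshold

known = {"gpu": "A", "screen": "1920x1080", "audio_hash": "abc", "free_storage_mb": 5120}
same_device = {**known, "free_storage_mb": 4096}  # only a volatile value moved
other_device = {"gpu": "B", "screen": "1366x768", "audio_hash": "xyz", "free_storage_mb": 5120}

print(looks_like_new_device(known, same_device))   # False
print(looks_like_new_device(known, other_device))  # True
```

A real model would weigh many more signals and learn the threshold from data
rather than hard-coding it.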

I'm sure there are exceptions, and that there may be some bad players abusing
their fingerprinting capabilities. But at the same time, I'm pretty sure that
most people are not OK with using that data for other purposes - even the
execs. And even if we did, let's say, track our ads in a way that when you
sign up we get an ID related to a particular ad that we ran - we can see that
although you're a new user and by extension you have a new device, you still
came to our business because we placed an ad. Which we couldn't do another
way, and then the UX suffers because of decisions made to deal with that.

What I'm trying to get across with all of this is: fingerprinting is, in fact,
very useful for fraud prevention, and I would argue that disabling the Canvas
API exploit would affect most, if not all, machine learning models for fraud
prev running on production.

EDIT: and, BTW, most companies that are trying to buy data from other
companies are trying to get user behavior. What your users are doing in your
app, maybe involving their product in some way (i.e. you're Spotify and are
trying to get data from Shazam in order to understand user behavior with
regards to the type of songs they've shazamed in the past). Again, I'm NOT
saying that there may be companies tying data from outside sources that are
iffy at best. And at least the more modern companies I've worked at, they're not
cool with merrily sending data over to another company, even if they pay. It
seems like everyone is starting to understand that their data is as important
as their intellectual property.

~~~
XCSme
What if you just use a remote device and change it each time (eg. get a VPS)?
If someone really doesn't want to get detected, they will always find ways
around it. It's a never-ending race.

~~~
mdominguez
Yeah, they have to setup a completely different environment every time, or
delete all cookies and then change the environment so as to fool the "feature
change / new device" model. This can have different consequences depending on
the company and how they model user behavior. One could be that the user is
treated as a risky user - always having new devices. Another could be that the
user is treated as an outlier and nothing more than that - not risky, not
safe. And then, maybe if the user has a good previous history, you let them do
their thing and see what happens. Maybe you're uncovering new fraudulent
behavior or maybe you have new false positive example.

Nevertheless, the amount of users that go to these lengths to mask themselves
in the general population (i.e., all users of a 50m monthly active users app)
is so minuscule that it's not even a discussion; the opportunity cost is huge vs
just focusing on your 99.998% (number I just came up with, not a real metric)
of users and understanding their behavior and how to model a "good user". New
users have stable device behavior? Well, then that VPS customer is probably
gonna be traced frequently. This is how some banks do things as well (not
fingerprinting, but transaction monitoring in general).

EDIT: as an aside, I think the most important point to understand about how
companies and spaces like the ones I have experience in use fingerprinting is
- it gives you outliers and only works as long as you have a nice mass of good
users. These users are not trying to game you, so they don't tamper with our
fingerprinting. The ones that do tamper are either tech savvy or fraudsters.
But if everyone tampered with it... You see where I'm going with that.

------
dmurray
Archived version:

[https://web.archive.org/web/20191103204659/https://iq.openge...](https://web.archive.org/web/20191103204659/https://iq.opengenus.org/audio-fingerprinting/)

------
rubyn00bie
I was just wondering about this the other day after reading comments on here
about the Web Audio APIs having very high resolution (or accurate is probably
a better word) timing available. Probably just some kind of fingerprint using
them, I was going to go dig around through the MDN to figure out if I could...
but just haven't had the time yet.

------
duxup
I really wish we had physical switches that cut power to mics and cameras on
all devices, along with indicator lights when in use.

~~~
yathern
That's definitely a useful privacy feature - however, unrelated to the
concerns from the article. It doesn't listen in to you or anything - just the
type of API that's exposed can be use as some bits for fingerprinting.

~~~
duxup
Thank you, good point. My mind sort of raced ahead of the article ;)

------
timvisee
Website seems hugged to death by Hackernews, here's a mirror:
[https://web.archive.org/web/20191103204659/https://iq.openge...](https://web.archive.org/web/20191103204659/https://iq.opengenus.org/audio-fingerprinting/)

------
DEADBEEFC0FFEE
I wonder how many folk on HN spend time figuring out how to track users who
don't want to be tracked. Not cool.

~~~
pmiller2
Getting these techniques out in the open is the only way we’re going to be
able to devise means to counter them.

------
NovWorkThrow
As the article fails to load at the current time from my location I suppose
all I can put in is my thought on how many times I've seen my mobile phone
browser asking for microphone permissions on news sites and wonder if some
mobile browsers have wised up to this already and defer to the operating
system?

------
steveharman
Whilst I'd be the first to cheer for the end of ads and targeting, what
monetisation model(s) are planned to replace advertising and fund free email,
messaging, photo sharing and social media platforms?

------
leonlag
If someone is interested in blocking this, the firefox addon CanvasBlocker has
an option to spoof this. Although it's not enabled by default, it doesn't seem
to break anything.

------
fencepost
More and more the only thing that makes sense is blocking all resource loads
not from the primary site or directly affiliated servers, which unfortunately
breaks a lot of things.

~~~
kjlriouee
That wouldn't solve anything.

The moment that gets widespread, what would happen is that the primary site
would just start proxying for the ad networks.

~~~
mdominguez
This already happens. On one of our web fingerprinting solutions we provided
our certificate to the vendor so they could use our domain when we loaded
their resource (the fingerprinting code) on our app running on the user's
browser (sorry - more of a data, slightly backend dude over here, so maybe
networking doesn't even work like I'm describing) and the ad blockers wouldn't
block the script execution - even though the script is NOT an ad

~~~
thefreeman
So you work for a financial institution and you provided your private key to a
third party company so they can impersonate your server to host fingerprinting
code?! That sounds really irresponsible.

~~~
mdominguez
I wasn't working in a financial institution, no. We did provide some services,
but it didn't qualify as a bank. And maybe they weren't exactly impersonating
our server, but yeah, it was fishy.

------
codedokode
You can get better fingerprints from WebGL because it provides information
about your video card. Within a year, I think I saw a site that really needed
WebGL only once. It is an almost useless feature that is suited only for
fingerprinting. You'd better go to your Firefox settings and disable it right
now to prevent tracking.

To prevent tracking using Canvas it would be good if there was a single
drawing library for all browsers or at least they used only internal code and
didn't rely on OS or hardware acceleration.

Regarding the Audio API, it would also be nice if it provided fewer details
about audio hardware or the OS audio stack.

~~~
pmiller2
Disabling WebGL just gives the fingerprinting code a different data point to
fingerprint you with, kind of like “do not track.” I’m not convinced it would
make any difference.

~~~
saagarjha
Turning off WebGL is one bit of entropy, though, while allowing access to the
canvas may allow much more.

~~~
pmiller2
Not necessarily. Because disabling WebGL is probably relatively rare, while
the API itself is fairly ubiquitous, that one bit probably has a high
surprisal value, so it carries more information than it would seem to.
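
For reference, the surprisal of an observation with probability p is
-log2(p) bits. A minimal Python sketch, assuming purely for illustration that
1% of browsers have WebGL disabled (a hypothetical share, not a measured one):

```python
import math

def surprisal_bits(probability: float) -> float:
    """Information content, in bits, of observing an event with this probability."""
    return -math.log2(probability)

# Hypothetical share of browsers with WebGL disabled; not a measured figure.
share_webgl_disabled = 0.01

print(round(surprisal_bits(share_webgl_disabled), 2))      # 6.64
print(round(surprisal_bits(1 - share_webgl_disabled), 3))  # 0.014
```

Under that assumption, being in the rare "disabled" group reveals several bits,
while being in the common group reveals almost nothing.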

~~~
cinquemb
Yeah, much better to spoof readouts to Canvas, Audio, etc. And this would only
be on sites that one would even consider allowing JS to run.

------
davidmurdoch
Brave browser, somewhat annoyingly, as it breaks some sites that expect it to
work "normally", blocks some AudioContext features by default.

------
_trampeltier
As a "non-java-script-programmer" I wonder if there is a full list of what
information a website can get from your device.

------
lookdangerous
The title of this post is misleading. Perhaps better would be Sites are using
Audio Stack fingerprinting to track users.

~~~
pmiller2
Agreed. I had hoped this would talk about techniques that Shazam and similar
apps use.

------
maremp
AudioContext fingerprinting is not new. Libraries like Fingerprintjs use it,
among other device fingerprinting techniques. It's in there since 10 Jun 2018
([https://github.com/Valve/fingerprintjs2/commit/caf14daa9e2de...](https://github.com/Valve/fingerprintjs2/commit/caf14daa9e2de105a8a55ae6d9c80667856f4afd)).
Combined with other methods, you can get very good accuracy.

If there is an OSS lib to do it, you can bet the adtech companies have been
doing it even longer.

Also, IANAL but I think it's legal under GDPR, Recital 29:

> In order to create incentives to apply pseudonymisation when processing
> personal data, measures of pseudonymisation should, whilst allowing general
> analysis, be possible within the same controller when that controller has
> taken technical and organisational measures necessary to ensure, for the
> processing concerned, that this Regulation is implemented, and that
> additional information for attributing the personal data to a specific data
> subject is kept separately. The controller processing the personal data
> should indicate the authorised persons within the same controller.

The fingerprint value is not PI because it can't identify one specific person,
but only a device at best (if accurate enough between runs). With enough smart
people and good incentive (=adtech), I bet this can be abused to identify the
person.

------
eyeball
Is this why “safari audio” is the primary battery drain on my iPhone?

Goddamnit

