Note that this fingerprinting technique exploits differences in the behavior of the AudioContext API, but does not (and cannot) actually record audio.
Demonstration (test your own audio fingerprint): https://audiofingerprint.openwpm.com
Discussion from 2016: https://news.ycombinator.com/item?id=11729438
Full list of websites where audio fingerprinting scripts were found (in March 2016): https://webtransparency.cs.princeton.edu/webcensus/audio_fp_...
Source: I'm an author of the research in question (but unaffiliated with this blog).
Note to mods: article title is "Audio Fingerprinting using the AudioContext API". Submitter title is "Sites are using audio (no permissions needed) to track users", which may violate the site guidelines.
It is just so obviously wrong to treat your users like this that the only way to cope with it is literally to adopt a mindset that devalues the users (“If they are stupid enough, they deserve it”) or to downplay the impact that asymmetric information practices have on society (post-privacy: “You have nothing to hide anyway”).
What it all boils down to is that you make someone consent to something they don’t understand, and that they might not consent to if you put it to them in clear, understandable terms. Add nudging and other dark patterns on top and you are going into evil territory.
But no one wants to be evil, so they try to paint a utopia that fits their behaviour, rather than making their behaviour fit a utopia. This is a problem because, as the Russian writer Fyodor Dostoevsky puts it:
“Above all, don't lie to yourself. The man who lies to himself and listens to his own lie comes to a point that he cannot distinguish the truth within him, or around him, and so loses all respect for himself and for others. And having no respect he ceases to love.”
Basically, your argument seems fully generalized:
> Even untargetted ads exploit peoples insecurities to sell them things they're better off without.
E.g., I assume you don't want to demonize bakers selling bread to hungry people? Or, more generally, anyone selling a product you don't agree with to someone who has a need you don't like?
So where do we draw the line, and why?
I'm not saying that laws shouldn't exist, though. I just believe that browsers that are actively hostile to tracking are more effective than laws.
As a contrived example: if you serve an ad for a niche product instead of the targeted ad and it still gets tons of clicks, that's a pretty strong signal that the incoming data is being gamed in some way.
It's called click fraud, and it's pretty heavily researched.
“This process doesn't require access to the device permissions like microphone or speakers. No audio is recorded, collected or played by any means. It gathers the audio signature of a user's device and uses it to create an identifier to track that user. It simply relies on the difference in the way these generated signals are processed on each device.”
1. How consistent the output is on a given device (is it 100% deterministic, always producing the same fingerprint?)
2. How large the variance is across devices (i.e., how good a fingerprint it is).
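Both properties can be measured with a few lines of code. The sketch below assumes you already have several fingerprint values collected per device (the `runs_by_device` mapping and its values are hypothetical). Stability is taken as the share of runs matching a device's most common value, and distinguishing power is estimated as the Shannon entropy, in bits, of the fingerprint distribution across devices.

```python
from collections import Counter
from math import log2


def stability(runs: list[str]) -> float:
    """Fraction of repeated runs on one device that yield its most common fingerprint."""
    most_common_count = Counter(runs).most_common(1)[0][1]
    return most_common_count / len(runs)


def entropy_bits(fingerprints: list[str]) -> float:
    """Shannon entropy (in bits) of the fingerprint values observed across devices."""
    counts = Counter(fingerprints)
    total = len(fingerprints)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical measurements: several runs of the fingerprinting script per device.
runs_by_device = {
    "device-a": ["f1", "f1", "f1", "f1"],
    "device-b": ["f2", "f2", "f3", "f2"],  # one run came out different
    "device-c": ["f1", "f1", "f1", "f1"],  # collides with device-a
    "device-d": ["f4", "f4", "f4", "f4"],
}

for device, runs in runs_by_device.items():
    print(device, stability(runs))

# Use one representative (modal) value per device for the cross-device measure.
modal = [Counter(runs).most_common(1)[0][0] for runs in runs_by_device.values()]
print(f"{entropy_bits(modal):.2f} bits")
```

Higher entropy means the fingerprint splits devices into more, smaller groups; the collision between device-a and device-c is exactly what keeps it below the 2-bit maximum for four devices.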
The article doesn't really dig into that, which is really the most important thing if you want to use it as a fingerprint.
I'm not quite sure why different devices would generate different output, other than differences in floating point computation.
Everything differs with every version of firmware, hardware, driver revision, etc. It’s like enumerating fonts: combined with other fingerprinting techniques, it provides a very unique snapshot of you.
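To illustrate how such signals compound, here is a minimal sketch that hashes several attributes into one identifier. The attribute names and values are hypothetical stand-ins (the `audio` number is made up, not a real AudioContext result); real scripts collect many more signals. Because the hash covers every attribute, changing any single one yields a different identifier.

```python
import hashlib
import json


def composite_fingerprint(signals: dict[str, object]) -> str:
    """Hash a set of browser/device signals into a single identifier."""
    # Canonical JSON (sorted keys, fixed separators) so identical signals
    # always serialize to identical bytes, regardless of dict insertion order.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical signals gathered by a fingerprinting script.
signals = {
    "audio": 35.12345,  # made-up stand-in for an audio-derived value
    "fonts": ["Arial", "DejaVu Sans", "Noto Color Emoji"],
    "timezone": "Europe/Berlin",
    "screen": [1920, 1080],
}

print(composite_fingerprint(signals)[:16])
```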
EFF’s Panopticlick provides an audio fingerprinting example. There are plugins that add some entropy to your fingerprint and change it over time. That helps a little, but any change you make really makes you stick out like a sore thumb (your audio profile becomes even more unique).
As a major side benefit, websites will stop randomly playing audio without permission.
In other words, make it more declarative and less verbose.
The browser could always increase the apparent audio latency by buffering, but that reduces the ability of music apps to perform well.
Move responsibility for the syncing to the browser and then the app doesn't need to know anything. In short, I can put it together and send it to you, or I can send you the bits and tell you how they should be put together.
A graphics frame appears on the screen at a specific time. (For VR, it is a definite time, and this is critical. For normal video or games, a little bit of slop, maybe a few ms, is probably okay.)
For audio, humans are sensitive to 10 ms deviations or even less.
Any API that works decently will need to synchronize audio and video, so there needs to be a way for a program to say “this audio sample should play at the same time as this video frame is shown”. But an API should also let programs react as quickly as possible to user input. And Bluetooth headphones, in particular, have very, very high latency.
So designing an API that performs well without revealing the latency is hard.
I do think it would be good to cleanly separate normal web pages and games, though. For pure content, none of this matters except that video needs to maintain synchronization. But normal content does not need clicks to translate quickly to video changes.
The browser is the presentation layer, it needs to know the latency of your headphones (or the system does). Why does the content provider need it? What's wrong with "here is frame A, please play audio A at the same time (while taking into account the latency that only you know about)" as a request?
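A rough sketch of that division of labor, with hypothetical function names: the page only says "play this audio with that frame", and the browser, which alone knows the output latency, works out when to submit the audio. The buffer helper just shows the arithmetic behind latency from buffering (frames divided by sample rate); the 200 ms Bluetooth figure is an assumption.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Latency added by one audio buffer: frames / sample rate, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz


def audio_submit_time_ms(frame_present_ms: float, output_latency_ms: float) -> float:
    """Browser-side: when to hand audio to the device so it is heard as the frame appears.

    Only the browser knows output_latency_ms (e.g. Bluetooth headphones); the page
    supplies just frame_present_ms, so the latency is never exposed to it.
    """
    return frame_present_ms - output_latency_ms


# One 512-frame buffer at 48 kHz is already about 10.7 ms.
print(buffer_latency_ms(512, 48_000))
# Frame shown at t = 1000 ms, hypothetical 200 ms Bluetooth output latency.
print(audio_submit_time_ms(1000.0, 200.0))
```

The design choice is that the latency value stays inside the browser; the page never receives it, so it cannot be read back as a fingerprinting signal.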
That's OK. Then I have the option to weigh whether the risk is worth it or not. Nine times out of ten it is not; in rare cases I will allow the JS to execute.
How do you explain to one of these "I'm not good with computers haha" types what a CDN is, and why the Taboola CDN shouldn't be allowed but Akamai should be, because otherwise things Won't Work Right? Even if they're capable of learning, why should they care, when all these security measures actively make their life harder?
Blocking audio and other fingerprintable surfaces by default, with a "click here to enable audio / video" and a "remember my choice on this site in future" is the only way it can possibly work, because 99% of users will only ever go for the laziest option. We need to have protections for them that work regardless of (or in spite of) their skill level.
But hey some of us still run around with MS-DOS 5.22 on that new Spectre-ridden, rowhammer-ready i7 because DOS/4GW is hella stable.
True, if we stopped thinking in 1996. I don't know how to tell you this, but over the last 20 years browsers have become application platforms, not document delivery vehicles.
This is a pain. On the one hand, the browser vendors and the W3C are locking down API capabilities to prevent fingerprinting and timing-based security hacks; on the other, these restrictions interfere with genuine needs to provide audio functionality. For example, you can't determine whether the user has three audio devices connected and prompt them to select one for output. So you really can't build desktop-quality audio systems with Web Audio and its siblings.
I really hate the user-action restriction as a security measure, because it's ridiculously easy to bypass, particularly on Chrome. It's privacy theater: annoying for legitimate developers, but easy for malicious actors to get around.
Putting this stuff behind a permission or sandboxing it somehow would be better for both end-users and developers. As it stands, we just get fewer webaudio applications online from legitimate developers, and they all move to native platforms where it's even easier to fingerprint users. And the malicious sites that don't move to native just fingerprint anyway because we're using bad, opaque metrics for consent.
will give you that list, allowing you to have a menu for a user to choose one and then the client code must use setSinkId to assign the chosen device id. https://developer.mozilla.org/en-US/docs
I never knew the device ids were randomized like that, thanks for the info.
Nor should anybody even be trying. We're better off without any of this sort of horseshit.
Further, is there sufficient isolation in the audio stack so that fingerprints are independent of the software currently running on the machine? (Another comment regarding DACs and insufficient isolation from the north bridge and induced harmonics indicates otherwise.)
I guess, while this may provide some indications, on its own, it provides insufficient information for identification of a specific hardware device. (Edit: Which is, of course, bad enough.)
For example if there is an exploit to read your local files and upload them to the server, that would clearly be illegal.
What if you just got the names of the files?
What if you just got the count, e.g. 33413 and stored it against a name?
What about the count, but just used as a semi-anonymous tracking token?
What about system information, as a tracking token?
What about audio information specifically, as a tracking system? (Like this appears to be. Didn't read the article because it won't load for me).
As a comparison, think about speeding. It is technologically possible because there are cars that can exceed most speed limits. I’m sure we could find a technological solution that eliminates speeding. But for the most part, regulation and enforcement are a sufficiently good measure.
Both social and technical solutions are needed. Firefox has put a huge effort into preventing tracking, but many useful features can also be used for tracking, so a browser cannot tell useful code from tracking code.
The site shouldn't be able to do anything without the user's permission.
I want a browser that can get the task done with the least friction possible and a legal system that prevents abuse. The website should be the one to inform me that they are actually using the audio api for tracking and not just playing audio. With a large penalty for lying.
If we were talking about the issue of people being stabbed you don't suggest "Why should the stabber be penalized? The victim should have been wearing a Kevlar vest that doesn't allow them to be stabbed. If they weren't then they should simply expect that people will stab them."
The website can't do anything without your browser cooperating. The website can request the audio API as much as it wants, but if your browser had a sane configuration, it would just ignore the request.
You're asking for legislation on a problem with a global scope. Countries that ignore your laws will still do all of this as long as your browser lets them.
>If we were talking about the issue of people being stabbed you don't suggest "Why should the stabber be penalized? The victim should have been wearing a Kevlar vest that doesn't allow them to be stabbed.
This is such a poor analogy. The server doesn't do anything without a client asking for something. In your case the victim went to the stabber and started the interaction. Then the victim's electric scooter (browser) malfunctioned or was poorly configured and drove into the stabber's knife.
Of course the server isn't blameless, because they are most likely knowingly exploiting this. But to suggest that the user has no control over this is foolish. It implies that the user can't be blamed for anything their computer does.
We need an automated approach. We need the ability to not only disable browser APIs but also make them return fake data to fool the website. We need the ability to inspect the data websites are sending back and redact data we don't want them to have just like corporations perform deep packet inspection to prevent data exfiltration.
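A minimal sketch of the "return fake data" idea, similar in spirit to the per-site randomization some privacy-focused browsers apply: perturb rendered audio samples with tiny noise seeded by a per-session secret plus the site's origin. The same site then sees a consistent value within a session, while different sites and sessions get values that don't match. The function name, the noise magnitude, and the seeding scheme are all assumptions for illustration.

```python
import hashlib
import random


def farble_samples(samples: list[float], session_secret: bytes, origin: str,
                   magnitude: float = 1e-7) -> list[float]:
    """Add tiny, deterministic per-(session, origin) noise to audio samples (hypothetical)."""
    # Derive the seed from the secret and the origin so each site gets its own,
    # but repeatable, perturbation for as long as the session secret lives.
    seed = hashlib.sha256(session_secret + origin.encode("utf-8")).digest()
    rng = random.Random(seed)
    return [s + rng.uniform(-magnitude, magnitude) for s in samples]


samples = [0.0, 0.25, -0.5, 0.125]
secret = b"per-session-random-secret"  # would be freshly random per session
site_a_1 = farble_samples(samples, secret, "https://example.com")
site_a_2 = farble_samples(samples, secret, "https://example.com")
site_b = farble_samples(samples, secret, "https://tracker.example")
print(site_a_1 == site_a_2, site_a_1 == site_b)
```

Seeding per origin (rather than drawing fresh noise on every call) matters: fresh noise on every read could be averaged away by repeating the measurement, and it is itself a detectable anomaly.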
> If we were talking about the issue of people being stabbed
In this case we handed them the knife. We thought they'd use it to cut something for us and they stabbed us instead. They even made the ability to stab us part of their terms of service. This cannot go on.
Browsers have been trying this for a long time, but it's not simple. Chrome has made multiple attempts to make incognito mode undetectable, but people keep finding ways. The old way was to check whether the API for storing data was available, so Chrome added an emulated data store for incognito mode, and then websites just checked how fast the local data store was to tell whether it was on disk or in memory.
It would be an extraordinary effort to remove all of these issues, since many of them are outside the browser: which exact GPU and driver are in use, for example, can slightly change how the page is rendered or how long certain things take.
I didn't say it was gonna be easy. I said it was necessary. This is a perpetual arms race. If content blockers can maintain their huge blocklists, browsers should also be able to maintain their own counter-measures to abusive websites. Browser vendors created and standardized the APIs that websites are now exploiting. I expect them to clean it up.
> then websites just check how fast the local data store is to check if it is on disk or in memory.
Making both calls equally slow should prevent this particular side channel attack.
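One way to read "make both calls equally slow" is to pad every storage operation to a fixed minimum duration, so a fast in-memory store and a slower on-disk store look the same from outside. A minimal sketch; the wrapper name and the 50 ms floor are hypothetical, and a real defense would also have to deal with jitter and with operations that take longer than the floor.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_time_floor(op: Callable[[], T], floor_s: float) -> T:
    """Run op(), then sleep so the total elapsed time is at least floor_s seconds."""
    start = time.monotonic()
    result = op()
    remaining = floor_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result


memory_store: dict[str, str] = {}


def fast_write() -> str:
    """Stands in for a write to an in-memory (incognito) data store."""
    memory_store["key"] = "value"
    return "ok"


t0 = time.monotonic()
status = with_time_floor(fast_write, 0.05)
print(status, f"{time.monotonic() - t0:.3f} s")
```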
This is unacceptable under the GDPR. You must allow users to opt out. If this affects your business model, then the legal option is to remove the free plan and charge a fee for the service.
This must be some new use of the word "simple" I was previously unaware of.
I don’t know if security/street cams actually record anything for more than a day, and I don’t know anyone with a home security camera, but friends are always there!
I don't see how you solve this with regulation without breaking the internet. As far as I can tell, the solution ultimately is to consider privacy first as a user and a developer. Browsers and their users need to start providing less information to websites by default and prompt the user to share information.
If you try doing this through regulation then you end up with GDPR pop ups. What we need is for the browsers (and OS!) to stop leaking all of our information.
For me, when my iPhone is connected to my car's Bluetooth, the kind of ad/tracking-infested websites make it beep like crazy (the same sound as turning the volume up and down); closing the tab stops the beeping. I assumed it was some ad or video glitching on autoplay, but it happens even without a visible media player...
If this is a laptop, you could disconnect all external power sources (such as the charger, external HDDs, etc.), and those clicks and that noise will be reduced (but probably still noticeable at high gain). If you bought a power conditioner and used a high-quality USB 3 audio interface (read: an isolated DAC), you wouldn’t hear any clicks.
Also you should lower your audio expectations with AirPods, they are hardly audiophile quality
They are constructing a simple audio synthesis graph and rendering it for a few seconds.
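For intuition, here is a pure-Python stand-in for that kind of pipeline: generate a triangle wave, pass it through a crude compressor-like gain curve, and reduce a window of the output to a single number. The waveform, the gain curve, the 10 kHz frequency, and the sample window are simplified assumptions, not the Web Audio node implementations. In a browser, the identifying part is that the built-in nodes' floating-point results can differ slightly across engines and platforms, which this sketch cannot reproduce on its own.

```python
import math


def triangle(freq_hz: float, sample_rate: int, n: int) -> list[float]:
    """Naive triangle wave in [-1, 1]."""
    out = []
    for i in range(n):
        phase = (i * freq_hz / sample_rate) % 1.0
        out.append(4.0 * abs(phase - 0.5) - 1.0)
    return out


def soft_compress(x: float, threshold: float = 0.3, ratio: float = 12.0) -> float:
    """Crude static compressor: attenuate the part of |x| above the threshold."""
    mag = abs(x)
    if mag <= threshold:
        return x
    return math.copysign(threshold + (mag - threshold) / ratio, x)


def render_fingerprint(sample_rate: int = 44_100, freq_hz: float = 10_000.0) -> float:
    """Render one second, compress it, and sum |samples| over a fixed window."""
    rendered = [soft_compress(s) for s in triangle(freq_hz, sample_rate, sample_rate)]
    return sum(abs(s) for s in rendered[4500:5000])


print(render_fingerprint())
```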
But since all the audio processing happens in the browser, without OS involvement, how can the fingerprints be different between say 64 bit Firefox browsers running on 64 bit Windows?
I can understand differences between different browser binaries, since optimizations can slightly change the output due to floating point order of operation, but can a particular browser binary generate different fingerprints?
Corollary: Don't link to the CRT sin/cos/etc. if you want your binary to give the same result on different machines.
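The order-of-operations effect is easy to see even on a single machine: floating-point addition is not associative, so a compiler or library that regroups a sum (for example, when vectorizing) can change the last bits of the result. A minimal illustration:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# With large magnitudes the difference is not just in the last bit:
big = 1e16
print((big + 1.0) - big)  # 0.0, because 1.0 is lost when added to 1e16
print((big - big) + 1.0)  # 1.0
```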
I could easily see this being expanded upon and included in a tracking SDK.
An app can already have recording permissions for legitimate reasons.
>And if it is in use the status bar turns red on iOS.
It only turns red if a background app is recording.
The app asked for permission as usual.
Not Nielsen related per se, but this technique has been circulating for at least a few years. That's a 2015 article.
I would imagine to do such tracking, you'd need to have an app open which already had those permissions for some reason, such as Skype.
PilferShush is an experimental F-Droid app that is supposed to prevent this kind of tracking by blocking your phone's microphone and jamming the audio frequencies used for tracking. F-Droid page:
Edit: Wikipedia says there are 234 android apps that use ultrasound audio tracking, but the article it references is from 2017 and behind a paywall...
I couldn't find much new information on Ultrasound Cross-Device Tracking (uXDT). This could be good, meaning that it's not being widely used, or it could be bad and mean that simply not much is publicly known about the extent to which it is used. (or my google-fu is weak)
To clarify my post from earlier: the "list" only has the SDKs used for uXDT, not the apps that make use of them.
This is the list of SDKs possibly used for uXDT:
acrcloud, actv8, alphonso, axwave, beatgridmedia, bitsound (soundlly), chirp, cifrasoft, copsonic, cueaudio, digimarc, dv (dov-e), fidzup, fluzo, gracenote, hotstar(zapr), hound, inscape, instreamatic (VIA), lisnr, moodmedia, mufin, prontoly (now sonarax), redbricklane(zapr), shopkick, signal360, silverpush, sonarax, soniccode, sonicnotify (now Signal360), soundpays, tonetag, trillbit, zapr
I got it from PilferShush's project page which is an experimental F-droid app researching uXDT. It's worth a look, but a bit messy:
Wikipedia says there are 234 android apps that use ultrasound audio tracking.
If you google around, you will find this number repeated in a lot of articles. It's an old number from a research paper from 2017:
Here's a good summary of the research:
Only a few specific apps were mentioned in the research paper, which is probably just the tip of the iceberg. The researchers looked for SilverPush in an 8 TB dataset of 1,320,822 apps submitted to VirusTotal.
These were the only apps that were mentioned:
(Apps with 1,000,000 – 5,000,000 downloads)
100000+ SMS Messages, developed by Moziberg
Pinoy Henyo, developed by Jayson Tamayo
(100,000 – 500,000 downloads)
McDo Philippines, developed by Golden Arches Dev. Corp.
Krispy Kreme Philippines, developed by Mobext
(50,000 – 100,000)
Civil Service Reviewer Free, developed by Jayson Tamayo
Here's a list of apps from 2015 (mostly from India and the Philippines) that use SilverPush:
And here's a list from 2016 of apps containing Signal360:
It looks like you can use this Addon Detector to find at least a couple of the uXDT apps:
The Addon Detector only seems to be able to find three uXDT app SDKs from the whole bunch mentioned by PilferShush:
lisnr, signal360 and silverpush
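The idea behind such detectors can be sketched as a signature match: look for known SDK names among an app's package or class names. The SDK names below are a subset of the PilferShush list above; the matching rule (an exact, case-insensitive match on dotted name segments) and the sample class names, including `com.silverpush.sdk.AudioListener`, are hypothetical. A real detector would need the actual package prefixes each SDK ships under, which aren't given here.

```python
# Subset of SDK names from the list above; real package prefixes may differ.
UXDT_SDK_NAMES = ["lisnr", "signal360", "silverpush", "chirp", "zapr", "alphonso"]


def find_uxdt_sdks(class_names: list[str]) -> set[str]:
    """Return SDK names that appear as a dotted segment of any class/package name."""
    found = set()
    for name in class_names:
        segments = name.lower().split(".")
        for sdk in UXDT_SDK_NAMES:
            if sdk in segments:
                found.add(sdk)
    return found


# Hypothetical class names extracted from an app package.
classes = [
    "com.example.app.MainActivity",
    "com.silverpush.sdk.AudioListener",
    "androidx.core.app.NotificationCompat",
]
print(find_uxdt_sdks(classes))  # {'silverpush'}
```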
SoniControl is also an interesting project worth looking at.
Its goal is to detect acoustic tracking information.
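Detecting near-ultrasonic beacons comes down to measuring energy in a narrow high-frequency band. A minimal sketch using the Goertzel algorithm, which computes the power of a single DFT bin; the 18.5 kHz beacon frequency, the 44.1 kHz sample rate, the 100 ms window, and the synthetic signals are assumptions for illustration, not SoniControl's actual method.

```python
import math


def goertzel_power(samples: list[float], sample_rate: int, target_hz: float) -> float:
    """Power of the DFT bin nearest target_hz, computed with the Goertzel recurrence."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


SR = 44_100
N = 4_410  # 100 ms analysis window
t = [i / SR for i in range(N)]
audible = [math.sin(2 * math.pi * 440 * ti) for ti in t]  # ordinary audible tone
with_beacon = [a + 0.05 * math.sin(2 * math.pi * 18_500 * ti) for a, ti in zip(audible, t)]

print(goertzel_power(audible, SR, 18_500))
print(goertzel_power(with_beacon, SR, 18_500))
```

Even at a small amplitude, the beacon dominates its bin, while the loud 440 Hz tone contributes essentially nothing there; a detector would compare such band powers against a threshold over successive windows.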
Audio Tracker Demo - It's a link on PilferShush's project page:
That said, I’m not convinced full on fingerprinting is necessary here. I suspect you could do the same thing using IP addresses and it would work at least 80% as well.
An IP address is not a good indicator and wouldn't replace fingerprinting. IPs may change over time, there may be non-static IP addresses from residential connections (so, not only from data centers), and today, in our mobile world, they change much more frequently than in the past.
IP is just another marker that can be useful sometimes. Even the subnet may be useful. But unfortunately, for fighting fraud we have to rely on techniques such as device fingerprinting with the canvas exploit. There's a much simpler approach, though it only works some of the time: a cookie.
So, you just check if the cookie is present and it matches the previous cookie from the same user. Done, the device matches and you're good to go (keep in mind that if someone owns your device and credentials, there's not that much we can currently do - although the behavioural biometrics proponents would have you believe otherwise).
But what if there's no cookie because the user logged out, opened their browser in incognito mode, or just switched browsers? In that case we would get a false positive for the user having and using a new device, which, from our point of view, correlates highly with fraud. This is industry-wide from the fraud-prevention point of view, not specific to one kind of business (an e-commerce website, for example): at least most of the people I have spoken with over the years have explained why fingerprinting is really important, and I've seen it first-hand.
So, we don't sell your data. We're not looking to match you with... whatever you can come up with in terms of a fingerprinting-data-matching nightmare. In most cases, the only people who have the fingerprinted data are on the fraud prevention team. And we generally hate bad players, both from outside and inside the company.
What we wanna do (and, again, this is generally) is try to create a better user experience for our good users. So we may relax some rules if your device is known. Or we may give you access to some features that other users don't have (let's say, a beta for a new service that we start offering).
This works by collecting as much data as possible from the device and then trying to differentiate small changes (let's say, your internal storage free memory in MBs) from big changes that could in fact mean that the user is using a new device.
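A toy version of that "small change vs big change" logic might weight each attribute by how strongly a change in it suggests a different device, and tolerate drift in noisy attributes like free storage. The attribute names, weights, tolerance, and threshold below are all hypothetical; production fraud models are presumably far more elaborate.

```python
# Hypothetical weights: how strongly a change in each attribute suggests a new device.
WEIGHTS = {
    "audio_fp": 3.0,
    "canvas_fp": 3.0,
    "gpu": 2.0,
    "os_version": 0.5,
    "free_storage_mb": 0.1,
}
STORAGE_TOLERANCE_MB = 2048  # drift in free storage within this range is ignored
NEW_DEVICE_THRESHOLD = 3.0


def change_score(known: dict, current: dict) -> float:
    """Sum the weights of attributes that changed meaningfully."""
    score = 0.0
    for attr, weight in WEIGHTS.items():
        old, new = known.get(attr), current.get(attr)
        if attr == "free_storage_mb" and old is not None and new is not None:
            changed = abs(old - new) > STORAGE_TOLERANCE_MB
        else:
            changed = old != new
        if changed:
            score += weight
    return score


def looks_like_new_device(known: dict, current: dict) -> bool:
    return change_score(known, current) >= NEW_DEVICE_THRESHOLD


known = {"audio_fp": "a1", "canvas_fp": "c1", "gpu": "X",
         "os_version": "14.1", "free_storage_mb": 51_200}
after_update = {**known, "os_version": "14.2", "free_storage_mb": 50_900}
other_phone = {"audio_fp": "a9", "canvas_fp": "c7", "gpu": "Y",
               "os_version": "14.2", "free_storage_mb": 8_000}

print(looks_like_new_device(known, after_update))  # False: small drift
print(looks_like_new_device(known, other_phone))   # True: many signals changed
```

A `True` result is the point where such a system might route the login through account verification instead of the relaxed rules for a trusted device.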
So, for example, we could force you through account verification to log in on a new device, versus relaxing some login rules on a good, trusted device for that user.
I'm sure there are exceptions, and that there may be some bad players abusing their fingerprinting capabilities. But at the same time, I'm pretty sure most people are not OK with using that data for other purposes, even the execs. And even if we did, say, track our ads so that when you sign up we get an ID tied to a particular ad we ran, we could see that although you're a new user (and by extension have a new device), you still came to our business because of that ad. We couldn't do that any other way, and then the UX suffers because of the decisions made to deal with it.
What I'm trying to get across with all of this is: fingerprinting is, in fact, very useful for fraud prevention, and I would argue that disabling the Canvas API exploit would affect most, if not all, machine learning models for fraud prev running on production.
EDIT: and, BTW, most companies that are trying to buy data from other companies are after user behavior: what your users are doing in your app, perhaps involving their product in some way (e.g., you're Spotify and want data from Shazam to understand user behavior regarding the kinds of songs people have Shazamed in the past). Again, I'm NOT saying there aren't companies tying in data from outside sources in ways that are iffy at best. But at least the more modern companies I've worked at are not cool with merrily sending data over to another company, even if it pays. It seems like everyone is starting to understand that their data is as important as their intellectual property.
Nevertheless, the number of users in the general population who go to these lengths to mask themselves (i.e., among all users of an app with 50M monthly active users) is so minuscule that it's not even a discussion; the opportunity cost is huge compared with just focusing on your 99.998% (a number I just made up, not a real metric) of users, understanding their behavior, and modeling a "good user". New users have stable device behavior? Well, then that VPS customer is probably going to be traced frequently. This is how some banks do things as well (not fingerprinting, but transaction monitoring in general).
EDIT: as an aside, I think the most important point to understand about how companies and spaces like the ones I have experience in use fingerprinting is - it gives you outliers and only works as long as you have a nice mass of good users. These users are not trying to game you, so they don't tamper with our fingerprinting. The ones that do tamper are either tech savvy or fraudsters. But if everyone tampered with it... You see where I'm going with that.
How is that determined? I didn't know that was possible
imagine if you were able to put a camera outside the entrance of the bathroom, and record who is going in and out.
This is recording a public space (the entrance), but it reveals who is in there, and for how long.
I'm sure you can guess why people wouldn't like it to be known that they went to the bathroom for X minutes (vs X seconds). It's the same online, just slightly less obvious.
The moment that gets widespread, what would happen is that the primary site would just start proxying for the ad networks.
To prevent tracking using Canvas it would be good if there was a single drawing library for all browsers or at least they used only internal code and didn't rely on OS or hardware acceleration.
Regarding Audio API, it also would be nice if it provided less details about audio hardware or OS audio stack.
If there is an OSS lib to do it, you can bet the adtech companies have been doing it even longer.
Also, IANAL but I think it's legal under GDPR, Recital 29:
> In order to create incentives to apply pseudonymisation when processing personal data, measures of pseudonymisation should, whilst allowing general analysis, be possible within the same controller when that controller has taken technical and organisational measures necessary to ensure, for the processing concerned, that this Regulation is implemented, and that additional information for attributing the personal data to a specific data subject is kept separately. The controller processing the personal data should indicate the authorised persons within the same controller.
The fingerprint value is not PI because it can't identify one specific person, but only a device at best (if accurate enough between runs).
With enough smart people and good incentive (=adtech), I bet this can be abused to identify the person.