Ask HN: How prevalent is non-cookie-based web tracking today?
140 points by ryeguy_24 on Nov 3, 2019 | 63 comments
I just started reading about things like Header Enhancement and SuperCookies and find them to be quite egregious. Does anyone know how much of this activity is being used by big known companies?

For example, I just found out that my account settings at Verizon Wireless were allowing them to use Header Enhancement (UIDH), adding a unique identifier to every HTTP request I sent. So, if I log in to a site, they can associate the UIDH with my account, so the next time I’m in the browser’s incognito mode, they already know who I am (or have a good guess).
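
A minimal sketch of what this looks like from the receiving site's side, assuming the carrier inserts the identifier as an `X-UIDH` request header on unencrypted traffic. The helper name and the sample header values are made up for illustration.

```python
# Hypothetical helper: pick out carrier-injected identifier headers from a request.
# "X-UIDH" is the header name reported for Verizon's UIDH program; everything
# else here (function name, sample values) is illustrative only.
CARRIER_ID_HEADERS = {"x-uidh"}


def find_injected_ids(headers):
    """Return {header_name: value} for headers that look carrier-injected."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in CARRIER_ID_HEADERS
    }


# Headers as a site might see them on a plain-HTTP request.
seen_by_site = {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0",
    "X-UIDH": "made-up-identifier-123",
}
print(find_injected_ids(seen_by_site))
```

Because the value is added in transit rather than stored in the browser, clearing cookies or opening an incognito window does not change it.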

Trying to circumvent tracking at the browser level is hopeless.

The only effective approaches that I know of are 1) using Whonix (best in Qubes) to connect via Tor; and 2) using multiple OS-level VMs that connect via different nested VPN chains.

And even then, there are risks from fingerprints that depend on GPU and virtual graphics drivers in VMs.

So when compartmentalization really matters, it's necessary to use different host machines, on different LANs (or at least vLANs).

Using Tor is rather painful, given all the CAPTCHAs. And the learning curve for Qubes is a little steep.

But using multiple VMs with different nested VPN chains is actually quite convenient, once you've set it up. I use a pfSense VM as the gateway router for each VPN service. So creating nested VPN chains is easy: You just create virtual networks of the pfSense VMs, with Linux workspace VMs wherever you like.

With a decent host machine, I can work ~seamlessly as a few low-isolation personas via nested VPN chains, and another few high-isolation personas via nested VPN chains and Whonix instances.

Not sure about the priorities here. Fingerprinting should be the main concern, so steps 1,2,3 are obscuring IP address and disabling cookies and JavaScript. Even beyond that, not sure how chaining VMs etc helps much more than a single VPN.

Although an ideal solution would spin up a random VM, browser, screen size, and tor connection for each new web page visited...

> an ideal solution would spin up a random VM, browser, screen size, and tor connection for each new web page visited...

Sure. I wouldn't limit it to Tor, though. Because there are just too many CAPTCHAs. That's much less of an issue using VPNs.

Chaining VPNs is arguably overkill for avoiding fingerprinting. Still, trusting a single VPN service is risky. If they were complicit, you'd still be tracked. But if you at least chain two VPN services, there's less risk. Overall risk is more or less the product of all the individual risks, if they're independent.
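
The "product of the individual risks" point can be checked with a few lines of arithmetic. This is only a sketch: the 10% figures are invented, and the calculation holds only under the independence assumption stated above.

```python
from math import prod


def chain_compromise_risk(individual_risks):
    """Probability that every VPN in the chain is complicit, assuming independence."""
    return prod(individual_risks)


# Made-up numbers: each provider has a 10% chance of being complicit.
single = chain_compromise_risk([0.10])
double = chain_compromise_risk([0.10, 0.10])
print(f"one VPN: {single:.2%}, two chained VPNs: {double:.2%}")
```

If the providers are not independent (same owner, same jurisdiction, same upstream), the real risk is higher than the product suggests.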

Edit: If exploit resistance is less of an issue, it's easy to chain VPNs at the Linux OS level using ip route and iptables. Iptables rules drop everything on enp0s3 except traffic to the first VPN server, and drop everything on tun0 except traffic to the second VPN server.

You set enp0s3 as the route for the first VPN server. After connecting to it, you set tun0 as the route for the second VPN server. After connecting to it, you check for leaks using tcpdump.
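
The steps above can be sketched as a small generator that builds (but does not run) the `ip route` and `iptables` commands. The IP addresses, the LAN gateway, and the function name are placeholders, and only the OUTPUT chain is covered, so a real setup would also need rules for inbound traffic, DNS, and DHCP.

```python
def chain_commands(vpn1_ip, vpn2_ip, lan_gateway, phys_if="enp0s3", tun_if="tun0"):
    """Build (not run) route and firewall commands for a two-hop VPN chain."""
    return [
        # Reach the first VPN server over the physical interface only.
        f"ip route add {vpn1_ip}/32 via {lan_gateway} dev {phys_if}",
        f"iptables -A OUTPUT -o {phys_if} -d {vpn1_ip} -j ACCEPT",
        f"iptables -A OUTPUT -o {phys_if} -j DROP",
        # Once the first tunnel is up, reach the second VPN server through it only.
        f"ip route add {vpn2_ip}/32 dev {tun_if}",
        f"iptables -A OUTPUT -o {tun_if} -d {vpn2_ip} -j ACCEPT",
        f"iptables -A OUTPUT -o {tun_if} -j DROP",
    ]


# Placeholder addresses from the documentation ranges.
for cmd in chain_commands("198.51.100.10", "203.0.113.20", "192.168.1.1"):
    print(cmd)
```

For the leak check, something like `tcpdump -n -i enp0s3 not host 198.51.100.10` should stay silent while the chain is up.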

How do you pay for your VPNs in a non-trackable way?

If you pay via credit card, the VPN account has your name attached to it, no? So what good will chaining VPNs do when all the VPNs have your name attached to them?

Bitcoin is also supposedly not anonymous (or so I've heard.. I really don't know much about bitcoin, so please correct me if I'm wrong here), so paying with it sounds like it won't be any better.

Also, I have to ask: Who are you trying to prevent being tracked by? If it's by advertisers, I don't see how chaining VPNs would be any better than using a single VPN.

I have been using Mullvad[1] for a year or so now and am really happy with the extent they go to avoid storing payment information. One of the things that caught my eye about them initially is that you could literally mail them an envelope containing your account number and $5 cash and they will add that time to your account, which seems to solve what it was you were asking about (although I am sure physically mailing something comes with its own issues of privacy).


Yeah, love Mullvad. They've been around for about a decade. Along with AirVPN, Insorg and IVPN. IVPN may still accept cash as well.

I'd prefer to be tracked by nothing. Or at least, by nothing short of a targeted TLA attack.

And yes, Bitcoin is not anonymous. Just the opposite, in fact. Some cryptocurrencies arguably are anonymous. But that's never really been tested, and in any case, they're not widely accepted.

Any VPN service that I connect to directly, I pay with credit cards. Because there's ~zero anonymity for them, no matter what.

But for the rest, I pay with well-mixed Bitcoin. I anonymize Bitcoin using wallets in a series of Whonix instances, via Tor. Using a different mixing service for each step. With N steps, I have somewhere between N/3 and N-1 ~anonymous Bitcoin pools. Depending on how many mixes I consider adequate.

So then I select Bitcoin pools for paying VPN services in proportion to the degree of separation.

> So when compartmentalization really matters

Just curious, for most normal people, when does compartmentalization really matter?

When getting pwned is an existential threat.

If you define "normal people" to exclude those who take existential risks, then compartmentalization never really matters for them. But that's a sad way to live, in my opinion.

Compartmentalization would have mattered a lot for Chelsea Manning. Or, arguably, for Julian Assange. Or for various public figures whose careers have been destroyed after pwnage. Or for DPR.

I don't want to go into much detail, but I work for a major company in this space and most companies in the industry can track you with reasonable success even if you are logged out over multiple devices. Your (approximate) location, browsing habits and patterns are good enough data to predict what kind of stuff you buy.

If you want to not be tracked, turn off JavaScript for a start.

> good enough data to predict what kind of stuff you buy.

Then why are the ads that _do_ sneak through my adblockers or onto Instagram, etc., such hot, moist garbage? Is it just a lack of people wanting to advertise at my demographic?

I have been suspicious for a while that all of these companies that claim to know everything about us, or people who are afraid that these companies know everything about us, are wrong.

I don't feel like I am unusual in any way, as in I can't see how I have any natural, dumb-luck defense against any of this tracking. If companies like Google, Amazon, FB, etc, are in any way really trying to use what they think they know about me to get me to buy stuff or influence my thoughts or behaviour, then they seem to be doing a really, really bad job of it.

As far as privacy goes, my concern is far more focused on apps/programs stealing my photos of my kids, or tracking me around town, or knowing who I meet and talk to.

As far as predicting my future behaviour, I have not been impressed so far.

There was a point a year or two ago where everyone was recounting these stories of how AI targeted advertising was terrifyingly accurate, it could tell you were pregnant before you found out for yourself, etc. At the time I was routinely opening YouTube with cookies disabled and getting suggested YouTube generated categories with a super low hit rate, stuff like "Recipes" that would be filled with nothing but general interest food videos (i.e. "you won't believe what we found at this market", "we made the largest burrito ever", never with any instructions), "Metal Music" containing a single acoustic cover of a metal song and half a dozen EDM tracks or "Role Playing Games" stocked to the brim with Fortnite videos.

I just checked now and it's better than it was but there's still a "DIY" category containing one DIY video, a Japanese cooking tutorial and a video of somebody putting iPads in a bucket of slime. I'm not sure I have very much faith in their ability to deeply infer things about me from a limited dataset when they can't identify videos with ingredients in the description or sort music into genres given the set of all music (and millions of comments on it) to work with.

If you use Twitter, you can see the list of your supposed interests they have inferred from your activity on their own site. They have so much firsthand user data, but my results at least are so bad. Of almost a thousand topics, less than half are even remotely relevant to me. My supposed interests include people I've blocked, keywords I've muted, and non sequiturs like "cake", "water", and "hello".


If you use Chrome, check out the "How your ads are personalized" page. It gives you a glimpse of how accurate Google's targeting is based on all the data they've collected.

I went to a great Google Fireside Chat with Michal Kosinski on data privacy. He said that the only way to get around big data is to spread misinformation. Choose the opposite of your tastes, search for things you don't like, etc. https://www.youtube.com/watch?v=VUwBcTgzbtU

Several reasons:

Many large, higher quality ad buyers (agencies and direct sold) can get enough return from non-grey-area inventory. There's no point in them taking (even a small) perceived risk of buying this grey market inventory.

However, they likely do use this tracking data second or third hand via their third party data providers brokering and including it in larger data sets they use for cross site targeting via DMPs.

Finally, there's simple economics: people who go through enough effort to block trackers and other things are (perceived to be) less likely to engage with ads (e.g. they see it as you do, when one 'slips through'). So there's no point in paying $$$ for those eyeballs.

All of this in combination means you get low quality, bottom-of-the-barrel ads.

People forget (or just never knew) that tracking and ad-serving isn't necessarily about giving every screenviewer the best possible ads you can all the time. Inventory and return on ad spend are both big factors in determining where to spend. The tracking enables ad placement companies to know that even with the best possible ad (which cost inventory to place), specific users just will not engage. And if that's true, why spend that ad on that person?

There’s a whole chain of companies who have financial incentives to maximise spend through their platforms. They’ll place creatives in ad units with impressions they know are low quality based on needing to fulfil fill rates and volume.

It’s pretty shitty and unethical, but welcome to ad tech!

Ad re-targeting (showing products you browsed) is generally cookie based I would think and adblockers block that. Anything other than re-targeting is anyway too broad and generally crap for me - even on Facebook which knows a lot I would think. Other data vendors are totally crap.

You can check your BlueKai profile here https://datacloudoptout.oracle.com/registry/

probably. Also - lazy ad buys.

There's human effort involved in planning a micro targeting campaign. Way easier (and short-term cheaper) to do some broad strokes ads with a couple ok-ish proxies for your target market than pay people that know what they're doing. So your conversion rates are garbage and CAC higher than it could be, but your spending is on the action instead of front-loaded on salaries/agency fees.

> good enough data to predict what kind of stuff you buy.

the only time these are ever close is when they are based on users' previous search terms

> Your (approximate) location, browsing habits and patterns are good enough data to predict what kind of stuff you buy.

Missing component here is "when compared against massive existing data sets". That info alone isn't very powerful.

Easier for some people but not for the masses. Not everyone will know how to do this.


This is definitely true to an extent. However, NoScript allows you to whitelist domains, or temporarily "allow all" for a site. Sometimes it takes a little bit of trial and error, but for most sites, you can get full functionality with only first-party scripts and common CDNs (cdnjs, jsDelivr, etc).

Fat sites do break though, and I don't think there's really any way around that other than temporary whitelisting.

The Web is a lot faster for me now, though.

I've solved Google CAPTCHAs without JS, so I don't think it's a requirement.


Could you go in to detail on how you did this?

I thought JS was an absolute requirement for them.

Whenever I've solved them without JS, it reverts to standard HTML forms. Since it's in an iframe, when you click submit it doesn't reload the whole page, only the frame. When you've been validated it gives you a code which you can copy and paste into the host site. Note, I've only seen this on Cloudflare pages so far.

Disclaimer: I work for Google, but not on reCAPTCHA.

Every time I've tried one of these non-JS versions, it gives me no evidence that it's working or progressing and just keeps setting up new comparisons for me to copy and paste the code. So I get tired of doing that and stop using whatever website is demanding it.

HN had a google captcha it subjected me to for a while. It did not need JS[0]

At the same time I have come across captchas that do need JS. I guess google offers both.

[0] if it had, I would have stopped posting here as I never accept browser JS outside of a VM.

Totally not true.

It does limit you somewhat, but a large part of the web, including HN, slashdot, bbc, most news sites, TheRegister, wikipedia, most of stackexchange, and a shedload more, all works fine.

it is how almost all of the data collection happens though

By far the biggest tracking offender is JavaScript. Enabling it could reveal your operating system, CPU/GPU architecture, screen resolution, draw a precise and unique canvas fingerprint, etc. There are also mutable browser headers like user-agent and of course your IP address. However, the more advanced and insidious tracking is based on your behavior: what time you're active, what wifi networks are in range, who you communicate with, what your writing style is, and so on. Most of that collection happens on mobile phones, so if you don't want to be tracked across the Web and beyond, I strongly advise against signing in on Android/iOS devices, or suggest switching to a telemetry-free open source mobile operating system altogether.

You give a USA specific example, so I'll give one from where I live: aside from a few (like Google, Facebook, LinkedIn) that I suspect do things like recommendations or friend suggestions based on our static IP address, in the Netherlands it's virtually nonexistent. And illegal, at least without telling us that they do tracking (no matter if it's through cookies, the law never even mentions cookies). Header injection (MITMing traffic) is something I only hear about from far away and seems very invasive to me.

Same in Germany, but there they have rotating IP addresses (which is both a pain (hosting) and a blessing (privacy)).

Hmm, although, would MAC address tracking count? That happens here and there (by roughly the same amount in any EU country, as far as I can tell, which is not very much), mostly with WiFi captive portals where you sign away your soul in the terms of service. I'm not sure about the legality (hiding GDPR consent in the TOS) but it happens. From experience, I can say that if you find out and you send them a letter with a copy of your ID, they'll happily give you all the data they have on any MAC address you claim.

I caught Vodafone and Telekom in Germany injecting headers. The former for shady cooperations and tracking, the latter for cache screwery. Telekom also modified DNS responses to turn NXDOMAIN into a "navigation help" page, a site that looks like Google but where everything is ads.

Vodafone CPE equipment saves all MAC addresses ever present in the local network, plus the MACs of unassociated wifi clients in the air, and sends them back as part of diagnostic data.

Edit: They also DNS Censor popular warez sites and libgen

> They also DNS Censor popular warez sites and libgen

This happens in some other European countries, too, like Italy.

Regarding MAC tracking, I found a sticker in Centraal Station Utrecht that said 'wifi tracking' with 'voor meer informatie: www.ns.nl/privacy'

Edit: the relevant privacy page is: https://www.ns.nl/privacy/op-en-rondom-het-station.html which also has an image of the sticker I saw.

Apple’s operating systems have been randomizing MAC addresses since late 2014 (or before). So the MAC address has not been a unique device identifier for users of Apple devices.

Don't they use the same MAC for the same network? So when it reconnects to the same network, doesn't it adopt the same MAC as before? Otherwise one could not give a device a static IP in DHCP.
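
A minimal sketch of how a "random but stable per network" MAC address could be derived, which is the behaviour being asked about here. This is not claimed to be Apple's actual algorithm; the device secret, the network names, and the function name are all hypothetical.

```python
import hashlib
import hmac


def per_network_mac(device_secret: bytes, ssid: str) -> str:
    """Derive a MAC that is stable for one network but differs across networks."""
    digest = hmac.new(device_secret, ssid.encode("utf-8"), hashlib.sha256).digest()
    octets = bytearray(digest[:6])
    # Set the locally-administered bit and clear the multicast bit so the
    # result is a valid unicast address outside vendor-assigned space.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)


secret = b"example-device-secret"  # placeholder value
home_a = per_network_mac(secret, "HomeWiFi")
home_b = per_network_mac(secret, "HomeWiFi")
cafe = per_network_mac(secret, "CafeHotspot")
print(home_a, cafe)
```

Under a scheme like this, a static DHCP lease on the home network keeps working, but a hotspot operator also sees the same address on every return visit, so tracking within a single network is still possible.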

I'm not sure but I think I remember reading this. And it would mean that hotspot tracking (the most prevalent form as far as I've noticed) still works.

Use browser plug-ins:

* uBlock Origin

* NoScript

* Cookie AutoDelete (deletes cookies when the tab is closed)

* (I also use I don't care about cookies for the EU cookies clusterfuck)

* CanvasBlocker

* Privacy Badger

* Glyph detection blocker

* Decentraleyes

* Privacy settings

* Privacy-Oriented Origin Policy

* WebRTC leak protection

* HTTPS Everywhere

* I have a browser spoofing plug-in too but don't think it works so well.


use different browsers for different purposes.

use startpage.com instead of google

Here, try your luck:



Does not work so well. Instead of preventing canvas, fonts, browser ID etc., the plug-ins should randomize it.

You may want to re-evaluate your use of startpage.com: https://news.ycombinator.com/item?id=21371577

It seems really concerning that Startpage is still recommended on privacytools.io. Instead the forum post discussing the delisting is hung up on getting a statement from the new CEO and giving them a chance, which seems insane to me. Either being owned by an advertising company is a problem for all privacy-focused services or it isn’t a problem for any of them. In which case they might as well redo the entire site. I don’t see why this CEO’s word would be worth any more than nothing nor why he should be treated any differently from the owners or CEOs of other questionable businesses/services/addons/programs/etc.

This questionable judgment on their part seriously puts their reputation at risk in my eyes.

Are you aware of any way to bundle settings for Firefox (or whatever) that include these kinds of changes?

I know you can export the about:config and share that, but I have always wanted a kind of ansible for setting up a browser with plugins and other changes for my personal use.
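
One partial way to approach this, sketched under assumptions: Firefox reads a `user.js` file from the profile folder at startup, so a small script can render a shared set of preferences into that format. The preference selection below is only an example, the function name is made up, and this does not install any plug-ins.

```python
import json

# Example preferences only; pick your own. These are existing about:config names.
PRIVACY_PREFS = {
    "privacy.resistFingerprinting": True,
    "privacy.firstparty.isolate": True,
    "media.peerconnection.enabled": False,  # disables WebRTC
}


def render_user_js(prefs):
    """Render a dict of preferences as the text of a Firefox user.js file."""
    lines = [
        f'user_pref("{name}", {json.dumps(value)});'
        for name, value in prefs.items()
    ]
    return "\n".join(lines) + "\n"


print(render_user_js(PRIVACY_PREFS))
```

The output can be saved as `user.js` in a profile folder and shared like any other text file.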

Additionally, if I could tell my friends and family: Hey just use my Firefox Playbook and feel safe on the internet, thereby reducing the cognitive load of figuring out how to do that, I'd probably have a lot more success helping curious but busy people take control of their privacy.

You don't want your family to use my setup. It breaks many things.

A person not in IT is probably just fine if you install ublock Origin.

Or you would have to train your family to use different browsers for different things and you want to have at least one "vanilla" browser on your system. Just recently my US CC website stopped working with my browser. For such things you want to have one major browser without any plug ins.


1. Google Chrome (vanilla, no plug-ins): used when needed (recently to pay my CC).
2. Chromium: Facebook, Gmail.
3. Firefox: buying tickets etc.
4. Vivaldi: browsing the internet.

Again my setup does not work so well against fingerprinting. My plug-in combination is so unique that I can be tracked via my plug ins.
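
A rough way to quantify "so unique" is bits of identifying information: an attribute value shared by a fraction p of users contributes -log2(p) bits. The shares below are invented for illustration, and adding bits across attributes assumes they are independent.

```python
from math import log2


def bits_of_identifying_info(share_of_users: float) -> float:
    """Surprisal, in bits, of an attribute value seen in this share of users."""
    return -log2(share_of_users)


# Made-up shares: a common setup vs. a rare plug-in combination.
common_setup = bits_of_identifying_info(1 / 4)
rare_plugins = bits_of_identifying_info(1 / 1024)
combined = common_setup + rare_plugins  # valid only if the two are independent
print(common_setup, rare_plugins, combined)
```

With these numbers the combination is shared by roughly one user in 2**12, which is why an unusual plug-in list can identify someone even when each plug-in blocks tracking on its own.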

A nice benefit to that is it may make fingerprinting harder if many people are using the same configuration.

Relevant discussion from not that long ago: https://news.ycombinator.com/item?id=20783339

Any advice on how to prevent mobile browser fingerprinting on Android and iOS without jailbreak or root permissions?

Apart from the usual canvas / webrtc in-browser shenanigans, the most surprising one that I found was using a dns cookie to track users across browsers and devices discovered/invented/disclosed by u/DanielDent: https://news.ycombinator.com/item?id=20219878

> As with traditional HTTP cookies, DNS cookies can be used to track users on the web. They have no concept of "first party" or "third party" and can be read across different websites or from a different browser. They can also be used outside the web environment, for instance to track a web conversion which occurs after reading an email but not clicking on a link, or to track a sign-up in a mobile application after viewing a website. They also have application in DDoS mitigation - especially on IPv6 networks.
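
A sketch of one way an identifier could ride on a cached DNS answer, assuming the tracker returns a different address per visitor and reads the ID back from the address the client later connects to. The linked post is the authority on the actual mechanism; the prefix and function names here are placeholders.

```python
import ipaddress

# Documentation-only IPv6 prefix used as a stand-in for the tracker's network.
PREFIX = ipaddress.IPv6Network("2001:db8::/64")


def id_to_address(visitor_id: int) -> ipaddress.IPv6Address:
    """Encode a 64-bit visitor ID in the low bits of an address in PREFIX."""
    if not 0 <= visitor_id < 2**64:
        raise ValueError("visitor_id must fit in 64 bits")
    return PREFIX.network_address + visitor_id


def address_to_id(address: ipaddress.IPv6Address) -> int:
    """Recover the visitor ID from an address produced by id_to_address."""
    return int(address) - int(PREFIX.network_address)


addr = id_to_address(0x2A)
print(addr, address_to_id(addr))
```

Since the answer is cached below the browser, in the OS or resolver, every browser and private window on the machine that looks up the same name gets the same address until the cache entry expires.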

I am curious what other techniques are in active use to track a user across devices / software...

DNS cookies are nasty. The POC tracks you between normal and private tabs, even with the Tor browser.

Excellent reading for anyone interested in the subject from a technical and business/enterprise point of view. This gets rid of the FUD 'browser fingerprinting' and uses actual industry terms.



This is mostly talking about creating a probabilistic ID graph, i.e. creating a unique ID across devices. This is technically not the same as browser fingerprinting. The latter is much simpler.

Yeah that is kind of the point I wanted to get across. Fingerprinting isn't some major secret that big tech is using... it's a small tactic used by some companies that gets more attention than it really deserves. There is a section in part 2 specifically about fingerprinting under the subtitle "IDENTIFYING A DEVICE WITHOUT AN ID".

Browser fingerprinting is a single piece in creating an id graph. Fingerprinting would be 100% useless without a graph, unless you're doing it to individuals which would be NSA-level acting.
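
A toy version of an ID graph, to show where a fingerprint fits in: devices are linked whenever they share a signal (an IP address, a login, a fingerprint), and each connected group is treated as one person. Real systems are described above as probabilistic and would weight links; this deterministic sketch, its function name, and its sample data are simplifications.

```python
from collections import defaultdict


def build_id_graph(observations):
    """Group device IDs that share any signal. observations: (device_id, signal) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_device_for_signal = {}
    for device, signal in observations:
        find(device)  # register the device even if it links to nothing
        if signal in first_device_for_signal:
            union(first_device_for_signal[signal], device)
        else:
            first_device_for_signal[signal] = device

    clusters = defaultdict(set)
    for device in parent:
        clusters[find(device)].add(device)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Invented sample data.
observations = [
    ("laptop", "ip:203.0.113.7"),
    ("phone", "ip:203.0.113.7"),
    ("phone", "login:abc"),
    ("tablet", "login:abc"),
    ("kiosk", "ip:198.51.100.9"),
]
print(build_id_graph(observations))
```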

I would think Header Enhancement is not widely used (only a few ISPs or so use it), but browser fingerprinting must be quite widespread. It is hard to detect from the client side, so it's hard to say how widespread it is.

Here is a study of fingerprinting effectiveness. Not what you wanted but a worthwhile read.


So, how does browser fingerprinting work. Does it basically look at the 1) IP and 2) Browser Agent pair for near uniqueness?

Lots of things from browser settings to what plugins you have.
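
In rough terms, a fingerprint is just a hash over many such attributes, so no single one needs to be unique. A minimal sketch, with an invented attribute set and function name:

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into a single stable identifier."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Invented example values.
browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080",
    "timezone": "Europe/Amsterdam",
    "plugins": ["uBlock Origin", "NoScript"],
}
same_browser_later = dict(reversed(list(browser.items())))
other_browser = {**browser, "screen": "1366x768"}

print(fingerprint(browser) == fingerprint(same_browser_later))
print(fingerprint(browser) == fingerprint(other_browser))
```

Note that the IP address is not needed at all: the hash stays the same when the IP changes, which is what makes it useful alongside cookies and IP-based matching.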

These guys will tell you how unique is your browser fingerprint


This should be mentioned here, too https://panopticlick.eff.org/

fingerprintjs readme does a good job of explaining it: https://github.com/Valve/fingerprintjs2

These days over 90% of traffic is encrypted so the amount of requests they can touch is pretty limited.

You can spoof a browser pretty easily in the request. You can tell the server you are whatever you want the server to think you are.
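
For the header part, this is a one-liner in most HTTP clients. A small sketch using Python's standard library; the URL and the user-agent string are placeholders, and no request is actually sent.

```python
from urllib.request import Request


def build_spoofed_request(url: str, user_agent: str) -> Request:
    """Build a request that claims to come from an arbitrary browser."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_spoofed_request(
    "https://example.com/",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

This only changes what the request headers claim; attributes read through JavaScript are not affected by it.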

I can't provide stats on your questions, but as per your example, your ISP can only add headers to non SSL traffic. Any website you access with HTTPS is safe from this type of privacy violation.

So as "Encrypted web traffic now exceeds 90%" [0] I'd guess at least this type of tracking is gone.

[0] https://news.ycombinator.com/item?id=21421195

The most common non-cookie based tracking are cross-device graphs that are registration based (reg based) and run by facebook/google/linkedin/pinterest/etc. If you've ever logged in to facebook (or haven't logged in) and a site has a fb pixel or share button, it's much easier for them to track you.

These all have cookie/nonreg-based components, and there are plenty that don't rely on reg based data at all.

Google Captcha only works if webgl canvas is available. If it's not available they give me infinite captchas and never let me through.

FYI about these two websites that demonstrate the various data your browser shares:



Here's an option for adblock + vpn. https://ba.net/adblockvpn
