Everyone is watching what you do online. How user tracking with cookies works

kop316 · on June 13, 2018

At work I am forced to use Internet Explorer, and by using it I found a surprisingly useful feature: I can not only clock all third party cookies, but it prompts me as to whether I want a first party to store any cookies. The prompt allow allows me to automatically blacklist a site from providing me any cookies. I really enjoy this, as if I know there is a site I will never log into, I can permanently blacklist it with one click. I tried to see if I can do the same but I did not find this feature on Firefox.

I have also noted that certain sites will be very user hostile if you do this. Reddit will load the site and actually overlay a white screen to make it appear like it never loads if you block its cookies.

swebs · on June 13, 2018

In Firefox you can just install the uMatrix extension. It not only allows you to block cookies, but also javascript, frames, and images. You can choose to block only third party elements, third party elements from known tracking/ad agencies, or even first party elements.

daphneokeefe · on June 13, 2018

In Chrome, you can also use uBlock Origin, which is potentially a little easier to use if you're not an advanced user.

https://chrome.google.com/webstore/detail/ublock-origin/cjpa...

JackCh · on June 13, 2018

For what it's worth I found uMatrix easier to use than uBlock Origin's "Advanced user" mode. Both took a few minutes to get a hang of since neither have particularly discoverable interfaces, but I find the uMatrix interface just a lot faster to use once I knew both.

rch · on June 13, 2018

I should have to install a plugin if I want to see targeted ads, not the other way around.

moksha256 · on June 13, 2018

> Reddit will load the site and actually overlay a white screen to make it appear like it never loads if you block its cookies.

That's CRAZY. Couldn't reproduce in Edge though.

blacksmith_tb · on June 13, 2018

This isn't uncommon for sites that sniff adblockers too, though it varies how much of an obstacle it is. Often you can Inspect the div they are covering things up with and just delete it (or block it for good with uBlock etc.) - clever implementations won't fetch the actual content you wanted to read, so you'll only uncover an empty page, though.

KozmoNau7 · on June 13, 2018

That's why I use Cookie AutoDelete. It lets the page set cookies, but as soon as you navigate away or close the tab, they're gone.

kop316 · on June 13, 2018

Did you already have cookies installed? I found that if I let it load a cookie then block it, it would load (but it already had a cookie on there, defeating the point). Try clearing out everything (or try on IE)

propogandist · on June 13, 2018

replicated and confirmed on Chromium

preventing the Reddit script from executing (uMatrix) will prevent the white screen overlay, also.

marcosdumay · on June 13, 2018

Take a look at the "selfdestructing cookies" FF extension.

The problem with simply blocking cookies is that you'll break session handling at web hosts. That's probably why Reddit fails to work. And you don't need all that to avoid cookie tracking.

EDIT: Yep, looks like it's not compatible with the latest versions.

derf_ · on June 13, 2018

There used to be an option in Firefox: network.cookie.lifetimePolicy=1

It has been broken for a long time and was eventually removed: https://bugzilla.mozilla.org/show_bug.cgi?id=606655

aerotwelve · on June 13, 2018

What's the best way to circumvent this? Is it even possible?

I'm no expert (which is why I ask), but I assume that blocking third-party cookies in your browser won't prevent situations like the tracker example the author provides.

That is, since you visited tracker at least once, their cookie would have been set during that visit as a first-party cookie, and therefore the http requests to retrieve the 1x1 transparent image from their server will contain the data they're after, right?

pdkl95 · on June 13, 2018

> What's the best way to circumvent this? Is it even possible?

1) Get rid of the misfeatures that allow the problem to exist. Change the browser to never send headers that leak information by design (Referer, Cookie, Etag, User-Agent, etc).

1.1) (Optional) Fix stateful sessions that previously depended on cookies with a new HTTP session+authentication feature (that doesn't have the problems that made the Authorization header mostly useless).

2) Strip most of the other HTTP headers that leak bits of entropy so the browser fingerprint is too small (~16 bits max?) to be a unique id.

2.1) (Optional) Add some of the removed functionality back as a single header that reports a single "browser class" out of a handful (<32, 4-5 bits max. ~8 would be better) of predefined classes (e.g. "Standard Desktop with screen size between H1xW1 and H2xW2 with >=2 channel audio output. Supported codecs: audio=[MP3, AAC], video codec [...]", "mobile with multitouch screen with size ...etc...").

3) Disable Javascript. Running Turing complete code from potentially malicious remote hosts will always be dangerous, because it isn't possible to answer any question about the behavior of a program without running it (halting problem in general; Turing machines with >=7918 states cannot[1] be proven with ZF set theory). A safe web of documents is possible. Software needs to be handled separately.

Of course, none of this will happen because the people with the power to make most of these changes derive a lot of their income from surveillance.

[1] https://www.scottaaronson.com/blog/?p=2725

dahart · on June 13, 2018

I'm very much all for improving the security and privacy of the internet and my computer, but this seems pretty over the top to me.

> Get rid of the misfeatures that allow the problem to exist. Change the browser to never send headers that leak information by design (Referer, Cookie, Etag, User-Agent, etc).

The internet is the problem. If you want to get rid of being tracked on the internet, you have to stop using the internet. If you remove user agents & cookies & tags, you don't solve the problem and you lose some useful features. None of those things keeps your ISP from watching, nor do they stop web sites from noting your IP & requests and storing them on their end. And for anything you have to log into, there's no point to hiding headers.

> Of course, none of this will happen because the people with the power to make most of these changes derive a lot of their income from surveillance.

That's probably not true now, and it's definitely not representative of the reasons the features we have were invented in the first place. Some people really did want custom features to identify a computer's capabilities. Without headers, we'd gimp caching, and we can't differentiate between mobile & desktop, for example.

> Disable Javascript. Running Turing complete code from potentially malicious remote hosts will always be dangerous

This simply isn't possible to avoid in any practical way. Windows, MacOS and Linux run on code from a potentially malicious host, as do all applications you didn't write yourself. I mean, disable Javascript if you want, but you're also cutting yourself off from all web apps by doing that. And Javascript may have more security and oversight than anything you download from any app store, it's more sandboxed by design than binaries are.

I'm not sure why you're talking about the halting problem, that just isn't a serious concern in practice, it's a CS theoretic issue irrelevant to this thread or privacy. The major browsers will all let you kill stray JS processes.

Furthermore, because of browser sandboxing, it is possible to answer some questions about Javascript, unlike binaries you download from the internet. Frontend Javascript is not allowed to access arbitrary paths in the local filesystem without the local user's permission, just for one example. Nor can they read all cookies.

koolba · on June 13, 2018

> What's the best way to circumvent this? Is it even possible?

Set you browser to clear all cookies on close, use a separate browser for anything that requires authentication (ex: gmail), and never mix the two types of browsing. If they create a profile on you the cookies it's tied to disappear when you close your browser.

It's feels like a minor pain when you first start out but you used to it quick. Plus since you're not logged into anything by default there's a slightly higher barrier to ordering needless crap online.

It's not foolproof as you can be tracked by a combination of other factors (see: https://panopticlick.eff.org/) but it's much better than the alternatives.

jedimastert · on June 13, 2018

There's also Facebook Multi-account Containers (https://addons.mozilla.org/en-US/firefox/addon/multi-account...), which might do what you're looking for

gaius · on June 13, 2018

If they create a profile on you the cookies it's tied to disappear when you close your browser.

If they see you with an IP address and a cookie and a moment later see that same IP with the same browser etc does something else they will correlate them. There is a whole industry around tracking people who explicitly do not consent or have withdrawn their consent to be tracked. That’s why we need GDPR.

plurgid · on June 13, 2018

One of the reasons I love HN is that the commenters here usually have a much deeper understanding of this sort of thing than I do.

Which is why I'm left wondering why nobody has mentioned Firefox Incognito mode (chrome too I think).

At least on firefox, incognito mode does not store cookies on disk. They persist for the duration of the tab/window you logged into.

this would circumvent cookie tracking, I think. I mean I guess not if you opened one icognito window and did all of your browsing inside of it, and never closed it?

am I missing something?

dahart · on June 13, 2018

Incognito, aka private browsing, aka guest profile, is a great way to avoid permanent cookies (and local storage too!). This feature exists on all major browsers.

This doesn't solve all tracking, but it will stop some cookie abuse. Choosing to use it also comes with the downside that you can't stay logged in to sites, and you may lose context & history you wanted to keep.

Incognito is super useful for web development precisely because you can very quickly get a fresh profile with no cookies in it.

propogandist · on June 14, 2018

private browsing won't stop browser fingerprinting, which is an increasingly common tactic. Your browser fingerprint then is linked to other attributes (including other devices you may own, where say an IP may be shared) allowing firms to build profiles that are not-linked to cookies, which is harder to block.

Blocking the canvas fingerprint also enables easy identification, so you'll need a free add-on that generates noise.

dahart · on June 14, 2018

Yep, all true. Incognito doesn’t protect you from tracking. The sooner cookies become useless to sites & advertisers, the sooner they come up with something else we can’t block. We might be mostly past that point already.

davidhyde · on June 13, 2018

> What's the best way to circumvent this? Is it even possible?

I set my browser (firefox) to clear all cookies on exit but I let my browser save passwords whenever possible. That way you have to log in every time you use a service but at least you don't need to type in the login info every time. It's quick. Of course, this does not work nicely for two factor stuff but you can use another browser for those.

taurine · on June 13, 2018

No it is not possible. Other users similar to you provide the data to help track you, so you'd have to circumvent this all together.

The industry is moving to cross-device tracking to track you over multiple devices, without using cookies. This is probabilistic, not deterministic: There is 88% chance this is user A. But with huge amounts of data still useful.

airstrike · on June 13, 2018

I vaguely remember using a Firefox extension a long time ago that allowed one to whitelist / sticky a handful of domains that would be spared from the usual "delete every cookie", giving the user a renewed sense of control over what the web knows about them.

Nowadays with online fingerprinting¹ this may amount to nothing more than placebo, but I do miss it.

__________

1. https://arstechnica.com/information-technology/2017/02/now-s...

lucb1e · on June 13, 2018

Called self-destructing cookies. It broke with web extensions and cannot be replaced (like many other add-ons I use) because the web extension APIs to provide the functionality do not exist. I'm still on Firefox 55 though, so I can still use it (like firegestures, quickjava, no close buttons, vertical tabs, and others that are labeled as legacy).

I always find it very creepy when I looked something up on someone else's laptop and use it again half a year later, only to find that it remembers my last visit and (for example) centers the map where I last left it. I'm so used to having things be cleaned up against tracking, I don't even really experience what the web is like these days.

ealhad · on June 13, 2018

> cannot be replaced

Or can it?

https://github.com/Cookie-AutoDelete/Cookie-AutoDelete

https://addons.mozilla.org/en-US/firefox/addon/cookie-autode...

lucb1e · on June 14, 2018

Ah, they did finally implement an API for localStorage then. Good to see!

That just leaves hiding the tab bar, gestures that work in all windows (e.g. also in the view-source URL windows) and that work before the target page has loaded, etc.

ealhad · on June 14, 2018

Right. And maybe it should be done before supporting VR.

severine · on June 13, 2018

Happy user of Cookie Autodelete here, I agree it is a good replacement.

a_imho · on June 13, 2018

Browser vendors are very much complicit in this abuse, imo cleaning up cookies should be opt-out if they were serious about privacy.

gerbilly · on June 13, 2018

Firefox has firstparty isolation, can anyone comment on how much protection this offers against being tracked liked this?

dstjean · on June 13, 2018

Thank you! Great vulgarization...

I'll share that with my non-IT colleagues!

die_fault_user · on June 13, 2018

Where is this information stored on my computer? Is there a central location for the information that I can look at or software to read the cookies?

zeta0134 · on June 13, 2018

The information is stored within your web browser, so the instructions to view it will depend on what OS and browser combination you use. In Google Chrome for example, you can view cookies in the Developer Tools (F12, or Menu -> More Tools -> Developer Tools), under the Applications tab. This will show you the cookies visible to the website in your current browser tab. Firefox's developer tools have similar capabilities; I don't know the instructions for other browsers offhand though.

Cookies are sent to the website by your browser automatically when you visit pages. This is usually limited to the cookies belonging to the domain that set them, but the rules allow some flexibility for cross origin sharing. When you hear about tracking cookies, these are most commonly set by an embedded iframe; these can use a different domain from the page that embeds them, and in the case of ad networks this domain is often shared among many sites. These cookies present the largest potential danger to privacy, as they allow a third-party domain to track some browsing behaviors on the host sites in a way that isn't obvious to the user, and this can be used to build up a profile about the sites that user visits most frequently.

If you clear your history in your browser, the website will see no cookies from your browser on the next request. Most sites will simply set a new set of cookies immediately, treating you as a new visitor. You can instruct most browsers to automatically clear your cookies when you exit. Browsers which use a "private browsing" mode also typically use a separate cookie store, so they won't send any cookies from your regular session. From a tracker's point of view, this creates sort of a second user, and in theory should separate that activity from your main accounts. (In practice this can be easily circumvented with browser fingerprinting if a tracker is particularly determined.)

Not all cookies are bad, mind. They're one of the earliest widely adopted implementations of "local storage" for websites, and for a time they were the only reliable way a site could remember a visitor between requests. The most visible effect of clearing your cookies is usually logging you out of everything, since most sites still store your session this way.

bogomipz · on June 13, 2018

>"Not all cookies are bad, mind. They're one of the earliest widely adopted implementations of "local storage" for websites, and for a time they were the only reliable way a site could remember a visitor between requests."

Could you elaborate on what you mean by "for a time they were the only reliable way a site could remember a visitor between requests"?

Isn't this still the dominant/primary way websites add state to a stateless protocol? What other way is there for managing se? Is there something that has supplanted cookies for "remembering" or managing sessions?

antsar · on June 13, 2018

One approach that doesn't rely on cookies is HTTP Basic Authentication.

The first request to a protected page will produce an authentication prompt[0]. Subsequent requests to the same site will automatically send the same set of credentials (in every browser I'm familiar with. This part of the spec seems to be optional [1]).

Using HTTP Basic Authentication, the server can track the user across different pages. All other state can be maintained on the server side, keyed to the user.

[0] https://i.stack.imgur.com/QnUZW.png

[1] https://tools.ietf.org/html/rfc7617#section-2.2

bogomipz · on June 13, 2018

Sure the base64 encoded authorization credentials in the header are the unique identifier in this case. I guess I don't view this so much as an alternative to cookies for general internet browsing much as I do adding a thin layer of security for resources on things like corporate LANs.

eli · on June 13, 2018

Why is this better than a session cookie? Basic auth is a pretty wonky user experience. Hard to customize the prompt and "logout" is awkward.

antsar · on June 13, 2018

I didn't say it's better :) Just an alternative.

One way to handle logout (without closing the browser) is to have a logout link with a destination of "https://bad_username:bad_password@example.com". I believe this causes the browser to forget the original (valid) credentials and attempt authentication with the invalid credentials. This will fail, and produce a new login prompt. Then you have to close the prompt, and close the subsequent "401" page.

So yeah, it is awkward.

eli · on June 13, 2018

Fair enough. I think modern browsers warn on links with username/password. The UX for basic auth is so bad it is not really a usable feature

throwawayjava · on June 13, 2018

> What other way is there for managing se?

In the 90's and early 00's I used to see the session token in the URL of every request.

For example, instead of:

        <a href="https://news.ycombinator.com/threads?id=throwawayjava">Comments</a>

you write:

        <a href="https://news.ycombinator.com/<token>/threads?id=throwawayjava">Comments</a>

or more commonly:

        <a href="https://news.ycombinator.com/threads?id=throwawayjava&token=<token>">Comments</a>

And when making a JSON request, instead of:

        post_with_session_cookie("/auth/api/<et cetera>", ...)

you write:

        $.post("/auth/<token>/api/<et cetera>", ...)

This has other major problems; the most obvious is that it's extremely easy to accidentally session hijack ("oh here's the link to the completed order form: www.yoursite.com/orderForm?token=<my token>"). Also, the attack surface for session-hijacking XSS is a lot larger. There are other security problems.

You can mitigate some of these problems by changing the token on every request, but now your security problem is only a (massive) usability problem.

None of this is the default for any major web framework, which is probably why this style of authentication completely disappeared in the mid 2000's when people stopped rolling their own backends from stratch.

bogomipz · on June 13, 2018

Sure. I guess I didn't do a very good job articulating my question. What I really meant to ask is what other alternative exist on the "modern web" to manage session state without cookies. Cheers.

edjroot · on June 13, 2018

Adding to the other replies, there's also IndexedDB, which can store considerably larger amounts of information cross-session.

bogomipz · on June 13, 2018

Is there a difference between using the LocalStorage API and IndexDB? Are these similar? The same?

eli · on June 13, 2018

The LocalStorage API is a more efficient way to store data client side with a a cleaner API

bogomipz · on June 13, 2018

But this is where cookies in modern browsers are stored - in the LocalStorage API no?

Aren't they complimentary instead of mutual exclusive?

die_fault_user · on June 13, 2018

Thanks!

airstrike · on June 13, 2018

You can visit chrome://settings/siteData?search=cookies if you're using Chrome

limonkufu · on June 13, 2018

I really don't understand why this is a bad practice. I know it is horrifying to give your web history to total stranger for god knows what purposes they will use. But going extra mile to implement privacy so that no site/some sites could talk behind your back (looking at you firefox multi account containers) seems like an equally horrific act that cripples websites not ad providers.

When I used these kind of precautions I saw that analytics got no access and I believe most of the site-owners need these information to operate/develop their sites and it seems like a lot of work to implement those in-site tracking features yourself. Or I started to see random ads all over the place like early 2000s, I do enjoy targeted ads because when I am looking for something those ads could help a lot, only if there is a way to stop them after I made a purchase though.

So, if anyone could simply explain why this is SO bad or send me to correct discussion (I do believe these matters discussed previously a lot).

zAy0LfpBZLC8mAC · on June 13, 2018

Because there is no consent.

If I were to start following you whenever you are going anywhere, sit next to you whereever I can, and write down as much about your life as I can, without asking you for permission first, would you also agree that that would be acceptable if I claimed that I need that information to operate or develop my business?

Also, obviously, noone "needs" that information, that's just bullshit. It may sometimes be helpful, but that doesn't mean you need it--just as any other business might be able to learn something from surveilling my non-online life, but that doesn't make it a need for them to spy on me, especially without my consent.

Whether you like targeted advertising is completely irrelevant, as noone is telling you that you may not agree to being spied on. That's like saying that there is nothing wrong with forcing everyone to walk around naked because some people enjoy appearing in porn.

throwawayjava · on June 13, 2018

> it seems like a lot of work to implement those in-site tracking features yourself

Aren't there libraries/frameworks/products for exactly this? E.g., when I google "website tracking framework" amplitude.com is top ad result, and it seems to cover the business uses. And http://google.github.io/tracing-framework/ is the first non-ad result, which seems to cover the legitimate technical uses.

> I believe most of the site-owners need these information to operate/develop their sites

Can you give an example of a piece of user-relevant functionality that cannot be implemented without Google Analytics?

IME especially Google Analytics is mostly useful for business reasons, not technical reasons.

It's certainly fair to say that it's difficult to operate a profitable web business without Google Analytics. But that's a very different claim. And the difference is important because...

> So, if anyone could simply explain why this is SO bad or send me to correct discussion

Legitimate customer-business relationships should always involve informed consent. Cookie blockers and Firefox containers provide the technical tools that enable me to make an informed decision about whether to use your site. Without those technical mechanisms, it's very difficult for me to constantly monitor whether you are tracking me.

You/Google are free to deny me access to your products/content if I choose not to be tracked. But I should be allowed to make an informed decision about whether to use your site. The tools you're complaining about enable that informed decision.

limonkufu · on June 14, 2018

As you've said in today's world it is difficult to operate a 'free' web-service without ads and ad-tracking, and I always thought since I am using their service I could give my data for ads since their business depends on it. But after thorough research about tracking I agree with you that these decisions must be informed and if this is not acceptable, either I should be denied service or the business model should be changed (after enough users like you and now I chose not to be tracked, I believe it will change)

JackCh · on June 13, 2018

> "I do enjoy targeted ads because when I am looking for something those ads could help a lot"

You said could instead of do. Have they ever actually? Do you really click on ads? I don't think I've ever encountered somebody who admits to willingly clicking on ads. The only ad clickers I've seen are people who do it by accident or people who don't realize they're clicking on an ad (usually older folk with poor computer skills.)

limonkufu · on June 14, 2018

I occasionally click the ads when I am shopping online, by 'could' I meant just seeing (not clicking) another options/alternatives of the same type of product helps.

m90 · on June 13, 2018

Clicked on multiple ads for a multitude of reasons in the course of the last 38 years. I'd be surprised if I'm the only one in here.

mikestew · on June 13, 2018

What were you clicking on in 1980?

mkirklions · on June 13, 2018

So a cookie only knows the website that referred me?

So if I copy paste the website in the address bar, they dont learn anything about my last browsing habit?

wierd0 · on June 13, 2018

> So a cookie only knows the website that referred me?

Not really. Instead, each time you return to a site that has set a cookie on your computer, that cookie is included in the request header.

That same site will also know about your last visited page, even if it's outside of their domain, because of the "referer" frpm the request header.

> So if I copy paste the website in the address bar, they dont learn anything about my last browsing habit?

If you do that, then the referer will be empty and whatever site you visit will not know what you did last.

Cookies are just one thing. Web beacons i.e. tracking pixels, and the fact that companies utilizing those to suck up data about web users sell it feely to others for the sake of targeted marketing, is the reason you see peronalized ads all over the internet whenever you've finalized an online purchase.

sandworm101 · on June 13, 2018

This revelation should be front page on every newspaper. That IT companies have been hiding these things inside our computers is a violation of our privacy, even our property rights. How muck electricity has been used by these things, electricity I pay for. Either Google needs to reimburse me for hosting their "cookies" or we need to ban cookies altogether.

https://torproject.org/

bcoates · on June 13, 2018

How do you think HN logins work? Cookies are the basis of session management. If you don't want to store cookies for Google, don't. It's a feature right there in your browser.

There are lots of shady tracking systems in the world and cookies aren't one of them: they are clear, user-visible, and in the user's direct control both in theory and in practice.

Tor isn't relevant to this. If you're using Tor to block cookies you're Doing It Wrong.

gaius · on June 13, 2018

Cookies are the basis of session management.

They are one technique. In, oh, 1996, we did this by simply generating a unique URL for each user. If you wanted to stay logged in you bookmarked it, and if you didn’t you... didn’t. It was right there to see in the address bar as well, no sly hiding it in HTTP headers.

dahart · on June 13, 2018

FWIW, cookies started being used for session management in 94. The privacy debate about them was going strong by 96.

> In, oh, 1996, we did this by simply generating a unique URL for each user.

That's certainly one way to do it, but you're not saying it's convenient or great for privacy, right? If the URL is the auth token, then there's no security. Typing URLs, sharing URLS, and bookmarking (logged in, logged out, shared links, server side rendering), all get problematic.

sandworm101 · on June 13, 2018

bcoates · on June 13, 2018

So are you always simulating someone who knows nothing about anything but still thinks something should be done about it, or just in this post?

giggles_giggles · on June 13, 2018

Since you're proposing banning cookies altogether and I've written a few authentication pages in my time and cookies seem to me to be rather important for managing sessions for users so that they can log in successfully to a web page, can you propose what we should use instead of cookies for boring old session handling?