Hacker News
A Potential Privacy Model for the Web (github.com)
52 points by dedalus 50 days ago | 37 comments



> The identity "Me while I'm visiting nytimes.com" is distinct from the identity "Me while visiting cnn.com".

Trying to solve this through purely technical means is futile. If you block it at the user-agent, sites will share data at the back-end to create a super-profile.

Right now it's really convenient for advertisers to run an ad auction right in the user's web browser because all the context is there -- take that away and you'll see user data aggregated on the back-end instead.

Absent some type of regulation and enforcement, I really don't see how this puts a dent in the "reads a lot of articles on NY times about dogs, sees a lot of ads on cnn.com for dog food" profile aggregation.


> If you block it at the user-agent, sites will share data at the back-end to create a super-profile

This needs a bit more technical detail. If you mean they'll combine IP + other fingerprinting, we can work on mitigation techniques there too.

> I really don't see how this puts a dent [...]

It does, because it asks sites to explicitly install something server-side with their HTTP server instead of embedding a one-line script tag. Moving the store of cross-site identifiers from the browser to the backend has a chance to shine more light on the practice and increase the burden of tracking. It can make a real dent.

Regulation/enforcement are orthogonal to technical solutions. There are also varying levels of support for the former vs the latter and we shouldn't mix them nor should we blindly say "regulation and enforcement" without nuance. Many, including myself, are against most regulation/enforcement approaches due to implementation incompetence (intentions notwithstanding). But regardless of that debate, it shouldn't muddy the technical debate.


> This needs a bit more technical detail. If you mean they'll combine IP + other fingerprinting, we can work on mitigation techniques there too.

Yeah, but instead of playing cat and mouse, just make it illegal and fine anyone caught violating it.

Honestly, banning tracking would end the race to the bottom and be good for publishers and consumers. It probably won't affect FB & Google because they are too big to be displaced. It may kill a bunch of middlemen, but they are leeches and should die anyway.


> just make it illegal and fine anyone caught violating it.

I agree that this should happen. Unfortunately, this requires a revolutionary political movement which has so far failed to materialise. When there isn't adequate appetite from the rest of society for the protections you want, your solution has to be a technical one.

Maybe another way to kill the middlemen is to develop technology which undermines their business model, e.g. DTube; YaCy.


Fortunately, they won't do it on the back-end. The ad industry has a massive fraud problem, and the lack of trust prevents them from accepting traffic data they haven't seen themselves.

If you really force the industry to switch to "trust me, I've seen these users, now pay me" APIs on the back-end, it'll be a massive shake-up of the entire business model.


> take that away and you'll see user data aggregated on the back-end instead

OK, but at least then it's not polluting the user's experience and burning the user's CPU cycles. Still a strictly positive change IMO.


The negative side is that you can no longer see what sites are doing it, what they're doing, or block it in your browser


Do people 'see' what the sites are doing, and which? Does it matter if you just prevent it from happening?

> The negative side is that you can no longer [...] block it in your browser

If they're not doing it in your browser then you don't need to block it in your browser, because they're not doing it, because they're doing it in their back-end (which is not your browser) instead of in your browser (which is).

Honestly, what are you trying to say?


> Do people 'see' what the sites are doing, and which? Does it matter if you just prevent it from happening?

Sorry if I've not been clear enough here. Let me explain my thought process.

uBlock Origin shows me this https://i.imgur.com/Vv2xyIL.png when I visit cnn.com.

In contrast, when I go to news.ycombinator.com, nothing is blocked. It gives me some idea of what companies respect my privacy and what companies are happy to sell my internet browsing history out to advertising networks and data brokers.

Yes, I'm blocking it as much as possible regardless, but I think it's still valuable to be aware of which sites are good actors and what sites are not. The little number on the uBO toolbar icon is a rough reminder of this.

> If they're not doing it in your browser then you don't need to block it in your browser, because they're not doing it, because they're doing it in their back-end (which is not your browser) instead of in your browser (which is).

The problem is not that the tracking is in my browser. The problem is the tracking.

If the tracking all happens server-side I have no idea what sites are tracking me and I can't do anything to prevent it. I can't even avoid it because I can't see what sites do it.

This is - from the perspective of not wanting to be tracked everywhere I go on the internet - worse than having JavaScript trackers on each page which my browser can choose not to run.


That was helpful.

The original comment that you seemed to be replying to was "Right now it's really convenient for advertisers to run an ad auction right in the user's web browser because all the context is there". I thought you were saying blocking that crap in your browser didn't make a difference.

I can't see your pic because I never allow JS outside of a VM.

If tracking is enabled in a browser it becomes vastly easier for them to assign unique cookies to follow you. OK, now they can do it with ETags and browser fingerprinting - mitigating the latter is possible, I don't know about the former.

But this...

> If the tracking all happens server-side I have no idea what sites are tracking me and I can't do anything to prevent it.

...is dubious. Etags and fingerprints aside, tracking non-cooperating (cookie declining) browsers has to be harder. I agree with you about tracking being the problem though.
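
For anyone unfamiliar with the ETag trick mentioned above, here is a rough simulation of how a cache validator can double as an identifier even with cookies declined. It is only a sketch: `ETagTrackingServer` and `CachingBrowser` are hypothetical names, and no real HTTP traffic is involved.

```python
import secrets


class ETagTrackingServer:
    """Hypothetical server that hands each new visitor a unique ETag."""

    def __init__(self):
        self.visits = {}  # etag -> number of requests seen with it

    def handle(self, request_headers):
        etag = request_headers.get("If-None-Match")
        if etag in self.visits:
            # Returning visitor: the cached validator identifies them.
            self.visits[etag] += 1
            return 304, {"ETag": etag}
        # New visitor: mint a unique validator that acts as an ID.
        etag = '"' + secrets.token_hex(8) + '"'
        self.visits[etag] = 1
        return 200, {"ETag": etag}


class CachingBrowser:
    """Hypothetical browser that revalidates its cache, cookies disabled."""

    def __init__(self):
        self.cached_etag = None

    def fetch(self, server):
        headers = {}
        if self.cached_etag is not None:
            headers["If-None-Match"] = self.cached_etag
        status, response_headers = server.handle(headers)
        self.cached_etag = response_headers["ETag"]
        return status


server = ETagTrackingServer()
browser = CachingBrowser()
first = browser.fetch(server)   # fresh cache, server mints an ETag
second = browser.fetch(server)  # browser echoes the ETag back
```

Clearing the cache (or partitioning it per site) breaks the link, which is why this is harder to sustain than a cookie.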


Harder on shared internet connections, for sure. But my apartment's internet connection is for the most part my own traffic, or guests who bring their phone over. Any traffic coming from that can be trivially tied to me.

I can use a VPN to hide my IP on most of my devices, except for when I'm trying to watch Netflix/Amazon/whatever. But I wish I didn't have to.


There's a big difference between your ISP knowing stuff and the river of scum that is advertising.

> But I wish I didn't have to.

One way or another you will always have to. Perhaps the most important way of destroying the ad industry online is to have an alternative means of funding sites. Maybe that would work.


Of course my ISP knows my identity, but my point is it's also probably not hard for an advertising company to get one piece of data that links my real identity to my cable modem's IP address, and then any tracking data they've previously accumulated can potentially be tied to that after the fact. They just need to tell when the IP was assigned to me, which is probably easy to infer from a sudden change in what websites an IP is visiting.
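
To make the "sudden change in what websites an IP is visiting" idea concrete, here is a minimal sketch that compares the set of sites seen from one IP in consecutive time windows using Jaccard similarity. The window contents, the `threshold` value, and the function names are all made up for illustration; a real tracker would have far noisier data.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_reassignments(windows, threshold=0.2):
    """Indexes of windows whose site set differs sharply from the previous one."""
    return [
        i
        for i in range(1, len(windows))
        if jaccard(windows[i - 1], windows[i]) < threshold
    ]


# Hypothetical per-week sets of sites observed from a single IP address.
weekly_sites = [
    {"news.example", "mail.example", "dogs.example"},
    {"news.example", "mail.example", "video.example"},
    {"games.example", "forum.example", "shop.example"},
    {"games.example", "forum.example", "maps.example"},
]

reassigned_at = likely_reassignments(weekly_sites)  # [2]
```

Here the overlap drops to zero between the second and third week, so index 2 is flagged as the point where the IP probably changed hands.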

On a related note, remember that time when AT&T and Verizon were just giving out the cell numbers associated with their customers' IP addresses to whoever asked, because they're complete fucking morons who thought that was a good idea?

https://medium.com/@philipn/want-to-see-something-crazy-open...


You still don't know what sites are doing right now, even with all of your ad-block extensions.


One thing that might counteract this is that right now, violating users' privacy is normalized. Regulation exists (GDPR), but enforcement is missing. Everyone is doing it, most sites have cookie notices/consent forms that blatantly violate GDPR, and because everyone is doing it, nobody is willing or motivated to change.

Break the platform and force everyone to re-engineer, and you can start severely hitting the companies that continue violating GDPR in the "new world".


A similar idea has occurred to me. I imagine a browser plugin that allows third-party cookies, but associates them per-domain visited. That is, the cookie that google analytics gets would be different when I'm visiting siteA.com vs siteB.com.

I don't share the author's optimism that dialogue will result in "a new identity end state that works for everyone." I believe on-line privacy has to be protected through non-negotiable mechanisms, against the interests that stand to profit from taking it away.


> That is, the cookie that google analytics gets would be different when I'm visiting siteA.com vs siteB.com.

Isn't this part of what Firefox containers do?


I think of Firefox containers (and Chrome "people") as providing a little isolation, but not enough. If I browse HN in a container, I'm likely to follow links to a lot of different domains. I'd have to diligently select the right container each time.

I'm suggesting instead the browser never mixes a cookie I was assigned while browsing say nytimes.com with a cookie assigned while browsing washingtonpost.com. Even if I regularly browse these domains in my "news" container.
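
A rough sketch of what that could look like: a cookie jar keyed by the pair (top-level site, cookie host) instead of by cookie host alone. `PartitionedCookieJar`, `tracker_response`, and `tracker.example` are hypothetical names for illustration, not any browser's actual implementation.

```python
import secrets


class PartitionedCookieJar:
    """Cookie store keyed by (top-level site, cookie host), not host alone."""

    def __init__(self):
        self._jars = {}  # (top_level_site, cookie_host) -> {name: value}

    def set_cookie(self, top_level_site, cookie_host, name, value):
        key = (top_level_site, cookie_host)
        self._jars.setdefault(key, {})[name] = value

    def cookies_for(self, top_level_site, cookie_host):
        # Only cookies set under the same top-level site are visible.
        return dict(self._jars.get((top_level_site, cookie_host), {}))


def tracker_response(jar, top_level_site, tracker_host):
    """Simulate an embedded tracker that sets an ID cookie if it sees none."""
    cookies = jar.cookies_for(top_level_site, tracker_host)
    if "uid" not in cookies:
        jar.set_cookie(top_level_site, tracker_host, "uid", secrets.token_hex(8))
    return jar.cookies_for(top_level_site, tracker_host)["uid"]


jar = PartitionedCookieJar()
id_on_nyt = tracker_response(jar, "nytimes.com", "tracker.example")
id_on_wapo = tracker_response(jar, "washingtonpost.com", "tracker.example")
id_on_nyt_again = tracker_response(jar, "nytimes.com", "tracker.example")
```

The tracker still gets a stable ID on repeat visits to nytimes.com, but the ID it sees on washingtonpost.com is unrelated, so the two visits can't be joined through the cookie alone.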


A few problems the Web has for privacy:

- IPs don't usually rotate often enough.

- Browsers can share user data however they want.

- Users cannot by default choose what a website is allowed to run or download. There are ad blockers and such, but a normal user doesn't know what needs to be allowed so that the site works without leaking their data. And if developers choose to pass analytics data with actual content requests, there is no way of preventing that while keeping the site functional.

- User agents and other metadata (resolution, browser features, cookies, latencies to servers, etc.) are shared without user consent.

Browser vendors could make a lot of information available only after explicit consent, but that would break a lot of websites. And it's hard to say when and to what you should consent. This is the same problem as on Android and iOS.

There are also valid reasons to share data between services and domains: SSO, social media, etc. How to make that easy?


Some of that seems OK, but a lot of it still seems unacceptable to me. Particularly, I disagree with these assertions:

> It is reasonable for the browser to relax its identity-sharing controls within that expanded notion, provided that the resulting identity scope is not too large and can be understood by the user.

> It may be OK for a site to learn the fact that a user has earned trust on another site

But, as always, my attitude about this sort of thing is that everything hinges on informed consent. If I have not given my explicit informed consent, then there is no sharing of data about me that is acceptable.


I seriously can't see any dialog working while one of the sides has a strong vested economic interest in keeping the status quo.

Even with all JS and cookies disabled, servers can still collect your IP and infer if it is indeed you visiting by analysing your usual visit times -- and likely a lot more other metrics.
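
As a toy illustration of the visit-time idea, the sketch below builds an hour-of-day histogram per visitor and compares histograms with cosine similarity. The visit hours are invented, and a real system would combine many more signals, so treat this as a sketch of the principle only.

```python
import math


def hour_histogram(visit_hours):
    """Count visits per hour of day (0-23)."""
    hist = [0] * 24
    for hour in visit_hours:
        hist[hour % 24] += 1
    return hist


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical visit hours: a known profile and two unlabeled sessions.
known = hour_histogram([7, 7, 8, 12, 22, 22, 23])
candidate_same = hour_histogram([7, 8, 8, 12, 22, 23])
candidate_other = hour_histogram([2, 3, 3, 14, 15, 16])

similar = cosine(known, candidate_same)     # high: same daily rhythm
different = cosine(known, candidate_other)  # zero: no overlapping hours
```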

Unless the browsers use Tor-like visitor source obfuscation, I don't see anything changing in favour of privacy.


With JS and cookies disabled and using a VPN you can get a decent browsing experience without Tor's slowness and blocks. At least your browsing is mixed with that of thousands of other people.


Sure. But lately I keep hearing that these users are implicitly under attack, constantly being required to fill out CAPTCHAs.


Which is the advantage of implementing it by default in common browsers. Sites can afford to neglect the experience of the 1% of users who are using Tor but not the 80% of users who have installed the latest browsers.


I haven't experienced any CAPTCHAs so far. I did when using Tor.


That's exactly what I was saying: that many websites' algorithms punish you for choosing privacy (Tor) by making you fill captchas all the time.


Oh, now I get it. Great idea. Implementing Namecoin support would also be great.


> Third Parties can be allowed access to a first-party identity

The problem with this is that third parties can also be first parties and have their own data. The obvious examples are Facebook and Google today: when you use any service where they act as a third party, they may mix your identity with their first-party ID.


The first bullet point below that title is "First parties have a way to delegate access to a user identity to specific 3p's, as long as that delegated identity remains sharded by 1p." The model is proposing that if you allow Google to have access to user identities on your site, and I allow Google to have access to user identities on my site, the browser should prevent Google from joining those identities and detecting that the same user visited both of our sites.
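
One way to picture "delegated identity sharded by 1p" is a browser that derives a different identifier for each (first party, third party) pair from a secret that never leaves the browser. The sketch below uses an HMAC for that. It is an assumption about how such sharding could work, not the mechanism from the proposal; `BROWSER_SECRET` and `sharded_id` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-profile secret that the browser never exposes to sites.
BROWSER_SECRET = secrets.token_bytes(32)


def sharded_id(first_party, third_party, secret=BROWSER_SECRET):
    """Derive an identifier that is stable per (1p, 3p) pair only."""
    message = f"{first_party}|{third_party}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]


# The same third party embedded on two different first-party sites.
id_on_site_a = sharded_id("site-a.example", "thirdparty.example")
id_on_site_b = sharded_id("site-b.example", "thirdparty.example")
id_on_site_a_again = sharded_id("site-a.example", "thirdparty.example")
```

The third party sees a stable ID on each site, but without the browser's secret it can't tell that the two IDs belong to the same user.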

(Disclosure: I work for Google, like the author of the article. Speaking only for myself.)


On top of all the things people mention here there is a huge vulnerability in Chrome that is undermining the whole thing. They mine data at the browser level and sell it to advertisers. This is in addition to what tons of advertisers do on individual web pages. So irrespective of what we do as long as Google and other browser manufacturers mine user data there is not much progress.

I should add that Safari and Firefox (?) seem to be the only exceptions.


> They mine data at the browser level and sell it to advertisers

Do you have any actual proof of this?


Read Google's own proposal for privacy on the web. The first thing they talk about is differential privacy when collecting user data in chrome.
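
For context on the term, here is a minimal sketch of randomized response, one classic local differential privacy mechanism. It illustrates the general idea only and makes no claim about what Chrome actually does; `p_truth` and the sample data are assumptions.

```python
import random


def randomized_response(true_bit, p_truth=0.75, rng=random):
    """Report the true bit with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return true_bit
    return rng.random() < 0.5


def estimate_rate(reports, p_truth=0.75):
    """Recover the population rate from noisy reports.

    observed = p_truth * rate + (1 - p_truth) * 0.5, solved for rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


rng = random.Random(0)  # fixed seed so the run is repeatable
true_bits = [i < 3000 for i in range(10000)]  # true rate is 0.3
reports = [randomized_response(bit, rng=rng) for bit in true_bits]
estimate = estimate_rate(reports)  # close to 0.3 in aggregate
```

No single report reveals a user's true answer with certainty, yet the collector can still estimate the aggregate rate.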


Can you point to the specific part? I really don't see what you're talking about.


I am on my mobile and the chromium blog doesn’t quite work. So go here and click on the link in the page that says “we have announced a plan”.

https://www.blog.google/products/chrome/building-a-more-priv...


Any progress > No progress?


This all seems like part of the tracking arms race.

The way to end it is to stop tracking in the first place.


We designed www.abine.com's Blur to enable users to implement some of these "compartmentalization" techniques in practice. Web traffic, fingerprinting, and tracking are one layer, but stateful registration, login, and payment are a whole other layer. tl;dr: it is a tough problem to deliver a simple experience on.



