I'd much rather see a browser with some kind of built-in distributed cache, something along the lines of FreeNet, but trading perfect anonymity for performance. Given a large chunk of disk space, and a handful of browsers talking to each other in a local area (e.g. same ISP), it should be viable to concoct a scheme where after a handful of browsers request a particular page, the remaining browsers are confident enough that the data cached in their local group is representative of the data sourced from the origin network.
There are a million issues to iron out with a scheme like that, e.g. bad actors injecting crap into the cache, handling staleness, interactions with dynamic content and API endpoints etc., but I think something like this would have a much greater privacy benefit by denying at least some traffic to the origin networks, or simply by keeping some of that traffic within the boundary of the local ISP's network (and if the local ISP is evil, requests between the nodes could be encrypted as in FreeNet).
(Also, the mere fact such a scheme would have to tinker with the mechanics of a fundamental browser security mechanism should be enough to indicate how difficult it would be to implement safely!)
Consider that SSL is largely used for connection encryption, though it has the additional side-effect of site authentication.
If you can still validate the site, and rely on a hash of the content to detect changes, then you're starting to get toward a cacheable, secure, authenticated system.
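To make that concrete, here's a rough Python sketch of the kind of peer-quorum check I have in mind — everything here (the function names, the SHA-256 choice, the quorum of 3) is my own illustration, not an existing protocol:

```python
import hashlib

def content_digest(body: bytes) -> str:
    # Hash a page body so peers can compare cached copies
    # without shipping the whole document around.
    return hashlib.sha256(body).hexdigest()

def cache_entry_is_trusted(peer_digests: list[str], quorum: int = 3) -> bool:
    # Trust the cached copy once enough independent peers
    # report the same digest for the same URL.
    if not peer_digests:
        return False
    most_common = max(set(peer_digests), key=peer_digests.count)
    return peer_digests.count(most_common) >= quorum
```

A bad actor injecting crap into the cache then has to control a quorum of your local peers, not just one node.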
I need to think more about this.
That said, the web site offers great docs with lots of config samples to start from. If you're starting from scratch, take a look at CARP configuration. Squid also speaks ICP and HTCP.
Squid is one of the rarely-sung heroes of web content delivery.
Still, it's a neat idea.
Unfortunately a lot of abuse is performed via PIA, so some sites (especially financial sites and forums) block PIA IPs.
Combine that with a script that takes all non-sensitive browser history and cookies and does the same -- swap out with a random stranger.
Neither is practical and the former would require (at least for my provider) that I go through a sometimes cumbersome process of registering the new device's MAC address.
As others have pointed out, it's not about you getting the data, it's about the Analytics services you're using getting the data.
I'll also point out that if all you want is to know how much traffic you're getting, you can do it with far more ease by simply looking at your server logs. Why server-side analytics aren't used more, I have no idea.
I'm really not opinionated (I think the plugin is interesting and am completely ok with being tracked) I just found it a bit hypocritical to create a product against tracking while simultaneously providing tracking data to Google.
Thanks for raising this point.
Also read: https://www.google.com/policies/privacy/partners/
This is all or nothing. You cannot separate the "good" use of data from the "bad" use. It's all the same data; only the usage is different.
It estimates since I installed it that I've cost advertisers $460 in clicks.
I wasn't convinced it was very useful and thought it would only take some easy heuristics to filter out when I first saw it, but if Google removed it from the store, it must have been a pain to deal with.
Maybe I'll even end up using it.
Generating click fraud on your favorite web comic's website or other small websites you may be visiting is pretty sad. Adsense bans for life.
(I'm not sure what the tipping anecdote is supposed to mean, but I don't understand why US customers put up with tipping either; the business is responsible for its operating budget, and there is no legitimate reason to bother the customer with it.)
These ideas are useless from a technical level (for all the reasons that have been mentioned already.)
Where they are useful is at a social level. People are energized and ready to fight. Many of them didn't know about this issue. Many of them didn't know that there are things they can do as individuals to fight back. Your tool (and mine) are getting attention because they open eyes and tap into pain.
As useless as noise might be, people understand the idea and that makes it accessible. That means people will try it, get it, and share it.
We need to leverage that attention in order to teach those people things they need to understand about privacy. Our tools should be seen as a gateway into impactful approaches like Tor, VPN, HTTPS Everywhere, Privacy Badger, and the EFF at large.
Tooting my own horn: that's what I've been doing with https://slifty.github.io/internet_noise/index.html
In all interviews I make sure to explain that while this is an amusing form of protest, it is not effective, and people who care need to go take the steps outlined on the project page.
A website can do this. A Chrome plugin, however, risks being harmful with minimal benefit. It minimizes the potential for communication with your audience, and it is also harder to access, which means you are reaching a narrower audience.
Here's the good news! The project I linked to is open source -- https://github.com/slifty/internet_noise/ -- you could contribute to it directly and then update your plugin so that instead of generating noise and hijacking their browser information you just direct them to the website version of the concept.
It doesn't click on anything, because it would be awkward if this by chance started sharing things on logged-in Facebook profiles etc. I plan on adding sequential requests over the weekend so the traffic is more realistic.
It's open source and on GitHub, so you can download, install and modify it from there if you wish.
Edit: Yes, I know this is supposed to mask your actual viewing habits. But security through obscurity has never really panned out for anyone in the end.
It's also not really "security by obscurity", which I believe refers to situations where the techniques used to secure a system are kept quiet in the hopes no one will figure them out. Here we have a system that is already breached, and the point is to defeat analysis. Most analytic techniques I can think of would be defeated by the right kind of "noise", despite the fact that there may be awareness that the noise is in there.
By the way, I once got blocked by Google after installing a plugin that did automatic random searches in the background.
While I applaud the authors for trying to solve a problem, I will not install this plugin unless its open source. There is no way I'm going to grant the above permission to some random plugin just because they tell a compelling story.
You can do it with curl or similar to download it, but there's a chrome extension that does the work for you: https://chrome.google.com/webstore/detail/chrome-extension-s...
.. Or even a combination of the above.
Has anyone else ever thought about doing this?
A long time ago I tried to delete my Facebook and realized it only deactivates until next login. You have to specifically request that they permanently delete it or something. And it's not even clear whether FB actually stops storing your info after that whole process (a little sketchy, to be fair...).
So I came up with this idea that I'd make a service called socialfacewash.com that just completely trashes your digital profile (liking random things, changing your info, and just basically obfuscating what FB thinks they know about you).
I never built it though, but I kind of wish I did. Still own domain if someone wants it.
Trend seems to be that we're not going to have protections over our own digital privacy/data for a very long time. Maybe a service that could at least mask, lie, trash, obscure our footprint for everyone else would be nice to have.
They even got a cease and desist letter from FB.
People are creatures of habit. Example: I read HN with my coffee every morning. Interjecting meaningless data doesn't prevent them from finding my patterns.
If this anonymized my tracking data for sites I visited, that would be useful, but sending a bunch of random hits doesn't seem like it will keep anyone from tracking my activity which is what I want to shield.
Returning fake responses to tracking cookies is more efficient. For phones, returning bogus info to app requests for contact lists and location is especially effective.
I understand that perhaps they _could_ ask the VPN provider to give logs (depending on the provider, if they keep logs or not). But that would not work for the algorithms used to target you as an individual.
Could someone please explain that to me? :-)
First is cookies. Say you log in to Facebook with your browser from your own IP address, and an hour later you fire up a VPN to browse the web. One of the websites you load has Facebook's tracking pixel and - unless you cleared your cookies - boom: Facebook knows you have visited that site.
Even if you clear your cookies, you may have some long-living stuff (like Flash cookies etc) that can be used to track your browser.
And even if you clear everything or use incognito, sites can use some clever heuristics (CPU power, enabled plugins, timezone, browser version, webrtc, etc.) to track your browsing and to match that it's you.
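To illustrate, here's a toy Python sketch of how those attributes might be folded into a single identifier — the attribute names are made up for the example, not any real tracker's schema:

```python
import hashlib
import json

def browser_fingerprint(attrs: dict) -> str:
    # Serialize the attributes deterministically, then hash them:
    # the same browser keeps producing the same identifier,
    # with no cookies involved at all.
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

fp = browser_fingerprint({
    "user_agent": "Mozilla/5.0 (example)",
    "timezone": "Europe/Helsinki",
    "plugins": ["pdf-viewer"],
    "hardware_concurrency": 8,  # rough CPU-power proxy browsers expose
})
```

None of this requires cookies, which is why clearing them (or going incognito) doesn't help.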
The last part of which you wrote is incredibly interesting. I had no idea they could find out the CPU power or the enabled plugins.
Though, to be fair, I think I'd fall in a pretty large pond to match. And either way, that information does not seem very useful (to me) for targeted advertisements.
Thank you for your answer :-)
You have to fake your browsing behavior with pretty high accuracy, but if you do that, you defeat the entire purpose: you've just created a copy of your browsing behavior. Maybe it could work if there is no prior knowledge of your browsing behavior and the fake traffic is human-like enough, but even in that case it seems not too unlikely that one could separate the two traffic sources. If you want to hide your browsing behavior, the fake traffic must be different from yours. But if there is a difference, the difference can potentially be exploited to separate the sources.
It's important to conceive of this as an arms race, and I think one of the underappreciated things about arms races is that if you successfully anticipate the next three things your opponent will do and counter them, you can end up discouraging them from even fighting in the arms race. I don't know how random the page visits are, but if they are random in, well, almost any sense, eventually they will be something the trackers can filter out. "Well, this person shows a really strong signal for football, and weak signals for anime, robotics, accounting, and a hundred other things. Show them the football ads." Even today that wouldn't necessarily pose much of a challenge.
By contrast, if you get profiles from volunteers that are real, and then start mixing them up so that everybody downloading this extension shows ten or fifty equally strong signals of interest, of which only one is real, that'll scramble the data being collected something fierce, and require the data collectors to jump straight to very sophisticated teasing out of what's really true, which initially won't even be worth it until a lot more people are using this stuff. There are a lot of elaborations you can think of from there (temporal correlations, i.e., college basketball interest should be spiking now, vs. football), etc.
I have no idea what this extension is doing, because I can't seem to find any data about what it is doing, so maybe it's doing some of this stuff, but I expect they'd be talking about at least the data they'll need to collect to make this really work if it was going to work this way, so I assume without proof that it is not this.
In general, it's worth pointing out that a lot of systems can't be fooled by uniformly random data, because all real-world systems already have to be able to filter that out because all real-world systems experience noise, of which at least a significant component is probably more-or-less uniformly random. If you really want to scramble a system you need to be more clever.
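As a toy illustration of why uniform noise washes out (my own sketch, with a made-up 5% noise floor):

```python
from collections import Counter

def dominant_interests(page_visits: list[str], noise_floor: float = 0.05) -> set[str]:
    # Uniform random visits spread thinly across many categories,
    # so anything above the floor is probably a real interest.
    counts = Counter(page_visits)
    total = sum(counts.values())
    return {cat for cat, n in counts.items() if n / total > noise_floor}

# 50 real football visits buried in 100 uniformly random ones:
visits = ["football"] * 50 + [f"category_{i}" for i in range(100)]
```

Here `dominant_interests(visits)` still comes back as just `{"football"}`: the hundred random categories each sit at under 1% of the traffic and drop straight through the floor.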
But, wouldn't collecting viewing habits and then using AI to define (and emulate) real-looking behavior immediately put the developer(s) in that moral grey area that so many algorithms occupy? Technically it could be done, and it would be fascinating to work on, but we'd have to start with a huge browsing dataset (creepy) and then process it to figure out the patterns (exactly what this tool is trying to subvert), and then feed that back as output from within the user's browser (probably feeding back indistinguishable-but-AI-driven data and creating a loop). It's a murky space to wade into, and one that needs a lot more conversation.
Instead I decided to just keep it simple. The first page is chosen randomly from a list of (user-approved) sites. A link on that page is chosen randomly from the list of links that open in the same window and point to the same domain, and that's clicked. That's repeated a somewhat-random number of times, usually about 2-7 times, before a new site is chosen from the user-approved list, and the process starts over.
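In pseudocode-ish Python, the loop I described looks roughly like this (`browse_session` and `get_same_domain_links` are stand-in names; the actual link extraction is omitted):

```python
import random
from urllib.parse import urlparse

def browse_session(approved_sites, get_same_domain_links, rng=random):
    # One "session": start at a random user-approved site, then
    # follow 2-7 randomly chosen same-domain links before a new
    # session starts over from the approved list.
    current = rng.choice(approved_sites)
    visited = [current]
    for _ in range(rng.randint(2, 7)):
        links = [u for u in get_same_domain_links(current)
                 if urlparse(u).netloc == urlparse(current).netloc]
        if not links:
            break  # dead end; the next session picks a fresh site
        current = rng.choice(links)
        visited.append(current)
    return visited
```

Restricting to same-domain links that open in the same window is what keeps the walk from wandering onto sites the user never approved.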
Check out Cathy O'Neil's definition of "Weapons of Math Destruction" (good overview of her book here: http://money.cnn.com/2016/09/06/technology/weapons-of-math-d...) - I'd love to hear your thoughts on that framework for determining the morality of algorithms.
This would also increase your traffic ten or fifty fold to a first approximation. You would make a single household look like a family with several people sharing the Internet connection. Netflix, for example, can separate several people using the same account; I read about that during the Netflix Prize.
But overall I agree with your point: you probably can throw a wrench into the machinery, maybe even to the point that it is not worth pursuing tracking any longer. But it certainly is not easy, and randomly clicking on links will definitely not do the trick.
I suspect if you worked at it you could mathematically prove that will be a necessary condition for any effective data collection scrambling technique, under an assumption that we can't use proxying to other nodes. And I do mean "mathematically" fully literally.
I'm eliminating the possibility of proxy, where you try to set up a situation where you create a P2P network and trade page views around, because I think that only works with a really abstract view of how the tracking works. In practice, as soon as you want to use a site in an authenticated fashion you're getting tracked via that authenticated account, so I think I can argue the only real possibility for scrambling the data is for it to source from the same network location as the real data you are generating.
On that note, it occurs to me this plugin probably ought to be automatically creating authenticated accounts on services you don't care about; the authenticated status of an account creates a shining signal that may be too bright to mask. Considering that a lot of the data you're trying to mask would be coming from Facebook, that would be a problem.
And now that I think of that, use of this plugin really ought to result in your Facebook profile getting pretty badly scrambled, too.
Man, this is a big challenge. There's a part of me that is actually quite sad I can't drop everything and start going to work on this right now for pay. This sounds wicked fun, but way bigger than I could ever dream to take on.
This smells like an adversarial problem, where one AI makes fake traffic and one tries to detect it. Perhaps the fake session creation bot could mingle on the same sites as the real sessions to make it harder to distinguish.
What if an American ISP had a branch in Europe, could they sell our data, or the history must have been generated on American soil, or something like that?
If the overall history data is aggregated you won't really need to filter out the noise. Random browsing will just disappear under the threshold of noise and the 'real' browsing will stand out.
That would really screw up things for any location tracking.
Edit: it was a 24-carrot job.
creates a massive botnet which will eventually be used for some nefarious PPC scam or worse
you decide ...