I Visited 49 Sites. Hundreds of Trackers Followed Me (nytimes.com)
209 points by uptown 54 days ago | 145 comments

And all in the service of vacuuming up tiny fractions of pennies in advertising revenue.

What a silly, fragile business model whose days are numbered.

It's amazing to consider how much revenue has been generated, predicated on completely unnecessary and (now) easily disabled browser features.

I have to imagine that like the cigarette manufacturers of yore, companies whose lifeblood is based on this kind of nonsense are kept up at night wondering where the money will come from when this house of cards collapses.

I think Brave's business model of blocking all these harmful ads that track you is awesome. Users can then opt-in to get paid to view notification-based privacy preserving ads. Websites and publishers can make up for the loss of revenue from these ads through creators.brave.com

But does it also block ads like this comment?

From my view as a person who owns a retail brand and absolutely needs to pay for advertising to get page views and resultant conversions, the current state is pure extortion. I need to pay over three dollars for a frigging click? Almost $20 per sale in advertising fees just to get page views on HoneyGear.com from either Google or Amazon. I’d like to sell the products for half the current list price, but in today’s market with advertising expense, it’s impossible to lower prices.
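
To put rough numbers on the figures in the comment above, here is a minimal sketch of the arithmetic. It assumes exactly $3 per click and $20 of ad spend per sale (the comment says "over three dollars" and "almost $20", so both are approximations), and the function names are made up for illustration.

```python
# Rough figures taken from the comment above; both are approximations.
COST_PER_CLICK = 3.00      # "over three dollars" per click, in USD
AD_COST_PER_SALE = 20.00   # "almost $20 per sale", in USD


def clicks_per_sale(ad_cost_per_sale: float, cost_per_click: float) -> float:
    """Average number of paid clicks bought for each sale."""
    return ad_cost_per_sale / cost_per_click


def conversion_rate(ad_cost_per_sale: float, cost_per_click: float) -> float:
    """Fraction of paid clicks that end in a sale."""
    return cost_per_click / ad_cost_per_sale


if __name__ == "__main__":
    clicks = clicks_per_sale(AD_COST_PER_SALE, COST_PER_CLICK)
    rate = conversion_rate(AD_COST_PER_SALE, COST_PER_CLICK)
    print(f"paid clicks per sale: {clicks:.1f}")
    print(f"implied conversion rate: {rate:.0%}")
```

Under those assumptions the seller is buying roughly six or seven clicks for every sale, which is why the per-click price feeds so directly into the list price.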

What if it's the case that advertising (placing unwanted audio/visual solicitations in front of people) as a business model just goes away completely? And businesses have to instead find other ways to get people to buy their stuff?

I, for one, have no trouble finding things to buy via search, rankings, or research, and give not one care if I never see another advertisement for the rest of my natural life.

I do not hold to the notion that advertising, in any form, is a necessary evil of capitalism. An inevitable parasite, perhaps.

> I, for one, have no trouble finding things to buy via search, rankings, or research, and give not one care if I never see another advertisement for the rest of my natural life.

How do you think those search results, those research rankings, are generated? People pay to advertise their sites at a certain level in search results too. Just because they are not immediately obvious advertisements doesn't mean they are not in fact advertisements.

The question is not how they are generated now; the question is how they should be generated to serve the interests of both buyer and seller in making a successful deal, and not those of the parasites sucking off transaction costs and extracting rent by gatewaying access. When I'm shopping for something, search results are useful for me, but dozens of intermediaries that intervene in the process usually aren't. Actually, they often make it worse - e.g. when I am shopping for insurance, it's super-hard to find a site that gives you just straight competitive quotes, instead of collecting all your private info, reselling it to 20 other parasites, and then dumping you on a generic landing page of a provider telling you "call our 1-800 number and we'll give you best quote ever!". There's literally negative value in that - I wasted my time and got nothing. Nobody is better off from the existence of these parasites - not me, not the insurer (I could call the 1-800 number before, and at least I wouldn't be pissed off like I am after dealing with the parasites).

My rule of thumb is the first couple of results are absolutely worthless. I don't even read the titles. SEO is a plague as well and should also be done away with. Right now there are no good search engines at all (I appreciate ddg but it needs another decade of polish imo).

Sorry to disappoint you my friend, but the ads and paid subscriptions are just a small fraction of their income. The real money maker is in tracking user behavior and selling that aggregated data.

If worst comes to worst, they can just track your behavior on their website alone.

I keep seeing this claim. Where do you sell this data, and how much is, for example, 1 million unique visitors / month worth?

User data for a single website is probably not worth much. User data for every single website a user visits in a day? A bit more. Hence, advertising networks.

It helps determine the price of the ad they are showing.

So first you need to join an ad network. Then you could be paid more for higher revenue visitors.

That's one of the unsolved mysteries for me - who exactly buys that data and why? What do they do with it that is worth paying money for? Let's say you've learned I like reading sci-fi. So what? You'll serve me more relevant ads? I have an ad blocker anyway, so you won't, and even if I did not, I haven't seen an ad relevant to my interests in ages, except on the topic sites that I already visit anyway. What are they doing with it that is worth clogging all the internet and making each website 10x the size it should be? I have a nagging suspicion it's all hype and one day advertisers will wise up and this business model will die its long-deserved death.

There are companies that act as data brokers for other parties, and there are also companies like Nielsen that buy media consumption data in bulk. It's not just about the ads.

It's also about the way in which to split the pie. (how much should go to this creator?)

Or consumer profiling. (It seems our product is enjoyed by men in their mid 40s who like sci-fi)

Or marketing research (it seems our product sells the most in winter).

There's plenty of ways to make use of this data. That's why it's so valuable. It's not uncommon to see 7 digit contracts for high profile news/gossip outlets, for example.

> There's plenty of ways to make use of this data. That's why it's so valuable.

That's what I keep hearing. I have never heard what those ways are, specifically, or where the value worth the money comes from.

> Or consumer profiling. (It seems our product is enjoyed by men in their mid 40s who like sci-fi)

OK, let's say you learned that. Let's say now you know there's more demand for beard trimmers among mid-40 males with interest in sci-fi than among teenage females with interest in Rihanna. Now what? I still fail to see what a bunch of random (and possibly entirely spurious) correlations costs in money.

> Or marketing research (it seems our product sells the most in winter).

You don't need to track me for that. You need to track your inventory. That's basic stuff every merchant having a functioning brain is already doing.

What else is the data being sold for besides advertising?

It's still all about ad revenue though.

What we need is a way to jam these trackers with generated data to the point that they are useless. Something like https://trackthis.link, but running 24x7 on countless computers, phones, raspberry pi, hacked routers, and virtual machines.

This might sound childish but it is EXTREMELY powerful. Uncertainty will make the collected data worth far far less.

For years people used custom ROMs to block Android gps data collection and Google didn't care. Then someone added the option to send random data instead and Google filed a C&D within days.

You need it to be smarter. It needs to look just like someone using a real device - visiting at sensible times with sensible durations - but slowly mixing in to the visitor's actual profile and visiting enough of the "wrong" sites to begin poisoning the data well. And you can't poison everything the same way. You need to make different users have their data poisoned in different ways, so that you can't just filter it into background noise but it becomes impossible to meaningfully filter the data to draw any one conclusion. I would love it, if I could build such a thing.

You can install the AdNauseam browser extension - it's a fork of uBlock Origin that can automatically click links for you. The idea is that you click on every single ad link while you browse the web, and completely muck up all of the data that trackers have on you.


I really don't understand how that tool is useful. So AdBlock blocks an ad, then this tool sweeps in and undoes any privacy benefit you get from not loading the tracker, AND wastes your bandwidth at the same time.

Edit: Yes, I understand that it is annoying for the ad networks to have to register lots of wasted clicks, but that doesn't benefit me at all. It just makes my own browsing slower. I figure at least 20% of the people browsing the web would have to use the extension for it to make any difference to the ad networks, and it's probably higher than 20%. That's never going to happen.

This paper by the creators of AdNauseam explains their approach and philosophy - http://ceur-ws.org/Vol-1873/IWPE17_paper_23.pdf

Individual people opting out of tracking doesn't meaningfully attack the ad tracking infrastructure, and will always be limited to a small percentage of technically proficient users who care about privacy.

A collective approach feeds bad data into their infrastructure, making their data less meaningful for ad tracking and also helping protect the privacy of everyone.

AdNauseam is an act of protest and a tool at once.

Yes it compromises on privacy compared to just plain adblock. But if it hides all ads anyway, does it matter if nobody is making money off of you? The noise that the ad-clicking introduces must mess with your ad interest profile at least a little.

You don't see the ads, so you are blocking them and dilute the value of ads to begin with. Yes, it does waste your bandwidth.

> I figure at least 20% of the people browsing the web would have to use the extension for it to make any difference to the ad networks, and it's probably higher than 20%. That's never going to happen.

They have tried though:


So it means they are worried about it, at least enough to bother.

Now, looking at my stats I just don't see as many clicks as before. So perhaps, Google just fixed their backend code to detect these clicks and moved on without triggering the Streisand Effect by fighting against this project publicly further.

I'd put it at >20% data to be an issue, not 20% users. If generated data was realistic enough (way more than it currently is), even 1% of users should be more than enough to generate say 50% garbage clicks and data.
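
The 1%-of-users claim above can be checked with a small calculation. This is a minimal sketch under one simplifying assumption that is not in the thread: every user, noisy or not, produces one "unit" of genuine clicks, and each noisy user adds some multiple of that unit as fake clicks. The function names are hypothetical.

```python
def garbage_share(noisy_user_fraction: float, fake_multiplier: float) -> float:
    """Share of all recorded clicks that are fake.

    noisy_user_fraction: fraction of users running a click generator (0..1).
    fake_multiplier: fake clicks per noisy user, measured in units of one
        average user's genuine clicks.
    Assumes every user still contributes one unit of genuine clicks.
    """
    fake = noisy_user_fraction * fake_multiplier
    real = 1.0
    return fake / (fake + real)


def multiplier_needed(noisy_user_fraction: float, target_share: float) -> float:
    """Solve p*k / (p*k + 1) = s for k, giving k = s / (p * (1 - s))."""
    return target_share / (noisy_user_fraction * (1.0 - target_share))


if __name__ == "__main__":
    k = multiplier_needed(noisy_user_fraction=0.01, target_share=0.5)
    print(f"fake clicks needed per noisy user: {k:.0f}x an average user's clicks")
    print(f"resulting garbage share: {garbage_share(0.01, k):.0%}")
```

Under this model, 1% of users would each have to generate about 100 times an average user's click volume to make half of all clicks garbage, which is plausible for an automated tool but also the kind of volume outlier that is easy to spot.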

Uhh, the ad network heads don't give a shit if you actually buy the product, they only give a shit that you've clicked and they can therefore command a higher price for the ad space. This sounds like it helps advertisers more than it hurts them.

This strategy would be great in browsers, to break fingerprinting. Change the permission model such that access is always perceived to be granted, so there is no distinction between providing legit data and random but well-formed fuzz.

The tracking will still be there, but the tracker will never know if you granted access or not, it would always look legit.

Unfortunately, I imagine that would be tough to pull off in practice because the problem isn't just with web browsers, it's with every piece of software that embeds telemetry SDKs.

Heya. I work for an analytics company, and this kind of thing is pretty easy to detect and mitigate. It’d take a pretty huge scale to work at all, which is maybe what you’re suggesting. At scale, I'm pretty sure the companies you most likely associate with unethical tracking would be able to mitigate it, but smaller companies would suffer, only further embedding the established giants.

> Heya. I work for an analytics company, and this kind of thing is pretty easy to detect and mitigate.

Already happens. I don't see nearly as many clicks accumulated in AdNauseam as I used to a few years ago.

Google tried to ban the extension https://adnauseam.io/free-adnauseam.html, but they've probably been reminded about the Streisand effect and just fixed their code to check for it and ignore it.

If it isn't a trade secret, can you pls outline how it is detected and mitigated? Thanks.

Heap is pretty small compared to Google, so I think that what I’d describe would be too elementary. For us, the client has some state that is hard (not impossible) to imitate, and we have a good sense for what “real” activity looks like. Someone hitting an analytics endpoint blindly will not look natural so is easier to detect. The classic heuristic is to just ignore outdated browser versions since most headless rigs that are naive aren’t on the most up to date user agent or lie about it in a haphazard way. There are a lot more sophisticated techniques that people trying to game ad networks use.
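
The "ignore outdated browser versions" heuristic mentioned above might look something like the following. This is a sketch only, not Heap's or anyone else's actual implementation: the version threshold, the event shape (a dict with a `user_agent` key), and all function names are assumptions made for illustration, and it only inspects Chrome-style user agent strings.

```python
import re
from typing import Dict, List, Optional

# Hypothetical threshold: the oldest Chrome major version treated as current.
MIN_CHROME_MAJOR = 70

_CHROME_RE = re.compile(r"Chrome/(\d+)\.")


def chrome_major(user_agent: str) -> Optional[int]:
    """Return the Chrome major version claimed by the UA string, if any."""
    match = _CHROME_RE.search(user_agent)
    return int(match.group(1)) if match else None


def looks_outdated(user_agent: str, min_major: int = MIN_CHROME_MAJOR) -> bool:
    """True when the UA claims a Chrome version older than min_major.

    User agents that do not mention Chrome are not judged and return False.
    """
    major = chrome_major(user_agent)
    return major is not None and major < min_major


def filter_events(events: List[Dict[str, str]],
                  min_major: int = MIN_CHROME_MAJOR) -> List[Dict[str, str]]:
    """Drop events whose user agent looks like an outdated browser."""
    return [e for e in events if not looks_outdated(e["user_agent"], min_major)]
```

A filter like this is cheap, which is presumably why it is described as the classic first step. It also throws away real users on old browsers, so it only makes sense when losing a slice of genuine traffic is acceptable.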

Really appreciate the info you've given here so far.

Followup question off of GP's: if networks know to filter click-fraud out based on metrics as simple as browser version, then aren't those same metrics exploitable to avoid tracking? Firefox's resist fingerprinting setting locks the reported browser version to the latest ESR. If I have that turned on, will every ad click I make be ignored?

Keep in mind, the big goal I have with misinformation is to confuse user profiles; not to generate fake clicks or waste money -- it's to make it so that my actual data profile is either unreliable or outright ignored.

If this kind of filtering is so ubiquitous, then would it be feasible defense to get user browsers to act like bots instead of getting bots to act like users? How bot-like would I need to be before advertisers started assuming the data they collected from me was untrustworthy?

Nothing is perfect, but ultimately everything comes down to aggregates. Most systems don’t care about you or your profile (it’s all automated), and generating noise will make your profile quirky for sure, and your ads will match whoever else has accumulated a similar aggregate behavior. I’m not in the ad business so what I’m saying is only based on my proximity to it, so take it with a grain of salt.

One thing that ultimately breaks through the noise isn’t what ads you see or click on, but what you actually buy. If you buy things and aren’t completely blocking every ad network, you’ll massively boost the signal. Also any real browsing routines will still be in the data, and I am assuming a lot of ad networks are informed by out of band data they purchase and things like ISP data they buy.

Ultimately the proof is in the pudding. You should experiment with various approaches and then see if you are served relevant ads, if you aren’t, you’re winning :)

FWIW I get a ton of irrelevant ads because of retargeting. I visit a lot of my customers’ websites in doing my job and they’re not usually sites I’d go to, so most of the ads I see on FB and Twitter are based on retargeting from those sites.

The way CORS and other cross-domain tech work, this is actually pretty easy to accomplish. Though many sites track IP, because IPv6 is out it's getting pretty tricky to filter without losing some data. Plus an IPv4 address costs less than a penny per hour to rent.

But at the end of the day it doesn't really change much. They're going to detect signal somewhere or another. Where they detect it, they keep the data and source locations, where they don't they update their spam ML models. Unless you start going to the illegal side of the spectrum (DDOS, widespread fraudulent ad engagement, etc) you're not even going to piss them off, really. They won't even notice you.

Is making a bot click on ads illegal, if you're not one of the parties financially benefiting?

Probably against the terms of service of most websites.

Ok but breaking a website's terms of service is not illegal, according to the Ninth Circuit: https://www.eff.org/deeplinks/2018/01/ninth-circuit-doubles-...

This add on does something similar, visiting sites automatically to add noise to your browsing


I imagine an alternative history where browsers simply never had the feature of cookies and similar tracking mechanisms available to servers or domains other than the primary one in the URL bar. Or even more severely, where all assets and scripts had to be loaded from the same domain. That would have various downsides but would also have created a much less tracking prone web.

The most likely scenario for that is that the browser would not allow any form of cross-domain linking.

And in that parallel universe, what ends up happening is that the ad companies provide you proxies to run on your server, and tracking is accomplished via the variety of "no-cookie" tracking options that already exist.

It's slightly better, because the bar for tracking would be raised a bit, but since there's money as a motivation for getting over it, mostly it would be passed.

The alternate reality I'm interested in is the one where the net was slightly less idealistic in the beginning and offered fewer free services, and people got used to paying for things rather than expecting them for free. I've banged this drum before, but one of the most shocking things to me is in general just how little money advertising is making per person, and how little money it would take to make it so that advertising wasn't even remotely worth it to anybody on the net if we paid directly for things.

(I worked out earlier this year that at most, Facebook makes $17/year per user, and that's revenue, not profit: https://news.ycombinator.com/item?id=19459604#19462402 That means if you paid them $5/month, or $50 as a bundle for a year, you'd be more than doubling their revenue from you. And who knows what Facebook could be if you were paying for it, and all those engineers were working on making Facebook better for you, instead of working so hard on tracking you and serving ads. You'd be paying $5/month for that Facebook instead of this one.)
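
As a quick check on the "more than doubling" arithmetic above, here is a minimal sketch that takes the commenter's $17/year figure at face value (it is their upper-bound estimate, not a verified number) and compares it with the two hypothetical subscription prices. The function name is made up for illustration.

```python
# Figures from the comment above; the $17 is the commenter's own estimate.
AD_REVENUE_PER_USER_PER_YEAR = 17.0   # USD
MONTHLY_PRICE = 5.0                   # USD per month
ANNUAL_BUNDLE_PRICE = 50.0            # USD per year


def revenue_multiple(subscription_per_year: float,
                     ad_revenue_per_year: float = AD_REVENUE_PER_USER_PER_YEAR) -> float:
    """How many times the ad revenue a subscription would bring in."""
    return subscription_per_year / ad_revenue_per_year


if __name__ == "__main__":
    monthly_total = 12 * MONTHLY_PRICE
    print(f"$5/month is ${monthly_total:.0f}/year, "
          f"{revenue_multiple(monthly_total):.1f}x the ad revenue")
    print(f"$50 bundle is {revenue_multiple(ANNUAL_BUNDLE_PRICE):.1f}x the ad revenue")
```

On those numbers the bundle is close to triple the per-user ad revenue and the monthly plan is about three and a half times, so "more than doubling" is if anything an understatement.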

I think that most people, if offered $17/year for Facebook, would quit. I don't think the value to the user is that great. That's not a problem for the company as it exists today. It feels like a flawed assumption that users have a committed interest in sites where they share content. People have a passing interest, and different parts of their brain conflict. Their lizard brain might want cat gifs, while the planning part of their brain knows they should stop procrastinating and get to work. The planning part is the one who gets out the credit card. The last thing that Facebook or ANY content-oriented site wants is to deal with the responsible parent, as opposed to the distracted child. Having the parent make the decisions is better for us. Much of the internet depends on that never happening.

The value prop likely varies greatly -- if you're making a living off being a "Facebook Influencer", or are running brand awareness through social, you'll be willing to pay a lot, possibly thousands, even millions, of dollars.

For the long tail at the bottom, a few bucks is too much.

And if you're just socialising with a small group, ultimately, the next available free-tier service, or Frank or Francine spinning up a Friendica node or Mailman instance is a viable alternative (or if not that then something else).

Keep in mind that FB started as a small exclusive network (a few hundreds of Harvard undergrads), and grew largely through cachet and aspirational appeal. (danah boyd has developed this idea at length.) Now, it has at best neutral appeal other than it's where everybody is, which, if they go somewhere else, it instantly loses. And sticking everyone with a high fee will do that.

Also, the costs of revenue -- of simply billing for and collecting on services -- will almost certainly exceed all other costs of service, as will new-user recruitment. Which is why "free" keeps on winning (until it doesn't).

I think doubting the value to the average user is a little uncreative -- a quick search tells me the average internet user in USA is paying $600 / year for internet access (sounds right to me, I pay $50/m for a fiber connection limited to 100mb/s, gigabit is $90/m)

For a lot of people, facebook is basically the internet. It's their messenger, their photo album, their event calendar, and their community church group.

Having the cost somehow bundled with bandwidth would make it seem a lot less significant a fee.

Otherwise I agree, an internet where every community wants $5/m here and $5/m there is basically the multi-streaming-tv hell that people complain about.

Facebook and Google are the monsters on the scene. Most sites make even less money. So the problem you get into isn't even the $5/month-too-many-places problem, it's the microtransaction problem, where broadly speaking the fixed costs of the transaction, both monetary and cognitive effort, exceed the value of the transaction. We haven't fixed that even in 2019, where transactions have gotten a lot cheaper than they were in 1995-ish. Hence, "alternate universe". I won't quite say it's an impossible alternate, but it sure isn't a likely one. Even if I could travel back in time with the power to mandate it in 1995, by 2000 someone would have had the bright idea of being advertising-based and it would have a very good chance of strangling the for-pay industry.

Probably more interesting is the question of what's stable in 2030. There are steps governments could theoretically take to really turn the tide of things. One of my favorites is a 1-cent-per-impression advertising tax. I am well aware of how expensive that is relative to a normal ad impression today; it's part of the point. Let the really lucrative stuff like luxury goods and mesothelioma ads continue, but kill the massive surveillance industry sprung up to wring literally .001 cents per impression more out of you at the cost of creating 80% of a ready-made police state. I think the costs the advertising industry is externalizing on to us are hard to overstate; I would quite literally and with full knowledge of what I am saying put them as on par with environmental externalities.

This is an enlightening comment.

I have always been intrigued by the idea of micro transactions. In a modern formulation, duplicate Reddit, but replace upvotes with 0.001 cent donations (to who? I don't care). If you hold shift and click the up arrow, it'll do 0.01. Hold control shift and click for 0.1 cents. Then have larger options you can drill into. People would throw around a lot of milli-cents. But at some point, "whales" would be sure to emerge as well.
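
The tiered upvote idea above can be sketched in a few lines. Everything here is hypothetical: the function names are invented, amounts are tracked as integer milli-cents (thousandths of a cent) so that tiny donations do not pick up floating-point rounding error, and a control-click without shift is treated as a plain click because the comment does not say what it should do.

```python
from typing import Iterable, Tuple

MILLICENTS_PER_DOLLAR = 100 * 1000  # 100 cents, 1000 milli-cents each


def donation_millicents(shift: bool = False, ctrl: bool = False) -> int:
    """Donation for one upvote click, in milli-cents.

    plain click      -> 0.001 cents (1 milli-cent)
    shift-click      -> 0.01 cents  (10 milli-cents)
    ctrl-shift-click -> 0.1 cents   (100 milli-cents)
    """
    if ctrl and shift:
        return 100
    if shift:
        return 10
    return 1  # assumption: ctrl alone behaves like a plain click


def total_dollars(clicks: Iterable[Tuple[bool, bool]]) -> float:
    """Sum a sequence of (shift, ctrl) clicks and convert to dollars."""
    total = sum(donation_millicents(shift, ctrl) for shift, ctrl in clicks)
    return total / MILLICENTS_PER_DOLLAR
```

Even at the top tier it takes 1,000 ctrl-shift clicks to move a single dollar, which illustrates the fixed-cost problem raised earlier in the thread: any per-transaction processing fee would dwarf amounts this small unless they are batched.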

I have no idea why this isn't happening, thus I must assume there are legal and financial barriers to implementing it.

I know it doesn't help "the 99%", but you can set up your browser to block cookies by default and always block third party cookies.

What I've found is that a small subset of sites don't work without third party cookies (PlayStation store login being the only one I care about) and that a lot of sites don't expect localStorage access to ever fail (eg Codepen, I've had to fix some of my own sites that assumed localStorage access never throws).

The oddest issue I've come across was that LiveJournal would use JavaScript to immediately reload the site if a cookie wasn't detected so I had to disable JavaScript for all of LiveJournal to stop it from getting stuck in a reload loop.

Opera Browser (Presto engine) used to default to "Block third-party cookies", rather than "Accept all cookies". If I remember correctly, this subtly broke enough sites that they relented and changed the default.

I think that would be a good option. Allow first-party cookies for session tracking (logins, etc) but block third-party cookies.
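
The first-party versus third-party distinction above comes down to comparing the site in the URL bar with the host a request goes to. The following is a deliberately naive sketch of that check: real browsers decide what counts as the "same site" using the Public Suffix List, whereas this version just compares the last two labels of each hostname (so it gets hosts like `example.co.uk` wrong). All function names are made up for illustration.

```python
def registrable_domain(host: str) -> str:
    """Naive stand-in for a registrable domain: the last two labels.

    Real browsers consult the Public Suffix List instead; this shortcut
    mishandles multi-label suffixes such as co.uk.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_host: str, request_host: str) -> bool:
    """True when the request goes to a different site than the page."""
    return registrable_domain(page_host) != registrable_domain(request_host)


def should_accept_cookie(page_host: str, request_host: str) -> bool:
    """Policy from the comment above: keep first-party, block third-party."""
    return not is_third_party(page_host, request_host)
```

For example, `should_accept_cookie("www.nytimes.com", "static.nytimes.com")` returns `True`, while a request from the same page to an unrelated tracker host returns `False`. The sites that break under this policy are typically the ones whose login flow lives on a different domain from the page, as with the store login mentioned above.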

Another way is to not use cookies at all and rely on local storage to store an authorization token. In fact, a site that I work on does this. We don't store your session in a cookie at all. Your token gets passed to API calls which return data for your user.

I wish there were a way to enforce this client side.

uBlock Origin and uMatrix can both enforce it. But you won't be able to read the news or check the weather.

I block all Javascript, cookies and third party sites by default. I still get news and weather. Even nytimes.com works.

That's why you use AdNauseam instead. I read news, check the weather, and generally live my life. Combine it with ExplodingCookies and life is better than it was.

In my understanding, it's trivially easy to identify and filter out those fake clicks.

Also I don't want to screw with advertisers' perception of who I am. If anything, I really like how ads are actually relevant to me -- it's a lot better than the days of penis pill banners. I just want them to stop following me around so closely and putting my "anonymized" data at risk of exfiltration by hackers or governments.

Pretty sure the Brave Browser can do this if you change a couple of settings :)

uMatrix on firefox might be a good start

Heya. I work for Heap (mentioned in the article). If anyone has questions for me. Let me know.

Heap doesn’t sell or share data to third parties, we don’t do any cross site identifiers, or fingerprinting. We aren’t in the ad business, that’s Google and FB.

In other threads folks have said “but you can’t control what might be done with Heap data in the future” that’s right. I’m happy and pretty secure, and will fall on a sword if Heap ever becomes an unethical company.

(I’m commuting for the next hour but will get back to reply soon)

> In other threads folks have said “but you can’t control what might be done with Heap data in the future” that’s right. I’m happy and pretty secure, and will fall on a sword if Heap ever becomes an unethical company.

This doesn't help me one bit, though :/

What would?

At the very least, a credible precommitment that Heap won't transfer data to other companies in the event of a merger, and that if Heap goes out of business that data will be destroyed.

Here's what Heap's privacy policy says:

> We may share or transfer your information in connection with a prospective or actual sale, merger, transfer or other reorganization of all or parts of our business.

You're banking on Heap being an ethical company forever, yet your privacy policy basically gives you carte blanche rights to sell my data to any other company in the event of a merger. Heap runs into tough times and Oracle/Google buys them out? Any privacy guarantee you make here is immediately out the window.

You're asking people here to trust you, while your privacy policy explicitly states in legal language that you're allowed to stab us in the back. If you're not planning to stab us in the back, then why is that language necessary?

This is a great idea. I’m going to try to find if any other companies have language like you describe.

I understand that you are assuming any acquisition will lead to some malicious or unethical intent, but I’m not so cynical. That said, it’d be nice to have some protections.

FYI: many other companies do have this exact language in their privacy policy and treat their data as an asset during acquisition negotiations.

not being tracked.

Or, specifically, my data not being stored in a place that I have no control over.

Makes sense. I think that individuals are much more empowered to block tracking on the web than in other places (usually). Banks and credit card companies are massive sellers of personal and behavioral data, and the FCC rolled back consumer protections which prevented ISPs from selling your data - which is much more broad than just what sites you visit (think applications, DNS, TV watching behavior). Add the pervasive CCTV, mobile phone location selling by carriers, private companies recording license plates and selling the data about your physical whereabouts and patterns, and I agree we have a problem, and it’s less about whether you clicked a call to action and more about the normalization of pervasive surveillance.

In the US we’re especially screwed because ISPs have largely blocked off competition that could offer privacy.

Most analytics tools that aren’t coupled with ad networks aren’t trying to get around ad blocking extensions, even though it’s pretty easy to do. Like, surprisingly easy. Unethical companies are doing that and more - see the aggressive stance Apple has taken with ITP. They’re not reacting to general cookie sharing, but to companies that are attacking the browser’s storage mechanisms to expose data from other sites.

Long story short, vote for privacy forward leaders if you live somewhere that allows you to.

Laws like GDPR and the CCPA are moves in the right direction. Heap is already compliant with both and I hope more protections continue to make their way to the public.

Edit: I know we are GDPR compliant and intend to be CCPA compliant but I’m not sure we are yet since it isn’t yet in effect.

The question the article raises is whether collecting user data, fingerprinting and cross-site tracking isn’t unethical in itself since it happens without the user’s express knowledge and consent

Can’t wait for ad tracking to be regulated. The rising awareness about how tracked we are suggests the Wild West days are coming to an end.

> Can’t wait for ad tracking to be regulated

Might want to pack a lunch.

You're talking about crippling the business model of two of the US' most profitable companies (Alphabet and Facebook), not to mention the lifeline of many digital startups whose business revolves around packaging and selling user data.

And the people who would be in charge of putting forward such legislation would presumably be the same who depend on this level of tracking to hold their congressional/Senate seats the next time an election rolls around.

It can't happen soon enough.

And the legislation will also be developing outside the US, notably in Europe and Asia.

Europe's digital legislation is controlled by the publishing lobby and the music industry (cf. Article 13). Publishers have zero interest in destroying their business model (ad-supported news sites).

Giants don't battle on behalf of dwarves. But they do at least battle amongst giants.

That's not ideal. It's a start.

Sadly, I can't see that happening anytime soon. The organizations who have a vested interest in all this are just too powerful and can afford to pay for better lobbyists.

EDIT: If anything I can see publishers and advertisers trying to lobby for DMCA style laws that will make it an offence to circumvent tracking and profiling. They can spin such techniques as just an attempt to generate revenue and to circumvent them is as bad as music or software piracy.

All it would take is an extension to the laws protecting video rental records.

With the right changes to how we talk about privacy, we could start treating it like a security issue. Make data sharing a flaw that needs to be corrected. Start to talk about ad-tech as malware and lump those that would work on these systems in with those that would write remote access trojans or ransomware.

Congress only did that to protect themselves. The data broker industry is too big and important in modern politics for them to take action against it.

It is regulated in the EU. In theory you should just be able to click "deny" on sites that track you.

Unfortunately nobody in the EU seems to be interested in punishing sites that bend the law by making "deny" difficult to click or even blocking the content if you click deny.

Data protection offices are interested, but investigations take a long time, esp. when your adversary is a fast-moving adtech giant like Google or Facebook.

Don't hold your breath. Ad tech is what keeps the internet free. You might be willing to pay for content to keep from getting tracked, but consider the billions of folks out there that can barely afford to get online in the first place.

It didn't have to be that way. I wish hosting was something that was completely free for creators, funded by a % of the visiting user's monthly bill from their ISP. If you get a lot of views, you therefore get a lot of resources. Everything scales and maybe you don't have to rely on stalking people online and exploiting psychology to make people lose money and/or health.

Fund publishing as a public good.

Publishing of what? Public funding of publishing isn't a complete non-starter, and can work very well in narrow cases, but there are serious problems involved in the state spinning up an infrastructure where it decides what speech in general is worthy of state funding. It's disheartening to see how common it is to see people throw out a vague "make the gov't do it" without acknowledging or grappling with the deep fundamental questions about how this would be implemented.

Publishing of everything generally public. "To publish" is literally "to make public": https://www.etymonline.com/word/publish

I'm aware this is an uphill battle. It may well be a hill I choose to die on.

For further thoughts / arguments:

Many of the arguments for Sci-Hub generalise to all information. This piece also specifically invokes the arguments of the CUNY Graduate Center and Joseph Stiglitz (Nobel laureate economist) on information as a public good:

"What the academic publishing industry calls "theft" the world calls "research": Why Sci-Hub is so popular" https://old.reddit.com/r/dredmorbius/comments/4p2rwk/what_th...


"Why Information Goods and Markets are a Poor Match" https://old.reddit.com/r/dredmorbius/comments/2vm2da/why_inf...

"The Medium Is the Message: how the technological and revenue environments shape content" https://old.reddit.com/r/dredmorbius/comments/278e2o/the_med...

"Forbes asks: Why do programmers hate advertising so much?" https://old.reddit.com/r/dredmorbius/comments/24107v/forbes_...

"A Modest Proposal: Universal Online Media Payment Syndication" https://old.reddit.com/r/dredmorbius/comments/1uotb3/a_modes...

"Specifying a Universal Online Media Payment Syndication System" https://old.reddit.com/r/dredmorbius/comments/2h0h81/specify...

"Richard Stallman's "Internet Sharing" content syndication proposal (2012)" https://old.reddit.com/r/dredmorbius/comments/3p0bp6/richard... https://stallman.org/articles/internet-sharing-license.en.ht...

A general problem of advertising, not otherwise addressed, is that it tends to produce shit content. Though this essay doesn't directly address that, it's very much a Tyranny of the Minimum Viable User dynamic: https://old.reddit.com/r/dredmorbius/comments/69wk8y/the_tyr...

Another is that advertising tends strongly toward oppressive rather than liberating informational regimes: https://old.reddit.com/r/dredmorbius/comments/6b32jo/what_ma...

And problems with other proposed payment alternatives, such as micropayments:

"Repudiation as the micropayments killer feature (Not)" https://old.reddit.com/r/dredmorbius/comments/4r683b/repudia...

A general bibliography on publishing and media:

"Media, Advertising, Sustainability, Externalities, and Impacts: A light reading list" https://old.reddit.com/r/dredmorbius/comments/7k7l4m/media_a...

TL;DR: I've been thinking about this for a while.

Mind: getting to public goods payment is going to be difficult. I don't deny that in the least. Partial approaches may well be a viable path there. Sci-Hub, ZLibrary, Library Genesis, the Internet Archive, libraries (public, offline, online), file-sharing, samizdat press, #pdfme, and other measures are appropriate.

And "how do authors/creators" get paid: UBI/GMI would be a good start. Performance/lectures are an option. Publishing-as-a-shingle (in the professional advertising sense) is an option. Patronage and grants are presently used and have a long and storied history. As discussed in the essays above, both technology and business model affect the forms and types of works created. Advertising has been tried and found wanting.

What content do you feel like we will miss out on? Genuine question. Clearly, Wikipedia would keep operating. Universities would continue to operate a web presence, but their highest quality content (academic papers) is already paywalled. As for news, there's a spectrum. There is news about global events, and then there is news about entertainment. Entertainment itself (Netflix, Hulu) is already paywalled.

The "awareness" is for a tiny fraction of people. Except in HN & some other super paranoid groups, most people don't really mind. My grandma is not worried about 20 trackers and a leaked db as long as she can post her recipe article. :) My irl friends are more interested in "exposure" because it is the "age of (social media) influencers/celebrities". The concepts of trackers and adblock are alien to most folks. Those who know a bit gave up because they reason that it is now integrated into the system and there is no escape, so why even try? :/

Yeah, you'll have to agree to yet another popup. I'll probably start looking soon for an autogdpr+autocookie extension; I got so tired of those popups (especially when they have nothing to do with my country's laws).

You'd hope so, but I'm not quite so optimistic - given that the business model of the internet seems to rely entirely on ads, and a notion that the better you're understood, the better you can be targeted. The thing which really concerns me is the fingerprinting techniques, which really cannot be avoided, even if you are technically literate and privacy concerned.

So ironic that the NYTimes, where this article is published, is itself one of the worst offenders. The article clearly states this as well:

> Among all the sites I visited, news sites, including The New York Times and The Washington Post, had the most tracking resources.

So hat-tip for the self-awareness. Now how about "sweeping your doorstep"?

This comes up every time there's a NYT article. However, note that the management and the journalists are different people, and the fact that the website of the NYT itself is reported as having trackers is proof of this.

> management and the journalists are different people

That's clear and not being disputed. When people mention workers calling out management, it's sort of in a praising way. They're commending management for at least allowing workers to call them out.

Many institutions would not allow that to happen, so it's good to point out the Times when they do.

And yet nothing changes, so to feign self awareness is the height of theater.

They do it in the open. They say they do it. They tell you they know it's bad, this thing they do. And they do it anyway.

It doesn't make them better than others to pay lip service and not change. If anything, they lead by example and it makes them worse for having done so.

Shop lifting is a crime, but hey, everyone does it, so watch me pocket this candy bar. And some beer. And a TV. And maybe I'll just grab some money from the cash register. And yeah, I think I'll steal the assistant manager's car to get away. See? This is just how the world works!

Pray tell me how exactly a news publisher should survive without targeted advertising. The public should pay for stuff or accept targeted advertising.

Subscriptions did not help save the 1,800 local newspapers that have shut down in the US since 2004.


You mean like this?


I think it's more important to say that in most newspapers there is a firewall between journalists and the marketing department. So marketing decisions are, by design, completely separate from the journalism.

Upper management can interfere but usually this is done via the editor and usually only when they really don't want a story to run (e.g. they think it'll bring too much heat or it accuses one of their friends of paedophilia).

heinrichhartman-- I was next in line to make the top comment that NYT has ad trackers. You weren't supposed to go until mid-September.

This is really frustrating because I've had my comment ready to go for weeks.

Similar thing with their article a while back on how privacy policies are too long and written at too high a reading level for most people to understand. They included their own privacy policy in their chart, in an unflattering location.

NYT published a story/video recently featuring interviews with children and their parents, where the parents have overshared pictures of them on social media. I couldn't finish it because of the irony.

This article may not have been written with NYT in mind as the publication that would eventually run it.

The Times and the Post are nothing compared to the McClatchy papers when it comes to the number of trackers.

McClatchy is a dumpster fire of a newspaper company.

Does NYT still include trackers if you subscribe?

Yes, and ads. I don’t feel bad blocking them because I subscribe.

Sure does

Are there any reasonable, self-hosted alternatives that can provide me with information on how visitors use my website without submitting them to cross-site tracking?

Check out Matomo [1] (previously known as Piwik) or GoAccess [2]. Matomo offers a web dashboard, while GoAccess is terminal-based.

[1] https://matomo.org/

[2] https://goaccess.io/

Actually, GoAccess has a web interface as well.

Matomo (previously Piwik) is a decent alternative

I particularly like Matomo as it offers the option to honor the browser DNT (Do Not Track) preference.
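
For anyone rolling their own first-party stats instead, honoring DNT on the server side is a small check. A minimal sketch, assuming request headers arrive as a plain dict; `should_track` is a hypothetical helper name, not part of Matomo's API:

```python
def should_track(headers: dict) -> bool:
    """Return False when the browser sent the Do Not Track header (DNT: 1)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


print(should_track({"DNT": "1"}))         # visitor opted out
print(should_track({"User-Agent": "x"}))  # no preference sent
```

The idea is to call this before writing any analytics record and skip the write when it returns False.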

Curious: Is that what you use for mailhardener? How much does it cost to keep it running?

Yes, Mailhardener uses Matomo to track basic statistics on our website. Mostly just visitor counts and mapping where traffic comes from.

In terms of costs I'd say it's negligible; it runs in a container in our existing infrastructure. Required resources and maintenance are minimal.

As a business owner, it bugs me to have Google and Facebook trackers in my code. But I feel like I need to buy ads, and then I also need to use these trackers to attribute purchases to ad installs. So using some sort of self-hosted tracker just doesn't seem like an option.

Is there an answer to this dilemma that doesn't involve forgoing ad purchases, which seem pretty important to growth and revenue?

> I also need to use these trackers to attribute purchases to ad installs

Why? What would happen if you didn't have that data?

An aspect of the many "trackers" on the sites, etc. is that much of the tracking isn't about tracking you (the person browsing around); it's about tracking the company doing the ads.

Wrapping back around to your question: "What would happen if you didn't have that data?" The answer is that Facebook/Google/Any Given ad service could just make up numbers about how ads performed.

The attribution data gives you two hard data points:

1. How much you paid for the ads

2. How many sales/leads you received

Everything else like views, clicks, video views expanded with sound on in Guatemala, etc. are all prone to manipulation and mis-reporting.
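
Those two hard numbers are enough to work out a cost per sale without trusting any platform-reported metric. A minimal sketch; the function name and the figures in the usage line are made up for illustration:

```python
def cost_per_sale(ad_spend: float, sales: int) -> float:
    """Ad spend divided by attributed sales; infinite if nothing sold."""
    if sales == 0:
        return float("inf")
    return ad_spend / sales


# Hypothetical figures: $600 spent, 30 sales attributed to the campaign.
print(cost_per_sale(600.0, 30))
```

If that number drifts away from what the ad platform's dashboard implies, the dashboard is the one to doubt.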

I find it ironic that NYT objects to my reading articles in private mode, but publishes pro-privacy articles.

Silly Question: If online advertising is a near duopoly, why are there so many trackers?

Ad analytics and optimization goes far beyond Facebook and Google, although most of the resulting ads are run on their ad networks. Numerous established companies and startups in the space.

EFF's Privacy Badger and Firefox's (experimental) first-party isolation go _some_ way to mitigating this bad behaviour. But taking special measures shouldn't be necessary.

Question - how do they compare to "Cookie Autodelete" extension?

I recently reformatted, switched to Firefox, and installed Multi-Containers, Auto-Delete, and uBlock Origin; open to suggestions on what other robust, stable, mainstream extensions I should try :)


You'll also want an addon to clear "local storage" and "session storage", which is basically a shadow database where trackers can store identifiers.

Thanks! Any particular ones to recommend? "Cookie AutoDelete" extension does have a "LocalStorage Cleanup" option, but it looks to be somewhat experimental and temperamental...

Decentraleyes to defeat CDN tracking. Random user agent to make fingerprinting harder.

uBlock Origin goes a long way to mitigating such behaviours: https://github.com/gorhill/uBlock/

But I agree that this level of privacy invasion should be illegal.

I like the Brave Browser. It's based on Chromium, but it allows you to control which 3rd party scripts you want to load, along with other privacy-related settings.

I think if regular people start to care about this the next step will be pi-hole or some other filtered DNS in the router. It makes a massive difference. Sites load in half the time, use half the CPU and some sites like anandtech, techradar, and just about any other news site trying to get a wide audience actually become usable.

The flip side is something like trying to sign up for an Instagram account when you have a pi-hole running: completely broken experience, a cryptic "Something went wrong, try again later." message followed by an insta-ban for the account/email, with no indication that it was because you're blocking ads/trackers, nor any indication anywhere that you should not try to sign up whilst running any blockers.

The pi-hole is too aggressive when it blocks stuff. For example, on the Fandango mobile app, the payment gateway is blocked by the pi-hole. A user should use the white list functionality to add the domain or two to get back the site's functionality.
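
To make the whitelist idea concrete, here is a rough sketch of how a DNS-level blocker can decide on a lookup when the allowlist takes priority. This is a simplification under my own assumptions (plain suffix matching, made-up domains), not Pi-hole's actual matching logic:

```python
def is_blocked(hostname: str, blocklist: set, allowlist: set) -> bool:
    """Block a hostname if it or any parent domain is on the blocklist,
    unless the hostname or a parent domain is explicitly allowlisted."""
    labels = hostname.lower().rstrip(".").split(".")
    # Candidates for "a.b.example": "a.b.example", "b.example", "example"
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    if candidates & allowlist:
        return False
    return bool(candidates & blocklist)


# Hypothetical domains for illustration only.
blocklist = {"tracker.example", "payments.example"}
allowlist = {"gateway.payments.example"}
print(is_blocked("cdn.tracker.example", blocklist, allowlist))
print(is_blocked("gateway.payments.example", blocklist, allowlist))
```

Adding the one payment gateway hostname to the allowlist restores that function while everything else under the blocked domain stays blocked.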

I have had that happen a lot when using ad blockers, but not as a result of pi-hole.

That's great whilst you're at home but as soon as you go out and switch to your mobile network you're back to being tracked again surely.

Is there any easy way around that?

Thanks for the replies all. I do appreciate the time taken out of busy lives. However, I did specify "easy" and as soon as you start talking about creating your own VPN and the like you've lost about 99% of the people out there who are perfectly intelligent in their own right but when it comes to computers they barely know what a sub-directory is.

Hell, I've been a software developer for years, and once built my own PC, but I balk at the idea of building a pi-hole. For a lot of people you may as well ask them to change the engine in their car.

Until there is a plug and play solution to privacy I do believe it belongs to the digital 1%.

Create your own VPN and install the DNS blocker there. [Algo VPN](https://github.com/trailofbits/algo) is a great start. Furthermore, you could repave the VM daily or at any other interval. Using WireGuard as the VPN protocol, as opposed to OpenVPN and others, will increase your browsing speed to the point where it feels like you are not connected to a VPN at all.

Not every piece of the puzzle is going to be a silver bullet. You can still run ad blocking extensions in web browsers.

Have a look at dnsadblock.com. The mobile app is not so great but I'm getting there.

There should be a plug and play mobile app for that?

So? What’s the problem?

If I go in public someone might photograph me or see me. If I visit a website, it might log information about my visit (to show me ads).

Sounds fine to me. I like reading free news articles, paid for with ads. If I don’t like it I can stop reading them.

A single photograph of you in public is usually not a problem. But if I were to follow you around and take a picture every 10 seconds as you go through your day (including at home and in the bathroom), would you like that?

Advertisers don't follow us around and take photos of us, so that's not something we need to worry about. But CCTVs do have photos of us being taken every 10 seconds in public in major cities. It's not an issue. Nobody is publishing CCTV photos of me. Similarly, advertisers aren't telling your spouse what websites you visited. They just want to show you ads that are relevant to your interests.

My point is that the expectation of privacy isn't reasonable in every situation. For example, it's unreasonable in public or when doing business with others.

It's only unreasonable insofar as "they totally could do that". Whether they do do that is a matter of regulation.

However, ad blocking and Do Not Track extensions do not block IP tracking. Say a site can read your browser window size plus your IP: that is unique enough to track you.
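
To show how little it takes, here is a toy sketch of deriving a cookie-free identifier from just those two signals. The function name, the hashing choice, and the sample values are my own assumptions; real fingerprinting scripts combine many more signals:

```python
import hashlib


def visitor_id(ip: str, width: int, height: int) -> str:
    """Derive a stable identifier from IP address and window size, no cookies needed."""
    raw = f"{ip}|{width}x{height}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


# Same inputs give the same ID on every visit (documentation-range IP).
print(visitor_id("203.0.113.7", 1433, 871))
```

Nothing is stored in the browser, so clearing cookies or local storage does not change the ID; only changing the IP or the window size does.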

It's much worse than this: the article doesn't even discuss tracking by apps on mobile devices, which account for an increasing percentage of overall Internet use.

It really is crazy. For example, CDW.com has so many tracking websites whitelisted in its header that our firewall blocks it:

    Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' *.cdw.com *.richrelevance.com *.bazaarvoice.com *.qualtrics.com *.optimizely.com *.hotjar.com cdw.needle.com nexus.ensighten.com api.bluecore.com bluecore.com px.spiceworks.com *.liadm.com scripts.demandbase.com triggeredmail.appspot.com connect.facebook.net d31y97ze264gaa.cloudfront.net *.bounceexchange.com www.googleadservices.com *.doubleclick.net *.google-analytics.com st1.dialogtech.com bat.bing.com *.googleapis.com nsg.symantec.com analytics.po.st px.ads.linkedin.com po.st *.cnetcontent.com selectors.cnetcontentsolutions.com *.akamaihd.net *.google.com *.twitter.com *.justuno.com *.liveclicker.net www.netapp.com dpm.demdex.net *.d41.co *.cxense.com static.ads-twitter.com vault.pactsafe.io pactsafe.io *.webcollage.net *.ziftsolutions.com *.simpli.fi pixel.mathtag.com *.googletagmanager.com *.googlesyndication.com googletagservices.com t.sellpoints.com a.sellpoint.net media.flixfacts.com www.youtube.com media.flixcar.com *.flix360.com *.easy2.com *.go-mpulse.net *.cdnwidget.com *.rlcdn.com *.flixsyndication.net *.adobe.com *.hotjar.io *.eloqua.com *.swogo.net *.swogo.com *.nanovisor.io *.btttag.com *.gstatic.com; style-src 'self' 'unsafe-inline' *.cdw.com *.bazaarvoice.com cdw.needle.com *.cnetcontent.com *.justuno.com *.webcollage.net *.ziftsolutions.com t.sellpoints.com a.sellpoint.net media.flixcar.com *.easy2.com *.amazonaws.com platform.twitter.com *.typekit.net *.adobe.com *.nanovisor.io *.btttag.com; img-src 'self' *.cdw.com *.bazaarvoice.com *.qualtrics.com cdw.needle.com nexus.ensighten.com px.spiceworks.com *.liadm.com *.bounceexchange.com www.googleadservices.com *.doubleclick.net *.google-analytics.com bat.bing.com nsg.symantec.com *.cnetcontent.com selectors.cnetcontentsolutions.com *.akamaihd.net *.google.com *.justuno.com www.netapp.com dpm.demdex.net *.cxense.com vault.pactsafe.io pactsafe.io *.webcollage.net *.ziftsolutions.com 
*.googletagmanager.com t.sellpoints.com a.sellpoint.net media.flixfacts.com media.flixcar.com *.flix360.com *.easy2.com *.amazonaws.com platform.twitter.com *.linkedin.com *.tribalfusion.com *.company-target.com www.facebook.com events.bouncex.net *.cdnwidget.com *.rlcdn.com *.cloudfront.net *.adobecqms.net *.turn.com st2.dialogtech.com secure.insightexpressai.com px.gumgum.com *.bluekai.com k.intellitxt.com *.everesttech.net *.adnxs.com sync.fastclick.net simage2.pubmatic.com us-u.openx.net ads.yahoo.com pixel.rubiconproject.com *.advertising.com magnetic.t.domdex.com *.rfihub.com *.mathtag.com *.mathtag.co *.amgdgt.com *.casalemedia.com www.bluecore.com *.prod.bidr.io cdn.optimizely.com syndication.twitter.com x.bidswitch.net pe.intentiq.com loadm.exelator.com insight.adsrvr.org um.simpli.fi acuityplatform.com data: *.dotomi.com *.flixsyndication.net liveintent.com cbssports.com maxpreps.com wogo ce.lijit.com soma.smaato.net cs.admanmedia.com eb2.3lift.com live.sekindo.com *.adobe.com *.sc.omtrdc.net df7xs8p1yjitw.cloudfront.net *.core.windows.net *.nanovisor.io *.btttag.com; frame-src 'self' *.cdw.com *.bazaarvoice.com *.qualtrics.com *.hotjar.com *.liadm.com *.bounceexchange.com *.doubleclick.net nsg.symantec.com selectors.cnetcontentsolutions.com *.google.com *.twitter.com *.liveclicker.net *.cxense.com *.webcollage.net *.ziftsolutions.com pixel.mathtag.com *.googletagmanager.com googletagservices.com a.sellpoint.net www.youtube.com media.flixcar.com *.easy2.com www.facebook.com *.rlcdn.com rs.gwallet.com *.liveclicker.com pages.cdwemail.com www.emjcd.com *.dotomi.com *.flixsyndication.net cdw.zuberance.com *.hotjar.io *.eloqua.com *.swcontentsyndication.com www.cisco.com *.nanovisor.io *.btttag.com; font-src 'self' 'unsafe-inline' *.cdw.com cdw.needle.com *.googleapis.com *.cnetcontent.com *.webcollage.net a.sellpoint.net media.flixfacts.com media.flixcar.com *.easy2.com *.flixsyndication.net *.typekit.net *.adobe.com *.nanovisor.io *.btttag.com; connect-src 
'self' *.cdw.com *.richrelevance.com *.bazaarvoice.com *.qualtrics.com *.optimizely.com *.hotjar.com cdw.needle.com nexus.ensighten.com api.bluecore.com px.spiceworks.com *.liadm.com scripts.demandbase.com triggeredmail.appspot.com d31y97ze264gaa.cloudfront.net *.bounceexchange.com www.googleadservices.com *.doubleclick.net bat.bing.com *.googleapis.com nsg.symantec.com *.cnetcontent.com *.akamaihd.net *.google.com *.justuno.com www.netapp.com *.d41.co vault.pactsafe.io pactsafe.io t.sellpoints.com a.sellpoint.net *.go-mpulse.net platform.twitter.com *.company-target.com www.facebook.com events.bouncex.net *.cdnwidget.com wss://*.hotjar.com p.po.st *.cdnbasket.net *.akstat.io data.g2.com data.g2crowd.com *.adobe.com *.hotjar.io *.swogo.net *.swogo.com *.nanovisor.io *.btttag.com; object-src 'self' a.sellpoint.net *.nanovisor.io *.btttag.com; worker-src 'self' blob: *.nanovisor.io *.btttag.com; media-src 'self' *.cdw.com *.cnetcontent.com *.webcollage.net media.flixfacts.com www.youtube.com blob: *.flixsyndication.net *.nanovisor.io *.btttag.com;
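
If you want to count what a header like that allows instead of eyeballing it, a few lines of parsing will do. A minimal sketch; `parse_csp` and `external_hosts` are names I made up, and the sample string is a short excerpt of the header above:

```python
def parse_csp(header: str) -> dict:
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    policy = {}
    for part in header.split(";"):
        tokens = part.split()
        if tokens:
            policy[tokens[0]] = tokens[1:]
    return policy


def external_hosts(policy: dict) -> set:
    """Collect host sources, skipping quoted keywords ('self') and bare schemes (data:, blob:)."""
    hosts = set()
    for sources in policy.values():
        for src in sources:
            if src.startswith("'") or src.endswith(":"):
                continue
            hosts.add(src)
    return hosts


# Excerpt of the header quoted above, shortened for readability.
excerpt = ("default-src 'self'; object-src 'self' a.sellpoint.net *.nanovisor.io "
           "*.btttag.com; worker-src 'self' blob: *.nanovisor.io *.btttag.com")
print(sorted(external_hosts(parse_csp(excerpt))))
```

Run it over the full header value and `len(external_hosts(...))` gives the number of distinct third-party sources the page is permitted to load from.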

Coming to you from my new install of the Tor browser...

How much does it reduce tracking? Or at least make it not useful to the tracking firms?

Depends on what you're doing. On some websites you actually stick out (all Tor exit nodes are public and easily identifiable).

Depending on how closely you stick to the recommended configuration (default window size and such) it'll at least minimize some of the tracking capability of most sites.

The best methods to prevent tracking IMO are uBlock Origin + Decentraleyes + HTTPS Everywhere (HTTP is leaky) in incognito mode. Periodically you should destroy the session (restart your browser).

Tor really only helps with the IP part, the rest are extensions + incognito mode.

Google is going to captcha you to death as punishment.

I just had a look at the article; Privacy Badger told me there were 14 trackers on the page.

yeah that's covered in the article.

I fully expect future historians to call this era the Surveillance Economy.
