The High Privacy Cost of a “Free” Website (themarkup.org)
281 points by kiyanwang 27 days ago | 152 comments

Is the problem actually "free?"

I mean, if Disqus or Facebook had been offering these as paid services, would we expect them to be respectful of privacy? When we buy an Android phone, Amazon Echo or whatnot... When we order stuff on Amazon, subscribe to Spotify, download a paid app or even purchase a car... do these come with the expectation of privacy?

Increasingly, data gathering is just built into everything, free, paid or foisted. I think the "if you aren't paying, you're the product" trope is trite at this point. We're "the product" regardless.

Framing this in terms of "the problem with free" is off, IMO. It doesn't point to a solution. A paid facebook or google is neither realistic nor likely to solve anything. All it does is point the finger in the wrong direction, as if we, the ignorant masses, are selling ourselves willingly.

Corporate (and state) espionage is simply the default. Avoiding it takes effort, and compromise. Part of that compromise may be financial, or it may not be.

> Is the problem actually "free?"

It's not. Paying to avoid advertising (which is the precursor to the modern "paying to avoid total surveillance") has been a thought-terminating marketing cliché since the days that "pay TV" was being introduced in the 60s. At this point, as you say, it just serves to blame the surveilled for the surveillance.

People using a free service make a lower-quality advertising target than people using a paid service, and people using a less-expensive service are a lower-quality advertising target than people using a more expensive service.

Blacklight even turns up trackers on ftc.gov.

Think about that. If you go to ftc.gov to file a complaint about Google, Google will have a log of that interaction.

I trust that Google is honorable and does not abuse this privileged position, but it is unsettling to realize just how involved FAAMG has become in daily life.

Except pay TV still comes with ads.

Even Amazon Prime plays stupid ads. Paid DVDs come with unskippable crap. Paying to remove ads never really worked.

It's funny how you get ads when you pay for movies and TV shows, but you don't if you obtain them for free.

Not with old school piracy (just getting a video file), but on pirate streams, you do get ads (for porn).

I appreciate this. The point, succinctly: This is a manufactured problem for a manufactured solution that hasn't evolved utility.

> Framing this in terms of "the problem with free" is off, IMO.

Not necessarily, I would say it's just an outdated concept, since an expectation of privacy has been corrupted by marketing departments everywhere.

It _used_ to be the case that paying for something meant there was an expectation of privacy, but as you so eloquently put it, "the ignorant masses, are selling ourselves willingly", and these companies have realized that they can sell a product _and_ their users' info as a bonus.

Yes, and this makes it even worse. I recently bought a Sonos. Deliberately chose the one without mic and Alexa. But Sonos are tracking the hell out of everything. Usage tracking is on by default, and even their non-opt-out-able “functional” tracking is just ridiculous. And that’s in a country covered by the GDPR and the ePrivacy Directive.

...so I’m clued enough to notice, care and able to block most of it. But my mum and so many others sadly aren’t.

how do you block most of it?

In general, at home I’m using NextDNS across all my devices, including the router, and the uBlock Origin extension in my browsers. The Sonos specifically is now blocked on my router...

The problem with some devices, however, is that I might actually want them to connect to the internet, but I have little control over what payloads they might be sending and where.

I use Pi-hole to block the Sonos, TV, etc. from phoning home while still allowing them to play music or videos from Spotify, Netflix, etc. It works pretty well.

Use incognito mode in browsers. It discards cookies when you close the window, and cookie-based trackers can't follow you across sessions without persistent cookies. I use it by default, and the only inconvenience is logging in more often, but it's just two more clicks. And use a good adblocker too.

If you're on any UNIX, grab a RAW hosts file off


and put it in /etc/ as /etc/hosts.

I use Unified. On FreeBSD and Linux, the OS handles the hosts file really fast, faster than uBlock Origin. Windows slows to a crawl with a large hosts file for some dumb reason.
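For anyone wondering what such a file actually does: each line maps a hostname to a non-routable "sinkhole" address, so the lookup never reaches the tracker. Here is a minimal Python sketch of that format and of the matching rule, assuming the common 0.0.0.0/127.0.0.1 sinkholes; the sample domains and function names are made up for illustration.

```python
from __future__ import annotations

# Addresses that blocklists commonly use to "sinkhole" a hostname.
SINKHOLES = {"0.0.0.0", "127.0.0.1"}
# Real local entries that also appear in published lists and must not count as blocked.
LOCAL_NAMES = {"localhost", "localhost.localdomain", "local", "broadcasthost", "0.0.0.0"}


def parse_hosts(text: str) -> set[str]:
    """Collect hostnames that a hosts-format file maps to a sinkhole address."""
    blocked: set[str] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and inline comments
        if not line:
            continue
        address, *names = line.split()
        if address in SINKHOLES:
            blocked.update(n.lower() for n in names if n.lower() not in LOCAL_NAMES)
    return blocked


def is_blocked(hostname: str, blocked: set[str]) -> bool:
    """Hosts files match exact names only; an unlisted subdomain still resolves."""
    return hostname.lower().rstrip(".") in blocked
```

The exact-match rule is why these lists are so long: every tracker subdomain has to be enumerated on its own line.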

> ...if Disqus or Facebook had been offering these as paid services, would we expect them to be respectful of privacy?

I would, but it almost certainly wouldn't happen, knowing the way they work. This is one reason I won't sign up for allegedly privacy-centric iOS apps that ask you to create an account with the vendor/author.

But when you buy an Android phone or an Amazon Echo, you are buying the hardware; that's what you pay for with your money. What you pay for with your privacy is the software or the service running in the cloud.

Arguably, payment systems themselves (credit cards) are the richest, most deterministic, and among the largest data siphons available.

Back in the late 1990s, a friend working for the leading bank card company reported a conversation with a coworker who had defence experience. That coworker described how an IC associate had expressed jealousy over the size of the bank card company's data trove: "you've got more information than we do".

I'm unsure how reliable this is, or was, but the conversation was at least a casual acknowledgment of the possibility at the organisation.

Free makes any business model other than surveillance capitalism effectively impossible. Direct paid revenue is a necessary but not sufficient condition for an honest transparent model of user interaction that respects privacy.

Facebook with a subscription fee could still have been shady, but a world where Facebook charges a subscription fee is a world where an alternative might have a chance to finance itself as the direct revenue potential of a social network would not be $0.

This is why I think it is trite to go on saying "if you're not paying, you are the product."

On paper, sure. You are right. Free makes any business model other than surveillance capitalism less viable (though I definitely would not say impossible).

In reality, "surveillance capitalism" is the most profitable regardless. So, even if we pay, we still get surveillance. The trope just lets the finger be pointed at the public, instead of at whoever is doing the surveillance. Even if you could get most people to prefer paid to free (you can't), it wouldn't do anything to mitigate surveillance in reality.

Just because paying enables nonsurveillance in theory, doesn't mean that it will actually result in non-surveillance in reality. In fact, it probably won't.

Sadly, you are correct. Increasingly, even if I pay good money, the expectation is that I forgo any kind of privacy. An annoying recent example: I was debating buying an AR-15, but the website recommended by a friend wanted me to drop my adblocker for 'fraud prevention' purposes. Guns are not a cheap hobby; I already pay a premium and deal with a lot of bureaucratic idiocy on every level, and now this. The next step is probably a required FB login with an integration that blasts everyone with the fact that you purchased $something.

It gets more annoying when a random store clerk is unable to ring me up when I won't give her my phone number.

Like you said, it takes a lot of effort just to avoid it. I don't know what it would take to actually nullify its impact on me.

A lot of this comes about, not because of "free," but because of that ol' "solved problem" canard.

As I have stated before, if I had a dollar for every time I've been told "XXX is a solved problem," I'd be rich. Since this phrase usually comes bundled with a put-down of my own choice to "roll my own" in some effort, I get just a wee bit peeved when I hear it. I know that I can come across as a cranky bastard that don't trust anyone (possibly because I am), but that doesn't make me wrong. Sometimes it does mean that I ignore some very good solutions, because I can't bring myself to trust them.

Finding a dependency that does most of what we want isn't hard. Vetting that dependency, especially with regard to embedded dependencies, is not as easy. This can be made more difficult by downstream dependencies burying the ToS statements of upstream dependencies, requiring anyone who includes them to recurse the chain, studying each ToS.

Nowadays, data is currency. Every application seems to be some form of PID miner. This is why I get a solitaire app that requires me to sign up for an account.

I think there's a lot of legitimately well-done, truly free stuff. A lot of it isn't popular or flashy. Maybe there's an opportunity for someone to create a dependency index that rates things like privacy of dependencies, and unwinds dependency hierarchies, sort of like GitHub does with security issues.
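The "unwinding" part of such an index is a plain transitive-closure walk. A minimal Python sketch, assuming a hypothetical registry that records each package's direct dependencies and some privacy flags (every package name and flag here is invented for illustration):

```python
from __future__ import annotations

# Hypothetical registry: package -> (direct dependencies, privacy flags).
REGISTRY: dict[str, tuple[list[str], set[str]]] = {
    "comments-widget": (["analytics-sdk", "markdown-lib"], set()),
    "analytics-sdk": (["ad-network-pixel"], {"telemetry"}),
    "ad-network-pixel": ([], {"third-party-cookies", "tracking"}),
    "markdown-lib": ([], set()),
}


def privacy_report(root: str, registry=REGISTRY) -> dict[str, set[str]]:
    """Walk every transitive dependency of root and record its flags."""
    report: dict[str, set[str]] = {}
    stack = [root]
    while stack:
        name = stack.pop()
        if name in report:  # already visited: handles shared deps and cycles
            continue
        deps, flags = registry[name]
        report[name] = set(flags)
        stack.extend(deps)
    return report


def inherited_flags(root: str, registry=REGISTRY) -> set[str]:
    """Union of all flags anywhere in root's dependency tree."""
    return set().union(*privacy_report(root, registry).values())
```

The point of the design is that "comments-widget" looks clean on its own, yet inherits tracking from two levels down.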

Totally... I really had to defend my decision to script my own backups to a friend of mine. He said 'why didn't you just use duplicati like everyone else'.

But duplicati is very big and doesn't do exactly what I want to do (which is not backup as such but more like archive). Also this way I had everything in standard formats that I know I can still open in 30 years if I want to. I've been bitten by commercial backup software formats before, had to install a Windows 95 VM to load really old backups :)
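A home-grown archiver along these lines can be very small. A minimal sketch using only Python's standard library, assuming the goal is one dated .tar.gz per folder; the function name and file-naming scheme are illustrative, not the commenter's actual script:

```python
from __future__ import annotations

import tarfile
from datetime import date
from pathlib import Path


def archive_directory(source: Path, dest_dir: Path, today: date | None = None) -> Path:
    """Write source into a dated, gzip-compressed tar file under dest_dir."""
    today = today or date.today()
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{source.name}-{today.isoformat()}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname keeps paths relative, so the archive unpacks cleanly anywhere
        tar.add(source, arcname=source.name)
    return target
```

tar and gzip are readable by stock tools on practically every OS, which is the "still open in 30 years" property without any vendor format in between.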

I also wrote many mini applications in work that have been used for many years at no cost, eventually being replaced by off the shelf stuff which did much more than we need and as such was much more complicated for the users.

The IT world is really too focused on off the shelf products right now. Homebrew can be a really efficient way too.

> Homebrew can be a really efficient way too.

What's funny is that a popular package manager for Apple systems is called "Homebrew"[0].

I am a dependency skeptic, but not a full-on "curmudgeon" (although I am often accused of being one).

If you can prove that a dependency is the way to go, and I am satisfied with its provenance and support, I'll probably let it in.

One of the big problems with a lot of these efforts, is that the documentation is atrocious. It often has a fancy-ass, big-header, scroll-forever Web site, with God-awful information architecture, no ToC or index, and lots of fancy CSS (like video backgrounds -always a win for a documentation page). There's the obligatory "step and repeat" page, with all the logos of every company that has ever sent them an email, and cute, tongue-in-cheek bios of the team, but it can be difficult to find out how to call a function properly.

[0] https://brew.sh

> One of the websites doing this, SunTrust Bank, sent the user name and password we entered to a third party, Jornaya, which says it encrypts and discards the data it collects.

Wow. Deep down the page is a nugget about SunTrust just giving away your username and password. A big reminder to use a unique password for every site.

Was there a TOS or service agreement sent to the customer?

Third-party marketing data agreements, especially for anything dealing with money, are usually bulletproof and opt-in.

Sure. They’re not going to sell your username and password. But they’re a marketing company, and my experience with marketing companies is that they view IT as a cost center (despite marketing companies being mostly IT driven these days), and don’t spend a lot of time, effort, or energy on security. When they have a breach, it’s going to cost you more than just privacy.

Creator of the spartapride.org website here! Crazy to see my name pop up in a Hacker News post!

This was actually a case of being too tech savvy. When I set up spartapride, I had all my tracker blockers on, and because of that Disqus was not loading (hence the 30 or so additional trackers not loading either).

Needless to say the nice people at themarkup contacted me and we had a nice interview/talk.

I allowed the Disqus tracker through, saw the other 30 being loaded, and was just like, "Yep... that's not worth it," and I ended up removing Disqus from the site. :)

Lesson learned.

How do you justify including Facebook and Twitter though? Especially Facebook is known for abuse of their data and collects much more info than they say.

Considering the sensitive nature of your topic and audience I personally wouldn't include them just for the sake of a like button :) If I were a transsexual in a highly homophobic organisation, even seeing a facebook like button on a website covering this topic would scare me off.

But anyway at least the like buttons make it clear to your users that you do it. Much better than using transparent pixels and hidden cookies etc.

Indeed. Privacy focus and Facebook cookies are fundamentally opposite ideas. Her claims don't seem to make sense at all. "I don't track users [except sending your data straight to Facebook]" -- what does that even mean? It's a self contradictory position.

Hello, on my actual account now that I'm on my computer. The answer to that is mostly I was overruled and leadership wanted to have a twitter and facebook feed on our website.

Hands are tied on those two things.

I saw a wrapper around these buttons that loads them grayed out (just placeholder images). Only when you interact with them, they are actually loaded.

This way, only the users who actually want to use the buttons are tracked.

I understand and I've been in an identical position, but it means you can't make those sorts of claims.

I just ran Blacklight on my own site and got a perfect score.

My site costs me essentially nothing* to host (netlify and aws serverless technologies that are mostly under the free tier).

*My highest cost so far was when I was debugging serverless websockets and had a bug in my code that caused constant messages between the browser clients I was testing (which I left open for a day when I started work). That cost me $7.

I have my own service-hosted playground using little more than git and a few cli tools.

We need to rebuild the ad-free web.

I suspected (perhaps like you) that this article was going to be about free hosting providers (Netlify and the likes). When I saw it was Disqus/Facebook/Twitter I wasn't too surprised. I also host a lot of projects "for free" (TM) on Netlify, Firebase, etc. and don't tend to include any 3rd party scripts.

It makes me wonder if some form of data (anonymous or not) scraped from folks like Netlify is slated to be sold off to advertisers or SaaS products looking to find customers. As they do things like process your HTML they could pull out textual content looking for signals.

Yeah I was wondering if this was about maybe Cloudflare or Vercel getting lots of user / usage analytics off of freely hosted sites... doesn't seem like that's what they're doing, but who knows?

Now where did I put my tin foil hat...

Cloudflare freely admits they use traffic data across clients for DDoS analysis. Which seems like a good tradeoff to me. Not even really a tradeoff.

I have the feeling that you're fluent in web technologies.

Even Terry Tao uses wordpress.com. Life is short, and not everyone enjoys building their own houses or tuning their own cars.

wordpress.com is often used by internet marketers because it has loads of tools for that and content management.

But that doesn't mean it has to include all the tracking plugins.

Also, there are plenty of site builders like wix.com, squarespace, etc. that can launch a site in minutes.

Using those tools doesn't necessarily imply ads or tracking. (But I would agree it would be impossible to know for most uses if it did have tracking in the background.)

And yes, I wrote my first web page in the 90s, but that doesn't mean we don't have nicer tools that anyone can use now. (Wow, I'm trying to remember how I figured that out as a teenager. I think my dial-up isp had a tool on their website and instructions for creating a path and uploading your own web files.)

Just a heads up: sites built with Wix and Squarespace come with tracking by default, even if you don't use it, and it's impossible to disable.

> We need to rebuild the ad-free web.

Agree, but how do you pay for it? Will it be a government-owned, taxpayer-funded free internet?

The hosts should pay.

If one has something to say, one can host it. Hosting a small website yourself costs peanuts, and simple websites should cover most noncommercial use.

The walled gardens are killing the independent small-scale websites. Facebook and Reddit are essentially eating every community site, and those don't make much ad money anyway.


I’ll pay to host my own content - but if it’s just static content - that’s essentially zero.

Also, I’ll make tools to make it easy for anyone to create their own self-hosted website.

No ads needed.

And if I make a product or service of value, I’ll sell it - without ads.

Word of mouth recommendations are great. Somebody can make a review or a blog post about it because they like it.

Others can find it when they search for it - because they need it.

Think about it - how many things/services do you buy because you saw an ad versus because you had a need and sought out the best product that could fulfill it.

In fact, I think it would be awesome if a new search engine could step up - ad free web search (excludes any website that uses ads).

The big problem, I believe, is discoverability. People use Google's algorithmically culled search, which favors big sites and SEO spam. Earlier, web rings and link lists handled discovery. They are pretty much gone. Since the phone book where I live isn't printed anymore, I can't look up which companies offer plumbing or pizza in my area, because there are no such lists anymore. I can find a few on Google. Google pretty much put the business of collecting and arranging information out of business and didn't replace it.

"The information age" should really be called "the age of colored noise".

I don't think web rings ever worked well, but maybe there exists a search engine for the little guy:

- Adfree Only (No Ads or Tracking Allowed on the domain)

millionshort.com - I don't see that as an option.

wiby.me - looks like this is a 1990s website generator

It's a bit of a tangent, but you might be interested in this solar-powered website if you haven't seen it before:


Amazing how blacklight gives it a perfect score when, in reality, IP addresses, domains, SSL certificates and other traffic metadata are clearly visible to AWS.

Not only can the data be subpoenaed, it's also being intercepted by the usual three-letter agencies.

If you care about user privacy find a smaller hosting in a country with good data protection.

Even better, host the site in somebody's home in that country.

And keep in mind that this is still not enough.

Good enough for what?

My site is just a blog with a bunch of prototypes and games.

My goal is an ad-free Internet.

If the client is worried about surveillance, then it would be up to them to use an appropriate VPN. But if a three-letter agency cares about who reads my blog, I‘ll probably have a lot more problems than a visitor.

You can serve ads without mining users' data. I am old enough to remember an Internet where ads were relevant to the content.

I remember an internet without ads. We had a lot of "under construction" GIFs, though :)

And marquees. How I hated those!

I'm just going to leave this here:


> website operators are often effectively as blind to exactly what information advertising companies and marketers are collecting from their website visitors—and what they’re doing with the data—as the people browsing the internet are

This is part of why I've taken to blocking things at the network level (with pihole ATM). Ignorance does not absolve them of blame, but I can't expect every site to know (or even care) about the issue so I have to take measures myself (or decide not to care).

This is the biggest problem I see with the WordPress ecosystem. While WP is a fantastic way to build a site, the plugins you install can do exactly what the article describes (e.g., via the Disqus plugin).

I built an affiliate marketing site a few months ago, and when I started, I decided to build the thing from scratch rather than using WordPress. Some people gave me a hard time about this because I was wasting time I could have spent writing for the site, but whenever I see an article like this I remember why I put in all the extra work. I know exactly what my site is doing and when, exactly what information is being sent to the user, and what info is being posted from the user.

I'm in two minds about what to use for my own portfolio site when I finally resurrect it. I've pretty much decided on a static site generator over a dynamic CMS, but I quite fancy writing my own so it works just as I want it. Trouble is, I keep rethinking what "just as I want it" means... I may just hand-craft version 1 to put an end to that particular source of procrastination.

But yes, one of the main reasons for avoiding WP and its ilk is the allure of plugins and the work needed to verify what they are up to. If I'm going to have to make my own parts to avoid the things I don't want, I might as well create from scratch anyway.
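Writing your own generator can be a lot smaller than people assume. A minimal Python sketch, assuming plain .txt sources where the first line is the title and blank lines separate paragraphs (that input convention and the function name are invented for illustration); nothing in the output loads a third-party resource:

```python
from __future__ import annotations

import html
from pathlib import Path

PAGE = (
    '<!doctype html>\n<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
    "<body>{body}</body></html>\n"
)


def build_site(src: Path, out: Path) -> list[Path]:
    """Turn each .txt file in src into an HTML page in out, plus an index page."""
    out.mkdir(parents=True, exist_ok=True)
    pages: list[Path] = []
    for txt in sorted(src.glob("*.txt")):
        text = txt.read_text(encoding="utf-8")
        title, _, rest = text.partition("\n")  # first line is the title
        paragraphs = "".join(
            f"<p>{html.escape(p.strip())}</p>" for p in rest.split("\n\n") if p.strip()
        )
        body = f"<h1>{html.escape(title)}</h1>{paragraphs}"
        target = out / f"{txt.stem}.html"
        target.write_text(PAGE.format(title=html.escape(title), body=body), encoding="utf-8")
        pages.append(target)
    links = "".join(
        f'<li><a href="{html.escape(p.name)}">{html.escape(p.stem)}</a></li>' for p in pages
    )
    index = out / "index.html"
    index.write_text(PAGE.format(title="Index", body=f"<ul>{links}</ul>"), encoding="utf-8")
    return pages + [index]
```

Every byte it emits is visible in the source, which is the whole argument for rolling your own over a plugin ecosystem.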

Agree re plugins, and it’s not just the work to verify them once, they change over time. Their motives change or are revealed (once enough customers are locked in they switch to evil mode), or they are acquired by profit-hungry capitalists devoid of ethics, or simply abandoned and then compromised etc.

You can look at every plugin source to see what it does. Back when I used to do a decent amount of WP stuff I found a bunch that had security holes and hidden tracking.

Most of the big plugins are trustworthy. Just make sure you read their privacy policy.

Perhaps there is an opportunity to create a way to self-host static websites via people's own smartphones. Bandwidth costs for most sites must be tiny.

The business models of website/page-builder platforms are based on gathering a long tail of small businesses, and then selling the audience attention via ad networks and data mining.

Looking at the ecosystem from a distance, it's creating financial value, but there are technical means to provide the same actual business value in cheaper and more privacy-preserving ways.

(By "actual business value" I mean that a person can view the menu at a local cafe in their area -- a valuable interaction for both parties -- without that lookup being intermediated by a platform and N different third parties where a kind of "shadow market value" is extracted)

Bandwidth costs are tiny, but phones use radios, and radios cost lots of energy to handle real TCP connections that must be maintained. It'd drain the battery really fast. Plus, it's a very rare telco indeed that gives your phone a real routable IPv4 address. Most are behind carrier-grade network address translation.

But the spirit is right. Just do it on a real computer at home on a landline. A static website is easy, incredibly safe (compared to, say, browsing with JS turned on), and useful.
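As a rough illustration of how little a static site needs, Python's standard library can serve a directory over HTTP. This is a sketch, not a hardened deployment: the Python docs themselves warn that `http.server` is not recommended for production, and it assumes you handle port forwarding on your router (and ideally put TLS in front) yourself. The helper name is made up.

```python
import functools
import http.server
import threading


def start_static_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve files from directory on all interfaces in a background thread."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("", port), handler)
    # daemon=True so the thread doesn't keep the process alive on exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Usage would be something like `server = start_static_server("public", 8000)`, then `server.shutdown()` when done. No cookies, no scripts, no third parties unless you put them in the files.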

That's not going to work because having a server on 24/7 is going to kill battery life. Not to mention issues around NAT and hole punching.

You could have peer-to-peer hosting - you give up some space on your device to host, and if your device is offline another device has a copy of it that they serve. Of course, there would need to be some sort of regulation to ensure people weren't using it to distribute illegal material.

We could use second to none middle-out compression and call it Pipernet. [0]

[0] http://www.piedpiper.com/internet-heritage/

You'll almost certainly need some sort of cryptocurrency to adequately compensate your peers. Otherwise it's not fair for you to host someone else's 10GB movies for free while you're only publishing a 1MB static site. But if we get to that point why not just pay someone else to do it, since there will be a competitive marketplace for hosts?

Fairness is overrated. 8TB drives are currently around $150, which prices 10GB of storage under 19 cents. Bandwidth costs would dominate if one peer had to upload the same material thousands of times, but in a P2P system the average peer uploads each thing once, and then we're back to pennies. It's an amount of money smaller than its own collection cost. Do I really care if I'm paying for $0.75/year in local computing resources and only using $0.03/year myself? It's not even worth negotiating over and would generally have still existed and just been idle in the alternative.
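The storage figure works out as follows, assuming decimal units as drive makers use them (8 TB = 8000 GB) and ignoring power and redundancy:

```latex
\frac{\$150}{8000\ \text{GB}} = \$0.01875/\text{GB},
\qquad
10\ \text{GB} \times \$0.01875/\text{GB} = \$0.1875 < \$0.19
```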

Moreover, even if everybody is a scrooge who can't just eat a negligible annual cost without worrying about it, you don't need everybody. You just need a few extra nodes to provide redundancy for you. Which could be your sister and your best friend and your father-in-law rather than some strangers you have to pay in a way that requires you to file weird tax paperwork.

The fundamental problem is that within a network, resources (storage, bandwidth) are scarce. There might be altruists who are willing to donate resources to the network, but there's a finite number of them. Using a market solves two problems: it deters abuse (e.g., someone using the network as their personal backup service) and incentivizes non-altruistic people to contribute resources to the network.

I don't know if a peer-to-peer resource trading market and currency are necessary here; the device owner with spare capacity and the business proprietor may be one and the same person, which would simplify the situation a great deal.

You have a cafe/small business owner who already owns a smartphone: that device has a decent chunk of data plan allowance, CPU and memory.

They'd like a small site to promote their business online; so they install the web server app and it allows them to add, edit and publish basic content - a few images, some blog posts, etc.

Self-hosting like this means they don't have to sign up for a monthly subscription, their users aren't subject to any tracking (assuming there's none embedded in the content generated by the app), and service provision scales until they need to upgrade to a dedicated host.

The questions about radio and battery usage are totally valid, so perhaps those need more consideration (user superkuh has some suggestions there).

In short: the resources and capacity already exist and we've already paid for them. The question is whether we can re-use those rather than being sold more of the same.

Using pricing to mitigate abuse is only necessary if you're not doing anything else to mitigate it.

Imagine a collection of independent storage pools managed by various entities that control access, e.g. the IT guy of your family, your employer, some free software collective, some small business association. You join one (or more than one) based on a referral or personal relationship or whatever criteria they want to use. Then you have an account with a quota in excess of what any ordinary person is going to require but which doesn't allow Bob to dump 500TB of garbage into the pool without first negotiating with the administrators.
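The quota part of that needs no currency at all. A minimal Python sketch of one such pool, assuming a default per-member quota plus administrator-granted exceptions; the class and method names are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """One community-run pool: generous default quota, admin-approved exceptions."""

    default_quota_gb: float
    overrides_gb: dict[str, float] = field(default_factory=dict)  # negotiated exceptions
    usage_gb: dict[str, float] = field(default_factory=dict)

    def quota(self, member: str) -> float:
        return self.overrides_gb.get(member, self.default_quota_gb)

    def store(self, member: str, size_gb: float) -> bool:
        """Accept an upload only if it keeps the member within their quota."""
        used = self.usage_gb.get(member, 0.0)
        if used + size_gb > self.quota(member):
            return False
        self.usage_gb[member] = used + size_gb
        return True

    def grant(self, member: str, quota_gb: float) -> None:
        """An administrator sets a member's quota after negotiating with them."""
        self.overrides_gb[member] = quota_gb
```

So Bob's 500 TB (500,000 GB here) is simply refused until an administrator calls `grant` for him; ordinary members never notice the limit.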

At which point the altruistic donations, i.e. the idle capacity of the devices of everyone in the pool, are enough to handle everybody's non-abusive workloads.

Battery usage would probably depend on how frequently requests have to be served?

Discoverability and routing is certainly a question too. Something like ngrok could help there, with good network sandboxing on the user's local device.

There'd no doubt be times when the site would become popular enough that a single smartphone wouldn't be suitable and load-balancing the requests could be required.

> Something like ngrok could help there, with good network sandboxing on the user's local device.

ngrok only works because there's a company willing to proxy someone's traffic for free. In that sense "using ngrok" doesn't solve the issue, it only pushes the problem to someone else.

It's not an ideal long-term architecture, sure. It'd be better to have website search, discovery and routing possible without routing tricks.

Sometimes you have to make do with what's available currently while preparing the future possibilities.

What makes you so sure about that? A suitable web server process uses zero CPU after it has started and opened the right port.

Phones will suspend and kill applications that aren't actively used. On iOS this has always been the case (you can't have ANYTHING running in the background on iOS unless it's navigation, calls, or music; otherwise you need to request a special exemption). It might be possible on Android, but even there the phone will slumber while it's in your pocket.

Phones are not servers.

Phones turn off their radio when the screen is off and go into a sleep state (waking up in intervals). Keeping the radio on 24/7 is just going to increase power usage by a lot. It's not the CPU that's the concern.

Even ignoring this issue, it would open your phone up to getting DDoSed by strangers on the internet.

How much delay does this sleep state add for messenger applications? Are we talking seconds or milliseconds?

Smartphones will not work as others have pointed out. Even convincing people to keep something running on their laptop for part of the day won't give good reliability.

The only real solution is selling little boxes for less than 50 dollars that you can plug into your router and make it dead simple to host a website without any technical knowledge.

The problem is ISPs don't provide any options for decent upload speed. I have a 200 Mbit download connection and Spectrum throttles upload to 10. There are no symmetric options. Users can never win against a vertically integrated monopoly.

Yes. But that is a problem that you can work around with distributed hosting. Won't be perfect, because I think initial connection will take several seconds for every visitor, but should be faster after that.

Perhaps something like the Beaker Browser P2P website system? https://beakerbrowser.com/

I've run some internal-facing websites on a Raspberry Pi, also a low-cost system, but have never tried publishing them on the wider web.

The amount of unused computing and storage on personal devices is staggering.

Yet HN never misses the opportunity to dogpile on anyone who points this out: data transfer costs, battery usage, jokes about the Silicon Valley sitcom.

As if most phones weren't connected to wifi and power for 10+ hours a day.

Someone still needs to pay for the data transfer costs ala ngrok.

I don't know of a single US mobile network that provides publicly addressable IP space, v4 or v6, it's all behind more than one layer of NAT.

AT&T doesn't even allow direct internet access, every single connection goes through their horrendously bad transparent proxy that breaks TLS 1.3, breaks ESNI, causes a SSL_RX_RECORD_TOO_LONG connection reset if you negotiate only modern ciphers, and sits in the middle of every connection to prevent things like obfsproxy.

I'm not sure if things like IPFS [0] or Secure Scuttlebutt [1] would be the solution (or at least part of the solution) for self-hosting websites/web content via a person's smartphone, but I like that concept.

[0] = https://ipfs.io/ [1] = https://scuttlebutt.nz/

> Perhaps there is an opportunity to create a way to self-host static websites via people's own smartphones

An interesting notion, but would only work until someone's site became very popular. Then it all goes pear-shaped. And I expect that most people would want their sites to be very popular, or it wouldn't be worth their time.

There's a lot of unused computing power on phones that are charging and doing nothing overnight. If only there was a way to harness that computing power and forward the results once a day or something.

Wasn't that the plot of a season of Silicon Valley?

> She said she only allowed three trackers on spartapride.org: cookies from Twitter and Facebook that accompany their “like” buttons on the site, and one from Disqus,

There's no need to get tracked just for the like/share buttons. These don't need any JS or third party cookies, just a specially formatted link to the "social" website.
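For illustration, here's a minimal Python sketch of that approach. It assumes the share endpoints Twitter and Facebook have long published (`twitter.com/intent/tweet` and `facebook.com/sharer/sharer.php`); those URLs come from outside this thread and could change. Nothing is requested from either network until the visitor actually clicks.

```python
from html import escape
from typing import Optional
from urllib.parse import urlencode

# Share endpoints as publicly documented by each network (an assumption; they may change).
# Value: (base URL, name of the query parameter that carries the page URL).
SHARE_ENDPOINTS = {
    "twitter": ("https://twitter.com/intent/tweet", "url"),
    "facebook": ("https://www.facebook.com/sharer/sharer.php", "u"),
}


def share_link(network: str, page_url: str, text: Optional[str] = None) -> str:
    """Build a plain share URL: no script runs and no third-party request happens on page load."""
    base, url_param = SHARE_ENDPOINTS[network]
    params = {url_param: page_url}
    if text and network == "twitter":
        params["text"] = text  # pre-filled tweet text; only added for Twitter in this sketch
    return f"{base}?{urlencode(params)}"


def share_anchor(network: str, page_url: str, text: Optional[str] = None) -> str:
    """Render the link as an <a> tag, HTML-escaping the URL (so '&' becomes '&amp;')."""
    href = escape(share_link(network, page_url, text))
    label = escape(network.title())
    return f'<a href="{href}" target="_blank" rel="noopener noreferrer">Share on {label}</a>'
```

Calling `share_anchor("twitter", "https://example.org/post/", "Worth a read")` yields an ordinary link; the visitor's browser only talks to Twitter if they click it.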

The site creator was not a web developer at the time she made the website. She was likely following deeply outdated (2014 or earlier) blogger advice about how adding those stupid buttons "create engagement" by making it easier to share the article.

Unfortunately for her and the users of the site, the mere presence of those buttons would be enough for FB/TW to link visits to that website to the users' accounts, if they were logged in on the same browser.

spartapride creator here. facebook and twitter are there because of the "see recent facebook posts" and "see recent twitter posts" on the right side of the website.

The 30 other trackers were loaded because of Disqus. When I originally set up the site, my adblock/tracker blocker actually blocked Disqus, so I never saw the other trackers being loaded. When The Markup pointed it out to me, I allowed Disqus through and saw all the other trackers being loaded.

Thanks to them, I removed Disqus from my site. :)

I'm not a web developer; the server is running the Ghost blogging platform with a theme.

Thanks for providing the details. I withdraw my incorrect assumption :).

We're building open-source privacy solutions for websites, and we also regularly scan the web for tracking technologies. As the article says, Google Analytics is by far the most popular technology we find, followed by Facebook tracking pixels, Adobe Tag Manager and other analytics services. That said, the amount of tracking on a site depends greatly on how it is funded: websites that generate their revenue from advertisements tend to have 5-10 times more trackers installed than, e.g., e-commerce websites.
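To give a rough idea of what such a scan looks for, here's a minimal Python sketch that lists the third-party hosts a page's raw HTML pulls scripts, iframes and images from. It's a simplification for illustration, not how KIProtect's or The Markup's scanners work: it only sees tags present in the initial HTML (tools like Blacklight load pages in a real browser to catch scripts injected later), and it treats any other hostname, including your own subdomains, as third-party.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ThirdPartyResourceFinder(HTMLParser):
    """Collect hostnames of scripts, iframes and images served from a host other than the page's."""

    TAG_ATTRS = {"script": "src", "iframe": "src", "img": "src"}

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.third_party_hosts: set[str] = set()

    def handle_starttag(self, tag, attrs):
        wanted = self.TAG_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative URLs against the page before comparing hosts.
                host = urlparse(urljoin(self.page_url, value)).hostname
                if host and host != self.page_host:
                    self.third_party_hosts.add(host)


def find_third_party_hosts(page_url: str, html_text: str) -> set[str]:
    finder = ThirdPartyResourceFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.third_party_hosts
```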

So right now it's pretty much a "Win-Lose" situation, as website publishers try to force privacy-invading trackers on users who clearly have no interest in being tracked. Current consent management solutions are (IMHO) not a great answer to this problem: consent managers that use dark patterns (e.g. each tracker needs to be disabled by hand, decline buttons are hidden two or three layers deep, ...) are not compliant, and consent managers that give users a free choice have bad opt-in rates (which is not surprising). On our own site, for example, about 50% of users decline consent (live stats here: https://kiprotect.com/klaro/demo) when asked via our own user-friendly consent manager (it's open-source btw: https://github.com/kiprotect/klaro).
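The core idea of a consent manager that gives users a free choice can be sketched as default-deny rendering: third-party scripts are only emitted for services the visitor has explicitly opted into. This is a hypothetical Python sketch, not Klaro's API or configuration format; the service names and URLs are made up.

```python
from __future__ import annotations

from html import escape

# Hypothetical services and URLs, for illustration only.
SERVICES = {
    "analytics": "https://analytics.example/script.js",
    "comments": "https://comments.example/embed.js",
}


def render_scripts(consented: set[str]) -> str:
    """Return <script> tags only for services the visitor explicitly opted into.

    Default is deny: with an empty consent set no third-party script is emitted,
    so nothing loads before the visitor has made a choice.
    """
    tags = [
        f'<script src="{escape(src)}" defer></script>'
        for name, src in SERVICES.items()
        if name in consented
    ]
    return "\n".join(tags)
```

How the visitor's choice is stored and read back (a first-party cookie, say) is left out of this sketch.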

Another dilemma is that most publishers don't want to force invasive tracking on users, but they don't have much choice as the tracking/analytics market is highly concentrated and there aren't many privacy-friendly options that can deliver the required functionality and visibility.

I feel that the issue at heart consists of two parts (a) convenience and (b) cost.

The first example in the article shows this. A small publisher who wants to add commenting / discussion functionality on their website. They settle on a turn-key solution provided by Disqus. All they have to do is embed 5 lines of HTML and boom: fully fledged commenting. There's no need to invest time and money setting up and maintaining their own infrastructure to host comments.

The problem here is that the publisher doesn't pay for the privilege of leveraging Disqus. Of course, there's no such thing as a free lunch, so Disqus generates revenue by passing along trackers from partners who deal in advertising.

Most "solved problems" for small publishers work in similar ways, leveraging embed code and outsourcing hosting of content or complex functionality: YouTube, Instagram, Facebook widgets,... You want search on your website but you don't have time, expertise or budget? Google Site Search will give you what you need for free. The downside is that anytime you embed these third party widgets on your website, you also expose your visitors to the host of trackers that will invade their device through that embed code.

Now, imagine a Web where none of these convenient affordances existed, and you will understand that small publishers would be hard-pressed to build and deliver user experiences similar to what is possible today. Hosting video clips or setting up fully fledged search is prohibitively expensive if you don't have the resources and/or the expertise to do so.

You could also look at this from a different perspective: So many non-technical people are now able to publish content and provide rich user experiences to large audiences because of the widespread availability of these tools. Both the availability of "free" tools, and the influx of non-technical people over the past decade feed into one another.

Case in point: the widespread success of YouTube or TikTok, where millions can just stream from their phone anytime, anyplace, even though streaming video independently online (e.g. through a bespoke solution) is still very much prohibitively expensive.

People buy into these tools because they are convenient, and so easily accessible, and always available. As with most things, they don't consider the hidden but massive costs that come with it.

And so, I think that the amount of tracking only partly depends on how well funding covers the design, development and upkeep of a website. Plenty of massively expensive projects have this feature request "As a content manager, I want to be able to embed a YouTube Video in my copy." Boom. Trackers again.

At this stage, the reality is that if you want to avoid trackers on your website, it inevitably needs to be a conscious choice or strategy.

Those are very good points. I think using trackers on websites is not evil in general; it's just that we make it way too easy for third parties to persistently track users across sites. A single Facebook, Google or YouTube script is not very problematic from a privacy standpoint, but if 60-80% of sites have such a script, it can pose serious privacy risks for users. So I'd say we just need to make it harder to re-identify individuals across many sites; then the privacy risk becomes much smaller.
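One way to read "make it harder to re-identify individuals" is to scope identifiers per site instead of handing every site the same global ID. Here's a minimal Python sketch, assuming each site (or each partitioned cookie jar) holds its own secret key; the class name is hypothetical. The same user gets a stable ID within one site, but IDs from two sites can't be joined without both keys.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class SitePseudonymizer:
    """Derive a per-site pseudonymous ID with a keyed hash (HMAC-SHA256).

    Each site holds its own secret key, so the same user gets unrelated IDs
    on different sites; linking them would require both keys.
    """

    def __init__(self, key: Optional[bytes] = None) -> None:
        self._key = key if key is not None else secrets.token_bytes(32)

    def pseudonym(self, user_id: str) -> str:
        return hmac.new(self._key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
```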

I think that the right to privacy, at its core, is really a right to consent. That is, individuals decide whether or not to consent to divulging personal information to another party. That could range from storing someone's fingerprints, to tracking the websites they visit across the Web, to opening up their mail and reading the contents of the envelope.

The issue is in how that consent is obtained.

So, today, the Internet has become the primary medium for most transactions, whether it's booking a flight, ordering food, buying books, making a restaurant reservation, reading the news or applying for jobs. The biggest societal change of the past 20 years is the reliance of our daily life on the Internet. If you don't have access to the Internet today, you are at a huge disadvantage.

The big problem is that in any of the cases I just mentioned, you are required to access online services and, in doing so, implicitly consent to your visit being tracked and analysed by (a) the operator of the website or platform (e.g. American Airlines) and (b) any third parties that track you through embed codes.

Moreover, you don't know whether or not you will be tracked until you open the website and are confronted with a YouTube embed and a set of AddThis buttons, at which point it's already too late to retract your consent.

This is the very reason why the EU issued the GDPR, forcing website operators to explicitly ask visitors for consent, both on their own behalf and on behalf of third parties tracking through embedded widgets.

One downside of the GDPR is that the end user is confronted with a pop-up before they get to the actual content.

Another downside is that some website operators deliberately make the consent process obtrusive unless you, as a visitor, hit the big, green, flashy "Accept All" button that brings you directly to the content. That is anything but what the GDPR envisioned.

A final downside is that the GDPR is difficult to enforce, as one is required to file a formal complaint when one feels one's rights aren't being respected by a website operator, which is generally not something most individual consumers are willing to spend time, money or effort on.

Even so, despite these downsides I feel that the GDPR is a step in the right direction, as it forces website operators to think about consent.

Consent to divulging personal information is only one part of the equation. I think the other part, the really problematic part, is the ends to which personal data is harvested and in whose hands that data ends up. While I agree that tracking in itself isn't evil per se, I think scandals such as Cambridge Analytica, and the way harvested data is used by advertisers to precisely target consumers, are deeply problematic. I think far more regulation and, above all, enforcement of that regulation is needed in this area.

Context very much matters. Personally, I don't mind being tracked across several e-commerce shops to provide me with relevant offers. I do mind being tracked on websites that absolutely have nothing to do with e-commerce, only to notice how that information ends up being used/abused in totally unrelated and, frankly, unwanted/undesirable contexts.

This is all a form of privacy arbitrage. Users are trading their privacy in return for access to content at no $$ cost. The problem is that the deal is not fully transparent to the end user. The amount of privacy they are actually trading is far higher than they imagine.

The Blacklight tool is great in that it gives users visibility into the actual privacy cost of a given website. Whereas they may have thought their general interest in a topic was the only thing traded, they can now see how detailed the tracking really is: far beyond what most would have anticipated.

You didn't make the complete argument. The problem isn't just that users are unaware of exactly how much privacy they give up. Even if a user knows exactly, you get a scenario like a sub-optimal Nash equilibrium. The problem is that

* the user perceives that giving up a few details about themselves doesn't seem like it would benefit the ad company that much. This is the marginal benefit (to the ad company) per user.

* the company builds a database of user data and gets some total benefit. From this you calculate the average benefit per user.

The problem is that the average benefit per user is much bigger than the marginal benefit per user. So each user doesn't see the danger, but collectively there is a big problem.
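The gap can be made concrete with a toy model. Assume, purely for illustration, that the value of the profile database grows with diminishing returns, here `log(1 + n)` for `n` users. The marginal benefit is the value added by the last user; the average benefit is the total value split evenly across all users.

```python
import math


def database_value(n_users: int) -> float:
    """Hypothetical total value of a database of n user profiles (concave: diminishing returns)."""
    return math.log1p(n_users)


def marginal_benefit(n_users: int) -> float:
    """Value added by the n-th user: what one user 'gives up', seen from their side."""
    return database_value(n_users) - database_value(n_users - 1)


def average_benefit(n_users: int) -> float:
    """Total value divided evenly across all n users."""
    return database_value(n_users) / n_users
```

With a million users, the average comes out roughly 14 times the marginal benefit: each individual contribution looks negligible, while the pooled data is worth much more per head. The log curve is an assumption; the average stays at least as large as the marginal for any concave value function that starts at zero.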

But certainly, in some contexts, the user benefit isn't marginal at all. People know YT tracks their habits and shows them ads that are sometimes extremely specific to what they do both online and in the real world, but they still use YT, because the lure of an ad-supported platform that gives instant access to billions of hours of {entertaining | educational | relaxing | etc} content is hard to resist. Search is also extremely useful and accurate for 99% of people, and the tracking is probably seen as a viable trade-off for access to it (although most will use adblock on desktop), though the problem of most people not knowing the full extent of the tracking still persists.

The title is quite misleading; the privacy problem is not related to the website being free or not.

Technically you're right. You can run a website that gives users a service at no cost without violating privacy.

Pragmatically, however, it's very rare and 'free' correlates very closely with privacy violating business models. Talking about how free services violate privacy is fine when 99% of those services do exactly that. Those businesses shouldn't get a free ride on harvesting and selling personal information just because a handful of other businesses manage to be better.

Just because you are paying doesn't mean you aren't also giving up your privacy as well.

Your argument, even if valid, only covers one part of the problem with the title. Many paid services also violate users' privacy. So no, the correlation between "free" and "violates privacy" is not that strong.

There are plenty of services that try to operate under the 'freemium' model, but they're often pretty restrictive, and website hosting is often bottom-of-the-barrel when it comes to that.

The title is slightly misleading. The article claims the privacy issues are related to the Disqus service being free for the websites that use it.

Disqus could be free and not behave this way. Disqus could be a paid for service and still behave this way.

I mean, yeah, free Disqus is bad; I don't need a "one-of-a-kind free public tool that can be used to inspect websites for potential privacy violations in real time" to tell me that. And that "one-of-a-kind" claim is just laughable; I can name like a dozen number-of-trackers-on-a-page projects.

The only mildly entertaining thing is that the website reporting on this isn't packed with trackers of its own (just BlueKai, which unsurprisingly isn't reported by its own "one-of-a-kind" tracker-detecting tool).

Hey, I work at The Markup. Do you mind sharing where you're seeing the tracker? We very intentionally aren't loading trackers like that, so if it's showing up somewhere I'd like to figure out why.

The tool does pick up Oracle DMP, which is what Oracle kinda sorta renamed BlueKai to. For example, see The New Yorker's site (scroll down and expand the Oracle heading): https://themarkup.org/blacklight?url=newyorker.com

Edit: Oh, hah, I suspect whatever tool you're using is picking up this image, which uBlock Origin is flagging for me: https://mrkp-static-production.themarkup.org/graphics/blackl...

_You_ don't need that, but the point of the article is that the woman building the trans support site did. Not everyone is as tech-literate as HN readers.

This reminds me of a strange ad I got from Facebook back in the early days (2009 or so). I had dated a Korean girl for a brief period of time, and the moment we broke up (via Facebook), Facebook started spamming me with dating ads. All the dating ads had Asian women in them.

I met this girl in real life, so it's not like they tracked my web activity. I've always felt like they read your messages, look at the people you interact with, and then build a model of you.

I was "married" to another guy on Facebook during my undergrad, one of those joke scenarios. My interest was set to women. However, I kept getting gay dating ads. I always thought that was weird: why would it give me dating ads if I was married?

I've always wondered how it weighed my preference and my apparent relationship status.


42% of people on Tinder are already in relationships. I view these as validation apps, as most people don't meet anyone. In that case, even if you're happily married, you might still seek validation for the low, low price of $35 a month.

how did it even know you'd broken up? that sounds more worrying

Sentiment analysis isn't exactly hard.

In fact it could be as simple as knowing her ethnicity and then seeing we used to message each other a lot, and then suddenly stopped sending messages to each other.
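As a rough illustration of how simple such a heuristic could be, here's a hypothetical Python sketch that flags the first week where message volume between two people falls far below its recent average. The window and threshold are invented, and this says nothing about what Facebook actually does.

```python
from __future__ import annotations

from statistics import mean


def detect_sudden_drop(weekly_counts: list[int], window: int = 4, ratio: float = 0.2) -> int | None:
    """Return the index of the first week whose message count falls below `ratio`
    times the average of the preceding `window` weeks, or None if it never does.

    window and ratio are arbitrary illustrative thresholds.
    """
    for i in range(window, len(weekly_counts)):
        baseline = mean(weekly_counts[i - window:i])
        if baseline > 0 and weekly_counts[i] < ratio * baseline:
            return i
    return None
```

For weekly counts like `[30, 28, 35, 31, 33, 2, 0, 1]`, the function returns `5`, the week the conversation went quiet.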

That title is super misleading. Hosting a free website does not have a high privacy cost. Using "free" third-party services to build your website potentially has a high privacy cost.

It blows my mind how much of the internet economy is built on sucking up user data. Some of it makes sense, but I'd imagine the majority of third-party tracker data is used for recommending ads, ads that a solid share of users ignore entirely. I have personally clicked on an ad in earnest maybe once or twice in my entire life. I know a lot of this data is poured into recommendation engines, but the ones I use seem to rank results mainly based on the product seller's contract with the site and on anonymous user data. If you could observe the gross cost of aggregating all of my data, would the net value of the purchases swayed by that data (excluding purchases that would have been made without any intervention) exceed that cost? I just don't understand the economics of "data is the new oil", because it seems like the profitable conclusions that can be drawn from mass aggregation are limited to a small handful of huge companies, plus those selling data snake oil to political campaigns or whatnot.
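The question can at least be written down as arithmetic. This Python sketch uses entirely hypothetical parameters: it counts only the extra purchases attributable to targeting (targeted conversion rate minus the rate you'd get anyway) and compares the resulting profit with the cost of collecting and processing the data.

```python
def incremental_profit(users: int, conv_targeted: float, conv_baseline: float,
                       order_value: float, margin: float) -> float:
    """Profit from purchases that happened only because of targeting.

    Purchases that would have happened anyway (the baseline conversion rate)
    are excluded. All parameters are hypothetical inputs.
    """
    extra_orders = users * (conv_targeted - conv_baseline)
    return extra_orders * order_value * margin


def tracking_pays_off(users: int, conv_targeted: float, conv_baseline: float,
                      order_value: float, margin: float, data_cost: float) -> bool:
    """True if the incremental profit exceeds what collecting and processing the data cost."""
    return incremental_profit(users, conv_targeted, conv_baseline, order_value, margin) > data_cost
```

For example, `incremental_profit(100_000, 0.021, 0.020, 50.0, 0.3)` gives about 1,500 in profit from 100 extra orders, so tracking only pays off if the data costs less than that. The inputs are made up; the baseline term, purchases that would happen without any targeting, is the part the comment is asking about.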

It's so valuable because it's not about ads at all. That's how it started, but not where it's headed.

See how Cambridge Analytica literally swayed public opinion with this data. And that wasn't even targeted at individual users.

Alright, that settles it. This weekend I'm removing all third-party JS from my website (either dropping the functionality entirely or replacing it with OSS alternatives I'll manage myself, or ones I trust enough to pay for), starting with Disqus.

I had been thinking of getting rid of that for a while, but this is the last push I needed.

As a web user I thank you!

Jaron Lanier gave a good TED talk about the problem with free. I'm not sure it would have changed things, as greed always seems to win.


Many of the comments, while meaning well, are way off the mark in my opinion. This isn't about coming up with another solution to make friendlier and less privacy-destructive alternatives: People want zero-friction and these companies give it to them.

These anti-privacy organizations (I include ALL ad companies and the like) do not give a fuck what you want... it's all about them and their customers.

The only solution is war. By this I mean, countering all their attacks by disabling all their weapons.

Ad blockers, tracking blockers, disabling javascript, bypassing paywalls, whatever it takes.

You are just $$$ to them and they will take from you whatever they can.

Exactly. I'm beyond looking for solutions or compromise. Stuff like that BAT token from Brave, or EFF's no tracking policy document: screw that. I'm all out on blocking by any means possible, and I will never stop, even if the situation improves. My trust has just been eroded to the point of no return.


In 1998, we had big banner ads that were relevant to the content on the page instead of the viewer's personal profile. They generated revenue without invading anyone's privacy.

Companies still buy ads that are not personalized: on live TV, in stadiums, on billboards, on highways and so on.

So it is possible to have an ad-supported business without invading privacy. But invading privacy brings a lot more revenue. So I don't agree with the argument that free services have to invade privacy.

The report also says that many paid services, like banks and others, were invading privacy too.

Yeah but targeted is more lucrative and all companies want is more dollars. Anything that generates less is not even considered.

It's so bad it basically makes untracked ads unviable. With billboards it's simply the most lucrative option available.

By the way, in Belgium people have already found face-tracking display billboards, so even untracked billboards will soon be a thing of the past.

They say "To investigate the pervasiveness of online tracking, The Markup spent 18 months building a one-of-a-kind free public tool that can be used to inspect websites for potential privacy violations in real time. Blacklight reveals the trackers loading on any site—including methods created to thwart privacy-protection tools or watch your every scroll and click."

But the EFF's Privacy Badger does exactly this. It's possible that I'm missing something, but how is Blacklight "one of a kind"?

I'm a fan of Privacy Badger, but if you use both on the same site Blacklight appears to cover more types of scripts and does a much better job of explaining potential issues with each.

Privacy Badger deems scripts either "Potential tracker", or "Your Badger hasn't decided yet if these domains should get blocked", while Blacklight has 7 categories each with a short headline and a long description.

Putting all that aside, maybe it's one of a kind in that it can be run on their infrastructure first, without risk to you.

> To avoid giving website analytics market leader Google data about every visitor to his website, Butler said Protonmail built proprietary analytics software. Most websites can set up Google Analytics in an hour, he said, but ProtonMail’s system took years to build, cost half a million dollars in server hardware costs alone and requires a permanent full-time staff to continue to maintain it.

Sounds like an opportunity to sell privacy-respecting analytics software?

There are some already. I'm working on Plausible Analytics. Modern dashboard, simple metrics to understand, open source, can be self-hosted, lightweight script (<1 KB), no cookies and doesn't collect personal data https://plausible.io/
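For anyone curious how analytics can count unique visitors without cookies, here's a minimal Python sketch of the rotating-salt idea several cookieless tools describe: hash the site, IP address and user agent together with a random salt that is thrown away every day. It's an illustration under those assumptions, not Plausible's actual code.

```python
from __future__ import annotations

import hashlib
import secrets
from datetime import date


class DailyVisitorCounter:
    """Count unique daily visitors without cookies and without storing IP addresses.

    Each visitor is reduced to a hash of (daily salt, site, IP, user agent). The salt
    is random, kept only in memory, and replaced when the day changes, so hashes
    from different days cannot be linked.
    """

    def __init__(self, site: str) -> None:
        self.site = site
        self.day: date | None = None
        self.salt = b""
        self.seen: set[str] = set()

    def _rotate_if_needed(self, today: date) -> None:
        if today != self.day:
            self.day = today
            self.salt = secrets.token_bytes(16)  # the previous salt is discarded
            self.seen = set()

    def record(self, ip: str, user_agent: str, today: date) -> None:
        self._rotate_if_needed(today)
        # Fixed-length salt first, then NUL-separated fields; only the digest is kept.
        fields = "\x00".join((self.site, ip, user_agent)).encode("utf-8")
        self.seen.add(hashlib.sha256(self.salt + fields).hexdigest())

    def unique_visitors(self) -> int:
        """Unique visitors seen so far on the current day."""
        return len(self.seen)
```

Because yesterday's salt is gone, hashes can't be linked across days, and including the site name keeps them from matching across sites. A real tool would also persist each day's total before rotating.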

I believe https://simpleanalytics.com is one such privacy-respecting analytics package.

No relation to the organisation, I've just seen them talk on this forum a lot.

In addition to already mentioned Plausible Analytics and Simple Analytics, there is also Fathom Analytics.

Cloudflare also just announced yesterday a "privacy-first" focused analytics offering.


You also have Matomo (formerly Piwik) to throw into that market.

It kinda looks saturated already to me but I'm sure there's plenty of room for smaller players!

Makes you wonder: why does ProtonMail need so many statistics, and why is it so valuable to them that they'd spend millions on it?

Is there an npm package or a "component" you can drop into a website's source with some config that would block anything that's not on the configured allow-list? Something like a "Privacy Badger" or "uBlock" built into the web app itself could be a neat thing.
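One browser-native mechanism that does roughly this is Content Security Policy: the site sends a `Content-Security-Policy` header and the browser refuses to load scripts, connections or frames from origins not on the list. A minimal Python sketch that builds such a header value from an allow-list (the function name and the choice of directives are assumptions for illustration, not a complete policy):

```python
from __future__ import annotations


def build_csp(script_hosts: list[str], connect_hosts: list[str]) -> str:
    """Build a Content-Security-Policy value that only allows the site itself plus
    the explicitly listed origins; the browser blocks everything else."""
    directives = {
        "default-src": ["'self'"],                  # fallback for images, styles, fonts, ...
        "script-src": ["'self'", *script_hosts],    # where scripts may be loaded from
        "connect-src": ["'self'", *connect_hosts],  # where fetch/XHR/beacons may go
        "frame-src": ["'none'"],                    # no iframes at all
    }
    return "; ".join(f"{name} {' '.join(sources)}" for name, sources in directives.items())
```

The resulting string would be sent as the `Content-Security-Policy` response header (or in a `<meta http-equiv>` tag). Note that this policy also blocks inline scripts, and since `frame-src 'none'` blocks all iframes, embeds like Disqus or YouTube stay out unless you explicitly add their hosts.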

See the tracker feature of the latest Safari, it's hard to find any familiar site that isn't horrific. I know my employer's websites are atrocious with all the connections it makes, but of course I can't do squat about it.

What I wish I could be surprised about is that people are still confused by this.

There ain’t no such thing as a free lunch. TANSTAAFL

These services don’t operate out of thin air - they have overhead that has to be paid for - one way or another.

The issue with services like Disqus is that even without the advertising (which adds their trackers), Disqus itself is a tracker, the same way Google Analytics is.

You can see more details here: https://data.disqus.com/

One alternative to Disqus, Commento, suggests that Disqus isn't even GDPR compliant.

An alternative to Disqus, good for static websites: https://posativ.org/isso/

isso is great, it works pretty well though the moderation tools are a bit light.

Commento is super easy to self-host with Docker, highly recommend it!


Or you can run your own website from your own computer and implement your own comment system. For mine, I just have a Perl script tail my nginx access.log, and any URL request that has /@say/ in it is parsed as a comment. Comment systems don't have to be attack surfaces, even if you run them yourself.
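For the curious, the parsing side of that approach might look like this in Python (the comment describes a Perl script; this is a translation of the idea, not that script). It assumes nginx's default `combined` log format and a hypothetical URL shape of `/<page>/@say/<url-encoded comment>`, and it only ever treats the extracted text as data.

```python
from __future__ import annotations

import re
from urllib.parse import unquote_plus

# Request line inside nginx's default "combined" log format, e.g. "GET /path HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[0-9.]+"')
MARKER = "/@say/"
MAX_LEN = 2000  # hypothetical cap on comment length


def parse_comment(log_line: str) -> tuple[str, str] | None:
    """Return (page, comment) for requests whose path contains /@say/, else None.

    Assumed (hypothetical) URL shape: /<page path>/@say/<url-encoded comment>
    """
    match = REQUEST_RE.search(log_line)
    if not match or MARKER not in match.group("path"):
        return None
    page, _, raw = match.group("path").partition(MARKER)
    # Decode, cap the length and drop non-printable characters; the text is untrusted data.
    text = "".join(ch for ch in unquote_plus(raw)[:MAX_LEN] if ch.isprintable())
    return (page or "/", text) if text else None
```

Lines could be fed in from `tail -F` on the access log; anything returned should still be HTML-escaped before it's rendered on a page.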

Another self-hosted alternative to Disqus https://github.com/umputun/remark42

Wow, thanks for that, I had no idea. What a sinister page.

I run a website with no account or login. Is there any way to get 1-2 USD/CPM in revenue without using a privacy destroying ad network?

I'm reading Shoshana Zuboff's Surveillance Capitalism at the moment, and the thing I realized is that I don't fully grasp what privacy is. I mean, I know abstractly what it is, but I don't feel it strongly. Or perhaps what I mean is that it's so deep down, so complete a part of me, that I don't recognize it as a distinct thing.

She tells a story about a small town in England that tried to stop the Google Street View car from coming through, and I thought to myself, "What were they actually feeling?" Because I doubt I would have felt that as pertaining to my privacy, and yet of course it is.

I think a lot of people like me find it hard to quantify what that thing is and where their boundaries for it are and as a result we are obviously wide open to having it exploited.

The Wikipedia opening line is interesting:

"Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively." [1]

Other dictionary definitions:

"someone's right to keep their personal matters and relationships secret" [2]

Now personally I think privacy != secrecy. It's more like obscurity or vagueness or indeterminacy. Like not being pinned down, rather than having any specific information hidden.

"A state in which one is not observed or disturbed by other people" [3]

I think that's a bit closer, but it lacks the idea from Wikipedia of "being able to express themselves selectively"...

[1] https://en.wikipedia.org/wiki/Privacy

[2] https://dictionary.cambridge.org/dictionary/english/privacy

[3] https://www.lexico.com/definition/privacy

> "Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively."

This is the problem with social media such as FB. You do, in principle, have the ability to express yourself selectively, but the friction of creating and maintaining lists and only showing certain posts to certain lists is way too high.

Messengers like WhatsApp are much better because group conversations are selective and maintenance is much simpler, akin to real-life interactions. I do notice the same person behaving differently in different groups, which is really hard to do on FB.

> I don't fully grasp what privacy is

> people like me find it hard to quantify what that thing is

Privacy is [REDACTED]. Intelligence is not privacy.

Recently I cleared out some old books I don't need anymore, and a bunch of them were on info security, so I skimmed them before saying goodbye. They all stated the same vague definitions: no one knows what privacy is, there's no agreement, etc.

Actually, there are definitions of privacy and intel; they are just sad and unethical, hush-hush.

neocities.org, an attempt to replace GeoCities, places no trackers or advertisements on your website. And it’s free.

So... this is an ad for Blacklight?

A couple of months ago a big German publisher asked us to provide information on our service (regarding GDPR). I asked them who is our mutual client (advertiser), that is using our services on their network. The answer was: The tracker was added programmatically, so they cannot say who's behind it.

And recently, a couple of days ago, a bunch of German publishers started a concerted action. They are asking every known mar tech company (I guess from the official TCF vendor list) to give detailed information about tracking cookies, tracking urls and so on.

It's pretty clear that publishers don't really know what they are implementing on their sites when, e.g., adding GA containers or selling their assets. The mar tech universe is huge and complex, consisting of hundreds and hundreds of different companies using tracking technologies, piggy-backing pixels and transferring data between each other.

In the beginning there were three main stakeholders: advertisers, publishers and users. When mar tech companies arrived, the number of participants increased tremendously, just to improve the delivery of ads.

And GDPR? Either you are facing dark patterns or you have to choose between "ad supported" and "paid content". GDPR will not reduce the complexity of the system. The two best pieces of evidence:

The TCF. An industry standard from the industry for the industry. It's not designed to be understood by end users. The user is faced with mar tech buzzwords, dozens of purposes and hundreds of vendors. How does that help?

And plugins that hide or skip consent banners automatically, because people hate them.

Would steps like blocking third-party cookies, social media trackers, cross-site tracking cookies and fingerprinters help thwart most of this kind of tracking?

BTW, what is a good free CMS now? I looked, and WordPress/Joomla/Drupal were the most popular. What year is it?

I feel like for the technically minded, static site generators are pretty popular. For people who just want to write, things like WordPress are still popular (because there is nothing to maintain).

Thanks, I'm thinking of going the wp route. https://www.docker.com/blog/deploying-wordpress-to-the-cloud...

Maybe it's because those three are good enough?
