Visiting a site that uses Disqus when not logged in sends URL to Facebook (dantup.com)
205 points by d2p on Jan 6, 2017 | 76 comments



I wish all websites would wait for the user to turn on social features before offering them. I'm not interested in any of them, so the scripts shouldn't be loaded for nothing.

Take a look at this way to do it: http://panzi.github.io/SocialSharePrivacy/


As a user I use uBlock Origin to block all 3rd party JS by default. This protects me from loading ads, social widgets(trackers), and trackers. A lot of the web is completely broken when you don't run 3rd party JS so each site requires a bit of whitelisting before it will function correctly.

As a website owner I try to lead by example by not including any 3rd party JS(or any JS at all for that matter). Specifically avoiding trackers from Google or Facebook.


Even without going as far as blocking all 3rd-party js/frames (which I do personally), one can use the dynamic filtering pane to at least block the most ubiquitous domain names and allow only on a per-need basis.

I do use facebook.* in one of my tutorials about reducing privacy exposure[1], but of course this applies to any ubiquitous 3rd-party servers out there (of which Disqus qualifies in my opinion, same for Gravatar, etc.)

This approach will minimally break the pages with nice benefits in return: reducing the ability of ubiquitous 3rd-parties to profile browsing history, faster page load.

[1] https://github.com/gorhill/uBlock/wiki/Dynamic-filtering:-to...


Agreed. I use PrivacyBadger[1] to block the download of assets from third-party domains. That way I can selectively enable anything that I want from the blocked domains.

[1]: https://www.eff.org/privacybadger


Heise now uses a new tool they developed called Shariff. Looks better imho and is easy to use. It also shows a like counter if you proxy requests to FB's Graph API through your server.

https://github.com/heiseonline/shariff

Would recommend this over their old two click social share. Your link is a fork of the old Heise tool. It looks dated on mobile.


Bruce Schneier's security blog implements something like this, providing an on/off switch for each social network's "Like", "+1", etc.

https://www.schneier.com/


h-online used to do something like this as well.


that looks great! I like how the buttons are originally grey, which is a clear visual indicator that they are not enabled.


Use ScriptSafe, Ublock, Privacy Badger, Ghostery or any number of script blockers to block 3rd party scripts.


This tracking stuff is a plague and I'm part of the problem. I run an unpopular site with random bits of information on it that uses AdSense to give me a few bucks a month and Disqus to allow comments.

Ugh. I really need to think about whether I want to be part of the problem.


somebody in the other thread mentioned https://www.discourse.org/ as an open-source alternative to disqus, although there were some people that downvoted it, so I don't know how good it is.


Discourse is primarily a better alternative to bulletin-board style forums. Here's their docs on embedding, which seems like it would make it act a little more like disqus. https://meta.discourse.org/t/embedding-discourse-comments-vi...

Note that a major difference is that you apparently have to go into the forum page to leave a comment, you can't do it from the page you're discussing.


Disqus = paste some markup in your page and done

Discourse = set up at least a 2gig server then figure out how to integrate with your site.

On that level they aren't really comparable.


As mentioned in the article there was a related discussion yesterday, where removal of ad network stuff doesn't really matter since Disqus is used for comments:

I've removed all ad network code from my blog (troyhunt.com)

https://news.ycombinator.com/item?id=13326792

This included a screenshot of DoubleClick still being blocked on Troy Hunt's blog.


I'm reviving my blog, and currently plan to explicitly ask:

1. May we retrieve common libraries from third party CDNs? Doing so helps support this site by saving on our bandwidth costs, but may expose information about you to those third parties.

2. This site allows commenting through Disqus. We have no control over what Disqus does with your data, and so your information may be exposed to Disqus and any third parties they communicate with. Would you like to enable comments?

3. (Similar for tracking, if I decide to do something other than log parsing.)

Default 'no' to all, and I still need to find a way to ask the questions in a way that doesn't disrupt simply viewing a blog post that someone linked. Perhaps if someone returns, I'll prompt then.

Anyone have thoughts on if this sounds sane?


Just code your site to do something sane with 3rd-party content blocked. E.g. handle load errors with fallbacks.

That way people with µMatrix or similar blockers can use the control tools they have instead of needing to do something site-specific.

Also, such decisions can't be remembered if cookies/localstorage are disabled. So prompting over and over again could also be annoying.

> 1. May we retrieve common libraries from third party CDNs? Doing so helps support this site by saving on our bandwidth costs, but may expose information about you to those third parties.

In an ideal world browsers would never send a cache-refresh request for resources tagged with SRI[0] because the hashes would guarantee that the content is 100% stable. Alas, it has non-trivial privacy implications, so they don't do that.

Maybe it could be implemented as a privacy addon with a whitelist for CDN domains, but then sites would still have to adopt SRI for that addon to do its work. Or maybe an addon that injects cache-control: immutable[1] to CDN could work too, but that's limited to https.

[0] https://developer.mozilla.org/en-US/docs/Web/Security/Subres... [1] https://bitsup.blogspot.de/2016/05/cache-control-immutable.h...
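For context on the SRI idea above: an integrity value is a base64-encoded digest of the file's bytes, prefixed with the name of the hash algorithm. A minimal sketch of computing one in Python follows; the function name and the sample bytes are made up for illustration.

```python
import base64
import hashlib


def sri_hash(content: bytes) -> str:
    """Return an SRI integrity value (sha384) for the given file bytes."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


# Illustrative input only; a real site would read the library file from disk.
example = sri_hash(b"console.log('hello');")
```

The resulting string would go in the integrity attribute of the script tag, together with crossorigin="anonymous" when the file comes from another origin.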


My two cents:

1. All local, unless you don't want to and a 100-200kb JS file is too much of a strain on your server bandwidth. Or are you serving 15Mb of JS files?

2. Screw Disqus. Screw Facebook Comments. Start thinking about your visitors, as someone said on another related thread, you are responsible for the tracking of your visitors by 3rd-party sites. Local comments or turn them off if you don't care about what others are saying. Don't save any information about the commenters except what they enter in the boxes. One-way hash the IPs if you need to compare for spam reasons.

3. If you need your ego stroked when you see you had xx visitors on your site, go ahead, use Google Analytics and screw us all. We're gonna block it anyway.

[1] This is a privacy policy I use and respect very much when interacting with the visitors/commenters on my personal blog.

1. https://vox.space/pages/106/privacy-policy
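The "one-way hash the IPs" suggestion in point 2 could be sketched roughly like this. Using a keyed hash (HMAC) instead of a bare hash is an assumption added here, since the IPv4 address space is small enough that an unkeyed hash can be reversed by trying every address. SECRET_KEY, hash_ip and same_sender are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be loaded from private configuration.
SECRET_KEY = b"replace-with-a-random-secret"


def hash_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way digest of an IP address."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def same_sender(stored_digest: str, ip: str) -> bool:
    """Check whether a new comment comes from a previously hashed IP."""
    return hmac.compare_digest(stored_digest, hash_ip(ip))


# Store only the digest with the comment, never the raw address.
stored = hash_ip("203.0.113.7")
```

Only the digest is kept, so two comments can still be compared for spam purposes without the site holding the addresses themselves.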


> 3. If you need your ego stroked when you see you had xx visitors on your site

That's a bit harsh - wanting to know whether you get 0 visitors to your blog post or 2,000 yesterday isn't just about ego; it helps you understand the value of your posts (and whether you should bother). Knowing how many people visited isn't the same as bragging about it.


Just use the web server logs and crunch them into a log analyzer.
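As a rough idea of how little is needed for basic numbers, here is a sketch that counts distinct visitor IPs per page. It assumes the server writes Common Log Format lines; the function name and the sample lines are made up for illustration, and a real log analyzer does far more.

```python
import re
from collections import defaultdict

# Matches the start of a Common Log Format line:
# client IP, two ignored fields, timestamp, request line, status code.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3})')


def visitors_per_path(lines):
    """Count distinct client IPs per path for successful GET requests."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, method, path, status = match.groups()
        if method == "GET" and status == "200":
            seen[path].add(ip)
    return {path: len(ips) for path, ips in seen.items()}


# Made-up sample lines for illustration.
sample = [
    '203.0.113.7 - - [06/Jan/2017:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120',
    '203.0.113.7 - - [06/Jan/2017:10:05:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120',
    '198.51.100.9 - - [06/Jan/2017:11:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120',
    '198.51.100.9 - - [06/Jan/2017:11:01:00 +0000] "GET /missing HTTP/1.1" 404 312',
]
counts = visitors_per_path(sample)
```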


My blog is hosted on GitHub Pages; I don't have access to web server logs.


Maybe it was harsh, the idea is: You might write better when you don't know how many people read your articles. On the other hand, you might write better when you know. Plan accordingly :)


Maybe you do, but it's not outrageous to want to know. If my posts are getting 0 visitors, I'd rather know and spend the time on something else!


2. I need to look into self-hosted comments; but I was hoping to make the blog portion of the site static to keep it simpler. Project pages may have demos/etc that pull in JS libraries. But you raise a good point in 1 that also applies to 2 - given that I'm probably going to be reaching only a handful of people initially (and perhaps longer), worrying about bandwidth is a premature optimization.

3. I've just about talked myself into going with log-based analytics here. I find ga's omnipresence too worrisome to contribute to it, even with consent.

Thanks for that link, that's essentially the policy in my head before I started thinking about things like comment support. It's way better written than I would've come up with.


Re #2: - patio11 recently fired off a few tweets about doing away with comments on his blog: https://twitter.com/patio11/status/813895918692876289

Can't say I would disagree. A lot of folks these days just seem to append an HN/Reddit link to posts that get discussions on those sites. There's the blog as an expression of author's personality, and then there's the discussion space as an area with a life of its own.


1. If you're only interested in saving bandwidth and don't care about cache hits from overlapping with other sites, maybe you can host static content somewhere free (GitHub Pages?) or even just set a long cache header (ensure version numbers in filenames, cache for > 1 month) since presumably you're going to serve them the first time before the user has answered anyway?

2. I'm thinking of putting a "Click to load comments" box in place of Disqus on my blog so nothing gets loaded unless the user clicks. Seems better than bothering the user up-front.

3. I use Google Analytics - I figure it's common enough that if people don't like that, they'll already have it blocked, so there isn't really any additional tracking they won't want (unless the twitter timeline widget is tracking; which it might be, but I suspect I'll remove it soon anyway).
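The long-cache idea in point 1 (version numbers in filenames, cache for more than a month) might look something like the sketch below. Deriving the version from a content hash and the one-year lifetime are assumptions made here, and both function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def versioned_name(filename: str, content: bytes) -> str:
    """Insert a short content hash before the extension, e.g. app.<hash>.js."""
    path = PurePosixPath(filename)
    fingerprint = hashlib.sha256(content).hexdigest()[:8]
    return str(path.with_name(f"{path.stem}.{fingerprint}{path.suffix}"))


def cache_headers() -> dict:
    """Headers telling browsers to keep a versioned file for a long time."""
    return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}"}


# Illustrative file name and contents.
name = versioned_name("js/app.js", b"console.log('v1');")
```

Because the name changes whenever the content changes, a long lifetime is safe: an updated file is simply requested under a new URL.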


1. That's a possibility - though any time you're sending off to a third party for content, there's no way of knowing what they're doing around cookies and browser fingerprints across their properties. A step up from running scripts loaded from those sites though. And yeah, the default will be serving the content locally until explicit consent is received.

2. I like that idea. I also kind of like the idea of just not using comments - when I used disqus years ago it was mostly spam - but I think I want to try again and see if it's worth it.

3. Also a good point, but that only accounts for those people who are aware of the tracking as a point of concern. Given that the blog will be technical with a personal bent and vice-versa, one or two of my ten readers may not be aware of tracking as a thing :)

On this front, though, I'm probably just going to start with log analytic tools. It's really the only way to get a fully accurate picture across visitors (server side logging can't be blocked, but GA and even self-hosted data gathering can), and I don't really care too much about the additional info that analytics can provide.


2. I use Disqus and don't actually get much spam (maybe 1 spam post every 6 months, and it always gets flagged by Disqus) but there is often useful stuff in the comments. I think my blog would be much worse without the comments (and I wouldn't get the occasional "Thanks!" comments, which help me know that my posts aren't useless) :-)

Given the option, I would probably also just parse logs - I don't think Analytics is adding much on top of that; I just don't have that option using GH Pages. The reason I moved from AppEngine to GitHub was to stop messing with the code for my blog in an attempt to make me write more posts instead! =D


Notes:

1. Serving from github still shares the tracking information. It can be argued that github is better than cloudflare/facebook, however bear in mind github has politically motivated staff. Long cache is a great idea. Alternatively cut out unnecessary js.

2. Nice idea, it does hamper the ease of use of your blog though - I would never click to view, though I did read some that were visible when I finished the article.

3. Do you find the information from this useful? In a way that isn't trivially parsable from server logs? I ask because we are reviewing the quality of our user analytics, and our ga seems rather pointless atm.


1. Good point; I'm not really sure where I was going with this now; GitHub and another CDN are basically the same. I must've been distracted while replying!

2. Yeah, it's not ideal. In this case, it looks like Disqus are gonna fix stuff though (they've commented on my post; there's a link right at the top of the article now).

3. I don't have access to the server logs as I'm running on GitHub Pages, so something like Analytics is all I have. I do find it useful (given no server logs), it's nice to see the traffic to my blog; there's no point posting if nobody is reading! :-)


2. That's pretty good!

3. That is very interesting, now knowing your stack (pages + disqus + adverts) I see one side of the 'problem' is that bloggers don't have much choice in terms of revenue, so the infrastructure charges with user data. The other side is likely the complexity, incompatibility, and time wasting of home rolled solutions.

The really nice part of a CDN deployed blog is handling the traffic spikes though.


What's wrong with Cloudflare?


They receive a large amount of internet traffic and have the potential ability to fingerprint users and subvert privacy protections. AFAIK they don't do anything malicious, but I don't know they don't.

In fact I would say CloudFlare are better than both GitHub and Facebook, and I am only wary of them because of their position of power and the potential they have (ie. they are a victim of their own success). Both Facebook and GitHub have shown themselves to make political decisions at the expense of their users.


Depends on the definition of wrong! CloudFlare is a bit of an HN darling thanks to their employees' active contributions and submitting every technical post on their blog. Free distributed DNS and potential DDoS protection is also a tempting offer.

To privacy-conscious users: CloudFlare is the man-in-the-middle for more and more of the Internet, potentially tracking at Google-like levels.

CloudFlare may: ... Add script to your pages to, for example, add services, Apps, or perform additional performance tracking. (Unfortunately this is opt-out rather than opt-in.)

https://www.cloudflare.com/terms

To Tor users: CloudFlare implements a captcha to protect servers from malicious traffic; the implementation has caused tremendous annoyance in the past and the company may have been slow to address this problem.

https://news.ycombinator.com/item?id=7977780 (example complaint, 3 years ago)

https://news.ycombinator.com/item?id=11388560 (9 months ago, from cloudflare)

https://news.ycombinator.com/item?id=11404770 (the tor project response)

https://news.ycombinator.com/item?id=12122268 (6 months ago, additional discussion of tor vs. captcha)

To DDoS victims: CloudFlare protects several DDoS vendors while gaining business protecting DDoS victims, citing free speech.

https://krebsonsecurity.com/2016/10/spreading-the-ddos-disea...

https://news.ycombinator.com/item?id=7242377

To CloudFlare customers: CloudFlare has a "target on its back" and has faltered against DDoS in the past, causing outages for all of its customers. AFAIK: It's been a while.

To CloudFlare freeloaders like me: CloudFlare doesn't have much incentive to protect its free-tier users from DDoS.

Related: Akamai stopped helping DDoS'd pro-bono client Brian Krebs. https://news.ycombinator.com/item?id=12561928


Ah, thank you for the detailed reply. I started using CF more extensively yesterday, due to their free CDN (which is working great), but I agree that their MITMing the internet is worrisome. Maybe I should switch to MaxMind, if it's cheaper than CloudFront.


Like Ghostery, it is important to be aware of the cons but I'm still using CloudFlare.

In my book CloudFront easily ranks ahead of had-been "do no evil" Google's irrevocably merging its entire history on me ex post facto. https://news.ycombinator.com/item?id=12760003


2. This sounds off to me. Imagine if a restaurant's menu said they don't know where their ingredients come from or what they may actually consist of - that's probably true a lot of the time, but it makes the customer wonder why the restaurant brings it up but doesn't do anything about it...


I'm working on an alternative to Disqus called Remarkbox - http://www.remarkbox.com

One of my early design decisions is to be as lightweight and fast as possible. This means no oauth, no ads, and only core features that you would expect to find in a comment system.


Just tried it out, very cool man.

My suggestion would be to make the design more appealing, it looks a little bland now.

And also promote the privacy oriented mission of the service a lot more. Currently there is no mention of privacy/tracking, you only mentioned no ads.

And https is a must in 2017.

Just a few questions:

* When do you plan to launch?

* What is the backend built with?

Good luck man.


* When do you plan to launch?

I'm soft launching with beta users right now.

* What is the backend built with?

Python, Pyramid, SQLAlchemy (which supports PostgreSQL, Mysql, and SQLite3), uWSGI, Nginx, Ubuntu


For me, the problem is that the smaller a service is, the less reputation they have to lose by screwing everyone over. I don't know who you are or that you won't inject ads or affiliate links into my site in a few months (or sell your domain to someone for a few quid that will). (This doesn't mean I think your intentions are bad; I just think it's a bad idea to trust people you don't know on the internet!).

I don't mind including scripts on my page from huge orgs that have a lot to lose by doing bad things but there aren't that many companies that fall into this (Disqus did, but possibly shouldn't ;))


That is an interesting point you bring up. The stigma that small businesses have to overcome when being, well, small. I think for the most part recently the bigger companies are the ones putting one over on end users. (changing terms, shutting down services).

I'm planning on building a business around this service, and reputation will matter. I don't plan to sell out because I eat my own dog food. I built Remarkbox for a personal itch, an itch I feel other people may also have.


> I think for the most part recently the bigger companies are the ones putting one over on end users. (changing terms, shutting down services).

For some things, this is true. However you can be sure an entity like Microsoft or Google isn't going to accept a few thousand quid to inject ads or affiliate links into customers websites. A one-man-band that's struggling to turn a profit though, it's less certain. There are a lot of people trying to make a quick buck online and most people have a price.

There are definitely some great things out there being built by small teams that I might miss out on, but that's how it is unless we can tightly control what third party scripts can do on our pages. Sometimes a service will be "so good" that I'll do it anyway, but it's always a trade-off. I don't think most people are as anal about this as me though!


Just a note ...

It is possible for someone to say "hugs"[1] at the end of their discourse and still be a liar and a cheat and a terribly bad actor.

No idea, of course, about any of these people - but don't let cost-free, content-free expressions alter your (bullshit/fraud) detector.

[1] See comment on OPs blog from "disqus here"


Sure, I just posted a link to make it easy to find their comment. I'm giving them the benefit of the doubt that this is an accident and they're working on it, but I'll believe they care when the fix is live and I can see it with my own eyes :-)


Thanks for calling my honest reply a lie and me a liar, really appreciate it.

I'm @madbyk on Twitter and you can also Google my full name to catch my other lies and bad acting on some of my recorded talks.


"Thanks for calling my honest reply a lie and me a liar"

I did no such thing. In fact, I specifically admit to having "No idea ... about any of these people".


True that. I reread your comment and realized I misinterpreted it.

Skepticism is good :)


PrivacyBadger blocked his Disqus embed. I think a good test of whether your site/blog is privacy conscious is to see if PrivacyBadger reports any tracker.


FWIW - Disqus commented on my article - there's a link to their comment right at the top of the article now.


TLDR: it was because Disqus added the Facebook SDK in the last week or so, for some new feature they're testing. They're looking into this.

^ That sounds legit to me... I believe this was the primary reason why Facebook made an SDK and Like button in the first place...for data mining. Pretty clever.

This is the consequence of building on a platform like FB, you exchange your visitors' browsing habit data for access and FB expands their graphs of IP<>websites to improve their ad targeting. And with Disqus it won't be as obvious because the publisher might not be aware that it leads to an FB connection.

So regardless if it was unintentional this is a relevant story for the trade offs of using platforms.


I noticed the same thing about a week ago when I was setting up comments for my blog [1]. I hate bloated websites, so I copied the Disqus markup and opened up Chrome dev tools, and saw the Facebook URL along with dozens of other resources being loaded.

I ended up researching WAY too many comment systems, and eventually settled on Reddit. Not ideal, but better than all the alternatives.

Blog commenting is pretty broken right now, I guess due to the dominance of social networks. I wanted to write my own blog comment service in rage but thought better of it.

Disqus seems pretty sloppy. I was surprised to learn that they were an early YC company.

[1] http://www.oilshell.org/blog/2016/12/29.html


How did you use reddit as a commenting system? It's something I've thought about before but didn't know someone has already built it


I created a subreddit [1], and I just submit every post to it. In my scripts to generate HTML from markdown, I add a footer with the comment URL.

It's not very active, but it's only been alive for one week. It's an experiment, as I mentioned.

It could be improved by showing the comments inline using Reddit's API [2], which seems pretty good (although I haven't used it). And I could probably automate the submission too.

But I'm trying to keep it simple for now, until there's evidence that a lot of people want to comment!

[1] https://www.reddit.com/r/oilshell/

[2] https://www.reddit.com/dev/api/
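The footer step described above could be as small as the following sketch. The function name, the link text, and the example thread URL are all hypothetical; the actual scripts are not shown in this thread.

```python
import html


def add_comment_footer(page_html: str, comment_url: str) -> str:
    """Append a footer linking to the discussion thread for this post."""
    safe_url = html.escape(comment_url, quote=True)
    footer = f'<footer><a href="{safe_url}">Discuss this post on Reddit</a></footer>'
    return page_html + "\n" + footer


# Hypothetical thread URL, for illustration only.
page = add_comment_footer(
    "<article>Post body</article>",
    "https://www.reddit.com/r/oilshell/comments/example/",
)
```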


I'm guessing they've just set up a subreddit, which they post in each time they make a new blog post. In the past I've seen "Join the conversation on Reddit: ${link}" at the end of blog posts, but maybe they're doing it differently.


As the sibling poster suggests: I was thinking of somehow embedding a reddit comment thread onto the page. Either via an iframe (if you own the subreddit you can control the CSS on the other side to make it match), or some JS library that did XHRs to reddit's API.


I'm not a user of reddit, nor familiar with their ins and outs, but does reddit allow for threads in an iframe? Maybe that could be an option?


Ugh, thanks for this. I've made it a goal to start understanding all the little tricks and details of modern day tracking techniques that allow Facebook, Amazon, etc., to know everything that I do. Anyone know if there's a good one-stop-shop website for this topic? I've found lots of separate articles about it but no central clearinghouse of information.


Some years ago I looked at Facebook's ToS for implementing "log in with Facebook" and at that time it looked like it precluded an implementation that would only send requests to Facebook if the user chose Facebook login. I don't think it's for sure that disqus could fix this problem if they wanted to.


Back in the day, Heise apparently caught some flack for protecting their readers while still allowing Facebook "likes".

It feels to me like the typical Facebook approach: do what they want to do or a little bit more, monitor the blowback and walk it back as little as possible only if required to keep everyone happy.

https://yro.slashdot.org/story/11/09/03/0115241/heises-two-c...

https://www.heise.de/extras/socialshareprivacy/ -> http://panzi.github.io/SocialSharePrivacy/


That was basically just a trademark dispute. They claimed it was confusing to show a Facebook "like" button that didn't work like Facebook's actual "like" button. It's fine if you use your own assets to indicate what the button does, but you can't use a Facebook logo or their thumb icon.


I understand Facebook chose to use trademark law to threaten to block the Heise app id and even their entire domain (any sharing of the paper's content on Facebook).

Facebook continues to use every tool at their disposal to protect their expansion of the privacy invasion of their product.


But nothing came of it, and other sites have implemented similar safeguards without being blocked.


Yes. As stated above:

monitor the blowback and walk it back as little as possible

In this specific case it was indeed only possible to walk it all the way back.


The facebook SDK surely can't reliably tell if it was loaded on page load or only after the user clicked a "Facebook" button? And they support OAuth, so you don't have to use their code at all on the client side.


I don't know the details, but I can't see a reason why they can't just not include the SDK until they need it (sure, this will add a delay before they can use it, but seems better than this current implementation for privacy!).


It's obviously in facebook's interest to load the SDK as much as possible. Even if you are not logged in, they can get a lot of valuable tracking information from the server logs, including IP address, referrer, any fb cookies other than login, etc. In fact, so long as a client loads the sdk on multiple sites, even if logged out, Facebook can still track that client across sites visited (simple list of referrers associated with this cookie set).


> Troy cited tracking as one of the reasons for removing ads

Ads should be loaded into <iframe sandbox referrerpolicy="no-referrer">

It would still give them some information (affiliate ID and user IP) but no cookies or tracking of user interaction with the page itself.


Do ad networks allow doing this?


It would probably break Google Adsense, since the ads that are generated are based on the page content.


Today Disqus deployed a fix for this issue; you can read their comment on the blog post here:

https://blog.dantup.com/2017/01/visiting-a-site-that-uses-di...


It's an unfortunate reality. Once Amazon figures out who you are, they send a feed of everything you look at or buy to FB.


I think Ghostery stops this.


I believe ghostery is one of those kind of adblockers that checks if the ads and trackers target the right people, no?

Sort of a meta-tracker. But maybe I'm too paranoid.


>I’m certain Disqus could fix this,

most likely they are getting paid for this tracking


I think this is unlikely, it seems like a silly accident to me.


Someone claiming to be from Disqus has said they are investigating the issue, as it's not their intent.



