Hacker News
Looking Up Symptoms Online? These Companies Are Tracking You (vice.com)
185 points by sinak on Apr 26, 2015 | 102 comments



Just a reminder that https://www.torproject.org offers a free, open-source, unzip-and-run Firefox build for using an anonymizing network run mostly by volunteers.

Using Tor to educate yourself anonymously and privately about embarrassing or potentially stigmatizing problems of your own is a great use of it. Just remember never to enter any identifying information while using it.

Tor is more than fast enough for everyday browsing; heck, I use it to watch YouTube without major problems. I also use it to read the news, find recipes or lyrics (or similarly shady web circles), etc.

If the other side does not need to know who you are and does not have to synchronize that information into a vast tracking/advertising network, why should you willingly submit it?


Isn't it ironic that to check your symptoms you have to use the same technology that you buy your cocaine with?


That's more tragic, but I know what you mean.

"It has been said that capitalism is the worst form of Government except for all those other forms that have been tried from time to time" (paraphrasing a speech Churchill gave about democracy).


If I download Tor, I'll end up on one of the governments' lists. Is there a way to download it anonymously? :-)


You're already on the lists.

If you don't use Tor, you're on the list and they've got ready access to your browsing data and metadata.

If you do use Tor, you're only on the list, and their workfactor for accessing your data and metadata is far higher.

Plus you're providing more cover for those who have strongly urgent needs for similar levels of protection.


That's one more reason to also convince your coworkers, acquaintances, friends, etc. to download and try it, even if you know they probably won't need it :) The longer the lists, the harder it will be for anyone to be singled out from them, and the less information being on them will provide.

...the best place to hide is in a crowd, and the bigger the crowd the better :)
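One way to put a number on "the bigger the crowd the better" is self-information: if N people out of a population P are on a list, learning that someone is on it tells an observer log2(P/N) bits about them. A minimal sketch; the population and list sizes are purely illustrative:

```python
import math

def bits_revealed(list_size: int, population: int) -> float:
    """Bits of identifying information gained by learning that a person
    is one of `list_size` members of a `population`."""
    if not 0 < list_size <= population:
        raise ValueError("need 0 < list_size <= population")
    return math.log2(population / list_size)

# Illustrative numbers only: a larger list reveals less about each member.
for n in (1_000, 100_000, 1_000_000):
    print(n, round(bits_revealed(n, 1_000_000), 2))
```

Growing the list from 1,000 to 100,000 members out of a million drops what membership reveals from about 10 bits to about 3.3 bits.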


You can use GetTor[1] to download it via email. If you send an email to gettor@torproject.org with the body content `windows`, `linux` or `osx`, they send you a reply with a download link to Dropbox.

Then you need to verify the integrity using information they provide on their site.

[1] https://www.torproject.org/projects/gettor.html
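For the verification step, the Tor Project's documented method is checking the GPG signature (.asc file) published alongside each download. If you have instead obtained a SHA-256 checksum through a channel you trust, comparing it is simple. A minimal sketch, assuming such a checksum exists; the file name is a placeholder and this is not a substitute for the signature check:

```python
import hashlib
import hmac

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """True if the file's SHA-256 equals a checksum obtained out of band."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())

# Hypothetical usage; the file name and checksum are placeholders.
# matches_published("tor-browser-download.tar.xz", "<checksum from a trusted source>")
```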


Perhaps, but it might be hard to use Tor without said government knowing that you're using Tor. If someone can see your IP connecting to torproject's IP, they can surely also see your IP connecting to a known entry node.

On the other hand, it irks me that people should have to be afraid of wanting privacy. For my own part, I use Tor partly as "civil obstinacy" – if we do not exercise a right, we risk losing it. I feel it's wrong to target someone simply for using Tor / wanting privacy, and I think it should ideally be considered more normal.


You just talked to me. I run a Tor node so you are suspicious by acquaintance. Also you mentioned the selectors "government list", "anonymity" and "Tor". There is nothing to lose by downloading Tor now.


Why wouldn't any browser's Incognito/InPrivate/Porn mode be enough? You need to prevent associating search queries with your logged in social accounts; Tor is kinda overkill for that.


IP tracking, browser fingerprinting, "supercookies", canvas fingerprinting, and other hacks are still a thing.

http://www.browserleaks.com/canvas http://www.techopedia.com/definition/27310/super-cookie https://panopticlick.eff.org/


There are a lot of ways that you can still be easily tracked in incognito mode such as browser fingerprints, supercookies, and WebRTC to name a few.
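Incognito mode doesn't help against fingerprinting because no stored state is needed: a tracker can derive a stable identifier from attributes the browser reveals on every visit. A minimal sketch of the idea; the attribute names and values are hypothetical stand-ins for things like the user agent, screen size, installed fonts, or a canvas rendering hash:

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Derive a stable ID from browser-reported attributes.

    Serialising with sorted keys makes the ID independent of the order
    in which the attributes were collected."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical values: the same browser yields the same ID in a normal
# window and in an incognito window, with no cookies involved.
normal = {"user_agent": "ExampleBrowser/1.0", "screen": "1920x1080",
          "timezone": "UTC-5", "canvas_hash": "3f2a9c"}
incognito = dict(normal)
print(fingerprint(normal) == fingerprint(incognito))  # True
```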


> You need to prevent associating search queries with your logged in social accounts

Every 3rd party HTTP request can already be used to track you -- being logged into a social account just makes the tracking way easier for them.

The most prominent example of something that triggers a 3rd-party request is the 1px GIF. Incognito mode won't help you in this case.
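Server-side, a tracking pixel is little more than a handler that logs the request and returns a tiny image. The browser volunteers the interesting parts: the Referer header names the page you were reading, and any cookie previously set for the tracker's domain links this visit to earlier ones. A minimal sketch with the HTTP plumbing left out; the handler name, the `uid` parameter, and the example URLs are hypothetical:

```python
from urllib.parse import urlsplit, parse_qs

# A 1x1 transparent GIF; this common encoding is 43 bytes long.
PIXEL_GIF = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00"      # header, 1x1 canvas
             b"\xff\xff\xff\x00\x00\x00"                # 2-colour palette
             b"!\xf9\x04\x01\x00\x00\x00\x00"           # colour 0 transparent
             b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"   # image descriptor
             b"\x02\x02D\x01\x00;")                     # pixel data, trailer

def handle_pixel_request(path: str, headers: dict):
    """Return (log record, response body) for a request to the pixel URL."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(path).query).items()}
    record = {
        "page": headers.get("Referer"),   # which page embedded the pixel
        "cookie": headers.get("Cookie"),  # tracker's own ID for this browser
        "params": params,                 # whatever the embedding page added
    }
    return record, PIXEL_GIF

record, body = handle_pixel_request(
    "/p.gif?uid=abc123&ev=pageview",
    {"Referer": "https://health.example/symptoms/rash", "Cookie": "id=42"},
)
print(record["page"], len(body))
```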


My biggest single problem using Tor is that far too many sites either block exit nodes outright or subject them to (frankly, understandable) increased levels of scrutiny.

I use and run a Tor proxy (_not_ an exit node, mind), but notably Craigslist tends to block pretty much _all_ Tor traffic, and sites employing Cloudflare's DDoS protection present a Javascript-only CAPTCHA. Given that one of my primary Tor browsers is a console-only browser without JS support, that does little for me.

Google, despite other problems (below), actually Does The Right Thing and presents an image that I can fetch and verify, though many of the graphics are exceptionally difficult to interpret.

I've documented my own other hassles accessing Google via Tor (G+, Gmail, etc.) in "How to kill your Google account: Access it via Tor":

https://www.reddit.com/r/dredmorbius/comments/2w618r/how_to_...

https://news.ycombinator.com/item?id=9060922

(The problems were compounded by Google's account recovery and verification procedures, though I ultimately did recover control, thanks in no small part to intercession by Google's Yonatan Zunger, for which I remain grateful.)

Other options include the /etc/hosts file mentioned above (I've extended my own set with 62,000+ entries from a set of blockfiles used by the uMatrix Chrome extension). There's also Privoxy (though supporting _both_ Tor and non-Tor variants might be useful), and various browser extensions including Ghostery, Privacy Badger, AdBlock+, uMatrix, ScriptSafe/NoScript, etc.
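Merging several hosts-format blocklists into one /etc/hosts-style file mostly comes down to stripping comments, collecting hostnames, and de-duplicating. A minimal sketch, assuming the usual "address hostname [hostname...]" layout; it is not the uMatrix tooling itself, and the sink address is a parameter because which address to use is debated further down the thread:

```python
def parse_hosts(text: str) -> set:
    """Collect hostnames from hosts-format blocklist text."""
    sinks = {"127.0.0.1", "0.0.0.0"}
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        fields = line.split()
        if len(fields) > 1 and fields[0] in sinks:
            # Never emit entries that would break localhost itself.
            domains.update(h.lower() for h in fields[1:]
                           if h not in ("localhost", "0.0.0.0"))
    return domains

def merge_blocklists(*texts: str, sink: str = "0.0.0.0") -> str:
    """Merge blocklists into one sorted, de-duplicated hosts file."""
    merged = set().union(*(parse_hosts(t) for t in texts))
    return "".join(f"{sink} {host}\n" for host in sorted(merged))

list_a = "127.0.0.1 localhost\n127.0.0.1 b.scorecardresearch.com  # beacon\n"
list_b = "0.0.0.0 pixel.mathtag.com\n0.0.0.0 b.scorecardresearch.com\n"
print(merge_blocklists(list_a, list_b), end="")
```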

It's getting more than slightly tedious and is eroding trust in the Web generally.

The other area of significant interest to me is work toward reputation systems that are compatible with Tor use. There are two I'm aware of, FAUST and "Fair Anonymity", though I've seen little discussion or adoption of either anywhere.

FAUST: https://gnunet.org/node/1704

"Fair Anonymity for the Tor Network" http://arxiv.org/pdf/1412.4707v1.pdf

Briefly discussed here:

https://www.reddit.com/r/dredmorbius/comments/30gszt/the_bac...


Just a reminder that uBlock now blocks Google Analytics by default as well as Addthis:

https://news.ycombinator.com/item?id=8919523


Is uBlock effectively an AdBlock/AdBlock+ replacement?


Yes, though it's substantially more memory-efficient.

There's a fairly detailed description of the differences here:

https://github.com/chrisaljoudi/uBlock/wiki/What-uBlock-can-...


NoScript will prevent most of those evil trackers by default. For example, cdc.gov displays fine without any JS, and Google Analytics and AddThis are on my untrusted list anyway.

It's still possible to browse without JS most of the time. Some pages are crippled by design; disabling CSS might show the content. Others provide an _escaped_fragment_ variant. But a stupid JS antipattern is sometimes used to display normal content via JS. One big problem is domains like ajax.google: they're often used to enhance websites, but Google uses them to track users.
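For reference, the _escaped_fragment_ variant comes from Google's (since deprecated) AJAX crawling scheme, where a "hashbang" URL such as /page#!key=value has a server-rendered twin at /page?_escaped_fragment_=key=value. A simplified sketch of that mapping; the scheme's exact escaping rules are narrower than `urllib.parse.quote`, so treat the encoding as an approximation:

```python
from urllib.parse import urlsplit, urlunsplit, quote

def escaped_fragment_url(url: str) -> str:
    """Map a #! URL to its ?_escaped_fragment_= form; others pass through."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL, nothing to rewrite
    # Approximation: keep '=' and '/' readable, percent-encode the rest
    # (notably '&', '#', '+' and '%', which the scheme requires escaping).
    value = quote(parts.fragment[1:], safe="=/")
    param = "_escaped_fragment_=" + value
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

print(escaped_fragment_url("https://www.example.com/ajax.html#!key=value"))
```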

When talking about evil Google, one needs to add YT. A friend of mine once claimed: You watch a stripper, if you visit YouPorn. But you strip your privacy, if you visit YouTube.


The CDN that serves popular JavaScript libraries, ajax.googleapis.com, isn't used for tracking: it's a cookie-less domain totally separate from google.com.


And they do not store IPs at all? All hops between you and them are NSA-proof?


Google's Blogger site is a tremendously flagrant example of this.


The original source paper is at http://arxiv.org/pdf/1404.1951.pdf

Much as this sort of thing makes me glad I don't need to purchase private health insurance, the article would be a lot more helpful if it distinguished more clearly between what is and isn't legal use of the data as well as between the Experians and Google Analytics of this world.

That said, the original source paper, if anything, probably plays down the potential concerns, contending, for example, that a URI like http://www.ncbi.nlm.nih.gov/pubmed/21722252 contains no symptom-specific information, when any sufficiently motivated actor can write a scraper that links anonymous-looking URIs on healthcare domains to the conditions and symptoms referenced in the page content.
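The linking step is trivial once the pages are fetched: map each opaque URL to the title (or any other text) of the page it points to. A minimal offline sketch; fetching is deliberately left out, and the URL and HTML are hypothetical placeholders rather than real PubMed content:

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the document's <title> element."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def label_urls(pages: dict) -> dict:
    """pages maps URL -> fetched HTML; returns URL -> page title."""
    labels = {}
    for url, html in pages.items():
        parser = TitleParser()
        parser.feed(html)
        labels[url] = parser.title.strip()
    return labels

pages = {"https://health.example/article/12345":
         "<html><head><title> Treatment options for psoriasis </title></head></html>"}
print(label_urls(pages))
```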


> this sort of thing makes me glad I don't need to purchase private health insurance

Are you in the USA? Thanks to Obamacare your medical history doesn't matter anymore. I purchase my own insurance and only three things matter:

   your age
   your gender
   the type of coverage (bronze, silver, gold, etc.)

It doesn't even matter whether you're single or married or have kids. My family policy cost is exactly the sum of:

   my policy cost based on age and gender
   my wife's policy cost based on age and gender
   each of my children's policy costs (I don't remember if age or gender matter; I don't think they do)

I generally don't like the idea of Obamacare, but in this case it did a lot of good. Before Obamacare, insurance companies went out of their way to simply not offer private coverage at all to people with any medical issues, even minor ones. They can't do that anymore.


They actually can't charge different premiums based on gender (this rule took effect in 2014).

They can charge tobacco users higher premiums in many states (there is a federal limit of 50% more, but states can impose a lower limit, and some do).
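Under the rules described in the two comments above, the arithmetic is simple: a family premium is the sum of per-member premiums, each set by age and plan tier, with an optional tobacco surcharge capped at 50% federally (or lower where a state says so). A sketch with made-up base rates; the function names and numbers are hypothetical, not actual marketplace prices:

```python
from typing import Optional

FEDERAL_TOBACCO_CAP = 1.5  # at most 50% more for tobacco users

def member_premium(base: float, tobacco_surcharge: float = 0.0,
                   state_cap: Optional[float] = None) -> float:
    """Monthly premium for one member.

    `base` stands in for the age- and tier-based rate; gender is not an input."""
    cap = FEDERAL_TOBACCO_CAP if state_cap is None else min(FEDERAL_TOBACCO_CAP, state_cap)
    return base * min(1.0 + tobacco_surcharge, cap)

def family_premium(members: list) -> float:
    """A family policy is just the sum of its members' premiums."""
    return sum(member_premium(**m) for m in members)

family = [
    {"base": 300.0},                               # hypothetical adult rate
    {"base": 280.0, "tobacco_surcharge": 0.8},     # capped at 1.5x
    {"base": 150.0},                               # hypothetical child rate
]
print(family_premium(family))
```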


I think at this stage we need to consider this a part of how the internet works.

I'm far from an expert, but I do think that most legislative efforts, as well as many initiatives from browser makers, are approaching this wrong. Privacy, as much transparency as possible, and optional settings for anything that comes with a trade-off need to be built into the browser, not sent as a request to websites.

Transacting, being logged in, and certainly browsing are not inherently hindered by privacy. It's up to users (or their browser really) to demand it, in the economic sense of demand.

For now, there is no cost to this kind of tracking, so it happens almost by default. Moral or even legislative pressure will not have the same effect as economic pressure. The decision whether or not to protect users' privacy needs to come with costs.


When the major browser developers directly profit from being able to track users across the web, they're not going to make modifications to the way browsers work to prevent tracking. Not in any meaningful way.

It's a shame so many people use Chrome. They're effectively giving an ad company that specialises in tracking people the power to control how the web develops.


And I've found no solution for polluting your history with obfuscated searches.

If I, Mr. Spy Provider, start seeing a single user who has every possible documentable illness, that user's search has been polluted and is worthless.

So, how does one do this? Someone needs to write a search algo that pulls 100 crap medical searches for every good one. All you need to do is query the 1px image on the page. I'm guessing that could be done with 10KB/illness search for privacy pollution.

Should we have to? No. But this is the reality we live in. We can use the tools to keep us from being "found", but we still are querying the server the content is on. Nothing we can do about them selling that log. But we can pollute that log.
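The "100 crap searches for every good one" idea can be sketched as a batching step: hide the real query at a random position among decoys drawn from a pool. This only builds the batch; issuing the requests, pacing them, and building a realistic decoy pool are left out, and the pool contents and function name are hypothetical:

```python
import random
from typing import Optional

def polluted_batch(real_query: str, decoy_pool: list, n_decoys: int = 100,
                   rng: Optional[random.Random] = None) -> list:
    """Return the real query shuffled in among n_decoys distinct decoys."""
    rng = rng or random.Random()
    if real_query in decoy_pool:
        decoy_pool = [q for q in decoy_pool if q != real_query]
    batch = rng.sample(decoy_pool, n_decoys) + [real_query]
    rng.shuffle(batch)
    return batch

pool = [f"symptoms of condition {i}" for i in range(500)]  # placeholder decoys
batch = polluted_batch("persistent cough at night", pool, rng=random.Random(1))
print(len(batch))  # 101
```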


Sadly I'm increasingly coming to the conclusion that fuzzing all search traffic in this manner is becoming a necessity. My concern is that it's still not sufficient. As Bruce Schneier notes, computers are exceptionally good at finding needles in haystacks, and even highly fuzzed data contains signal.

That said, there are browser extensions which run random/arbitrary background Web queries.
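Schneier's point is easy to demonstrate: if the decoys are random but your real interest recurs, counting how often each query shows up across sessions pulls it straight back out. A small simulation with hypothetical queries and a uniformly random decoy pool:

```python
import random
from collections import Counter

def recurring_queries(sessions: list, min_fraction: float = 0.5) -> list:
    """Queries that appear in at least min_fraction of all sessions."""
    counts = Counter(q for session in sessions for q in set(session))
    threshold = min_fraction * len(sessions)
    return sorted(q for q, c in counts.items() if c >= threshold)

rng = random.Random(7)
pool = [f"decoy query {i}" for i in range(10_000)]
real = "persistent cough at night"
# Ten sessions, each hiding the same real query among 100 random decoys.
sessions = [rng.sample(pool, 100) + [real] for _ in range(10)]
print(recurring_queries(sessions))
```

Each decoy has roughly a 1% chance of appearing in any given session, so clearing the 50% threshold by chance is very unlikely, while the real query appears in every session.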


I would tentatively agree. Fuzzing might work, but given the corpus the spy companies already have, it's probably a bad avenue of approach.

Instead, let's use The Pirate Bay. We can build a scrape of WebMD and a few other places. The front page would have every disease WebMD has. Then we upload it to TPB. Highly illegal, but it does solve the problem of tracking our individual illnesses.

Thoughts?


Wikipedia is frequently as good a reference as WebMD or other sources, and occasionally a better one (I generally prefer Mayo to WebMD, which is frankly spammy as fuck).

And Wikipedia's fully syndicable.

Improve Wikipedia medical content, syndicate.

Problem solved, laws unflouted.


So I open the page, and Disconnect shows 36 tracking items blocked and uBlock shows 18 more.

Awesome :)


I use Tor Browser quite often, and this is a primary reason why. Several times I have thought "hmm, I should not be typing this into the search box", especially at work or on a public network.


> But the chief problem is simply that just about all of the above, under current laws, is legal

In the US maybe, but I would guess the business practices of most data brokers are already completely illegal in the EU. We have many laws and requirements for keeping and selling data on EU citizens. I would welcome stronger actions against these companies in the EU.

But somehow I fear that enforcing EU laws on US companies is not part of the TTIP trade agreement under negotiation between US and EU.


The EU does have the requirement for websites to say they're using third-party cookies (e.g. from Google Analytics). The weakness is the poor wording chosen. "By browsing this website, you agree that a list of pages you visit will be sent to Google and ComScore" would have had a much stronger message. Perhaps follow on with "The visit to this page has already been tracked. To remove this information from Google/ComScore's servers, click here".


> they're using third-party cookies (e.g. from Google Analytics)

Google Analytics cookies are first-party (i.e. only available to the domain of the site).


Yeah I don't think these laws really work.

IIRC, just a few years ago, when the EU started investigating Google and asked where its user data came from, Google wasn't able to answer. I don't think they're able to track it anymore.


I can't understand why any company that cares about privacy would use Google Analytics over Piwik.


I suspect that is an easy one. Your company might care about privacy, but not be in a technical industry or expert on these kinds of issues.

Google Analytics can be set up in a few minutes by anyone who could set up their own web site in the first place.

Setting up Piwik means understanding this: http://piwik.org/docs/installation-maintenance/

If you run web sites for a living, the latter is no big deal. If your company is a florist and you just learned a bit of basic HTML to write your blog about flower arranging, what's a MySQL?


12 years of software engineering experience here.

I tried setting up Piwik for my website. Instead of showing N number of visitors, it consistently shows one or two. Tried googling, read the docs, nothing! Gave up. I have no idea why it fails so spectacularly.

Google Analytics will do just fine for now until better understanding my site's visitors becomes a more valuable proposition. Right now it's not worth it.


Check out Mozilla Lightbeam (aka Collusion) too: https://www.mozilla.org/en-US/lightbeam/


EFF PrivacyBadger + uBlock with lots of lists enabled blocks most of the tracking garbage. Sad state of affairs that basically every website is doing some kind of for-profit selling of their users.


Ghostery found one tracker (PiwikAnalytics) that PrivacyBadger didn't on the PrivacyBadger page itself.


I'm not sure there's much wrong with using Piwik, as long as it is self-hosted. If using Piwik is bad, then so is using Apache access logs.


> so is using Apache access logs.

It kind of might be. Ideally, anonymous is supposed to mean collecting no data at all.
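If "collecting no data at all" is too strict for a site that still wants logs, a common middle ground is to truncate client addresses before storing them. A sketch for Apache's Common or Combined log format, where the client address is the first field; the /24 and /48 prefix lengths are assumptions, not a standard:

```python
import ipaddress

def anonymize_ip(addr: str) -> str:
    """Zero the host part: keep /24 for IPv4 and /48 for IPv6."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)

def anonymize_log_line(line: str) -> str:
    """Rewrite the leading client-address field of a log line."""
    addr, sep, rest = line.partition(" ")
    try:
        return anonymize_ip(addr) + sep + rest
    except ValueError:
        return line  # e.g. a resolved hostname instead of an address

line = '203.0.113.57 - - [26/Apr/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 512'
print(anonymize_log_line(line))
```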


That's just silly. It's next to impossible to provide good UX without knowing what your customers are doing on your site.


My guitar has superb UX. Are you suggesting it's tracking me? (Oh my!)


Try making an Internet-connected guitar, that, say, downloads its own effects, and see how easy it remains to use.


I stopped using EFF PrivacyBadger after I noticed it automatically adds cookie exceptions to Firefox, which makes the SelfDestructingCookies add-on useless. This 'bug' is still not fixed at the time of writing, btw: https://github.com/EFForg/privacybadgerfirefox/issues/206#is...

So I started using Disconnect instead...


But, and I may be misunderstanding something, the page that I visit has the responsibility of serving these trackers? They call out to an adbroker, or analytics service, and they are responsible for the content surely? I mean if a newspaper prints a race hate ad for a neoNazi or FOX News runs porn adverts, they are the responsible party.

So it seems we could do with strong adblocking, but (given that spam email still exists) actually enforced laws would be more useful.

(I may be getting a bit old...)


My government doesn't even begin to know how to deal with internet laws, the only thing I've seen come down the pipe are laws designed to protect square peg business practices in round hole environments.

The only thing that can be done is to make privacy and ad blocking tools universally deployed, and let the fallout happen.


Firefox has recently released a tracking protection feature[1] that uses Disconnect's blocklist[2].

[1] https://support.mozilla.org/en-US/kb/tracking-protection-fir...

[2] https://disconnect.me/


Devil's Advocate: This data is important to public health. Search history for drugs is one of the best ways for companies, the public, and researchers to find out symptoms and the occurrence rate of symptoms. If that data is attached with location data then it gives them more pieces to the puzzle.


My default search engine is ixquick.com. The service has a nifty proxy, with a convenient proxy link next to each search result. The proxy breaks a lot of sites (JS blocking) but usually lets me see enough to determine if it's worth revisiting via Tor.


Ghostery reports the following tracking beacons in the article itself.

- Alexa Metrics
- ChartBeat
- Disqus
- DoubleClick
- eXelate
- Facebook Connect
- Google Adsense
- Google AJAX Search API
- Google Analytics
- Google+ Platform
- Krux Digital
- Moat
- NetRatings SiteCensus
- Neustar AdAdvisor
- PubMatic
- Quantcast
- Sailthru Horizon
- ScoreCard Research Beacon
- Twitter Button


Something about pots and kettles and the colour (or lack thereof) black...


Separation between business and editorial operations is usually construed as a positive.


That is why I think private health insurance that covers doctor visits is probably flawed.


... says the news site sending my data to The Nielsen Company.


I see Vice asking to load shit from 12 other domains. Which one is Nielsen?


imrworldwide.com ... I might have missed the rest because I block JS and 3rd party requests by default, so I only see the initial requests, not those subsequently loaded.


Ghostery.. do yourself a favour and install it after reading this.

https://www.ghostery.com/en/download


uBlock Origin + Privacy Badger covers the bases better I would say.


A good use for duckduckgo


How does DDG help here? It doesn't matter how you find the site. It's still full of third-party elements that can be used to track you.


Yes, but you won't be tracked by the search. (Especially by Google/Bing who know a lot about you already)


Yes, one more brick in the wall. You can't have the wall without a lot of bricks.


These mostly serve one-pixel GIFs; sometimes I find one-byte javascript "sources".

Put these in your hosts file:

   127.0.0.1 aax.amazon-adsystem.com
   127.0.0.1 ad.crwdcntrl.net
   127.0.0.1 b.scorecardresearch.com
   127.0.0.1 gs.dailymail.co.uk
   127.0.0.1 gum.criteo.com
   127.0.0.1 i.dailymail.co.uk
   127.0.0.1 moat.pxl.ace.advertising.com
   127.0.0.1 pixel.mathtag.com
   127.0.0.1 pq-direct.revsci.net
   127.0.0.1 rta.dailymail.co.uk
   127.0.0.1 sync.go.sonobi.com
   127.0.0.1 t.dailymail.co.uk
   127.0.0.1 ted.dailymail.co.uk
   127.0.0.1 x.bidswitch.net
   127.0.0.1 www.google-analytics.com
   127.0.0.1 ssl.google-analytics.com
   127.0.0.1 www.hosted-pixel.com

Search with DuckDuckGo (https://www.duckduckgo.com/). For extra credit, use DuckDuckGo's Tor Hidden Service.


You shouldn't use loopback to block hostnames but instead use 0.0.0.0


Can you please explain why?


0.0.0.0 is a completely invalid destination IP, and any attempts to establish a connection to it will fail long before any packets are sent. 127.0.0.1 is still valid so you depend on the network stack to either receive a RST sent to itself, or time out.

At least on Windows, the former is much faster to return with an error.

It also avoids potential conflicts if you happen to need to run a server on port 80 for any reason.
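If you decide to switch an existing hosts file from 127.0.0.1 to 0.0.0.0, a small rewrite does it. Note the replies below, though: on Linux and similar systems 0.0.0.0 can end up reaching the local machine, so check the behaviour on your own OS first. A sketch that leaves localhost lines, comments, and everything else untouched:

```python
def loopback_to_null(hosts_text: str) -> str:
    """Point 127.0.0.1 blocking entries at 0.0.0.0, keeping localhost lines."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if (len(fields) >= 2 and fields[0] == "127.0.0.1"
                and "localhost" not in fields[1:]):
            line = line.replace("127.0.0.1", "0.0.0.0", 1)
        out.append(line)
    trailing = "\n" if hosts_text.endswith("\n") else ""
    return "\n".join(out) + trailing

sample = "127.0.0.1 localhost\n   127.0.0.1 x.bidswitch.net\n# 127.0.0.1 kept.example\n"
print(loopback_to_null(sample), end="")
```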


This isn't true on non-Windows OSes, where 0.0.0.0 (or simply 0) means all interfaces, not nothing. ssh to a web server and do `curl 0`; it's a super-useful shortcut.

Further reading: http://serverfault.com/questions/78048/whats-the-difference-...

http://en.m.wikipedia.org/wiki/0.0.0.0


When I ping 0.0.0.0, I get replies from 69.41.141.1. I don't know but expect that's earthlink's router.


That's odd... on Windows, pinging 0.0.0.0 results in "Destination specified is invalid."

On *nix, the behaviour is slightly different and is supposed to ping localhost instead (confirmed on one of my Linux machines):

http://unix.stackexchange.com/questions/99336/how-does-ping-...


Do the RFCs have anything to say about 0.0.0.0?

Perhaps it is left up to the implementation, in which case both Windows and *NIX would be, strictly speaking, correct.


When I ping 0.0.0.0, I get replies from 127.0.0.1 (running Ubuntu 14.04).


I understand that 0.0.0.0 is the local subnet broadcast address. Is that really the case?

I've puzzled over using 127.0.0.2. What I really want to get is an RST immediately upon sending a SYN.


Of that set, only these weren't already in the comprehensive (60k+) hosts lists used by the uMatrix Chrome extension (I've added them to my own /etc/hosts file): gs.dailymail.co.uk gum.criteo.com i.dailymail.co.uk moat.pxl.ace.advertising.com www.hosted-pixel.com
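Checking which entries from a posted list are missing from a large existing hosts file is a set difference. A sketch, assuming both are in ordinary hosts format; the file contents below are placeholders:

```python
def hostnames(hosts_text: str) -> set:
    """All hostnames mentioned in hosts-format text, comments ignored."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        names.update(f.lower() for f in fields[1:])  # skip the address field
    return names

def missing_entries(candidate_text: str, existing_text: str) -> list:
    """Hostnames in the candidate list that the existing list lacks."""
    return sorted(hostnames(candidate_text) - hostnames(existing_text))

posted = "127.0.0.1 gum.criteo.com\n127.0.0.1 b.scorecardresearch.com\n"
existing = "0.0.0.0 b.scorecardresearch.com  # already blocked\n"
print(missing_entries(posted, existing))
```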

I strongly recommend using a hosts file these days, as well as tools such as uMatrix, privoxy, and where possible, Tor (more discussion below).

And thanks, by the way.


Since by far most of my browsing is done on an iPhone, any recommendations for dealing with hosts files? I'm guessing scriptable solutions like Firefox would help, but to be honest I mentally gave up once I realized I don't have root.


Not sure about iPhone, but there are AdBlock plugins for Android, and the hosts file itself is accessible if you jailbreak.

Better: install CyanogenMod.


http://ba.net/adblock is geared toward iPhone and iPad.


VPN + Privoxy.


I very much prefer this one http://winhelp2002.mvps.org/hosts.txt

Edit: I see some of the ones you posted are not in the file.


See http://pgl.yoyo.org/adservers/ for a nice comprehensive list of hosts to block.


Anyone know how this compares to http://someonewhocares.org/hosts/ ?


This merges two of the common lists, and it's easy to add more: https://github.com/jakeogh/dnsmasq-blacklist


http://ba.net/adblock seems to block all these. They list 180k blocked ad and malware domains


I use DDG as my primary search now. Have been for months. It's getting a lot better. I'd say it's as good as Google about 95% of the time now. For about 5% of queries I end up bumping over to Google, but over time that's shrunk from ~15% of queries.


I've been using DDG for over a year now, and I haven't had to bump over to Google for a long time. In fact with the bangs I search pretty much everything with it.

I do wish the "defaults" were more tracking-averse, though; for example, !i searches images... on Google Images.


Have you noticed any particular classes of search for which Google is still better than DDG?


Not a class of searches, but a feature: DDG does not have an equivalent of Google's time-based searches (e.g., last 24 hours, last week, etc.).


Hmm... hard to say, but the first thing that comes to mind is programming-oriented stuff. Google tends to find a more accurate hit there sometimes. Google also seems to do a better job filtering out spammy stuff like those irritating portal pottie sites that clone StackOverflow or mailing list content.

My pattern is if DDG finds nothing of value or brings up too much noise, hop over to Google and sometimes it does better. If I can't find it via either search engine it's probably not out there.

I will say that there's rarely a case where Google is worse than DDG... unless you count the tracking and near-monopoly issues but those are peripheral to the core function.


Or use uBlock or similar as well.


Recommended privacy extensions:

- Privacy Badger by the EFF: https://www.eff.org/privacybadger

- uBlock Origin (blocks ads as well): https://chrome.google.com/webstore/detail/ublock-origin/cjpa...

- HTTPS Everywhere by the EFF: https://www.eff.org/https-everywhere

I recommend running all three, they each do a different job and cover all the bases. If you are on Firefox, RequestPolicy (https://addons.mozilla.org/en-US/firefox/addon/requestpolicy...) is also useful but I find PrivacyBadger simply does the job better.


Nobody told me before that DDG has infinite scrolling! I'm sold!


I regard infinite scrolling as a Tool of the Devil.


I sometimes agree, but it seems pretty useful for search results. I mean, do you often find yourself bookmarking the second page of a google search?


I built this list with the Activity window from an old release of Safari. I expect there are lots of ways you could do it; I often contemplate such ways, but this way works pretty well.

Just open the Activity window, then visit a few random, at least somewhat sketchy websites. Pop-culture news sites are good. You'll see lots of 43- or 60-byte GIFs, commonly with lots of query parameters in the URLs; sometimes I see one-byte JavaScript files.
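The pattern described here (tiny GIFs or near-empty scripts fetched with long query strings) can be turned into a rough filter over a list of captured requests. The size limit and parameter count below are guesses, not values from any particular tool, and the example URLs are hypothetical:

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, parse_qsl

@dataclass
class Request:
    url: str
    content_type: str
    size: int  # response body size in bytes

JS_TYPES = {"application/javascript", "text/javascript"}

def looks_like_beacon(req: Request, max_gif_bytes: int = 64,
                      min_params: int = 3) -> bool:
    """Heuristic: tiny GIF or (near-)empty script, with many query params."""
    n_params = len(parse_qsl(urlsplit(req.url).query, keep_blank_values=True))
    tiny_gif = req.content_type == "image/gif" and req.size <= max_gif_bytes
    tiny_js = req.content_type in JS_TYPES and req.size <= 1
    return (tiny_gif or tiny_js) and n_params >= min_params

captured = [
    Request("https://px.example/b.gif?uid=9&pg=%2Frash&ts=1430000000", "image/gif", 43),
    Request("https://cdn.example/logo.gif", "image/gif", 43),
    Request("https://cdn.example/app.js?v=3", "application/javascript", 52_000),
]
print([r.url for r in captured if looks_like_beacon(r)])
```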


https://www.requestpolicy.com/, a Firefox plugin that blocks cross-domain content, is useful in this regard as well.

Lots of websites pull shit from 20+ 3rd-party domains. An obnoxious number of sites block or don't render completely (or at all) without allowing shit from all over the web to load.
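RequestPolicy-style blocking starts from one question: is this request going to the same site as the page? A rough sketch that compares the last two labels of each hostname. That shortcut is wrong for suffixes like .co.uk (it would treat i.dailymail.co.uk and any other .co.uk host as the same party); real tools use the Public Suffix List instead. The page URL is illustrative:

```python
from urllib.parse import urlsplit

def naive_site(url: str) -> str:
    """Approximate the registrable domain by the host's last two labels."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[-2:])

def third_party_requests(page_url: str, request_urls: list) -> list:
    """Requests whose (naive) site differs from the page's site."""
    page_site = naive_site(page_url)
    return [u for u in request_urls if naive_site(u) != page_site]

page = "https://www.vice.com/read/some-article"
requests = [
    "https://video.vice.com/player.js",
    "https://secure-us.imrworldwide.com/cgi-bin/m?ci=x",
    "https://www.google-analytics.com/ga.js",
]
print(third_party_requests(page, requests))
```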


I used to use RequestPolicy, but found Policeman is better.


Strictly speaking a one-pixel gif could be legitimate. What I find especially sketchy is that the URLs that my browser actually fetches are huge, long and full of query parameters.


I do not want one-pixel gifs, because the only use of such techniques is exploiting the HTTP "referer" field.


They used to be somewhat legitimately used for spacing and sizing table-based HTML layouts (through the mid-2000s or so). That's rarely done today -- layout is in CSS and gifs are generally used for tracking.



