
Looking Up Symptoms Online? These Companies Are Tracking You - sinak
http://motherboard.vice.com/read/looking-up-symptoms-online-these-companies-are-collecting-your-data
======
aw3c2
Just a reminder that [https://www.torproject.org](https://www.torproject.org)
offers a free and open-source unzip-and-run Firefox to use an anonymizing
network run mostly by volunteers.

Using Tor to anonymously and privately educate yourself about embarrassing or
potentially ostracized problems with yourself is a great use of it. Just
remember that you should not ever enter any identifying information while
using it.

Tor is more than fast enough for everyday browsing; heck, I use it to watch
Youtube without major problems. I also use it to read the news, find recipes
or lyrics (or similarly shady web circles) etc.

If the other side does not _need_ to know who you are and does not _have to_
synchronize that information into a vast tracking/advertising network, why
should you willingly submit it?

~~~
andy_ppp
If I download Tor I'll end up on one of the governments' lists. Is there a way
to download it anonymously :-)

~~~
dredmorbius
You're already on the lists.

If you _don't_ use Tor, you're on the list _and_ they've got ready access to
your browsing data and metadata.

If you do use Tor, you're only on the list, and their workfactor for accessing
your data and metadata is far higher.

Plus you're providing more cover for those who have strongly urgent needs for
similar levels of protection.

------
SG-
Just a reminder that uBlock now blocks Google Analytics by default as well as
Addthis:

[https://news.ycombinator.com/item?id=8919523](https://news.ycombinator.com/item?id=8919523)

~~~
dredmorbius
Is uBlock effectively an AdBlock/AdBlock+ replacement?

~~~
oostevo
Yes, though it's substantially more memory-efficient.

There's a fairly detailed description of the differences here:

[https://github.com/chrisaljoudi/uBlock/wiki/What-uBlock-can-...](https://github.com/chrisaljoudi/uBlock/wiki/What-uBlock-can-and-can-not-%28currently%29-do)

------
kephra
NoScript will prevent most of this evil tracking by default. E.g., cdc.gov
displays fine without any JS, and Google Analytics and AddThis are on my
untrusted list anyway.

It's still possible to browse without JS most of the time. Some pages are
crippled by design, so disabling CSS might show the content. Others provide an
escaped_fragment variant. But a stupid JS antipattern is sometimes used to
display normal content with JS. One big problem is domains like ajax.google.
These are often used to enhance websites, but Google is using them to track
users.

When talking about evil Google, one needs to add YT. A friend of mine once
claimed: You watch a stripper, if you visit YouPorn. But you strip your
privacy, if you visit YouTube.

~~~
moogly
The CDN that serves popular JavaScript libraries,
ajax.googleapis.com, is not tracked. It's a cookie-less domain totally
separate from google.com.

~~~
ncza
And they do not store IPs at all? All hops between you and them are NSA-proof?

------
notahacker
The original source paper is at
[http://arxiv.org/pdf/1404.1951.pdf](http://arxiv.org/pdf/1404.1951.pdf)

Much as this sort of thing makes me glad I don't need to purchase private
health insurance, the article would be a lot more helpful if it distinguished
more clearly between what is and isn't legal use of the data as well as
between the Experians and Google Analytics of this world.

That said, the original source paper probably, if anything, plays down the
potential concerns, contending, for example that a URI like
[http://www.ncbi.nlm.nih.gov/pubmed/21722252](http://www.ncbi.nlm.nih.gov/pubmed/21722252)
contains no symptom-specific information when any sufficiently motivated actor
can write a scraper that links anonymous looking URIs on healthcare domains to
conditions and symptoms referenced in the page content.
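
A rough sketch of that point, in Python: given the content behind an
opaque-looking URI, match its title against a list of condition keywords. The
sample page, the keyword list, and the function names are all made up for
illustration; a real scraper would fetch the page and use a far larger
vocabulary.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; a real actor would use a medical vocabulary.
CONDITION_KEYWORDS = {"diabetes", "depression", "herpes", "asthma"}


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def conditions_for_page(html):
    """Return the condition keywords mentioned in the page title."""
    parser = TitleParser()
    parser.feed(html)
    words = {w.strip(".,:;()").lower() for w in parser.title.split()}
    return sorted(words & CONDITION_KEYWORDS)


def build_index(pages):
    """Map each opaque URI to the conditions its content reveals."""
    return {uri: conditions_for_page(html) for uri, html in pages.items()}


# Made-up page content standing in for a fetched article.
SAMPLE_PAGES = {
    "http://example.org/pubmed/12345": "<html><head><title>Asthma and "
    "depression: a cohort study.</title></head><body></body></html>",
}

if __name__ == "__main__":
    print(build_index(SAMPLE_PAGES))
```

The numeric ID in the URI says nothing by itself, but the index built from the
page content ties it to specific conditions.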

~~~
PhantomGremlin
_this sort of thing makes me glad I don't need to purchase private health
insurance_

Are you in the USA? Thanks to Obamacare your medical history doesn't matter
anymore. I purchase my own insurance and only three things matter:

    
    
       your age
       your gender
       the type of coverage (bronze, silver, gold, etc)
    

It doesn't even matter whether you're single or married or have kids. My
family policy cost is exactly the sum of:

    
    
       my policy cost based on age and gender
       my wife's policy cost based on age and gender
       each of my children's policy costs (I don't
       remember if age or gender matter, I don't think
       they do)
    

I generally don't like the idea of Obamacare, but in this case it did a lot of
good. Before Obamacare, insurance companies went out of their way to simply
not offer private coverage _at all_ to people with any medical issues, even
minor ones. They can't do that anymore.

~~~
maxerickson
They actually can't charge different premiums based on gender (this rule took
effect in 2014).

They _can_ charge tobacco users higher premiums in many states (there is a
federal limit of 50% more, but states can impose a lower limit, and some do).

------
netcan
I think at this stage we need to consider this a part of how the internet
works.

I'm far from an expert, but I do think that the majority of legislative
efforts as well as many initiatives from browser makers are approaching this
wrong. Privacy, as much transparency as possible and optional settings for
anything that comes with a trade-off need to be built into the browser, and
not as a request sent to websites.

Transacting, being logged in, and certainly browsing are not inherently
hindered by privacy. It's up to users (or their browser really) to demand it,
in the economic sense of demand.

For now, there is no cost to this kind of tracking so it happens almost by
default. Moral or even legislative pressure will not have the same effect as
economic pressure. The decision to protect users' privacy or not needs to come
with costs.

~~~
mike-cardwell
When the major browser developers directly profit from being able to track
users across the web, they're not going to make modifications to the way
browsers work to prevent tracking. Not in any meaningful way.

It's a shame so many people use Chrome. They're effectively giving an Ad
company which specialises in tracking people, power to control how the web
develops.

------
kefka
And, I've found no solution regarding polluting your history with obfuscated
searches.

If I, Mr. Spy Provider, start seeing a single user who has every possible
documentable illness, that user's search has been polluted and is worthless.

So, how does one do this? Someone needs to write a search algo that pulls 100
crap medical searches for every good one. All you need to do is query the 1px
image on the page. I'm guessing that could be done with 10KB/illness search
for privacy pollution.

Should we have to? No. But this is the reality we live in. We can use the
tools to keep us from being "found", but we still are querying the server the
content is on. Nothing we can do about them selling that log. But we can
pollute that log.
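
A minimal sketch of the mixing step, assuming a fixed list of decoy terms: it
hides one real query among N decoys and shuffles the batch. The term list and
function name are invented for illustration, and nothing here issues any
network request.

```python
import random

# Hypothetical decoy vocabulary; the idea needs a far larger list.
DECOY_TERMS = [
    "gout symptoms", "shingles rash", "tinnitus causes", "anemia diet",
    "migraine aura", "eczema treatment", "sciatica stretches", "vertigo",
]


def polluted_batch(real_query, decoys_per_query=100, rng=None):
    """Return the real query hidden among randomly chosen decoy queries."""
    rng = rng or random.Random()
    # Sample with replacement so the batch can exceed the vocabulary size.
    batch = [rng.choice(DECOY_TERMS) for _ in range(decoys_per_query)]
    batch.append(real_query)
    rng.shuffle(batch)  # so the real query's position carries no signal
    return batch


if __name__ == "__main__":
    for query in polluted_batch("flu symptoms", decoys_per_query=5):
        print(query)
```

Uniform random decoys are probably the weak point: queries that don't look
like a real person's searches may be easy to filter back out.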

~~~
dredmorbius
Sadly I'm increasingly coming to the conclusion that fuzzing _all_ search
traffic in this manner is becoming a necessity. My concern is that it's
_still_ not sufficient. As Bruce Schneier notes, computers are exceptionally
good at finding needles in haystacks, and even highly fuzzed data contains
signal.

That said, there are browser extensions which run random/arbitrary background
Web queries.

~~~
kefka
I would tentatively agree. It does appear that fuzzing might work, but given
the current corpus of what the spy companies have, this is probably a bad
avenue of approach.

Instead, let's use The Pirate Bay. We can build a scrape of WebMD and a few
other places. The front page would have every disease WebMD has. And then we
upload it to TPB. Highly illegal, but it does solve the problem of tracking
our individual illnesses.

Thoughts?

~~~
dredmorbius
Wikipedia is frequently as good as, and occasionally a better reference than,
WebMD or other sources (I generally prefer Mayo to WebMD which is frankly
spammy as fuck).

And Wikipedia's fully syndicable.

Improve Wikipedia medical content, syndicate.

Problem solved, laws unflouted.

------
DavideNL
So I open the page and Disconnect shows 36 tracking items blocked, and uBlock
shows 18 more items blocked.

Awesome :)

------
belorn
I use Tor browser quite often, and this is a primary reason why. I have
several times thought "hmm, I should not be typing this into the search box",
especially when at work or on a public network.

------
Maarten88
> But the chief problem is simply that just about all of the above, under
> current laws, is legal

In the US maybe, but I would guess the business practices of most data brokers
are already completely illegal in the EU. We have many laws and requirements
for keeping and selling data on EU citizens. I would welcome stronger actions
against these companies in the EU.

But somehow I fear that enforcing EU laws on US companies is not part of the
TTIP trade agreement under negotiation between US and EU.

~~~
Symbiote
The EU does have the requirement for websites to say they're using third-party
cookies (e.g. from Google Analytics). The weakness is the poor wording chosen.
"By browsing this website, you agree that a list of pages you visit will be
sent to Google and ComScore" would have had a much stronger message. Perhaps
follow on with "The visit to this page has already been tracked. To remove
this information from Google/ComScore's servers, click here".

~~~
magicalist
> _they're using third-party cookies (e.g. from Google Analytics)_

Google Analytics cookies are first-party (i.e. only available to the domain of
the site).

------
BillFranklin
I can't understand why any company that cares about privacy would use Google
Analytics over Piwik.

~~~
Silhouette
I suspect that is an easy one. Your company might care about privacy, but not
be in a technical industry or expert on these kinds of issues.

Google Analytics can be set up in a few minutes by anyone who could set up
their own web site in the first place.

Setting up Piwik means understanding this:
[http://piwik.org/docs/installation-maintenance/](http://piwik.org/docs/installation-maintenance/)

If you run web sites for a living, the latter is no big deal. If your company
is a florist and you just learned a bit of basic HTML to write your blog about
flower arranging, what's a MySQL?

------
shmerl
Check out Mozilla Lightbeam (aka Collusion) too:
[https://www.mozilla.org/en-US/lightbeam/](https://www.mozilla.org/en-US/lightbeam/)

------
seanp2k2
EFF PrivacyBadger + uBlock with lots of lists enabled blocks most of the
tracking garbage. Sad state of affairs that basically every website is doing
some kind of for-profit selling of their users.

~~~
perdunov
Ghostery found one tracker (PiwikAnalytics) that PrivacyBadger didn't on the
PrivacyBadger page itself.

~~~
mike-cardwell
I'm not sure there's much wrong with using Piwik, as long as it is self-
hosted. If using Piwik is bad, then so is using Apache access logs.

~~~
perdunov
> so is using Apache access logs.

This kind of might be. Ideally, anonymous is supposed to mean collecting no
data at all.

~~~
vinceguidry
That's just silly. It's next to impossible to provide good UX without knowing
what your customers are doing on your site.

~~~
perdunov
My guitar has superb UX. Are you suggesting it's tracking me? (Oh my!)

~~~
vinceguidry
Try making an Internet-connected guitar, that, say, downloads its own effects,
and see how easy it remains to use.

------
lifeisstillgood
But, and I may be misunderstanding something, the page that I visit has the
responsibility of serving these trackers? They call out to an adbroker, or
analytics service, and they are responsible for the content surely? I mean if
a newspaper prints a race hate ad for a neoNazi or FOX News runs porn adverts,
they are the responsible party.

So it seems we could do with strong adblocking, but (given spam email still
exists) actual enforced laws will be more useful.

(I may be getting a bit old...)

~~~
GhotiFish
My government doesn't even begin to know how to deal with internet laws, the
only thing I've seen come down the pipe are laws designed to protect square
peg business practices in round hole environments.

The only thing that can be done is to make privacy and ad blocking tools
universally deployed, and let the fallout happen.

------
decisiveness
Firefox has recently released a tracking protection feature[1] that uses
Disconnect's blocklist[2].

[1][https://support.mozilla.org/en-US/kb/tracking-protection-fir...](https://support.mozilla.org/en-US/kb/tracking-protection-firefox)

[2][https://disconnect.me/](https://disconnect.me/)

------
dm2
Devil's Advocate: This data is important to public health. Search history for
drugs is one of the best ways for companies, the public, and researchers to
find out symptoms and the occurrence rate of symptoms. If that data is
attached with location data then it gives them more pieces to the puzzle.

------
brightsize
My default search engine is ixquick.com . The service has a nifty proxy with a
convenient proxy link next to each search result. The proxy breaks a lot of
sites (JS blocking) but usually lets me see enough to determine if it's worth
revisiting via Tor.

------
hownottowrite
Ghostery reports the following tracking beacons in the article itself.

===

Alexa Metrics

ChartBeat

Disqus

DoubleClick

eXelate

Facebook Connect

Google Adsense

Google AJAX Search API

Google Analytics

Google+ Platform

Krux Digital

Moat

NetRatings SiteCensus

Neustar AdAdvisor

PubMatic

Quantcast

Sailthru Horizon

ScoreCard Research Beacon

Twitter Button

~~~
swombat
Something about pots and kettles and the colour (or lack thereof) black...

~~~
maxerickson
Separation between business and editorial operations is usually construed as a
positive.

------
yuhong
That is why I am thinking that private health insurance that covers doctor
visits is probably flawed.

------
logn
... says the news site sending my data to The Nielsen Company.

~~~
GoodIntentions
I see Vice asking to load shit from 12 other domains. Which one is Nielsen?

~~~
logn
imrworldwide.com ... I might have missed the rest because I block JS and 3rd
party requests by default, so I only see the initial requests, not those
subsequently loaded.
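
For anyone who wants to list those initial requests themselves, here is a
small sketch that pulls the third-party hosts out of a page's static HTML. The
sample markup and the first-party host are placeholders, and it has the same
blind spot: it only sees resources named in the initial HTML, not anything
loaded later by scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceHosts(HTMLParser):
    """Collects hosts referenced by src/href attributes of resource tags."""

    RESOURCE_TAGS = {"script", "img", "iframe", "link"}

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in self.RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).hostname
                if host:  # relative URLs have no host and are first-party
                    self.hosts.add(host)


def third_party_hosts(html, first_party):
    """Return hosts that are neither the first party nor its subdomains."""
    parser = ResourceHosts()
    parser.feed(html)
    return sorted(
        h for h in parser.hosts
        if h != first_party and not h.endswith("." + first_party)
    )


# Placeholder markup; the hosts are illustrative, not a capture of any site.
SAMPLE_HTML = """
<script src="https://tracker.example.net/beacon.js"></script>
<img src="//pixel.example.org/1x1.gif">
<img src="/static/logo.png">
<script src="https://cdn.news.example/app.js"></script>
"""

if __name__ == "__main__":
    print(third_party_hosts(SAMPLE_HTML, "news.example"))
```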

------
kefs
Ghostery.. do yourself a favour and install it after reading this.

[https://www.ghostery.com/en/download](https://www.ghostery.com/en/download)

~~~
scrollaway
uBlock Origin + Privacy Badger covers the bases better I would say.

------
raverbashing
A good use for duckduckgo

~~~
blfr
How does DDG help here? It doesn't matter how you find the site. It's still
full of third-party elements that can be used to track you.

~~~
raverbashing
Yes, but you won't be tracked by the search. (Especially by Google/Bing who
know a lot about you already)

~~~
a3n
Yes, one more brick in the wall. You can't have the wall without a lot of
bricks.

------
MichaelCrawford
These mostly serve one-pixel GIFs; sometimes I find one-byte javascript
"sources".

Put these in your hosts file:

    
    
       127.0.0.1 aax.amazon-adsystem.com
       127.0.0.1 ad.crwdcntrl.net
       127.0.0.1 b.scorecardresearch.com
       127.0.0.1 gs.dailymail.co.uk
       127.0.0.1 gum.criteo.com
       127.0.0.1 i.dailymail.co.uk
       127.0.0.1 moat.pxl.ace.advertising.com
       127.0.0.1 pixel.mathtag.com
       127.0.0.1 pq-direct.revsci.net
       127.0.0.1 rta.dailymail.co.uk
       127.0.0.1 sync.go.sonobi.com
       127.0.0.1 t.dailymail.co.uk
       127.0.0.1 ted.dailymail.co.uk
       127.0.0.1 x.bidswitch.net
       127.0.0.1 www.google-analytics.com
       127.0.0.1 ssl.google-analytics.com
       127.0.0.1 www.hosted-pixel.com
    

Search with [https://www.duckduckgo.com/](https://www.duckduckgo.com/) For
extra credit use DuckDuckGo's Tor Hidden Service.

~~~
eterm
You shouldn't use loopback to block hostnames but instead use 0.0.0.0

~~~
middleclick
Can you please explain why?

~~~
userbinator
0.0.0.0 is a completely invalid destination IP, and any attempts to establish
a connection to it will fail long before any packets are sent. 127.0.0.1 is
still valid so you depend on the network stack to either receive a RST sent to
itself, or time out.

At least on Windows, the former is _much_ faster to return with an error.

It also avoids potential conflicts if you happen to need to run a server on
port 80 for any reason.
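
A quick sketch of making that switch, assuming hosts entries in the usual
`address hostname` form: it rewrites blocking entries that point at 127.0.0.1
to 0.0.0.0 and leaves `localhost` and everything else untouched. The function
name and sample text are made up for illustration, and it works on a string
rather than editing the real hosts file.

```python
def use_null_route(hosts_text):
    """Rewrite 127.0.0.1 block entries to 0.0.0.0, keeping localhost."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        is_block_entry = (
            len(fields) >= 2
            and fields[0] == "127.0.0.1"
            and "localhost" not in fields[1:]
        )
        if is_block_entry:
            out.append(" ".join(["0.0.0.0"] + fields[1:]))
        else:
            out.append(line)  # comments, blanks, and localhost stay as-is
    return "\n".join(out)


# Sample input; the two tracker hosts are taken from the list upthread.
SAMPLE = """127.0.0.1 localhost
# trackers
127.0.0.1 b.scorecardresearch.com
127.0.0.1 www.google-analytics.com"""

if __name__ == "__main__":
    print(use_null_route(SAMPLE))
```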

~~~
MichaelCrawford
When I ping 0.0.0.0, I get replies from 69.41.141.1. I don't know but expect
that's earthlink's router.

~~~
userbinator
That's odd... on Windows, pinging 0.0.0.0 results in "Destination specified is
invalid."

On *nix, the behaviour is slightly different and is supposed to ping localhost
instead (confirmed on one of my Linux machines):

[http://unix.stackexchange.com/questions/99336/how-does-ping-...](http://unix.stackexchange.com/questions/99336/how-does-ping-zero-works)

~~~
MichaelCrawford
Do the RFCs have anything to say about 0.0.0.0?

Perhaps it is left up to the implementation, in which case both Windows and
*NIX would be, strictly speaking, correct.

