
HealthCare.gov Sends Personal Data to Dozens of Tracking Websites - markolschesky
https://www.eff.org/deeplinks/2015/01/healthcare.gov-sends-personal-data
======
zaroth
This is pretty shocking. What is PII doing in the query string in the first
place? Disclosing pregnancy status from an insurance application sounds like a
possible HIPAA violation and runs afoul of various state laws around
'Insurance Information and Privacy Protection'. E.g.
[http://www.leginfo.ca.gov/cgi-bin/displaycode?section=ins&gr...](http://www.leginfo.ca.gov/cgi-bin/displaycode?section=ins&group=00001-01000&file=791-791.29). See Section
791.13(k). That's just CA law, but many states followed with their own versions.
(IANAL)

I think the really big penalties come into play when medical information is
'personally identifiable'. Since this data is going to Google, Facebook, and
Twitter (really?!) with 3rd party cookies, or even without, it would be hard
to argue this data is not personally identifiable.

It's not like they didn't know they were sending this data out. Or perhaps
the highly advanced debugging prowess of "Chrome Inspector" is beyond their
pay grade.

Edit: Oh, it's not even just a Referer leak; in some cases it's actually in the
request itself, so it's blatantly intentional. :-(

~~~
threeseed
> Oh, it's not even just a Referer leak; in some cases it's actually in the
> request itself, so it's blatantly intentional.

Be careful about throwing the term intentional around. There is nothing to
suggest this is the case. It's just a shocking breakdown in security/testing
processes and/or a bug. We see security/privacy issues every day. They are
almost never intentional.

~~~
Alupis
> They are almost never intentional.

Somebody had to specifically code the application to concatenate into the
string:

> smoker=1&parent=&pregnant=1&mec=&zip=85601&state=AZ&income=35000

~~~
acdha
Reading the example more closely, that's part of a URL:

[https://4037109.fls.doubleclick.net/activityi;src=4037109;ty...](https://4037109.fls.doubleclick.net/activityi;src=4037109;type=20142003;cat=201420;ord=7917385912018;~oref=https://www.healthcare.gov/see-plans/85601/results/?county=04019&age=40&smoker=1&parent=&pregnant=1&mec=&zip=85601&state=AZ&income=35000&step=4)

Unfortunately, a quick Google search doesn't explain what the oref parameter
does, but from the name I'm assuming it's something like "original referrer".
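
To make the leak concrete: assuming, as guessed above, that `~oref` carries the referring page's URL, a few lines of standard-library Python pull the applicant's answers straight back out of the DoubleClick request. This is only an illustrative sketch; `leaked_fields` is a made-up name, and nothing here reflects how DoubleClick itself parses the value.

```python
from urllib.parse import urlsplit, parse_qs

# The DoubleClick request quoted above, with the line break removed.
# The ~oref value is not percent-encoded, so the healthcare.gov query
# string rides along verbatim.
BEACON = (
    "https://4037109.fls.doubleclick.net/activityi;src=4037109;"
    "type=20142003;cat=201420;ord=7917385912018;"
    "~oref=https://www.healthcare.gov/see-plans/85601/results/"
    "?county=04019&age=40&smoker=1&parent=&pregnant=1&mec="
    "&zip=85601&state=AZ&income=35000&step=4"
)


def leaked_fields(beacon: str) -> dict[str, str]:
    """Return the query parameters of the page URL embedded after ~oref=."""
    # Everything after the first "~oref=" is (assumed to be) the referring page.
    _, _, oref = beacon.partition("~oref=")
    query = urlsplit(oref).query
    # keep_blank_values keeps empty fields such as parent= and mec=.
    return {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}


fields = leaked_fields(BEACON)
print(fields["pregnant"], fields["smoker"], fields["income"])  # 1 1 35000
```

Because the embedded URL isn't encoded, even a naive parse of the outer DoubleClick URL would treat the healthcare.gov parameters as its own query string; either way the fields arrive intact.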

You don't need malice to explain this – it's entirely plausible to imagine
that some people wanted to track user activities and they had a staggering
lapse in HIPAA auditing due to the rush of getting the site out and
stabilized.

~~~
shard972
> it's entirely plausible to imagine that some people wanted to track user
> activities and they had a staggering lapse in HIPAA auditing due to the rush
> of getting the site out and stabilized.

Considering they spent 1.7 billion on the site, I simply cannot believe that
they were so unorganised and lazy in their testing that they couldn't find
this. Otherwise I don't know what to think anymore.

~~~
GabrielF00
I don't think it's accurate to say "they spent 1.7 billion on the site".

I think the $1.7 billion figure comes from this OIG report
[http://oig.hhs.gov/oei/reports/oei-03-14-00231.pdf](http://oig.hhs.gov/oei/reports/oei-03-14-00231.pdf)

However, the OIG report has a number of important caveats:

* The list of 60 contracts in the report includes contracts to support state websites and for programs unrelated to the website (for instance, I found an $85 million contract related to accountable care organizations, which doesn't seem to have any connection to the website).

* The $1.7 billion is not the amount expended, it's the estimated value at the time the contract was awarded if all the options are exercised. When you look at the individual contracts, this estimated value turns out not to be very useful. Some contracts had double the estimated expenditure, some had $0 expended. Looking at the total amount expended, you get a figure of $500 million.

So I think it's more reasonable to say that they spent $500 million on various
projects to implement the law, including both the user-facing website and all
the behind-the-scenes stuff.

------
devindotcom
_Spokesman Aaron Albright said outside vendors "are prohibited from using
information from these tools on HealthCare.gov for their companies' purposes."
The government uses them to measure the performance of HealthCare.gov so
consumers get "a simpler, more streamlined and intuitive experience," he
added._

It's one thing to send session length, general location, usage stuff like that
to see where, for example, awareness campaigns might be needed. But really:


    smoker=1&parent=&pregnant=1&mec=&zip=85601&state=AZ&income=35000

That's a bit much! And I suppose DoubleClick is carefully siloing this
information so it doesn't accidentally perform all kinds of analysis on it for
comparison with its other huge databases? Perhaps they are barred from selling
it wholesale to data brokers but I can't imagine they are unable to use it for
plenty of their own purposes.

~~~
Arnor
I'm sure you're right that the information is getting used by DoubleClick.

To the other point about this information not being appropriate for the
purposes Albright mentioned: Isn't this exactly the information that a health
insurance company wants to know for outreach? If I know that all the pregnant
women in Tucson are signing up but none from Phoenix, I suddenly know where to
put my next billboard or field office.

If this were a private-sector company, nobody would be surprised that it collected
this data. It would also be a different story if the data were being stored and
analyzed in-house, or even if the DoubleClick request happened on the server
side instead of the client.

I agree with the general sentiment that this is a privacy violation, but
that's because of the way that the data is collected and who processes it, not
the collection and use of the data generally.
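
Arnor's point about where the request is built suggests one control: if the measurement call were assembled on the server, the site could decide field by field what leaves. A minimal sketch in Python, assuming a hypothetical allowlist `SHAREABLE` and a made-up helper `redact_url` (nothing in the thread says which fields HealthCare.gov actually needs to measure):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allowlist: fields a site might judge safe to share with a
# measurement vendor. Everything else (pregnant, smoker, income, ...) is dropped.
SHAREABLE = {"state", "step"}


def redact_url(url: str, allowed: set[str] = SHAREABLE) -> str:
    """Return url with every query parameter not in `allowed` removed."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in allowed
    ]
    # Rebuild the URL with only the allowlisted parameters, in original order.
    return urlunsplit(parts._replace(query=urlencode(kept)))


page = (
    "https://www.healthcare.gov/see-plans/85601/results/"
    "?county=04019&age=40&smoker=1&parent=&pregnant=1&mec="
    "&zip=85601&state=AZ&income=35000&step=4"
)
print(redact_url(page))
# https://www.healthcare.gov/see-plans/85601/results/?state=AZ&step=4
```

Note that the path itself (`/see-plans/85601/`) still contains the ZIP code, so stripping the query string alone would not make this URL safe to hand to a third party.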

~~~
Arnor
Confused...

------
j_s
Paging HN user brandonb and 'a bunch of other Google, Facebook, and Y
Combinator alums' -- did this exist while you worked on the site?

> I've been working on healthcare.gov for the last few months

[https://news.ycombinator.com/item?id=7312442](https://news.ycombinator.com/item?id=7312442)

~~~
brandonb
We weren't involved with this specific part of the site but folks are on it!

(I wrapped up my involvement several months ago, but others are helping out with
this open enrollment period.)

~~~
bashinator
> folks are on it!

Am I correct in thinking that the cheery use of passive voice means you're
under quite a serious NDA?

~~~
brandonb
There is an NDA but I used "folks" since I, personally, have returned to the
startup world and am not involved in the details of fixing this particular
incident. But yes, rest assured that the people who currently work on
healthcare.gov are busy testing a fix, which is why they're not posting on HN.

------
drylight
Give $563M to Accenture and you get some really shoddy work:
[http://www.healthcaredive.com/news/accenture-snags-new-5-yea...](http://www.healthcaredive.com/news/accenture-snags-new-5-year-healthcaregov-contract-for-563m/347935/)

------
declan
An additional problem, as I see it, is that the Obama administration made
unambiguous assurances that no PII was being collected as part of
Healthcare.gov's use of web measurement tools. Here's the excerpt from the
privacy policy:

 _HealthCare.gov uses a variety of Web measurement software tools. We use them
to collect the information listed in the “Types of information collected”
section above. The tools collect information automatically and continuously.
No personally identifiable information is collected by these tools._
[https://www.healthcare.gov/privacy/](https://www.healthcare.gov/privacy/)

Note the last sentence is in bold on the actual web page.

A Department of Health and Human Services organ called the Centers for
Medicare & Medicaid Services is responsible for the site. An enterprising HN
reader might want to skim through the CMS (very long) privacy impact
assessment to see if there are any other incorrect claims about
Healthcare.gov: [http://www.hhs.gov/pia/cms-pia-summary-fy12q4.pdf](http://www.hhs.gov/pia/cms-pia-summary-fy12q4.pdf)

It will be interesting to see if anyone gets fired as a result of this
particular privacy screwup. The buck should stop _somewhere_, right?

~~~
jobposter1234
> A Department of Health and Human Services organ called the Centers for
> Medicare & Medicaid Services is responsible for the site. An enterprising HN
> reader might want to skim through the CMS (very long) privacy impact
> assessment to see if there are any other incorrect claims about
> Healthcare.gov: [http://www.hhs.gov/pia/cms-pia-summary-fy12q4.pdf](http://www.hhs.gov/pia/cms-pia-summary-fy12q4.pdf)

Is there any way to split this up so each person is responsible for a section?
You'd miss a lot by missing context... but if the section readers bullet-pointed
everything, that could be combined into a larger context.

Or, in HN speak, we could crowdsource a real-world Map/Reduce job to support
big data in the citizen-scientist.

~~~
declan
I love the idea of a real-world map/reduce job. :) But before spending any
time on this, please make sure it's the right PDF. It does mention
Healthcare.gov, but only a few times, and I'm no expert on HHS organizational
structure. Here's the full directory of PIAs:
[http://www.hhs.gov/pia/](http://www.hhs.gov/pia/)

------
garazy
Looks like a few of the tracking companies only just started to appear:

[http://builtwith.com/detailed/healthcare.gov](http://builtwith.com/detailed/healthcare.gov)

The only non-ad tool they added was the Twitter Platform to their homepage.
Lots of data leakage points though.

------
jayess
Isn't this a HIPAA violation?

~~~
Kikawala
No. I don't see any protected health information.

~~~
honksillet
Number of pregnancies is a violation. One could infer whether the individual is
pregnant, has had a miscarriage or even an abortion.

------
tedunangst
Don't blame the browsers for continuing to send Referer headers though.
Because browsers take your privacy seriously.

------
bagels
I'm wondering whose doubleclick account those ad dollars are ending up in.

~~~
btian
Not an ad account. I think DoubleClick is used for surveys.

~~~
JeremyMorgan
Nope.
[http://en.wikipedia.org/wiki/DoubleClick](http://en.wikipedia.org/wiki/DoubleClick)

> DoubleClick is a subsidiary of Google which develops and provides Internet ad
> serving services

~~~
btian
I'm saying in this case. Doubleclick accounts are usually for ads.

------
seccess
This is certainly scary stuff, but I was a bit annoyed with the line:

"...consequences such as when Target notified a woman's family that she was
pregnant before she even told them."

I've heard this story referenced time and again with respect to motivating
people to care about privacy and tracking. I'm all for privacy, but I feel
like: (a) we should have more recent anecdotes about the consequences of
tracking than a story from 2012, (b) the mechanism that Target used to infer
this is far less intrusive (not making it OK) than what we see here, and (c)
it's really not a strong enough example.

Not that speculation is the way to go, but what about the possibility of
someone being turned down for life insurance due to this information?

~~~
protomyth
Well, it is a simple example and has the virtue of being true, unlike the
often-quoted but misrepresented McDonald's hot coffee story. Simple examples
showing a situation are best, and much like iOS bug statistics, the parties
who would have the statistics on situations caused by tracking are never going
to make them public.

------
jtheory
They don't even get into the repercussions of loading externally-hosted
JavaScript into a secure page.

We avoid this entirely (also hosting medical data), though it's been a bit of
extra work to do so.

I'm sure Chartbeat, Mathtag, Mixpanel, Google, etc. are reasonably careful
about their security, and of course they would suffer as well if one of the
servers/scripts was compromised and the breach was made public.

But in short -- healthcare.gov's security _relies_ on the idea that _none_ of
these many 3rd parties will ever have a CDN server compromised, for example.
Or (in other situations) have the NSA demand access.

It just takes one -- and then an "improved" script could be delivered to only
clients visiting a single targeted site, or even specific targeted clients.
The normal customer just sees the lock icon and can verify that there's a
secure connection to the main host; but there are actually many other
connections going on to other hosts, and any of them may provide a script that
can access any sensitive data on the page.

------
mindslight
What else could one _possibly_ expect when an industry has succeeded at
convincing the government to make buying their product mandatory?!

I know the EFF focuses specifically on informational issues, but stirring
outrage over one abuse of a captive market when such abuses are _by design_ is
a disservice to general sanity.

~~~
threeseed
The ACA model in the US is very similar to what exists in many places in the
world, e.g. here in Australia. And it was driven by the needs of the
government, not the needs of the health care industry, although the industry
is a beneficiary. That said, the model really does work.

The fact is that having uninsured people has a devastating effect on the
economy. It prevents movement of labour, affects productivity and promotion to
higher socio-economic levels, prevents people from starting businesses, affects
crime, and has countless other social effects. You need to force people who
don't think they need it to have it.

~~~
brc
From experience, an Australian doesn't have the correct frame of reference to
even engage in the US healthcare debate. I have tried to understand the issues
many times but it comes from such a fragmented starting point it's difficult
to understand unless you've been in it for a long, long time.

Your points about uninsured are valid, but it's much more complicated than
just saying 'hey, you guys should insure everyone'. So I generally try and
observe from the sidelines.

------
fubarred
Currently, [https://disconnect.me/](https://disconnect.me/) browser extension
says [https://www.healthcare.gov/](https://www.healthcare.gov/) uses:

- 0 Facebook, 3 Google, 0 Twitter
- 0 Advertising
- 6 Analytics: 1 ClickTale, 4 Mixpanel, 1 Chartbeat
- 0 Social
- 6 Content: 3 Google, 3 Optimizely

------
EdSharkey
More government doing shitty things not in its charter. I'm numb to this
abuse. Next up: increased taxes + inflation.

I hope I live to see the day that the laws are twisted and shredded such that
all corporate-government data about every person is available for purchase.
I'd love to have that detailed record of everything I've said, thought, places
I've been, etc since ~Y2K. How cool would that be?

I've heard it said that cultural anthropologists of the future will
absolutely love mining the rich personal data coming out of this period of
time.

~~~
at-fates-hands
> I've heard it said that cultural anthropologists of the future will
> absolutely love mining the rich personal data coming out of this period of
> time.

Former Anthropologist here.

While culturally speaking it will be interesting, up to this point in
human history there have always been _physical_ things left behind by cultures
to denote their existence.

As our whole lives have become digital, once the servers are gone, the
pseudo-physical evidence will vanish. One of my professors told me in passing
in the early aughts, "This generation (meaning Generation Y) will barely
leave a trace of its existence in 200 years."

He implied that once technology has evolved past our current rate of burn,
the mechanisms by which we preserve our memories will be forever wiped out. He
made a point of asking, "When was the last time you used something _physical_
to create, retain or share your memories?" When was the last time you printed
a photograph? Listened to a music _album_? Once the devices by which we save
our memories become obsolete, so does our existence.

It caught me off guard, and was one of those times where you stop and
wonder what people will dig up 200-300 years from now and discover about our
civilization. Will it all just be zeros and ones on a server somewhere?

~~~
paulfurtado
Couldn't you say that hard drives, SSDs, tape backups, etc are all still
physical mediums? While these mediums lose data over time, forensics will
still be able to recover partial data, similar to other physical mediums (pen
and paper, photos, etc).

~~~
jcrites
Those are usually destroyed when their useful life ends, exactly because
someone might dig them up later and extract data from them. Large corporate
data centers, for example, physically destroy hard disks and never allow them
to leave the facility intact.

There will be hard drives left around by individual consumers, I suppose, but
the vast majority of all those that exist today are likely to be deliberately
destroyed. We're so good at copying and replicating data these days that we no
longer rely on hard drives for data permanence over long periods.

------
stephenhess
If you're looking for a better place to go than healthcare.gov, give us a try
at stridehealth.com. Bunch of ex-privacy folks and healthcare folks - can shop
from your phone. Pretty shocking to see such a novice mistake by an org I
think we were all expecting to take it up a level this year.

------
natmaster
Don't worry guys, Obama's friends' companies will use that information to sell
you better products. It's for your own good!

------
dkroy
I heard something a while back about the US government (NSA) leveraging the
cookie in a way that they could use it as a surveillance beacon. I doubt there
is any relation, but it makes you think a bit.

[1] [https://www.eff.org/deeplinks/2013/12/nsa-turns-cookies-and-...](https://www.eff.org/deeplinks/2013/12/nsa-turns-cookies-and-more-surveillance-beacons)

~~~
bhhaskin
I am sure the NSA are involved with at least some aspect of all .gov
addresses.

------
kumarm
One of the most heavily trafficked sites in India (Railway Booking) has been
showing Google AdSense ads. Someone in government is making a million dollars
a month :)

[https://www.irctc.co.in/eticketing/loginHome.jsf](https://www.irctc.co.in/eticketing/loginHome.jsf)

Global rank 536 (50 in India):
[http://www.alexa.com/siteinfo/www.irctc.co.in](http://www.alexa.com/siteinfo/www.irctc.co.in)

~~~
nathanaldensr
What does this have to do with the topic of this thread?

