
Dear Google, am I pregnant? - cstross
http://www.antipope.org/charlie/blog-static/2014/03/dear-google-am-i-pregnant.html
======
patio11
_Side note: In the USA, doing this would be a federal offence under Title II
of HIPAA._

It would probably depend on whether the consultancy got a BAA signed by
Google. That would obligate Big Daddy G to a host of security and privacy
requirements, but would extinguish the HIPAA liability for the consultancy
with regard to the data while it was in Google's care. You can think of it
sort of like cert-chaining -- the government wants to see a chain of BAAs
linking all subcontractors that data stops at from the original covered entity
(e.g. a hospital).

Source: I have signed a few and gotten a few signed. Talk to a lawyer if you
need implementation advice on this, though.

~~~
JunkDNA
There is another loophole that is admittedly unlikely (and the post doesn't go
into any details on what the actual records contain). If these records were
somehow scrubbed of HIPAA identifiers, then it would in fact not be a HIPAA
violation in the US. For example: a dataset of randomly assigned IDs tied to
diagnosis codes. You could uniquely ID an individual within the dataset but
not know who they were in the real world.
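A minimal sketch of the kind of dataset described above, assuming records are plain dictionaries. The field names (`patient_name`, `diagnosis_code`, `pseudo_id`), the sample values, and the `pseudonymize` helper are all hypothetical, and nothing here is a statement of what HIPAA actually requires for de-identification:

```python
import secrets


def pseudonymize(records, id_field="patient_name"):
    """Replace a direct identifier with a randomly assigned ID.

    Returns the scrubbed records plus the lookup table. The lookup table
    would have to be held separately (or destroyed) for the scrubbing to
    mean anything.
    """
    lookup = {}    # real identifier -> random ID
    scrubbed = []
    for record in records:
        real_id = record[id_field]
        if real_id not in lookup:
            lookup[real_id] = secrets.token_hex(8)
        # Copy every field except the direct identifier.
        row = {k: v for k, v in record.items() if k != id_field}
        row["pseudo_id"] = lookup[real_id]
        scrubbed.append(row)
    return scrubbed, lookup


# Made-up sample data, for illustration only.
records = [
    {"patient_name": "Alice Example", "diagnosis_code": "O80"},
    {"patient_name": "Bob Example", "diagnosis_code": "E11"},
    {"patient_name": "Alice Example", "diagnosis_code": "J45"},
]
scrubbed, lookup = pseudonymize(records)
```

The same person keeps the same random ID across rows, so an individual is still unique within the dataset, but the scrubbed rows alone carry no name.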

I hear all the privacy folks lighting their torches and sharpening the
pitchforks. So, for the record, yes, there are all sorts of methods and
studies that show you can _potentially_ re-identify people from all sorts of
data that seems at first blush to be not that identifiable [1] and isn't part
of the list of HIPAA identifiers. However, in the US, in actual practice, when
you talk to compliance people, they often take a very narrow view of what
"identifiable" is. The standard is often that it has to be more or less
trivial to do. For example, matching on easily accessible public records.

I encounter this all the time in my capacity as a biomedical researcher and
have discovered that my "geek intuition" on what is identifiable does me no
good in this space. The most crazy one is your DNA sequence. I'm having
trouble finding the original document now, but Health and Human Services went
out of their way to _not_ make this a formal HIPAA identifier (except in very
narrow cases relating to insurance companies) when they had the opportunity to
do so during some recent rule-making. Which you would think it would clearly
be because HIPAA allows for "other biometric identifiers" and what could be a
better biometric than your DNA? But I digress...

One of the problems with HIPAA is that it leaves a lot to the eye of the
beholder, and many beholders have wildly differing vision. This, as you state,
is why you need a lawyer who can make sure your vision doesn't lead to
decisions with a high probability of business-ending bankruptcy and going to
jail.

[1] [http://arstechnica.com/tech-policy/2009/09/your-secrets-
live...](http://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-
in-databases-of-ruin/)

~~~
ronaldx
> when you talk to compliance people, they often take a very narrow view of
> what "identifiable" is. The standard is often that it has to be more or less
> trivial to do.

Frankly, I would expect those compliance people to be disciplined for this
obvious neglect of their duties.

(or whoever is responsible for the custom that 'identifiable' means
'identifiable to a 2 year old')

This reminds me strongly of ethically obviously-wrong tax avoidance schemes.
"Yes, it's OK to pretend you're a used car salesman for tax purposes. There's
nothing illegal about it." Let's get real.

~~~
JunkDNA
This stuff isn't done in a vacuum by compliance offices. It's done with
guidance from HHS. HIPAA has a lot of stuff that is not clearly defined. As a
result, it's important to keep to the spirit of the rule or HHS will
come after you. The analogous healthcare loophole scenario you describe would
not hold water with HHS.

Again, my perspective is from the biomedical research world for which the
HIPAA privacy rule gives certain limited affordances for communicating patient
data that is de-identified to other institutions. Without that safety valve of
de-identification being fairly reasonable, there are tons of research studies
that would not be allowed to go forward. There is a point where the very tiny
risk of re-identification is vastly outweighed by the good of a research study
going forward. This is what HHS and institutional review boards struggle with
all the time.

------
fixermark
While I agree with the overall thrust of the article, I have to take issue
with the author's ostensibly-BigData-access-enabled disaster scenarios.

Because let's be honest... From which source would the data be more likely to
be stolen to enable the author's scenarios? Google's infrastructure (post-
hardening against intrusion by---among other entities---GCHQ)? Or the NHS's
system (which has been proven to---among other things---hand data on every NHS
patient in England and Wales out to contractors on 27 unauditable, non-
remotely-deletable DVDs)?

~~~
qwerta
I think Google has a direct line to the NSA, so yes, Google.

~~~
fixermark
The article wasn't positing scenarios enabled by a nation's intelligence arm
having access to the health records; it was positing burglary gangs, religious
organizations, and insurance companies having (and abusing) access to the
data. I suppose one could posit a scenario where an intelligence agency abuses
access to the health records of citizens of another nation, but the scenarios
highlighted in the article are not enabled by PA uploading the data to
Google's servers. I'd even go so far as to argue that the specific scenarios
highlighted in the article are LESS likely if PA uploaded the data to Google's
service and then destroyed the originals; "27 DVDs" is a format far more
likely to be stolen by a burglary gang.

Assuming the author's scenarios are enabled by NSA access assumes a much, much
larger global conspiracy against the people of England and Wales than I'd be
willing to assume.

(This thought process even grants the assumption that the NSA has some kind of
clear-text access to the data in Google's services these days, which is also
not an assumption I'd make.)

------
jcampbell1
The title is ironic, since Google certainly knows if you are pregnant based on
your search history. Target knows if you are pregnant based on purchases of
prenatal vitamins, and does special marketing:

[http://www.forbes.com/sites/kashmirhill/2012/02/16/how-
targe...](http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-
figured-out-a-teen-girl-was-pregnant-before-her-father-did/)

~~~
rmrfrmrf
Ugh this specific example has been used time and time again and it irritates
the crap out of me. This is not some accomplishment of big data or even beyond
trivial to figure out.

Example Google searches:

    
    
        > 1 month before conception
    
        best days to conceive
        conception calculator free
        best iphone conception reminder
        best pregnancy tester
        what to do when planning to concieve
        planning for baby
    
        > 1 month later
    
        signs that youre pregnant
        how many days after missed period should i check if i'm pregnant
        best pregnancy test 2014
        reliable pregnancy tests 2014
    
        > 2 days later
    
        best prenatal vitamins 2014
        obgyn in <city> <state>
        what to eat when pregnant
        can i eat eggs while pregnant
        medicine safe while pregnant
        when to tell family pregnant
        when to tell friends pregnant
    

I mean, come _on_! That's not an accomplishment of statistical analysis or big
data.
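To make the point concrete, here is a rough sketch of how little logic such a guess needs. The phrase list, the threshold, and the `looks_pregnancy_related` helper are invented for illustration; this is not a description of how Google or Target actually do it:

```python
# Hypothetical substrings; "pregnan" matches both "pregnant" and "pregnancy".
PHRASES = ("conceive", "pregnan", "prenatal", "obgyn")


def looks_pregnancy_related(queries, threshold=3):
    """Return True if at least `threshold` queries contain a listed phrase."""
    hits = sum(
        1 for query in queries
        if any(phrase in query.lower() for phrase in PHRASES)
    )
    return hits >= threshold


history = [
    "best days to conceive",
    "best pregnancy test 2014",
    "best prenatal vitamins 2014",
    "what to eat when pregnant",
]
unrelated = ["cheap flights", "weather tomorrow"]

flag_history = looks_pregnancy_related(history)
flag_unrelated = looks_pregnancy_related(unrelated)
```

A plain substring count like this involves no statistics at all, which is the whole complaint.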

For Target, it's even easier considering that making a purchase is considered
the holy grail of all of the Internet's efforts. I mean, when _else_ are you
going to be buying pregnancy tests and then prenatal vitamins within months of
each other? Not to mention the other things like baby name books, _What To
Expect..._ , etc. that a future parent would buy at Target.

How about a real challenge like determining paternity via search histories?

------
jhspaybar
Does "big data" mean anything? I feel like we are long past this phrase losing
all meaning. How is 125GB "big"? This is still well within "keep it all in
memory" sizes of data. Surely the indexes for this data could be kept in
memory by commodity hardware with something as simple as MySQL as the storage
engine.

~~~
bad_user
Nowhere in the article did I see "big data", or "big" for that matter.

------
protomyth
"considerably stronger than the implicit privacy rights acknowledged in the
US, which are not enumerated directly by the US constitution"

The Constitution was meant to enumerate the powers of the government, not the
people. Sadly, our courts have let us down and now we must rely on the
enumeration even if the 10th amendment says "The powers not delegated to the
United States by the Constitution, nor prohibited by it to the States, are
reserved to the States respectively, or to the people."

Privacy was historically understood and thought to need no additional
enumerations. Check the writings of the founding fathers and British common
law at the time.

------
casca
Their Data Protection Register submission confirms that "When this is needed
information may be transferred to countries or territories around the world".
For those not familiar with the UK Information Commissioner's Office (ICO)
process, when submitting you can choose between transferring data just within
the UK, within the EEA and worldwide.

[http://ico.org.uk/ESDWebPages/DoSearch?reg=440927](http://ico.org.uk/ESDWebPages/DoSearch?reg=440927)

[http://ico.org.uk/ESDWebPages/DoSearch?reg=251239](http://ico.org.uk/ESDWebPages/DoSearch?reg=251239)

------
grecy
Should be - "Dear US Govt. am I pregnant?"

It's entirely likely some three letter agency intercepted that upload and has
all the data for itself.

------
oalders
On a vaguely related note, I recently discovered that "am i pregnant" seems to
be a popular search term on Google:
[https://twitter.com/wundercounter/status/430175442432053248](https://twitter.com/wundercounter/status/430175442432053248)

------
michaelfeathers
Definitely bad behavior but I wonder how much of it is attributable to the
cloud story that companies have been selling - that all storage and analysis
belong in the cloud.

------
Shivetya
Funny thing is, Target or another retailer might know because of your purchase
history and would that be a violation under some laws currently on the books?

~~~
bertil
Charlie Stross being male, and the Target story being massively advertised,
I’m assuming that was something he initially wanted to point at.

Purchase history is by far the worst offender in that respect: you can predict
diabetes, risky behaviour, alcoholism, pregnancy, emotional state (and
marriage stability), intention to move (via DIY equipment), even religion and
ethnicity, which carry a similar moral and legal burden in Europe.

As far as I can read, all legal documents mention: asking, collecting,
processing explicit data. None seem to cover the case of a high factor between
“has bought unscented skin cream” and “Recommend cribs, nappies and milk
bottles”.

What was interesting in the NYT piece about Target was how even with that
knowledge, being too explicit was considered horrible and an invasive
practice. Based on anecdotal reaction to ‘People who bought that’ from Amazon
from more than a decade ago (and the general sucky obviousness of most of their
suggestions) I’m assuming that a lot of the crunchy cases (say: blue books, box
wine, dark chocolate and dildos) are censored to avoid public scandal, and a
corresponding regulation.

Based on experts’ mobility and the strength of statistical trends in general,
I guess the Target data scientist who was fired after the NYT piece was not guilty
of spilling the beans, or revealing that Hadoop was the right choice (duh) to
competitors like WalMart and CostCo, but letting the _public_ know.

------
UK-AL
I'm fairly sure the Data Protection Act says it's OK to move data outside the UK,
as long as the country you're moving it to has similar data protection laws.

~~~
summerdown2
That's only one of the data protection principles (principle 8). The others
still apply, such as principle 7:

> Appropriate technical and organisational measures shall be taken against
> unauthorised or unlawful processing of personal data and against accidental
> loss or destruction of, or damage to, personal data.

... which I would say has pretty clearly not been followed in this case.

It's probably a breach of many of the other principles, too. For example,
principle 3:

> Personal data shall be adequate, relevant and not excessive in relation to
> the purpose or purposes for which they are processed.

... it's hard to see how the data used was not excessive for the purpose in
question.

[http://ico.org.uk/for_organisations/data_protection/the_guid...](http://ico.org.uk/for_organisations/data_protection/the_guide/principle_7)
[http://ico.org.uk/for_organisations/data_protection/the_guid...](http://ico.org.uk/for_organisations/data_protection/the_guide/information_standards/principle_3)

------
artumi-richard
The document where PA consulting announce their use of Google tools[1] ends
that section with "For more information please email
healthcare@paconsulting.com"

Just sayin'

1:
[https://www.google.co.uk/url?sa=t&source=web&rct=j&ei=oOwTU9...](https://www.google.co.uk/url?sa=t&source=web&rct=j&ei=oOwTU9qXCcOVhQfzm4DACQ&url=http://www.paconsulting.com/EasySiteWeb/GatewayLink.aspx%3FalId%3D32982&cd=5&ved=0CDkQFjAE&usg=AFQjCNH4YTWcDErZp2ZJn-
iXUNfSLLzIaw&sig2=gDp0bEwLRH4i0xV2B5CUSQ)

------
jcampbell1
I have never understood the reason medical records are extremely private. I'd
be much more worried about my email/search history/SMS/etc. being published.
My medical history I don't really care much about.

I understand that we don't want companies using our medical history to make
decisions, but is HIPAA just a hangover of HIV/AIDS?

My instinct is that HIPAA has cost many lives by locking up data that could be
used to help doctors be better doctors. If my kids were allergic to bee
stings, I wish this information would show up as the first result of a Google
search.

~~~
polymatter
TFA pointed to some reasons:

"Random scenario: a burglary gang gains access to the database and can thereby
identify patients aged over 80 living alone in up-market neighbourhoods who
have recently been admitted to hospital with conditions suggesting that they
will be vulnerable but not supported by full-time carers. A religious
organization targets men of a certain age who are HIV positive. Or women below
a certain age who are single and pregnant. Or an insurance company notes that
a patient made a mistake in their declaration of a pre-existing condition, and
thereby invalidates their claim. An identity thief uses the postcode and date
of birth, in conjunction with a copy of the public electoral register, to pick
victims. The possibilities are endless"
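The identity-thief scenario in that quote is essentially a join on two fields. A minimal sketch, assuming a second, public dataset that lists a name next to a postcode and date of birth; that layout is an assumption for illustration, not a claim about what the electoral register contains, and every name, field, and value below is made up:

```python
def link_records(health_rows, public_rows):
    """Join two datasets on (postcode, date_of_birth).

    Only keys that map to exactly one person in the public data are
    treated as a match; ambiguous keys are skipped.
    """
    by_key = {}
    for person in public_rows:
        key = (person["postcode"], person["date_of_birth"])
        by_key.setdefault(key, []).append(person["name"])

    matches = []
    for row in health_rows:
        key = (row["postcode"], row["date_of_birth"])
        names = by_key.get(key, [])
        if len(names) == 1:
            matches.append((names[0], row["pseudo_id"]))
    return matches


# Fictional records with no names attached.
health_rows = [
    {"pseudo_id": "a1", "postcode": "AB1 2CD", "date_of_birth": "1930-05-01"},
    {"pseudo_id": "b2", "postcode": "ZZ9 9ZZ", "date_of_birth": "1980-01-01"},
]
# Fictional public listing; two people share the second key.
public_rows = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "date_of_birth": "1930-05-01"},
    {"name": "Carol Example", "postcode": "ZZ9 9ZZ", "date_of_birth": "1980-01-01"},
    {"name": "Dan Example", "postcode": "ZZ9 9ZZ", "date_of_birth": "1980-01-01"},
]
matches = link_records(health_rows, public_rows)
```

In this toy data only the first record links to a single name; the second stays ambiguous because two people share the same postcode and birth date.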

I can add a few more. A wife beater looks up his wife's records to see if she
has gone into hospital. A rape victim doesn't want her work colleagues to idly
google her medical records and find out. A hopeful job candidate doesn't want
her interviewer to know about her problems with self-harm 15 years ago.

Perhaps none of those situations are relevant to you. But they apply to some
people and that's why we keep all records private.

~~~
jcampbell1
I agree with those hypothetical possibilities. The other type of failure is
also possible.

My dad recently had a detached retina. He spent 5 hours getting his own
medical records. In the medical records, it said he was given a dose of Cipro.
"Which is funny because I am deathly allergic to Cipro, and since I am alive I
certainly wasn't given Cipro."

Had he not spent 5 hours getting medical records and writing letters, some
future doctor could easily kill him based on that erroneous medical record.
Other than retired people, who has the time to penetrate the HIPAA mess? It
can be lifesaving.

~~~
polymatter
I think it is easier to make private information public, than it is to make
public information private.

We can mitigate that sort of failure with things like medical bracelets
([http://www.mediband.co.uk/](http://www.mediband.co.uk/)). Paramedics check
for medical bracelets, necklaces and wallet information cards. One of the first
things any medical staff will ask about is allergies.

I recognize that that is hardly ideal and indeed some lives are undoubtedly
lost because of it. But I think going public with medical records is too far.

~~~
jcampbell1
Fully public is clearly too far. I just wish I could access my medical records
like I access my own email.

As far as I understand, HIPAA regulations make this illegal to even think
about. That is what I mean by "extreme".

~~~
dragonwriter
> I just wish I could access my medical records like I access my own email.

Why do you think you can't? They can't actually be sent to you over unsecured
email, but they certainly can be provided to you securely, and the access
methods can be very similar to what you would do with email. I don't think
anyone offers a data dump rather than a UI into a hosted product, but I don't
see any _legal_ reason under HIPAA for that.

