
Google AI has access to huge haul of NHS patient data - AlexandrB
https://www.newscientist.com/article/2086454-revealed-google-ai-has-access-to-huge-haul-of-nhs-patient-data/
======
aub3bhat
Why are people astonished?

For more than a decade researchers have been able to get access to large
deidentified datasets. As a PhD student I have access to data on 150 million
visits by 45 million patients. In some ways the data and access I have are
superior to that of Google & NHS (since the UK is a tiny country in comparison
to the USA), and I am just a PhD student. Though I have been working on it for
the last 5 years.

Recently CMS started a new Qualified Entity program which provides access to
Medicare data.

You can read more about my research and see the demo of the system in my past
submissions and at

[http://www.computationalhealthcare.com](http://www.computationalhealthcare.com)

Also, for the actual government program:
[http://www.ahrq.gov/research/data/index.html](http://www.ahrq.gov/research/data/index.html)

When it comes to medical information, it's a much more complex problem
legally. This paper gives a good overview of the issue of "patient ownership
of data":
[http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1857986](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1857986)

I am by no means minimizing the concerns that people have. But I think the
article paints Google in a negative light while ignoring current standard
practices. I wish the discussion were more rooted in the facts about how such
data sharing systems currently work.

~~~
fweespee_ch
> Why are people astonished?

Because politicians keep making promises about privacy that they don't keep
when it comes to "confidential" information.

Similarly, extracting most of the medical value from such information [e.g.
the frequency of X within a given population fitting certain characteristics]
would likely deanonymize people.
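To make the re-identification risk concrete, here is a minimal sketch (the records, field names, and values are invented, not any real NHS schema): once a "deidentified" dataset keeps quasi-identifiers like postcode district, birth year, and sex, any combination that matches exactly one record pins that record to one person.

```python
# Illustrative only: why "frequency of X in a narrow group" queries leak identity.
from collections import Counter

deidentified = [
    {"postcode_district": "SW1A", "birth_year": 1971, "sex": "F", "condition": "diabetes"},
    {"postcode_district": "SW1A", "birth_year": 1971, "sex": "M", "condition": "asthma"},
    {"postcode_district": "E2",   "birth_year": 1985, "sex": "F", "condition": "asthma"},
    {"postcode_district": "E2",   "birth_year": 1985, "sex": "F", "condition": "diabetes"},
]

# Count how many records share each quasi-identifier combination.
quasi = [(r["postcode_district"], r["birth_year"], r["sex"]) for r in deidentified]
sizes = Counter(quasi)

# Combinations matching exactly one record are unique: anyone who knows a
# neighbour's district, birth year, and sex can read off their condition.
unique = [q for q, n in sizes.items() if n == 1]
print(unique)  # the two SW1A/1971 rows are each unique
```

With real data the quasi-identifiers are richer (admission dates, rare diagnoses), which makes unique combinations far more common, not less.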

~~~
DanBC
No politician was involved in this decision.

~~~
fweespee_ch
[http://www.bbc.com/news/uk-16021240](http://www.bbc.com/news/uk-16021240)

> "All necessary safeguards would be in place to ensure protection of
> patients' details - the data will be anonymised and the process will be
> carefully and robustly regulated.

> "Proper regulation and essential safeguards need to be in place when it
> comes to patients data," he said. "It cannot be done in a way where
> essential rules are threatened."

These data sharing schemes start off with public promises of anonymization,
robust regulation, safeguards, and privacy.

[https://www.england.nhs.uk/2014/01/geraint-lewis/](https://www.england.nhs.uk/2014/01/geraint-lewis/)

> Amber data are where we remove each patient’s identifiers (their date of
> birth, postcode, and so on) and replace them with a meaningless pseudonym
> that bears no relationship to their “real world” identity. Amber data are
> essential for tracking how individuals interact with the different parts of
> the NHS and social care over time. For example, using amber data we can see
> how the NHS cares for cohorts of patients who are admitted repeatedly to
> hospital but who seldom visit their GP. In theory, a determined analyst
> could attempt to re-identify individuals within amber data by linking them
> to other data sets. For this reason, we never publish amber data. Instead,
> amber data are only made available under a legal contract to approved
> analysts for approved purposes. The contract stipulates how the data must be
> stored and protected, and how the data must be destroyed afterwards. Any
> attempt to re-identify an individual is strictly prohibited and there is a
> range of criminal and civil penalties for any infringements.

The problem with pseudonymous data is that the NHS basically admits it can be
used to identify people given sufficient effort.
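A minimal sketch of the pseudonymisation the quote describes, assuming a keyed-hash scheme (the key name and record fields are hypothetical; this is not the NHS's actual implementation): identifiers are replaced by a deterministic pseudonym, so the same patient links across records over time, which is exactly the property that also makes linkage to other datasets possible in principle.

```python
import hashlib
import hmac

# Held by the data controller, not the analyst receiving "amber" data.
SECRET_KEY = b"per-contract-secret"

def pseudonymise(nhs_number: str) -> str:
    """Replace an identifier with a keyed, meaningless-looking pseudonym."""
    return hmac.new(SECRET_KEY, nhs_number.encode(), hashlib.sha256).hexdigest()[:16]

record = {"nhs_number": "9434765919", "admission": "2016-02-01", "ward": "renal"}
amber = {**record, "nhs_number": pseudonymise(record["nhs_number"])}

# The same input always yields the same pseudonym, enabling longitudinal
# tracking of one patient across hospital and GP records.
assert pseudonymise("9434765919") == amber["nhs_number"]
```

Note the trade-off: a random one-time token would block linkage attacks but also destroy the longitudinal tracking the NHS says amber data exists for.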

---

That is why people are "astonished" by these decisions. The politician
provides the initial promises that imply anonymity, the implementation doesn't
provide true anonymity but does provide criminal penalties for pulling off the
mask, and then the data is handed to enough third parties that, if it is
leaked, it's likely impossible to know by whom unless the data was tampered
with to provide a per-contract identifier.

I understand this _specific_ decision did not involve a politician, but the
conversation was about why people are surprised. How many people do you think
really know that the anonymity originally promised became a permeable
pseudonym?

~~~
DanBC
You've linked to a page about care.data

This Google thing has fuck all to do with care.data - they're totally
separate.

I understand "the" NHS is complex, but it's pretty frustrating talking to
someone who has very strong opinions and who clearly doesn't know what they're
talking about.

It's really weird to link to a document that talks about the severe legal
penalties for anyone who attempts to de-anonymise the data, and then use that
to say "look how flimsy these agreements are!", especially when the document
you link to has a BOLD lead saying that things are even stricter in the newer
document.

> The politician provides the initial promises

Again, not a politician. Chief data officer at NHS England, and a real doctor.
[http://www.nuffieldtrust.org.uk/about/our-people/dr-geraint-lewis](http://www.nuffieldtrust.org.uk/about/our-people/dr-geraint-lewis)

> The problem with pseudonymous data is the NHS basically admits it can be
> used to identify people given sufficient effort.

It's trivially easy for Google to do this already without the NHS data, and
they don't face prison time for doing it. See all the pregnant teens outed by
supermarket loyalty cards for other examples.

~~~
fweespee_ch
> It's trivially easy for Google to do this already without the NHS data, and
> they don't face prison time for doing it. See all the pregnant teens outed
> by supermarket loyalty cards for other examples.

And how many members of the general population do you think are aware of this?

> I understand "the" NHS is complex, but it's pretty frustrating talking to
> someone who has very strong opinions and who clearly doesn't know what
> they're talking about.

It probably has something to do with the fact you are completely missing the
point I'm discussing rather than the strength of my opinions.

------
cromwellian
I think this situation nicely juxtaposes positive externality with private
rights.

As an individual, you may wish to hoard all of your personal medical
information. Doing so may provide marginal benefit or protection against some
theoretical harms. However, much of medical science relies on large population
studies. Is knee surgery worth it? Are breast exams? Do COX-2 inhibitors
increase heart attacks? Enormous amounts of good could be done by having large
datasets of entire patient histories available for analysis by all, assuming
they could not be de-anonymized (if not, then the datasets have to be analyzed
under contracts).

Whenever I visit the doctor and am given a form asking if my data can be sent
to some group to participate in a study, I always answer yes. Never once have
I had any negative repercussions from doing so, and it is my hope that the
data was used to publish scientific papers that added to the net knowledge of
humanity. However, I have to wonder if asking me for permission actually
interferes with random sampling. Are people who give permission statistically
more likely than people who refuse to have other behaviors that may influence
the results?

~~~
AlexandrB
> Whenever I visit the doctor and I am given a form asking if my data can be
> sent to some group to participate in a study, I always answer yes. Never
> once have I had any negative repercussions from doing so, and it is my hope
> that the data was used to publish scientific papers that added to net
> knowledge of humanity.

I've grown cynical. My first thought on seeing this kind of form is that a
third party will be making money from the information I provide somehow -
whether it's through patented drugs or treatments or through pay journals like
those owned by Elsevier. Neither I, nor anyone I know, will get to benefit
from the information I provide without spending money (often exorbitant
amounts of it). This is the flip side of privatizing everything. Why should I
lift a finger to help for-profit entities when they will not do the same to
help me?

~~~
savanaly
It's going to sound like I'm telling you how to live your life and how to feel
and I really don't mean it that way but I don't know any other way to express
my viewpoint so I'll just power through it:

The idea that someone, somewhere is making a profit on something you provided
should make you happy, not sad. To make a profit, they had to be paid, and to
be paid they normally have to have provided something that someone wanted
enough to part with cash. That cash changed hands is not the main point of the
transaction (it's a net zero for society: someone gained cash and someone lost
it); the main point is rather that something of value was created.

You threw out the example of pay journals like Elsevier. It is true that for
cases like that, there may be a market inefficiency that means not much value
is created. But I think those are edge cases and relatively rare. Even if our
worst fears about Elsevier and other rent seeking journals are true, it would
still be the case that you're helping rather than hurting by providing
anonymized personal data.

~~~
AlexandrB
> The idea that someone, somewhere is making a profit on something you
> provided [for free] should make you happy, not sad.

This seems like a double standard. Why should I be happy when corporations
will regularly go to court to make sure that nothing they create returns to
the public domain within my lifetime? The norm seems to be capturing this kind
of value wherever (and for as long as) possible, not distributing it freely
for the good of mankind.

I'm not making a normative argument that this is a healthy way to live or for
society to operate. In a functional community, resources, ideas, and
capabilities may be shared freely for the benefit of all, but it can't always
go one way. That's exploitation, not community.

~~~
Bluestrike2
Ok, so what? Even if a company fights to maintain control over their product,
there's _still_ a societal benefit even if that benefit has been artificially
limited due to price. And while they can delay things to an extent,
eventually, they'll lose control over their product as IP protections run out.
There are still games that can be played at that point when everything is
lined up just right, but those are games that can be addressed through
legislation and the courts.

I'd rather hideously expensive treatments exist, even if I can't afford them,
than live in a world where they're never even an option. Price can and does
change over time, and those treatments become more accessible to more people.
But if the treatments are never developed in the first place, then that's an
even greater tragedy because then there's literally nothing that can be done
to make them accessible.

------
nezumi
Terrible reporting. Improving health outcomes by AI analysis of patient data
is a much bigger prize - morally and commercially - than anything which could
be achieved through ad targeting. Google is far too smart to squander such an
opportunity by abusing patients' trust.

~~~
buro9
The big problem is the precedent it sets for data access.

What are the criteria for who gets access? What are the constraints of that
access?

This story covers the latter being blown apart: the constraints were poorly
defined and implemented, and thus even if the criteria are well defined,
access to far more data was made possible.

I'm sure that few patients desire an end to research, or would argue that such
access isn't a good thing... but what of the insurance industry? Should they
have access? Would the NHS be able to define and enforce those constraints?

Perhaps that's an obvious no.

What then of an insurer partnering with a medical research company, from the
viewpoint of "This costs insurance a lot of money, we'd like to fund a way to
reduce that financial exposure".

The grey areas emerge immediately.

If we cannot control access to patient data - data from which it would be
trivial either to strip the anonymity or to aggregate enough to still produce
net negatives (i.e. correlating by postcode would reveal enough with little
extra work) - and if we cannot define and enforce the constraints of
access... then we really shouldn't be sharing what is highly sensitive and
personal information, originally disclosed only between a patient and a
doctor under the premise that what is shared is covered by the explicit and
implicit confidentiality of that conversation.

It's always worth remembering:

Data was acquired under doctor patient confidentiality.

If we considered that data to have a licence, it is the most restrictive
licence possible. One could consider what has happened here as a re-licensing
without permission. Such an act could have a chilling effect on the
relationship between the doctor and patient.

~~~
chris_va
You are making some implicit assumptions that the data access isn't highly
controlled.

I have seen a few of these sorts of deals killed because of data access
concerns, and/or computation requirements ("you can have access to anonymized
data, but you have to run your code in a sandbox on our health servers").

And, this is why we have legislation.

~~~
buro9
Less implicit, from the originally linked article:

> The scale of the sharing program was apparently misrepresented to the
> public, originally announced as an app to help hospitals monitor patients
> with kidney disease with real-time alerts and analytics. But since those
> patients don't have their own separate dataset, Google has argued it needs
> access to all patient data from the participating hospitals.

No assumption there, they didn't have a separate dataset and so granted access
to all patient data.

~~~
chris_va
"so granted access to all patient data"

Yes, but under what conditions? Many privacy laws apply here, and treating
Google as some monolithic entity where everyone working there can now read
anyone's personal health history is inaccurate.

~~~
fweespee_ch
It's pseudonymous data that the NHS has previously admitted can be
deanonymized given sufficient effort, but such deanonymization carries
criminal and civil penalties.

------
ggggtez
As an expert in machine learning, I don't see how someone would expect this to
work. To actually predict a disease, you need both true positives and true
negatives. If you only had access to the true positive data, it would be a lot
harder to predict accurately.

~~~
chris_va
Seems pretty simple, really.

1000 patients come in with symptoms that look like cancer at year 0.

100 actually get diagnosed with cancer at some point between year 0 and year
5.

Presumably, the remaining 900 didn't have cancer at year 0.
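A sketch of that labelling scheme (the field names and helper are invented for illustration, not anything from the actual deal): a patient presenting with suspicious symptoms at year 0 becomes a positive example if a diagnosis appears within the 5-year follow-up window, and a negative otherwise.

```python
from datetime import date

def label(presented: date, diagnoses: list, window_years: int = 5) -> int:
    """1 if any diagnosis falls within the follow-up window, else 0."""
    cutoff = date(presented.year + window_years, presented.month, presented.day)
    return int(any(presented <= d <= cutoff for d in diagnoses))

# Three hypothetical patients, all presenting with symptoms in 2010.
cohort = {
    "patient_a": (date(2010, 1, 15), [date(2012, 6, 1)]),  # diagnosed in window
    "patient_b": (date(2010, 1, 15), []),                  # never diagnosed
    "patient_c": (date(2010, 1, 15), [date(2016, 3, 1)]),  # diagnosed too late
}
labels = {p: label(seen, dx) for p, (seen, dx) in cohort.items()}
print(labels)  # {'patient_a': 1, 'patient_b': 0, 'patient_c': 0}
```

The catch, as the reply below this notes, is the assumption baked into the 0 labels: an absent diagnosis is treated as a true negative.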

~~~
throwanem
"Seems". You can't know what fraction of your remaining 900 had cancer but
weren't diagnosed with it for any of a wide variety of reasons, from death
through misadventure, through false negative, to simple loss to followup.
Clinical studies are designed to exclude such confounding outcomes. It's very
difficult to see how any study of this data could be designed to do likewise.

~~~
chris_va
True, but I think you can mitigate these issues.

Especially for the NHS dataset, since you will either see the patient in there
or in a death index (unlike, say, the US, where they may have just gone to
another hospital).

Also, the scope here is more like a longitudinal vaccine study than a clinical
trial. 50M people will provide a lot of robustness that you wouldn't see in a
1000 patient trial.

~~~
throwanem
> you will either see the patient in there or in a death index

Or you won't see any new information for the patient, because the patient is
lost to followup. Or - worse - you'll see new information, but it'll be
invisibly erroneous, because random GPs don't work to the standard that
physicians administering examinations in clinical studies do.

> 50M people will provide a lot of robustness that you wouldn't see in a 1000
> patient trial

Not if the 1000-patient trial is well designed, and the data of those 50M
people is totally uncontrolled and unverified. This isn't warfare - Stalin's
dictum has no place here. You can't overcome the flaws of a dirty dataset by
adding more dirty data to it, especially when you literally cannot know either
the magnitude or the nature of the inaccuracy, or even tell what's accurate
from what's not.

~~~
petra
>> Or - worse - you'll see new information, but it'll be invisibly erroneous,
because random GPs don't work to the standard that physicians administering
examinations in clinical studies do.

So you will take this into account and emphasize the more reliable types of
tests, like blood tests. Or you'll find ways of learning which doctors do the
tests more accurately (or train doctors to do so) and which people are more
consistent/reliable in their relationship with their docs. Or maybe you'll get
a few hypotheses which are relatively likely, and that would incentivize the
researchers/Google to do small clinical trials on them.

It's worth a try at least.

~~~
throwanem
Even if it were - which is, for the aforementioned reasons, doubtful - it
would not be worth turning over the medical records of 50M people to an
unregulated private company without so much as a by-your-leave.

------
DanBC
People in the UK who want to complain about this could try the ICO, which
strongly regulates health information.

They could also look at any advertising material and report that to the ASA
(if it meets the criteria for being regulated).

That trust has a Caldicott Guardian who will be legally responsible for
keeping patient data safe. I would have liked a quote from them, although I
guess that quote would be something like "No patient identifiable data has
been shared with DeepMind". That would be scary: I have no doubt that DeepMind
would do a very good job of de-anonymising data, but I know that Google would
have to be monumentally stupid to try it.

There are other data projects happening in the NHS - care.data (that dot isn't
a typo!) is one that got a lot of attention. That allowed (after some fuss)
people to opt-out. (It didn't allow people to specifically opt in to show
their support, which is something I would have done.)

I'm a bit wary of Vice's reporting here. They don't seem to know what they're
talking about (there's nothing about controls over patient data in the NHS,
for example); they don't seem to have approached the Trust involved; they
haven't done a good job of explaining what's going on.

There are some really bad failures of data protection in the NHS (especially
around mass email! People using CC instead of BCC to a group of people using
an HIV clinic, for example) and there are some historic abuses (selling data
to insurance companies) that led to changes in the law.

So I don't know if this is terrible and deserving of anger, or okay and poorly
reported, or a good thing with misleading reporting.

------
VeejayRampay
The National Health Service (NHS) is the publicly funded healthcare system for
England (source: Wikipedia)

Putting that here because I was confused about what NHS was in the first place
(I'm French).

~~~
arethuza
There isn't really a single NHS - there are four National Health Services for
the four parts of the UK (England, Scotland, Wales and Northern Ireland):

[https://en.wikipedia.org/wiki/National_Health_Service](https://en.wikipedia.org/wiki/National_Health_Service)

------
grillvogel
best part of this is when we reach the skynet robot uprising they'll know
exactly how to kill us the fastest

------
superkamiguru
Is there no equivalent of HIPAA in the UK?

~~~
aub3bhat
HIPAA stands for the Health Insurance Portability and Accountability Act. The
"Privacy" component of HIPAA is very limited in scope. Even under HIPAA, the
data being shared would be consistent with the definition of a "Limited"
dataset.[0] At least in the United States there is significant precedent for
sharing such data, including even for marketing purposes, as decided by the
Supreme Court in Sorrell v. IMS Health.

[0][http://www.hopkinsmedicine.org/institutional_review_board/hi...](http://www.hopkinsmedicine.org/institutional_review_board/hipaa_research/limited_data_set.html)

------
ocdtrekkie
This is probably one of the reasons some may be opposed to nationalized
healthcare... The government making a widespread decision to send your medical
data somewhere without your permission.

~~~
halhodson
but we don't really have the consent-at-scale technology for anything else, do
we? tbh it's just this specific NHS Trust that made the call in this instance.

Personally, I see potential in ResearchKit to solve the consent problem re
medical data

~~~
lbhnact
ResearchKit may be part of the solution for some things, although the auth and
data management will be barriers for commercial providers for the near future.

I have a small team that is working with leaders in the space[1] to help many
of the major EMR vendors support an open-standards based approach to medical
record sharing.

We'll be seeking public feedback when some of the preliminary work is ready,
but are very excited to get input from the community as we make progress.

[1] [https://dbmi.hms.harvard.edu/news/more-power-patients](https://dbmi.hms.harvard.edu/news/more-power-patients)

