
Do you think the hospital is lying about "anonymized form"?



No, the issue is that NHS Digital and its predecessors are utterly stupid when it comes to "anonymous".

Firstly, they sold everyone's records to a number of insurance companies, something that as far as I can tell is illegal: https://www.telegraph.co.uk/news/health/news/10656893/Hospit...

Secondly, NHS Digital created a schema for sharing records with researchers. Supposedly it was anonymous, yet it included date of birth, sex and postcode (a postcode covers about 80 houses), plus the address of every interaction with the NHS.

The only thing missing was the name. But cross-referencing date of birth, gender and postcode with the electoral register gives you a name in 99% of cases.
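That cross-reference is nothing more than a three-column join. A toy sketch (all records and names invented, just to show the mechanics):

```python
# Toy illustration: re-identifying "anonymized" health records by joining
# date of birth, gender and postcode against a public register.
# Every record below is invented.

health_records = [
    {"dob": "1980-03-14", "sex": "F", "postcode": "SW1A 1AA", "diagnosis": "asthma"},
]

electoral_register = [
    {"name": "Jane Doe", "dob": "1980-03-14", "sex": "F", "postcode": "SW1A 1AA"},
    {"name": "John Roe", "dob": "1975-07-02", "sex": "M", "postcode": "SW1A 1AA"},
]

def reidentify(record, register):
    """Return every name in the register matching the three quasi-identifiers."""
    return [
        person["name"]
        for person in register
        if (person["dob"], person["sex"], person["postcode"])
        == (record["dob"], record["sex"], record["postcode"])
    ]

for rec in health_records:
    print(rec["diagnosis"], "->", reidentify(rec, electoral_register))
```

With only ~80 houses per postcode, a single (dob, sex, postcode) triple rarely matches more than one register entry, which is the whole problem.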

Also, knowing a few of the people who work _for_ NHS Digital, and their involvement in leaking things to the press for personal gain, I have no faith in their moral compass.


Yes and no.

As I'm fond of saying, there ain't no such thing as "anonymized", there's only "anonymized until combined with other data sets".

Plenty of non-obvious things can deanonymize you: an accurate enough timestamp, a rare enough medication or treatment you received, the combination of treatments you received. It's all fine until a chain of data sets forms that can identify you with high probability.

Unrelated: I too used to be all for "share my data with whoever needs it for medical research". These days, I worry that "medical research" doesn't mean actual research, but random startups getting deals with hospitals - startups that I don't trust not to play fast and loose with the data, and don't trust to share the results with the wider community. I think there was even an HN story about that some time ago.


Having worked in healthcare, I'd say the startups probably have much better data security. Medical is a horror show of bad data practice. I'd trust Uber with my data before I thought any major health org had any chance of doing it right.


For people in the US, it's worth looking at the list of HIPAA-defined "protected health information".[1] The 18 listed fields are the basic standard for both anonymization and de-identification.

Frankly, it's a terrible list made by people who don't understand statistics. 16 of the fields are simply unique identifiers in isolation. The only two sops offered to prevent deanonymization are dates, which are restricted to year, and location, which is restricted to a >20,000 person zipcode identifier.

A complete medical history reduced to year is probably still a unique identifier for many people. Crossed with a 20,000 person geographic restriction, the year of even a single uncommon medical event is unique for many people. And that's before we even include non-redacted information like demographics. Adding just race, gender, and approximate age can easily turn 20,000 people into a few hundred.
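The narrowing is easy to estimate. A back-of-the-envelope sketch, under a crude assumption that the attributes are independent and uniformly distributed (real-world correlations make many cells even smaller); the category counts are illustrative, not census figures:

```python
# Rough arithmetic: how quickly demographics shrink a 20,000-person ZIP pool.
# Category counts below are assumptions for illustration, not census data.

pool = 20_000
race_categories = 6     # assumed number of race categories
gender_categories = 2
age_buckets = 16        # ~5-year buckets covering ages 0-80

# Under a uniform-independence assumption, an average cell holds:
avg_cell = pool / (race_categories * gender_categories * age_buckets)
print(round(avg_cell))  # ~104 people sharing race + gender + age bucket
```

A hundred-odd people per demographic cell, before a single medical fact is added; one dated uncommon event per cell is frequently enough to make the record unique.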

Who can deanonymize that data? Well, Visa can see when and where you're diagnosed with things by spotting a radiologist's bill or a monthly pharmacy payment. Target can use your location and OTC medical supply purchases. Plenty of ad networks could pair an IP location and search term to a ZIP and diagnosis. And that's what I've got with 2 minutes thought and no training in doing this.

As a final, ugly aside: HIPAA's protections don't follow the data once it leaves a covered entity. Once it's anonymized, shared, and deanonymized, the new holder is likely free to shop it around with your name attached.

[1] https://en.wikipedia.org/wiki/Protected_health_information


No, but there's really no such thing as anonymous anymore, given the corpus of data companies such as Facebook or Google have. Once they have just a few "anonymous" clues, such as zip code, year of birth, race, etc., they can probably connect any datum to an individual with pretty high accuracy.


> "anonymous" clues, such as zip code, year of birth, race

Those are examples of PII, not "anonymous".


It's not so clear-cut. https://en.wikipedia.org/wiki/Personally_identifiable_inform...

And that's the point: anything can add up to PII if you have enough clues. PII is a statistical concept, not a binary one.
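One way to make the statistical view concrete is to count bits: a clue that splits the population into k equally likely groups contributes log2(k) bits, and roughly 33 bits suffice to single out one person among 8 billion. A sketch, with order-of-magnitude group sizes that are assumptions, not measurements:

```python
import math

# ~33 bits of information uniquely identify one person worldwide.
WORLD_POPULATION = 8_000_000_000
bits_to_identify = math.log2(WORLD_POPULATION)

# Illustrative clue "group counts" (assumed, order-of-magnitude only):
clues = {
    "zip code (~20,000 people out of ~330M)": 330e6 / 20_000,
    "year of birth (~80 plausible years)": 80,
    "sex (2 categories)": 2,
}

total = 0.0
for name, groups in clues.items():
    bits = math.log2(groups)
    total += bits
    print(f"{name}: {bits:.1f} bits")

print(f"total: {total:.1f} of ~{bits_to_identify:.0f} bits needed")
```

Three innocuous-looking fields already cover most of the distance; a couple of dated medical events easily supply the rest.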


In the EU, it's a legal concept:

it's anything that can be reasonably used to identify a person.


So by itself, medical data by birth year is not enough to identify a person. It's just statistics. Same as by race, or any other single characteristic.

But when combined, and matched against the kind of individual detailed data that Google or Facebook have about most of the population, it's a lot less anonymous.


In the old Data Protection Act, it was any data when combined with public records.

In the new GDPR, as far as I'm aware, it's the _act_ of trying to combine/process (not sure of the exact wording) data to de-anonymise it.


> it's anything that can be reasonably used to identify a person.

This bar moves over time, especially as people share more (not health-related) personal information with companies.


Under HIPAA at least, zip code is the only "protected health information" in that list - and even then three digits of it can be shared if they apply to a large enough population.

HIPAA permits two de-identification methods. One, expert analysis, will presumably catch things like demographic identifiers. The other, safe harbor, offers a concrete list of data to remove - and that list leaves an absolutely massive amount of PII unrestricted.
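The safe-harbor ZIP rule is mechanical enough to sketch. This assumes a lookup table of three-digit-ZIP populations (the figures below are invented placeholders) and the 20,000-person threshold from the rule:

```python
# Safe-harbor sketch: keep only the first three ZIP digits, and only when
# the combined population of that ZIP3 area exceeds 20,000; otherwise the
# rule requires "000". Population figures here are invented placeholders.

ZIP3_POPULATION = {"100": 1_500_000, "059": 12_000}  # assumed lookup table

def safe_harbor_zip(zipcode: str) -> str:
    zip3 = zipcode[:3]
    if ZIP3_POPULATION.get(zip3, 0) > 20_000:
        return zip3
    return "000"  # area too small to share even three digits

print(safe_harbor_zip("10001"))  # "100"
print(safe_harbor_zip("05901"))  # "000"
```

Note what the rule doesn't touch: everything outside the 18 listed fields passes through unmodified.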


If it's a US hospital, I would assume, absent specific reason to believe otherwise, that it follows the de-identification rules in the HIPAA regs if it says that (because it's a crime otherwise). But I'm not all that confident that data de-identified under those rules is necessarily effectively anonymized, especially since certification by a specified qualified expert substitutes for the concrete requirements.


Yes. I've seen far too many people who don't really have a clue how to actually anonymize data, and adversarial systems for reporting issues that encourage ignoring the problem over fixing it.


Do you have an example where this happened with a hospital in the past 10 years? The only example anyone's given in this thread so far is from an insurance commission in the mid 1990s.


Last year, with anonymised data made freely available to anyone for health research:

https://www.abc.net.au/news/science/2017-12-18/anonymous-med...


Considering that personal injury law firms geotarget their AdWords to people who have been in a hospital emergency room, I'd say it's not that hard to stitch the pieces together to remove that anonymity.


Not necessarily.

The problem is that it's very hard to really anonymize data[1] and the hospital may not necessarily know that.

However, with Google, Facebook and (yes, even) Apple getting into the game my trust is ever more shattered. Let alone any shlocky "health-app" maker which sends your most personal data straight into "the cloud".

[1] https://arstechnica.com/tech-policy/2009/09/your-secrets-liv...


That's a terrible example to illustrate that it's "very hard" to really anonymize data. Date of birth is an obvious deanonymizer, and zip code should never be included in publicly released data; it's too granular when combined with race and approximate age.

Definitely thought needs to be put in, but I don't think "very hard" is right.


> It's too granular when combined with race and approximate age.

That goes for everything, and that's the problem with the PII concept.

Passwords are (if handled well) unique secret keys. Either you know it or you don't. Close doesn't count.

Personal identity is an amalgamation of dozens of personal facts, and each fact statistically deanonymizes a person.


I think the hospital fails to understand that anybody can be deanonymised by cross referencing enough 'anonymised' data sets.


There’s nothing more “me” than my brain. I expect sufficiently advanced science to be able to narrow down my identity from a detailed enough scan of my brain. Whether this can be done today is a question of technology, and I’m not abreast of the latest findings in this field.


If the data shared included a detailed brain scan, then yes I'd say the hospital was lying about "anonymized form", same as if it included a retinal scan or fingerprint.


Or genomic information, or a unique combination of drug interactions which could be combined into a sort of health fingerprint, or phenotype information, race or age, which are often necessary to provide medical context for diagnoses, or timestamps on test results that match up with events in your calendar or destinations in your navigation or security camera footage, and I'm sure the list goes on.

The word "anonymity" is a little ambigious. It can mean "unable to be identified" or it can mean "namelessness". There's not much information about anyone that is truly unable to be identified, especially when combined with additional pieces of data readily available from external sources.

It all depends on who they're sharing the anonymous data with, and what external data those partners have access to.


"Anonymized" is a very loaded word; technically my IP address is anonymous, but companies still figure out who I am from it.


He must believe both that the data won't be anonymized and that the data won't be given exclusively to medical researchers.




