Many FDA-approved AI medical devices are not trained on real patient data (medicalxpress.com)
82 points by clumsysmurf 5 months ago | 61 comments



"226 of 521 FDA-approved medical devices, or approximately 43%, lacked published clinical validation data."

The lack of "published" clinical validation studies does not imply that the AI developer performed no clinical validation, nor that the FDA hasn't seen it. So it is not clear whether the problem is a lack of clinical validation or a lack of reporting. For some reason the title exaggerates even further (half of FDA-approved AI not "trained" on real patient data).


> AI developer performed no clinical validation nor that the FDA hasn't seen it

If the developer went through the trouble and expense of performing clinical validation, you can be sure that they would publish the results __unless__ the results reflected negatively on them.


Given that the design and endpoints of clinical validation studies are priceless information for developers of similar devices (i.e., competitors) applying for FDA clearance via the 510(k) pathway, it would not surprise me at all if this information was purposely kept secret no matter how flattering it is, especially for relatively new technology like AI.


I've been part of bringing many medical devices to market and this isn't true.

Sometimes they do publish the validation data, especially in pharma etc. Most of the time they don't; why would you? It would only help your competitors.


510(k) summaries are always a negotiation between FDA (which wants as much data as possible in them) and manufacturers (which want as little data as possible, as it is priceless competitive intelligence)


In the spirit of the authors' point of sharing one's supporting evidence, could you please share the data and code for scraping the list of devices from the FDA website? :)
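For anyone who wants to poke at this themselves, a minimal sketch against the openFDA device/510k endpoint would look something like the below. This is not the authors' actual code, and the field names are from memory, so check the openFDA docs before relying on it:

    import requests

    BASE = "https://api.fda.gov/device/510k.json"

    def fetch_clearances(query, limit=100, skip=0):
        # One page of 510(k) records matching an openFDA search query.
        resp = requests.get(BASE, params={"search": query, "limit": limit, "skip": skip}, timeout=30)
        resp.raise_for_status()
        return resp.json().get("results", [])

    # Example: radiology clearances since 2020 -- a rough proxy only; the FDA's
    # official "AI/ML-enabled device" list is a separate webpage, not an API field.
    for r in fetch_clearances('advisory_committee_description:"Radiology" AND decision_date:[2020-01-01 TO 2024-12-31]')[:5]:
        print(r.get("k_number"), r.get("device_name"), r.get("decision_date"))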


Not a surprise. I worked on an 'AI' health insurance product for several years and even getting access to data was a struggle.


I'm not sure if this is relevant, but Ben Goldacre in the UK is working on how to get access to NHS patient data in a privacy-preserving manner [0]. My understanding is that you essentially submit your analysis and it is run against live data, but you only receive summary results. I'm not sure if this could be adapted to training.

[0] https://assets.publishing.service.gov.uk/media/624ea0ade90e0...
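If it helps picture it, the pattern is roughly "send the analysis to the data": researcher code runs inside the secure environment and only suppressed aggregates come back out. A toy sketch of the idea (all names hypothetical, nothing like the real NHS API):

    from collections import Counter

    def run_remote_analysis(records, analysis_fn, min_cell_size=10):
        # Run a researcher-supplied aggregation server-side and release only
        # summary counts, suppressing small cells that could identify someone.
        summary = analysis_fn(records)
        return {k: v for k, v in summary.items() if v >= min_cell_size}

    # The researcher submits something like this and never sees row-level data.
    def my_analysis(records):
        return Counter(r["diagnosis_code"] for r in records)

    rows = [{"diagnosis_code": "E11"}] * 42 + [{"diagnosis_code": "G30"}] * 3
    print(run_remote_analysis(rows, my_analysis))  # {'E11': 42} -- the G30 cell is suppressed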


The problem with this approach is that it is very hard to do data science with messy clinical data when you have no mechanism for investigating the data yourself.


I really wish you had used "impossible" instead of "a struggle".


Yep, I'm in the rare disease space. "Impossible" is pretty appropriate.

It's tricky. On the one hand, it's obviously not appropriate to be flippant about patient privacy. On the other, it's clear that advancements in human health are being hindered by our current approach to (dis)allowing researchers access to data.


For me it's a situation of "once bitten, twice shy". What are the odds the medical data intended for research will be handled correctly and not used outside of its intended purpose?


What are the potential downsides to misuse of health data? Genuinely asking - I'm not sure what someone malicious would do with my health records, especially if it's anonymized.


Insurance companies refusing to pay because of $reason based on deanonymized data. Ad companies or Big Pharma bombarding you with pitches for new pills because they want to sell more pills. Blackmail because you have an embarrassing disease.

There's a lot of money being spent on deanonymizing data, and I would never count on it ever being able to remain anonymous with that much incentive.


There are several examples of "anonymized" data turning out not to be so anonymous after all and being traced back to the person. As for what someone malicious could do with health records: you have a contingent of people in some states hunting down women for having abortions, so that might be something you don't want getting out there. Or you might live in a very religious area and not want people finding out you're getting AIDS treatment.


Between your comment and others in the thread we've so far got:

- Insurers won't insure you

- Anti-abortion activists will hunt you down

- Religious fanatics will shame you

- Your credit rating will take a hit

Not to downplay those (very real) risks in the slightest, but they are all US-centric problems.

Not sure what overarching point to make of that, but it's a striking pattern.


I understand the theoretical concerns in these cases, but IMO they do not weigh heavily against the (conservatively) hundreds of thousands of annual deaths due to hindered medical research.

It's hard to overstate how difficult it is to do even basic research across institutional health datasets, even if you're a giant organization with a compliance team. It's soul-draining, and frankly it's the reason a lot of smart people jump ship and work in finance or crypto or whatever, where you can accomplish something even if it's goofy.


You're not addressing the root concern which is that healthcare is notoriously insecure. Approaching this as "who cares if things get leaked" instead of improving security of records is why getting data is impossible.


That's your root concern, not mine. My root concern is that people are dying for bad reasons.


Which of those is the "root" concern is beside the point. Patient data security is why you can't get the data you say you need. That is just a fact, and I would think energy is better spent improving how patient data is handled if you want easier access to that data for research purposes.


I worked at Datavant for 3 years building a network for deidentified data exchange.


Nitpicking but he gave you practical examples of stuff that already happened, not theoretical ones.


Those are both theoretical examples of what people might want do with re-identified medical data. They are not demonstrated harms of things that happened in real life.


Develop tools for health insurance companies to abuse patients. Instead of denying coverage to patients based on real life symptoms, they can deny coverage due to model outputs that are “based” on real life data.

Since these models are black boxes, it’s easy to hide biases within them


Or worse: people with conditions similar to yours have been shown to develop something else, so we're going to charge you now for what we think you might develop later.

It's the same negative attached to pre-crime in policing: because people who wear the same clothes, drive the same car, listen to the same music, and match in other ways have committed crimes, we think you will too. Someday.


You just described the whole concept of insurance.


In the USA, health plans aren't allowed to deny coverage to patients based on genetics or pre-existing conditions. They aren't stupid enough to try to break those laws. Employees can't keep a secret. And most of the claims costs are directly passed on to employers (group buyers) anyway, so the major health insurance companies have little direct incentive to deny coverage; with minimum limits on the medical loss ratio it's rather the opposite.

https://www.hhs.gov/hipaa/for-professionals/special-topics/g...

https://www.hhs.gov/healthcare/about-the-aca/pre-existing-co...


Companies can just not hire you for "culture fit" or something else based on leaked data about your health problems in order to keep their premium payouts low or just to avoid hiring certain types of people (fill in your blank here).


> especially if it's anonymized

And there's the problem. In theory, this is possible. But in reality, there is no such thing.

You also assume that your data will be correct. If data integrity was so easy and common, people wouldn't be encouraged to repeatedly check their credit reports for mistakes.

One bit flip and you go from "perfectly healthy" to "about to die" and suddenly you can't get life insurance, your credit score tanks and you can't get a job.


The downsides are there, of course, and you have already been provided theoretical risks by other users. Unfortunately the discussion only ever centers around the downsides, with fear mongering aplenty, rather than treating the situation the same as any other situation in life: a risk-benefit trade off.


One word: euthanasia.


"Your OneMedical account by Meta-Amazon LLC has been deactivated due to suspicious activity based on analysis of your genome and online browsing habits. Please proceed to the nearest fresh location for mandatory euthanasia."


Nonzero, for sure.


If a researcher can get the data, then so could someone else with less altruistic motives. So the good actor is slowed because of the bad actor. Unfortunately, there's very little way to prove the good is good and not crossing their fingers behind their back.


Perhaps opening up access to data 5 years after the patient is dead would fix this.


I interviewed with a company that built an Electronic Medical Records system optimized for cancer treatment, licensed it for cheap, then took the data and paid medical professionals to spend time normalizing the free-text notes from the doctors (this was ~a decade ago, when LLMs were not really a thing, so they had trained professionals do it instead). That was all a play to get the level of data they wanted for training AI to make new discoveries.

They got acquired by a major pharmaceutical company, but I don't know of any major discoveries made from their data. Because even with real data this is a hard problem.


What's missing here is the risk of the medical device not performing as expected on real patients. These risks are usually mitigated reasonably by medical device manufacturers and designers such that it doesn't matter that "AI medical devices" are not trained on real patient data.

The FDA keeps a database of all adverse events reported by manufacturers or healthcare systems [1], so we can check in a few years if these AI medical devices are causing an uptick of complaints.

[1] https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/s...
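The MAUDE data is also exposed through openFDA, so that check will be easy to script in a few years. A hedged sketch, with field names as I recall them (verify against the current API reference):

    import requests

    def adverse_event_counts(brand_name):
        # Daily counts of MAUDE reports mentioning a device brand name.
        resp = requests.get(
            "https://api.fda.gov/device/event.json",
            params={"search": f'device.brand_name:"{brand_name}"',
                    "count": "date_received"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("results", [])  # [{"time": "YYYYMMDD", "count": n}, ...]

    # e.g. adverse_event_counts("GI Genius") -- a device mentioned elsewhere in this thread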


Maybe we could have an option to be a data donor, as opposed to only having the option to donate our organs for the purpose of science.


Something like this or something more akin to "when I die, please take all my healthcare records, data from smart watch, etc and make that available to research"?

https://www.tidepool.org/bigdata


I fully expect the data to be de-anonymized and then used against people that have familial links to those who have donated their data.

"Oh, your great great-great-grandfather had <disease>, so now you're classified as a high-risk individual and you have to pay a higher monthly fee for insurance"

Only they won't tell you why you have to pay more, just that you do.


That is essentially why HIPAA exists. And the negative and limiting effects of HIPAA on clinical trials are well acknowledged and considered an acceptable cost of HIPAA.


HIPAA was written in 1996. The human genome project was completed in 2003, and WGS wasn't in the clinic for another decade. Nothing about HIPAA is considered or intentional in the context of current medical practice or research. It's just an old law that carries forward mindlessly like all laws do.


1. It's illegal, and insurance company employees aren't suicidal.

2. If health insurance companies wanted to break the law, they would simply violate the existing prohibition on discriminating on preexisting conditions, which gives them vastly more actionability than some tenuous, diluted link to a relative.

This is all frankly nonsensical because insurance companies charge people within tightly-regimented tiers, there's no wiggle room for mystery +30% fee increases.


Thousands of Wells Fargo employees would like to have a word with you about #1. “Illegal” is outweighed easily if there’s a bonus to be earned by doing something that everybody says you won’t be caught for doing. Plus, the company itself can’t be put in prison and the employees whose bad idea it was will probably get a slap on the wrist. And if it’s a big enough company, they won’t endanger it by truly hurting it with fines. #2 is a pretty good point though.


That's an inherently biased sample, so not that useful for research.


23andMe tried that, but, well, you know.


A huge problem is that data protections in the USA do not allow for easy sharing of patient data for these purposes. The liability is huge and the upside is very small. Furthermore, if it's a college-affiliated hospital, that data is not going anywhere except to the internal teams within the college.


These are good things.

People don't want their info used for this without their permission.

Freedom


Your data is valueless when it's by itself. That's the fundamental mistake they made with HIPAA.


That's not true.

Knowing if you have an expensive condition could be invaluable to say, an insurance company.

Knowing something else may give people the opportunity for blackmail.

There are lots of reasons one's medical history is valuable. There is more that is valuable than just money.


Don't want to go too off-topic but:

Why does this have to be so hard? I mean, what if the regulators went the opposite direction: make ALL the data publicly available by anonymizing it.

What privacy concerns would anyone have if the whole dataset is completely anonymized? Wouldn't it create far more benefit by accelerating innovation in healthcare? I strongly believe the privacy concerns (which shouldn't exist after stripping out any PII) are outweighed by an order of magnitude by the upsides, which are literally saving millions of lives in the longer term.

But again, regulation, as always, cripples innovation, this time slowing down developments that would literally save lives. Great.


Without getting into the weeds: privacy concerns remain after "stripping out any PII", simply due to the power of post-hoc analysis and combining masses of data looking for things that line up.

There are various long articles on this and it remains the case that patients have a right to confidentiality.
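The classic illustration is linking on quasi-identifiers (ZIP + birth date + sex, as in Sweeney's re-identification of "anonymous" hospital data). A toy sketch with made-up rows:

    health = [  # "anonymized" release: names stripped, quasi-identifiers kept
        {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "condition X"},
        {"zip": "94110", "dob": "1988-02-14", "sex": "M", "diagnosis": "condition Y"},
    ]
    voter_roll = [  # public record that still carries names
        {"name": "Jane Doe", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    ]

    def reidentify(released, public):
        # Join the two datasets on the quasi-identifiers they share.
        key = lambda r: (r["zip"], r["dob"], r["sex"])
        names = {key(p): p["name"] for p in public}
        return [(names[key(r)], r["diagnosis"]) for r in released if key(r) in names]

    print(reidentify(health, voter_roll))  # [('Jane Doe', 'condition X')]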


As a gastroenterologist I have used Medtronic GI Genius AI to do colonoscopies for the past 8 months. It is mildly helpful but has way too many false positive alarms for "polyps" that are not polyps. It needs better training in the real world.


How can anyone except big incumbents who are immune to competition get access to data? I feel this is an unfortunate situation where the opportunity to be an innovator is locked away. Some of the reasons are good but it’s not a great outcome for us all.


There is a healthy market for claims and EHR data resale. That can be part of the solution for out-of-sample outcomes comparisons.


Another confounding problem is how the FDA classifies and approves devices for clearance. Most regulatory strategies have the company strive to find a predicate and prove they are substantially equivalent (SESE), which generally means a class II device, which does NOT require clinical data for clearance.

It's frustrating that the article uses the term "approve", which is specific to PMA devices (class III), whereas class II devices are "cleared" by the FDA.

- a medical device executive


As long as these are diagnostic aids, not treatments like a pacemaker or medication, I don't see this as so concerning.


God save us... hope they didn't use Omniverse Replicator.


Had a chat with a pharmaceutical company whistleblower recently.

They told me that FDA has nothing to do with actual product testing, e.g., human trials. What the "approval" guarantees is proper facilities and other second-order criteria.

I'm writing this comment in the hopes of learning that I have misunderstood something crucial here. Because if it is indeed the case that the companies themselves are the only ones vouching for drug safety, that old story about capitalism and incentives implies we're seriously fucked.


FDA clearance only means a company can legally market the device.

Additionally, product testing != human trials. They are two very different things.

Final product testing is called "design verification and validation" (aka dV+V), where a statistically significant number of devices are tested against several different categories.

Human trials can run the gamut from small IDE studies to Phase I through IV.

The drug approval process is MUCH different from the medical device process.
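On the "statistically significant number of devices" point: for an attribute dV+V test the sample size is often set with the standard success-run formula n = ln(1 - C) / ln(R), i.e. n passes with zero failures demonstrate reliability R at confidence C. This is a general formula, not specific to any one submission:

    import math

    def success_run_n(confidence, reliability):
        # Samples needed, with zero failures allowed, to claim the given
        # reliability at the given confidence level.
        return math.ceil(math.log(1 - confidence) / math.log(reliability))

    print(success_run_n(0.95, 0.95))  # 59
    print(success_run_n(0.95, 0.99))  # 299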


Elizabeth Holmes (in coarse voice): “Hold my beer”



