
Federated learning (with implementations such as FEDn [1]) is supposed to solve this problem: only the trained model weights are shared, never the data itself.

Sure, it requires some coordination, but the legal parts should at least be solvable this way.

[1] https://github.com/scaleoutsystems/fedn
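
To make the "share weights, not data" idea concrete, here is a minimal federated-averaging (FedAvg) sketch in plain NumPy. It is a toy illustration of the general technique, not FEDn's actual API; the model, client data, and update rule are all invented for the example.

    import numpy as np

    def local_update(weights, X, y, lr=0.1, epochs=5):
        # Each hospital trains on its own data; only the resulting
        # weights ever leave the site, never X or y.
        w = weights.copy()
        for _ in range(epochs):
            grad = X.T @ (X @ w - y) / len(y)   # linear-regression gradient
            w -= lr * grad
        return w

    def fed_avg(global_w, client_datasets):
        # The server averages the clients' weights, weighted by dataset size.
        updates = [local_update(global_w, X, y) for X, y in client_datasets]
        sizes = np.array([len(y) for _, y in client_datasets], dtype=float)
        return np.average(updates, axis=0, weights=sizes)

    # Two "hospitals" with private data drawn from the same true model.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for n in (50, 80):
        X = rng.normal(size=(n, 2))
        clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

    w = np.zeros(2)
    for _ in range(20):
        w = fed_avg(w, clients)
    print(w)   # approaches [2, -1] without pooling any raw data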




I've been involved in a federated learning radiology project; the biggest issue is imaging technique and labelling.

Different centers practice very differently, with different imaging protocols, disease prevalences, and labelling/reporting.

This project was looking at renal masses, and the only parts that worked well with federated learning were image segmentation and estimating the probability that the mass is a cancer; that's a competency I expect from a first- or second-year trainee.

It was horrible at predicting the subtype of cancer, which was the actual goal and is more of an experienced generalist/subspecialist radiologist skill, because we couldn't get a good training set (few of these lesions are biopsied, and the specific MRI sequences that may help are not done the same way in every center).

Practically, subtype doesn't make much of a difference for the patient: if it's "probably a cancer" it will just get cut out anyway. But it highlights a challenge with federated learning.


BERT-like encoder models actually do help here; although the dense representation is unfortunately not as nice and clean as a class, it will at least map to something similar. If the hospital provides even a completely bizarre entity id, the requirement would be to provide a dictionary of descriptions and train the BERT-like model to associate the entity ids with their descriptions.
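
As a rough sketch of what that mapping could look like (the entity ids and descriptions here are invented for illustration), a pretrained sentence encoder can place free-text descriptions of site-local codes into a shared embedding space and match them by cosine similarity:

    from sentence_transformers import SentenceTransformer, util

    # Hypothetical site-local codes with their human-readable descriptions.
    site_a = {"RM-01": "solid enhancing renal mass, suspected clear cell RCC"}
    site_b = {"K7": "enhancing solid kidney tumour, likely clear cell carcinoma",
              "K9": "simple benign renal cyst, Bosniak I"}

    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb_a = model.encode(list(site_a.values()), convert_to_tensor=True)
    emb_b = model.encode(list(site_b.values()), convert_to_tensor=True)

    # Cosine similarity should link RM-01 to K7 despite the unrelated ids.
    scores = util.cos_sim(emb_a, emb_b)
    best = scores.argmax(dim=1)
    print(list(site_b)[int(best[0])])   # -> "K7"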


> BERT-like encoder models actually do help here; although the dense representation is unfortunately not as nice and clean as a class, it will at least map to something similar.

Completely agree with this.

> If the hospital provides even a completely bizarre entity id

This is where it begins to fall apart: the hospital doesn't always provide an entity. In this specific project, some will argue "it's a big enough renal mass, just cut it out; based on current evidence it's safest to remove", which, while true at the moment, doesn't help us develop a model to prospectively predict which subtypes do or do not need surgical excision.

Consequently, there is no biopsy, and the specific MRI sequences that are proving helpful at subtyping are missing. They are irrelevant for clinical care under current standard practice, but hugely relevant if you're trying to change that practice: we need labelled data, and we don't have the resources to have a radiologist/pathologist go over every case again and fill in the blanks en masse.


Is that really possible? I don't know the field well, but at least with deep learning you can recover inputs from gradients, and if you want to guarantee that no other party can recover your inputs, that's cryptography, and homomorphic encryption is FAR from practical yet (something like a million times slower than practical). Without rigorous mathematical foundations, I would suspect it's just a fancy way of gathering all the training data and doing the training while making it feel as if everyone retained their own data.
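
For what it's worth, the gradient-leakage point is easy to demonstrate in a toy setting (a generic PyTorch illustration, not a claim about any particular FL system): for a single linear layer and a single sample, the shared gradients reconstruct the input exactly.

    import torch

    torch.manual_seed(0)
    x = torch.randn(16)                 # the "private" input
    layer = torch.nn.Linear(16, 4)
    target = torch.randn(4)

    loss = ((layer(x) - target) ** 2).sum()
    loss.backward()

    # grad_W is the outer product (dL/dy) x^T and grad_b is dL/dy,
    # so dividing any row of grad_W by the matching grad_b entry gives x.
    i = layer.bias.grad.abs().argmax()   # pick a numerically safe row
    x_recovered = layer.weight.grad[i] / layer.bias.grad[i]
    print(torch.allclose(x, x_recovered, atol=1e-5))   # True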


I'm not sure it even has to be complicated, full-on machine learning.

You could also just make it easy to share diagnoses to a database stripped of personal information about the patient, index it properly, share it worldwide, and automatically let doctors whose cases match contact each other.
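
A very rough sketch of that kind of registry (schema and fields invented for illustration; a real one would need far more care around consent and de-identification):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE cases (
        id INTEGER PRIMARY KEY,
        diagnosis_code TEXT,   -- e.g. an ICD/SNOMED code, no patient info
        doctor_contact TEXT)""")
    db.execute("CREATE INDEX idx_dx ON cases(diagnosis_code)")

    db.executemany(
        "INSERT INTO cases (diagnosis_code, doctor_contact) VALUES (?, ?)",
        [("C64.1", "dr.a@hospital-one.example"),
         ("C64.1", "dr.b@hospital-two.example")])

    # A doctor entering a rare diagnosis is matched with colleagues
    # who have reported the same code.
    for (contact,) in db.execute(
            "SELECT doctor_contact FROM cases WHERE diagnosis_code = ?",
            ("C64.1",)):
        print(contact)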


If it is a rare condition, re-identifying people from even a small amount of relevant data (age, gender, hospital visited) is trivial. If it is a common condition, there is usually plenty of data already.
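
This is essentially the k-anonymity problem. A toy illustration (made-up records) of how just a few quasi-identifiers shrink the candidate pool:

    from collections import Counter

    # Made-up "anonymized" records: (age bracket, gender, hospital).
    records = [("60-70", "M", "St. Mary's"), ("60-70", "M", "St. Mary's"),
               ("60-70", "F", "St. Mary's"), ("30-40", "M", "General"),
               ("30-40", "M", "General"),   ("80-90", "F", "General")]

    groups = Counter(records)
    # Any record whose quasi-identifier combination is unique can be
    # re-identified by anyone who knows those three facts.
    for quasi_id, k in groups.items():
        if k == 1:
            print("unique, hence re-identifiable:", quasi_id)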



