There are some misconceptions on the thread, so let me help clear them up. A screening test is indicated on an annual basis for anyone with diabetes who does NOT have visual symptoms. Diabetic retinopathy (DR) progresses without any symptoms and is preventable if detected early, but despite its preventable nature, DR is the leading cause of blindness in working-age adults even in the developed world.
The test is for screening rather than full diagnosis, and is not intended to replace a dilated examination by an ophthalmologist. You don't need a specialist to screen, but you do need a specialist to diagnose and treat. Sensitivity is the percentage of the time the test correctly identifies the presence of more than mild diabetic retinopathy (in this case, 87.4 percent), and specificity is the percentage of the time the test correctly identifies those patients who do not have more than mild diabetic retinopathy (in this case, 89.5 percent). Note that neither sensitivity nor specificity is the same thing as accuracy. The sensitivity and specificity generally compare well to those achieved by humans.
It is changing, though - it hasn't been true for the UK since 2014: https://www.gov.uk/government/news/diabetes-no-longer-leadin...
The technology is there, but more work would be needed for a self-service kiosk to be FDA approved. It's also not clear whether it is commercially a good idea at this time, given that only a single disease (diabetic retinopathy) is approved. I can see a future where one can use such kiosks to look for multiple conditions and assess risks for various diseases, including cardiovascular disease, neurodegenerative diseases, stroke, and hypertension.
I read that as .87 accuracy and .9 specificity (True Negative Rate). However, I can't find the sensitivity (recall, or True Positive Rate) in the link provided in the article above.
I'm guessing it goes a bit like this (assuming perfectly balanced classes which in reality they aren't):
              Predicted +   Predicted -   Total
  Actual +            378            72     450
  Actual -             45           405     450
  Total               423           477     900
True Positive Rate: 0.8400
True Negative Rate: 0.9000
Recall (TPR): 0.8400
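For anyone who wants to check the arithmetic, here's a quick Python sketch of the same reconstruction (the 450/450 split is the balanced-classes assumption above, and the helper name is mine):

    # Reconstruct a confusion matrix from assumed sensitivity/specificity.
    # The 450/450 default mirrors the balanced-classes assumption above;
    # real screening populations are heavily skewed toward negatives.
    def confusion_from_rates(sensitivity, specificity, n_pos=450, n_neg=450):
        tp = round(sensitivity * n_pos)  # diseased and flagged
        fn = n_pos - tp                  # diseased but missed
        tn = round(specificity * n_neg)  # healthy and cleared
        fp = n_neg - tn                  # healthy but flagged
        return tp, fn, tn, fp

    tp, fn, tn, fp = confusion_from_rates(0.84, 0.90)
    print(f"TPR (recall): {tp / (tp + fn):.4f}")  # 0.8400
    print(f"TNR:          {tn / (tn + fp):.4f}")  # 0.9000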
Off the top of my head, it looks like automated screening will do some good and probably more good than harm, but without knowing how doctors judge good vs harm there's no way to know for sure how useful this device will really be.
Note that this is "majority decision" of multiple ophthalmologists, which is going to be better than the average ophthalmologist's single diagnosis.
Sensitivity generally gives an indication of safety of a medical device, and specificity gives a general indication of effectiveness.
The original article on The Verge reported "87% accuracy" and a 90% figure that sounded like TNR. The new link points to an FDA page that makes it more likely that ".87" is actually sensitivity:
IDx-DR was able to correctly identify the presence of more than mild diabetic retinopathy 87.4 percent of the time and was able to correctly identify those patients who did not have more than mild diabetic retinopathy 89.5 percent of the time.
So, I guess, something like this:
              Predicted +   Predicted -   Total
  Actual +            393            57     450
  Actual -             47           403     450
  Total               440           460     900
True Positive Rate: 0.8733
True Negative Rate: 0.8956
Recall (TPR): 0.8733
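One caveat worth making explicit: under that balanced-classes assumption, the positive predictive value looks deceptively good. A quick check in Python (real-world prevalence is far lower, which is the point of the Bayes calculation downthread):

    # PPV from the matrix above. It looks high only because the classes
    # were assumed balanced; at realistic prevalence it drops sharply.
    tp, fp = 393, 47
    print(f"PPV: {tp / (tp + fp):.3f}")  # ~0.893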
I guess we treat it as another doctor? Like if we have 4 opinions that agree, we go with the consensus regardless of the source of those opinions (as long as they meet some minimum competence threshold).
Currently, if a diagnostic test comes back suggesting something serious, say cancer, and the doctor does not pursue it, then the doctor would be liable if it did turn out to be cancer.
So if a machine disagreed with a doctor, then I would assume that the doctor will grudgingly have to investigate further until there is enough evidence to rule out that diagnosis.
What I can see happening is that patients will go to this machine for a second opinion. And if an opinion then comes back that contradicts the primary physician, an entire can of (legal) worms will be opened.
To elaborate further, there is sometimes what's called the benefit of history.
Say a patient visits 10 doctors. The 10th doctor has an unfair advantage over the first 9 simply because he/she will have prior knowledge of which diagnoses and treatments were incorrect.
Similarly, in an AI-vs-human-doctor situation, incorporating that additional information (for the AI) would require a considerable amount of big data to train on in order to recognize prior history, failed treatments, and such.
Image-specific diagnoses (e.g. recognizing melanoma, retinopathy) do lend themselves to AI very nicely. For other diagnoses that contain a significant amount of, shall we say, "human factors", less so.
If a doctor reviews the available data, reasonably concludes that it shouldn't be pursued further, and it later does turn out to be cancer, then that by itself does not mean that the doctor is liable for anything. Malpractice requires actual culpable negligence, such as missing something obvious, not interpreting a questionable situation in a manner that turns out to be wrong. The existence of a second, contrary opinion doesn't change that.
There is always the possibility that the doctors and the device are both right 90% of the time, but not the same 90% of the time.
Or that either the doctors or the device are right most of the time for the most severe cases but the other party is right only for the milder cases, etc.
It's not easy to look at absolute numbers here.
Though even in those cases, you might be looking to show that your tool agrees with physicians at least as often as physicians agree with each other. Malpractice is usually about failing to offer the standard of care, and if you can show a reasonable level of concurrence with the standard of care in research and trials, you may be able to move forward and reach those higher levels of accuracy.
a) Usage still requires the presence of the Doctor.
b) The doctor does nothing but relay the AI’s message.
c) The doctor continues to charge the same and treat the same number of patients.
d) Everyone who expresses “hey isn’t the doctor redundant now? Shouldn’t we be treating more patients for cheaper” gets ridiculed as “one of those people”.
e) Edit: Also, the doctors’ association devotes significant resources to come up with memetically virulent reasons why the world would end if we took doctors out of the loop.
I mean, that’s how a lot of obviated jobs are currently treated...
The computer aided diagnosis isn't another doctor, it's another stethoscope.
If the images are of sufficient quality, the software provides the doctor with one of two results: (1) “more than mild diabetic retinopathy detected: refer to an eye care professional” or (2) “negative for more than mild diabetic retinopathy; rescreen in 12 months.” If a positive result is detected, patients should see an eye care provider for further diagnostic evaluation and possible treatment as soon as possible.
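To make that decision logic concrete, here's a toy sketch of the three-way output (the function name and message strings are my own illustration, not IDx's actual interface):

    # Toy illustration of the three outcomes described in the FDA labeling.
    # Names and message strings are made up for this sketch.
    def screening_result(image_quality_ok: bool, mtmdr_detected: bool) -> str:
        if not image_quality_ok:
            return "insufficient image quality: retake the photos"
        if mtmdr_detected:
            return "more than mild DR detected: refer to an eye care professional"
        return "negative for more than mild DR; rescreen in 12 months"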
Are you interpreting the Verge or talking about some other statement?
> IDx-DR founder Michael Abràmoff told Science News. “It makes the clinical decision on its own.”
But I would guess a specialist would still need to be involved since it's not a fool-proof system. A specialist might take other symptoms or variables into account when making the diagnosis or order further tests. While this tool might be useful for blanket screening considering that it is harmless, it seems like it's hardly going to be "making the decision on its own" and prescribing treatment.
It's not something primary care doctors currently do.
What if you have the condition, the GP runs the test, and it turns out negative, and then the GP decides to not send you to a specialist ...
Can anybody infer what the confusion matrix looks like from reading the text?
90% of people are accurately detected as not having the disease, i.e. a 10% FP rate. So ~3M people would be falsely flagged as having the complication, compared with the ~200k who actually have the disease.
10% isn't a great number, but it isn't clear from this coverage whether this complication is generally asymptomatic or not. If there are symptoms to go with it, the numbers may be far better.
I'm more worried about the 87% accurately detected as having the disease, i.e. a 13% false negative (FN) rate. I don't know how many general practitioners would actually send a patient to a specialist if the device did not detect changes.
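Back-of-the-envelope, the counts implied by those rates look something like this (the ~30M screened and ~200k diseased figures are the assumptions carried over from above, not official numbers):

    # Rough counts implied by the rates above. The population figures are
    # assumptions carried over from this thread, not official numbers.
    screened = 30_000_000    # assumed number of people screened
    with_disease = 200_000   # assumed true cases among them
    fp_rate = 0.10           # 1 - specificity
    fn_rate = 0.13           # 1 - sensitivity

    false_positives = fp_rate * (screened - with_disease)
    false_negatives = fn_rate * with_disease
    print(f"~{false_positives / 1e6:.1f}M falsely flagged")  # ~3.0M
    print(f"~{false_negatives / 1e3:.0f}k missed cases")     # ~26k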
The retina seems well suited to AI approaches, though, so I'd be interested in what comes next from companies like this, DeepMind, and other researchers/organizations (look out for Lee et al. over at the University of Washington).
The standard practice today is that if a patient is determined to be diabetic then they get referred to an ophthalmologist visit once a year. In that case would comparing those rates of diagnosis be useful?
Not true. The aim is to treat patients before they become symptomatic. Outcomes are much worse otherwise.
2min promo on Diabetic Eye Screening:
The false positive Bayesian math is a good illustrative example, but reality is more complicated. And no doctor will base their diagnosis solely on one number.
P(H) = .0069
P(D|H) = .87
P(D|H') = .1 (10% chance the test says you have it when you don't)
P(H|D) = 0.057 or about 6% probability you have complication after machine says you do
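The same calculation in Python, for anyone who wants to play with the numbers:

    # Bayes' rule with the numbers above.
    p_h = 0.0069            # P(H): prior probability of the complication
    p_d_given_h = 0.87      # P(D|H): sensitivity
    p_d_given_not_h = 0.10  # P(D|H'): false positive rate

    # P(H|D) = P(D|H)P(H) / (P(D|H)P(H) + P(D|H')(1 - P(H)))
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
    print(f"P(H|D) = {p_d_given_h * p_h / p_d:.3f}")  # ~0.057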
I guess they may be misreporting sensitivity as accuracy.
The way it works is: the manufacturer of a medical device assesses the harm that can be caused by a software malfunction, and assigns it a safety classification (class A, B or C). Class A is used when no injury is possible, and class C is used when death or serious injury is possible (e.g. a surgical robot). The manufacturer also provides a "failure modes and effect analysis" document that looks at everything that could go wrong, what is the likelihood of the failure happening and what is the effect on the patient.
Based on the safety classification, IEC 62304 requires different levels of rigour. For example, the standard only requires blackbox testing for class B software, whereas for class C software it requires whitebox tests as well.
The manufacturer also needs to come up with a software development plan that ensures that all of the requirements of the standard are met, and an "argument" (supported by test reports, process documentation, source control history, etc) that the software was developed according to the plan.
And that is what the FDA audits: they look at the development process of a given feature and they check that the plan was followed. I think they rarely delve into the details of the implementation and are generally just checking that the safety arguments are sound and supported by evidence.
More generally, the regulatory body will be looking for you to have a formal engineering process in place and be able to demonstrate its efficacy. Part of that will be looking for how you do hazard and risk analysis, how you handle CAPA (corrective and preventative actions in FDA-speak), how you do system trace, design history file generation, etc. etc. That you have a software development plan and can demonstrate how you follow it.
So they aren't really interested in code reviews per se, but they are very interested in how you view code reviews, how you perform them when you do, what gets documented, how you perform trace and V&V, etc.
What does this mean?
They intentionally built an unfair advantage so that they could sell it.
Here's a patent they have filed on the system. Claims 1-18 and 20 are focused on the training of the neural network. Looking at PAIR, it appears Claims 1-18 will be granted soon, largely in that form.
Some of the most elegant code is terse, but it takes years of education, experience, and intelligence to be able to produce that logic.
In general though approval to market for particular indications is for one fixed configuration of a product, so your model parameters won't change.
All of this is in the process of being hashed out, but I expect for a while at least if you are doing on-line learning it will be in non-clinical configurations only and you will end up releasing an update periodically. Depending on the changes this may need a new 510(k) or not, but would definitely need a formal release.
In terms of how well the "experts" perform: "For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981"
And here's a video that describes what's going on in plain English:
Maybe people aren't getting diagnosed at very high rates? That would be a reasonable justification for deployment with somewhat less than perfect accuracy. Anyone have any insight?
In this case, they might weigh:
* How many new cases are caught by expanding access to specialist tools
* What fail safes exist in current course of care — how does a false negative result in a worse outcome for a patient than if they had had no diagnostic at all
The summary of their decision is public record, but not the detailed analysis.
If you can show a statistically better chance of a good outcome with a small chance of significantly worse outcome, the FDA will often approve.
The medical risk is that people will forgo other screening for 12 months when given a negative result. The cost of additional screening for false positives is the other big downside (this is all the machine does, recommend a specialist visit or to rescreen in a year).
Really? Standard advice is 1/yr for both.
Ultimately, accountability and transparency will be the Achilles heel.
Or, rather, we would if there were any existing companies in this space that could take advantage of this. The creator of this device, IDx (https://www.eyediagnosis.net), seems to be unusual in being an entrepreneurial medical-device manufacturer; MDMs are a rather hidebound lot.
Honestly, if there's going to be a wave of innovation in this space, I might expect it to come from the inorganic-chem-focused pharma companies, since they have the expertise in both materials science and machine-learning (from doing novel small-molecule detection studies) required to come up with the innovations. I expect they'd likely partner with one or more of the MDMs to build the hardware, but they would write the software.
What this software is used for is very specific, but also very useful in that it addresses a common medical problem. It is used only to help diagnose diabetic retinopathy (i.e. eye damage caused by diabetes).
This is AI vision software used to analyze a photograph of someone's retina to detect damage. In essence it is much like the recently published programs that analyze chest X-rays to detect pneumonia. Where this is useful is that it can probably cut out a lot of human work in diagnosing retinopathy; however, it is an incremental step. Even when I was a resident in a primary care clinic years ago the process was somewhat automated like this, with our medical assistants taking a photograph with a special machine, and then this photograph would be digitally sent to a specialist (I presume an ophthalmologist, but I could be wrong; maybe optometrists can be licensed to do this) for interpretation.
What this isn't, is diagnosing a patient based on taking a history and inputting examination findings and labs, etc... We are still quite a bit of a way from that but I'm sure people are working hard on that as well.
EDIT: In my opinion, where AI could really make a huge difference for my work as a hospitalist (a doctor who admits and rounds on hospital patients) is in voice recognition software, with eventual language processing to help me write notes faster. First, give me a program like Dragon Dictate, but which I can use in the patient's room (obviously one would have to figure out the HIPAA compliance issues), that transcribes my voice and the patient's or family member's into a readable text file that I can review when I write their note.
The next step would be that same program giving me its attempt to summarize our interview into a reasonable note, which I can edit for accuracy. This would in effect be an AI scribe. A scribe, for those who don't know the medical jargon, is a hired person whose only job is to listen to a doctor interview a patient and help write medical notes; they are usually young pre-medical students. It's a relatively new position that was created as the burden of documenting in electronic medical records limited the amount of time providers could spend with patients. It's very common in Emergency Medicine, where high output is needed, and sometimes also in primary care.
The next step is that you have a company with all this protected medical transcription data and eventual medical outcomes, and you use ML to try to tease out which variables ended up being the most useful for accurate diagnosis. Before that, you could have the program prompt the doctor with questions that it thinks would be helpful, etc... Again, there are huge medico-legal barriers to this, but there is a roadmap to becoming a billionaire in my opinion.
This is one of the classic AI application domains -- for example, this is why Stanford's Knowledge Systems Lab was near the medical school, and why the mainframe Zork was developed on was owned by MIT's Medical Decision Making group.
It's definitely a well identified market and there are people working on it, but I haven't seen much progress (not that I'm looking terribly closely at the moment).
From the point of the view of the last step, the ML one, I suspect the consenting is as much an issue as the PHI. Getting any real traction on this will likely require massive data sets and significant clinician time to aid training.
Source: personal experience.