
FDA permits marketing of AI-based device to detect diabetes-related eye problems - nradov
https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm604357.htm
======
ksolanki
As a founder of another company in this field, let me start by saying that
this approval is a big deal. Kudos to IDx. This is the very first time FDA has
approved a fully automated CADx (computer aided diagnosis) device. Eyenuk is
also on its way to an FDA approval, and it is a lot of work conducting the
prospective clinical trials.

There are some misconceptions on the thread, so let me help clear them up. A
screening test is indicated on an annual basis for anyone with diabetes who
does NOT have visual symptoms. Diabetic retinopathy (DR) progresses without
any symptoms and is preventable if detected early, but despite its preventable
nature, DR is the leading cause of blindness in working-age adults even in the
developed world.

The test is for screening rather than providing a full diagnosis and is not
intended to replace a dilated ophthalmologist examination. You don't need a
specialist to screen, but you need a specialist to diagnose and treat.
Sensitivity is the percentage of times the test correctly identifies the
presence of more than mild diabetic retinopathy (87.4 percent in this case),
and specificity is the percentage of times the test correctly identifies
patients who do not have more than mild diabetic retinopathy (89.5 percent in
this case). Note that neither sensitivity nor specificity is the same thing
as accuracy. The sensitivity and specificity here generally compare well to
what humans achieve.
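
For the mechanically minded, here is a minimal sketch (Python, hypothetical
counts, not IDx's data) of how the two quantities fall out of a confusion
matrix:

    # Hypothetical counts: 100 patients with the disease, 100 without
    tp, fn = 87, 13    # with referable DR: caught vs. missed
    fp, tn = 10, 90    # without it: false alarms vs. correct negatives
    
    sensitivity = tp / (tp + fn)                  # 0.87 = P(test+ | disease)
    specificity = tn / (tn + fp)                  # 0.90 = P(test- | no disease)
    accuracy = (tp + tn) / (tp + fn + fp + tn)    # 0.885 here, but depends on class balance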

~~~
return1
What is the screening like? Is it possible there will be automated self-
service 'kiosks' for that?

~~~
ksolanki
I believe self-service kiosks would be quite feasible. There are two
key components: (1) automated non-mydriatic (not requiring dilation of the
pupil) retinal imaging and (2) automated grading of images using AI.

The technology is there, but more work would be needed for a self-service
kiosk to be FDA approved. Another thing that is not clear is whether it is
commercially a good idea at this time, given that only a single disease
(diabetic retinopathy) is approved. I can see a future where one can use such
kiosks to look for multiple conditions and assess risk for various diseases
including cardiovascular disease, neurodegenerative diseases, stroke, and
hypertension.

------
YeGoblynQueenne
>> In one clinical trial that used more than 900 images, IDx-DR correctly
detected retinopathy about 87 percent of the time, and could correctly
identify those who didn’t have the disease about 90 percent of the time.

I read that as .87 accuracy, .9 specificity (True Negative Rate). However, I
can't find the sensitivity (recall, or True Positive Rate) in the link
provided in the article above.

I'm guessing it goes a bit like this (assuming perfectly balanced classes
which in reality they aren't):

    
    
               Predicted + Predicted - Total
      Actual + 378         72          450
      Actual - 45          405         450
      -------------------------------------
      Total    423         477         900
      
      Accuracy:             0.8700
      Error:                0.1300
      True Positive Rate:   0.8400
      True Negative Rate:   0.9000
      Precision:            0.8936
      Recall (TPR):         0.8400
      F-Score:              0.8660
    

I'm not sure how good or bad 10% false positives and 16% false negatives
are for that kind of diagnosis. The linked trial page says that _40% of
diabetes patients have some degree of diabetic retinopathy (DR)_ , that early
treatment reduces vision loss "by as much as 52%" and that _only some 50%-60%
of people with diabetes have a yearly eye exam_.

Off the top of my head, it looks like automated screening will do some good
and probably more good than harm, but without knowing how doctors judge good
vs harm there's no way to know for sure how useful this device will really be.

~~~
ksolanki
This is not completely correct. Sensitivity (87% in this case) is not the
same as accuracy, nor is specificity (89.5% in this case) the same as the
true negative rate.

Sensitivity generally gives an indication of safety of a medical device, and
specificity gives a general indication of effectiveness.

~~~
YeGoblynQueenne
The link changed since I posted my comment :)

The original article on The Verge reported "87% accuracy" and a 90% figure
that sounded like TNR. The new link points to an FDA page that makes it more
likely that ".87" is actually sensitivity:

 _IDx-DR was able to correctly identify the presence of more than mild diabetic
retinopathy 87.4 percent of the time and was able to correctly identify those
patients who did not have more than mild diabetic retinopathy 89.5 percent of
the time._

So, I guess, something like this:

    
    
               Predicted + Predicted - Total
      Actual + 393         57          450
      Actual - 47          403         450
      -------------------------------------
      Total    440         460         900
      
      Accuracy:             0.8844
      Error:                0.1156
      True Positive Rate:   0.8733
      True Negative Rate:   0.8956
      Precision:            0.8932
      Recall (TPR):         0.8733
      F-Score:              0.8831
    

Closest I can get with exactly 900 cases :0
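
(For the curious, that search can be done mechanically; a quick Python
sketch, assuming the same 450/450 split:)

    # Find the integer cell counts closest to the reported rates (0.874, 0.895)
    tp = min(range(451), key=lambda t: abs(t / 450 - 0.874))        # -> 393
    fp = min(range(451), key=lambda f: abs(f / 450 - (1 - 0.895)))  # -> 47
    print(tp, 450 - tp, fp, 450 - fp)    # TP FN FP TN: 393 57 47 403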

~~~
ksolanki
Thanks for this analysis. The balance in the real world is more like 20-80,
i.e. 20% of typically screened patients would have referable retinopathy
(screen positive).
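
That shifts the precision in the tables above: taking the reported 87.4%
sensitivity and 89.5% specificity at face value, at 20% prevalence the PPV is
(0.874 * 0.2) / (0.874 * 0.2 + 0.105 * 0.8) ≈ 0.68, so roughly two of every
three screen positives would truly have referable DR. (A back-of-the-envelope
figure, not one from the trial.)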

------
j32fun
Ultimately, you will need a doctor somewhere along the diagnostic process so
that someone is there to assume liability for incorrect diagnoses.

~~~
bena
What's crazy is that if AI is better than doctors by some significant degree,
what do we do when the doctor and AI disagree? Like if doctors are right 85%
of the time, but the AI is 90%.

I guess we treat it as another doctor? Like if we have 4 opinions that agree,
we go with that one regardless of the source of those opinions (as long as
they meet some minimum competence threshold).

~~~
j32fun
That's an interesting thought.

Currently, what happens is that if a diagnostic test comes back and it
suggests something serious, say cancer, and the doctor does not pursue it,
then the doctor would be liable if it did turn out to be cancer.

So if a machine disagreed with a doctor, then I would assume that the doctor
will grudgingly have to investigate further until there is enough evidence to
rule out that diagnosis.

#headache

What I can see happening is that patients will go to this machine for a second
opinion. And if it returns an opinion that contradicts the primary
physician's, then an entire can of (legal) worms will be opened.

\--

Addendum:

To elaborate further, there is sometimes what's called the benefit of history.

Say a patient visits 10 doctors. The 10th doctor has an unfair advantage over
the first 9 simply because he/she will have prior knowledge of which
diagnoses and treatments were incorrect.

Similarly, for an AI vs. human doctor situation, incorporating additional
information (for the AI) would require a considerable amount of data to train
on in order to recognize prior history, failed treatments, and such.

Image-specific diagnoses (e.g. recognizing melanoma or retinopathy) lend
themselves to AI very nicely. Diagnoses that involve a significant amount of,
shall we say, "human factors" do so less.

~~~
PeterisP
Doctors aren't liable for failing to predict the future or for making an
imperfect diagnosis.

If a doctor reviews the available data, reasonably concludes that it shouldn't
be pursued further, and it later does turn out to be cancer, then that by
itself does _not_ mean that the doctor is liable for anything. Malpractice
requires actual culpable negligence, such as missing something obvious, not
interpreting a questionable situation in a manner that turns out to be wrong.
The existence of a second, contrary opinion doesn't change that.

------
asperous
The company is saying you don't need a specialist, but after applying Bayes'
theorem (using 90% TN, 87% TP, and P(D) = 200,000 complications / 29,100,000
diabetics), the chance you have this condition after the machine says you do
is only about 5.7%.
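
(A quick sketch of that arithmetic in Python, for anyone who wants to poke at
the inputs; the prevalence figure is the assumption above, not something from
the trial:)

    sens, spec = 0.87, 0.90
    p = 200_000 / 29_100_000    # assumed prevalence of the complication among diabetics
    ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
    print(round(ppv, 3))        # 0.057, i.e. about 5.7%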

~~~
maxerickson
They are marketing the device strictly to make referrals to specialists. The
FDA press release says as much anyway:

[https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/u...](https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm604357.htm)

 _If the images are of sufficient quality, the software provides the doctor
with one of two results: (1) “more than mild diabetic retinopathy detected:
refer to an eye care professional” or (2) “negative for more than mild
diabetic retinopathy; rescreen in 12 months.” If a positive result is
detected, patients should see an eye care provider for further diagnostic
evaluation and possible treatment as soon as possible._
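
In other words, the device output is a simple triage, something like this (my
paraphrase of the press release, not IDx's actual interface):

    # Paraphrase of the decision flow described by the FDA
    def idx_dr_result(image_quality_ok: bool, more_than_mild_dr: bool) -> str:
        if not image_quality_ok:
            return "images insufficient: reacquire"   # implied by "if ... of sufficient quality"
        if more_than_mild_dr:
            return "more than mild DR detected: refer to an eye care professional"
        return "negative for more than mild DR: rescreen in 12 months"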

Are you interpreting the Verge or talking about some other statement?

~~~
asperous
Yeah from the article:

> IDx-DR founder Michael Abràmoff told Science News. “It makes the clinical
> decision on its own.”

But I would guess a specialist would still need to be involved, since it's
not a foolproof system. A specialist might take other symptoms or variables
into account when making the diagnosis, or order further tests. While this
tool might be useful for blanket screening, given that it is harmless, it
seems it's hardly going to be "making the decision on its own" and
prescribing treatment.

~~~
maxerickson
It's specifically making the clinical decision I quoted from the FDA press
release: whether or not to refer to a specialist for further diagnosis.

It's not something primary care doctors currently do.

------
dang
Since a lot of the discussion was objecting to the title and/or exaggerated
coverage, and the press release is more factual (let that sink in for a
second), we changed the URL from
[https://www.theverge.com/2018/4/11/17224984/artificial-
intel...](https://www.theverge.com/2018/4/11/17224984/artificial-intelligence-
idxdr-fda-eye-disease-diabetic-rethinopathy) to the press release.

------
ekianjo
Does the FDA conduct code reviews? And how do they guarantee that the code or
the training data does not change over time without them knowing?

~~~
jdiez17
Generally, the FDA (or the government body responsible for certifying medical
devices) does not conduct code reviews in the sense of looking at the code and
trying to find bugs.

The way it works is: the manufacturer of a medical device assesses the harm
that can be caused by a software malfunction, and assigns it a safety
classification (class A, B or C). Class A is used when no injury is possible,
and class C is used when death or serious injury is possible (e.g. a surgical
robot). The manufacturer also provides a "failure modes and effects analysis"
document that looks at everything that could go wrong, the likelihood of each
failure happening, and its effect on the patient.

Based on the safety classification, IEC 62304 requires different levels of
rigour. For example, the standard only requires black-box testing for class B
software, whereas for class C software it requires white-box tests as well.

The manufacturer also needs to come up with a software development plan that
ensures that all of the requirements of the standard are met, and an
"argument" (supported by test reports, process documentation, source control
history, etc) that the software was developed according to the plan.

And that is what the FDA audits: they look at the development process of a
given feature and they check that the plan was followed. I think they rarely
delve into the details of the implementation and are generally just checking
that the safety arguments are sound and supported by evidence.

~~~
JumpCrisscross
> _class C software it requires whitebox tests as well_

What does this mean?

~~~
gowld
Test the internal components, not just the externally visible behavior.
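
A toy illustration of the difference (mine, not from IEC 62304; the pump code
is hypothetical):

    # Hypothetical dose limiter for an infusion pump
    class DosePump:
        MAX_ACC = 100                    # internal accumulator ceiling
        def __init__(self, max_safe):
            self.max_safe = max_safe
            self.acc = 0
            self.alarm = False
        def add(self, requested):
            dose = min(requested, self.max_safe)   # clamp to the safe limit
            if self.acc + dose > self.MAX_ACC:     # internal overflow guard
                self.alarm = True
                return 0
            self.acc += dose
            return dose
    
    # Black-box test: checks only externally visible behavior
    assert DosePump(max_safe=10).add(requested=50) == 10
    
    # White-box test: written with knowledge of the internals,
    # forcing a specific branch (the overflow guard) to execute
    pump = DosePump(max_safe=10)
    pump.acc = DosePump.MAX_ACC - 1
    pump.add(requested=5)
    assert pump.alarm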

------
laythea
I wonder if doctors will become like pilots? I.e., almost redundant, but
essential, at the same time.

~~~
bobowzki
I'm a doctor and very much pro ML/AI. I'd love to have an autopilot I could
watch in awe. Still a lot of practical tasks though that will be much harder
to automate. And for the first few generations of AIs I guess someone will have
to babysit them.

~~~
YeGoblynQueenne
Ah, but who's going to treat the next few generations of AI, when there are no
more doctors left who can "fly" without their "autopilot"?

~~~
kendallpark
I think a lot of commercial pilots would take offense to this. Commercial
pilots can still fly without autopilot. Just as much as you can drive on the
highway without cruise control.

[https://pilotjobs.atpflightschool.com/2016/07/27/how-much-
do...](https://pilotjobs.atpflightschool.com/2016/07/27/how-much-do-airline-
pilots-actually-fly-an-airliner-by-hand/)

------
kendallpark
For those interested in the research side of this, Google Brain actually
published a study in JAMA on the same topic. They did clinical trials in India
and should be publishing those results eventually.

[https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45732.pdf)

In terms of how well the "experts" perform: "For moderate or worse DR, the
majority decision of ophthalmologists had a sensitivity of 0.838 and
specificity of 0.981"

[http://www.aaojournal.org/article/S0161-6420(17)32698-2/abst...](http://www.aaojournal.org/article/S0161-6420\(17\)32698-2/abstract)

And here's a video that describes what's going on in plain English:

[https://www.youtube.com/watch?v=oOeZ7IgEN4o](https://www.youtube.com/watch?v=oOeZ7IgEN4o)

------
jpredham
There are several comments about the accuracy of the algorithm, but doctors
also struggle with diagnosis. In the Google deep learning study on DR (linked
below), they found that for about 20% of referable cases, doctors disagreed
on the diagnosis about 40-50% of the time. To combat this, they had at least
3 and up to 7 ophthalmologists grade each image.
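
(Majority adjudication is simple enough to sketch; this is an illustration,
not their actual pipeline:)

    from collections import Counter
    
    def adjudicate(grades):
        # e.g. ["referable", "referable", "no DR"] -> "referable"
        label, _count = Counter(grades).most_common(1)[0]
        return label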

Source:
[https://jamanetwork.com/journals/jama/fullarticle/2588763](https://jamanetwork.com/journals/jama/fullarticle/2588763)

------
jadedhacker
I wonder what factors made this decision possible? I love the idea of
automated diagnosis, but the performance rates in the article are 87% true
positive and 90% true negative. That seems a bit low.

Maybe people aren't getting diagnosed at very high rates? That would be a
reasonable justification for deployment with somewhat less than perfect
accuracy. Anyone have any insight?

~~~
maxerickson
The status quo is screening at the eye doctor. This enables screening at
primary care visits. People mostly go to primary care more often than the eye
doctor.

The medical risk is that people will forgo other screening for 12 months when
given a negative result. The cost of additional screening for false positives
is the other big downside (this is all the machine does: recommend a
specialist visit or a rescreen in a year).

~~~
gowld
> People mostly go to primary care more often than the eye doctor.

Really? Standard advice is 1/yr for both.

------
denzil_correa
> But of course, not having a specialist “looking over the shoulder,” as
> Abràmoff puts it, raises the question of who will be responsible when the
> diagnosis is wrong

Ultimately, accountability and transparency will be the Achilles heel.

~~~
monkeynotes
Ultimately it has to be the company providing the technology that is liable,
but I bet you they have a terms of use clause that puts the responsibility on
the clinics using the technology.

~~~
dgacmu
The more important question is what they're actually held to. In the case of a
diagnostic test like this, the most reasonable thing is to hold them to the
false negative and false positive rates that they say the product has. No
real-world diagnostic test is perfect. Instead, we design the systems around
the tests to accommodate the fact that they do have errors. As long as the
device manufacturer is correct about their probabilities of errors, then we
can incorporate them into a system that works better than what we had before.
Or we can conclude that we cannot use them in a useful way, and just ignore them.

------
TaylorAlexander
I feel like this has the same promise as self-driving cars: raising the floor
for the quality of service while also producing random unexplained failures
that its human supervisors fail to notice in time.

~~~
dboreham
Based on my experience watching doctors (good ones) trying to diagnose "weird
things", they are just manually executing an expert system algorithm anyway.
They aren't doing what an engineer might expect -- working from a basic
understanding of how the body works and looking for plausible explanations.
They're instead simply pattern matching against a database of facts.

~~~
nradov
A lot of diagnostic skill in complex cases is based on clinical intuition
developed over many years of practice. That's qualitatively different from an
expert system executing defined rules against facts.

~~~
dboreham
You'd think, but honestly in my experience it wasn't like that. An expert
system would have done as well, or better. Possibly my experiences were
colored by the cross-specialization nature of the issue -- specialists seem
reluctant to engage in any thinking outside their area, I found. Like a
software engineer would never consider that the problem they're investigating
is caused by memory bits flipping randomly.

------
PhillyPhuture
The company's board (politicos) and syndicate (there is none) are a bit weird
for an FDA approval... maybe there are some caveats or scope notes that I
have not seen.

------
ujal
"The 207,130 images collected were reduced to the 108,312 OCT images (from
4686 patients) and used for training the AI platform. Another subset of 633
patients not in the training set was collected based on a sample size
requirement of 583 patients to detect sensitivity and specificity at 0.05
marginal error and 95% confidence. The test images (n = 1000) were used to
evaluate model and human expert performance."

------
make3
Jeff Dean spoke about this topic in his keynote at the TensorFlow Dev
Conference
[https://youtu.be/kSa3UObNS6o?t=27m32s](https://youtu.be/kSa3UObNS6o?t=27m32s)

------
randomerr
I don't see the big deal. We've been using 'AI' in medical equipment for
nearly a decade. Look at da Vinci surgical devices. It's just that until
recently the technology wasn't called AI.

------
lucio
>AI software that helps doctors diagnose _diabetic retinopathy_ like
specialists is approved by FDA (theverge.com)

~~~
derefr
Well, sure, but the software's architecture probably isn't too particular to
this use-case; it's just computer vision. Given FDA approval for using CV for
_this_ , we'll probably quickly see many other companies attempting to drive
similar technologies to market.

Or, rather, we would if there _were_ any existing companies in this space that
could take advantage of this. The creator of this device, IDx
([https://www.eyediagnosis.net](https://www.eyediagnosis.net)), seems to be
rather unusual in being an entrepreneurial medical-device manufacturer; MDMs
are a hidebound lot.

Honestly, if there's going to be a wave of innovation in this space, I might
expect it to come from the inorganic-chem-focused pharma companies, since they
have the expertise in both materials science and machine-learning (from doing
novel small-molecule detection studies) required to come up with the
innovations. I expect they'd likely partner with one or more of the MDMs to
build the hardware, but they would write the software.

------
Herodotus38
I think the title is fine, but a lot of the comments are applying what the
software does to medicine in general.

What this software is used for is very specific, but also very useful in that
it addresses a common medical problem. It is used only to help diagnose
diabetic retinopathy (i.e. eye damage caused by diabetes).

This is AI vision software used to analyze a photograph of someone's retina
to detect damage. In essence it is much like the programs used to analyze
chest X-rays to detect pneumonia that have been published recently. Where
this is useful is that it can probably cut out a lot of human work in
diagnosing retinopathy; however, it is an incremental step. Even when I was a
resident in a primary care clinic years ago the process was somewhat
automated like this, with our medical assistants taking a photograph with a
special machine, and then this photograph would be digitally sent to a
specialist (I presume an ophthalmologist, but I could be wrong; maybe
optometrists can be licensed to do this) for interpretation.

What this isn't is diagnosing a patient based on a history, examination
findings, labs, etc. We are still quite a way from that, but I'm sure people
are working hard on it as well.

EDIT: In my opinion, where AI could really make a huge difference for my work
as a hospitalist (a doctor that admits and rounds on hospital patients) is in
voice recognition software, with eventual language processing to help me write
notes faster. First, give me a program like Dragon Dictate, but one I can use
in the patient's room (obviously one would have to figure out the HIPAA
compliance issues), that transcribes my voice and the patient's or family
member's into a readable text file that I can review when I write their note.

Next step would be that same program can give me its attempt to summarize our
interview into a reasonable note, which I can edit for accuracy. This would be
in effect an AI scribe. A scribe, for those who don't know the medical jargon,
is a hired person whose only job is to listen to a doctor interview a patient
and help write medical notes, they are usually young pre-medical students.
It's a relatively new position that was created as the burden of
documenting in electronic medical records limited the amount of time providers
could spend with patients. Very common in Emergency Medicine where high output
is needed, sometimes also in primary care.

The next step is that you have a company with all this protected medical
transcription data and eventual medical outcomes, and you use ML to tease out
which variables ended up being the most useful for accurate diagnosis. Before
that, you could have the program prompt the doctor with questions it thinks
would be helpful, etc. Again, there are huge medico-legal barriers to this,
but in my opinion there is a roadmap here to becoming a billionaire.

~~~
nradov
Eric Schmidt (former Google chairman) proposed building an AI scribe during
his HIMSS 2018 conference keynote address. He claimed it should be possible
within 10 years. I hope he's right, but I suspect he underestimates the
difficulty of building reliable clinical systems.

[http://www.himssconference.org/education/sessions/keynote-
sp...](http://www.himssconference.org/education/sessions/keynote-speakers)

~~~
Herodotus38
Thank you for the link, this is what I'm interested in.

------
dboreham
Hmm... I wonder how many patients actually diagnose themselves using the
Internet (or have relatives who do)?

Source: personal experience.

~~~
icebraining
Jerome K. Jerome identified the problem some people have with self-diagnosis -
and the Internet only makes it worse.

[http://three-men-excerpt.pen.io/](http://three-men-excerpt.pen.io/)

