
AI Models Predict Breast Cancer with Radiologist-Level Accuracy - tiagobraw
https://www.ibm.com/blogs/research/2019/06/ai-models-radiologist-level-accuracy/
======
btilly
This is not exactly new. I remember seeing models that did really well many
years ago, and which again caught many tumors that humans had missed.

The problem is that they fail differently than humans do, in a way that humans
wind up not trusting the results.

It turns out that there are parts of the breast that are easy to spot tumors
in, and parts that are hard. A human scans quickly over the easy areas, and
focuses on the hard. The result is that humans make careless errors on the
easy areas, and catch hard tumors. Computers make no careless errors, but
can't catch the hard ones. Thus when a human sees what the computer caught
that the human did not, the mistake is easily dismissed. But when the human
sees the ones that the computer missed, it becomes, "It doesn't know how to do
the real work."

Ideally the two would be used together for better results than either alone.
But humans wind up resenting the computer...

~~~
twanvl
> The problem is that they fail differently than humans do

That is a great argument for giving such a model as an aid to a human doctor.
Together they will be better than either one alone.

~~~
onlyrealcuzzo
In Thinking, Fast and Slow, the author details a double-blind trial where
they did exactly this. Humans plus AI performed worse than the AI alone.
Humans think they can use the AI as a guide and nudge it in the right
direction, but the adjustments they made were, on average, bad.

~~~
notahacker
Surely in this type of instance (looking at a scan to answer a yes/no
question) the human and AI act independently, with the computer being a useful
aid because it separately picks up a few of the human's false negatives.
Assuming false negatives are a lot worse than false positives, this can only
be a good thing.
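
A back-of-the-envelope sketch of that reasoning (the miss rates below are
made up, and in practice the two readers' errors won't be fully independent):

    # Hypothetical, illustrative numbers only.
    human_miss_rate = 0.10   # fraction of tumors the human reader misses
    model_miss_rate = 0.15   # fraction of tumors the model misses

    # If the two err independently, a tumor slips through only when BOTH miss it.
    combined_miss_rate = human_miss_rate * model_miss_rate
    print(f"combined false-negative rate: {combined_miss_rate:.3f}")  # 0.015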

~~~
chongli
If they lead to an unnecessary mastectomy then false positives are pretty bad.
Not as bad as dying, obviously, but still a severe blow to a woman's identity
and sense of self-worth.

It's going to be a hard pill to swallow if you have to tell a woman "sorry, we
removed your healthy breast because the computer made a mistake."

~~~
glangdale
I think the idea of "screening" is that you don't just race off to a
mastectomy the minute some AI model goes off. Of course, putting more false
positives through a fallible process of review does run the risk you speak of.

~~~
AstralStorm
It does cause unnecessary biopsies for sure. And some stress on the patients.

------
rayuela
Anything related to AI coming out of IBM should be viewed with a huge dose of
skepticism. They're honestly one of the worst offenders in overselling the
capabilities of their products, bordering on outright fraud. There is
certainly a lot of promise to the application of recent computer vision algos
on medical imaging data, but I wouldn't bet much on IBM being anything close
to a leader in this space.

~~~
Myrmornis
I don't doubt what you say. Just want to point out that this is published in a
peer-reviewed journal, so hopefully the academic community will judge it
objectively.

[https://pubs.rsna.org/doi/10.1148/radiol.2019182622](https://pubs.rsna.org/doi/10.1148/radiol.2019182622)

~~~
dragandj
OTOH, can even radiologists (or anyone else) _predict_ cancer at all before
it happens? I thought that radiologists _diagnose_ cancer once it is already
there.

~~~
thaumasiotes
> can even radiologists (or anyone else) _predict_ cancer at all before it
> happens?

Sure, some of the time it's easy.

Let's all recall the words of my mother's medical school instructor, "There's
a bit of cancer in everyone's prostate".

(The context was a lab exercise in which medical students were supposed to
find which of a set of slides of prostate tissue was cancerous. The reminder
was necessary because many of the slides were cancerous, just not at levels
high enough to be considered medically alarming.)

Predicting that a man will develop prostate cancer is basically the same thing
as predicting that he'll experience old age.

------
baybal2
I was lucky to date a girl who was into math, and who was coding those
"machine learning" algorithms for a radiology startup here in Shenzhen.

She had a lot of scepticism about what she did. One of the biggest
showstoppers, she said, was the unpredictability of errors.

An algo can catch 99% of tumors, including tiny ones, but can randomly pass
over very obvious ones that a human radiologist would spot with his eyes
closed.

They had a demo day with radiologists throwing tricky edge-case X-rays at the
computer. The edge cases were all fine, but one radiologist pulled his own
X-ray from his bag, with a 100% obvious, terminal-stage tumor, and to the
company's embarrassment, the algo failed to detect it no matter how they
twisted and scaled the X-ray. The guy then just walked out.

~~~
WalterBright
It seems to me the use case should be to have the radiologist look at a scan
for tumors. Then, the algo should look. If they disagree, then the radiologist
should look at the difference.

It'll be the best of both.

And in the scans where the algo is wrong, add the scan to the algo's
machine-learning training data.
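
A minimal sketch of that loop, with hypothetical names and a boolean
per-scan finding:

    from dataclasses import dataclass

    @dataclass
    class Reading:
        scan_id: str
        tumor_found: bool

    def triage(human: Reading, model: Reading) -> str:
        """Route a scan based on whether the radiologist and the algo agree."""
        if human.tumor_found == model.tumor_found:
            return "agree"        # nothing extra to do
        return "second look"      # radiologist re-reads the disagreement

    # The algo flags a tumor the radiologist did not report:
    print(triage(Reading("scan-001", False), Reading("scan-001", True)))
    # If the second look shows the algo was wrong, the scan would then be
    # added back into its training data, as suggested above.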

~~~
ttlei
If the radiologist has to look at and double-check every scan the algo looked
at, then what is the point of the algo? It seems like a useless middleman that
gets in the way.

~~~
telchar
Better to implement a system with a high rate of false positives (and, more
importantly, a low rate of false negatives) from the machine learning
component, with all positive findings passed on to the radiologist. If the
system can reliably (big if) filter out 98% of the chaff, then the radiologist
can spend a lot more time separating the false positives from the true
positives. This approach has worked well for me so far.
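
A minimal sketch of picking such an operating point, assuming scikit-learn is
available (the labels and scores here are toy data):

    import numpy as np
    from sklearn.metrics import roc_curve

    y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])            # ground truth
    y_score = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.05,
                        0.4, 0.9, 0.15, 0.25])                    # model scores

    fpr, tpr, thresholds = roc_curve(y_true, y_score)

    # Choose the highest threshold that still reaches ~99% sensitivity,
    # i.e. a very low false-negative rate; everything scoring below it is
    # "chaff" the radiologist never has to look at.
    ok = tpr >= 0.99
    threshold = thresholds[ok][0]          # thresholds are sorted descending
    print(f"operating threshold: {threshold}, FPR there: {fpr[ok][0]:.2f}")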

~~~
ska
This approach is problematic in medical screening applications, mainly because
you don't want to increase the work-up rate for false positives: if work-ups
involve biopsy and the screening population is large, eventually you will
(indirectly) kill people this way, so there is pressure to control the FP rate.

------
dontreact
The reason I'm skeptical of this is that there is no actual comparison to
human-level performance. I.e., they didn't have radiologists actually read the
images to compare against the model. Notice that the title of the paper is
"Predicting Breast Cancer by Applying Deep Learning to Linked Health Records
and Mammograms"; it's only in the press release that they seem to imply a
comparison to radiologists was actually done.

~~~
thatcantbeit
[https://pubs.rsna.org/doi/full/10.1148/radiol.2016161174](https://pubs.rsna.org/doi/full/10.1148/radiol.2016161174)

This is their comparison point for actual radiologists: citation number 6. It
doesn't look comparable, though. Radiologists are around 90% specificity and
sensitivity, which differs a good amount from the model's 77.3% and 87%,
respectively.

~~~
dontreact
This is not on the same dataset though (right?), so not really a solid
comparison point. Plus, like you mentioned, they seem to be doing worse than
this benchmark.

------
michaelhoffman
No positive predictive value reported, imbalanced test data, IBM. Garbage.

------
nkurz
What makes this an "AI Model" instead of just a "Model"? That is, in what way
does it have "artificial intelligence"?

~~~
TuringNYC
"Model" \---> [[marketing department]] --> "AI Model"

~~~
avgDev
[Student in school] Implemented MinMax algorithm in checkers ---> [student
looking for work] Implemented state of the art AI algorithm, which
successfully will beat the human opponent EVERY time. ---> [HR/Marketing dept
at some corp] Wow you are HIRED!!!!!!!! ---> [Lead dev] Oof this guy can't
program for shit.

~~~
TeMPOraL
---> [student about to get fired from work] Why on Earth did they put "AI
expertise" in the job requirements if all they want me to do is to shovel CSS
and JS, and the closest thing to AI they have in the office is a 1960s
thermostat?

------
professorTuring
The real problem here is when society will allow a machine to diagnose them,
and whether society is ready to accept that most diagnoses are made
probabilistically.

To date we allow humans to be at a 70% error level without problems, but we
ask machines to be 100% effective.

The very same happens with autopilot: the big numbers say it drives better
than humans, but...

~~~
petschge
I remember seeing a statistical analysis here on HN that said the numbers for
Tesla autopilots are neither great compared to drivers of Teslas, nor do they
seem to be fair. (They found a case where a human driver had a crash in what
would have been counted as 0 miles in the analysis, indicating that something
is inflating the "crashes per mile" metric.)

~~~
AstralStorm
Using autopilot in parking?

------
blueyes
Old news from a major source of AI hype.

Here are some previous results:
[https://med.stanford.edu/news/all-news/2018/11/ai-outperformed-radiologists-in-screening-x-rays-for-certain-diseases.html](https://med.stanford.edu/news/all-news/2018/11/ai-outperformed-radiologists-in-screening-x-rays-for-certain-diseases.html)

------
Myrmornis
@moderators: would it make sense to change the link to the journal article
rather than IBM's article? It's free access.

[https://pubs.rsna.org/doi/10.1148/radiol.2019182622](https://pubs.rsna.org/doi/10.1148/radiol.2019182622)

------
NikolaeVarius
Is this new AI Model under Watson?

~~~
layoutIfNeeded
There’s no such thing as “Watson”. IBM have put the Watson name on basically
everything, to the point where its information content was reduced to zero
bits.

Watson for IBM is like the i-prefix for Apple.

------
mtgx
What's the False Positive Rate?

~~~
ska
FP = system (or person) flags this as true when it is not

TP = ... flags as true and it is

FN = ... flags as false but it is true

TN = ... flags as false and it is false

To turn these into rates, you normalize them.

e.g. TPR = TP/P = TP/(TP + FN) = 1 - FNR

etc.

These are characteristics of a classification system

You will also hear sensitivity (TPR) vs. specificity (TNR) often, particularly
in medical contexts. In other contexts you'll hear Type I (FP) vs. Type II
(FN) error.

In most cases your algorithm has a set of trade-offs, and you will need to
pick a balance between sensitivity and specificity.

cf. ROC:
[https://en.wikipedia.org/wiki/Receiver_operating_characteris...](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
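
For concreteness, a small sketch that turns raw confusion-matrix counts into
those rates (the counts below are made up):

    def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
        """Normalize raw confusion-matrix counts into the rates above."""
        return {
            "TPR (sensitivity)": tp / (tp + fn),   # = 1 - FNR
            "FNR":               fn / (tp + fn),
            "TNR (specificity)": tn / (tn + fp),   # = 1 - FPR
            "FPR":               fp / (tn + fp),
        }

    # e.g. a hypothetical screen of 1000 cases, 100 of them positive
    print(rates(tp=87, fn=13, fp=180, tn=720))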

~~~
adyavanapalli
I think the OP is asking about the _value_ of the FPR instead of the
definition.

~~~
RosanaAnaDana
The false positive rate tells you how often the system raises a false alarm.
Depending on the classification exercise, you may be willing to tolerate false
positives, because the consequence of a missed call is significantly greater
than an unwarranted checkup from a human doctor.

~~~
shawnz
The numeric value

------
mlcrypto
Doctors will be some of the first to be replaced by AI. My physicians already
walk around with a computer, checking all the boxes for symptoms and seeing
what it says. I wish I could find one with a true intuition for medicine.

~~~
caraffle
Apparently you're not familiar with the documentation burden in the medical
field. EHRs don't diagnose for you.

There is no "true" intuition in medicine, just years of study and practice
leading to quick recognition of common problems like any other field.

