
Artificial intelligence versus clinicians: systematic review - doener
https://www.bmj.com/content/368/bmj.m689
======
ska
This sounds about right to me intuitively. I've read a ton of ML/AI papers in
the area, and the vast majority don't get past "maybe this is an interesting
idea; hard to know until it's tested properly". It's easy to recall
interesting ones, hard to remember a good one off the top of my head.

This is fine; of course the standards for "is this publishable" are vastly
lower than "is this clinically applicable", but it's unfortunate how often the
lay science press fails to make this distinction.

The other thing is that raw performance numbers, even if properly validated,
are only a small part of the story. We've had ML techniques that achieved
better nominal sensitivity/specificity than average clinicians on very
specific tasks for 20+ years now, but workflow impact, liability, etc.
combined have sometimes limited implementation, and even if you do roll it out
the actual impact can be minimal.

~~~
davnn
Have a look at the Stanford Machine Learning Group projects [1]. What most
articles miss, in my opinion, is that AI progress heavily depends on the data
being available, and that's where there are huge opportunities in the future.

[1] [https://stanfordmlgroup.github.io/](https://stanfordmlgroup.github.io/)

~~~
ska
They have a good group and are thinking/heading in the right direction, but
mostly they're not there yet, I think.

I certainly don't disagree that there is potential for impactful use of AI in
some areas of healthcare. Some of them just aren't going to practically happen
without a) fundamentally reworking our approaches to data access and sharing
and more importantly b) spending a lot of resources on quality labeling.

Regardless of whether or not those preconditions are actually met, we'll get a
lot of breathless claims on small retrospective sets, of course :)

------
Rochus
Very interesting paper, thanks. It was about time someone looked into it. I
can't stand these equally populist and unsupported reports anymore claiming
that some neural network is doing better than a medical specialist with
decades of experience. It may well be that this applies in individual cases.
But obviously there is a lack of proper studies and unbiased evidence.

~~~
davnn
> It may well be that this applies in individual cases.

Well, from a machine learning perspective it will probably apply in all cases
where enough high-quality data is available and not a lot of contextual
knowledge is necessary that cannot be represented in the data. My guess is
that this will hold in the vast majority of cases and it's just a matter of
time, but I'm definitely biased as an ML researcher :).

~~~
Rochus
It's still "proof by induction": there is no reason to conclude from
individual cases that it works in all cases.

------
unsrsly
A few thoughts below as a medical student with some ML background. I'm quite
interested in hearing what others think:

1. The majority of clinical ML applications are still focused on narrow (e.g.
binary) image recognition tasks. This makes sense because it's where CNNs
excel. Also, many of these tasks are well reimbursed. I expect the performance
to continue improving as datasets and annotations improve.

2. There is less activity on broad image recognition tasks, like reading a CT
Abdomen and Pelvis. Here the domain of possible diagnoses is rather large. For
example, patients sometimes have unusual anatomy from prior surgeries. Or a
cancer may obliterate common anatomical landmarks. An experienced radiologist
or surgeon can figure out this anatomy in a few minutes, whereas ML may need
new approaches to get to this level.

3. The history and physical exam are clinician-generated, not just for dataset
annotations but for every new patient. This is notable because the history and
physical exam determine the majority of what a patient needs. It is not only a
matter of knowing which questions to ask; it also requires follow-up questions
and a degree of mental processing before findings are written down in a note.
A lot of patients will not fully cooperate without patience and empathy. So
"AI" replacing doctors actually trends perilously close to AGI. This is both
interesting and difficult. Of course, NLP of human-generated notes /
unstructured data is a very interesting area, but it still requires a human in
the loop.

~~~
abbirdboy
Another medical student here, also with a very similar ML/programming
background. Like the other commenter, I couldn't agree more with your analysis
of the main issues plaguing current DL applications in medicine. When I was
working on deep learning projects as an undergrad before coming to medical
school, I naively assumed that solving a simple image classification problem
for cancer detection would be enough. However, as one learns in medical
school, imaging is only one component of an entire clinical vignette. Even
before a patient undergoes a specific test, the history and physical exam
really drive the initial protocols. Sometimes, without having this background
knowledge, the classification from a simple program has no utility. A
radiologist or pathologist typically has access to this information and can
interpret this in the context of the clinical suspicion being put forth. I
still believe AI has a role to play in easing this workflow, but "replacing"
physicians will take a lot more than detecting some disease "better" or more
"accurately" than a physician.

------
AlanYx
This paper references two tools for assessing the risk of bias: the Cochrane
risk of bias tool and PROBAST. Does anyone have experience with these tools?
Are they mainly practical/useful in the medical field, or are they also useful
in other domains applying AI models?

~~~
5440
The most common model that we are putting in front of the FDA several times
per week, for many companies, is the AUROC model with minimum FDA requirements
for specificity and sensitivity. I have some concern that the commenters on
this article aren't actually aware of how many of these ML-based software
packages are actually being cleared by the FDA at the moment through the De
Novo and 510(k) processes.
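For readers unfamiliar with that kind of evaluation, here is a minimal sketch of reporting AUROC alongside sensitivity/specificity floors. The 0.85/0.80 floors and the toy data are invented for illustration; they are not actual FDA requirements.

```python
# Illustrative sketch: compute AUROC, sensitivity, and specificity for a
# binary classifier and check them against hypothetical minimum targets.
# The 0.85/0.80 floors below are made up, not actual regulatory thresholds.

def auroc(labels, scores):
    """Rank-based AUROC: probability a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a fixed decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

labels = [1, 1, 1, 0, 0, 0, 1, 0]          # toy ground truth
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]  # toy model scores
sens, spec = sens_spec(labels, scores)
print(f"AUROC={auroc(labels, scores):.2f} sens={sens:.2f} spec={spec:.2f}")
print("meets floors:", sens >= 0.85 and spec >= 0.80)
```

The point of the two separate checks is that a strong AUROC does not by itself guarantee the operating point clears fixed sensitivity/specificity minimums.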

~~~
ska
There are a lot more systems going through 510(k) for sure, partially due to
new developments, but partially driven really by marketing.

On the other hand there have been ML/AI systems since the mid-late 90s at
least, and they haven't for the most part shifted clinical practice much. The
reasons for this have not shifted radically in the last decade, although some
are moving a bit now.

FDA approval to market doesn't mean anything beyond that you have an argument
that you haven't introduced new risks (safety) and, in at least some cases,
that you can improve something without making other things worse (efficacy),
and this may be for a very narrow indication.

I guess my point is, the purpose of the FDA has never been to ask "does this
really work better? should it be the new standard of care?", but rather to
balance the risk of new technology or drugs against the potential reward. Even
with a PMA it's risk reduction, not proof of how it will play out in the wider
world of interactions and practice.

------
ptrenko
Haven't had the time to read the paper but spoke a lot about something similar
with a friend.

In terms of a crisis, the medical system just doesn't scale. You think leaders
were worried about running out of ventilators?

No! You can start pumping them out in the hundreds of thousands if you really
need to.

The problem is that you have only so many doctors and trained nurses to work
with. Limited capacity. Training takes 10 years.

No scalability.

It doesn't matter where the tech is right now. The world will push 100% for
whatever can be done via apps, sensors and unskilled workers.

Check out Jonathan Rothberg's vision for a diagnostic kit by everyone's
toothbrush.

------
bawana
The sad truth is that most patients cannot communicate their symptoms clearly,
and a computer will fail them. More importantly, the nonverbal communication,
the compassion, and the reassurance of personal interaction are what drive the
demand for personal service. Just look at the number of negative reviews on
Healthgrades. Many people are so unhappy with the way they were treated, even
though the actual care they receive in this country is good: access to CT
scans, antibiotics, home nursing, hospital service, etc.

~~~
zone411
If you make a comparison with the same amount of time for both, that's true.
But software can ask questions for an hour. A doctor won't.

------
motohagiography
Important study to do, but I feel I have to assert that, in principle, ML is
not as useful for diagnosis as it is for aggregating longitudinal data,
because the consequences of a false positive/negative in an individual
diagnosis are life-threatening, and even if it catches something doctors
don't, it just means a doctor has to look at it anyway.

However, across a population, nobody is being harmed by the ML analysis, and
it can pick out patterns a person could not, at speed and at scale. Spending
research effort to get your individual diagnostic ROC curve a few percent
higher is a waste of time if the consequences of it being wrong are greater
than the time savings it provides.

It's fine for veterinary medicine, and maybe prisoners and keeping people with
hidden symptoms out of crowded places, but for regular people, using ML has a
host of ethical problems that aren't being dealt with at the policy level. You
can use rules engines (literally, Drools) instead of non-deterministic ML
schemes to do diagnoses and prescriptions.
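Drools itself is a Java rule engine; as a toy Python illustration of the deterministic-rules idea, every rule and threshold below is invented for the example and is not medical guidance.

```python
# Toy illustration of a deterministic rules engine for triage.
# Rules and thresholds are invented; a real system (e.g. Drools) would
# encode clinically validated guidelines, not these made-up conditions.

RULES = [
    # (name, predicate over patient facts, recommendation)
    ("high fever", lambda p: p["temp_c"] >= 39.5, "urgent review"),
    ("tachycardia", lambda p: p["heart_rate"] > 120, "urgent review"),
    ("mild fever", lambda p: 38.0 <= p["temp_c"] < 39.5, "routine follow-up"),
]

def evaluate(patient):
    """Fire every rule whose condition matches: deterministic and auditable."""
    fired = [(name, action) for name, cond, action in RULES if cond(patient)]
    return fired or [("no rule matched", "refer to clinician")]

print(evaluate({"temp_c": 39.7, "heart_rate": 130}))
```

Because the same inputs always fire the same rules, every recommendation can be traced back to a named condition, which is exactly the auditability that non-deterministic ML schemes lack.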

I've said before: ML is only useful for problems with asymmetric upside, and
it is worse than bad for ones with asymmetric downside.

~~~
tastroder
> It's fine for veterinary medicine, and maybe prisoners and keeping people
> with hidden symptoms out of crowded places, but for regular people,

Excuse me but did you just compare the application on prisoners and animals
versus whatever "regular people" are supposed to be? What the hell?

~~~
motohagiography
Countries treat prisoners as less than human everywhere in the world. It's
awful, but it's real, and substandard health care from machine learning
applications is precisely the kind of thing states everywhere would likely
use. Most people are not so easily shocked by this.

