
Computers do better than pathologists in predicting lung cancer type, severity - rch
http://med.stanford.edu/news/all-news/2016/08/computers-trounce-pathologists-in-predicting-lung-cancer-severity.html
======
Strilanc
Yet another press release that fails to reference the study it's talking
about. Not to mention giving no quantitative details about its lead
('Trounced'? By how much?).

The study is "Predicting non-small cell lung cancer prognosis by fully
automated microscopic pathology image features" by Kun-Hsing Yu et al. It's
open access:
[http://www.nature.com/ncomms/2016/160816/ncomms12474/full/nc...](http://www.nature.com/ncomms/2016/160816/ncomms12474/full/ncomms12474.html)

~~~
daveguy
Computers trounce* pathologists...

* by an unquantified amount, using only histology slides -- a limitation not imposed in actual medical practice.

~~~
cicero19
Yeah... seriously... what academic journal would accept that kind of statement?

~~~
ChristianBundy
Nature, apparently.

~~~
leereeves
Nature Communications, which publishes a dozen articles every working day.

(2191 so far in 2016)

------
aschampion
I worked on a similar pipeline for renal cell carcinoma a few years ago [1],
although we only published a small subset of the results, since for parts of
the pipeline (e.g., finding representative tiles, survival prediction) better
results were being produced elsewhere in the lab.

Regarding the hook in the headline -- computers surpassing pathologists --
it's a bit like automated driving: even if it's true, the immediate problem is
the social and economic system. That is, we're not going to remove the
pathologist from the diagnostic and prognostic process anytime soon, for many
reasons, so how do we instead leverage machine learning in concert with the
human observer to improve the diagnostic system? For that reason, decision
introspection may be as valuable a topic of research as improving
classification accuracy: justifying a particular automated classification to
the pathologist, directing them to representative regions, and describing
regions of feature space in biological terms.

[1]
[http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6945104](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6945104)

~~~
_0ffh
This! As far as I can remember, the experience in chess is that a
human+computer team beats either of them working alone (sorry, can't find a
reference just now).

~~~
daveguy
I would love to believe this is true, unfortunately I do not think it is.

Computer chess algorithms completely blow away human competitors. The
strongest human, Magnus Carlsen, has an Elo rating of 2857 [0].

Stockfish, the strongest chess algorithm (open source, btw), has an Elo rating
of 3445 [1].

Computer chess algorithms are so much stronger than humans that if the human
second-guesses the algorithm, the human is probably wrong.

You may have been thinking of this BBC article [2], in which amateur cyborg
players beat grandmaster cyborg players -- the amateurs were crunching
additional metadata about which situations were best for their play. However,
they didn't beat Stockfish, they beat other cyborg players.

[0]
[https://ratings.fide.com/card.phtml?event=1503014](https://ratings.fide.com/card.phtml?event=1503014)

[1]
[http://www.computerchess.org.uk/ccrl/404/](http://www.computerchess.org.uk/ccrl/404/)

[2] [http://www.bbc.com/future/story/20151201-the-cyborg-chess-
pl...](http://www.bbc.com/future/story/20151201-the-cyborg-chess-players-that-
cant-be-beaten)

~~~
billforsternz
The top humans and top computers never play each other anymore, so the rating
pools are independent and not comparable. I suppose Stockfish would grind
Magnus down in a serious match, but only because the human player is
susceptible to fatigue. The teams that tune the engines for engine-vs-engine
matches would never dream of letting them make up their own moves in the
opening phase; instead they use human openings, an acknowledgement that humans
understand the game better.

~~~
billforsternz
pbhjpbhj's comment doesn't have a reply link for some reason, so I will reply
here. Left to their own devices, computers will play crude, simple developing
moves in the opening. That's because there are no tactics until the pieces
come into contact; until then it's all strategy, where humans still have an
advantage. Decades of experience have resulted in a corpus of classic, subtle
opening strategies that the computers don't rediscover. For example, in some
King's Indian positions Black has precisely one good plan: a good human player
will initiate it with the move ...f5 automatically, while an engine will
juggle ...f5 with a bunch of other slow moves (all of which are irrelevant)
and might not play it. If you look through the pages of "New In Chess"
magazine you will see the top masters analysing their games and saying things
like: "The computer does not understand this type of position - it rejects
$some-good-human-move, but eventually, when you lead it down the right path,
it changes its mind."

All of this is not to say that computers have not overtaken humans in chess.
They have. But that is primarily because they have superb qualifications as
practical players - they never make crude errors, and all humans do on
occasion. This trumps all other considerations. But the vast gap you see when
comparing the human lists and the computer lists is exaggerated - 500 Elo
points means a 95% expectation. I am 100% convinced that if Magnus could be
properly motivated (think big $$$ and somehow convincing him that scoring a
decent proportion of draws as White would be a "win") he could deliver for
humanity :-)
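
To put a number on that Elo claim: the standard expectation formula is E = 1 / (1 + 10^(diff/400)). A quick Python sketch, using the ratings cited upthread:

    # Expected score for the lower-rated side at a given rating deficit.
    def expected_score(deficit):
        return 1.0 / (1.0 + 10.0 ** (deficit / 400.0))

    print(expected_score(500))          # ~0.053, i.e. ~95% expectation for the stronger side
    print(expected_score(3445 - 2857))  # ~0.033 for the Stockfish-Carlsen gap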

~~~
Lordarminius
> I am 100% convinced that if Magnus could be properly motivated (think big
> $$$ and somehow convincing him that scoring a decent proportion of draws as
> White would be a "win") he could deliver for humanity :-)

You are 100% wrong. Computers overtook humans at chess in 1995. Unless there
is some insight Magnus Carlsen or any other GM has which they are not sharing,
it will remain that way.

~~~
billforsternz
If you think there is no way Magnus Carlsen could steer a few games to draws,
you really don't know much about chess.

~~~
Lordarminius
I am a 2200 Elo rated player. There would be more to the match than Magnus
steering "a few games to draws". I suppose it would also depend on how many
games are agreed for the match. For a suitably large number of games, the
computer could simply force many dull draws, then lash out with a strong
tactical game. Humans have the extra dimensions of emotion and fatigue to deal
with. Switching from positional play to long tactical sequences does not play
to human strengths; in addition, getting a draw from a "mission-programmed"
computer may not be trivial, as it does not need to choose the most direct
route to the draw.

Another factor is that it is trivial to change the computer's repertoire of
openings, and there is a wide choice of these. Humans, including Magnus,
require weeks to months of preparation before they are ready to play new
openings or deviate from prepared lines.

Finally (and I freely admit that this is my own personal opinion), Magnus
Carlsen may not be our best choice of human to play against a computer. There
is an unmistakable emotional fragility to him which manifests when he is
losing (cf. his games with Anand); a good deal of his strength lies in the
early middlegame and the endgame, but computers are superior in the latter;
and he often wins games out of sheer stamina - a strategy that won't work
against the silicon beast.

~~~
billforsternz
The original point I was making was simply that the >500-point Elo delta is an
exaggeration. So I only need Magnus to steer a few games to draws to be
correct.

------
clumsysmurf
My friend worked in a pathology lab run by a recognized world expert in the
field (wrote college texts, gave lectures, etc.), yet the performance was
'mediocre' due to recurring mistakes made by humans:

* Doctors disagreed with each other frequently. Most senior won.

* Results from one patient were given to another patient

* ... or results were lost entirely

* Lab destroyed samples before examination or stained them incorrectly

* Samples never received -- lost in the mail or never picked up from the airport

These folks did have a tough job, looking into microscopes for many hours per
day. At some point I imagine things just start looking the same as fatigue
sets in.

~~~
huherto
Maybe the next medical breakthrough is really going to be TODO lists.

Which reminds me of Lotus Notes. 20 years ago it was widely used to automate
workflows, but I haven't seen anything similar widely used since.

~~~
ddeck
It already is:

From "The effects of safety checklists in medicine: a systematic review" [1]:

 _Safety checklists appear to be effective tools for improving patient safety
in various clinical settings by strengthening compliance with guidelines,
improving human factors, reducing the incidence of adverse events, and
decreasing mortality and morbidity. None of the included studies reported
negative effects on safety._

[1]
[http://www.ncbi.nlm.nih.gov/pubmed/24116973](http://www.ncbi.nlm.nih.gov/pubmed/24116973)

~~~
derefr
To me, this is a generalization of the benefit of software/automation: what
we're really discovering is that when we _force people to think through all
the edge-cases in a process_ in order to explain that process to something as
dumb as a computer—or a checklist—we dig up a lot of implicit institutional
knowledge that, crucially, _not everyone has_. (Everybody that had it just
assumed everyone else knew!)

The result of such a process can be a checklist, or a workflow diagram, or a
program. The form of it isn't really important, so much as the fact that the
business logic being applied (by a human or a computer) to each case now
embeds all the previously scattered knowledge of the domain _explicitly_,
making each node applying it behave as the _union_ of its institution's
knowledge-bases, rather than as the _intersection_ of those knowledge-bases.
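
A toy Python illustration of that union-vs-intersection point (the staff and the steps are, of course, made up):

    # Three staff members each hold partial, overlapping process knowledge.
    alice = {"verify patient ID", "log sample receipt", "check stain batch"}
    bob = {"verify patient ID", "confirm courier pickup"}
    carol = {"verify patient ID", "log sample receipt", "escalate disagreements"}

    # Ad hoc practice: a case is only guaranteed the steps everyone knows.
    ad_hoc = alice & bob & carol  # just {'verify patient ID'}

    # A checklist (or program) captures every step anyone knew, so each
    # case gets the union of the institution's knowledge instead.
    checklist = alice | bob | carol  # all five steps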

~~~
tormeh
Also, a checklist doesn't get tired and forget a step.

------
muxxa
I wonder how the pathologists would fare if they were also put through the
same process as the ANN, i.e. given training data along with immediate
feedback on whether their prognosis was right or wrong, then tested on the
held-out data. Pathologists give prognoses daily but only get feedback, if at
all, many years later.

~~~
50CNT
I don't know much about the medical field, so excuse my ignorance, but
couldn't we train pathologists on the same data we train ANNs on? That way we
could establish similarly fast feedback loops.

------
yread
It's nice that the article is open access and even includes the R code used to
perform the analysis.

[1]
[http://www.nature.com/ncomms/2016/160816/ncomms12474/full/nc...](http://www.nature.com/ncomms/2016/160816/ncomms12474/full/ncomms12474.html)

[2] [http://www.nature.com/article-
assets/npg/ncomms/2016/160816/...](http://www.nature.com/article-
assets/npg/ncomms/2016/160816/ncomms12474/extref/ncomms12474-s5.txt)

------
Fede_V
I am surprised they only used ~2000 samples. That's a very tiny amount of data
to train a decently sized CNN, even with data augmentation.

I presume a pretrained net won't help much, since the features found in
pathology slides are so unlike those of natural images.

~~~
shoyer
Somewhat surprisingly, the paper doesn't actually use deep learning.

~~~
dandermotj
Everyone is missing this. They used simple random forests and survival models
with basic k-fold cross-validation. There's no breakthrough in ML here!
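
For the curious, that kind of model is simple enough to sketch in a few lines of Python with scikit-learn (an illustration only, not the authors' actual R pipeline; X and y below are random placeholders for their image features and labels):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Placeholders standing in for per-slide quantitative image features
    # and tumor labels (the paper used ~10,000 features per slide).
    rng = np.random.RandomState(0)
    X = rng.normal(size=(500, 1000))
    y = rng.randint(0, 2, size=500)

    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(forest, X, y, cv=10)  # 10-fold cross-validation
    print(scores.mean())  # ~0.5 on random data, as expected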

------
charlesism
The world is full of constant hype about "breakthrough" studies and cures for
diseases that are "coming soon". Somehow the cures seldom materialize, and too
often, when anyone bothers to check, the studies can't be replicated.

I wonder if the secret sauce that makes science effective isn't so much the
scientific method as the spirit of skepticism behind it. Without that, it's
easy to make errors. And wishful thinking is pervasive because there are
strong incentives: career advancement, corporate influence, ego, and so on.

My view, assuming AI doesn't kill us, is that it's going to save all of our
asses (from ourselves).

~~~
bbctol
While I have serious reservations, I do appreciate philosopher Paul
Feyerabend's idea of science as "epistemological anarchism." In the end, all
rule-based concepts of how to accumulate knowledge and understanding fall
victim to methodological problems: the only way to really do science is a no-
holds-barred, anything-goes attitude of a mind against the universe.

------
WhitneyLand
Why isn't this already in practice, given ML advances in recent years?

Why are mammograms still being subjectively interpreted?

Are there not already companies today where you can upload medical imaging
results and get back results that beat humans?

~~~
mgreg
>Why isn't this already in practice, given ML advances in recent years?

Well, it would be nice to first validate the results somewhere else (peer
review should actually be completed), and no doubt some real-world trials
should proceed and their results be evaluated. This is what any FDA approval,
at a minimum, would entail. Keep in mind what the authors themselves note
about how this study may differ from the "real world":

"One limitation of this study is that cases submitted for TCGA and TMA
databases might be biased in terms of having mostly images in which the
morphological patterns of disease are definitive, which could be different
from what pathologists encounter at their day-to-day practice."

Putting this into practice also requires looking at the bigger picture; not
just the accuracy of the diagnosis vs. humans.

For example, I have found in studies that doctors are much more conservative
in their diagnoses than an academic algorithm that bears no consequences for
patient outcomes. If the question is a) "cancer" or b) "not cancer", the
doctor, fearing patient harm, malpractice suits, and career derailment, will
be biased toward "a) cancer", because the perceived costs of a false positive
(treating a nonexistent cancer, which may have significant ill effects) are
lower than those of a false negative (not treating a real cancer, potentially
leading to death). This bias will reduce the human doctor's measured accuracy
on diagnosis.
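
That bias is rational in decision-theoretic terms: with asymmetric costs, the optimal threshold on the suspicion probability p shifts well below 50%. A small Python sketch (the costs are illustrative, not from the study):

    # Call "cancer" when the expected cost of a miss exceeds that of a
    # false alarm: p * cost_fn > (1 - p) * cost_fp, i.e.
    # p > cost_fp / (cost_fp + cost_fn).
    def call_cancer(p, cost_fp=1.0, cost_fn=10.0):
        return p > cost_fp / (cost_fp + cost_fn)

    # If a miss is judged 10x worse than a false alarm, "cancer" is the
    # right call at only ~9% suspicion -- conservative by design.
    print(call_cancer(0.10))  # True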

What the "right" bias is in an individual case can bring in many more factors
and moral questions out of scope for this study.

This is not to argue that tools like these should not be developed and
utilized but practical application is often more difficult than just solving
the technology challenge. I'm certain that we'll see many advances in
healthcare due to ML.

------
danielrm26
Human error comes from a lot more than just pure assessment of whether
something is cancer or not.

Fatigue, willpower drain, emotion, and many other factors can make a top-tier
human expert into a mediocre one (or worse) in a way that is very hard to
detect.

Even if computers aren't yet matching the ideal human (which they will
inevitably surpass, of course), they're still massively valuable in that they
can maintain their level of expertise when humans cannot.

------
njharman
Um, computers do better than humans in the vast majority of applications for
which they are programmed.

By which I mean: the news is not that computer > human, but that the medical
industry/profession is taking its first baby steps toward being automated away.

~~~
WhitneyLand
No, I think any task where computers are newly enabled to beat humans is
newsworthy. I'd like to know about each of the dominoes as they continue to
fall.

------
tvladeck
The article mentioned they only had a dataset of 2,186 slides. Isn't this a
_tiny_ dataset to do any learning on images? They also mentioned they blew up
the predictors to over 10,000 features. This sounds like a recipe for
overfitting to me. I can't access the actual paper (getting a 500 error), but
does anyone know how they built these algorithms?

~~~
niels_olson
The images are 1-5 GB each. They tile them, then do some simple filtering
(remove all tiles with lots of white, emphasize tiles with lots of blue), then
run random forests, survival models, and k-fold cross-validation.
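
A toy version of that kind of tile filter in Python (assuming tiles arrive as uint8 RGB numpy arrays; the thresholds are made up):

    import numpy as np

    def keep_tile(tile, max_white_frac=0.5, white_level=220):
        """Drop tiles that are mostly background (near-white pixels)."""
        white = np.all(tile > white_level, axis=-1)  # bright in all channels
        return white.mean() < max_white_frac

    def blueness(tile):
        """Rank tiles by excess blue, to prioritize stained tissue."""
        t = tile.astype(float)
        return (t[..., 2] - t[..., :2].mean(axis=-1)).mean()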

The test set is interesting: they used a completely unrelated dataset of
tissue microarray spots. These are tiny, 1-2 mm circles, whereas the TCGA
images are full-size tissue cassettes, up to 1 inch on the short side.

------
nikcheerla
There's a LOT of biological data out there. With a little bit of effort,
researchers can design AI systems that beat doctors at most specialized tasks,
in both computer vision and genetic data analysis. The question is how such
systems can actually be used in practice.

------
naveen99
Pathology slides are analogue data (a lot of data - more than radiology
scans). Even tertiary care hospitals do not routinely digitize slides unless
they have to be returned to another hospital or used for a teaching
conference.

------
shazam
Does anyone know how to get involved in work like this? I'm outcome-oriented,
so disease diagnosis would be motivating for me. I'm looking to help out on an
open-source project along these lines.

~~~
yread
[http://cellprofiler.org/](http://cellprofiler.org/) is open source and was
used for this paper.

