
Using Deep Learning to Help Pathologists Find Tumors - rusht
http://research.baidu.com/Blog/index-view?id=104
======
whafro
I work in this field (not directly on the ML, for the company born out of the
winner of Camelyon16) and the last two years of progress have been amazing to
watch. Tumor detection has become incredibly accurate, across basically every
tissue/tumor type, and we're now making real progress on the next major goal:
determining the best therapy for a given patient.

It's a bit of a dirty secret in this space that pathologists have a pretty
high error rate on a lot of these tasks — it's just tough work for human eyes
to do literally hundreds of times every day. Applying computer vision
techniques can not only improve accuracy and reproducibility over human
assessment, but you can do types of analysis in seconds-to-minutes that would
literally take years for a human. We're just scratching the surface.

There are lots of ML challenges here, but just as many general
tech/engineering/design challenges. So if you're interested in working on
bringing work like this to the masses, we'd love to talk at PathAI.

~~~
poutrathor
To give a counterpoint here: a close relative of mine is a pathologist. We
have talked a lot about this subject, and so far he is unimpressed by the
results. Moreover, providing fast (seconds-to-minutes) analysis implies
serious computational and network infrastructure. Finally, data acquisition
currently takes roughly 20 to 45 minutes depending on the patient and the
targeted area, so the promised analysis speed is not such a big win in
practice.

I trust that science, research, and ongoing work will eventually produce
interesting results, but many, many startups will burn through their cash
before they can offer real-world services.

Like you said, there are LOTS of challenges here. It is certainly not
low-hanging fruit, nor that profitable a business, since most countries will
squash costs anywhere they can as health costs keep growing, and there is a
very difficult legal environment to deal with.

However, I am very thankful for your hard work and your will to push toward a
better future for health.

~~~
taneq
I thought the primary importance of machine vision for this kind of cancer
screening was that it doesn't give so many false negatives due to fatigue?
IIRC there was a study that went back over cancer patients' screening tests
and found that, for most of them, visible, diagnosable tumors had been missed
well before their disease was actually identified.

~~~
raducu
A very important use case is that a lot of places don't have easy access to
good diagnosticians, and software and hardware are easy to scale; humans, not
so much.

------
mkstowegnv
I went to a talk by someone who had switched fields to one that involved
analyzing portions of cells. After showing a series of slides with diverse,
confusing blobs and lines, he said "when I first started this work I would
look at a section and not see anything at all. But I have improved to the
point that now I can look at a section and see anything I want to".

------
toolslive
I did a project like this in the early 2000s, and it's amazing how far you get
by just combining frequency filtering and kNN clustering. Nothing fancy
required, really.
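
For what it's worth, here's a minimal sketch of that kind of classical
pipeline: a band-pass filter in frequency space followed by plain k-means
clustering of patch texture features. All names, sizes, and thresholds here
are made up for illustration, not from the original project:

```python
import numpy as np

def bandpass_filter(img, low=2, high=40):
    """Keep spatial frequencies with radius in [low, high] via the 2D FFT."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    mask = (r >= low) & (r <= high)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def kmeans(X, k=2, iters=20, seed=0):
    """Plain k-means: assign each feature vector to its nearest centroid."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Cluster 8x8 patches of the filtered image by simple mean/std features.
img = np.random.default_rng(1).random((64, 64))  # stand-in for a slide tile
filtered = bandpass_filter(img)
patches = filtered.reshape(8, 8, 8, 8).transpose(0, 2, 1, 3).reshape(64, -1)
feats = np.stack([patches.mean(1), patches.std(1)], axis=1)
labels = kmeans(feats, k=2)  # one cluster label per patch
```

Obviously real tissue features would be richer, but the structure is the same.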

~~~
sacado2
Another virtue of non-fancy methods is that they can provide a basic
explanation of their reasoning, which is very important in the medical field.
I think one of the main drawbacks of deep learning is its opacity.

~~~
ogrisel
You can do k-NN in the feature space of the network (activation of one of the
last layers) to generate partial "explanations" of the neural network
decisions.
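
A toy sketch of that idea, using random vectors as stand-ins for the
penultimate-layer embeddings (the function name and shapes here are my own
invention):

```python
import numpy as np

def nearest_exemplars(query_emb, train_embs, train_labels, k=3):
    """Return indices and labels of the k training examples whose
    penultimate-layer embeddings are closest to the query's embedding."""
    d = np.linalg.norm(train_embs - query_emb, axis=1)
    idx = np.argsort(d)[:k]
    return idx, train_labels[idx]

# Stand-in embeddings; in practice these would come from the network's
# last hidden layer, computed once over the training set.
rng = np.random.default_rng(0)
train_embs = rng.normal(size=(100, 32))
train_labels = rng.integers(0, 2, size=100)  # 0 = normal, 1 = tumor
query = train_embs[7] + 0.01 * rng.normal(size=32)

idx, labels = nearest_exemplars(query, train_embs, train_labels)
# The model's prediction can then be "explained" by showing the
# pathologist these nearest exemplar slides and their known labels.
```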

------
Gatsky
At the moment, a big limitation of this approach is the input data. Images of
tumours are generally very thin sections of a complex 3D tissue that is
processed in a way that introduces artefacts and then stained with 2 colours.

To truly leverage the power of machine learning, an end-to-end solution where
the tissue is processed in a more data-rich manner would be better (e.g.
spatially aware single-cell assays, non-destructive thick-slice imaging). This
could feasibly replace the current system entirely, as it truly would do
something no human could do, not just do it more accurately.

------
phonebucket
While open sourcing the model is nice, it would be better still to open source
the data set for the wider community to make more meaningful contributions.

Their GitHub repo states the following: "You need to apply for data access,
and once it's approved, you can download from either Google Drive, or Baidu
Pan."

~~~
yil8
We, Baidu Research, do not own the Camelyon16 Challenge dataset; people need
to apply on the Camelyon16 Challenge website to download the original
pathology slides. I guess my wording was a bit confusing on GitHub, which has
been corrected, lol

~~~
ihnorton
The data from Camelyon '16 and '17 appear to be available without
registration on GigaScience:

[http://gigadb.org/dataset/100439](http://gigadb.org/dataset/100439)

~~~
yil8
Cool! Thanks for pointing that out.

------
louden
It would be nice to see the sensitivity and specificity of the technique and
of human pathologists. False positives and false negatives are not equal in
medicine, so results should be reported in a way that lets people evaluate
both.

In this type of cancer, a lower specificity is an acceptable trade off for a
very high sensitivity.
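
For reference, the two metrics are just ratios over confusion-matrix counts;
a small sketch with made-up counts (100 tumor slides, 100 normal slides):

```python
def sensitivity(tp, fn):
    """Fraction of true positives caught: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true negatives correctly cleared: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical screening counts illustrating the trade-off:
sens = sensitivity(tp=91, fn=9)    # 0.91: few missed tumors
spec = specificity(tn=80, fp=20)   # 0.80: more false alarms tolerated
```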

~~~
yil8
There was indeed a professional pathologist involved in the Camelyon16
Challenge: s/he spent 30 hours reviewing 130 slides and ended up with 72.4%
sensitivity at 0 false positives. Our algorithm achieves ~91% sensitivity at 8
false positives per slide, which seems a win by your criterion that "a lower
specificity is an acceptable trade off for a very high sensitivity."

------
sooheon
How is the "grid of patches" different from one more level of convolution?

~~~
yorwba
Using another level of convolution would produce outputs that are
statistically independent if they are farther apart than the size of the
convolution kernel. In a conditional random field, the dependence of outputs
on each other can be modeled as well.

For example, a conditional random field could express "either these patches
both contain a tumor or none of them does" (which is helpful when there's
something suspicious on the patch boundary) and the consequences of committing
to either possibility can propagate over the whole field. In contrast, a
convolutional layer would have to make the decision independently for each
local area.
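
A toy illustration with made-up scores: two neighboring patches where the
joint (CRF-style) decision flips a borderline patch that an independent
per-patch decision would call normal. Real CRF inference uses message passing
rather than brute-force enumeration; this just shows the effect of the
pairwise agreement term:

```python
from itertools import product

def crf_map(unary, pairwise_weight, edges):
    """Brute-force MAP over binary labels for a tiny patch grid.
    Score = sum of unary scores + an agreement bonus on each edge."""
    n = len(unary)
    best, best_score = None, float("-inf")
    for labels in product([0, 1], repeat=n):
        score = sum(unary[i][labels[i]] for i in range(n))
        score += sum(pairwise_weight * (labels[i] == labels[j])
                     for i, j in edges)
        if score > best_score:
            best, best_score = labels, score
    return best

# Two neighboring patches: the first clearly tumor, the second borderline.
unary = [(0.0, 2.0),   # patch 0: (normal score, tumor score) -> strong tumor
         (0.6, 0.5)]   # patch 1: slightly favors "normal" on its own
edges = [(0, 1)]

independent = tuple(int(u[1] > u[0]) for u in unary)      # per-patch argmax
joint = crf_map(unary, pairwise_weight=1.0, edges=edges)  # coupled decision
```

With the agreement term, the strong tumor evidence on patch 0 propagates and
pulls the borderline patch 1 to "tumor" as well.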

~~~
curiousgal
Hi yorwba, are there any resources you recommend for a deep understanding of
neural networks? Something theoretical, beyond the practical frameworks?

~~~
yorwba
My personal approach is to read papers that seem interesting. Of course I
usually do not have the necessary background in everything that's mentioned,
but I treat those cases as black boxes. E.g. if the paper says that they use X
to do Y I'll simply assume that you can do Y using X. If I think that the
details of X are important, I dig deeper. Sometimes just by reading the
corresponding Wikipedia article, sometimes by looking at the references in the
paper. Then repeat recursively.

That approach has the advantage that you'll learn about techniques roughly in
proportion to their current popularity, but it has the disadvantage that
explanations in papers tend to be brief and you have to assemble them into a
coherent whole yourself.

If you prefer textbooks, I heard about
[http://www.deeplearningbook.org/](http://www.deeplearningbook.org/) but
didn't get around to reading it. In addition to neural networks, you'll
probably also want to read about classical statistics and probability theory,
since that's the origin of concepts like conditional random fields, which can
be mixed with neural networks but are unlikely to be covered by literature on
deep learning.

------
godzillabrennus
Glad to see they open sourced their work.

------
leozou
great work

