
Building a CT Scan Covid-19 Classifier Using PyTorch - rerapp
https://blog.paperspace.com/fighting-coronavirus-with-ai-building-covid-19-classifier/
======
rstevens24
Great post! Machine learning definitely has a lot of potential to assist in
medical diagnostics, and with all the training data coming out, it's a field
ripe for innovation.

I work at Innolitics, and we do a lot of work with machine learning in the
medical imaging space. We've homed in on a set of tools that works well for
us; I thought it might be worth sharing in case anyone else wants to
explore this space in light of COVID.

The referenced UC San Diego dataset has its images stored as PNGs, but if
anyone is interested in doing more ML work with medical images, you'll
probably find most of them in the DICOM file format. I can highly recommend
using the dicom-numpy library for easy conversion of DICOM files into numpy
arrays: [https://github.com/innolitics/dicom-numpy](https://github.com/innolitics/dicom-numpy).
For more general example
datasets saved in the DICOM format, The Cancer Imaging Archive is always an
excellent resource:
[https://www.cancerimagingarchive.net/collections/](https://www.cancerimagingarchive.net/collections/)
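
To give a flavour of what that looks like in practice, here's a rough, untested
sketch of loading one CT series into a 3D volume (the directory path and the
*.dcm glob are placeholders, not anything specific to the UCSD dataset):

```python
import pathlib

import pydicom
import dicom_numpy

# Read every slice of one CT series with pydicom (placeholder path/glob),
# then let dicom-numpy stack the slices into a single 3D voxel array.
slice_files = sorted(pathlib.Path("/data/ct_series_001").glob("*.dcm"))
datasets = [pydicom.dcmread(str(f)) for f in slice_files]

# combine_slices sorts the slices, sanity-checks the spacing, and returns the
# volume along with the 4x4 ijk-to-xyz affine.
voxels, ijk_to_xyz = dicom_numpy.combine_slices(datasets)
print(voxels.shape, ijk_to_xyz.shape)
```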

Another advantage of using DICOM files is that there's lots of metadata you
can extract from each file to train on a wider clinical context. The PyDicom
library makes that very straightforward:
[https://github.com/pydicom/pydicom](https://github.com/pydicom/pydicom)
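
For example (minimal sketch; the filename is a placeholder, and which of these
standard DICOM attributes is actually populated varies by scanner and site):

```python
import pydicom

ds = pydicom.dcmread("slice_0001.dcm")  # placeholder filename

# Standard DICOM attributes are exposed as Python properties by pydicom.
print(ds.Modality)        # e.g. "CT"
print(ds.PatientAge)      # e.g. "063Y"
print(ds.PatientSex)
print(ds.PixelSpacing)    # in-plane spacing in mm
print(ds.SliceThickness)  # slice thickness in mm

pixels = ds.pixel_array   # the image itself, as a numpy array
```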

The Python + PyDicom + Keras or PyTorch stack is really powerful and easy to
get started with. We use it frequently at Innolitics and have put together some
tutorial articles that walk through the basics:
[https://innolitics.com/articles/ct-slice-localizer/](https://innolitics.com/articles/ct-slice-localizer/)

I'm excited to see more projects like this! More data and better tools are
only going to improve our ability to gain new insights into COVID.

~~~
skwb
What advantage does dicom-numpy offer? I've mostly developed with pydicom for
my medical imaging pipeline as it allows me to retain important dicom
information (pixel spacing, etc.). In fact, the 'pixel_array' attribute returns
a numpy array that I can then use.

~~~
rstevens24
dicom-numpy's biggest advantage is that it combines individual slices into a
single 3D numpy volume. This makes it really easy to immediately jump into
performing operations at the volume level rather than the slice level. It also
performs some sanity checks for you, such as checking for missing slices or
uneven slice spacing.

For me, I've also found dicom-numpy useful for returning the ijk-to-xyz affine
transformation matrix, which describes how the voxels are oriented in patient
coordinate space. dicom-numpy builds on top of PyDicom, so they are definitely
not mutually exclusive! We use them both extensively.
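
As a toy example of why that affine is handy (the matrix below is a made-up
stand-in for what combine_slices returns, just to show the mechanics):

```python
import numpy as np

# Stand-in for the 4x4 ijk-to-xyz affine returned by dicom_numpy.combine_slices:
# here 0.7 mm in-plane spacing, 2.5 mm slice spacing, and an arbitrary origin.
ijk_to_xyz = np.array([
    [0.7, 0.0, 0.0, -180.0],
    [0.0, 0.7, 0.0, -180.0],
    [0.0, 0.0, 2.5, -300.0],
    [0.0, 0.0, 0.0,    1.0],
])

# Map a voxel index (i, j, k) into patient-space coordinates in mm.
ijk = np.array([100, 120, 30, 1])   # homogeneous voxel index
xyz = (ijk_to_xyz @ ijk)[:3]
print(xyz)                          # [-110.  -96. -225.]
```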

~~~
skwb
Does it handle cine images easily?

------
jphoward
The use of gradient accumulation here is interesting, and I'd argue maybe not
optimal.

He uses a batch size of 8 (because his GPU only has enough RAM to allow 8 CT
scans to be tracked through the network at once). He acknowledges that this is
smaller than he would like, and so he uses gradient accumulation so that he
only updates the weights of the network after every 8th batch. Effectively,
he averages the gradients over 8 batches.
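
For anyone unfamiliar, the mechanics look roughly like this in PyTorch - a toy
sketch with a dummy model and dummy data, not the author's actual code:

```python
import torch
from torch import nn

model = nn.Linear(512, 2)                                   # stand-in for the real CNN
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 512), torch.randint(0, 2, (8,)))  # dummy "batches" of 8
          for _ in range(32)]

accumulation_steps = 8
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y)
    # Scale the loss so the accumulated gradient is the average over 8 batches,
    # and only update the weights after every 8th backward pass.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```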

The thing about batch size is that it's a trade-off. It turns out that even if
your GPU could support putting your entire dataset through at once, this isn't
a good idea. The explanation for this can be simplified as "if you give it the
same data every time, it'll give you the same answer every time". This may
sound fine, but actually a bit of randomness is very helpful to get the
network out of local minima, i.e. local ruts. This is why we do "mini-batch"
stochastic gradient descent. Now his dataset is only 200-something patients and
300-something scans, meaning he only gets 4-6 effective (accumulated) batches
out of his dataset for every epoch (every cycle). I would expect (though I have
no proof) that using a lower batch size might increase stochasticity here.

The problem is, smaller batch sizes also have their problems. The main reason
for this is the ubiquity in CNNs of batch normalisation (BatchNorm) layers.
These layers calculate on the fly (during training) the mean and standard
deviation of every feature across the current batch, and rescale the features
so that together they have a mean of 0 and a standard deviation of 1. The
problem is, when your batch size is very small, your means and standard
deviations might be wildly inaccurate (e.g. if your batch size is 4, it
wouldn't be that surprising if every sample in the batch is a COVID patient,
or every sample is normal). The BatchNorm layers will 'remap' those images
onto a normal distribution, and make a batch of 4 COVID patients look more
normal, and vice versa. For this reason, we try not to train networks with
batch sizes below 10, because when BatchNorm is involved, things tend to start
behaving badly. Unfortunately, BatchNorm doesn't give a damn about gradient
accumulation, because each small batch is still normalised separately. Gradient
accumulation does not get around the biggest problem of small batch sizes, and
in my experience very rarely helps. Instead, you're better off using a
different normalisation layer.
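
In PyTorch terms, the usual swap is something like GroupNorm (or
InstanceNorm/LayerNorm), which normalises within each sample rather than across
the batch, so the batch size stops mattering. A rough sketch, not the author's
code:

```python
import torch
from torch import nn

# BatchNorm's statistics depend on the whole batch; GroupNorm normalises over
# groups of channels within each sample, so it behaves the same at batch size
# 2 as at batch size 32.
conv_bn = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU())
conv_gn = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.GroupNorm(8, 32), nn.ReLU())

x = torch.randn(2, 1, 64, 64)   # tiny batch of 2 single-channel images
print(conv_bn(x).shape, conv_gn(x).shape)
```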

~~~
jcreinhold
Regarding smaller batch sizes and batch normalization: Have you found another
normalization layer to work better for small batch sizes?

I agree that the mean and variance of the small batches won't be
representative of the true mean and variance, but in practice, I've used batch
norm for small batch sizes successfully (e.g., <8). In medical imaging, due to
memory constraints, I commonly see batch sizes of 2 (or even 1, although it's
not really "batch norm" at that point).

The paper "Revisiting small batch training for deep neural networks" [1]
discusses the benefits of small batch sizes even in the presence of batch norm
(see Fig. 13, 14). They only look at some standard CV datasets, so it isn't
conclusive by any means, but the experimental results jibe with my experience
and with what appears to be other researchers' experience.

[1]
[https://arxiv.org/pdf/1804.07612.pdf](https://arxiv.org/pdf/1804.07612.pdf)

------
sungam
Interesting project, and I am sure that AI will come to play an important role
in radiology in the future. However, classifying CT scans for the diagnosis of
Covid-19 is very straightforward even for trainee radiologists, and performing
the scan, rather than interpreting it, is the rate-limiting step, so there is
no clinical need for this tool.

------
zone411
Speaking of COVID-19 and PyTorch, I just created a prediction model for new
cases and deaths based on all sorts of local data and demographics, mobility,
dining, shelter data, contact data, testing, weather, races, income, political
affiliation, etc. Some initial results for the cases in the upcoming week are
here: [https://www.city-data.com/predict/predict_2020-07-06.csv](https://www.city-data.com/predict/predict_2020-07-06.csv)

------
abhisuri97
Good writeup, but I'm unsure about the clinical use case here. Isn't it pretty
rare for a doctor to first learn about a Covid diagnosis from a CT scan?
Rather, I think a tool that can isolate the location of GGOs (ground-glass
opacities) and monitor their progression across a few metrics (overall volume,
density, etc.) might be
slightly more useful for a clinician as they consider disease progression. Of
course, I guess that would mean that someone has to segment CT scans to mark
where the GGOs are.

------
aeyes
Interesting project from a research standpoint; for Covid, however, I don't
see the use case. Still, it is interesting to see what can be done.

A family member does this for a living and they have been working on a lot of
Covid cases lately, also in emergency situations. Even if you rush it, you
have the patient on the bench for at least 10 minutes. Out of those 10 minutes,
it only takes 1 minute to review the result. In the case of Covid it is
something that is very, very easy to analyze visually - even with little
experience.

We can still improve the process a lot; there is a huuuge difference depending
on equipment. Image quality, image development, and the time it takes to get
the image developed (yes, sometimes that is done in a different room) vary a
lot depending on the platform. The software could be improved a lot, but
vendors don't seem to keep developing older platforms - which is sad, because
equipment doesn't get replaced every couple of years.

Here we don't do CT for Covid (would be even slower), just plain old X-ray.

~~~
caycep
I think it may be more of a "blue sky" research type thing. Not practical now
by any means - human eyeballs are better. But in the future, if the use case
becomes such that automated CT detection has a market... someone who invested
in this early would hold the patents, copyrights on the libraries, etc.

------
spellcard199
I have a question. I know nothing about ML and neural networks, so please
excuse me if the answer to this is obvious.

The specificity of CT when interpreted by humans is reported to be <56% [1].
The specificity for this model seems to be around 80%, which looks too good to
me (I did 79/(79+19)=0.8, taking the numbers from the table at the bottom of
the OP).

Is it that the non-Covid-19 scans in the training data were easier to
recognize than the ones doctors see every day, or is the model so much better
than humans at recognizing non-Covid-19 cases? If it were the latter, the
reduced sensitivity (human: 96% [1], model: 85%) would not look bad at all,
right?

[1]
[https://pubs.rsna.org/doi/full/10.1148/radiol.2020201709](https://pubs.rsna.org/doi/full/10.1148/radiol.2020201709)
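
For reference, the number I'm computing is just the standard definition (treat
the counts as approximate, since I read them off the table by eye):

```python
# Non-Covid-19 scans from the table in the OP, as I read them:
tn, fp = 79, 19                # correctly vs. incorrectly flagged as non-Covid
specificity = tn / (tn + fp)   # ~0.81
print(specificity)
```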

~~~
beojan
It could just be that doctors make a different tradeoff between false
negatives and false positives.

