
Could we use ML to detect Covid-19 infection? - danielquinn
One of machine learning's best tricks is in classification. We use it for
everything from image search to automated driving. This makes it a classic fit
for infection detection as there's only 2 options: infected or not.

What I'm thinking is that we could train a model against the hundreds of
thousands of testing kits now in circulation with known results, wire that
model into a web API, and then patch a device like this one:

https://www.owlstonemedical.com/products/reciva/

into said API. You send a breath/spit sample to the server, and the server
would respond with a yes/no based on the model.

We could use this to test more people, more often, effectively for $0 24h/day.
Doctors can be tested throughout their shifts, attendees for events could be
tested before allowing entry, people could be easily vetted for exiting
quarantine too.

I'm a web developer, and am confident that the web API part would be easy and
anonymous, and from what I know about ML, training a model like this probably
isn't difficult... if you have the data.

What I don't know much about is how easy it is to convert biology to data:
getting testing kit information into a model, and converting breath/spit into
data. I don't know how easy this is, but if it's doable, we may have something
seriously helpful.

Can someone with domain knowledge in this area chime in with their point of
view?
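
For what it's worth, the "send a sample, get a yes/no" exchange could look
roughly like the sketch below. Everything in it is an assumption made for
illustration: the `readings` field, the `handle_sample` function, and
`stub_model` are hypothetical names, and the stub is not a real classifier.
The payload carries only numeric readings and no patient identifiers, which is
one way to read "anonymous".

```python
import json
from typing import Callable, Sequence


def handle_sample(body: str, model: Callable[[Sequence[float]], bool]) -> str:
    """Turn a JSON request body into a JSON yes/no response.

    The request shape {"readings": [numbers...]} is hypothetical; a real
    device would define its own format.
    """
    try:
        payload = json.loads(body)
        readings = [float(value) for value in payload["readings"]]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or non-numeric readings.
        return json.dumps({"error": "expected a JSON object with numeric 'readings'"})
    return json.dumps({"infected": bool(model(readings))})


def stub_model(readings: Sequence[float]) -> bool:
    """Placeholder standing in for a trained model; NOT a real diagnostic."""
    return sum(readings) > 1.0


if __name__ == "__main__":
    print(handle_sample('{"readings": [0.9, 0.4]}', stub_model))
    print(handle_sample("not json", stub_model))
```

The handler takes the model as a parameter, so the hard part (a model that
actually works) stays separate from the easy part (the web plumbing).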
======
viraptor
The device you linked detects specific VOCs, not viruses or even proteins. I
don't believe this would be appropriate for infection detection. Or at least
not a _specific_ infection detection without a known baseline.

The cost also seems suspicious. Every patient needs their own mask - you can't
reuse those. Also, with the 4 cartridges per test, as recommended, that's
unlikely to be $0.

FTR: Current coronavirus testing uses either an RNA match from a saliva
sample, or a specific antibody reaction test - both are very specific to that
single virus.

~~~
danielquinn
I'm less concerned about which device would be used (this is just one that
came to mind that tries to translate breath into data), and more concerned
about whether it was possible to translate breath into data at all.

A breathalyzer might be just as good a comparison and is more reusable: if we
can capture breath/spit samples from enough infected & uninfected people,
surely a pattern would begin to present itself that could be used to diagnose,
no?

Current testing is very specific, yes -- so would this system. But where
current tests look for something specific: RNA or specific antibodies, this
would simply look for "is this sample sufficiently like an infected case or an
uninfected one?" Presumably there's more to what's going on in an infected
person's lungs than just the existence of the antibodies. Could not ML help
find these patterns and identify them as markers of infection?

------
0x01101
Not really - if you can collect enough data to use as input to an ML
algorithm, you can most likely diagnose the virus directly
([https://en.wikipedia.org/wiki/Laboratory_diagnosis_of_viral_...](https://en.wikipedia.org/wiki/Laboratory_diagnosis_of_viral_infections)).
ML seems redundant in that case.

And even if ML weren't redundant, you'd still have no training set containing
the relevant bio-markers alongside a binary variable indicating COVID-19
presence.

~~~
danielquinn
Hmm. I think I see what you're saying. Presumably if one of the markers you'd
want to capture is "presence of antibody for covid-19", and if the device
you're using can do that, you wouldn't need ML -- obviously.

But I was thinking a little more macroscopically. We don't have any device
that can see antibodies or anything as obvious in real time. Instead, we have
to collect samples and test them separately in controlled environments while
we look for specific markers we know to be signs of infection.

What ML does well though is see patterns in data we don't. Rather than find
specific markers, I was thinking we could feed a series of different values
into a model to see if a pattern emerged. We can't observe the virus directly,
but we can measure the percentage makeup of different chemicals in someone's
breath and see if there's a pattern in those we know to be infected vs those
we know aren't -- the training set would be the pile of tests already
conducted. Then we can apply that model to new data and test the results.
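
To make the idea concrete, here is a minimal sketch of that train-then-apply
loop, assuming each sample has already been reduced to a short list of numbers
(say, relative amounts of a few breath chemicals). The data is synthetic and
invented purely for illustration: `make_synthetic` builds in a signal by
shifting one feature for "infected" samples, which is exactly the thing nobody
knows exists in real breath data. The model is a plain logistic regression,
chosen only because it is the simplest binary classifier.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, epochs=200, lr=0.1):
    """Fit logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x) -> bool:
    """Return True when the model's probability of infection is >= 0.5."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


def make_synthetic(n, rng):
    """Invented data: three noisy features, feature 0 shifted if infected."""
    samples, labels = [], []
    for _ in range(n):
        infected = rng.random() < 0.5
        features = [rng.gauss(0.0, 1.0) for _ in range(3)]
        if infected:
            features[0] += 4.0  # assumed signal, not a real biomarker
        samples.append(features)
        labels.append(1 if infected else 0)
    return samples, labels


rng = random.Random(0)
train_x, train_y = make_synthetic(200, rng)
test_x, test_y = make_synthetic(100, rng)

weights, bias = train_logistic(train_x, train_y)
correct = sum(predict(weights, bias, x) == bool(y) for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The held-out split is the "apply that model to new data and test the results"
step. The sketch only works because the signal was planted; whether real
samples contain one is the open question.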

~~~
0x01101
> see if there's a pattern in those we know to be infected vs those we know
> aren't

You don't know who is infected and who isn't. That was my second point about
not having a training set.

~~~
danielquinn
But we do know. Thousands of manual, slow tests are being run every day. The
idea would be to collect those samples and their results for use in the model.

~~~
0x01101
> Thousands of manual, slow tests are being run every day.

Yeah - separate labs spread all over the world, using different protocols,
collecting different data points on different populations, and answering to
different companies and/or governments, each with its own laws and
regulations. You can't just "get all the data" in this case.

