
AI Device Converts Coughs into Flu and Pandemic Predictions - laurex
https://www.psychologytoday.com/ca/blog/the-future-brain/202003/ai-device-converts-coughs-flu-and-pandemic-predictions
======
tastroder
Paper:
[https://dl.acm.org/doi/abs/10.1145/3381014](https://dl.acm.org/doi/abs/10.1145/3381014)

From the abstract: "We conducted an IRB-approved 7 month-long study from
December 10, 2018 to July 12, 2019 where we deployed FluSense in four public
waiting areas within the hospital of a large university. During this period,
the FluSense platform collected and analyzed more than 350,000 waiting room
thermal images and 21 million non-speech audio samples from the hospital
waiting areas."

Huh, my ethics board would ask if I was okay if I proposed that. Field work
with people who are likely in a vulnerable position, to further some "AI"
proposal of unproven effectiveness. Great.

Also from the abstract: "FluSense can accurately predict daily patient counts
with a Pearson correlation coefficient of 0.95." Odd measure, buzzwordy press
release article, let's look at the paper.

From their contributions (page 3): "Thermal imaging-based crowd density
estimates were found to have a strong correlation with the reference daily
waiting room patient counts from hospital records (with a Pearson correlation
coefficient of 0.95)" You can count people on thermal images? The experiment
is slightly more interesting than that, but okay. Not a really surprising
result. So that's the number from the abstract. It might make sense if the
daily patient count is roughly normally distributed, but the distribution of
patients during a pandemic is likely to be completely different from that of
"regular" times. Luckily, the paper doesn't mention corona/covid; that seems
to be added fluff for the Psychology Today version. The term "pandemic" only
appears in their references. Though it might have made for far more
interesting reading if they had waited a month and investigated how this
behaviour changed at the onset of the pandemic.
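For reference, the 0.95 is just the standard sample Pearson coefficient
between two daily count series. A minimal sketch with made-up numbers (not
the paper's data):

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily counts: thermal-image estimates vs. hospital records.
thermal = [12, 30, 25, 41, 18, 35, 28]
records = [14, 28, 27, 45, 17, 33, 30]
print(round(pearson(thermal, records), 2))  # → 0.97
```

As the toy numbers show, two series that track each other loosely already
land in this range, which is why a high r on "regular" days says little
about behaviour under a pandemic-shaped distribution.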

"Our results suggest that multiple and complementary sets of FluSense signals
exhibited strong correlations with both ILI counts and laboratory-confirmed
influenza infections on the same campus." Multiple, okay (table 7 looks like
they tried every easily accessible ensemble method in sklearn). So how about
the cough model / audio stuff the article boasts about? "This real-world
cough model shows resilience to complex noises and diverse settings and
achieves an F1 score of 88%" Fair enough, that sounds reasonable for a CNN
pipeline based on 1-second samples. Their model does seem to have real
problems with door screeches, though.
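For anyone not steeped in classifier metrics: F1 is the harmonic mean of
precision and recall. The counts below are invented to show roughly how an
F1 of ~0.88 comes about; they are not from the paper:

```python
def f1(tp, fp, fn):
    """F1 score from confusion-matrix counts (true/false positives, false negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented counts for a cough/no-cough classifier, for illustration only.
print(round(f1(tp=88, fp=14, fn=10), 2))  # → 0.88
```

Note that F1 ignores true negatives entirely, so it says nothing about how
often the model fires on non-cough sounds like those door screeches.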

Their privacy assumptions appear somewhat lacking. The thing is a Raspberry
Pi that basically everybody in the vicinity has physical access to, which
makes everything else almost irrelevant. They say they use "two-stage
encryption" to store non-speech samples to disk, but the Raspberry Pi they
are using contains no hardware that would make this secure against physical
attacks. Section 3.1 mentions a "high-fidelity" classifier to discard speech
snippets in this sensitive area, which appears to achieve 93% for speech
detection with "augmentation" (background noise etc.). In a waiting room I
would not want a 7% failure rate here. They mention HIPAA on page 13 and say
that in their real system, a rather small sample of 2,500 snippets contained
only 79 (unintelligible) speech samples. Not sure why that would be okay,
but hey, I can put my own SD card in there anyway, so who cares, right?
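To make the 7% concrete, a quick sanity check. The daily snippet volume
below is purely my assumption, not a figure from the paper:

```python
# Assumed load: 5,000 one-second speech snippets per day in a busy
# waiting room (my guess, not from the paper). With a 93% speech
# detector, the 7% miss rate would write this many speech snippets
# to disk every day:
speech_snippets_per_day = 5_000   # assumption
miss_rate = 0.07                  # 1 - 0.93 detection rate
print(int(speech_snippets_per_day * miss_rate))  # → 350
```

Even if most of those turn out unintelligible, "we only leaked a few" is an
empirical observation about one deployment, not a privacy guarantee.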

Their cough model seems to work well in the lab setting (on public datasets
suited for the purpose). Their ground truth for real-world validation
appears to be 2,500 one-second samples they annotated manually (5.4);
evaluation is against a human annotator (classes: cough / no cough). Table 4
seems to be where the 88% F1 score comes from; it says "after transfer
learning". Page 11 says "We used approximately half of our labeled data for
the transfer-learning.", leaving 1,250 test samples? This real-world model
is then evaluated on 200 samples from days under different conditions
(location, how crowded the day was), achieving F1 scores of .8 (less crowded
days) and .88 (crowded days). For covid, my country's situational report
says that only 53% of clinical cases have cough symptoms in the first place.
That doesn't seem to leave much room for any real-world use to "detect
pandemic indicators".
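Back-of-envelope on that last point, under the (debatable) assumption that
the reported recall transfers to a pandemic setting:

```python
# If only 53% of clinical covid cases cough at all (the situational-report
# figure above) and the cough model's recall is in the 0.80-0.88 range,
# the fraction of cases the system could even in principle pick up via
# cough is roughly the product of the two:
cough_symptom_rate = 0.53
for recall in (0.80, 0.88):
    print(round(cough_symptom_rate * recall, 2))  # → 0.42, then 0.47
```

A crude upper bound, of course: it ignores coughs from non-cases, repeated
coughs per person, and the aggregation over a full waiting room, but it
suggests the cough signal alone misses over half the cases.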

The paper itself is not uninteresting, but it seems somewhat... unclear on a
few of the assumptions made throughout, which imho casts some doubt on the
efficacy of the final output of their overall model in a real setting. The
link to covid that this PT article feels the need to put in there would be
highly questionable, but luckily the authors refrain from making that
connection.

