
Facebook/CMU Covid-19 U.S. county-by-county symptom map - bookofjoe
https://covid-survey.dataforgood.fb.com/
======
MiguelVieira
These kinds of maps always end up far less enlightening than I'd hoped. Maybe
because of noise from counties with low populations? Like, "Oh, there's
elevated levels in Finney County, Kansas. Hmm."

Anyway I find this map a lot more useful, which lets you filter by population,
show counts per capita, and limit counts to the last 7 days:

[https://rchern.github.io/covid-19/](https://rchern.github.io/covid-19/)

~~~
therealcamino
Well, to be fair they are measuring something different, precisely because the
confirmed positive count is skewed by the scarcity of testing. Since that doesn't
seem to be getting fixed, other approaches are needed. So discrepancies versus the
number of confirmed positives are one of the things you might look for in this
data as a sign that more investigation is needed. It's not intended to be an
accurate accounting of covid-19 cases: "The estimates can be helpful for
policymakers and health researchers to forecast potential COVID-19 outbreaks.
These estimates don’t represent confirmed COVID-19 cases and shouldn’t be used
for diagnostic or treatment purposes, or guidance on personal or business
travel."

------
bredren
I'm doing some map stuff right now on a project and it's cool FB is using OSM
for their data. Here's a recent article on their public embrace of OSM:
[https://www.engadget.com/2019-07-23-facebook-opens-up-its-
ai...](https://www.engadget.com/2019-07-23-facebook-opens-up-its-ai-tool-to-
openstreetmap-users.html)

------
onychomys
They in part collected this data from a little survey request modal at the top
of your facebook timeline main page thingy, so just be aware that it's not at
all representative of the population as a whole. Old people and young people
and people who read Hacker News are less likely to have answered than would be
expected.

~~~
dpeck
depends on how you define "old". My personal observation-driven wild guess
would be that the 40-65 year old demographic would be overrepresented in
anything presented by Facebook. They are by far the most active there.

------
thadk
I dug through the source code and extracted the JSON files and converted them
to a tabular/CSV format for use outside.
[https://observablehq.com/@thadk/facebook-covid-survey-
data-f...](https://observablehq.com/@thadk/facebook-covid-survey-data-from-
map)

If you're one of the creators, please release the data files for this kind of
thing. Google started releasing only PDFs of mobility, its data was scraped
through much effort, and only later did they begin releasing the official
CSVs.
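
A minimal sketch of the kind of JSON-to-CSV flattening described above. The real file layout is not shown in this thread, so the structure here (a JSON object keyed by county FIPS code, with a hypothetical `pct_cli` field) and the sample values are assumptions for illustration only:

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Flatten {"<fips>": {"<field>": value, ...}, ...} into CSV text.

    The input shape is an assumed layout, not the map's actual format.
    """
    data = json.loads(json_text)
    rows = [{"fips": fips, **values} for fips, values in sorted(data.items())]
    # Collect every field seen in any row so no column is dropped.
    fields = sorted({key for row in rows for key in row if key != "fips"})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["fips"] + fields,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up placeholder input; "pct_cli" is a hypothetical field name.
sample = '{"06001": {"pct_cli": 1.8}, "06075": {"pct_cli": 0.0}}'
print(json_to_csv(sample))
```

Keeping the FIPS code as a string preserves leading zeros, which spreadsheet imports tend to strip.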

------
DeonPenny
If you read that recent random sampling from Stanford the numbers are probably
50x-85x higher

~~~
zaidf
Stanford should completely retract that study. It has produced far more
misleading interpretations than anything useful.

~~~
ravenstine
Why? Even if it's not totally random (they used Facebook ads), surely the data
still tells us something?

~~~
losvedir
Not necessarily. Their population survey had 1.5% positive results. The false
positive rate of the test is not known for certain but I believe the 95%
confidence interval goes up to 1.7%. If that were the case, _all_ the positive
tests could even be false positives.

We did learn from the study that the population rate isn't outrageously high,
like 30%, but I'm not sure what more we can confidently say beyond that.
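
To make that arithmetic concrete, here is a rough sketch using the standard Rogan-Gladen correction for an imperfect test. The 1.5% raw rate and the 1.7% upper bound on the false positive rate are the figures quoted above; the 80% sensitivity is purely an assumed value for illustration, not a number from the study:

```python
def corrected_prevalence(raw_positive_rate, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence, clamped to [0, 1]."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test is no better than chance")
    estimate = (raw_positive_rate + specificity - 1.0) / denominator
    return min(1.0, max(0.0, estimate))


raw = 0.015         # 1.5% raw positive rate quoted above
sensitivity = 0.80  # ASSUMED for illustration only

# Sweep the false positive rate from perfect up to the quoted 1.7% bound.
for false_positive_rate in (0.0, 0.005, 0.017):
    specificity = 1.0 - false_positive_rate
    estimate = corrected_prevalence(raw, sensitivity, specificity)
    print(f"FPR {false_positive_rate:.1%} -> prevalence {estimate:.2%}")
```

Once the false positive rate exceeds the raw positive rate, the corrected estimate clamps to zero, which is the "all positives could be false positives" case.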

~~~
walterbell
What are typical false positive rates for other tests/studies?

~~~
DeonPenny
China's false positives for Covid tests are 30%, and chest CT has a false
positive of 6% [https://www.advancedsciencenews.com/a-more-sensitive-test-
fo...](https://www.advancedsciencenews.com/a-more-sensitive-test-for-
covid-19/).

So these tests are more accurate than either of the more accurate tests used to
diagnose Covid-19, from what I've found.

------
jamilbk
For this data to be useful, I think we first need to establish that _people
who use Facebook and participate in its surveys_ represent a relatively random
sample of the general population. Or adjust for all the ways in which it does
not.

Otherwise, it's likely biased.

~~~
jaboutboul
They address this:

“To help CMU measure results, Facebook shares a single statistic known as a
weight value that doesn’t identify a person but helps correct for any sample
bias, adjusting for who responds to survey invitations. Making adjustments
using weights ensures that the sample more accurately reflects the
characteristics of the population represented in the data. The weight for a
survey respondent in the sample can be thought of as the number of people in
the population that they represent based on their age, gender and state of
residence.”
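
As a rough illustration of how such weights change an estimate, here is a sketch comparing an unweighted and a weighted share of symptomatic respondents. The responses and weight values are made up for the example; how Facebook actually computes its weights is not described beyond the quote above:

```python
def unweighted_share(responses):
    """Fraction of respondents reporting symptoms, each counted once."""
    return sum(1 for has_symptoms, _ in responses if has_symptoms) / len(responses)


def weighted_share(responses):
    """Fraction reporting symptoms, each respondent counted `weight` times."""
    total_weight = sum(weight for _, weight in responses)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    symptomatic = sum(weight for has_symptoms, weight in responses if has_symptoms)
    return symptomatic / total_weight


# Hypothetical data: (has_symptoms, weight). A weight is read as the number
# of people in the population that this respondent stands in for.
responses = [(True, 50.0), (False, 200.0), (False, 150.0), (True, 100.0)]

print(unweighted_share(responses))  # 2 of 4 respondents
print(weighted_share(responses))    # 150 of 500 represented people
```

In this toy data the symptomatic respondents come from overrepresented groups (low weights), so weighting pulls the estimate down from 0.5 to 0.3.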

------
usaar333
I wish they had some confidence intervals on here. The variance between Bay
Area counties (or over time) is quite high - it's rather dubious that Alameda
County was at 1.8% while SF was at 0% a week ago.

Indeed, numbers > 1% for symptomatic patients in the Bay Area a week ago are
implausible (likely over an order of magnitude too high) given confirmed case counts
since then.
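
For a sense of what such intervals might look like, here is a sketch of a Wilson score interval for a county-level proportion. The per-county sample sizes are not given anywhere in this thread, so the 100-respondent figure is a made-up assumption, and this ignores the survey weights entirely:

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denominator = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical: 0 symptomatic respondents out of an ASSUMED 100 in a county.
low, high = wilson_interval(0, 100)
print(f"0/100 -> {low:.1%} to {high:.1%}")
```

With a sample that small, an observed 0% is still compatible with a true rate of a few percent, so a gap like 0% versus 1.8% between neighboring counties may be mostly noise.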

~~~
SpicyLemonZest
Every attempt I'm aware of to do a general population survey has indicated
that confirmed case counts really are at least an order of magnitude lower
than the real spread. I don't think it's a possibility we can reject out of
hand - frankly I'm starting to swing to the position that it's been proven
true.

~~~
usaar333
Most general population surveys have suffered from heavy self-selection bias
or insufficient specificity. In the early days, yes, the Bay Area was probably
missing 90% of cases.

Testing has ramped up though (it's pretty easy to get a test if you're
symptomatic), so I don't believe we're missing 90% of symptomatic cases [as
defined by this survey] at this point (which is part of the reason that new
cases/day have remained stable while hospitalization numbers are dropping).
Think about it: if people take a survey that says they have covid symptoms,
wouldn't they get tested?

~~~
SpicyLemonZest
I'm just not sure you're applying consistent standards here. Going to a
coronavirus test center on medical advice suffers from much stronger self-
selection bias and generally unreported specificity.

------
dmos62
The time dimension is the most interesting aspect for me in these charts.

Anyone aware of good world-wide corona virus datasets? I told myself I'd start
looking 1-2 months from now, because they'll probably be easier to find and
higher quality by then, but these posts are whetting my appetite.

~~~
mmastrac
This one is kind of nice, but only available at the country level:
[https://github.com/datasets/covid-19/blob/master/data/countr...](https://github.com/datasets/covid-19/blob/master/data/countries-
aggregated.csv)

------
orsenthil
Why did they have to include "Flu" in the dropdown? I am afraid that will take
a little focus away from this tool and the seriousness of the pandemic.

~~~
austinwm
"The estimates can be helpful for policymakers and health researchers to
forecast potential COVID-19 outbreaks. These estimates don’t represent
confirmed COVID-19 cases and shouldn’t be used for diagnostic or treatment
purposes, or guidance on personal or business travel. Facebook’s research
partners are committed to only using survey results to study and help contain
COVID-19."

The dataset is explicitly not intended for general public consumption, and
having access to flu and COVID symptom overlap is important for epidemiology
research and disease surveillance.

------
etimberg
Why would anyone trust facebook with this data?

~~~
pwython
"Facebook uses aggregated public data from a survey conducted by Carnegie
Mellon University Delphi Research Center. Facebook doesn’t receive, collect or
store individual survey responses."

But, what are you afraid Facebook would do with these survey results?

~~~
Nextgrid
Target snake-oil covid-19 "cures" to people who responded as having symptoms
or being particularly at risk?

I've seen various scams (fake "hacking" services, weight loss products, etc)
both organic (liked by tons of very obvious compromised/fake accounts) & paid,
and when I reported them Facebook said they don't violate its terms of use, so I wouldn't
be surprised if this scummy company does the same thing again.

~~~
pwython
Facebook isn't the wild west anymore. Last month they banned ads for anything
remotely COVID-19 related, including hand sanitizer, disinfecting wipes,
medical masks, etc.

"...we are now prohibiting ads for products that refer to the coronavirus in
ways intended to create a panic or imply that their products guarantee a cure
or prevent people from contracting it."

[https://about.fb.com/news/2020/04/coronavirus/#exploitative-...](https://about.fb.com/news/2020/04/coronavirus/#exploitative-
tactics)

~~~
Nextgrid
Presumably hacking services (whether real or scams) were also banned and yet I
saw them and even reported them and nothing has been done.

~~~
pwython
I don't doubt scammers are able to temporarily sneak their ads past the
approval team. But Facebook certainly doesn't make it easy these days, nor
allow morally questionable targeting. They're constantly shutting down
controversial categories (crypto, cbd, etc), it's always a hot topic in
Facebook ad buyer groups.

