Johns Hopkins Covid-19 GitHub Data Repo

braindongle · on March 3, 2020

If you want to do your own analytics on these data, here's one approach, which I'm using. Start your project as a vanilla Git repo, then add the JH repo as a submodule.

  git submodule add https://github.com/CSSEGISandData/COVID-19

Then you can pull the latest data with:

  git submodule update --remote COVID-19

Have cron run that daily and your data are always fresh.
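Once the submodule is current, the analytics side can start from plain CSV parsing. This is a minimal sketch, not a description of the repo's guaranteed format. It assumes a wide time-series CSV whose first four columns are Province/State, Country/Region, Lat and Long, followed by one cumulative count per date. The function name country_totals is hypothetical, so check the actual file layout in the repo before relying on it.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def country_totals(f: TextIO) -> Dict[str, List[int]]:
    """Sum a wide time-series CSV into one list of daily counts per country.

    Assumed layout: Province/State, Country/Region, Lat, Long, then one
    column per date. Blank cells are treated as 0.
    """
    reader = csv.reader(f)
    header = next(reader)
    n_days = len(header) - 4  # everything after the four metadata columns
    totals: Dict[str, List[int]] = defaultdict(lambda: [0] * n_days)
    for row in reader:
        if not row:  # skip blank lines
            continue
        country = row[1]
        for i, value in enumerate(row[4:4 + n_days]):
            totals[country][i] += int(value or 0)
    return dict(totals)


# Made-up sample in the assumed layout (not real data).
sample = io.StringIO(
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    "Region A,Country X,0,0,1,2\n"
    "Region B,Country X,0,0,3,4\n"
)
print(country_totals(sample))  # {'Country X': [4, 6]}

# Against the real submodule it would look something like this
# (the path is a placeholder, not the repo's actual file name):
# with open("COVID-19/<path to a time-series CSV>", newline="") as f:
#     totals = country_totals(f)
```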

saber6 · on March 3, 2020

Are there any real downsides to pulling in external repos as submodules?

zro · on March 3, 2020

It can be a bit of a gotcha when trying to share your work with others. They'll have to clone with --recursive or the submodules will come up as empty folders for them.
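One way to soften that gotcha is to have your analysis code check for the empty folder and explain the fix. Anyone who already cloned without --recursive can run git submodule update --init from the project root. This is a minimal sketch; the helper names and the default COVID-19 path (taken from the git submodule add command above) are assumptions.

```python
from pathlib import Path


def submodule_is_empty(path: str) -> bool:
    """True if the submodule folder is missing or contains nothing.

    A clone made without --recursive leaves each submodule as an empty
    folder, so this catches the gotcha before any CSV read fails.
    """
    p = Path(path)
    return not p.is_dir() or not any(p.iterdir())


def require_submodule(path: str = "COVID-19") -> None:
    """Exit with a fix-it hint instead of a confusing FileNotFoundError later."""
    if submodule_is_empty(path):
        raise SystemExit(
            f"{path}/ is empty or missing. From the project root, run:\n"
            "    git submodule update --init"
        )


# Call this at the top of your analysis script, before reading any data:
# require_submodule()
```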

tlrobinson · on March 3, 2020

Another interesting repo I came across:

https://github.com/midas-network/COVID-19/tree/master/parame...

fierarul · on March 3, 2020

I just used this repo yesterday to create a chart of the ongoing cases on http://covid.410go.net

It shows that the number of ongoing cases in China has been going down for the past 2 weeks. This isn't being highlighted in the news...
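Whether a chart like that goes down depends on which series it plots. A cumulative count of confirmed cases can only flatten, never fall. Daily new cases (the day-over-day difference) and active or ongoing cases (one common definition: confirmed minus deaths minus recovered) can both decline. A minimal sketch of both, assuming you already have cumulative Confirmed, Deaths and Recovered series for one region as lists of daily counts. The function names are hypothetical and the numbers are invented for illustration.

```python
from typing import List


def daily_new(cumulative: List[int]) -> List[int]:
    """Day-over-day increments of a cumulative series (one shorter than the input)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def active_cases(confirmed: List[int], deaths: List[int],
                 recovered: List[int]) -> List[int]:
    """Ongoing cases per day: confirmed minus those who died or recovered."""
    return [c - d - r for c, d, r in zip(confirmed, deaths, recovered)]


# Invented numbers shaped like a flattening outbreak, NOT real data.
confirmed = [100, 180, 240, 270, 280, 285]  # keeps rising, only flattens
deaths = [2, 5, 9, 12, 14, 15]
recovered = [10, 30, 70, 120, 170, 210]

print(daily_new(confirmed))                        # [80, 60, 30, 10, 5]
print(active_cases(confirmed, deaths, recovered))  # [88, 145, 161, 138, 96, 60]
```

The cumulative series never falls, while the active series peaks and then drops. That is the difference between the two kinds of chart discussed here.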

crtlaltdel · on March 3, 2020

I've seen more than one article that at least references this, such as https://www.latimes.com/world-nation/story/2020-03-03/virus-...

fierarul · on March 3, 2020

Interesting. I've never read news saying that China's cases are actually decreasing, and all the charts show total cases, which are flatlining (in China) but not going down.

The rest of the world seems to be going exponential, OTOH.
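One way to put a number on "going exponential" is the doubling time. If a count grows exponentially from N1 to N2 over Δt days, the doubling time is Δt · ln 2 / ln(N2/N1), or equivalently Δt / log2(N2/N1). This is a minimal sketch and the function name is hypothetical. With noisy daily reports, fitting over several days is more robust than using two points.

```python
import math


def doubling_time(n1: float, n2: float, days: float) -> float:
    """Days needed to double, assuming pure exponential growth from n1 to n2.

    Equivalent to days * ln(2) / ln(n2 / n1). Returns math.inf when the
    count did not grow, since it then never doubles.
    """
    if n1 <= 0 or n2 <= 0 or days <= 0:
        raise ValueError("counts and the time span must be positive")
    if n2 <= n1:
        return math.inf
    return days / math.log2(n2 / n1)


# 8x growth in 9 days is three doublings, i.e. a doubling time of about 3 days.
print(doubling_time(100, 800, 9))
```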

Out_of_Characte · on March 3, 2020

It's hopefully not mentioned in the news because the CCP is fudging the numbers. There's no way of knowing the exact situation on the ground in Wuhan, but it's nowhere near what the reported cases suggest.

tunesmith · on March 3, 2020

Why do people keep saying this? I've seen interviews with disease specialists who are plugged into the community, and they say that the China numbers are not only as accurate as they can be[1], but that it would be really hard to fudge them given the transparency China has already shown. My sense from that was that the numbers can be cross-checked in various ways, and that it would be mathematically obvious if there were major fraud going on.

[1] This is a different issue from numbers being limited by testing shortages, by people with minor symptoms who recover at home, or by asymptomatic people not being included in the count: problems that a lot of countries share. We could also argue that the US has been "fudging" its numbers because it has tested far fewer people than Canada.

Karunamon · on March 3, 2020

Those numbers for China are likely not trustworthy, keeping in mind they come from the government there. They're still keeping WHO inspectors out.

Out_of_Characte · on March 3, 2020

Not WHO inspectors; the WHO has lots of Chinese allegiance. The CDC is the organisation that is blocked from entering.

https://www.foxnews.com/health/us-health-officials-on-corona...

https://www.washingtontimes.com/news/2020/jan/30/china-asks-...

https://www.nationalreview.com/2017/06/world-health-organiza...

Google also seems to heavily favour the WHO's website over alternative sources of information, probably all in order to combat "fake news". What a dystopian world we live in.

shi314 · on March 3, 2020

Then why is the WHO's Director-General praising China?

remarkEon · on March 3, 2020

Same reason Steve Kerr did.

jdoliner · on March 3, 2020

Not sure if this has been on HN already, but Johns Hopkins also has a COVID-19 dashboard up, presumably built on the data in the repo: https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594...

mkchoi212 · on March 3, 2020

This is pretty great! Being able to see the progress of the virus over time could be interesting. Maybe it could even reveal some patterns in human movement and interaction?

athrowaway3z · on March 3, 2020

The different protocols for testing patients around the world are producing too much noise to make general claims.

b1gtuna · on March 3, 2020

Is it possible to view the number of tests run? I keep seeing in the news that the current numbers for the US are probably way off because very few tests are being performed.

TallGuyShort · on March 3, 2020

Interestingly, the CDC website no longer has these numbers. A week ago, their main COVID-19 landing page had a table of positive tests, negative tests, and pending tests (total tests was in the neighborhood of 450). That seems to have disappeared, with a subset of those statistics here: https://www.cdc.gov/coronavirus/2019-ncov/cases-in-us.html.

Frost1x · on March 3, 2020

I've heard this argument, but I imagine it skews in both directions. Mild cases that recover without ever being tested shrink the case count, which makes the virus look more harmful than it really is. But I suspect there are also deaths, especially among people with preexisting conditions, where no test was run. Those would skew the data the opposite way, making the virus look less harmful than it really is. You essentially have a huge mess of amalgamated, non-systematically collected data. Even in controlled settings you get a lot of censored data to deal with.

RT-PCR tests are reasonably priced these days, but they aren't cheap by any means, and I suspect many places weren't testing early cases or didn't have the capability to. The question is: which way is the data skewed more?
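The two effects described above can be written down directly. Take the naive case fatality rate as confirmed deaths divided by confirmed cases, and suppose only a fraction of true cases (case_detection) and of true deaths (death_detection) ever get tested. Then the observed rate is the true rate multiplied by death_detection / case_detection. If a smaller share of cases than of deaths gets tested, the data overstate severity; if a smaller share of deaths gets tested, they understate it. This is a toy sketch with made-up parameters, not estimates for any country, and the function name is hypothetical. It also ignores the censoring mentioned above: deaths lag diagnoses, so dividing today's deaths by today's cases tends to understate the rate while an outbreak is still growing.

```python
def observed_cfr(true_cases: float, true_deaths: float,
                 case_detection: float, death_detection: float) -> float:
    """Naive CFR (confirmed deaths / confirmed cases) under partial testing.

    case_detection:  fraction of all infections that get tested and confirmed
    death_detection: fraction of all deaths from the virus that get tested
    """
    confirmed_cases = true_cases * case_detection
    confirmed_deaths = true_deaths * death_detection
    return confirmed_deaths / confirmed_cases


# Toy population: 10,000 infections and 100 deaths, i.e. a true CFR of 1%.
true_cases, true_deaths = 10_000, 100

# Mild cases mostly untested, deaths mostly tested: looks deadlier (about 3.6%).
print(observed_cfr(true_cases, true_deaths, 0.25, 0.9))
# Most cases tested, but half the deaths never tested: looks milder (about 0.56%).
print(observed_cfr(true_cases, true_deaths, 0.9, 0.5))
```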

I know absolutely nothing about the Chinese healthcare system, and even if you do, there's a significant amount of guesswork. Only now, as it spreads, are we starting to get more controlled testing that gives us more accurate data.

For now, I tend to trust the raw data until I see definitive evidence that it's skewed one way or the other.