
Show HN: Log-Scale Covid-19 Plots - 6b6b64
https://bitbucket.org/kkd/covid19_plots
======
gmuslera
Why are all the graphs cumulative instead of showing new cases for each day? That
makes it harder to notice how it is growing (i.e. more or fewer new cases than on
previous days) and harder to see whether it is exponential, linear or
whatever. And, of course, it forces the use of log scales because the accumulated
number is already high.

At least for networking graphs, it is more meaningful to see the difference
between the current and the previous cumulative total than to watch the running
total for a network interface.

Another misleading use of graphs, at least to me, is to show the cumulative
total of people who ever got sick, instead of the current number of sick people
(taking out the recovered, and maybe the dead).

I know that the way measurements are done in most places is skewed, as not
everyone is checked and a lot of people are asymptomatic, but that affects both
the new cases and the cumulative ones. Showing the cumulative numbers doesn't
remove that skew.

~~~
em500
One reason is that daily increases are very volatile/noisy. The standard way
to handle this is to use n-day moving averages, but they have a time lag and
are harder to interpret.
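
For readers who want to reproduce this, here is a minimal Python sketch of the two steps: differencing a cumulative series into daily increases, then taking a trailing 7-day mean. It assumes one cumulative value per day with no gaps; real feeds sometimes revise totals downward, which shows up as negative increases. The function names are illustrative, not taken from the repo.

```python
from typing import List


def daily_increases(cumulative: List[float]) -> List[float]:
    """Turn a cumulative series into day-over-day increases."""
    if not cumulative:
        return []
    # Day 0 has no previous total, so its increase is the value itself.
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def trailing_moving_average(values: List[float], window: int = 7) -> List[float]:
    """Mean of the last `window` values, emitted once a full window exists."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest day, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out
```

Each averaged value describes the 7 days ending on that day, so it is centered about 3 days in the past; that is where the lag comes from.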

Financial Times has the best graphical trackers I've seen. They used to show
cumulative counts, but recently switched to 7-day moving averages of daily
increases.

[https://www.ft.com/coronavirus-latest](https://www.ft.com/coronavirus-latest)

I agree that displaying increases/growth rates is more informative overall,
but it does take more time to understand the metrics. (Just compare the
descriptions: "Total Deaths" vs "7-day moving average of daily increase in
Total Deaths")

~~~
computerphysics
Not really. Data for Spain can be used and fitted perfectly well without averaging
the daily increases => [https://media-
exp1.licdn.com/dms/image/C5622AQH1JmaVMs3mqQ/f...](https://media-
exp1.licdn.com/dms/image/C5622AQH1JmaVMs3mqQ/feedshare-
shrink_2048_1536/0?e=1588809600&v=beta&t=oMud1mnDc9G9xn9wIa_nMQ83ZZzvVhS5y7ZMB9WMZWA)

~~~
em500
It's unclear what that graph is supposed to show. It can't be cumulative cases
(because there are days with decreases), but the scales are completely off if
they're supposed to show daily increases/diffs (Spain certainly did not see
daily increases anywhere near 100k in the past few days).

------
stared
Pet peeve: line charts with many lines, yet the legend in another place. Even
worse is when the order of labels is not the same as the order of values. Such
charts take more cognitive effort to parse than needed.

Compare and contrast with labels next to the lines, vide
[https://www.ft.com/coronavirus-latest](https://www.ft.com/coronavirus-latest)
(this example is already quoted in some other thread, and I find it a gold
standard of coronavisualizations).
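
A minimal matplotlib sketch of direct labeling, assuming each series is a list of values indexed by days since a start threshold. It writes each country's name just right of its last point, in the line's color, instead of drawing a legend. It makes no attempt to nudge overlapping labels apart, which a polished chart like the FT's has to handle; the function name and the sample data in the usage line are made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
from typing import Dict, List


def plot_with_direct_labels(series: Dict[str, List[float]], ax=None):
    """Plot each series on a log y-axis and label it at its last point,
    instead of using a separate legend."""
    if ax is None:
        _, ax = plt.subplots()
    for name, ys in series.items():
        xs = list(range(len(ys)))  # x = days since the series' start threshold
        (line,) = ax.plot(xs, ys)
        # Label sits just right of the line's end, in the line's own color.
        ax.annotate(name, xy=(xs[-1], ys[-1]), xytext=(4, 0),
                    textcoords="offset points", color=line.get_color(),
                    va="center")
    ax.set_yscale("log")
    ax.set_xlabel("Days since threshold")
    return ax
```

Usage, with invented numbers: `plot_with_direct_labels({"Italy": [100, 180, 320], "Spain": [100, 210, 430]}).figure.savefig("labels.png")`.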

~~~
disgruntledphd2
To be fair to the FT, they have _much_ better graphs than most other
newspapers.

I especially like the small multiples by country with the gray lines for
different countries (UK, Italy, Spain last I checked).

It's such a wonderful chart idea that I'm already planning on stealing it.
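
For anyone else planning to borrow the idea, a rough matplotlib sketch of that small-multiples layout: one panel per country, with every other country drawn in light gray behind the highlighted one. It assumes a dict mapping country name to a series already aligned on days since a threshold; the colors, panel sizes and function name are arbitrary choices, not the FT's.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit in a notebook
import matplotlib.pyplot as plt
from typing import Dict, List


def small_multiples(series: Dict[str, List[float]], ncols: int = 3):
    """One panel per country; all other countries drawn in gray behind it."""
    names = list(series)
    nrows = -(-len(names) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             squeeze=False, figsize=(3 * ncols, 2.5 * nrows))
    flat = axes.ravel()
    for ax, name in zip(flat, names):
        # Background: every other country as a thin light-gray line.
        for other, ys in series.items():
            if other != name:
                ax.plot(range(len(ys)), ys, color="0.8", linewidth=0.8)
        # Foreground: the panel's own country, drawn last so it sits on top.
        ys = series[name]
        ax.plot(range(len(ys)), ys, color="tab:red", linewidth=2)
        ax.set_title(name)
        ax.set_yscale("log")
    # Hide panels left over in the last row.
    for ax in flat[len(names):]:
        ax.set_visible(False)
    return fig
```

Sharing both axes keeps every panel on the same scale, which is what makes the gray context lines comparable across panels.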

~~~
stared
The Economist has a good baseline for charts.

The New York Times has some stunning data visualizations (from the UI/UX
perspective); see e.g. the "You Draw It" series:
[https://www.nytimes.com/interactive/2017/01/15/us/politics/y...](https://www.nytimes.com/interactive/2017/01/15/us/politics/you-
draw-obama-legacy.html)

------
jetru
[https://aatishb.com/covidtrends/](https://aatishb.com/covidtrends/) is a
pretty cool related resource

~~~
ehsankia
Here's a quick video explaining those graphs:
[https://www.youtube.com/watch?v=54XLXg4fYsc](https://www.youtube.com/watch?v=54XLXg4fYsc)

Log-log graphs of new cases vs confirmed cases (which is what those graphs
show) are by far the best way to represent the data that I've seen.
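
A sketch of how the points for such a log-log trajectory plot could be computed from a cumulative series. It assumes the "new cases" axis means new cases over a trailing window (7 days here, an assumed default); the function name is hypothetical. Under steady exponential growth the ratio of new to total cases stays constant, so the points fall on a straight diagonal, and a slowdown shows up as the curve dropping below it.

```python
from typing import List, Tuple


def trajectory_points(cumulative: List[float],
                      window: int = 7) -> List[Tuple[float, float]]:
    """(total cases, new cases in the trailing `window` days) for every day
    that has a full window behind it; plot both on log axes."""
    points = []
    for i in range(window, len(cumulative)):
        new_in_window = cumulative[i] - cumulative[i - window]
        points.append((cumulative[i], new_in_window))
    return points
```

Note that time is not an axis at all here, which is why countries at different stages line up along the same diagonal.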

~~~
arcticbull
Agreed that's much better, and addresses all the concerns in my sister post.
Thanks for sharing!

------
lend000
Nice use of matplotlib. I'd like to apply this to US states as well.

Although even between states the variation in test rate is so great that it's
hard to gather much from it.

I personally use hospitalizations as a more accurate metric of total
infections in the US. For example, Washington is being hailed as doing a great
job of slowing the virus down, but ~80% of confirmed cases end up in the
hospital, because they still just don't test you otherwise [0].
Compare that with a more realistic hospitalization rate (many states with a
lot of cases are around 10% -- who would have guessed it would be Louisiana
and Florida doing the broad testing?)

[0]
[https://en.wikipedia.org/wiki/Timeline_of_the_2020_coronavir...](https://en.wikipedia.org/wiki/Timeline_of_the_2020_coronavirus_pandemic_in_the_United_States)

------
koonsolo
These graphs give a bit of an indication, but you cannot really trust them.

Since there is a shortage of tests, at some point countries might decide to
only test people coming into the hospital.

Something about the death counts is also troubling: in the Netherlands, doctors
were complaining that deaths with symptoms of corona were not counted as
corona deaths because the patients had not been tested and found positive
(again a problem caused by the shortage of tests).

So a bending of the curve might just be explained by a new strategy of who to
test.

I think the best way to count is to look at total hospitalizations and
subtract the average of normal years. The same goes for corona deaths:
subtract the average of a normal year from the total.
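
A sketch of that subtraction, assuming counts (deaths or hospitalizations) are already aligned by period, e.g. week of the year, across the current year and a few previous ones. A plain average of past years is the simplest baseline; official excess-mortality estimates often use more careful baselines that account for trends and seasonality, which this does not attempt. The function name is illustrative.

```python
from statistics import mean
from typing import Dict, List


def excess_over_baseline(current: List[float],
                         past_years: Dict[int, List[float]]) -> List[float]:
    """Per-period excess: this year's count minus the average count for
    the same period in previous ("normal") years."""
    if not past_years:
        raise ValueError("need at least one past year for a baseline")
    n = len(current)
    for year, series in past_years.items():
        if len(series) < n:
            raise ValueError(f"year {year} has fewer periods than the current series")
    # Baseline for period i = mean of period i across all past years.
    baseline = [mean(series[i] for series in past_years.values()) for i in range(n)]
    return [c - b for c, b in zip(current, baseline)]
```

Because it does not depend on who was tested, this measure is unaffected by the testing-strategy changes described above.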

~~~
adventured
It also needs per capita figures, which are dramatically more important than
absolute figures, unless everyone happens to know the population of each
country by heart.

You end up missing critical data points like mortality rates per 100k
population (from Friday morning):

New York 12, Louisiana 6.6, New Jersey 6, Michigan 4.2, Washington 3.5,
Connecticut 3.1, Massachusetts 2.2, Colorado 1.7, Georgia 1.67, Nevada 1.27,
Illinois 1.23, Delaware 1.2, Pennsylvania 0.7, Ohio 0.7, Florida 0.68,
Kentucky 0.68, Alabama 0.65, South Carolina 0.6, Wisconsin 0.53, California
0.5, Oregon 0.5, Maine 0.5, Idaho 0.5, Virginia 0.48, Arizona 0.45, Kansas
0.44, New Hampshire 0.36, Iowa 0.34, New Mexico 0.33, Minnesota 0.32, Nebraska
0.32, Missouri 0.31, Texas 0.24, North Carolina 0.15, Hawaii 0.14

Italy 23, Spain 22, Belgium 8.8, France 8, the Netherlands 7.8, Switzerland
6.2, UK 4.5, Sweden 3, Denmark 2.1, Ireland 2, Portugal 2, Austria 1.8,
Germany 1.3, Norway 0.9, Canada 0.37, Finland 0.34, Australia 0.11, New
Zealand ~0

Most of the US is seeing very low per capita mortality rates and no surge in
cases. You wouldn't know that by the headlines though.
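
The per-100k figures above are simply deaths divided by population, scaled to 100,000 residents. A tiny sketch with hypothetical round numbers (not real data) shows why the same absolute count can mean very different things.

```python
def per_100k(count: float, population: float) -> float:
    """Count per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so round inputs give exact results.
    return count * 100_000 / population


# Hypothetical: the same 500 deaths in a country of 10 million vs 1 million.
big = per_100k(500, 10_000_000)    # 5.0
small = per_100k(500, 1_000_000)   # 50.0
```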

~~~
LaszloKv
In this video John Burn-Murdoch (the creator of the FT charts) discusses why
they decided against showing numbers per capita.
[https://mobile.twitter.com/janinegibson/status/1244519429825...](https://mobile.twitter.com/janinegibson/status/1244519429825802240)

There's also this tweet showing that the population size of a country
has no relationship to the pace of disease spread.
[https://mobile.twitter.com/jburnmurdoch/status/1246185741304...](https://mobile.twitter.com/jburnmurdoch/status/1246185741304168449)

~~~
lopmotr
There's another surprising reason why per capita numbers aren't useful: for
exponential growth on the typical type of graph starting at some "initial"
number of cases, it makes no difference! For example, if one country were
counted as two equal half-sized countries, their graphs would be the same shape
but shifted to the right by a few days. However, they would also reach the
"initial" number of cases where the graphs start a few days later,
shifting them left by the same amount! The result would be the same line as
for the full-sized country.
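
A small numerical check of this argument, assuming idealized, uninterrupted exponential growth in continuous time (real curves bend, and daily data crosses the threshold between whole days). Splitting a country in half is modelled here as halving its case count at the same moment; the function name and parameters are illustrative.

```python
import math
from typing import List


def aligned_curve(initial_cases: float, growth_rate: float,
                  threshold: float, horizon_days: int) -> List[float]:
    """Cases on days 0..horizon_days counted from when `threshold` is first
    reached, for cases(t) = initial_cases * exp(growth_rate * t)."""
    if not (0 < initial_cases <= threshold and growth_rate > 0):
        raise ValueError("need 0 < initial_cases <= threshold and growth_rate > 0")
    # Time (in days) at which the curve first hits the threshold.
    t_start = math.log(threshold / initial_cases) / growth_rate
    return [initial_cases * math.exp(growth_rate * (t_start + s))
            for s in range(horizon_days + 1)]


full = aligned_curve(20, 0.2, 100, 30)   # whole country
half = aligned_curve(10, 0.2, 100, 30)   # half-sized country: half the cases
```

Algebraically both reduce to threshold × exp(growth_rate × s), so the starting count, and with it the population size, drops out of the aligned curve.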

------
sdfjkl
Well done, log scale, rebased to common starting point - the best way of
depicting international comparisons so far.

Zeit.de has an interactive version of these (in German), which adds a
rectangle for "days since numbers last doubled", a good indicator of how
severe the situation in a country is.

Scroll down past the map to the third graph with international data, click on
"Todesfälle" (deaths): [https://www.zeit.de/wissen/gesundheit/coronavirus-
echtzeit-k...](https://www.zeit.de/wissen/gesundheit/coronavirus-echtzeit-
karte-deutschland-landkreise-infektionen-ausbreitung)
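
A "days since numbers last doubled" indicator can be approximated from a cumulative daily series as below. Zeit's exact definition isn't given here, so this is one reasonable reading, not their implementation; the function name is hypothetical. Slower growth means a larger number.

```python
from typing import List, Optional


def days_since_doubling(cumulative: List[float]) -> Optional[int]:
    """Days since the count was last at or below half its latest value.
    Returns None if it never was within the series (e.g. too short)."""
    if not cumulative or cumulative[-1] <= 0:
        return None
    latest = cumulative[-1]
    today = len(cumulative) - 1
    # Walk backwards to the most recent day with at most half today's count.
    for day in range(today - 1, -1, -1):
        if cumulative[day] <= latest / 2:
            return today - day
    return None
```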

~~~
dragonwriter
> Well done, log scale, rebased to common starting point - the best way of
> depicting international comparisons so far.

Yeah, it's what the Financial Times and many others have been using for quite a
while: log graphs with a starting point of the Xth death or Yth case. It's a
standard way of presenting this kind of data.

~~~
FabHK
But the FT doesn’t display that graph anymore, only its first derivative. I
liked the original graph (maybe I am better at differentiating than integrating...)

------
ryan_glass
It would be interesting to see graphs of deaths for other reasons to compare
numbers. For example deaths from starvation/malnutrition are likely increasing
in India due to lockdown
([https://www.theguardian.com/world/commentisfree/2020/mar/29/...](https://www.theguardian.com/world/commentisfree/2020/mar/29/india-
lockdown-tragedy-healthcare-coronavirus-starvation-mumbai)) and deaths due to
cancer may increase due to patients' treatment being missed.

------
6b6b64
This repo replicates log-scale plots seen in the Financial Times and other
news sources, here grouped by world region. Plots are updated nightly and
visible on the repo's homepage.

------
s1t5
If you look at the last few "confirmed" plots, all the lines are in the bottom
left corner but the x-axis still goes to above 70 (I guess for the sake of
consistency) - this means that most of the space on the screen isn't used and
it makes it very difficult to make out anything from the lines.

And when you need to recreate the same plot with different subsets of the data,
as you've done here, that's a great use case for an interactive dashboard
that allows the user to select the countries/regions and also zoom in and
out.

------
cafxx
Same, but interactive and more powerful:
[https://www.datacat.cc/covid/](https://www.datacat.cc/covid/)

~~~
bastijn
Alternatively, a COVID-19 dashboard, including graphs, can be found at
[https://covidly.com](https://covidly.com).

------
app4soft
FTR, there is a simple Covid-19 plotting utility for Ukraine data, which could
also be used for any other data set.[0]

[0]
[https://github.com/marianpetruk/covid19_Ukraine](https://github.com/marianpetruk/covid19_Ukraine)

------
alkonaut
Suggestion: use fixed style for the same label across multiple graphs. If a
country is blue+dashed in the cases plot it should be the same style in the
deaths plot. Use a style from the label hash or a global table or something.
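
A sketch of the label-hash approach, assuming matplotlib's default "tab:" color names and four line styles (an arbitrary palette chosen here). It uses `hashlib` rather than the built-in `hash()`, because Python salts string hashes per process, so `hash()` would give different styles on each run. The function name is illustrative.

```python
import hashlib
import itertools
from typing import Dict

COLORS = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
          "tab:brown", "tab:pink", "tab:gray", "tab:olive", "tab:cyan"]
LINESTYLES = ["-", "--", ":", "-."]
STYLES = list(itertools.product(COLORS, LINESTYLES))  # 40 combinations


def style_for(label: str) -> Dict[str, str]:
    """Deterministic color/linestyle for a label, identical in every plot
    and every run."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(STYLES)
    color, linestyle = STYLES[index]
    return {"color": color, "linestyle": linestyle}
```

Usage: `ax.plot(xs, ys, label=name, **style_for(name))`. Two labels can hash to the same style; a global table that assigns styles once, in a fixed order, avoids such collisions at the cost of maintaining the table.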

------
dhimes
If you're still working on it, I'd love to see the second derivative from the
data. That's what I'm wondering about most these days: Are we at the
inflection point or not?

~~~
FabHK
The turning point is when the second derivative is zero, which would indeed be
easy to spot with a second derivative graph. But it’s also very easy to spot
with the first derivative graph (as published by the FT now): It’s when the
first derivative hits its maximum.
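
A sketch of reading the inflection point off the first derivative, using day-to-day differences of a cumulative series. It assumes reasonably clean data; with real daily counts you would smooth first (e.g. with a 7-day average), or noise alone can produce a spurious peak. The logistic curve in the demo is synthetic and the function names are illustrative.

```python
import math
from typing import List, Optional


def daily_differences(cumulative: List[float]) -> List[float]:
    """Increase from each day to the next (a discrete first derivative)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def inflection_index(cumulative: List[float]) -> Optional[int]:
    """Index i of the largest increase cumulative[i] -> cumulative[i + 1].
    Returns None while the largest increase is still the latest one, since
    then the curve may not have stopped bending upward yet."""
    diffs = daily_differences(cumulative)
    if not diffs:
        return None
    peak = max(range(len(diffs)), key=diffs.__getitem__)
    return None if peak == len(diffs) - 1 else peak


if __name__ == "__main__":
    # Synthetic logistic cumulative curve with its midpoint at t = 30.5.
    logistic = [10_000 / (1 + math.exp(-0.25 * (t - 30.5))) for t in range(61)]
    print(inflection_index(logistic))  # prints 30: largest rise is day 30 -> 31
```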

~~~
dhimes
Yes- but I'm talking about the inflection point- where the curve goes from
concave up to concave down. These are modeled as gaussians- so if that
modeling works we would be a standard deviation from the peak- assuming the
crisis is being well managed.

~~~
FabHK
By turning point I mean the inflection point. You turn from “driving left”
(convex) to “driving right” (concave).

By the way, the derivative of the logistic function is (the density of) the logistic
distribution, and that’s not the Gaussian bell shape; it has much fatter
tails: Gaussian tails drop much faster, with exp(-x^2), while the logistic
drops with exp(-|x|). (Makes sense, as the logistic curve grows exponentially
at the beginning, and the derivative of the exponential is the exponential.)
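
A short derivation of that tail comparison, using the standard logistic function σ (unit scale, centered at 0) and the standard normal density φ; both symbols are introduced here for illustration:

```latex
$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
  \sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$

$$\sigma'(-x) = \sigma'(x), \qquad
  \sigma'(x) \sim e^{-|x|} \ \text{as } |x| \to \infty, \qquad
  \sigma(x) \sim e^{x} \ \text{as } x \to -\infty$$

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}, \qquad
  \frac{\varphi(x)}{\sigma'(x)} \to 0 \ \text{as } |x| \to \infty$$
```

The last line is the "much fatter tails" statement: the Gaussian density decays faster than any fixed exponential, while σ′ decays only like e^{−|x|}, and σ(x) ∼ e^{x} on the left is the early exponential growth.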

~~~
dhimes
haha yes indeed. I had never heard the term "turning point" used like that,
but I guess it makes sense if you think about it as the turning point of the
first derivative.

But I learned what a logistic function is, so thank you. I guess that is
needed for the cumulative count.

You might be interested in this. I posted it a few days ago- it's why I'm
talking Gaussians.

[https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v...](https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1)

------
lower
I like this set of graphs with commentary:

[http://nrg.cs.ucl.ac.uk/mjh/covid19/](http://nrg.cs.ucl.ac.uk/mjh/covid19/)

------
hedora
Very cool. I’d love to see the same plots scaled by population.

~~~
jaakl
I tried; it does not work the way you think. Or at least your “population”
should not be an arbitrary current administrative region or country, but the
specific area the virus spread in, for which it is very difficult to get data.
You’ll get weird numbers for EU mini-countries (San Marino and Luxembourg are
at the top) and very different figures for China, Hubei and other regions
there. And if you finally compare them in graphs, you’ll get the same graph anyway.

~~~
noctilux
Agree. I was confused by this at first, but if the slope of the lines without
dividing by population is:

(log y2 - log y1) / (x2 - x1)

Then if we scale y2 and y1 both by c, which is 1 / population:

(log( c * y2 ) - log( c * y1 )) / (x2 - x1) =

(log c + log y2 - log c - log y1) / (x2 - x1) =

(log y2 - log y1) / (x2 - x1)

So scaling by population does not change the slope of the graphs, only the
intercept.

~~~
lopmotr
It doesn't even change the intercept because bigger countries reach their
threshold number of cases sooner so that shifts them back again. Hence most of
the lines are all roughly on top of each other regardless of country size.

------
colechristensen
Do log scale on both axes, suddenly all the plots will be nearly straight
lines and you can start to reason about uncontrolled growth and how well
mitigation is working.

~~~
FabHK
If you plot (exponentially growing) cases or deaths against time, you’ll get a
straight line on a log-linear graph (as here). If you make both axes
logarithmic, you get an exponential curve again (but squished).

Maybe you’re thinking of the double log curve of total cases vs new cases that
was on HN recently.
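
Spelling out the distinction, assuming pure exponential growth y(t) = c e^{rt} and, for the "new cases" plot, new cases N counted over a trailing window of length w (all symbols introduced here for illustration):

```latex
$$\ln y = \ln c + r\,t
  \quad \text{(a straight line against } t\text{: the log-linear plot)}$$

$$\ln y = \ln c + r\,e^{\ln t}
  \quad \text{(exponential in } \ln t\text{: log-log against time)}$$

$$N(t) = y(t) - y(t - w) = \bigl(1 - e^{-rw}\bigr)\,y(t)
  \;\Rightarrow\;
  \ln N = \ln y + \ln\bigl(1 - e^{-rw}\bigr)
  \quad \text{(slope 1 against } \ln y\text{)}$$
```

So it is the double-log plot of new cases against total cases, not against time, that turns exponential growth into a straight line.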

------
mercer
How does this compare to [https://studylib.net/coronavirus-
growth](https://studylib.net/coronavirus-growth) ?

------
leemailll
[https://coronavirus.1point3acres.com/en](https://coronavirus.1point3acres.com/en)

------
jek0
Thanks, those log-scale plots are way more interesting than linear ones.

Small mistake: Israel isn't in Europe but in the Eastern Mediterranean.

~~~
ferzul
the region titles are ... unconventional. greece, clearly in the eastern
mediterranean, is not listed, while iran on the indian ocean is. this group is
normally called mena, roughly.

likewise, the category called southeast asia contains countries normally
considered to be south asian - like india - and omits the larger part of
southeast asia.

the creation of “western pacific”, grouping australia-new zealand, east asia,
and the larger part of southeast asia, deserves credit surely since it is far
more useful than the neocolonialist category of oceania, which mostly serves
to allow europeans to swamp pacific islanders with statistics from australia

~~~
FabHK
From the Readme:

> The countries have been grouped according to regions defined by the World
> Health Organization.

And, yeah, they’re a bit weird.

[https://en.wikipedia.org/wiki/WHO_regions](https://en.wikipedia.org/wiki/WHO_regions)

------
m0zg
All of these need to be normalized per capita. Otherwise you don't see the
true extent of the problem. Also from looking at the stats recently, here's
what I find more useful than the raw number of "cases":

\- Number of deaths per capita

\- Number of "severe" cases per capita (good indicator of the future number of
deaths)

\- Number of tests per capita (good indicator for whether or not "number of
cases" means anything at all)

~~~
dragonwriter
> All of these need to be normalized per capita. Otherwise you don't see the
> true extent of the problem.

Normalizing per capita replaces the true _extent_ of the problem with the true
_relative local impact_ of the problem; both are significant.

~~~
m0zg
But relative local impact _is_ the extent. If you live in a village of 100
people and 10 die that's pretty bad. If you live in NYC and 10 die - that's
statistical noise that nobody will even notice.

~~~
dragonwriter
> But relative local impact _is_ the extent.

No, absolute scale is the extent, that's pretty much what “extent” means.

Relative local impact is the...well, relative local impact.

Both are important, though which is _more_ important depends on what you are
doing with the measure.

------
chvid
Why not relative to population size? And China and the USA should be split into
regions/states.

------
easytiger
These numbers are effectively meaningless.

Definition of deaths varies widely, testing policy varies widely, no
statistical extrapolation is being done in an attempt to avoid undermining
public health policies that might be based on total fantasy.

~~~
endorphone
In outbreak areas, total mortality increases by multiples, up to orders of
magnitude, during the peak (despite enormous, historic containment measures).
These are areas with processions of army trucks full of bodies, or the defense
department bringing in refrigeration trucks.

"But what about the co-morbidity?". People don't die "of" COVID-19. They have
heart failure, kidney failure, or other triggers of death because COVID-19
pushes their body to the limit. Pointing to resources that claim that "only"
some small percentage actually died of COVID-19 is pure ignorance: the
certifying doctor was simply being efficient, because if they truly looked
there would always be some other triggered cause of death. Just as no one died
of "AIDS", they died of things like Kaposi sarcoma, but if someone said "see,
it wasn't AIDS at all" they would be laughed out of the room.

The majority of deaths are people who are health compromised in some other way
(not _all_ deaths, and a substantial number of completely healthy people have
perished), but that is known by everyone and is not news, nor does it diminish
the tragedy.

"no statistical extrapolation"

This is the most interesting, and ridiculous, claim of all. Enormous
statistical measures and extrapolations are being done daily...that's how we
are where we are. What is this meaningless claim even trying to say, other
than that you, easytiger, know more than every health authority?

~~~
easytiger
Way to have an argument with things I didn't assert.

Categorically, in many countries (UK, Italy), the figures released represent
people who tested positive for covid 19 before or after death. Those figures,
without any question, do not offer an opinion on how, if at all, it had any
effect on the death. Further to that, in the UK the official death statistics
(released monthly) only count mentions of covid19 on a report. It doesn't
indicate anything.

In some countries, the only way you are guaranteed to be tested is to die, and
in a hospital at that.

Are those facts with which you have an issue?

