
Log-log plot of new vs. total Covid-19 cases by country - IndrekR
https://aatishb.com/covidtrends/
======
dredmorbius
Log-log of daily growth vs total cases is a neat graphics hack, one I'd not
seen before, let alone thought of.

FT have been headlining a semi-log plot of _deaths_ per country, normalised to
days after the first ten deaths were reported.

[https://www.ft.com/coronavirus-latest](https://www.ft.com/coronavirus-latest)

Because dead bodies tend to behave characteristically, are inconvenient both
directly and via surviving relations, and present a smaller testing target
(about 1% of total cases) as well as representing a full course-of-illness
endpoint, these data should be generally more reliable and cross-regionally
consistent than confirmed cases. Deaths are, however, lagged by about two
weeks.

FT also provide numerous other graphical representations, including an
excellent small-multiples (Tufte fans) matrix of multiple countries' case
trajectories.

My view is that all serious reporting should lead with similar visualisations.

Wikipedia's COVID-19 pages have similarly featured semi-log plots from early
on, as does Worldometers.

[https://en.wikipedia.org/wiki/2019–20_coronavirus_pandemic#D...](https://en.wikipedia.org/wiki/2019–20_coronavirus_pandemic#Diagrams)

[https://www.worldometers.info/coronavirus/](https://www.worldometers.info/coronavirus/)

(Numerous additional pages with regional and specific behavioural
characteristics within both sites.)

Another data visualiser, allowing arbitrary multi-country comparisons:

[https://rys.io/covid/](https://rys.io/covid/)

~~~
weinzierl
> Because dead bodies tend to behave characteristically, are inconvenient both
> directly and via surviving relations, and present a smaller testing target
> (about 1% of total cases) as well as representing a full course-of-illness
> endpoint, these data should be generally more reliable and cross-regionally
> consistent than confirmed cases. Deaths are, however, lagged by about two
> weeks.

Completely agree. Just to add another source of differences between countries
for this metric: Some countries test post-mortem (Italy) others don't
(Germany).

~~~
KarlKemp
I’m not going to believe that without a good source. Testing in Germany is at
500,000/week now, so anyone admitted to hospital would definitely be tested.
A friend of mine here in Berlin had symptoms. They called and a team came to
their place the next morning. One day later, they were called and got the test
results (negative).

The explanation for the relatively low fatality rate I’ve heard is simply
widespread testing catching many mild cases and relatively young patients
because a lot of initial cases at least were linked to ski holidays and
carnival events. Personally, I would add the hypothesis that Germans have far
fewer interactions with family members across generations than especially
Italy, but also the US. No one I know lived with their parents after finishing
school. In the US, when Harvard shut down for the semester, the undergrads all
went home to family. Here, students tend to live in regular apartments. And
even those in student housing live there year-round, instead of vacating them
during breaks and heading home.

I believe Austria and some other smaller European countries are also seeing at
least similar CFRs, making Germany somewhat less exceptional.

~~~
lmeyerov
We had a suspicious surprise death in the family 2w ago that the German
hospital did not test (early outbreak + sudden death) nor autopsy and we
suspect real chance COVID was a secondary factor that merited at least a
check. Internal medicine md phd in our family, so no joke. Ok they didn't
test, but we were surprised they did not autopsy. Sad way to confirm SOP.

For others: In the US, you can ~always request an autopsy.

------
fxj
This kind of analysis is also called a phase space plot: the function is
plotted against its derivative. When the function is exponential growth, the
derivative is proportional to the function itself, which gives a similar plot
for all the different countries. When the function deviates from the
exponential, as in China, you can spot the difference very early in these
kinds of plots.

[https://en.wikipedia.org/wiki/Phase_space](https://en.wikipedia.org/wiki/Phase_space)

or for an example:

[https://en.wikipedia.org/wiki/Duffing_equation](https://en.wikipedia.org/wiki/Duffing_equation)

or in math notation: the governing equation for exponential growth is:

y' = ay

which is a linear function where the slope is the growth rate. The plot shows
y' vs y. This is the straight line in the plot. Any deviations from
exponential growth can be easily spotted now.
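
A minimal Python sketch of that claim, assuming idealised daily cumulative
counts (the function names and the rates 0.10 and 0.30 are made up for
illustration). One thing it suggests: on log-log axes the growth rate a shows
up as a vertical offset rather than as the slope, so every pure exponential
gives a line of slope 1.

```python
import math

def exponential_totals(initial, daily_rate, days):
    """Cumulative cases under pure exponential growth: y(t) = initial * e^(a*t)."""
    return [initial * math.exp(daily_rate * t) for t in range(days)]

def loglog_slope(totals):
    """Slope of log10(new cases) against log10(total cases), first to last point."""
    new = [b - a for a, b in zip(totals, totals[1:])]   # daily new cases
    xs = [math.log10(t) for t in totals[1:]]            # x axis: total cases
    ys = [math.log10(n) for n in new]                   # y axis: new cases
    return (ys[-1] - ys[0]) / (xs[-1] - xs[0])

slow = exponential_totals(100, 0.10, 30)  # hypothetical slow outbreak
fast = exponential_totals(100, 0.30, 30)  # hypothetical fast outbreak

# Both slopes come out at (numerically) 1, despite the different growth rates.
print(round(loglog_slope(slow), 6), round(loglog_slope(fast), 6))
```

Any series whose slope drops below 1 on these axes is growing slower than
exponentially.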

~~~
jdc
[https://en.wikipedia.org/wiki/Logistic_function](https://en.wikipedia.org/wiki/Logistic_function)

~~~
fxj
For the logistic function the relation is

y' = y - y^2

This is why the linear relation later on bends back to lower values until y'
becomes 0 for the end of the infection.
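
A rough numerical check, assuming the normalised form above (y is the infected
fraction, time is in arbitrary units, and the starting value and step size are
arbitrary choices): a simple Euler integration shows y'/y starting near 1 and
falling towards 0.

```python
def logistic_phase_points(y0=0.001, dt=0.01, steps=2000):
    """Euler-integrate y' = y - y^2 and return the (y, y') pairs along the way."""
    points = []
    y = y0
    for _ in range(steps):
        dy = y - y * y          # right-hand side of the logistic equation
        points.append((y, dy))
        y += dt * dy            # one Euler step
    return points

points = logistic_phase_points()
ratios = [dy / y for y, dy in points]   # equals 1 - y at every step

# Early on the ratio is close to 1 (looks exponential); at the end it is near 0.
print(round(ratios[0], 3), round(ratios[-1], 3))
```

The largest y' occurs near y = 0.5, which is where the curve in the phase plot
turns back down.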

------
cptroot
There's a nice video about this on the minutephysics channel:
[https://www.youtube.com/watch?v=54XLXg4fYsc](https://www.youtube.com/watch?v=54XLXg4fYsc)

~~~
drivers99
Yes! That’s where I saw it, so I started making my own graphs like that for
the US states that I and my relatives live in. In Excel you have to use a
scatter plot and then set both axes to logarithmic. But it doesn’t let you use
more than one data series on the same graph. (At least, not in the Mac version
which tends to be inferior.) And it’s not animated so I was thinking of
throwing something together in Python.
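
For anyone taking the Python route, a minimal sketch of the data preparation,
assuming the chart plots new cases over a trailing window against the
cumulative total (the 7-day default and the doubling series are illustrative
assumptions, not real data):

```python
def trend_points(cumulative, window=7):
    """Pair each day's total with the new cases over the preceding `window` days."""
    points = []
    for i in range(window, len(cumulative)):
        total = cumulative[i]
        new = total - cumulative[i - window]
        if total > 0 and new > 0:   # log axes cannot show zero or negative values
            points.append((total, new))
    return points

# Hypothetical cumulative counts doubling every day (not real data)
cumulative = [2 ** d for d in range(12)]
for total, new in trend_points(cumulative):
    print(total, new)
```

Each region's points can then be drawn as its own series, for example with
one matplotlib `loglog` call per region, which sidesteps the single-series
limitation.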

------
jacquesm
Title should read 'confirmed Covid-19 cases by country', that makes a very
large difference. Those figures are not to be trusted to begin with so any
kind of processing you apply to them does not result in graphs that output a
picture that you can then draw conclusions from.

Each country has their own standards in what is a confirmed case and what
isn't and some countries actively discourage accurate reporting.

~~~
9nGQluzmnq3M
It makes a difference in the absolute numbers, but it doesn't really matter
for the trends, since each country's testing policy is relatively consistent
with itself. In other words, the graph shapes & trends are still comparable,
whether a country's testing captures 1%, 10% or 100% of its cases.

~~~
jacquesm
In some countries it will even matter for the trends. There are some countries
that actively cook the numbers to make their politicians look good, and where
there are very sudden kinks in the graphs you can be sure that the whole story
hasn't been told. As for the testing capacity, that's a big factor too and it
is non-linear in many places: testing criteria are changed based on how much
stock there is of test materials and how much capacity on the machinery.

The closer to capacity, the stricter the criteria.

------
sorenjan
The number of confirmed cases is not really a great value to track, because
countries test differently and change tactic after a while.

Look at this graph for instance, number of tests per million people vs number
of confirmed cases per million people. They're highly correlated, which means
the more you test the more confirmed cases you'll have. In some countries it's
the opposite, the more cases you'll have seeking medical assistance, the more
tests you do.

[https://ourworldindata.org/grapher/tests-vs-confirmed-cases-...](https://ourworldindata.org/grapher/tests-vs-confirmed-cases-covid-19-per-million)

~~~
bravura
You missed the point, that this sampling bias applies equally to the x axis
and the y axis in this plot. So this plot allows you to fairly compare growth
across different countries, regardless of their testing regime.

~~~
_nalply
No, there are two different biases at work and they might not cancel out each
other.

------
friendlyghost
Does anyone know of a source for _hospitalisation numbers_ by country? I know
some states in the USA provide this
([https://covidtracking.com/data/](https://covidtracking.com/data/)) and some
countries in Europe do the same, but I can't find a site that collects all
this data.

It seems to me that if you want to eliminate the effect of the totally
different testing strategies (which moreover vary substantially over time),
then hospitalisation numbers are far more indicative of the spread than
positive test results. At least in the sense that you can compare them on
different days.

~~~
mzs
I think "influenza-like illness (ILI) and severe acute respiratory infections
(SARI)"* numbers would be the best, it would remove testing differences and
could be compared to expected numbers from prior years.

In the US states send the data to the National Syndromic Surveillance Program
(NSSP) but I can't find a public source for the numbers that IL sends. Here
are some plots that I make for IL (a state that does not report
hospitalizations yet but will likely do so starting some point this week):

[https://msliczniak.github.io/COVID19IL/plots/index.html](https://msliczniak.github.io/COVID19IL/plots/index.html)

* [https://www.who.int/influenza/surveillance_monitoring/ili_sa...](https://www.who.int/influenza/surveillance_monitoring/ili_sari_surveillance_case_definition/en/)

------
DrNuke
The most transparent number is a direct comparison between deaths last year
and deaths this year, in the same timespan (eg. Jan 2019 vs Jan 2020, and so
on). But there is a lot of gamesmanship at play among nations, both for
internal security and external geopolitics reasons, so these numbers are far
too often false after convenient miscalculation or plain manipulation.

~~~
s1artibartfast
I'm not sure if we will be able to separate the confounding factors such as
economic impact at the population level. Do you have an idea how this can be
done? I have seen a few studies using this method to look at excess mortality
during the great recession [1],[2], but there wasn't a pandemic at the same
time.

[https://www.thelancet.com/journals/lancet/article/PIIS0140-6...](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736\(16\)00577-8/fulltext)

[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3070776/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3070776/)

~~~
paganel
> but there wasn't a pandemic at the same time.

There certainly was a pandemic, the 2009 H1N1 one. [1] In the last few months
I've started to suspect that that was what killed my grandma in July 2010, she
died very suddenly because of some respiratory issues (I live in Eastern
Europe).

[1]
[https://en.wikipedia.org/wiki/2009_flu_pandemic](https://en.wikipedia.org/wiki/2009_flu_pandemic)

~~~
s1artibartfast
That is an interesting connection I will have to keep in mind. I don't think
that would be an issue for the first link at least, which only looked at
additional cancer deaths in the US and OECD due to the recession.

------
Someone
I’m not sure what to make of this. “Lots of distributions give you straight-
ish lines on a log-log plot”
([http://bactra.org/weblog/491.html](http://bactra.org/weblog/491.html)) so it
isn’t surprising that the slopes of the lines are somewhat constant over time.

Because taking the logarithm is such an equalizing operator, I also doubt
whether it is surprising that lines seem to overlap for each country. Zooming
in, there still is a difference of about 20% in new cases/total reported cases
between countries, even in the range of 5k-10k total confirmed cases. Taken
over the course of multiple days, that can make quite a difference.

~~~
quietbritishjim
The graph doesn't make much sense without a bit of explanation. (It certainly
didn't to me, anyway.) The minutephysics video linked to from cptroot's
comment, and also here [1] for your convenience, does a great job of that.

In short, you're right it's not surprising that the lines are log-log linear
for uncontrolled growth of the virus, and that it's similar for a lot of
countries. What's interesting is the few (so far) cases where it drops below
that log-log linear line, which indicates a containment strategy that's
starting to work.

[1]
[https://www.youtube.com/watch?v=54XLXg4fYsc](https://www.youtube.com/watch?v=54XLXg4fYsc)

~~~
heavenlyblue
But that containment strategy isn't working for Italy at all.

~~~
fxj
It takes at least 2 weeks until the number of deaths starts to decrease. Look
at the graph in the weeks to come and you will notice the difference.

~~~
heavenlyblue
They’ve started containment 9th March, how is that not two weeks ago?

------
pnathan
I strongly distrust any figures from pretty much any country without a
complete and transparent testing regimen - (Hi South Korea, you know what
you're doing!). The wide variance of testing protocols, even within countries
- is going to kill our ability to really do much with these numbers.

I'd be far more interested in _death_ rates. I.e., what was the normal death
rate, and what is it now? It's not sexy, it needs to be seasonally adjusted,
and it's subject to noise, but it's a much better heuristic than "covid case",
because the base number isn't as gameable.
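
A bare-bones sketch of that comparison, assuming monthly all-cause death
counts are available and using the same-month average of earlier years as a
crude seasonal baseline (all numbers below are invented for illustration):

```python
from statistics import mean

def excess_deaths(current, prior_years):
    """Deaths above the same-month average of earlier years."""
    excess = {}
    for month, observed in current.items():
        baseline = mean(year[month] for year in prior_years)  # seasonal baseline
        excess[month] = observed - baseline
    return excess

# Hypothetical monthly all-cause death counts, not real data
prior_years = [{"Jan": 1000, "Feb": 900}, {"Jan": 1100, "Feb": 950}]
current = {"Jan": 1060, "Feb": 1200}

result = excess_deaths(current, prior_years)
print(result)
```

With only two baseline years the noise would be large; more years (or a
proper seasonal model) would be needed before reading much into small
differences.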

------
hn23
But are all these graphs comparable? Testing and reporting is not comparable
in these countries.

~~~
dredmorbius
A takeaway is that the underlying epidemiological behaviour is consistent
_despite_ significant regional differences in management and monitoring.

The accompanying video makes this point explicitly. Do watch it if you've not.

[https://invidio.us/watch?v=54XLXg4fYsc](https://invidio.us/watch?v=54XLXg4fYsc)

I'd also suggest _extreme_ attention be paid to locations with improbably low
case and death reports, or where severity mix and/or mortality are strongly
out of line with expectations.

~~~
paul_f
The video shows a plot of "new cases", which might be measuring the rollout of
testing, not the actual increase in number of people with the virus. It is
also not showing data per-capita. In other words, this is not terribly useful
IMO

~~~
hwillis
> The video shows a plot of "new cases", which might be measuring the rollout
> of testing

Maybe, but probably not given how consistent the trend is over time, response,
and as the disease progresses.

> It is also not showing data per-capita.

Per-capita would be much LESS representative. The virus spreads locally, at a
scale far smaller than country borders. The point is to show the progression
of an outbreak. If you show per-capita, you would be showing the number of
outbreaks per country and minimizing the growth of any individual outbreak.
For instance China, with 4x the population of the US, would be moved much
farther down the graph than the US. That would only make the data look
fuzzier, and convey zero useful information.

In the end it would probably be a very minimal difference given the
logarithmic scale.

------
nojvek
Even within the US, Washington (my state) has way lower numbers than New York
even though it was hit first. The first reported case and death weren’t far
from where I live.

So either Washington has flattened the curve or we’re doing a lot fewer tests
than New York. I know a couple of friends who have covid-19 symptoms but
haven’t been tested since there aren’t enough kits and they are quarantining
themselves at home.

So my guess is Washington's case numbers are at least 2x higher than what’s
reported.

Overall at this rate US will hit a million reported cases in a couple of
weeks. It seems we are the country doing a great job at testing and reporting
but the virus is spreading like wild fires in metros.

~~~
acid__
Well, consider the population difference too — NY State has 20M residents to
Washington’s 7.5M, which is less than the population of NYC alone. Be wary of
comparing raw counts when the underlying population sizes are so different!

~~~
silverdrake11
I've been working on this which is confirmed cases per state population
[https://us-covid19-per-capita.net](https://us-covid19-per-capita.net)

It might make more sense if it tracked deaths per state population instead of
confirmed cases b/c of the different testing rates.

------
coding123
One plot I was hoping for by now is a back-in-time plot. Something that
assumes for each case we find positive today, we assume that person has been
polluting the world with covid-19 viruses in an effort to track events and
spread based on today's data with the notion that they really caught it 2 weeks
ago.

The idea is that even if we started social distancing about 7 days ago, we
won't see any benefits to that for another 7 days (since the average symptom
time is about 2 weeks). And so any spikes you see in these graphs is all of
the people that got infected 2 weeks ago, and aren't really sick until right
now.
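
Roughly what I have in mind, as a sketch: shift every report back by a fixed
lag. The 14-day lag is my assumption from above, and the counts are made up.

```python
from datetime import date, timedelta

def backdate(reports, lag_days=14):
    """Shift each report date back by an assumed infection-to-confirmation lag."""
    return {day - timedelta(days=lag_days): count for day, count in reports.items()}

# Hypothetical daily confirmed counts keyed by report date (not real data)
reports = {date(2020, 3, 28): 120, date(2020, 3, 29): 150}

for day, count in sorted(backdate(reports).items()):
    print(day.isoformat(), count)
```

A fixed lag is the crudest version; a real back-projection would spread each
case over a distribution of possible infection dates.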

------
endogui
Wouldn't it make more sense to compare new cases vs active cases? At the end
of this plot you can see China's new cases increasing again, but the number of
total cases is so large that the x axis doesn't move.

~~~
andreareina
Hard to use active cases since it's not monotonic. If you switch to linear the
resurgence becomes more visible.

------
mensetmanusman
If only it was true.

All of these articles need to say ‘tested’ and ‘published’ cases. If you
aren’t testing randomly, or if you aren’t publishing (China), then the data is
really showing the rate of testing of sick people.

------
amai
When comparing countries only relative numbers (cases per capita) should be
used. Everything else is misleading.

~~~
mzs
lg(cases/pop) = lg(case) - lg(pop) = lg(case) + C

~~~
amai
You are right if you are just interested in the slope. But most people don't
just look at the slope, they look at the numbers and want to know which
country is doing better or worse. And that is when the constant (which is
different for each country) matters.

~~~
mzs
lg(1000 * 1000 * 1000) - lg(1000) = 6

------
jve
Can zoom on mobile to tick more countries, but can't unzoom if my viewpoint is
on the plot:(

------
IndrekR
There is also an explanatory video about that graph on the minutephysics YouTube
channel: [https://youtu.be/54XLXg4fYsc](https://youtu.be/54XLXg4fYsc)

------
Yajirobe
Wait, but if you plot the total number of cases with respect to time on a log-
normal scale, you're still going to get a straight line, right? So why is
plotting against time a bad idea?

~~~
fxj
The plot is function against derivative of function. This is also called a
phase space plot which is quite often used in the dynamic systems theory.

[https://en.wikipedia.org/wiki/Phase_space](https://en.wikipedia.org/wiki/Phase_space)

~~~
Yajirobe
That doesn't answer my question.

------
matt_the_bass
One thing I don’t see discussed: imho analysis should not rely on a
single plot. Multiple different plots describe different things.

Let’s use “all the plots”

------
theemathas
What happened to Qatar? This one country's plot looks weird after clicking
"select all".

------
outside1234
It's crazy how correlated the outbreaks are sort of regardless of policy
differences outside of China.

So correlated that it also makes you wonder if Japan hasn't been sweeping a
lot of cases under the rug trying to salvage the Olympics. It will be
interesting from here to see what happens with their cases and if they
magically "rejoin the line."

------
jimhefferon
Is there a good reason, that I am not seeing, why the total population is not
relevant?

~~~
PeterisP
As far as we know, no country is approaching saturation yet - e.g. if 10% of
population has been infected, then that would slow the infection speed just by
10% which would not be noticeable in these scales. The total population will
become obviously relevant if, say, 30% or 50% of the country is infected, but
that's kind of the worst case scenario that we'd like to avoid.
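
To put rough numbers on that, a small sketch assuming a simple logistic model
y' = r * y * (1 - y/N), where the only effect of saturation is the factor
(1 - infected fraction). The model choice is an assumption, not something the
chart itself implies.

```python
import math

def growth_factor(infected_fraction):
    """Relative growth rate under a logistic model: y' = r * y * (1 - y/N)."""
    return 1.0 - infected_fraction

for frac in (0.01, 0.10, 0.30, 0.50):
    factor = growth_factor(frac)
    # log10(factor) is how far the point drops on a log axis, in decades
    print(f"{frac:.0%} infected: growth x{factor:.2f}, "
          f"log10 shift {math.log10(factor):+.3f}")
```

At 10% infected the point drops by about 0.046 decades, which is hard to see
on axes that span several decades; at 50% the drop is about 0.3 decades.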

------
polote
Tell me why I'm wrong, usually the more a chart is 'perfect' the more the data
is 'averaged' and so the less you can actually see what's going on

Also this chart is about number of cases, which is not a good way to compare
evolution between countries as all countries have different test methods

~~~
ferzul
it's the ratio of new cases to total, which accounts for the comparability.
most countries remain comparable with themselves, and those that do change
method, since the growth is exponential, it tends to drown out any
incomparabilities quickly enough.

anyway, the proof is in the pudding. the data isn't smushed to the point of
making individual variability invisible

------
gandutraveler
Why is the growth rate with time faster in every other country except China ?

~~~
Retric
China had longer for its quarantine to show up in the data. Remember, time is
not an axis of this graph, just # of new infections vs total infections.

------
thecleaner
Software question - what library did you use for the chart ?

~~~
crispinb
[https://github.com/aatishb/covidtrends](https://github.com/aatishb/covidtrends)
(linked at bottom of page)

------
nardi
Please add per-capita options!

~~~
jimmyswimmy
On a plot of log(cases), per capita is just an offset. It shifts the line up
or down on the plot, and doesn't make a difference. In a post upthread, mzs
reminds us of the math:

log(cases/pop) = log(cases) - log(pop) = log(cases) + C

Log plots tend to visually compress relatively small effects such as these, so
you don't really notice it, but it is much of why the lines aren't completely
on top of one another.

In other words - a per-capita plot should look the same, but the lines would
be harder to distinguish, because they'd be mostly right on top of one
another.
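
One detail worth checking on this particular chart: both axes are case
counts, so dividing by population subtracts the same constant from the x and
the y coordinate. A rough sketch with a hypothetical population and made-up
counts:

```python
import math

def to_log_point(total, new, population=1):
    """Return (log10 total, log10 new), optionally per capita."""
    return (math.log10(total / population), math.log10(new / population))

population = 1_000_000                       # hypothetical country size
raw = to_log_point(10_000, 2_000)            # made-up total and new cases
per_capita = to_log_point(10_000, 2_000, population)

# Both coordinates move by log10(population), i.e. about 6 decades each.
shift = (raw[0] - per_capita[0], raw[1] - per_capita[1])
print(shift)
```

Because both coordinates move by the same amount, a point slides along the
slope-1 direction, so a country sitting on the exponential-growth diagonal
stays on it.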

~~~
amai
You are assuming people are only interested in the slope. Then the constant
doesn't matter. But it does matter when you want to know which country is an
infection hot spot or not. A thousand cases in the US is something completely
different from a thousand cases in Vatican city.

------
3xblah
Did anyone else notice how worldometers.info, which maintains a table listing
number of cases, recoveries, deaths, etc. for each country, ordered by number
of cases, had China listed first even after USA surpassed it in number of
cases. No explanations were given. Then they removed China from the table
altogether. What is going on behind the scenes at that website.

~~~
sknzl
Also Italy was gone at some point and re-appeared. I guess it’s just a glitch.

~~~
3xblah
I see now China is at the very bottom, below Timor-Leste.

~~~
3xblah
The "basic" list seems to be properly sorted:
[https://www.worldometers.info/coronavirus/countries-where-co...](https://www.worldometers.info/coronavirus/countries-where-coronavirus-has-spread/)

------
HellDunkel
This means absolutely nothing. Please stop making dramatic charts out of pure
sensationalism.

~~~
kortex
Log(dX) vs. log(X) shows how the outbreak behaves compared to pure exponential
growth (a straight line of slope 1), so any deviation shows how effective a
country is at preventing exponential growth. It's super useful.

~~~
HellDunkel
ok i may be completely off here but focusing on counting Covid-19 test
positives seems rather useless- what we should be focusing on is context. lots
of work to be done here. start by comparing deaths with similar cause (flu)
from same timespan last year. or even simpler: look at recovered cases. Also:
the effects caused by actions by governments will be barely visible.

