
Estimating Covid-19's Rt in Real-Time - pcr910303
https://github.com/k-sys/covid-19/blob/master/Realtime%20R0.ipynb
======
Mvandenbergh
I rather prefer the approach here:
[https://epiforecasts.io/covid/posts/national/united-kingdom/](https://epiforecasts.io/covid/posts/national/united-kingdom/), which
does not attempt to calculate R for the most recent days and shows days for
which uncertainty is high in progressively lighter colours. It's part of the
excellent work done by the London School of Hygiene and Tropical Medicine's
Centre for Mathematical Modelling of Infectious Diseases, much of which can be
found here:
[https://cmmid.github.io/topics/covid19/](https://cmmid.github.io/topics/covid19/).

That model uses the EpiEstim R package to estimate R based on the number
of case onsets and a serial interval distribution. EpiEstim implements the
methods of Cori et al:
[https://academic.oup.com/aje/article/178/9/1505/89262](https://academic.oup.com/aje/article/178/9/1505/89262)
which also uses Bayesian methods.

It looks like the linked model is based on Bettencourt & Ribeiro:
[https://journals.plos.org/plosone/article?id=10.1371/journal...](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002185)
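The Bettencourt & Ribeiro update is simple enough to sketch: a grid of candidate R_t values, a Poisson likelihood with expected cases k_{t-1}·exp(γ(R_t − 1)), and Bayes' rule at each step. A minimal illustration (the variable names and the serial-interval constant are mine, not the notebook's):

```python
import numpy as np
from scipy import stats

GAMMA = 1 / 7  # assumed 1 / serial interval, in 1/days (placeholder value)

def update_rt_posterior(new_cases, r_grid=np.linspace(0, 8, 801)):
    """Posterior over R_t for a daily case series, Bettencourt & Ribeiro style."""
    posterior = np.ones_like(r_grid) / r_grid.size  # uniform prior over the grid
    posteriors = []
    for k_prev, k in zip(new_cases[:-1], new_cases[1:]):
        # Expected cases today under each candidate R_t
        lam = k_prev * np.exp(GAMMA * (r_grid - 1))
        posterior = posterior * stats.poisson.pmf(k, lam)
        posterior /= posterior.sum()  # renormalize after the Bayes update
        posteriors.append(posterior)
    return r_grid, posteriors

r_grid, posts = update_rt_posterior(np.array([10, 14, 20, 28, 40]))
print(r_grid[np.argmax(posts[-1])])  # maximum a posteriori R_t
```

With ~40% daily growth and γ = 1/7 this lands around R ≈ 3.4; the notebook adds windowing and a Gaussian process prior on top of this core update.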

~~~
mzs
Very nice R, thanks!

[https://github.com/thibautjombart](https://github.com/thibautjombart)

[https://github.com/cmmid](https://github.com/cmmid)

------
standardUser
I don't see how we can meaningfully estimate the reproductive rate with such
variable testing rates and patterns. Some places barely test people with
severe symptoms; others are testing anyone they can who may have been
exposed. Some states have tested less than 0.5% of the population, others have
tested 3% plus. Testing criteria, rates, and test availability are all
changing from place to place in ways that I have not seen sufficiently
tracked, so we couldn't correct for those variations even if we tried.

~~~
cperciva
Testing rates and patterns don't make as much difference as you might think.
They'll dramatically affect the _absolute_ numbers, but to a first
approximation the curves will have the same _shape_ and thus estimates of R
won't be thrown off.

The bigger problem is that R varies across the population; if you're testing
primarily old age care homes you'll end up estimating what R is in that
community rather than estimating R in the broader (mostly untested) community.
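The "same shape" point is easy to check numerically: scaling a case curve by any constant detection fraction leaves the day-over-day growth ratio, and hence the R estimate, unchanged (toy numbers, not real data):

```python
import numpy as np

# Hypothetical epidemic growing 30% per day
true_infections = 100 * 1.3 ** np.arange(10)

for detection_fraction in (0.05, 0.5):
    observed = detection_fraction * true_infections
    growth = observed[1:] / observed[:-1]  # day-over-day ratio
    print(detection_fraction, growth.mean())  # ~1.3 in both cases
```

The constant cancels in the ratio, which is why only *changes* in testing over time (or testing a different sub-population) bias the estimate.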

~~~
rwcarlsen
The curves won't have the same shape if test availability and circumstances
under which people are tested change. And I think those things are definitely
changing over time.

~~~
cperciva
Sure, if testing changes, it throws things off. (Unless you can figure out how
to adjust for the changes in testing rate and methodology.)

I was talking about the issue of different jurisdictions testing differently.

------
shalmanese
One big limitation right now is that we're basing estimates on confirmed test
positive date as opposed to first onset of symptoms which is a much cleaner
signal. Unfortunately, first onset of symptom data doesn't appear to be
collected anywhere in a systematic fashion.

See this study from China where the use of onset of symptom data allowed them
to estimate R much more accurately:
[https://twitter.com/XihongLin/status/1236075174069440512](https://twitter.com/XihongLin/status/1236075174069440512)

From the official WHO Report, page 7, you can see how recording onset date
effectively allows you to "see" a week into the future:
[https://www.who.int/docs/default-source/coronaviruse/who-chi...](https://www.who.int/docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-final-report.pdf)

~~~
mzs
Indeed; for example, I-NEDSS lacks an onset date for nearly two in five
confirmed cases in Kane county:

    
    
      $ head -1 kane-2020-04-19-2020-04-19-17:31:23-CDT.csv
      Age Range,Age,Case Count,City,Death,Reported Date,Sex,State,Symptom Onset
      $ awk -F, '!$9 { t++ } END { n = NR - 1; printf("%s %s %.1f%%\n", t, n, t * 100 / n) }' kane-2020-04-19-2020-04-19-17:31:23-CDT.csv
      233 607 38.4%
    

[https://www.kanehealth.com/](https://www.kanehealth.com/)

------
korethr
A curious thing I notice in the final plotted graphs: some of them show a
continuous smooth decline, and once the trend dipped below 1 it stayed there.
In others, the trend dipped below 1 and then bounced back up again, sometimes
a couple of times. Assuming this is not a result of errors in the calculation
or inaccuracies in the data, I wonder what the cause is. Was there some
situational or behavioral change in the states where the trend bounced back
up, versus those where it continued to smoothly decline?

~~~
twoodfin
My (totally uneducated) guess is these spikes represent changes in testing
patterns for which the model is not adequately compensating.

For example, Massachusetts recently (around the time of the spike in its
modeled Rt) made an aggressive push to assess the spread of COVID-19 in
assisted living facilities.

------
impostervt
I found this site very helpful:

[https://rt.live/](https://rt.live/)

~~~
Animats
They're over-fitting noisy data. Huge changes in R over the course of a week,
as for Idaho, have to be an illusion. One problem is that some states and some
hospitals don't report daily data on weekends, so new cases and deaths get
charged to Monday.

Low-pass filtering helps smooth the curves, but adds lag, of course. Centered
low-pass filtering is useful but can only be applied retrospectively. There's
no magic bullet for dealing with noisy data.
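The lag trade-off is easy to see with a simple moving average over a series that dips on weekends: a trailing window smooths but shifts features a few days later, while a centered window has no lag but can't be computed for the most recent days (toy sketch, not rt.live's actual smoother):

```python
import numpy as np
import pandas as pd

days = np.arange(60)
# Toy case counts: steady growth plus a weekend reporting dip
cases = 100 * np.exp(0.05 * days) * np.where(days % 7 >= 5, 0.3, 1.0)
s = pd.Series(cases)

trailing = s.rolling(7).mean()               # smooth, but lags by ~3 days
centered = s.rolling(7, center=True).mean()  # no lag, retrospective only
print(centered.isna().tail(3).all())         # True: undefined for the last 3 days
```

The trailing value on day t equals the centered value on day t−3, which is exactly the lag Animats describes; the centered version simply moves that gap to the end of the series.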

~~~
mzs
It does do smoothing, plus the ranges are very plausible for this Bayesian
approach:

    
    
      $ head -1 rt.csv 
      state,date,ML,Low_90,High_90,Low_50,High_50
      $ awk -F, '$1 == "ID" { printf("%s: %.2f - %.2f\n", $2, $6, $7) }' rt.csv | tail -7
      2020-04-13: 0.07 - 0.54
      2020-04-14: 0.10 - 0.59
      2020-04-15: 0.32 - 0.92
      2020-04-16: 0.50 - 1.16
      2020-04-17: 0.51 - 1.17
      2020-04-18: 0.47 - 1.11
      2020-04-19: 0.32 - 0.94

------
mindslight
Does it strike anyone else as a bit weird to focus on R? It's obviously useful
to know whether R is less than or greater than unity (subcritical or
supercritical). But when R is greater than 1, then the growth rate _over time_
seems more fundamental.

~~~
dboreham
Perhaps the idea is to inform policy decisions, both government and personal.
If I know the R0 is 10, I'm probably not going to leave my house. Otoh if it's
2 then with an N95 mask I'd think about grocery shopping.

~~~
btilly
To think this way is to misunderstand exponential growth.

If you're concerned with personal risk, the biggest thing that matters is the
current spread. Of course if everyone is concerned with that, nobody will be
concerned until it is too late to prevent a large portion of us from getting
it.

However, if you're concerned with the trajectory, all that you care about is
whether R is less than or greater than 1. For both 2 and 10 it will hit a
large part of the population, all at once. But 1.01 versus 0.99 is the
difference between "we all get it" and "only a small fraction of us get it".
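The threshold behaviour is easy to check numerically: multiplying cases by R each generation, anything above 1 eventually blows up while anything below 1 dies out, however close to 1 it is (toy sketch; the limit and starting count are arbitrary):

```python
def generations_until(r, start=100, limit=1e6, max_gen=10_000):
    """Run cases *= r per generation until the outbreak exceeds `limit`
    or fizzles out below 1 case; return the final count and generation."""
    cases, gen = float(start), 0
    while 1 <= cases <= limit and gen < max_gen:
        cases *= r
        gen += 1
    return cases, gen

for r in (0.99, 1.01, 2.0, 10.0):
    final, gen = generations_until(r)
    outcome = "died out" if final < 1 else "exceeded limit"
    print(f"R={r}: {outcome} after {gen} generations")
```

R = 2 and R = 10 both exceed the limit within a handful of generations; R = 1.01 takes hundreds of generations but still gets there, while R = 0.99 fizzles, which is btilly's point about the 1.01 / 0.99 divide.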

------
AbrahamParangi
Might be worthwhile to estimate a per-state transfer function from deaths to
cases to adjust for differential case reporting/testing. Deaths may be a
clearer signal (though shifted and spread in time).
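A crude version of that idea: shift deaths back by an assumed infection-to-death delay and divide by an assumed infection fatality rate. Both constants below are placeholders for illustration, not estimates, and a real transfer function would also spread the delay over a distribution rather than a single lag:

```python
import numpy as np

IFR = 0.01          # assumed infection fatality rate (placeholder)
DEATH_DELAY = 18    # assumed days from infection to death (placeholder)

def infections_from_deaths(daily_deaths):
    """Rescale deaths by 1/IFR and back-date them to get a rough
    implied-infection series."""
    implied = np.asarray(daily_deaths, dtype=float) / IFR
    # Deaths observed on day t reflect infections around day t - DEATH_DELAY
    infection_days = np.arange(len(implied)) - DEATH_DELAY
    return implied, infection_days

deaths = [2, 3, 5, 8, 12]
infections, days = infections_from_deaths(deaths)
print(infections)  # 200, 300, 500, 800, 1200 implied infections
```

Fitting the delay and scale per state, as suggested, would turn the two placeholder constants into estimated parameters of the transfer function.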

