
Why it’s so hard to make a good Covid-19 model - gHeadphone
https://fivethirtyeight.com/features/why-its-so-freaking-hard-to-make-a-good-covid-19-model/
======
Thrymr
What this article misses is that simple models of complex systems in science
are most useful for understanding the dynamics of phenomena, not for making
accurate quantitative predictions. This is not a model of a mass accelerating
in a vacuum where Newton's laws are sufficient to a high degree of accuracy,
or even a numerical model of the aerodynamics of an airplane, where the
physics are well understood but there are far too many particles to solve
analytically. This is a parameterized model for the behavior of millions of
people, each one's reaction to exposure to viral particles, legal and social
norms, personal and economic situations, and so on. Of course we are not going
to be able to predict the future of an unprecedented event like this
quantitatively. It's not just that data are hard to obtain accurately; the
models themselves are so simplified that they can't capture much of the
important dynamics going on.

~~~
mikorym
There is another point too, which to some extent goes against yours: we
model epidemiology in standard ways, and to use those standard ways we need
the parameters in the set of differential equations. And these do make
predictions based on the equilibria. I am not saying they solve everything,
but they are routinely used, for example, to calculate how many people need
to be vaccinated to stop outbreaks of measles.
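
For reference, the vaccination calculation mentioned here is usually the
textbook herd-immunity threshold, 1 - 1/R_0, from the simplest
homogeneous-mixing model. A minimal sketch, assuming that formula; the
function name and the R_0 values are illustrative and not taken from the
article:

```python
def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction needed so each case infects fewer than one other person.

    Assumes the simplest homogeneous-mixing model, where
    R_effective = R0 * (1 - immune_fraction).
    """
    if r0 <= 1.0:
        return 0.0  # an outbreak with R0 <= 1 dies out without any immunity
    return 1.0 - 1.0 / r0


# Illustrative R0 values only, not estimates for any specific disease.
for r0 in (1.5, 2.5, 6.0, 15.0):
    needed = herd_immunity_threshold(r0)
    print(f"R0 = {r0:4.1f} -> immune fraction needed ~ {needed:.0%}")
```

The threshold rises quickly with R_0, which is why an uncertain R_0
translates directly into an uncertain vaccination target.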

The article does mention most of the parameters; but we don't know what values
to assign to those parameters. The last scientific writing that I was reading
speculated about the _R_0_, but I believe we are still not sure about that.
In any case, the point of the parameters is to _find R_0_, so we cannot
expect to have an accurate _R_0_ without them.

We also don't know mortality rates in a general population (look at Italy vs.
China, when broken down by age group). But anyway, all I wanted to say is that
our models are simplistic, but surprisingly useful.

~~~
thereisnospork
The point of the parent is you can't actually find R_0 because R_0 depends on
the behaviors of millions of people which cumulatively are based on billions
of factors. R_0 isn't a constant, it's an equation. Mathematically, unless you
get really lucky or have a very idealized system, the standard models underfit
the actual system.

The models can still be useful though, e.g. 'we must reduce R_0 below
R_critical to eliminate spontaneous spreading given herd immunity
percentages' is worth knowing.

~~~
jvanderbot
This is not as absolute as is being presented. We can speak about spreads and
aggregate behavioral patterns just fine if we take times to infinity and zoom
out far enough. The next few weeks or localized transmission (which is
"impossible" to model) are problematic, agreed.

------
KaiserPro
It's pretty simple: there isn't any decent data.

The data we have is strongly biased to older and sicker people.

There is no systematic surveillance of a geographic area, only panic testing
of those who are showing symptoms.

Until there is sampling of a borough, city or town, from start to finish, we
will have wildly wrong models.

The only thing that we can plot reasonably accurately is the exponent of the
fatalities, but even then it's because it's based on mostly hard data (unless
it's China...)

~~~
jmoss20
There have been a few of those samples.

[https://www.cebm.net/covid-19/covid-19-what-proportion-are-a...](https://www.cebm.net/covid-19/covid-19-what-proportion-are-asymptomatic/)

------
jefftk
Note that this article is from March 31st, and while that wouldn't normally be
very long ago, things are moving extremely quickly.

~~~
tunesmith
The main shift I've noticed (as a layman) is that people are thinking the
illness is more contagious and less fatal than first thought. That explains
the downward revisions in future deaths, but it also means that physical
distancing is more effective and more important than first thought, because a
carrier who stays home is infecting ~5 fewer people rather than ~2 fewer
people.

The thing I'm worried about today is population centers deciding to relax
mitigation before they've vastly increased testing capacity.

~~~
sharken
Physical distancing is important if you are old (>65 years) or have a pre-
existing condition.

The remainder should treat Corona as they do with the common flu, that is what
the data is telling us.

References [https://www.globalresearch.ca/swiss-doctor-covid-19/5707642](https://www.globalresearch.ca/swiss-doctor-covid-19/5707642)

[https://www.epicentro.iss.it/coronavirus/bollettino/Report-C...](https://www.epicentro.iss.it/coronavirus/bollettino/Report-COVID-2019_17_marzo-v2.pdf) (Italian)

------
lonelappde
The article is a long listing of ways to say GIGO. It's hard to get good clean
accurate data.

~~~
paganel
Just this evening some Italian scientist announced on his twitter account that
the number of new positive cases does not correspond to the number of new
tests carried out and announced for that day, because those tests could have
actually been made 2 or 3 days before or something like that.

Which, presumably, instantly invalidated all the charts and data-modelling
based on the "number of new positive cases" / "total number of tests made"
(with a lower value being seen as better).

But the Region of Sicily (or its official twitter account, anyway) replied
that in their case the number of new positive cases and the total number of
tests made are indeed correlated, which of course means that everything is a
mess in terms of data coming in and its significance.

Later edit: For those who know Italian this is the tweet [1] I was writing
about, and it looks like I was remembering wrong, the guy is not a scientist
per se, more like a "data scientist", he seems to be working at a company very
similar to fivethirtyeight (but presumably focused on the Italian market).

[1]
[https://twitter.com/lorepregliasco/status/124827958933764505...](https://twitter.com/lorepregliasco/status/1248279589337645058)

~~~
tomrod
That's right: it's the difference between the date of specimen collection and
the date of announcement. The general approach is to announce as soon as a
positive confirmation hits. There is a vintage to this data. Source: working
in this dataspace as we speak.

Keep up the crowd-sourced wisdom HN, it helps prevent us data science folks in
the thick of it from keeping blinders on!

------
francisofascii
In jest, I am imagining a bunch of data scientists working late into the
night, running the numbers and various scenarios, and then one person finally
stands up and says, "F**K it, let's just shut down the whole country and
hope for the best."

~~~
sesuximo
That's totally unrealistic. A data scientist is never in the room where
decisions are made

------
m0zg
> Why it’s so hard to make a good Covid-19 model

Because there's no downside to the authors for overpredicting deaths and
resource use, and _a lot_ of downside for underpredicting. So all the "bad"
things get taken into account, and all the "good" things are ignored.

One thing I've reinforced in my view of the world is that common sense is very
uncommon indeed. "2 million deaths" my ass. I hope people reconsider their
trust in other models that are "hard to make".

~~~
socalnate1
2 Million deaths was without all the drastic measures we've taken. I'm having
a hard time understanding what exactly you are suggesting we (society) should
have done here. "Used our common sense" to do what?

------
ck2
Good luck trying to predict how much of a population is going to refuse to
isolate and interact anyway.

Also relies on governments and politicians to not fudge testing and death
counts, which is never going to be accurate.

btw the Financial Times has maybe the best graph on the stats, however flawed:

[http://com.ft.imagepublish.upp-prod-us.s3.amazonaws.com/2251...](http://com.ft.imagepublish.upp-prod-us.s3.amazonaws.com/225197b0-79d4-11ea-af44-daa3def9ae03)

~~~
watwut
The actual model used by the UK's Imperial College estimated compliance with
quarantine of sick people at 75% and compliance with general stay-at-home at
50%.

What I find frustrating is that the actual models used by epidemiologists back
in February and March incorporated pretty much all the "gotchas"
non-epidemiologists discovered today. And it does not matter, because
non-epidemiologists still assume they are the first ones to ever discover them.

------
oehpr
I've found this topic pretty interesting, and I've enjoyed trying my hand at
it myself.

One of the things I've been playing with is Insight Maker
[https://insightmaker.com/](https://insightmaker.com/) This site is a totally
free platform where you can set up the kinds of simulations this article
describes (stock and flow models). You can even specify your uncertainty in
your baseline assumptions and run sensitivity analysis to see what the
relative impact of each factor is on the model, and the range of potential
outputs you could have. This system is very much like
[https://www.getguesstimate.com/](https://www.getguesstimate.com/), except
much more flexible and way less intuitive.

Insight Maker isn't a professional tool; it's really more of an advocacy and
outreach platform, but despite that it's really quite powerful.

I think after you've read this article and internalized the difficulties in
modeling pandemics (and have re-affirmed to yourself that you are not an
epidemiologist, unless you are, more power to you if so), you might have some
fun trying to build the model this article describes.
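
A minimal sketch of that kind of stock-and-flow model, in plain Python rather
than Insight Maker. It assumes a basic SIR structure (susceptible, infected,
recovered) with simple Euler time steps; `simulate_sir`, the population size,
and the sampled parameter ranges are all illustrative assumptions, not values
from the article:

```python
import random


def simulate_sir(r0, infectious_days, population=1_000_000,
                 initial_infected=10, days=365, dt=0.1):
    """Euler-integrate a basic SIR model.

    Returns (peak number infected at once, total ever infected).
    """
    gamma = 1.0 / infectious_days   # recovery rate per day
    beta = r0 * gamma               # transmission rate per day
    s = float(population - initial_infected)
    i = float(initial_infected)
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / population * dt  # flow S -> I
        new_recoveries = gamma * i * dt                  # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        peak = max(peak, i)
    return peak, population - s


# Crude sensitivity analysis: sample the uncertain inputs, look at the spread.
random.seed(0)
peaks = []
for _ in range(200):
    r0 = random.uniform(2.0, 3.0)                 # assumed range
    infectious_days = random.uniform(4.0, 10.0)   # assumed range
    peak, _total = simulate_sir(r0, infectious_days)
    peaks.append(peak)

peaks.sort()
print(f"peak infected: min {peaks[0]:,.0f}, "
      f"median {peaks[len(peaks) // 2]:,.0f}, max {peaks[-1]:,.0f}")
```

Even this toy version shows how wide the range of outputs gets once the
inputs are treated as uncertain rather than known.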

------
tunesmith
I'm still curious about the relationship between R0 and doubling period,
because you can kind of see a relationship between the two by examining
contagion period.

In general we've seen doubling periods that seem to suggest 5-6 days, but
occasionally as fast as 3 days, unclear how distorted those numbers are by
testing and mitigation and misattributed deaths.

If it has a natural R0 of 6, then it means that an infection will infect 6
others within that contagion period.

If we're thinking R0 is not 2-3 but is instead 5-6, then to make consistent
with the doubling periods we are seeing, it'd mean that people are contagious
for a longer period of time than we first thought.
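
One way to make that relationship concrete, as a sketch only: in a basic SIR
model the early growth rate is r = (R0 - 1) / D, where D is the mean
infectious period, and the doubling time is ln(2) / r. Real outbreaks have a
latent period and changing behavior, so treat this as an assumption-laden
illustration; the function name and the 5-day doubling time are mine:

```python
import math


def implied_infectious_days(r0: float, doubling_days: float) -> float:
    """Mean infectious period implied by R0 and an observed doubling time.

    Assumes early exponential growth in a basic SIR model, where
    growth_rate = (R0 - 1) / infectious_days and
    doubling_days = ln(2) / growth_rate.
    """
    growth_rate = math.log(2) / doubling_days
    return (r0 - 1.0) / growth_rate


for r0 in (2.5, 5.5):
    days = implied_infectious_days(r0, doubling_days=5.0)
    print(f"R0 = {r0}: doubling every 5 days implies ~{days:.1f} infectious days")
```

Under these assumptions, holding the doubling time at 5 days, moving R0 from
2.5 to 5.5 roughly triples the implied infectious period (about 11 days
versus about 32), which is the direction of the argument above.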

------
YetAnotherNick
I tried to do some exploration with the data to get some idea of the final
number. One of the best plots I found for estimating the final number is
the percentage growth in cases over the next week plotted against the cases so
far. It looks like the descending half of a parabola, and the point where it
meets the x axis gives the final number per country.

This is the final plot:
[https://i.imgur.com/5p4Xife.png](https://i.imgur.com/5p4Xife.png). You could
make a good guess from the data at how many people it will affect in each
country over the outbreak's lifetime by continuing the same pattern until it
reaches the x axis.
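
A rough sketch of that extrapolation, under a simplifying assumption: if
growth follows a logistic curve, weekly relative growth falls in a straight
line as cumulative cases rise, so a fitted line (rather than the parabola-like
curve in the plot) crosses zero at the final size. The data below are
synthetic and the function names are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_final_size(weekly_cumulative):
    """Extrapolate weekly relative growth down to zero to guess the final count."""
    totals = weekly_cumulative[:-1]
    growth = [(nxt - cur) / cur
              for cur, nxt in zip(weekly_cumulative, weekly_cumulative[1:])]
    slope, intercept = fit_line(totals, growth)
    return -intercept / slope  # x-intercept: cases at which growth hits zero


# Synthetic weekly totals from a discrete logistic curve with a known ceiling.
ceiling, rate = 200_000.0, 0.8   # made-up numbers for illustration
cases = [100.0]
for _ in range(12):
    cases.append(cases[-1] + rate * cases[-1] * (1 - cases[-1] / ceiling))

print(f"estimated final size: {estimate_final_size(cases):,.0f}")
```

On clean synthetic data the fit recovers the ceiling; on real, noisy,
testing-limited counts the intercept moves a lot from week to week, so this
is a guess and not a forecast.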

------
fxj
I really like the Neher Lab prediction tool.

[https://neherlab.org/covid19](https://neherlab.org/covid19)

You can choose different scenarios and compare them with real data. So you can
match the parameters of your model to the actual measured data (number of
deaths) and it also has data about the population and the state of ICUs and
hospitals that you can use in your prediction.

In the end it is just a tool which can give completely wrong predictions, but
you get a feeling for what could and could not happen.

------
nowweknow
The results of lockdown in Spain and Italy are not in this post and they
provide crucial information. In two weeks the lockdown in Spain has reduced
the daily infection rate from 42% to 4%, so there is hope for the future. But
the problem is that the economy can't cope with the lockdown, so we have a big
problem. Also, since the R0 can be reduced so much with political decisions,
the emphasis should be on the political and economic ground.
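
To put those two percentages in perspective, here is a small sketch that
converts a daily growth rate into a doubling time. It assumes "daily infection
rate" means the day-over-day growth in cumulative cases and that the rate
stays constant, neither of which the comment states explicitly:

```python
import math


def doubling_time_days(daily_growth: float) -> float:
    """Days for cumulative cases to double at a constant daily growth fraction."""
    return math.log(2) / math.log(1.0 + daily_growth)


for growth in (0.42, 0.04):
    days = doubling_time_days(growth)
    print(f"{growth:.0%} per day -> doubling every {days:.1f} days")
```

Under that reading, 42% a day means doubling roughly every two days, while 4%
a day means doubling roughly every two and a half weeks.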

------
brummm
Garbage in, garbage out! Can't build good models with data that's terrible.
Inconsistent testing, different reporting methods, lying from governments.

~~~
sesuximo
Garbage in policy out

------
systemvoltage
Completely tangential observation: What's the point of using sketch scribbles
over the diagrams? They could just make it in powerpoint and simplify. It
would be easier to read as well. Decoration for the sake of decoration? Why?

~~~
simonsarris
Charitably: To express the tentative nature of the information or model.
"Here's our working theory."

(I think it makes it somewhat hard to read in this case)

------
mrfusion
Don’t all these models ignore the effects of warmer weather? A few studies
have suggested there’s at least some effect.

------
thedudeabides5
We know it's hard Nate, it's hard for everyone else too.

That being said, I'd pay good money to hear his latest numbers...

------
scared2
It's novelty. Basically all models are based on assumptions in a closed system.
As one gets more information, one adds it to the closed system, eventually
making it easier to predict. As we know, reality is extremely heterogeneous and
an open system with too many interactions. This in principle makes all models
practically wrong. But it does not mean that they are not useful. (At least
they are useful to make scary graphs that convince presidents).

I had a lame model that is currently predicting a lower death rate but with a
similar trend [1]. In this model my assumption is that the lockdown continues
for 45 days. The result, which I regret having seen, shows a scary number of
600k deaths after 39 days. So I'm hoping for something spectacular to happen,
such as a vaccine, a drug, the sun, etc., that I could use to change my
prediction.

[1]
[https://news.ycombinator.com/item?id=22814927](https://news.ycombinator.com/item?id=22814927)

------
tgafpc2
Because people have unrealistic expectations.

------
arjun_tina
Best article I've read through this entire Pandemic.

------
3fe9a03ccd14ca5
What about our models for global warming?

~~~
ben_w
Those two things are unrelated.

Unlike a virus, the laws of physics don’t appear to mutate.

Also, you can’t measure the quality of your model of virus progression with
earth observation satellites or hundreds of thousands of years worth of ice
core data.

Also, rather unfortunately in this case, the lack of meaningful action by
world governments means that none of the climate models have to account for
feedback caused by the world actually acting on the results of those
models.

~~~
guscost
It’s going to be fun to watch this flood of nonsense convince precisely nobody
for the next few years.

------
sgt101
Very odd little diagrams to show model components in this article - why don't
they use digraphs like everyone who knows how to do stats?

oh...

------
ape4
Who thought it would be easy? One person can decide to break quarantine and
infect other people.

