
Forecasting s-curves is hard - osipov
https://constancecrozier.com/2020/04/16/forecasting-s-curves-is-hard/
======
vajrabum
Leaving modeling up to the professionals is completely the wrong lesson. Time
series forecasting from business, epidemiological, or economic data is hard,
so you should always take the results with a large grain of salt. The
professionals get it wrong too, and just like the weather, nearby points from
the model are more likely to be accurate than distant ones.

Smoothing helps. Estimating error helps. Domain knowledge helps. Experience in
applying the model helps. There are techniques other than the one demonstrated
in the article; sometimes those help. There are also professionals who create
models used to support the bias of the modeler or the modeler's employer
(i.e. cherry-picking).

Politicians and pundits more often than not take the results of these models
and draw conclusions which are unwarranted, or at least highly uncertain,
without mentioning the uncertainty or while only giving it lip service.

~~~
diehunde
True. Also, if you read Superforecasting[1] you'll learn about several
experiments where "non-experts" beat experts by a lot by just being well
informed. Even when the experts had access to confidential data.

[1] [https://www.amazon.com/Superforecasting-Science-Prediction-P...](https://www.amazon.com/Superforecasting-Science-Prediction-Philip-Tetlock/dp/0804136718)

~~~
stev_stev
I’m very skeptical of evidentiary patterns laid out in NYT-bestseller-style
books.

~~~
MagnumOpus
You are right to be. Tetlock spun the conclusions of the project far beyond
what was warranted, especially because the project was heavily flawed.

I participated in the initial part of the project, figured out the winning
strategy very quickly, and then dropped my participation immediately. How to
be "super at forecasting" in Tetlock's eyes? Have lots of spare time in the
middle of the day when professionals are at work, because that is when
questions get posted and easy points can be made. (To win a prize that is
less than the hourly rate of a professional.) Hmm, no wonder the
conclusion was that people with no expertise but plenty of spare time can beat
the pros...

~~~
diehunde
You mean beat the people that actually do that for a living?

------
tel
This is accounted for in more professional methods, which estimate error.
During a period of exponential growth, the error 6 weeks out is very sensitive
to tiny errors in the immediate measurements.

It's not so much that fitting exponential or S-curves is _hard_ as that even
a very good fit is likely to have very significant error bars.
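
A minimal numeric sketch of that sensitivity (the growth rate and horizon here are made-up illustrative numbers):

```python
# Toy illustration: how a tiny error in the estimated daily growth rate
# compounds over a 6-week extrapolation.
true_rate, est_rate = 1.20, 1.21   # 20% vs. 21% daily growth
days = 42                          # 6 weeks out

true = 100 * true_rate ** days     # starting from 100 cases today
est = 100 * est_rate ** days
print(f"ratio of forecasts: {est / true:.2f}")   # ~1.42
# A one-percentage-point rate error becomes a ~40% forecast error.
```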

The hard parts of modeling the current situation involve the poor and
limited data. The dynamics of the system are highly variable and unobserved,
so reducing the uncertainty requires clever tricks: expert knowledge,
alternative signals, and leveraging other models.

So ultimately, it's not just the forecasting but the response. People rarely
handle uncertainty very well, and a situation with uncertain exponential
growth ends up with exponential uncertainty. Making policy decisions where the
predicted outcome spans orders of magnitude is tremendously challenging.

~~~
acqq
Exactly. And I have a very recent example: yesterday somebody responded to one
of my replies with:

"Cuomo claimed to need 40k ventilators after the lockdown was put in place. He
ended up needing only a fraction of that. Obviously, without the lockdown,
there would likely be more needed. Of course, modeling has errors, but this is
an extremely large error, and one that has direct policy implications." (1)

My response was that it wasn't "a large error": only 6 more days of
exponential growth would have made even that number insufficient:

[https://news.ycombinator.com/item?id=22911465](https://news.ycombinator.com/item?id=22911465)

Also as an illustration of the "uncertainty" one can see the shaded areas
clearly here:

[https://covid19.healthdata.org/united-states-of-america](https://covid19.healthdata.org/united-states-of-america)

As I write this, the model on that page was last updated based on data up to
the 15th of April, only 4 days ago, and its best estimate for the number of
deaths in the USA on the 19th of April was 37,056. As I check
[https://www.worldometers.info/coronavirus/country/us/](https://www.worldometers.info/coronavirus/country/us/)
it's already 40,423. But the 95% error band for the 19th is 39,721-56,108. It's
that hard. Also, at this moment the projection for the 1st of June is 60,262
(34,063-140,106), and we can compare all these values after some later updates,
e.g. in 6 and 12 days.

1) Careful observers know that there is one more much-talked-about person who
used the same pseudo-argument before.

~~~
acqq
Rechecked the page: currently data up to the 20th of April, page updated on
the 22nd of April:

June 01, 2020 projection: 67,444 (48,058-122,203)

Today's number of deaths in the U.S., according to worldometers: 47,808. Today
is April 23.

~~~
acqq
Next recheck: currently death data up to 25th April.

June 01, 2020 projection: 73,621 (56,562-128,167)

Total number of deaths in the U.S., worldometers: 56,803. Today is April 28.

~~~
acqq
Next: currently the last non-projected death data is May 01: 65,249.

June 01, 2020 projection: 112,073 (91,586-155,417)

Total number of deaths in the U.S., worldometers: 74,114. Today is May 06.

------
bhouston
I was thinking the same thing. Exponentials look like almost nothing at the
beginning, and then it is too late to react. The only way not to be late in
reacting is to over-react.

I was correct in my predictions of the disease coming to North America and the
stages it would entail, but I was wrong about the timeline, because when it
happened, it happened so fast.

~~~
kurthr
Similar situation, and I knew it could happen very fast (3-day doubling
periods do that), but without data I still wouldn't have predicted (and can't
now) where it would get bad and where it would muddle around.

What we do now know is that it changes VERY fast, so you can't wait for your
ICUs to fill up or a lot of people will die... you have to act early. And if
it came this fast once... it can again, if our guard goes down (and it will).

------
littlestymaar
> In other words, data enthusiasts (such as myself) should leave the modelling
> up to the professionals.

This is really important to understand, and unfortunately often overlooked,
especially by economists, who often work with good data and a solid
mathematical background, but with no prior domain knowledge and no reference
to the literature of the given field.

~~~
etangent
I have not seen a single useful model from professional epidemiologists so far
-- at least nothing that would guide me better than "look at China and Italy
and consider that it might also happen here."

~~~
xzel
I would argue that they're all useful, but it's very tough to be accurate when
confounding actions keep acting on the problem. For example, in the US, the
increase in social distancing measures and mask usage couldn't easily be baked
into the original models. Forecasting accurately is very tough. Look at what's
been said about Renaissance and its Medallion fund: they're right a little
over 50% of the time (they just leverage very heavily). Think about that: the
most consistent and dominant hedge fund might only be right about 51% of the
time. The issue with just using historical data is that the new data might be
different. But I'm sure all of their models were using data, or at least some
domain knowledge, from the spread in Italy and China.

~~~
icelancer
>> For example, in the US, the increase in social distancing measures and mask
usage couldn't be easily baked into the original models.

The IHME model - one of the most cited models - explicitly factored this in.

~~~
archgoon
"Starting April 17, we began using mobile phone data to better assess the
impact of social distancing across states and countries. These data revealed
that social distancing was happening to a larger degree than previously
understood, and even before social distancing mandates went into effect."

This suggests that social distancing happened 1-2 weeks sooner than their
model anticipated. This will give you a much smaller number.

What is the policy you believe should have been followed instead? Are you
saying that social distancing should have been abandoned? Why do you believe
that this would result in fewer deaths?

~~~
icelancer
No, I am saying they factored it into their models and also undershot their
priors that were readily available. If they looked at OpenTable data, they
would have realized people were voluntarily socially distancing before state
authorities told them they should. In fact, OpenTable data showed that even on
the day that the Mayor of NYC told people to go to sit-down restaurants and
see movies in person (in early March), y/o/y data showed a net 30% reduction
in reservations for restaurants in NYC.

People were voluntarily taking precautions well before our government
bureaucracies enforced them, and it was visible in public datasets. IHME just
didn't look hard enough, apparently. Despite what the media keeps pushing by
finding pockets of idiots, people aren't stupid. The most at-risk populations
realize this and tend to shelter in place on their own and take precautions.

de Blasio on March 11th telling people to go out:
[https://ny.eater.com/2020/3/11/21175497/coronavirus-nyc-rest...](https://ny.eater.com/2020/3/11/21175497/coronavirus-nyc-restaurants-safe-dine-out)

OpenTable data showing a voluntary reduction of restaurant activity _despite_
de Blasio's encouragements: [https://ibb.co/r2R9xnT](https://ibb.co/r2R9xnT)

~~~
archgoon
What policies are you saying were misguided and should not have been taken
based on the predicted elevated death toll?

How does one evaluate the OpenTable data and feed it into the model they were
using to estimate the amount of social distancing?

~~~
icelancer
Extended lockdowns are _positively_ correlated with deaths per million
residents.

[https://twitter.com/boriquagato/status/1251943418860728320](https://twitter.com/boriquagato/status/1251943418860728320)

I'm saying that lengthy lockdowns, shelter-in-place, and shutting down non-
storefront businesses did not provide much value, if any, and possibly had a
negative effect.

The IHME model's actuals landed outside its 95% confidence intervals more than
half the time. So when posters here say "well, the variance was high," that
variance is supposed to be accounted for in the confidence intervals, and
those were missed wildly too.

The models justified harsh authoritarian action, and the actual data came in
way, way under them. Popular sentiment is "oh well, it saved lives at least,"
but few, if any, are looking at the acute - and more importantly, chronic -
economic costs of these policies, which may or may not have even helped beyond
just telling people what the risks were.

>> How does one evaluate the OpenTable data and feed it into the model they
were using to estimate the amount of social distancing?

Pretty simple; it was clear that news of COVID-19 alone was enough to cause
people to stop going to restaurants and start basic social distancing
protocols on their own without government mandating it.

EDIT: Shockingly, a bunch of downvotes without explanation are forthcoming.

~~~
cycomanic
One thing that you and others arguing the same way seem to neglect is that
there is a huge economic cost to a significant fraction of your society
dying. This is not an either-or situation. I want to see your economic models
showing that the economy would have been fine with 0.5-1% of the population
dying within a short period (actually, everywhere the health system got
overloaded the rate was significantly higher). Let's not even talk about the
loss of confidence in the government because it failed to react.

~~~
xzel
This is a great viewpoint that I'd never thought of or seen put forward. Is
there any good information on the economic impact of the Spanish Flu or even
the Bubonic Plague?

------
api
"It may not be surprising that in the exponential growth phase the estimate is
very bad, but even in the linear phase (when 40+ points are available) the
correct curve has not been found. In fact, it is only once the data starts to
level-off that the correct s-curve is found. This is especially unhelpful when
you consider that it can be quite hard to tell which part of the curve you on;
hindsight is 20-20."

Does this perhaps explain why we are so bad at calling the top of economic
bubbles and similar phenomena? Maybe there literally is not enough information
until the very end. It's not that we're dumb. It's that we can't do the
mathematically impossible.

I have a sense that this might be a very important and profound principle that
might explain a lot of seemingly irrational behaviors.

~~~
KarlKemp
I believe our supposed inability to predict economic bubbles (and other
financial crises) is mostly just tautological: those crises we _can_ predict
never happen, because they are "self-falsifying prophecies": if enough people
believe the stock or housing market is overheating, they will stop buying, or
might even speculate against further rising prices.

Thus, any bubble that does manage to grow to significant size before bursting
is necessarily "unforeseen". (Of course, because of that thing with the
monkeys, the keyboards, and online message boards, there will always be plenty
of people who _did_ see it coming, but I would wait to buy their book until
they do it a second time.)

I know it's always trendy to hate on economists (some people have taken that
idea all the way to creating cryptocurrencies and reinventing economics along
the way). But comparing, say, the 2008 crisis with the 1920s or even the
1970s, I can't shake the feeling that maybe economists have become slightly
better over time. The gold-standard fandom that was all the rage for a while
essentially rests on the idea that interventions by central banks are _worse
than doing nothing_, and the evidence now seems overwhelming that we can do
better than this (admittedly low) benchmark.

~~~
shazzzm
I think Taleb makes a similar point in his books: if you can foresee an
economic crisis, you take steps to avoid it, and therefore it doesn't happen.
But then everyone asks why you were wrong in the first place, since the crisis
didn't happen.

------
pif
> However, in my experience “intuition” and “mathematics” can often be hard to
> reconcile.

They can be hard to reconcile, but you absolutely must reconcile them for your
effort to be fruitful. As long as they remain separated, either your intuition
or your mathematics is wrong, and you can't know which.

------
6gvONxR4sf7o
What's even harder is that you're usually trying to forecast for a reason
other than pure academic curiosity. Like, should I be worried about the
coronavirus? Should we stay inside? When can we reopen business? But the
decisions people take determine the curve and the predicted curve determines
people's decisions. If you decide not to social distance, you're changing the
future. For that reason, forecasting is better done under different scenarios.
For example, don't tell people approximately X (+/- a lot) will die. Tell
them approximately X (+/- a little less) will die if we don't social distance,
and approximately Y (+/- a little less) will die if 95% of us self-quarantine.

It's harder than just fitting a curve, but that's kind of the point. If you
want actionable predictions, it _is_ harder.

------
m3kw9
It's hard because, for example, to predict case load, at any point on the
curve you could have policy changes or a surprise increase in case load for
various reasons (a sudden increase in testing); these are just a few of the
possibly hundreds of high-impact events that can affect the length and slope
at any point.

~~~
panarky
Yes, exactly. It's tempting to try to compute coefficients X weeks in, so we
can forecast X+26 weeks in the future.

But the coefficients aren't fixed. Coefficients change drastically due to
public policy, individual actions in response to news and social media,
culture of local communities, degree of compliance with public policy, travel
between regions with different rates of infection, etc.

So you're not fitting a curve to the data, you're modeling dynamic human
actions which are not nearly as easy to forecast.

------
BenoitP
Well, considering all s-curves are exponentials at the beginning, it naturally
follows that you can only make an accurate model after the inflexion point.

Since we're on HN, I'll plug a question I've had for a long time:

Do IPOs/liquidity events/exits/etc. all happen when the right price has been
determined, i.e. when we can see what size the company will be and it is no
longer useful to capitalize it further? When all growth paths have been
explored? Is that time the inflexion point?

What does the conversation look like with the VCs when they see the inflexion
point?

~~~
claudiusd
> considering all s-curves are exponentials at the beginning

Sorry, but I hear a lot of people saying this and it's driving me crazy.
S-curves are S-curves from the beginning, not exponentials. It can be useful
to use an exponential growth model at the beginning of the curve for short-
term forecasting, but these two models diverge dramatically at the
S-curve's inflection point.

Not that we shouldn't plan as if exponential growth will occur in a crisis
like the one we're in now, but many people I know don't understand these
dynamics, and it has led to a lot of undue panic.

~~~
JadeNB
Thank you for this response. All smooth curves are also approximately linear
at all points, but that doesn't mean that we can't usefully predict and model
an appropriate non-linear fit.

~~~
pbhjpbhj
Not disagreeing - what you say seems right to me - but it ("All smooth curves
...") also seems like the sort of result that there might be a counterexample
to: a smooth curve that at no scale has a linear approximation. Maybe a
fractal with a smoothly varying generator?

For curves of infection rates I don't doubt your veracity.

~~~
umanwizard
“Smooth” is a term of art in math that means continuously differentiable
infinitely many times. These functions are a subset of those functions that
are differentiable once.

Differentiable once means, by definition, approximable everywhere by a line at
small enough scale.

~~~
pbhjpbhj
Yes, I know that much.

Imagine a sine wave, except when you look at it at 1000x magnification it's a
sine plus a cosine. It looks smooth, and at x = π you think that the
derivative will be -1, but in fact it's 0, because you don't have a sine
crossover there, you have a cosine trough.

Except at 1000000x magnification (i.e. another 1000x) the cosine curve that
forms that apparent sine curve is itself a sine curve. So everything is
switched again.

f(x) is something like sin(x) + cos(ax)/a + sin(a^2 x)/a^2 + cos(a^3 x)/a^3 +
..., with general terms sin(a^j x)/a^j + cos(a^(j+1) x)/a^(j+1) + ...,

for a = some arbitrarily large number. Something like that; I'm a bit rusty,
sorry.

At whatever scale you look at the curve, the derivative is always wrong: you
zoom in on the sine, and at the peak it's got a cosine, so the d/dx is -1; but
zoom in and the cosine has a sine at the crossing point, so the d/dx is 0; but
zoom in and ...

The curve is provably smooth, it's sines all the way down, but nowhere can you
tell the derivative, as it's fractal???

That's what I had in mind.

Anyway, I thought there might be clever curves of that type.

~~~
umanwizard
A curve with the properties you describe is not smooth, by definition, since
it is not differentiable.

> At whatever scale you look at the curve the derivative is always wrong: you
> zoom in on the sine, at the peak it's got a cosine, so the d/dx is -1; but
> zoom in and the cosine has a sine at the crossing point, so the d/dx is 0;
> but zoom in and ...

This is pretty much the definition of something not being differentiable.
"Differentiable" means that the approximations to the derivative (i.e.,
difference quotients) converge to some fixed value as the scale they're
measured at approaches the infinitely small.

You might be interested in the Weierstrass function, which seems to be the
sort of thing you're getting at with your idea:
[https://en.wikipedia.org/wiki/Weierstrass_function](https://en.wikipedia.org/wiki/Weierstrass_function)
. Continuous everywhere, but differentiable nowhere.

Edit: the specific function you wrote down is not differentiable (at least not
everywhere). For example, at x=0 its derivative, if it had one, would be
cos(0) - sin(a·0) + cos(a^2·0) - sin(a^3·0) + ... = 1 + 0 + 1 + 0 + ..., but
that series clearly diverges.

~~~
pbhjpbhj
cos(0) being 1, sin(0) being 0; that series is 0 + 10e-3 + 0 + 10e-9 + 0 +
10e-15 + ... when x=0 (excuse my sloppy notation, I'm on a phone). Looks
convergent to me, somewhere between 0.001000001000001 and 0.001000001000002?

Have you studied fractal dimensions formally? Might I ask what background
you're speaking from?

Yes, I'm imagining, as a first example, something akin to a Weierstrass
function but without the discontinuities.

[https://www.desmos.com/calculator/c6tbl4zr9j](https://www.desmos.com/calculator/c6tbl4zr9j)

~~~
umanwizard
Yes, the original function converges (not its derivative). The terms in the
derivative are no longer divided by a^k: the a^k from the argument to cos/sin
then cancels it out (it becomes a multiplier, because of the chain rule).

So the series for the derivative is 1 + 0 + 1 + 0 + ..., which doesn't
converge.

Not sure what this has to do with fractal dimensions. This is a simple
question of definitions. The word “smooth”, in math, literally implies, by
definition, that the function is everywhere approximable by lines.

If you don’t think it does, can you state the formal definition of “smooth”
that you’re using?

The Weierstrass function doesn’t have discontinuities, btw.

------
krastanov
I am bothered that the animation does not include confidence intervals or
error bars for the fit. The way these confidence intervals would shrink as
more data points become available would tell just as important a part of the
story.
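
As a sketch of how such intervals could be produced (synthetic data; exact numbers vary with the noise seed), scipy's curve_fit returns a parameter covariance from which rough standard errors follow:

```python
# Fit a logistic to a growing prefix of noisy synthetic data and watch the
# standard error on the plateau L shrink once points past the inflection
# (here at t0=40) start to arrive.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    return L / (1 + np.exp(-k * (t - t0)))

rng = np.random.default_rng(0)
t_all = np.arange(100.0)
y_all = logistic(t_all, 1000, 0.3, 40) + rng.normal(0, 2, t_all.size)

for n in (30, 50, 70, 95):
    popt, pcov = curve_fit(logistic, t_all[:n], y_all[:n],
                           p0=[800, 0.2, 30], maxfev=20000)
    print(f"n={n:3d}  plateau = {popt[0]:10.1f} +/- {np.sqrt(pcov[0, 0]):.1f}")
```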

~~~
chillingeffect
Yes - a better problem definition would in this case reveal that the estimate
becomes quite good (by my definition) around the 50% time mark, when the
transition from e^x-like to e^(-x)-like growth takes place. More specifically,
I mean that although the stable point is still off by e.g. 25%, the important
thing is that the stable _time_ of the curve is well estimated: you know it's
no longer increasing exponentially...

------
surroundingbox
I suppose the s-curve with 3 parameters that the author is talking about is
the logistic function. In general, if you take a differentiable function of
three parameters and try to determine an interval for the values of the
parameters of that model, then the length of that interval is bounded by the
ratio of the error in the data over the derivative with respect to the
parameter. For example, estimating the parameter k (Wikipedia's logistic
growth rate) with points such that x is near x0 (Wikipedia's midpoint of the
sigmoid) is hard, since the derivative of the function with respect to k at
x=x0 is zero. So mathematically this seems to be a well-known fact when one
tries to estimate parameters from data points.
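
A quick symbolic check of that zero-derivative claim (a sketch using sympy and the Wikipedia parameterization):

```python
# The logistic curve's sensitivity to the growth rate k vanishes at the
# midpoint x0, so data near x0 pins k down poorly.
import sympy as sp

x, x0, k, L = sp.symbols('x x0 k L', positive=True)
f = L / (1 + sp.exp(-k * (x - x0)))     # logistic function
df_dk = sp.diff(f, k)                   # sensitivity of the model to k
print(sp.simplify(df_dk.subs(x, x0)))   # prints 0
```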

~~~
cracker_jacks
> the length of that interval is bounded by the ratio of the error in the data
> over the derivative with respect to the parameter

This is interesting! Could you expand on this a bit? Why is the length of the
interval bounded by the ratio of the data error over the derivative?

~~~
surroundingbox
The general case requires some work and conditions, but to give a hint, the
case of only one parameter is an application of the mean value theorem (1).
Suppose a model y = f(p, x) with one parameter whose true value is p0, an
exact point (x0, y0) (that is, y0 = f(p0, x0)), and a data point (x0, y1) such
that y1 - y0 is the error in the data. If there is a value p1 of the parameter
such that f(p1, x0) = y1, then y1 - y0 = f(p1, x0) - f(p0, x0) =
f'(sigma) (p1 - p0), so that p1 - p0 = (y1 - y0)/f'(sigma), that is, (error in
the parameter) = (error in the data)/(derivative with respect to the
parameter), where sigma is between p0 and p1. The general case is a
generalization of this idea using the mean value inequality.

(1)
[https://en.wikipedia.org/wiki/Mean_value_theorem](https://en.wikipedia.org/wiki/Mean_value_theorem)
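
In display form, the one-parameter argument above reads (same notation; the mean value theorem supplies the sigma between p0 and p1):

```latex
y_1 - y_0 = f(p_1, x_0) - f(p_0, x_0)
          = \frac{\partial f}{\partial p}(\sigma, x_0)\,(p_1 - p_0)
\quad\Longrightarrow\quad
p_1 - p_0 = \frac{y_1 - y_0}{\frac{\partial f}{\partial p}(\sigma, x_0)}
```

so the parameter error blows up wherever the model's sensitivity to the parameter is near zero.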

------
JoelJacobson
I made an R web-app in Shiny which does curve fitting of a Four-Parameter
Log-Logistic function (the S-curve discussed in the article) against the
Johns Hopkins data:

[https://joelonsql.shinyapps.io/coronalyzer/](https://joelonsql.shinyapps.io/coronalyzer/)
[https://github.com/joelonsql/coronalyzer](https://github.com/joelonsql/coronalyzer)

"Sweden FHM" is the default country, which is a different data source, it's
using data from the Folkhälsomyndigheten FHM (Swedish Public Health Agency),
which is adjusted by death date and not reporting date as the John Hopkins
data is.

------
FabHK
One thing to note (from looking at the graph) is that the noise seems to be
additive with constant standard deviation (and presumably floored such that
the sum doesn’t go negative).

That means that there is huge relative error initially (we have 10 infections
+/- 100), and very little relative error eventually (we have 1,000,000
infections +/- 100).

I assume the forecasts would be better if the error were multiplicative (in
other words, with standard deviation proportional to the current value).

However, I think the main point stands: the forecasts get much better once one
approaches the inflection point.
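
A sketch of one way to encode that multiplicative-error assumption when fitting (synthetic data; scipy's curve_fit accepts per-point standard deviations via its sigma argument):

```python
# Weight the fit so that each point's assumed noise is proportional to its
# value, i.e. multiplicative rather than constant additive error.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    return L / (1 + np.exp(-k * (t - t0)))

rng = np.random.default_rng(1)
t = np.arange(60.0)
y = logistic(t, 1e6, 0.25, 45) * rng.lognormal(0, 0.2, t.size)

sigma = 0.2 * np.maximum(y, 1.0)    # noise scale ~ proportional to the value
popt, _ = curve_fit(logistic, t, y, p0=[1e6, 0.2, 40],
                    sigma=sigma, absolute_sigma=True, maxfev=20000)
print(popt)
```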

------
autokad
I did the COVID-19 week 1 and week 2 Kaggle competitions (I think they had 4
of them), linked below. If you are interested, this is a fun way to play
around with the data, and it shows how hard it is.

Things I tried:

- Weibull/Gamma distributions: it was impossible to find good parameters for
the distributions without exploding. It would only work if I put in an
additional parameter saying 99% of the population wouldn't get it. It would
come up with good growth shapes, but the predictions were far too low compared
to actual values in the future.

- Logistic curves: usually great for countries that had already run up the
curve, but terrible for ones still in the exponential phase (as the article
states). Also kind of useless for countries that hadn't even begun their
journey up the curve.

- LightGBM: good for predicting the next day but terrible many days out. It
seems other countries' curves do not help that much.

- SARIMAX: really good, but later predictions would explode, like showing 4
million deaths in France, etc.

I tried to get around these by ensembling them together, but overall I did
very poorly at predicting coronavirus. I still want to get better at this, so
if anyone has any good suggestions, please share. You can also check out what
other Kagglers have done.

[https://www.kaggle.com/c/covid19-global-forecasting-week-1](https://www.kaggle.com/c/covid19-global-forecasting-week-1)
[https://www.kaggle.com/c/covid19-global-forecasting-week-2](https://www.kaggle.com/c/covid19-global-forecasting-week-2)

~~~
earthicus
I briefly studied mathematical biology, and remember there being some debate
about whether tumor growth was more accurately modeled by logistic growth or
Gompertz growth [1]. I'd be curious to know whether your fits get better or
worse if you replace your logistic-based model with a Gompertz-based model.

[1]
[https://en.wikipedia.org/wiki/Gompertz_function](https://en.wikipedia.org/wiki/Gompertz_function)
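
If the fitting code takes the model as a function, the swap is a one-liner. A hedged sketch comparing both on synthetic Gompertz-shaped data (made-up parameters):

```python
# Fit logistic and Gompertz curves to the same synthetic data and compare
# the sum of squared errors; the better-matched growth law should win.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, k, t0):
    return K / (1 + np.exp(-k * (t - t0)))

def gompertz(t, K, k, t0):
    return K * np.exp(-np.exp(-k * (t - t0)))

rng = np.random.default_rng(2)
t = np.arange(80.0)
y = gompertz(t, 1e5, 0.12, 25) + rng.normal(0, 500, t.size)

for model in (logistic, gompertz):
    popt, _ = curve_fit(model, t, y, p0=[1e5, 0.1, 30], maxfev=20000)
    sse = np.sum((model(t, *popt) - y) ** 2)
    print(model.__name__, f"SSE = {sse:.3g}")
```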

~~~
autokad
Thanks, I'll take a look into it. It seems HN has no place for discussion
these days and will downvote anything.

------
tlarkworthy
This article is about estimating an s-curve in the real world. The point is:
you cannot get a sense of the end by observing the beginning, even in the
super-idealized case of the data coming from a noisy s-curve. This lesson
obviously transfers over to the real world, where the data is going to be
strictly worse than the idealized case (i.e. it won't be a perfect s-curve).
It's a great article, with applications to pandemic forecasting.

------
paulpauper
>S-curves have only three parameters, and so it is perhaps impressive that
they fit a variety of systems so well

No, it does not. It just so happens that the solutions of certain differential
equations produce an s-shape. It has nothing to do with having only 3
parameters. Very sophisticated models with many parameters and conditions can
produce this shape.

~~~
ericjang
The author was probably talking about the logistic function being
parameterized by 3 parameters, x0, L, and k [1]. The point the author is
probably making is that if you are fitting a perfect logistic model to data, 3
data points should be sufficient to determine 3 parameters that unambiguously
parameterize the curve.

A separate set of 3 parameters also parameterizes the SIR compartmental model
([https://en.wikipedia.org/wiki/Compartmental_models_in_epidem...](https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SIR_model)),
the solution of which also looks like a logistic curve. But this is a model-
based (dynamics-based) solution whereas one may be interested in just fitting
the logistic model based on the assumption that it's going to be logistic.

[1]
[https://en.wikipedia.org/wiki/Logistic_function](https://en.wikipedia.org/wiki/Logistic_function)
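
A small sketch of the 3-points-determine-3-parameters claim in the noiseless case (toy numbers; a reasonable starting guess is still needed because the system is nonlinear):

```python
# Recover (L, k, x0) exactly from three noise-free points on a logistic.
import numpy as np
from scipy.optimize import fsolve

def logistic(x, L, k, x0):
    return L / (1 + np.exp(-k * (x - x0)))

xs = np.array([10.0, 20.0, 30.0])
ys = logistic(xs, 1000, 0.5, 20)          # "observed" exact points

def residuals(params):
    return logistic(xs, *params) - ys     # 3 equations, 3 unknowns

print(fsolve(residuals, [800, 0.3, 15]))  # -> [1000, 0.5, 20]
```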

------
hinkley
A turning point in my understanding of S Curves came when I encountered an
article that showed an S Curve as the cumulative area under a normal
distribution.

Which is great _if_ your velocity on the project retains a normal distribution
over time. But missing requirements, bad risk analysis, and team dynamic
changes can make for a long velocity tail.

Which means the last 10% of the project accounts for the other 90% of the
project time.

If however you have managed to do the important work first, you drop less
important features and ship 95% of the planned features on time.

~~~
FabHK
It’s a common misconception, but the S curve generated by disease, for
example, with exponential growth at the beginning, corresponds not to the
normal distribution but to the logistic distribution, which has fatter tails
than the normal.

The bell curve drops to zero like exp(-x^2), that is, extremely fast.

The logistic distribution (the derivative of the S-curve described here) drops
to zero like exp(-|x|), that is, exponentially, but not as fast as the normal
distribution.

[https://en.wikipedia.org/wiki/Logistic_distribution](https://en.wikipedia.org/wiki/Logistic_distribution)
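
A quick numeric check of that tail comparison, using the standard normal and standard logistic from scipy:

```python
# Survival function P(X > x): the normal tail collapses much faster.
from scipy.stats import norm, logistic

for x in (2, 4, 6):
    print(x, f"normal: {norm.sf(x):.2e}", f"logistic: {logistic.sf(x):.2e}")
# At x=6 the normal tail is ~1e-9 while the logistic tail is still ~2.5e-3.
```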

------
nmca
Would be intrigued to see what the story is like with priors on the parameters
and a credible interval on the output.

------
mirimir
I've loved s-curves for decades. Way back in the lab, to analyze ligand-
binding assays. And not that long ago, as a litigation consultant.

> For technological changes, can the final level-off be reasonably estimated?

It helps a lot if it's 0% or 100% :) Given that, you get a decent long-term
fit, after about half the time to plateau.

------
anonytrary
Forecasting solutions to a differential equation is hard, especially when
there are infinitely many solutions. It is a matter of keeping constants up to
date in light of new information. If those constants are wrong, the whole
model is essentially useless.

~~~
LolWolf
> Forecasting solutions to a differential equation is hard, especially when
> there are infinitely many solutions.

Not sure what you mean by infinitely many solutions, but this is not true in
general. A silly example is

y'(t) = Ct,

where, for most distributions, you would only need a few points to get both C
and the initial condition to reasonably high accuracy (~O(sqrt(n))). More
complicated examples exist that have much more interesting dynamics, but their
general trajectories are just not as sensitive/chaotic (w.r.t. the initial
parameters).

> If those constants are wrong, the whole model is essentially useless.

I think what makes this hard is not _if_ they're wrong, but rather, being even
just a tiny bit wrong makes the whole future prediction change drastically. In
other words, two possible inferred parameters which are statistically
indistinguishable given our current observations will yield incredibly
different outcomes under many of these models.

> It is a matter of keeping constants up to date in light of new information.

Indeed! :)

------
nabla9
To predict the s-curve for an epidemic, you need to know R0.

The upper limit of the curve is 1 - 1/R0. If you use mitigation, then with the
effective reproduction number the limit is 1 - 1/Rt, where Rt varies with
mitigation effort.
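
Plugging a few illustrative (made-up) values into the formula from the comment:

```python
# The 1 - 1/R0 level for a handful of hypothetical reproduction numbers.
for R0 in (1.5, 2.5, 4.0):
    print(f"R0 = {R0}: curve limit at {1 - 1 / R0:.0%} of the population")
```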

------
sradman
One of the only good things to come out of this pandemic is the increased
emphasis on S-curves. Models tend to use the predictive power of exponential
curves to estimate the steep part of the S-curve, but these predictions are
short-term and are best applied to planning scenarios.

What is missing from this article is the relationship between S-curves and
bell curves. We can use the rules of thumb associated with the normal
distribution to think about peak growth rate and standard deviations.

The healthdata.org curve fitting is a decent Fermi estimate based on observed
data. Models are always wrong but sometimes they are useful, and I hope we
start to discuss the underlying key assumptions used in each case rather than
focusing on their imperfect predictive power.

------
clairity
the article correctly points out that 3 parameters need to be estimated, but
then jumps directly from modeling those 3 parameters with 3 points to claiming
that will always be wrong.

that's not the right intuition. you could model a logistic curve with just 3
points if the error in those measurements tended toward zero. the further
apart the points are, the less tight the error bars need to be.

the problem with real-world modeling/curve-fitting is that measurements are
super noisy and the errors in them are significant.

------
YetAnotherNick
I tried to do some exploration with the coronavirus data to get some idea of
the final number. One of the best plots I found for telling the final number
is the percentage of cases in the next week plotted against the cases so far.
It looks like the negative half of a parabola, and the point where it meets
the x-axis gives the final number per country.

This is the final plot:
[https://i.imgur.com/o54t0Ts.png](https://i.imgur.com/o54t0Ts.png). You could
make a good guess from the data of how many people it will eventually affect
per country by continuing the same pattern until it reaches the x-axis.
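
A sketch of how such a plot can be built from a cumulative series (toy logistic data here rather than real country data):

```python
# Percentage growth over the next week vs. cases so far. For logistic-like
# growth this declines roughly linearly, and its x-intercept is an estimate
# of the final size (here the toy curve's plateau, 1e6).
import numpy as np

days = np.arange(120)
cum = 1e6 / (1 + np.exp(-0.12 * (days - 60)))   # toy cumulative cases

now, next_week = cum[:-7], cum[7:]
pct_next_week = 100 * (next_week - now) / now
for i in (20, 50, 80, 110):
    print(f"cases so far: {now[i]:12.0f}  next-week growth: {pct_next_week[i]:6.1f}%")
```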

------
whiw
>Many of us will have learnt in school that if there are three parameters to
be found, you need three data points to define the function.

4 points are needed in my universe.

------
graycat
Yes, as in the OP, S curves can be challenging to work with, in particular, to
use, say, early in the history of smart phones, to make long term projections
from the number of smart phones sold each day for each of the last 30 days.

But there is some good news: The data used can vary, and in some cases good
projections can be easier to make. Can see, e.g., for COVID-19 the recent

[https://news.ycombinator.com/item?id=22898015](https://news.ycombinator.com/item?id=22898015)

[https://news.ycombinator.com/item?id=22897967](https://news.ycombinator.com/item?id=22897967)

[https://news.ycombinator.com/item?id=22900104](https://news.ycombinator.com/item?id=22900104)

[https://news.ycombinator.com/item?id=22902667](https://news.ycombinator.com/item?id=22902667)

In the third one of those, we have that the projection from a FedEx case is
the solution to the first-order ordinary differential equation initial value
problem

y'(t) = k y(t) (b - y(t))

There, for data, we used y(0) and b. Then we guessed at k. Had we used values
of y for the past month, we could have picked a better, likely fairly good,
value for k.

Lesson: fitting an S curve does not have to be terribly bad.

The key here is the b: the S curve of the solution is the logistic curve, and
it rises to be asymptotic to b from below. Knowing b helps a LOT! When you
have b, you are no longer doing a projection or extrapolation but nearly just
an interpolation -- much better.

For FedEx, the b was the capacity of the fleet. For COVID-19, the b would be
the population needed for herd immunity (from recovering from the virus, from
therapeutics that confer immunity, or from a vaccine that confers immunity).

Knowing b makes the fitting much easier/better. To know b, you likely need to
look at the real situation, e.g., the population of candidate smart phone
users, candidate TV set owners, the market potential of FedEx (as it was
planned at the time), or the population needed for herd immunity for the
people in some relatively isolated geographic area.

Then in TeX source code, the solution is

y(t) = { y(0) b e^{bkt} \over y(0) \big ( e^{bkt} - 1 \big ) + b}
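
A numeric cross-check of that closed form (a sketch; the constants are arbitrary):

```python
# Integrate y' = k*y*(b - y) and compare against the closed-form solution.
import numpy as np
from scipy.integrate import solve_ivp

k, b, y0 = 1e-4, 1000.0, 10.0
sol = solve_ivp(lambda t, y: k * y * (b - y), (0, 100), [y0],
                dense_output=True, rtol=1e-9, atol=1e-9)

t = np.linspace(0, 100, 11)
closed = y0 * b * np.exp(b * k * t) / (y0 * (np.exp(b * k * t) - 1) + b)
print(np.max(np.abs(sol.sol(t)[0] - closed)))   # agrees to solver tolerance
```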

Can also use a continuous time discrete state space Markov process
subordinated to a Poisson process. Here's how that works:

Have some states, right, they are discrete. For FedEx, that would be (i) the
number of customers talking about the service and (ii) the number of target
customers listening. Then the time to the next customer is much like the time
to the next click of a Geiger counter, that is, has exponential distribution,
that is, is the time of the next arrival in a Poisson arrival process (e.g.,
the time of the next arrival at the Google Web site). So at this arrival, the
process moves to a new state where we have 1 more current customer and 1 less
target customer. Then start again to get the next new customer.

The Markov assumption is that the past and future of the process are
conditionally independent given the present state; so that justifies our
getting to the next state using only the current state -- given the current
state, for predicting the future, everything before that is irrelevant.

Whether something is a Markov process -- whether it satisfies the Markov
assumption -- can depend on what we select for the state: roughly, the more we
have in the state, the closer we are to Markov. In particular, if we take the
whole past history of the process as the state, IIRC every process is Markov.
But Markov helps in something like the FedEx application since that state is
so simple.

We get to use continuous time since the time to the next change of state is
from a Poisson process whose arrival times are the continuum -- that is, we
don't have to make time discrete although it is true that the history of the
process (one sample path) has state changes only at discrete times.

So, for state changes, with some positive integer n we have n possible states;
then for i, j = 1, 2, ..., n, we have some p(i,j), which is the probability of
jumping from state i to state j; that is, we have an n x n matrix of
transition probabilities.

[p(i,j) is the conditional probability of entering state j given that the last
state was i.]

For two jumps, square that matrix. Now there is a lot of pretty math -- get
some limits and eigenvectors of states, etc. Actually, fairly generally there
is a closed-form solution to the process. Alas, often in practice that closed
form is useless because the n and the n x n are so large, maybe n^2 in the
trillions. E.g., in a problem I solved for war at sea, there were Red weapons
and Blue weapons, on each side some number of types and some number of weapons
of each type. The states were the combinatorial explosion. Then there were the
one-on-one Red-Blue encounters where one died, the other died, both died, or
neither died. The time to an encounter was the next arrival of Poisson
processes, also Poisson. That was an example where there was a closed-form
solution, but n and n x n were wildly too large for it, while running off,
say, 500 sample paths via Monte Carlo was easy to program and fast for the
computer. So, sure, the software reported the average of the 500 sample paths.
On a PC today, my software would be done before you could get your finger off
the mouse button or the Enter key.
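
As a sketch of that sample-path idea applied to the simple adoption process described earlier (all parameter values here are made up for illustration):

```python
# Monte Carlo for a continuous-time Markov jump process: n adopters out of
# N targets; the next conversion arrives after an Exponential waiting time
# with rate proportional to (talkers) x (listeners). Averaging 500 sample
# paths traces out the S curve.
import numpy as np

rng = np.random.default_rng(3)
N, c = 1000, 1e-5                       # market size; per-pair conversion rate
grid = np.linspace(0, 3000, 200)        # common time grid for averaging

def sample_path():
    t, n, ts, ns = 0.0, 1, [0.0], [1]
    while n < N:
        rate = c * n * (N - n)          # total jump rate in the current state
        t += rng.exponential(1 / rate)  # memoryless wait to the next adopter
        n += 1
        ts.append(t)
        ns.append(n)
    return np.interp(grid, ts, ns)

mean_path = np.mean([sample_path() for _ in range(500)], axis=0)
print(mean_path[::40].round(1))         # S-shaped rise from ~1 toward N
```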

This approach is fairly general. And since what I did included attack
submarines, SSBN submarines, anti-submarine destroyer ships, long range
airplanes, etc., there should be no difficulty building such a model for
COVID-19 that included babies, grade school kids, ..., nursing home residents,
people at home, people working nearly alone on farms, ....

Back to S curves: IIRC, dropping out of the math for the n x n matrix and its
powers is an S curve. So, in a broad range of cases, you always get an S
curve, although a different curve depending on, yes, the p(i,j) and the
initial state. Uh, when no one is left sick, the Markov process handles that
as an _absorbing state_ -- once you get there, you don't leave.

For the n in the billions, the n x n is really a biggie. So, for the submarine
problem I did,

J. Keilson, _Green's Function Methods in Probability Theory_,

asked "How can you possibly fathom that enormous state space?". That is a good
question, and my answer was: "After, say, 5 days, the number of SSBNs left is
a random variable. It is bounded. So it has finite variance. So, both the
strong and weak laws of large numbers apply. So, run off 500 sample paths,
average them, and get the expectation within a gnat's ass nearly all the time.
Intuitively, Monte Carlo puts the effort where the action is.". Keilson was
offended by "gnat's ass" but liked the math and approved my work for the US
Navy. That question and answer are good to keep in mind.

There is more in, say,

Erhan Çinlar, _Introduction to Stochastic Processes_, ISBN 0-13-498089-1,
Prentice-Hall, Englewood Cliffs, NJ, 1975.

For why the arrival times have exponential distribution and why we get a
Poisson process, Çinlar has a nice simple, intuitive, useful axiomatic
derivation. There is more via the renewal theorem in

William Feller, _An Introduction to Probability Theory and Its Applications,
Second Edition, Volume II_ , ISBN 0-471-25709-5, John Wiley & Sons, New York,
1971.

~~~
srean
There's quite a bit more to COVID-19 prediction than the standard derivation
of a logistic curve found in introductory differential equations books. SIR
models would perhaps be the simplest place to start.

[https://en.wikipedia.org/wiki/Compartmental_models_in_epidem...](https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#Bio-mathematical_deterministic_treatment_of_the_SIR_model)

~~~
graycat
Nice reference. The discussion is mostly about _SIR_ (susceptible, infected,
and recovered) models, but also has several generalizations to handle more
details, e.g., losing immunity and getting infected again. A lot of this
material goes back to, say, 1927.

At first cut, it appears that the differential equation I gave is an SIR model
except with R = 0, i.e., once you are a customer of FedEx you remain one.

At one point the article finds the logistic curve as I did -- maybe for the
same situation.

The article does touch on stochastic models, but I didn't see discussion of a
Markov assumption or Poisson process for time to the next infection.

There is an unclear reference to operator spectral radius; maybe that is
related to the eigenvalues and eigenvectors I mentioned.

Whatever, especially with the Wikipedia reference, it appears that we are a
step or two beyond the OP.

The work I did and described here I did decades ago and is not at all related
to the math of my startup now. So, here I was able to describe some of my old
work, but I'm doing other things now.

For the differential equation I gave, the solution as I derived it needed only
ordinary calculus and not the additional techniques of differential equations.
I just did the derivation in a hurry at FedEx just as a little calculus
exercise. I didn't consult a differential equations book. I discovered that
the solution was the logistic curve only by accident years later.

The Wikipedia article is nice.

~~~
srean
Yes, that's a nice way to run into the logistic. Some introductory books use
that as a motivating example, usually the second one. SIR is too simple to be
accurate, but is OK for painting with broad brush strokes. Sometimes simple
models are still useful, as long as one does not read too much into the
output.

Statistical epidemiology is quite heavily invested in stochastic differential
equations. Wikipedia would hardly suffice as an academic, up-to-date, and
exhaustive bibliography.

~~~
graycat
Sounds like the next step up would be stochastic optimal control, the field of
my Ph.D. dissertation.

Ah, been there, done that, got the T-shirt, and doing other things now!

~~~
srean
Heh! That's why I said this:
[https://news.ycombinator.com/item?id=22937262](https://news.ycombinator.com/item?id=22937262)

BTW, you mention Prof. Bertsekas a few times; did you ever run into Prof.
Tsitsiklis? He has done some work in the area of epidemics, around 2015.
Mentioning it just in case you know him.

------
yters
There is no such thing as an 'exponential curve' in our finite world. And the
derivative of the sigmoid is a bell-curve-ish thing.

~~~
empath75
When you’re talking about a highly contagious disease it’s more or less a
distinction without a difference. It’s exponential until very close to the
point where a significant percentage of the vulnerable population is infected.
It’s not super interesting to point out that it’s no longer growing
exponentially, when half the population has it already.

~~~
yters
It is interesting to think about how long the exponential trend will continue
before becoming a sigmoid.

------
darksaints
The reason it is hard is that it is using a stateless model to approximate an
inherently stateful and often chaotic process.

Take tech adoption, for example. Often incorrectly represented as s-curves,
adoption curves are the result of the inherent cost/benefit of the technology
combined with the stateful diffusion process of communication and the
inherently human resistance to change. And being inherently stateful, that
diffusion process can be subject to chaotic influences. For example, the idea
of microservice architecture had an absolutely massive diffusion jump the
moment Amazon sent out that now-famous email mandating adoption. It wasn't
linear, it wasn't exponential... it was a discrete step, and a very large one
at that. These are everywhere too, because communication doesn't propagate the
way bacteria grow; it propagates through extremely non-linear levels of
influence. Bill Smith, a 45-year-old mid-level programmer for a tiny Midwest
bank, will never have the tech-adoption influence of a Steve Jobs or an Alan
Kay or a Linus Torvalds.

A better option for modeling would be to use Monte Carlo methods or systems
methods. Something that acknowledges the inherent statefulness of the process.

~~~
KarlKemp
> Often incorrectly represented as s-curves, they are the result of the
> inherent cost/benefit of the technology [...]

You're mixing two completely different levels of abstraction here. Tech
adoption rather obviously happens in an S-shaped curve, at least sometimes -
see the article for examples. And of course that shape, and its exact
parameters, are the result of some underlying processes.

These two things aren't contradictory. Outside air temperature follows a
roughly sinusoidal curve. It's the result of the earth turning and therefore
alternating between night and day. But it's still sinusoidal.

And the S-curve does acknowledge state, or it would just be exponential.

Yes, there are different approaches to disease modelling, such as agent- and
rule-based simulations. Unfortunately, they tend to be really bad, because we
just don't have enough data to satisfactorily simulate societies at the level
necessary for this application.

~~~
darksaints
But having an s-shaped curve is not the same thing as being a sigmoid
function, in the same way that having a bell-shaped density is not the same
thing as having a Gaussian distribution. There are tons of processes out there
that can be approximated well by either of those two while being extremely
different in extrapolation.

One thing that can immediately disprove mathematical sigmoid modeling: curve
symmetry. If the early-adoption exponential growth is not exactly the same
shape as the late-adoption slowdown, then you don't have a sigmoid function.

And that's the problem: if you're using a single function to model the result
of two (or more) separate processes with distinct mechanics and parameters,
you're going to have the same pitfalls as you would trying to model a bimodal
process with a single probability distribution. Namely, they might fit well
with interpolative methods but completely fail with extrapolative methods
(like forecasting!).
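
One way to see that symmetry test concretely (a toy check with arbitrary constants): the logistic's steepest growth occurs at half of capacity, while the Gompertz curve, which also looks s-shaped, peaks at about 37% of capacity.

```python
# Where does peak growth occur, as a fraction of capacity?
import numpy as np

t = np.linspace(0, 100, 100001)
K, k, t0 = 1.0, 0.2, 50.0
curves = {
    "logistic": K / (1 + np.exp(-k * (t - t0))),
    "gompertz": K * np.exp(-np.exp(-k * (t - t0))),
}
for name, y in curves.items():
    i = np.argmax(np.diff(y))          # index of the steepest rise
    print(name, round(y[i] / K, 3))    # ~0.5 (symmetric) vs ~0.368 (skewed)
```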

