
Time Series Prediction – A short introduction for pragmatists - makaimc
https://www.liip.ch/en/blog/time-series-prediction-a-short-comparison-of-best-practices
======
wjnc
And now for the great thing ... Prophet uses Stan underneath [1] and thus is
built on foundations of 'regular' Bayesian statistics. Andrew Gelman has
written about Prophet as well [2].

After reading this blog I am tempted to get the ML for time series book
though. I'd love to try and compare some less than trivial examples with
covariates involved.

[1] [https://peerj.com/preprints/3190/](https://peerj.com/preprints/3190/)

[2]
[https://statmodeling.stat.columbia.edu/2017/03/01/facebooks-...](https://statmodeling.stat.columbia.edu/2017/03/01/facebooks-prophet-uses-stan/)

~~~
ganeshkrishnan
Facebook Prophet is really impressive and has gotten us better precision with
shorter iterations, even on large datasets, compared to CNN+LSTM.

We run it over millions of inventory items across years of data, and it has
given satisfactory results in the majority of cases.

~~~
ayayecocojambo
have you tried xgboost? in our problems none of the LSTM/fbprophet/ARIMA
models performed better than a fine-tuned xgboost model.

~~~
ganeshkrishnan
Just looked it up. Looks super interesting, I will surely give it a shot.
Thanks

------
chroem-
Basis function regression is a very under-appreciated method for producing
time series forecasts. I've found that it beats most of the methods described
in this article. Maybe I should make a blog post...

~~~
amrrs
Sorry for my ignorance. Did you mean simple linear regression or something
else? Do you have any reference for that?

~~~
chroem-
Yes, it's an extension of linear regression. You can incorporate different
basis functions to model trend, seasonality, the effects of external
regressors, different frequency components, etc. It gives you a lot more
control over the forecast model.
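A minimal numpy sketch of the idea, assuming a linear trend plus one Fourier pair for a weekly cycle (the series and all numbers here are made up for illustration):

```python
import numpy as np

# Hypothetical daily series: linear trend plus weekly seasonality plus noise.
rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
y = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, t.size)

# Design matrix: intercept, linear trend, and one Fourier pair for the weekly
# cycle. Extra basis functions (harmonics, external regressors) just add columns.
X = np.column_stack([
    np.ones_like(t),
    t,
    np.sin(2 * np.pi * t / 7),
    np.cos(2 * np.pi * t / 7),
])

# Ordinary least squares fit of the basis coefficients.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Forecasting is just evaluating the same basis at future time points.
t_future = np.arange(200, 230, dtype=float)
X_future = np.column_stack([
    np.ones_like(t_future),
    t_future,
    np.sin(2 * np.pi * t_future / 7),
    np.cos(2 * np.pi * t_future / 7),
])
forecast = X_future @ coef
```

The control comes from choosing the columns: each basis function is one interpretable component of the forecast.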

~~~
riyadparvez
Are you referring to Generalized Additive Models (GAM)?

~~~
semi-extrinsic
I think GAMs are more for the case where you have many underlying input
variables and a single resulting response.

~~~
XuMiao
This is a less studied scenario in time series research: contextual
forecasting. With enough contextual information, forecasting doesn't need to
squeeze the information out of its own history that hard.

------
graycat
Here are some old references for the problem of the OP:

David R. Brillinger, _Time Series Analysis: Data Analysis and Theory, Expanded
Edition_ , ISBN 0-8162-1150-7, Holden-Day, San Francisco, 1981.

George E. P. Box and Gwilym M. Jenkins, _Time Series Analysis --- Forecasting
and Control: Revised Edition_ , ISBN 0-8162-1104-3, Holden-Day, San Francisco,
1976.

Brillinger was a John Tukey student at Princeton and long at Berkeley.

------
platz
Normally, the method for dealing with 'time series' is really just finding
ways to turn a non-stationary distribution into a stationary one, to which you
can then apply classic statistical methods. So you're just finding ways to
factor out the time component in the data so you can use standard
non-time-sensitive regression models on the transformed data.

I don't think it's until you get to the NN-based models that they start
treating time as a first-class component in the model.

* If I'm wrong please explain why instead of downvoting

~~~
k_f
A (weak) stationary distribution has no trend, no seasonality, and no changes
in variance. With these properties, predictions are independent of the
absolute point in time. The transformations to turn non-stationary series into
stationary ones are reversible (usually differencing, log-transformations and
alike), thus the predictions can be applied back to the original time series.
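A minimal numpy sketch of that reversibility, using first differencing on a made-up trended series: predictions made on the stationary (differenced) scale are mapped back by cumulative summation from the last observed level.

```python
import numpy as np

# Series with a linear trend: clearly non-stationary in mean.
y = np.array([10.0, 12.0, 13.5, 16.0, 18.0, 20.5])

# First differencing removes the trend; the result hovers around a constant.
dy = np.diff(y)  # [2.0, 1.5, 2.5, 2.0, 2.5]

# Suppose a model fit on the stationary scale predicts the next two steps:
dy_pred = np.array([2.1, 2.1])

# The transformation is reversible: cumulative-sum the predicted differences
# and add the last observed level to get forecasts on the original scale.
y_pred = y[-1] + np.cumsum(dy_pred)  # [22.6, 24.7]
```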

Treating time as a first-class component really just means to factor in the
absolute point in time into the models at training time. This only makes sense
if the absolute time changes properties of the distribution that cannot be
accounted for with regular transformations. If that's the case, then we assume
that these changes cannot be modeled, and are thus either random or follow a
complicated systematic we can't grasp. In the first case, a NN wouldn't
improve either, in the second case, we either need to always use the full
history of the time series to make a prediction, or hope that a complex NN
like LSTM might capture the systematic.

In any case, I think one of the more compelling reasons to use NN is to not
have to do preprocessing. The trade-off is that you end up with a complicated
solution compared to the six or so easy-to-understand parameters a SARIMA
model might give you. And the latter even might give you some interpretable
intuition for the behavior of the process.

------
hprotagonist
My very first move with timeseries data is to get to frequency space as fast
as I can.
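A typical first move of that kind, sketched with numpy's FFT on a synthetic signal (the 16-sample period is an assumption of the example, not from the thread): the strongest spectral peak recovers the dominant cycle length.

```python
import numpy as np

# Synthetic signal sampled once per unit time: a 16-sample cycle plus noise.
rng = np.random.default_rng(1)
n = 256
t = np.arange(n)
y = np.sin(2 * np.pi * t / 16) + 0.2 * rng.normal(size=n)

# Real FFT of the de-meaned signal; bin k corresponds to frequency k / n.
spectrum = np.abs(np.fft.rfft(y - y.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)

# The strongest non-DC peak recovers the dominant period.
k = np.argmax(spectrum[1:]) + 1
period = 1.0 / freqs[k]  # ~16
```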

~~~
wenc
Genuinely curious -- how do you create predictive time-series models in the
frequency domain?

(background: control systems)

~~~
hprotagonist
speech recognition immediately jumps to mind

~~~
wenc
Speech recognition is generally not considered a time series problem though.

~~~
hprotagonist
what about speech is nontemporal?

~~~
wenc
Time series problems are indeed temporal, but not all temporal problems are
time series problems. Time series deals with time-specific features of a
discrete sequence like autocorrelation, trends, seasonality, etc. whereas
frequency domain methods deal with, well, frequency.

Suppose you're looking at sales patterns over a long period of time, which
have certain regularities. FFTs are unlikely to tell you much that is useful
or predictive, whereas time series methods can reveal patterns where t is the
independent variable.

------
ulucs
I mostly feel these methods are quite overkill for most applications. As a
purist, I'd recommend starting out with a simple linear regression and then
moving on to adding methods to cover the letters of SARIMA by showing the need
for each. It may not be as flashy, but linear regression is a stupidly
powerful and very cheap tool for all kinds of situations.
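One way to sketch that progression in numpy (on a synthetic series, purely as an illustration): fit a plain linear regression on time first, then let the residual autocorrelation make the case for adding the AR part of (S)ARIMA.

```python
import numpy as np

# Hypothetical series: linear trend plus AR(1)-style residual structure.
rng = np.random.default_rng(2)
n = 300
t = np.arange(n, dtype=float)
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = 0.8 * eps[i - 1] + rng.normal(0, 0.5)
y = 1.0 + 0.1 * t + eps

# Step 1: simple linear regression on time.
X = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Step 2: lag-1 autocorrelation of the residuals. A value well above zero
# is the evidence that justifies adding an AR term next.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
```

The same residual checks (seasonality, remaining trend) motivate each further letter of SARIMA.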

~~~
mr_toad
As a complete non-purist I’d suggest chucking the data into Auto.arima and
seeing where that gets you. Not only will it save a lot of time, in my
experience it tends to produce better models.

[https://cran.r-project.org/web/packages/forecast/index.html](https://cran.r-project.org/web/packages/forecast/index.html)

~~~
in9
I saw an article which I can't remember that warned in which contexts
auto.arima might not work. But in the majority of times auto.arima outperforms
the old Box-Jenkins methodology.

------
cube2222
I really recommend Prophet as an easy to use option like the article says.

I needed anomaly detection for Prometheus metrics, integrated with Grafana, to
mark "anomalous" regions so the model doesn't learn them.

Took me a week to set it all up, including packaging it as a microservice and
deploying it.

------
ayayecocojambo
In multivariate time series forecasting problems I found that a fine-tuned
xgboost model (and its variants) performs much better than fbprophet, SARIMAX,
and RNN variations. Predicting time series with an RNN is like killing a bird
with a bazooka.
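For context, tree boosters like xgboost don't consume a raw series directly; the usual step is tabularizing it into lagged features. A self-contained numpy sketch of that step (the actual xgboost fit is omitted, and `make_lag_features` is a hypothetical helper, not part of any library):

```python
import numpy as np

def make_lag_features(y, n_lags):
    """Turn a 1-D series into a supervised (X, target) pair of lagged values.

    Row i of X holds [y[i], y[i+1], ..., y[i+n_lags-1]] and the target is
    y[i+n_lags] -- the standard tabularization fed to xgboost and friends.
    Extra covariates for the multivariate case just become extra columns.
    """
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]
    return X, target

y = np.arange(10.0)  # toy series 0..9
X, target = make_lag_features(y, n_lags=3)
# X[0] == [0., 1., 2.], target[0] == 3.0; shapes (7, 3) and (7,)
```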

------
oli5679
I really appreciate the philosophy of defining a metric and measuring
performance while going from simple to complex methods.

I've also used Prophet library, and find it works well out of the box.

------
TrackerFF
Gaussian processes seem to be pretty popular; you might want to include them
in the comparisons.

~~~
sanxiyn
Gaussian process is marvelous. Demos like
[https://statmodeling.stat.columbia.edu/2012/06/19/slick-time...](https://statmodeling.stat.columbia.edu/2012/06/19/slick-time-series-decomposition-of-the-birthdays-data/)
just seem magical.

------
jldugger
Anyone patched prophet into graphite? Curious if the two are easily combined.

------
person_of_color
This is one type of time series, but how about audio time series?

