
Comprehensive Tutorial on Time Series Modelling and Forecasting - min2bro
https://kanoki.org/2020/04/30/time-series-analysis-and-forecasting-with-arima-python/
======
latentdeepspace
Everyone repeat after me: "we need a baseline model".

You should always try some "dumb" models first. You'd be surprised how hard it
is to beat a historical average model with a more sophisticated method (it
depends on your KPIs, of course).
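
For concreteness, a minimal sketch of such baselines in Python/pandas (the toy
series here is made up; swap in your own data):

    import numpy as np
    import pandas as pd

    # toy data: two years of daily observations standing in for real history
    idx = pd.date_range("2018-01-01", periods=730, freq="D")
    y = pd.Series(np.random.randn(730).cumsum() + 100, index=idx)

    train, test = y[:-30], y[-30:]

    # "dumb" baselines: historical average and naive last-value forecast
    avg_forecast = pd.Series(train.mean(), index=test.index)
    naive_forecast = pd.Series(train.iloc[-1], index=test.index)

    # compare on mean absolute error; a fancier model should at least beat these
    mae = lambda f: (f - test).abs().mean()
    print("historical average MAE:", mae(avg_forecast))
    print("naive (last value) MAE:", mae(naive_forecast))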

~~~
rmrfstar
But how am I going to get that VC money if I don't say "deep learning"?

~~~
jointpdf
I mean...you can always appeal to “old school” AI. Just dig into the old
papers and use their words. Latent semantic analysis (LSA) is an example of a
hard-to-beat baseline model for text:

 _“By inducing global knowledge indirectly from co-occurrence data in a large
body of representative text, LSA acquired knowledge about the full vocabulary
of English at a comparable rate to schoolchildren.”_
([http://www.stat.cmu.edu/~cshalizi/350/2008/readings/Landauer...](http://www.stat.cmu.edu/~cshalizi/350/2008/readings/Landauer-Dumais.pdf))

~~~
laretluval
Modern methods for deriving word embeddings easily beat LSA.

~~~
jointpdf
Hard to beat in terms of effort vs. quality of outcome is more precisely what
I meant (it’s two lines of code in scikit-learn [ _CountVectorizer() +
TruncatedSVD()_ ] to go from raw text to document/word embeddings, and the
result is often “good enough” depending on what you’re trying to do). See the
results on pg. 6 (note LSI == LSA):
[http://proceedings.mlr.press/v37/kusnerb15.pdf](http://proceedings.mlr.press/v37/kusnerb15.pdf)
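
(Roughly those two lines, as a sketch on a toy corpus; real use would
typically add TF-IDF weighting and a few hundred components:)

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.pipeline import make_pipeline

    docs = ["the cat sat on the mat",
            "dogs chase cats around the yard",
            "stocks fell sharply in early trading"]

    # LSA: term counts followed by a truncated SVD (rank 2 only because the corpus is tiny)
    lsa = make_pipeline(CountVectorizer(), TruncatedSVD(n_components=2))
    doc_vectors = lsa.fit_transform(docs)                          # one row per document
    word_vectors = lsa.named_steps["truncatedsvd"].components_.T   # one row per vocabulary term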

Also, at least based on papers I’ve read recently, BERT doesn’t work that well
for producing word embeddings compared to word2vec and GloVe (which can be
formulated as matrix factorization methods, like LSA). See table on pg. 6:
[https://papers.nips.cc/paper/9031-spherical-text-embedding.p...](https://papers.nips.cc/paper/9031-spherical-text-embedding.pdf)

Point being: mastering the old models gives you a solid foundation to build
from.

------
platz
To my amateur eyes, normally the method for dealing with 'time series' is
really just finding ways to turn a non-stationary distribution into a
stationary distribution, where you can then apply classic statistical methods
on them. So you're just finding ways to factor out the time component in the
data so you can use the standard non-time sensitive regression models on the
transformed data.

It seems very challenging to either make time a first-class component of the
model or to treat the data points as genuinely dependent. Most models require
independence, so often we try to force the data to look that way through
smoothing and transformations; you can assume this is happening any time an
algorithm asks you to provide 'stationarity'. It feels like looking for the
keys (prediction) where the streetlight is (model distributions with nice
calculation properties).
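
A minimal illustration of that transformation step, as I understand it (a
made-up random walk, differenced once, with an ADF stationarity check from
statsmodels):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller

    # toy non-stationary series: a random walk
    idx = pd.date_range("2019-01-01", periods=365, freq="D")
    y = pd.Series(np.random.randn(365).cumsum(), index=idx)

    # first-differencing "factors out" the dependence on the past level
    dy = y.diff().dropna()

    print("ADF p-value, raw series:  ", adfuller(y)[1])   # large -> looks non-stationary
    print("ADF p-value, differenced: ", adfuller(dy)[1])  # small -> looks stationary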

~~~
montecarl
The problem with time is that it is (typically) not a causal variable. If you
are modeling the price of a stock, for example, time is certainly not what is
causing it to go up or down! Yes, it is true that the price at time t+1 is
highly correlated with the price at time t, but extrapolating outward requires
a more sophisticated model that includes the real causal variables.

~~~
platz
so then, discounting making time itself a causal variable, it seems like
methods that rely on stationary distributions still treat the data, after pre-
processing, as i.i.d., rather than predicting values from their correlated
history.

I'm interested in methods that don't "subtract" simple "trends" and
"seasonality" from the data (which may work for bog-standard templates such as
sales data, but not for what I'm interested in), and instead respond to the
sequential relationships in the data itself, exploiting exactly the
correlations you describe.

~~~
beagle3
> I'm interested in methods that don't "subtract" simple "trends" and
> "seasonality"

a 2nd order difference equation can model a single harmonic frequency - that
is, if your data is a pure sine and sampled at regular intervals, then

        x_n =~ a*x_{n-1} + b*x_{n-2}

can model any frequency with the proper a and b values (machine-precision
limits apply in real-world scenarios, of course). That is, if your data looks
like a sine wave with a yearly period, you still need no more than one sample
per minute and a 2nd order model to filter it out.

It's likely not a perfect sine wave, so you'd need a lot more - but if you are
incredibly lucky and your underlying periodic signal admits a (relatively)
sparse harmonic decomposition, and the signal riding on it has (very) low
amplitude compared to the periodic signal, you can model very long periods
implicitly by just having enough recent samples.
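
A quick numpy check of that claim (for a pure sinusoid the exact coefficients
work out to a = 2*cos(omega) and b = -1):

    import numpy as np

    omega = 2 * np.pi / 50            # 50 samples per period; any value works
    n = np.arange(500)
    x = np.sin(omega * n)             # pure sine, regularly sampled

    a, b = 2 * np.cos(omega), -1.0
    x_pred = a * x[1:-1] + b * x[:-2]                     # predicts x[2:] from the two previous samples
    print("max error:", np.max(np.abs(x_pred - x[2:])))   # on the order of machine precision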

~~~
platz
very interesting, thanks!

------
splittingTimes
For the interested, here is an overview of neural forecasting from the folks
at Amazon research:

Neural forecasting: Introduction and literature overview

[https://arxiv.org/pdf/2004.10240.pdf](https://arxiv.org/pdf/2004.10240.pdf)

------
riyadparvez
Is there any other good resource on time series modeling and forecasting
besides exponential smoothing and variants of ARIMA? Pretty much every
tutorial on the web covers exponential smoothing and ARIMA, or is some lazy
LSTM tutorial.

~~~
em500
Some good free textbooks are Rob Hyndman's online book
[https://otexts.com/fpp2/](https://otexts.com/fpp2/) and Brockwell and Davis'
old textbook
[https://link.springer.com/book/10.1007/978-3-319-29854-2](https://link.springer.com/book/10.1007/978-3-319-29854-2).
They focus heavily on ARIMA and exponential smoothers because most time series
data sets are pretty small (a few dozen to at most a few thousand samples),
so there's really not much else that can be done.

Most of Hyndman's textbook approaches (mostly ARIMA and various exponential
smoothers) are implemented in his 'forecast' R package.
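
(If you're working in Python rather than R, a rough analogue of those two
model families via statsmodels, on made-up data; this is not Hyndman's package
itself:)

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # toy monthly series standing in for real data
    idx = pd.date_range("2015-01-01", periods=60, freq="MS")
    y = pd.Series(np.random.randn(60).cumsum() + 50, index=idx)

    arima_fit = ARIMA(y, order=(1, 1, 1)).fit()
    ets_fit = ExponentialSmoothing(y, trend="add").fit()

    print(arima_fit.forecast(12))   # 12-month ARIMA forecast
    print(ets_fit.forecast(12))     # 12-month exponential smoothing forecast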

ARIMA and exponential smoothers tend to be a bit hard to get working well on
daily data (they come from an era when most data was monthly or quarterly).
A modern take on classical frequency-domain Fourier regression is Facebook
Prophet
([https://facebook.github.io/prophet/](https://facebook.github.io/prophet/)),
which tends to work pretty well if you have a few years of daily data.
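
(The Prophet quick start is only a few lines; a sketch with a few years of
made-up daily data:)

    import numpy as np
    import pandas as pd
    from prophet import Prophet   # on older installs: from fbprophet import Prophet

    # Prophet expects a dataframe with columns 'ds' (dates) and 'y' (values)
    dates = pd.date_range("2017-01-01", periods=3 * 365, freq="D")
    df = pd.DataFrame({"ds": dates,
                       "y": np.random.randn(len(dates)).cumsum() + 100})

    m = Prophet()                                   # weekly/yearly seasonality handled automatically
    m.fit(df)
    future = m.make_future_dataframe(periods=90)    # extend 90 days past the data
    forecast = m.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())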

~~~
claytonjy
FPP is great, but limited to the simplest possible timeseries: a single number
recorded at evenly-spaced intervals.

Anyone know of good resources for multivariate, multimodal, irregular
timeseries forecasting? I know some great practical tools and tutorials
(prophet, fast.ai), but I'd love to inject some statistical knowledge like FPP
offers.

~~~
em500
Mostly from my own knowledge/experience:

\- Multi-variate: textbook treatments tend to focus mainly on Vector Auto
Regression (VAR) models. Unrestricted VARs scale very badly with vector
dimension, so they often end up in some regularized form (dimension reduced by
PCA or Bayesian priors). Lütkepohl's textbook is the standard reference.
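
(For a feel of the unrestricted case, a small statsmodels sketch on made-up
data; real use would involve stationarity checks and more careful lag
selection:)

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import VAR

    # toy 3-dimensional system standing in for real multivariate data
    data = pd.DataFrame(np.random.randn(200, 3), columns=["y1", "y2", "y3"])

    model = VAR(data)
    results = model.fit(2)   # fit a VAR(2); lag order could also be chosen by AIC

    # forecast 10 steps ahead from the most recent observed lags
    fc = results.forecast(data.values[-results.k_ar:], steps=10)
    print(fc)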

VAR-type models are in my view not very practical for most business time
series. You should probably not waste too much time on them unless you're
really into macro-economic forecasting, in which case you're wasting your time
anyway :). VAR forecast accuracy in macro-economics is not great, to put it
mildly, but we have nothing really better.

An alternative to VARs for multivariate time series are state space models,
which are described mostly in Durbin & Koopman's and Andrew Harvey's time
series textbooks. These model types were recently popularized in tech circles
by Google's CausalImpact R package (though that package, I think, only
implements the univariate model).
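
(For a feel of the simplest structural/state-space model in Python, a
local-level sketch via statsmodels; this is not CausalImpact itself:)

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.structural import UnobservedComponents

    # toy series: a slowly drifting level plus observation noise
    idx = pd.date_range("2018-01-01", periods=300, freq="D")
    y = pd.Series(0.3 * np.random.randn(300).cumsum() + np.random.randn(300), index=idx)

    # local-level model, estimated by Kalman filtering + maximum likelihood
    model = UnobservedComponents(y, level="local level")
    res = model.fit(disp=False)
    print(res.summary())
    print(res.forecast(30))   # 30-step-ahead forecast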

\- Multi-modal: if you need to model some generic non-Gaussian time series
process, you'll probably need some slow generic simulation method (MCMC,
particle filtering). I can't recommend any good reference since I haven't kept
up with the literature for about 15 years; I only remember a bunch of dense
journal papers from that era (e.g.
[https://en.wikipedia.org/wiki/Particle_filter#Bibliography](https://en.wikipedia.org/wiki/Particle_filter#Bibliography))

\- Irregular: if the irregularity is mild (filling in a relatively small
number of gaps/missing data), you can use LOESS, smoothing splines, or Kalman
filtering, which should all get you pretty similar results. If your time
series are extremely irregular, probably no generic method will do well and
you'll need to invest some days/weeks/months into a fairly problem/data-
specific method (probably some heavily tuned smoothing spline).
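
(As one example of the mild-irregularity case, filling a handful of gaps with
a smoothing spline; toy data, and the smoothing factor s is something you'd
tune:)

    import numpy as np
    import pandas as pd
    from scipy.interpolate import UnivariateSpline

    # daily series with a few missing observations
    idx = pd.date_range("2019-01-01", periods=120, freq="D")
    y = pd.Series(np.sin(np.arange(120) / 10) + 0.1 * np.random.randn(120), index=idx)
    y.iloc[[20, 21, 22, 60, 95]] = np.nan

    # fit a smoothing spline on the observed points, evaluate it at every date
    obs = y.dropna()
    t_obs = (obs.index - idx[0]).days.to_numpy(dtype=float)
    t_all = (idx - idx[0]).days.to_numpy(dtype=float)
    spline = UnivariateSpline(t_obs, obs.to_numpy(), s=len(obs) * 0.05)

    # keep the original values where present, spline values in the gaps
    y_filled = pd.Series(spline(t_all), index=idx).where(y.isna(), y)
    print(y_filled.iloc[18:25])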

------
doctoboggan
The readers interested in this article are probably able to give me good
advice. I've been collecting stats daily on myself for the past year (weight,
activity, calories consumed, sleep hours, etc) and I would love to be able to
explore and extract interesting trends and relationships from the data.

Is there an easy tool where I can just drop in all the data and it presents me
with some sort of dashboard? I would love it if the tool could identify and
present interesting relationships (e.g. weight and calories consumed are
strongly correlated).

Does anyone know if something like that exists? Or should I start rolling my
own using python/pandas?
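
(If I end up rolling my own, I imagine the pandas starting point would be
something like this; the file name and column names here are made up:)

    import pandas as pd

    # hypothetical CSV with one row per day: date, weight, calories, sleep_hours, ...
    df = pd.read_csv("daily_stats.csv", parse_dates=["date"], index_col="date")

    # pairwise correlations across all the tracked quantities
    print(df.corr())
    print(df.corr()["weight"].sort_values())          # what moves together with weight?

    # lag a column to ask "do yesterday's calories relate to today's weight?"
    print(df["weight"].corr(df["calories"].shift(1)))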

~~~
lowdose
[https://github.com/arielf/weight-loss](https://github.com/arielf/weight-loss)

~~~
doctoboggan
Thanks!

------
cakeofzerg
Currently learning GluonTS, seems good so far.

~~~
ranc1d
Adding a link in case others are interested!

[https://gluon-ts.mxnet.io/](https://gluon-ts.mxnet.io/)

------
elteto
What would be some good graduate programs (I'm thinking Master's level) in the
US that specialize in time series modeling and forecasting? Any available
online?

~~~
siegelzero
Penn State has a bunch of their graduate stats courses online [1]. I worked
through some of their time series class [2] and found it to be pretty good
quality.

[1]
[https://online.stat.psu.edu/statprogram/](https://online.stat.psu.edu/statprogram/)
[2]
[https://online.stat.psu.edu/statprogram/stat510](https://online.stat.psu.edu/statprogram/stat510)

------
pupdogg
Your page has a bigger focus on Google ads than on the subject matter itself.

------
leeoniya
Coincidentally, I posted this not too long ago:

[https://news.ycombinator.com/item?id=23045207](https://news.ycombinator.com/item?id=23045207)

------
ngcc_hk
The last time I used this was in 1981. Is it still relevant in the ML era?

~~~
tomrod
Yes.

