Hacker News

> ML (particularly DL) tends to outperform "classical" statistical time series forecasting when the data is (strongly) nonlinear, highly dimensional and large.

This claim about forecasting with DL comes up a lot, but I’ve seen little evidence to back it up.

Personally, I’ve never managed to have the same success others apparently have with DL time series forecasting.



It's true simply because large ANNs have higher capacity, which is great for large, nonlinear datasets but less so for small datasets or simple functions.

In any case, Transformers are eating ML right now and I'm actually surprised there's no "GPT-3 for time series" yet. It's technically the same problem as language modeling (that is, multi-step prediction, just over numeric values instead of tokens); however, there is comparatively little human-generated data for self-supervised pre-training of a time series forecasting model. Another reason might be that the expected applications and potential of such a pre-trained model aren't as glamorous as generating language.
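The analogy can be made concrete: multi-step forecasting is autoregressive generation, where each prediction is appended to the context and fed back in, exactly like next-token sampling in a language model. A minimal sketch, with a least-squares AR(p) model standing in for the Transformer (the function names and toy series are my own, not from the thread):

```python
import numpy as np

def fit_ar(series, p=3):
    """Fit an AR(p) model by least squares (a stand-in for any
    autoregressive model, including a Transformer decoder)."""
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast(series, coef, steps):
    """Multi-step forecast: append each prediction to the history
    and feed it back in, like next-token generation in an LM."""
    hist = list(series)
    out = []
    for _ in range(steps):
        nxt = float(np.dot(coef, hist[-len(coef):]))
        hist.append(nxt)
        out.append(nxt)
    return out

series = np.arange(10, dtype=float) * 2.0  # noiseless linear trend 0, 2, ..., 18
coef = fit_ar(series, p=3)
preds = forecast(series, coef, steps=3)    # extends the trend: ~[20.0, 22.0, 24.0]
```

The feedback loop is also where multi-step forecasting gets hard: errors compound, which is the numeric analogue of an LM drifting off-topic over a long generation.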


> It's technically the same problem as language modeling

You're thinking of modeling event sequences, which is not, strictly speaking, the same as time series modeling.

Plenty of people do use LSTMs to model event sequences, using the hidden state of the model as a vector representation of a process's current location while walking a graph (e.g. a user's journey through a mobile app, or navigation by following links on the web).

Time series are different because the ticks occur at consistent intervals, and that regular timing is itself part of the problem being modeled. In general, time series models have often been distinct from sequence models.
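To make that distinction concrete, here's a minimal sketch (plain Python, hypothetical helper name) of turning an irregular event sequence into a fixed-interval time series by bucketing timestamps:

```python
from collections import Counter

def to_fixed_interval(timestamps, interval):
    """Bucket irregular event timestamps (an event sequence) into
    counts per fixed-width interval (a time series). Empty buckets
    become explicit zeros; that regular spacing is what makes the
    result a time series rather than just an ordered list of events."""
    if not timestamps:
        return []
    start, end = min(timestamps), max(timestamps)
    buckets = Counter((t - start) // interval for t in timestamps)
    n = int((end - start) // interval) + 1
    return [buckets.get(i, 0) for i in range(n)]

# e.g. clicks at irregular second offsets, bucketed into 10-second intervals:
events = [1, 2, 14, 15, 16, 38]
series = to_fixed_interval(events, 10)  # -> [2, 3, 0, 1]
```

Note the explicit 0 for the third bucket: a sequence model never sees "nothing happened", while for a time series model the gaps carry information.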

The reason there's no GPT-3 for general event sequences is the lack of data: the vocabulary of events is typically much smaller than that of natural language, and the corpus of sequences is much smaller too.


There's a deeper issue. Language (along with the code and other data in the GPT-style corpora) seems to have something in common: hierarchical structure at both short and long range.

In contrast, there is nothing that all time series have in common. There's no way to learn generic time series knowledge that will reliably generalise to new unseen time series.


Like I said, I still haven't seen any evidence.


Then look at some of the past time-series Kaggle challenges; there's plenty of evidence in the winning solutions.





