
> Most machine learning techniques focus on problems where the signal is very strong, but the structure is very complex. For instance, take the problem of recognizing whether a picture is a picture of a bird. A human will do well on this task, which shows that there is very little intrinsic noise. However, the correlation of any given pixel with the class of the image is essentially 0. The "noise" is in discovering the unknown relationship between pixels and class, not in the actual output.
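To make the quoted point concrete, per-pixel correlation with the label is easy to check. A rough sketch in Python, using random stand-in data in place of a real image set (on real bird photos the per-pixel numbers come out just as small, even though a classifier does well, which is the quoted claim):

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in data: 1,000 "images" of 32x32 grayscale pixels with binary
    # labels; substitute a real bird/not-bird dataset to see the same effect.
    images = rng.random((1000, 32 * 32))
    labels = rng.integers(0, 2, size=1000).astype(float)

    # Pearson correlation of each individual pixel with the class label.
    px = images - images.mean(axis=0)
    lb = labels - labels.mean()
    corr = (px * lb[:, None]).mean(axis=0) / (images.std(axis=0) * labels.std())
    print(f"max |per-pixel correlation|: {np.abs(corr).max():.3f}")  # small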

Could it be that by looking only at the price timeseries we are not looking at the actual information, but only at the output of an irreversible function, and that to predict prices effectively we need a model that captures what actually happens in the real world?




It's hard for me to follow exactly what you're getting at here, but (to make an analogy to cryptography) it seems like you're saying it's hard to find a signal because we only have the apparently random output of a function, not the seed itself.

It's fair to say that (at least these days) you're not going to identify a signal just by looking at a timeseries of prices, no matter how granular your dataset is (up to and including tick data). There are pockets of repeating patterns, but they are vanishingly small and fleeting; the prices themselves may as well be stochastic.
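One quick way to see how little linear structure a price series carries is to look at the autocorrelation of its returns. A sketch, using a synthetic random walk as a stand-in for real 5 minute mid-prices (swap in your own series):

    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic random-walk prices standing in for real 5 minute mid-prices.
    prices = 100 * np.exp(np.cumsum(0.0005 * rng.standard_normal(50_000)))
    returns = np.diff(np.log(prices))

    for lag in (1, 2, 5, 10):
        ac = np.corrcoef(returns[:-lag], returns[lag:])[0, 1]
        print(f"lag {lag:2d}: autocorrelation {ac:+.4f}")  # all ~0 here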

Essentially all funds are armed with significant amounts of data, and the prices themselves are used mainly for backtesting and sanity checking. The price series is the source of truth, but it's not where new insights are identified; the signal comes from other types of data that are far more reversible.


Thanks, that makes sense.

The article cited in the OP says:

""" Our historical dataset contains 5 minute mid-prices for 43 CME listed commodity and FX futures from March 31st 1991 to September 30th, 2014. We use the most recent fifteen years of data because the previous period is less liquid for some of the symbols, resulting in long sections of 5 minute candles with no price movement. Each feature is normalized by subtracting the mean and dividing by the standard deviation. The training set consists of 25,000 consecutive observations and the test set consists of the next 12,500 observations. """

(excerpt from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2756331)

That sounds to me as if the only data they trained the model on is a bunch of prices.
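Something like the following, if I understand the preprocessing (the features.npy path and array shape are just placeholders; the excerpt also doesn't say whether the mean/std are fit on the training slice only, which is what you'd want to avoid lookahead):

    import numpy as np

    # Hypothetical (n_observations, n_features) array built from the
    # 5 minute mid-prices described in the excerpt.
    features = np.load("features.npy")

    train = features[:25_000]
    test = features[25_000:25_000 + 12_500]

    # z-score with statistics from the training slice only, to avoid
    # lookahead (the excerpt leaves this ambiguous).
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    train = (train - mu) / sigma
    test = (test - mu) / sigma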

Am I reading it correctly?



