
> Most machine learning techniques focus on problems where the signal is very strong, but the structure is very complex. For instance, take the problem of recognizing whether a picture is a picture of a bird. A human will do well on this task, which shows that there is very little intrinsic noise. However, the correlation of any given pixel with the class of the image is essentially 0. The "noise" is in discovering the unknown relationship between pixels and class, not in the actual output.
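To make the quoted point concrete, per-pixel correlation with the label is easy to check. A rough sketch in Python, using random stand-in data in place of a real image set (on real bird photos the per-pixel numbers come out just as small, even though a classifier does well, which is the quoted claim):

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in data: 1,000 "images" of 32x32 grayscale pixels with binary
    # labels; substitute a real bird/not-bird dataset to see the same effect.
    images = rng.random((1000, 32 * 32))
    labels = rng.integers(0, 2, size=1000).astype(float)

    # Pearson correlation of each individual pixel with the class label.
    px = images - images.mean(axis=0)
    lb = labels - labels.mean()
    corr = (px * lb[:, None]).mean(axis=0) / (images.std(axis=0) * labels.std())
    print(f"max |per-pixel correlation|: {np.abs(corr).max():.3f}")  # small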

Could it be that by looking only at the price timeseries we are not looking at the actual information, but only at the output of an irreversible function, and that to predict prices effectively we need a model that captures what actually happens in the real world?




It's hard for me to follow exactly what you're getting at here, but (to make an analogy to cryptography) it seems like you're saying it's hard to find a signal because we only have the apparently random output of a function, not the seed itself.

It's fair to say that (at least these days) you're not going to identify a signal just by looking at a timeseries of prices, no matter how granular your dataset is (up to and including tick data). There are pockets of repeating patterns, but they are vanishingly small and fleeting; the prices themselves may as well be stochastic.
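One quick way to see how little linear structure a price series carries is to look at the autocorrelation of its returns. A sketch, using a synthetic random walk as a stand-in for real 5 minute mid-prices (swap in your own series):

    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic random-walk prices standing in for real 5 minute mid-prices.
    prices = 100 * np.exp(np.cumsum(0.0005 * rng.standard_normal(50_000)))
    returns = np.diff(np.log(prices))

    for lag in (1, 2, 5, 10):
        ac = np.corrcoef(returns[:-lag], returns[lag:])[0, 1]
        print(f"lag {lag:2d}: autocorrelation {ac:+.4f}")  # all ~0 here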

Essentially all funds are armed with significant amounts of data, and the prices themselves are used mainly for backtesting and sanity checking. The price series is the source of truth, but it's not where new insights are identified; the signal comes from other types of data that are far more reversible.


Thanks, that makes sense.

The article cited in the OP says:

""" Our historical dataset contains 5 minute mid-prices for 43 CME listed commodity and FX futures from March 31st 1991 to September 30th, 2014. We use the most recent fifteen years of data because the previous period is less liquid for some of the symbols, resulting in long sections of 5 minute candles with no price movement. Each feature is normalized by subtracting the mean and dividing by the standard deviation. The training set consists of 25,000 consecutive observations and the test set consists of the next 12,500 observations. """

(excerpt from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2756331)

That sounds to me as if the only data they trained the model on is a bunch of prices.
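Something like the following, if I understand the preprocessing (the features.npy path and array shape are just placeholders; the excerpt also doesn't say whether the mean/std are fit on the training slice only, which is what you'd want to avoid lookahead):

    import numpy as np

    # Hypothetical (n_observations, n_features) array built from the
    # 5 minute mid-prices described in the excerpt.
    features = np.load("features.npy")

    train = features[:25_000]
    test = features[25_000:25_000 + 12_500]

    # z-score with statistics from the training slice only, to avoid
    # lookahead (the excerpt leaves this ambiguous).
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    train = (train - mu) / sigma
    test = (test - mu) / sigma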

Am I reading it correctly?



