
Mistakes when Applying Computational Intelligence & ML to Stock Market Modeling - jnazario
http://arxiv.org/abs/1208.4429
======
gfodor
The paper focuses on a number of practical mistakes but overlooks some higher-
level ones that I think are also important:

\- Most ML algorithms perform well when there is some underlying phenomenon
whose characteristics are being statistically inferred by the model. However,
the stock market is now essentially a large network of computers trying to
model one another. You can imagine how this might break some of the underlying
assumptions that good ML results rely upon. We can see the results of this in
the increasing frequency of "flash crashes" caused by over-leveraged quant
hedge funds all tripping over sell triggers and/or getting margin calls at the
same time.
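
The cascade dynamic can be illustrated with a toy model (my own sketch, not from the paper or this thread; all numbers are made up): several funds with stop-loss triggers clustered just below the current price, where each forced sale has its own price impact and can trip the next trigger.

```python
# Toy stop-loss cascade: a small shock trips one trigger, the forced
# sale moves the price, and that trips the next trigger, and so on.

def cascade(price, triggers, impact_per_sale):
    """Return (final_price, number_of_funds_that_sold).

    triggers: stop-loss price levels, one per fund.
    impact_per_sale: fractional price impact of each forced sale.
    """
    sold = set()
    changed = True
    while changed:
        changed = False
        for i, level in enumerate(triggers):
            if i not in sold and price <= level:
                sold.add(i)                      # this fund is forced out
                price *= (1 - impact_per_sale)   # its sale moves the market
                changed = True
    return price, len(sold)

# A ~0.6% shock (price 98 -> 97.4) with triggers clustered just below:
price, n_sold = cascade(97.4, [97.5, 97.0, 96.5, 96.0, 95.5], 0.01)
print(f"final price {price:.2f} after {n_sold} forced sales")
```

With these (invented) parameters, a sub-1% shock cascades into every fund selling and a several-percent decline -- the same shape as the flash-crash scenario described above.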

\- In practice, asset class/trading strategy correlations tend to change
dramatically over time. Nowadays we talk about the market being in 'risk-on'
or 'risk-off' mode, since we're used to seeing market-wide selloffs and
buy-ins. Your ML model can only do so much if the entire market is selling
everything, unless you're going to dip your toes into derivatives or short
selling, to which I say: good luck! :)

\- A major, major issue is execution. You can have the greatest and most
accurate model in the world, but actually trading it is another beast
entirely. Bid/ask spreads and the market impact of your own trading,
particularly if you are dealing with a non-trivial amount of money or trading
securities with less-than-ideal liquidity, will usually eat up any alpha you
might have. In my own backtests of fairly straightforward trading algorithms,
even a minor 0.05% bid-ask spread on a weekly trading algorithm can eat your
lunch, never mind if you're planning on doing intra-day trading or trading
anything other than the most popular funds/stocks.
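
A back-of-envelope version of that cost (my arithmetic, not the commenter's backtest; the per-trade cost and trade frequency are assumptions): a 0.05% round-trip cost paid weekly compounds into a meaningful annual drag.

```python
# How a "minor" per-trade cost compounds over a year of weekly trading.

spread_cost = 0.0005    # 0.05% cost per round-trip trade (assumed)
trades_per_year = 52    # weekly rebalancing

annual_drag = 1 - (1 - spread_cost) ** trades_per_year
print(f"annual return lost to the spread: {annual_drag:.2%}")
```

That's roughly 2.6% of return gone before the model has predicted anything -- more than many strategies' entire edge, and it scales linearly in log terms with trade frequency.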

\- Beyond any of these risks, you're inevitably going to have to suffer
through downdrafts. I don't know about you, but watching my money disappear
while it's being controlled by a trading algorithm/model that is subject to
all kinds of mistakes and bugs is well beyond my own intestinal fortitude.

~~~
ChuckMcM
_"Most ML algorithms perform well when there is some underlying phenomenon
whose characteristics are being statistically inferred by the model. However,
the stock market is now essentially a large network of computers trying to
model one another."_

This is one of the insights that I think deserves more ink than it gets.
Historically, systems analysis has focused predominantly on linear systems,
where underlying factors that were unseen but capable of being modeled as
FSMs combined into emergent behaviors which were more complex but ultimately
still linear across the region of analysis.

That sort of analysis falls down, however, when the modeling system is a part
of the system it is modeling. The result can easily become non-linear (or
turbulent or chaotic, depending on when you were introduced to systems
analysis :-)) and the calculus for those conditions is a lot harder to tease
out.

The feedback algorithms become signals to other feedback algorithms, and you
get hard-to-predict changes which don't track the measured data; they track
the response to the measured data, which is itself changing in response to
the response.
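
A minimal sketch of that feedback point (my own toy model, not anything from the comment): trend-followers that trade on each other's price impact form a simple recurrence, and once their combined gain exceeds a threshold the system amplifies its own signal instead of tracking anything fundamental.

```python
# Two trend-followers each buy/sell proportionally to the last price
# move; their combined orders move the price, which becomes the next
# signal. Price follows p[t+1] = p[t] + 2*gain*(p[t] - p[t-1]).

def simulate(gain, steps=50):
    prices = [100.0, 100.5]  # small initial perturbation
    for _ in range(steps):
        momentum = prices[-1] - prices[-2]
        prices.append(prices[-1] + 2 * gain * momentum)
    return prices

stable = simulate(gain=0.3)    # combined gain < 1: perturbation dies out
unstable = simulate(gain=0.7)  # combined gain > 1: perturbation explodes
```

With gain 0.3 the perturbation decays and the price settles; with gain 0.7 the same half-point perturbation blows up -- the instability comes entirely from the responders reacting to each other, not from any external data.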

From a practical viewpoint, my own interest is in discovering entities that
use ML techniques to game search engine rank, which is itself derived by ML
algorithms.

~~~
gfodor
Indeed -- it opens up a (scary) need for models to be "self-aware" of
competitors' models and their effects on the training data. Above my pay
grade, for sure.

------
ramana10
I work in a quant fund. Can I just say that the longer I spend in the
industry, the more I find that ML/AI techniques are (in general) useless. It
almost seems like collective self-delusion: building ever more complex
systems that use the latest fad in the field without first appreciating that
your dataset is almost always just noise...
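
The "your dataset is just noise" failure mode is easy to reproduce (illustrative sketch, not anything from this fund): regress pure-noise "returns" on a couple dozen lagged features and the in-sample fit looks real, but it evaporates on fresh noise.

```python
# Fitting noise: OLS on 20 random features vs. 60 random targets.
import numpy as np

rng = np.random.default_rng(0)

def r_squared(X, y, beta):
    """Uncentered R^2 = 1 - ||residual||^2 / ||y||^2."""
    resid = y - X @ beta
    return 1 - resid @ resid / (y @ y)

n, k = 60, 20  # few observations, many features: overfitting territory
X_train, y_train = rng.standard_normal((n, k)), rng.standard_normal(n)
X_test,  y_test  = rng.standard_normal((n, k)), rng.standard_normal(n)

# Least-squares fit on the training noise.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

is_r2 = r_squared(X_train, y_train, beta)
oos_r2 = r_squared(X_test, y_test, beta)
print(f"in-sample R^2:     {is_r2:.3f}")
print(f"out-of-sample R^2: {oos_r2:.3f}")
```

The in-sample R^2 is substantial purely by chance (roughly k/n here), while the out-of-sample fit is worthless -- exactly the gap a backtest on historical data can hide.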

Me, I just do my boss's bidding like a good little soldier and code up
whatever the fund wants. My personal portfolio is a vanilla asset-allocation
model. Guess which one has done better for the past 2 years?

------
tzs
There was a really nice example of a mistake applying ML to currency exchange
rate forecasting given in the "Data Snooping" section of Yaser Abu-Mostafa's
"Learning from Data" course at Caltech
(<http://work.caltech.edu/telecourse.html>).

Here's the relevant lecture: <http://work.caltech.edu/library/173.html> (the
currency exchange example starts around 51:38).
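
The snooping pitfall in that lecture is of the form sketched below (the code and numbers are mine, not the slides'): normalizing a series with statistics computed over the *whole* history before splitting leaks test-period information into training; the fix is to compute the statistics on the training split only.

```python
# Data snooping via normalization: wrong vs. right.
import numpy as np

rng = np.random.default_rng(1)
returns = rng.standard_normal(1000)   # stand-in for daily FX returns
split = 750                           # train on first 750, test on rest

# Snooped: mean/std are computed over the full series, so the
# training data has already "seen" the test period's statistics.
snooped = (returns - returns.mean()) / returns.std()

# Clean: statistics come from the training period only.
mu, sigma = returns[:split].mean(), returns[:split].std()
clean = (returns - mu) / sigma

train_snooped, test_snooped = snooped[:split], snooped[split:]
train_clean,  test_clean  = clean[:split],  clean[split:]
```

The two normalizations differ only slightly, yet in the lecture's experiment that sliver of leaked information was enough to make a worthless trading rule look profitable in backtest.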

------
dave_sullivan
Re: the insufficient data sets problem, I've been surprised how hard it is to
find large quantities of historical financial data. Decent sources of the
data seem very expensive, and I'm surprised there isn't a startup out there
tackling this well, since I'd be willing to pay a decent amount of money for
minute-by-minute or hour-by-hour tick data--but I'm finding most subscription
prices crazy. Is this data really as hard to come by as it seems?

~~~
jayro
Here are two relatively inexpensive options for tick-by-tick market data:

    
    
       * IQFeed
       http://www.iqfeed.net/dev/
    
       * Nanex
       http://www.nanex.net/historical.html
    

The Nanex data is more expensive and harder to work with, so you might want to
start with the IQFeed data.

------
hooande
This could be renamed "Common Machine Learning Mistakes". Great advice all
around, of course. Insufficient data, lack of or incorrect data prep, and
poorly defined success criteria are problems that plague all forms of machine
learning.

As other commenters pointed out, the biggest problem with stock market
modeling is trade execution and automated trading. A skilled ML
practitioner will know how to deal with the size of the data sets and the
normalization of the data. Placing the trades properly and taking into account
the myriad sources of information available will give even experts a hard
time.

------
alexholehouse
Nice, accessible paper, but I find it galling that the images are of such low
quality. This paper is by no means alone in suffering from poor image quality
- I was reading a Nature paper earlier today where the graphics had
compression artifacts.

------
raverbashing
Pet peeve: reading two-column pdfs on screen is awful!

The article is nice. When using machine learning, people sometimes forget how
it works and what its limitations are, and sometimes a solution that better
fits a certain range is not the best solution overall.

~~~
acqq
On typical screens made only to display movies, sure. On the 10-inch retina
iPad, the article page fits the screen perfectly in portrait mode and
everything is perfectly readable. If there's a perfect use case for that
10-inch device, it's this.

~~~
raverbashing
Actually the biggest problem is going from the bottom of the 1st column to
the top of the 2nd column (and even on paper this can be bothersome
sometimes, well, it's their standard)

If the page fits entirely on the iPad screen, great!

