
Twitter mood predicts the stock market. - jontonsoup
http://arxiv.org/pdf/1010.3003v1
======
hamner
Take any paper titled along the lines of "X predicts the stock market" with a
big grain of salt. In this paper's case, the data analyzed is limited to under
one year, and we never know how many parameterizations or models the authors
looked at to get the correlations they published.

[http://nerdsonwallstreet.typepad.com/my_weblog/files/datamin...](http://nerdsonwallstreet.typepad.com/my_weblog/files/dataminejune_2000.pdf)

~~~
bermanoid
I've done a lot of work in this area for clients who come to me looking for
help automating their trading strategies. One of the first things I do before
I agree to do any work is give them a rather long lecture on the dangers of
overfitting; the main point of the lecture is that when you're setting up a
strategy, it's not enough just to test if the strategy (usually some sort of
model with a few free parameters that have to be fit based on historical data)
would have worked if you used it. You really need to make sure that your
strategy for coming up with the strategy (meta-strategy) would have resulted
in a working strategy if you'd used it, and ideally the meta-strategy should
be parameter-free.

This comes up a lot with moving average strategies (and most other technical
indicator strategies, for that matter): someone looks at the past year's worth
of data, decides that a 30-day EMA crossing a 5-day EMA is a great signal.
That's fine, but suppose you'd looked at a different year's data...would you
still have picked 30 and 5 day EMAs, or some other number?
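The instability is easy to demonstrate with a toy sketch (Python, synthetic random-walk prices, and an arbitrary small parameter grid of my choosing): grid-search the best (fast, slow) EMA pair on two different "years" of data and see whether the winner is the same.

```python
import random

def ema(prices, n):
    """Exponential moving average with smoothing factor 2/(n+1)."""
    alpha = 2.0 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def crossover_pnl(prices, fast, slow):
    """P&L of going long one unit when the fast EMA is above the slow EMA."""
    f, s = ema(prices, fast), ema(prices, slow)
    pnl = 0.0
    for t in range(1, len(prices)):
        if f[t - 1] > s[t - 1]:          # signal comes from the prior bar
            pnl += prices[t] - prices[t - 1]
    return pnl

def best_pair(prices):
    """Grid-search the (fast, slow) pair with the best in-sample P&L."""
    return max(((f, s) for f in range(2, 11) for s in range(15, 41, 5)),
               key=lambda fs: crossover_pnl(prices, *fs))

# Two independent "years" of random-walk prices: the winning pair
# generally differs, even though there is no structure to find at all.
random.seed(0)
year1, year2 = [100.0], [100.0]
for _ in range(252):
    year1.append(year1[-1] + random.gauss(0, 1))
    year2.append(year2[-1] + random.gauss(0, 1))

print(best_pair(year1), best_pair(year2))
```

A pair that "wins" on pure noise is the baseline you have to beat before believing any fitted crossover parameters.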

But that's not enough. Because there are other parameters that you're probably
dicking around with, and those can get you, too: you've found that setting the
EMA lengths based on the past 145 days of data consistently gives you good
predictive power over the next month, which is great, but now you've just
shifted the burden. You've still got to verify that the logic that made
you pick "145" would give a good window size if you'd looked at a different
time period.
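Testing the meta-strategy rather than the strategy amounts to a walk-forward loop: refit on each trailing window, then trade the next stretch out-of-sample, so what you score is the fitting *procedure* itself. A minimal sketch (the 145-day window and one-month horizon are just the illustrative numbers from above; the moving-average rule and parameter range are placeholders):

```python
import random

def sma(prices, n, t):
    """Simple moving average of the n prices ending at index t."""
    return sum(prices[t - n + 1:t + 1]) / n

def fit_length(prices, lo, hi):
    """Pick the MA length in [lo, hi] with the best in-sample trend P&L."""
    def pnl(n):
        total = 0.0
        for t in range(n, len(prices) - 1):
            if prices[t] > sma(prices, n, t):   # price above its MA: long
                total += prices[t + 1] - prices[t]
        return total
    return max(range(lo, hi + 1), key=pnl)

def walk_forward(prices, window, horizon):
    """Refit on each `window`-day slice, then trade the next `horizon`
    days. The sum is what the fitting procedure would actually earn."""
    oos, t = 0.0, window
    while t + horizon < len(prices):
        n = fit_length(prices[t - window:t], 5, 30)
        for u in range(t, t + horizon):
            if prices[u] > sma(prices, n, u):
                oos += prices[u + 1] - prices[u]
        t += horizon
    return oos

# On a random walk the out-of-sample P&L should hover around zero,
# no matter how good the in-sample fits looked.
random.seed(1)
prices = [100.0]
for _ in range(600):
    prices.append(prices[-1] + random.gauss(0, 1))
print(walk_forward(prices, 145, 21))
```

Note that `window` itself is still a parameter, which is exactly the regress described above.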

And so on. Go far enough, and you'll usually discover that you're optimizing a
higher-order window to achieve near-zero profitability on average, you've run
out of data, and your strategy is useless (don't even get me started on how
deep into the meta-meta...meta-window rabbit hole you should go before giving
up if you actually have a ton of data, like second by second or tick data...).
Eventually you'll stop trying to beat the market with moving averages...this
applies to most other technical indicators, as well.

What I find really helpful is to show people 2D heat map movies of the
performance of moving average strategies, both applied to real stock data and
to randomized data (I would post a link to one of these, but I can't find them
at the moment...I'll have to regenerate them, which takes quite a while, then
maybe I'll do an article on this, the pictures are actually quite pretty).
When they see the whole "landscape", it usually dawns on them that their
strategy creates qualitatively similar pictures when applied to randomized
data, and that means that there's nothing there that's at all worth following
up.
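The flavor of that randomized-data check is easy to sketch even without the heat maps: take the best in-sample P&L over the whole parameter grid, then do the same on return-shuffled copies of the series and compare. (Synthetic stand-in prices and a toy grid here; the point is the comparison, not the numbers.)

```python
import random

def ema(prices, n):
    alpha = 2.0 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def best_insample_pnl(prices):
    """Best long-only P&L over a small grid of EMA-crossover parameters."""
    best = float("-inf")
    for fast in range(2, 11):
        for slow in range(15, 41, 5):
            f, s = ema(prices, fast), ema(prices, slow)
            pnl = sum(prices[t] - prices[t - 1]
                      for t in range(1, len(prices)) if f[t - 1] > s[t - 1])
            best = max(best, pnl)
    return best

def shuffled_copy(prices):
    """Same marginal returns, all temporal structure destroyed."""
    rets = [prices[t] - prices[t - 1] for t in range(1, len(prices))]
    random.shuffle(rets)
    out = [prices[0]]
    for r in rets:
        out.append(out[-1] + r)
    return out

random.seed(2)
real = [100.0]
for _ in range(252):
    real.append(real[-1] + random.gauss(0, 1))

print("best on series:  ", best_insample_pnl(real))
print("best on shuffles:", sum(best_insample_pnl(shuffled_copy(real))
                               for _ in range(20)) / 20)
```

If the best-of-grid number on the real series sits comfortably inside the distribution you get from shuffles, the "landscape" is qualitatively the same and there's nothing worth following up.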

In the few cases where there _is_ something going on, it's usually pretty
simple to verify without any parameter hanky panky, because it shows up
consistently regardless of your windowing (these effects are usually not
tradable - for instance, if you look at very short term sales data it's easy
to see short MA strategies that work, and almost all of them do, but you
quickly realize you're merely observing the bid/ask bounce, which is not
something you're likely to be able to capture).

------
rohwer
At the moment, it's being put to the test:

British hedge fund invests 25 million pounds for IU professor’s Twitter
research <http://www.idsnews.com/news/story.aspx?id=80469>

------
oliakaoil
This was already posted a couple of months ago, wasn't it?

