Twitter mood predicts the stock market. (arxiv.org)
8 points by jontonsoup on May 13, 2011 | 4 comments



Take any paper titled along the lines of "X predicts the stock market" with a big grain of salt. In this paper's case, the data analyzed covers less than a year, and we have no way of knowing how many parameterizations or models the authors tried before arriving at the correlations they published.

http://nerdsonwallstreet.typepad.com/my_weblog/files/datamin...


I've done a lot of work in this area for clients who come to me looking for help automating their trading strategies. One of the first things I do before I agree to do any work is give them a rather long lecture on the dangers of overfitting. The main point is that when you're setting up a strategy (usually some sort of model with a few free parameters fit to historical data), it's not enough to test whether the strategy would have worked had you used it. You also need to make sure that your strategy for coming up with the strategy (the meta-strategy) would have produced a working strategy had you used it, and ideally the meta-strategy should be parameter-free.

This comes up a lot with moving-average strategies (and most other technical-indicator strategies, for that matter): someone looks at the past year's worth of data and decides that a 30-day EMA crossing a 5-day EMA is a great signal. That's fine, but suppose you'd looked at a different year's data: would you still have picked 30- and 5-day EMAs, or some other numbers?
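To make that concrete, here's a rough sketch of the kind of crossover backtest I'm describing (illustrative only: the 5/30 windows, the simulated random-walk prices, and the brute-force grid search are placeholders, not anyone's actual strategy):

    import numpy as np
    import pandas as pd

    def ema_crossover_pnl(prices: pd.Series, fast: int = 5, slow: int = 30) -> float:
        """Long when the fast EMA is above the slow EMA, flat otherwise."""
        fast_ema = prices.ewm(span=fast, adjust=False).mean()
        slow_ema = prices.ewm(span=slow, adjust=False).mean()
        position = (fast_ema > slow_ema).astype(float).shift(1).fillna(0.0)  # trade on the next bar
        returns = prices.pct_change().fillna(0.0)
        return float((position * returns).sum())

    # "Fit" the window lengths on one year of (here, simulated) daily prices...
    rng = np.random.default_rng(0)
    prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 252))))
    best = max(((f, s) for f in range(2, 20) for s in range(21, 60)),
               key=lambda fs: ema_crossover_pnl(prices, *fs))
    print("best (fast, slow) on this sample:", best)
    # ...rerun the same search on a different year and you'll usually get a different "best" pair.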

But that's not enough, because there are other parameters that you're probably dicking around with, and those can get you, too: you've found that setting the EMA lengths based on the past 145 days of data consistently gives you good predictive power over the next month, which is great, but now you've just shifted the burden. You still have to verify that the logic that made you pick "145" would give a good window size if you'd looked at a different time period.
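In code, testing the meta-strategy looks something like a walk-forward loop: re-fit the EMA windows on a trailing lookback, trade the winner over the next month, and see whether that procedure makes money out of sample. A sketch, reusing ema_crossover_pnl and prices from above (the 145-day lookback and 21-day holding period are just the hypothetical numbers from the example):

    def walk_forward_pnl(prices: pd.Series, lookback: int = 145, hold: int = 21) -> float:
        """Re-fit the EMA windows on the trailing `lookback` days, trade them for `hold` days, repeat."""
        total = 0.0
        for t in range(lookback, len(prices) - hold, hold):
            fit = prices.iloc[t - lookback:t]
            fast, slow = max(((f, s) for f in range(2, 20) for s in range(21, 60)),
                             key=lambda fs: ema_crossover_pnl(fit, *fs))
            total += ema_crossover_pnl(prices.iloc[t:t + hold], fast, slow)
        return total

    print("out-of-sample P&L of the refitting procedure:", walk_forward_pnl(prices))

And then the same question just moves up a level: would a different stretch of history have led you to pick a 145-day lookback in the first place?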

And so on. Go far enough, and you'll usually discover that you're optimizing a higher-order window to achieve near-zero profitability on average, you've run out of data, and your strategy is useless (don't even get me started on how deep into the meta-meta...meta-window rabbit hole you should go before giving up if you actually have a ton of data, like second-by-second or tick data...). Eventually you'll stop trying to beat the market with moving averages, and this applies to most other technical indicators as well.

What I find really helpful is to show people 2D heat-map movies of the performance of moving-average strategies, applied both to real stock data and to randomized data. (I would post a link to one of these, but I can't find them at the moment; I'll have to regenerate them, which takes quite a while, and then maybe I'll do an article on this. The pictures are actually quite pretty.) When they see the whole "landscape", it usually dawns on them that their strategy produces qualitatively similar pictures when applied to randomized data, and that means there's nothing there that's worth following up.
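Roughly, one frame of such a movie is just a grid of strategy P&L over the (fast, slow) window pairs, computed once on the real series and once on a randomized (shuffled-returns) version of it. A sketch, again reusing the helpers above:

    import matplotlib.pyplot as plt

    def pnl_grid(prices, fasts=range(2, 20), slows=range(21, 60)):
        return np.array([[ema_crossover_pnl(prices, f, s) for s in slows] for f in fasts])

    # Randomize by shuffling the log returns and rebuilding a price path.
    log_returns = np.log(prices).diff().dropna()
    shuffled = pd.Series(prices.iloc[0] * np.exp(np.cumsum(rng.permutation(log_returns.values))))

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, (label, series) in zip(axes, [("real", prices), ("shuffled", shuffled)]):
        ax.imshow(pnl_grid(series), aspect="auto", origin="lower")
        ax.set(title=label, xlabel="slow EMA window", ylabel="fast EMA window")
    plt.show()

If the "real" panel doesn't look meaningfully different from the "shuffled" one, the hot spots you found are noise.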

In the few cases where there is something going on, it's usually pretty simple to verify without any parameter hanky-panky, because it shows up consistently regardless of your windowing. These effects are usually not tradable: for instance, if you look at very short-term sales data it's easy to find short-MA strategies that appear to work (almost all of them do), but you quickly realize you're merely observing the bid/ask bounce, which is not something you're likely to be able to capture.
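If anyone wants to see the bounce effect for themselves, here's a standalone toy illustration (the prices and spread are made up): a dead-flat mid price traded alternately at the bid and the ask looks mean-reverting, so a contrarian short-MA rule appears to print money, even though every apparent gain is exactly the spread you'd have to pay to trade it:

    import numpy as np
    import pandas as pd

    def mean_reversion_pnl(prices: pd.Series, window: int = 3) -> float:
        """Go long when the last trade is below its short moving average, short when above."""
        ma = prices.rolling(window).mean()
        position = np.sign(ma - prices).shift(1).fillna(0.0)
        returns = prices.pct_change().fillna(0.0)
        return float((position * returns).sum())

    mid, half_spread = 100.0, 0.05
    bounce = pd.Series([mid + half_spread * (-1) ** i for i in range(500)])  # pure bid/ask bounce
    print("apparent profit on a dead-flat mid price:", mean_reversion_pnl(bounce))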


At the moment, it's being put to the test:

British hedge fund invests 25 million pounds for IU professor’s Twitter research http://www.idsnews.com/news/story.aspx?id=80469


This was already posted a couple of months ago, wasn't it?



