

Twitter Can Predict The Stock Market - pmorici
http://www.wired.com/wiredscience/2010/10/twitter-crystal-ball/

======
3pt14159
Why publish this if it works? One article, a fleeting bit of fame, and a
truckload of copycats. It would be more impressive if the article had read:
"Researchers, upon discovering tweets predict the stock market, make $100mm
before disclosing research to the public." No more need for university grants,
and much more believable findings. As an aside, data analysis can be
tricky. I'm pretty wary of loosely defined research objectives. For example,
why is it 3 days? Why is it those 72 words? Overfitting is a real problem
with prediction-based stuff.

~~~
futuremint
I've always been fascinated by the romantic idea of writing my own trading
engine.

So I did some research, and most people who have written them will tell you
that in cases like this, training on past data doesn't correlate well with
current & future data.

The stock market of 2011 is not the market of 2008.

But what do I know, not like I've actually done it :)

~~~
bobds
A good place to start: <http://www.collective2.com/>

You can rent your trading strategies to others, or rent someone else's
strategy.

Also some good info/tools regarding automation.

~~~
yummyfajitas
I haven't looked closely at this site, but it seems like it is almost certain
to devolve into a textbook example of adverse selection.

If you have a good strategy, you won't rent it out, you'll trade it. Why risk
others frontrunning you? Of course, you might post a historically good
strategy and frontrun it. Or they might just be risky strategies, which look
good for a short time (encouraging people to rent them), but which carry
catastrophic risks the creator doesn't want to take on.

I can't see a single reason why someone would post a good strategy here.

~~~
bobds
(Preface: I pretty much know nothing about stock markets and trading.)

Isn't there a whole industry that revolves around paying people to provide you
with their trading strategies? How is that considerably different than what's
happening on Collective2?

Having a good strategy doesn't mean you have the money to actually trade on
it. Or that you can slowly build up your trading bankroll using the same
strategy.

Then there are strategies that only yield modest returns. Why not make some
money on top of that by renting them out? If you let a dozen people use your
strategy, does acting on that information give you much of an advantage? I
would guess that depends on how much money those people are trading on your
strategy.

I would also guess there aren't many big players renting strategies on
Collective2. It's an interesting concept, and I think the fact that they've
been active since 2003 somewhat validates the idea.

The best thing about it seems to be the ease of using an automated trading
agent. I don't know how easy it is to do that elsewhere, but one reason to put
your strategy on Collective2 (I'm guessing you can keep it private) would be
to use their automation facilities.

~~~
yummyfajitas
There are several whole industries revolving around paying people for help
with trading strategies. Most have very specialized economics.

A hedge fund requires capital to operate, and the owners can't necessarily
cover fixed costs (salaries, etc) with their own personal capital. I'd be
surprised if many of the strategies on Collective2 fit this model.

Investment advisers often fine-tune a strategy to match your personal risks,
e.g., helping Southwest Airlines hedge its exposure to fuel prices, or Apple
hedge its exposure to the RMB. Since Southwest is already short oil due to
being an airline, the trading strategy of going long evens them out. It
wouldn't make sense for me to trade this strategy, since I don't have an
intrinsic short position in oil (plus the alpha in Southwest's strategy comes
from selling flights, not oil).

 _If you let a dozen people use your strategy, does acting on that information
give you much of an advantage?_

Buy $10k of some low volume stock. Have a few other people pile on and buy the
same stock (after you). The price will go up a few cents. Then you sell,
probably to the same people buying from you. This is called frontrunning. If
you didn't frontrun, you bear the risk that one of your renters would buy the
shares before you do, thereby driving up the price before you purchase it.
Less of an issue with GOOG, admittedly.
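The frontrunning arithmetic above can be made concrete with a toy model. This is a minimal sketch, assuming a linear, permanent price impact (every share traded moves the price by a fixed, hypothetical `impact`, with fills at the midpoint of the move) and that the renters finish buying before you sell; `frontrun_profit` and all inputs are made up for illustration. Under those assumptions the profit works out to impact × your shares × the renters' shares, so it shrinks toward nothing for a stock as liquid as GOOG.

```python
def frontrun_profit(p0, impact, my_shares, follower_shares):
    """Toy frontrunning P&L under a hypothetical linear, permanent price impact.

    Trading q shares moves the price by impact * q, filling at the midpoint of that move.
    """
    buy_avg = p0 + impact * my_shares / 2      # I buy first
    price = p0 + impact * my_shares
    price += impact * follower_shares          # renters pile in after me
    sell_avg = price - impact * my_shares / 2  # my sale pushes the price back down
    return my_shares * (sell_avg - buy_avg)

# Hypothetical: $10k of a $10 stock; renters buy 5,000 shares; each share moves the
# price by $0.00001, so the whole episode moves it a few cents.
print(frontrun_profit(p0=10.0, impact=1e-5, my_shares=1000, follower_shares=5000))
```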

------
charlief
I have seen the study a few times, most recently in
<http://news.ycombinator.com/item?id=1803505>. I think the big problem is
selection bias:

The Dow Jones Industrial Average over the last 10 years:

[http://www.google.com/finance?chddm=997050&q=INDEXDJX:.D...](http://www.google.com/finance?chddm=997050&q=INDEXDJX:.DJI)

* Notice that the end of 2008 was unusual for the index. 2008 had the most herded and fearful stock market in recent history. If the stock market was ever correlated with mood, it would have been then. I am not sure a 2008 analysis can be generalized to any year but 2008.

* They have not done an analysis on 2009 or 2010, and they chose to split the analysis and pick December 2008 based on a qualitative judgment: the "stabilization of DJIA values after considerable volatility in previous months and the absence of any unusual or significant sociocultural events". December 2008 was still very much in the midst of the crisis.

* For their December "stable" data set, they only used 30 days, which is a small sample. There is a much bigger pool to draw from since 2009, as the market has been relatively stable.
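To see why 30 days is thin, one can ask how often blind guessing would match a given number of daily up/down calls. The sketch below uses an exact binomial tail; it assumes a fair-coin baseline and independent days, which is generous to the model, since streaky or trending markets (like late 2008) let naive rules beat 50%. `chance_of_at_least` is a hypothetical helper, and the 73% figure is simply the one quoted elsewhere in this thread.

```python
from math import comb

def chance_of_at_least(hits, days, p=0.5):
    """P(X >= hits) for X ~ Binomial(days, p): how often blind guessing
    gets at least `hits` of `days` daily up/down calls right."""
    return sum(comb(days, i) * p**i * (1 - p) ** (days - i)
               for i in range(hits, days + 1))

# Same accuracy, different sample sizes: ~30 days vs. roughly a year of trading days.
for days in (30, 250):
    hits = round(0.73 * days)
    print(days, hits, chance_of_at_least(hits, days))
```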

~~~
achompas
Right, upvoted. I read the paper and the authors are very particular in their
sample selection. How could someone choose 2008 as a sample? I'd be much more
impressed if they used a larger sample.

Also, some food for thought: it would be interesting to see someone testing
Twitter moods as an instrumental variable for a project.

------
Judson
I may be wrong, but an initial success rate of 73.3% _before_ adding the
emotional data seems like overfitting.

~~~
wildwood
Depending on how they defined success, 73% might be achievable just by looking
at co-correlation. Up days and down days tend to run in streaks. If they
defined success as predicting 'up' or 'down' for the day for the DJIA, just
going with the most popular result for the last N days could work.

But overfitting is definitely still a concern. Looking at the overall trend
for the Dow Jones in 2008, I wonder what the success rate of an indicator that
always said 'down' would be.
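Both naive baselines wildwood describes are easy to score against a series of daily directions. This is a minimal sketch, assuming directions are encoded as +1 (up) and -1 (down); `always_down_accuracy`, `streak_accuracy`, and the sample series are hypothetical, and a real check would use actual daily DJIA moves for the paper's test window.

```python
def always_down_accuracy(directions):
    """Share of days a constant 'down' call gets right; directions are +1 (up) / -1 (down)."""
    return sum(1 for d in directions if d < 0) / len(directions)

def streak_accuracy(directions, n=3):
    """Call each day's direction as the majority of the previous n days (odd n avoids ties)."""
    hits = 0
    for t in range(n, len(directions)):
        guess = 1 if sum(directions[t - n:t]) > 0 else -1
        hits += guess == directions[t]
    return hits / (len(directions) - n)

# Hypothetical direction series, purely illustrative.
days = [1, 1, 1, 1, -1, -1, -1, -1]
print(always_down_accuracy(days), streak_accuracy(days))
```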

~~~
beagle3
> 73% might be achievable just by looking at co-correlation. Up days and down
> days tend to run in streaks.

If that were true, there would be an exceptionally easy way to make money: Buy
a future or option today based on yesterday's move. Leverage ad infinitum.

Random coin flips also tend to run in streaks, btw - in a few thousand throws,
you'll probably have several 10 "head" streaks and several 10 "tail" streaks.
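Counting streaks in simulated flips is a quick way to calibrate intuition here. For a fair coin, a maximal run of at least k heads starts at a given position with probability of roughly 1/2^(k+1), so over n flips you expect about n/2^(k+1) such runs of each face; the count grows with the number of throws. The sketch below is illustrative only; `long_streaks` and the seed are arbitrary choices.

```python
import random
from itertools import groupby

def long_streaks(flips, k=10):
    """Count maximal runs of identical outcomes of length >= k, per outcome."""
    counts = {}
    for face, run in groupby(flips):
        if sum(1 for _ in run) >= k:
            counts[face] = counts.get(face, 0) + 1
    return counts

rng = random.Random(0)  # fixed seed so the run is repeatable
flips = [rng.choice("HT") for _ in range(5000)]
print(long_streaks(flips, k=10))
```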

~~~
wildwood
_there would be an exceptionally easy way to make money: Buy a future or
option today based on yesterday's move. Leverage ad infinitum._

This kind of co-correlation and general market direction is already baked into
the option and futures prices. Also, they don't tend to fluctuate as much from
day to day, since their prices reflect what the value will be on the contract
delivery date, not what the price will be tomorrow. I'm not clear on what
profit opportunity you're seeing.

 _Random coin flips also tend to run in streaks, btw - in a few thousand
throws, you'll probably have several 10 "head" streaks and several 10 "tail"
streaks._

And in a flat market, that's often the behavior you see. When the market
starts trending, though, the coin starts acting 'rigged', and streaks in the
prevailing market direction tend to become longer.

~~~
beagle3
I do not know what this "co-correlation" that you speak of is, and Google
doesn't seem to either. Assuming you are speaking about day-to-day
correlation:

I don't know what futures you were thinking of, but financial futures (single
stock, index futures, currency futures) track the base value EXACTLY (but also
taking into account interest rates, dividends, etc). If this weren't the case,
there would be an immediate arbitrage opportunity.

Specifically, once you factor the interest rate out, the DJIA future and the
DJIA index are in sync within seconds. The HFT traders take care of that.
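"Factoring the interest rate out" refers to the textbook cost-of-carry relation between a financial future and its underlying. The sketch below assumes continuous compounding and a constant dividend yield, and ignores contract details such as the index multiplier and settlement rules; `fair_futures_price` and the inputs are hypothetical.

```python
import math

def fair_futures_price(spot, rate, div_yield, years):
    """Cost-of-carry fair value, continuous compounding: F = S * exp((r - q) * T).

    spot: index level; rate: risk-free rate; div_yield: index dividend yield;
    years: time to delivery. Rates are annualized decimals (0.02 means 2%).
    """
    return spot * math.exp((rate - div_yield) * years)

# Hypothetical inputs: index at 11,000, 1% rate, 2.5% dividend yield, 3 months out.
f = fair_futures_price(11_000, 0.01, 0.025, 0.25)
print(round(f, 2))
# If the future traded well above f, selling it and buying the index would lock in the
# gap (and the reverse below f); that arbitrage is what keeps the two in line.
```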

And while it is true that the market does trend occasionally (more than a coin
flip), timing the start and end of the trade is empirically very hard.

If you know the market is trending, why don't you buy a future betting on the
trending direction, with a stop at 2 ticks above and below your entry price?
If the market is trending, you have positive winning expectation.

Except the market doesn't work that way - and if you think the market is
trending when it isn't, you lose money with this scheme.
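The bracket scheme above can be analysed as a gambler's ruin problem. Assuming the price moves one tick at a time, up with constant probability p, and that the stop and target both fill exactly k ticks away, the chance of hitting the target first is 1 / (1 + ((1 - p) / p)^k). With p = 1/2 (no real trend) the gross expectation is zero and any commission makes it negative; it turns positive only if the trend is real. The function names and the cost figure are hypothetical.

```python
def bracket_win_probability(p_up, ticks=2):
    """Chance a +/-1-tick random walk hits +ticks before -ticks (gambler's ruin), 0 < p_up < 1."""
    if p_up == 0.5:
        return 0.5
    r = (1 - p_up) / p_up
    return 1 / (1 + r ** ticks)

def expected_ticks(p_up, ticks=2, cost_ticks=0.0):
    """Expected P&L in ticks of a long entry with stop and target `ticks` away, less costs."""
    win = bracket_win_probability(p_up, ticks)
    return ticks * (2 * win - 1) - cost_ticks

# Hypothetical round-trip cost of half a tick.
for p_up in (0.5, 0.55, 0.6):
    print(p_up, expected_ticks(p_up), expected_ticks(p_up, cost_ticks=0.5))
```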

------
iwwr
The trouble with these financial models is that once they become common
knowledge, it's too late. The market absorbs these algorithms into its pricing
mechanism, leaving no further arbitrage profits.

------
seb
So can Google. Supposedly Sergey even suggested starting a hedge fund, but it
would probably be insider trading if they based their decisions on
the user data.

~~~
daeken
How could it be insider trading, if they're not doing anything with GOOG?

~~~
jimmyk
Companies that have non-public information about companies that they trade
with are barred from using that information to trade stocks. For example if
company B which manufactures bullets orders gunpowder from company G, and
company B starts ordering a lot more gunpowder, employees from company G can't
buy more stock of company B due to that knowledge. It seems like Google could
fall into the category of company G since it has non-public search terms from
other companies.

------
vannevar
I would need to hear more to be convinced. The fact that they had a large
number of signals they were tracking, without a clear rationale for any one of
them, is troubling.

Consider a set of random signals; arbitrarily select one as the benchmark.
Then, from among the rest, take the signal that best predicts the daily
direction of the benchmark. That signal will likely have much better than 50%
accuracy, because no signal is effectively worse than about 50% accurate (one
that is right less often than that is equally useful with its predictions inverted).
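vannevar's thought experiment is easy to run. The sketch below draws a random up/down benchmark and many random candidate signals, then reports the best in-sample accuracy, counting a mostly-wrong signal by its inverted accuracy. Nothing here predicts anything, yet the best score climbs as more candidates are tried. `best_signal_accuracy` and its defaults are hypothetical.

```python
import random

def best_signal_accuracy(n_signals=100, n_days=60, seed=0):
    """Best in-sample 'accuracy' of n_signals random +/-1 series against a random benchmark."""
    rng = random.Random(seed)
    benchmark = [rng.choice((1, -1)) for _ in range(n_days)]
    best = 0.0
    for _ in range(n_signals):
        signal = [rng.choice((1, -1)) for _ in range(n_days)]
        acc = sum(s == b for s, b in zip(signal, benchmark)) / n_days
        best = max(best, acc, 1 - acc)  # a mostly-wrong signal is as useful inverted
    return best

for n in (1, 10, 100, 1000):
    print(n, best_signal_accuracy(n_signals=n))
```

With a fixed seed the benchmark and the first signals are identical across runs, so trying more signals can only raise the best score.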

------
parkerboundy
There is a typo in the article: the software they are using is called
OpinionFinder and can be found here
<http://www.cs.pitt.edu/mpqa/opinionfinderrelease/>

------
Dn_Ab
I know very little about trading, but even I can see a whole bunch of red flags
here. First, if it has just made it to the news, then it's probably a decade
too late to take advantage of. Recently there was a news article about how
firms have software that reads news and trades on it, except that this
'news' was about something that had already been happening for years.
Second, Twitter contains a subset of the information in the market, so it's
no surprise that there will be some correlation.

Then: predicting the up or down movement of a stock is very vague. At what time
scale? What sort of trades are required, and what response times are needed to
execute them? What are its drawdowns like? Does it account for taxes,
commissions, fees, etc.? Next, the use of a complex nonlinear learning model
with lots of parameters raises alarm bells: these tend to be very susceptible
to noise, trading data is highly correlated, and typical regularization methods
often do not suffice. Then there is the whole issue of overfitting in general,
and of the data used for training (size, survivorship bias, accounting for
splits and whatnot), which makes the whole thing very hand-wavy. Without
additional information as basic as the rate of return, the stated 83% accuracy
is meaningless. As with all things, it's easy to get results that work within
the limited and safe confines of academic testing, but actually shipping a
working product is another story.
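Drawdown and trading costs are two of the numbers a bare accuracy figure hides. The sketch below compounds per-period returns into an equity curve, optionally charging a flat per-period cost (a hypothetical stand-in for commissions and slippage, assuming the strategy trades every day), and measures the largest peak-to-trough decline. Function names and the sample returns are made up.

```python
def equity_curve(period_returns, cost_per_period=0.0):
    """Compound per-period returns into an equity curve that starts at 1.0.

    cost_per_period is a hypothetical commission/slippage charge (fraction of equity)
    deducted every period, i.e. it assumes the strategy trades every day.
    """
    equity, curve = 1.0, [1.0]
    for r in period_returns:
        equity *= 1 + r - cost_per_period
        curve.append(equity)
    return curve

def max_drawdown(curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = curve[0], 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

# Made-up daily returns, purely illustrative.
daily = [0.004, -0.006, 0.003, 0.002, -0.01, 0.008, 0.001]
for cost in (0.0, 0.001):
    curve = equity_curve(daily, cost)
    print(cost, round(curve[-1] - 1, 4), round(max_drawdown(curve), 4))
```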

There has always been a draw to beating the stock market, and these days there
is nothing more romantic than doing so using artificial intelligence! But I
think the most important property of any trading strategy is that it be made up
of parts that are constantly being swapped out and replaced based on research.
You can't just throw a machine learning algorithm at it and call it done; the
thing will likely only profit for a couple of microseconds. However, as an
aside, I would not be surprised if one of the [anti]spam/virus/botnet or HFT
wars produces AI one day.

------
hogu
I skimmed the paper, but I couldn't find much information on how they did
the cross-validation (i.e., which dates they trained on and which dates they
tested the prediction on). Also, I do believe that tweet sentiment can predict
the stock market, but not on such a large timescale. I would guess that any
analyst reading the news could estimate sentiment at least as well as the
Twitter opinion finder. I think the Twitter opinion finder is useful when you
want to measure sentiment faster than humans can.

------
fbnt
I remember doing some research about this a while ago. Getting some sort of
text-based emotional index isn't trivial at all; there are only a few barely
viable solutions (Google's Prediction API and Bayes-based algorithms), and they
aren't really accurate. This has also been tried in the past by startups of
TechCrunch fame such as stockmood.com, and all failed miserably. Props to
Twitter or anyone who succeeds at this.
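The "Bayes-based algorithms" mentioned above are typically naive Bayes text classifiers. The sketch below is a generic multinomial naive Bayes with add-one smoothing over whitespace tokens; it is not how Google's Prediction API or OpinionFinder work, and the function names and training examples are made up. Real mood indexes need far more data and better tokenization, which is part of why accuracy is hard to get.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns document counts, word counts per label, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab

def classify(model, text):
    """Pick the label with the highest log posterior (multinomial NB, add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training examples, purely illustrative.
model = train_naive_bayes([
    ("great gains today", "pos"),
    ("love this rally", "pos"),
    ("terrible crash", "neg"),
    ("fear and panic selling", "neg"),
])
print(classify(model, "panic crash"), classify(model, "great rally"))  # neg pos
```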

~~~
zackattack
I have been doing research on this problem, send me an email if you'd like to
connect.

------
achompas
Edit: removed some points b/c charlief made them more succinctly.

An even better question: is the relationship causal? The researchers use
Granger causality analysis to test their hypothesis. Wikipedia tells me this
analysis "may produce misleading results when the true relationship involves
three or more variables." [2] By definition, Twitter and the DJIA are macro
aggregates of a number of factors. How could the researchers apply Granger
here?
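For reference, a bivariate Granger test with one lag compares two regressions of y on its own past: one without and one with the past of x, and summarizes the improvement as an F statistic. The sketch below computes only that statistic (no p-value) using plain least squares; the function names and synthetic data are hypothetical. It also shows the quoted limitation: if some third series drove both x and y with different lags, this test would still report a large F for x on y.

```python
import random

def _solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def _rss(rows, target):
    """Residual sum of squares of an ordinary least-squares fit of target on rows."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, target)) for i in range(k)]
    w = _solve(xtx, xty)
    return sum((t - sum(wi * xi for wi, xi in zip(w, row))) ** 2 for row, t in zip(rows, target))

def granger_f(x, y):
    """One-lag Granger F statistic for 'past x helps predict y beyond past y'."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = _rss(restricted, target), _rss(unrestricted, target)
    n, k_u, extra = len(target), 3, 1  # observations, unrestricted params, added lag terms
    return ((rss_r - rss_u) / extra) / (rss_u / (n - k_u))

# Synthetic data: y is driven by yesterday's x, so x "Granger-causes" y but not the reverse.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 300)]
print(granger_f(x, y), granger_f(y, x))
```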

[1] See Table 1 at <http://www.sca.isr.umich.edu/documents.php?c=c>

[2] <http://en.wikipedia.org/wiki/Granger_causality#Limitations>

------
herrherr
Old discussion:

<http://news.ycombinator.com/item?id=1803505>

------
grantbachman
I find this interesting because the consensus among the economic community is
that markets are highly efficient, that is, information is reflected
immediately in stock market prices. This suggests information exists which is
not being reflected. That's why I'm skeptical.

~~~
loewenskind
>the consensus among the economic community is that markets are highly
efficient

Really? There are some true believers out there under this impression, but I
didn't think anyone credible was. It wasn't so long ago that someone showed
efficient markets were an P=NP problem.

EDIT: I'm not the one who downvoted you.

~~~
yummyfajitas
I suspect you believe markets are close to efficient. If you don't, you are
either investing large fractions of your wealth in a strategy that you believe
will beat the market, or you are irrationally throwing your money away. Which
is it?

The NP completeness of efficient markets has been known for quite a while.

[http://dpennock.com/papers/pennock-ijcai-workshop-2001-np-ma...](http://dpennock.com/papers/pennock-ijcai-workshop-2001-np-markets.pdf)

It wasn't so long ago that some jerks at Princeton wrote a paper along the
same lines, completely ignored all the existing literature to make their paper
appear more novel, and got a lot of publicity for themselves (hint: prediction
markets are unsexy, CDOs are sexy).

~~~
llimllib
NP completeness is a strawman here. It's perfectly plausible to have an
efficient market where the problem of accurate pricing is NP complete.

Furthermore, the paper you reference (while an excellent and fascinating
paper) does not directly bear on the NP completeness of the stock market:

> In Section 3, I discuss the prospect of opening securities markets that pay
> off contingent on the discovery of solutions to particular instances of an
> NP-complete problems. Such NP markets would provide direct monetary
> incentives for developers to test and improve their algorithms, and allow
> funding agents to target rewards to the designers of the best algorithms for
> the most interesting problems. In Sections 4 and 5, I discuss markets in
> #P-complete problems, where prices serve as collective approximate bounds on
> the number of solutions, and bid-ask spreads may indicate problem difficulty

is his summary of what the paper does (sections 1 and 2 are introductory
material). I claim that this does not at all show the NP completeness of
markets, and further that it's a claim irrelevant to the discussion here.

In what sense are you claiming that he proves the "NP completeness of
markets"? What does that mean? Why is it relevant to whether or not to invest
money in the stock market?

(sidenote: I don't think the question of the NP-completeness of some questions
related to stock pricing is irrelevant or uninteresting; indeed I just applied
to grad school to study problems like these. I just don't think they bear on
what you're implying they do)

That said, I voted you up because of your first sentence.

~~~
yummyfajitas
Neither the paper I linked to nor the paper written by the Princeton guys
shows that equities markets are NP complete. They both show that markets in
certain derivatives are NP complete.

However, I made a mistake and linked to the wrong paper. Here is the correct
one:

[http://dpennock.com/papers/fortnow-dss-2004-compound-markets...](http://dpennock.com/papers/fortnow-dss-2004-compound-markets.pdf)

Basically, the result says that if you have a market in derivatives which pays
off when certain formulas in propositional logic are true (e.g., a derivative
which pays off if A && (!B || C) is true, for specific events A,B,C), then the
auctioneer's matching problem is NP complete. The auctioneer's matching
problem is simply market making, and if the market were efficient, this
problem would already be solved (by looking at prices).
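To get a feel for why compound securities are hard to match, consider just checking one candidate set of orders: the auctioneer must confirm that the premiums it collects cover its payout in every combination of event outcomes, and there are 2^n of them. The sketch below does that brute-force check; it is not the algorithm or the reduction from the paper (the hardness result concerns deciding which orders to accept, and is proven there, not demonstrated here). `is_riskless_match` and the prices are hypothetical.

```python
from itertools import product

def is_riskless_match(orders, events):
    """True if selling every order leaves the auctioneer with no loss in any outcome.

    orders: (price, pays) pairs; pays(outcome) -> bool says whether that security
    pays $1 when the events take the truth values in the outcome dict.
    """
    premiums = sum(price for price, _ in orders)  # cash collected up front
    for values in product([False, True], repeat=len(events)):  # all 2**n outcomes
        outcome = dict(zip(events, values))
        payout = sum(1 for _, pays in orders if pays(outcome))
        if payout > premiums:
            return False
    return True

def phi(o):
    """The example formula A && (!B || C)."""
    return o["A"] and (not o["B"] or o["C"])

def not_phi(o):
    return not phi(o)

events = ["A", "B", "C"]
print(is_riskless_match([(0.4, phi), (0.7, not_phi)], events))  # True: pays exactly $1 every time
print(is_riskless_match([(0.4, phi), (0.5, not_phi)], events))  # False: premiums fall short
```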

I don't think loewenskind's claim is true, for the most part; I was just
providing a more detailed source on the NP completeness of some markets.

------
bhickey
Is Twitter predicting the market or are traders moving it based on perception
of prediction?

------
mikeleeorg
Out of curiosity, has anyone used, or know someone who uses
<http://stocktwits.com/> to inform their trading decisions - and had a
positive outcome? I haven't, I'm just curious.

~~~
nowarninglabel
Haven't used it, but I'm not sure why most people would when there are already
tools like this built into most online brokerages. Most of the trading advice I
have seen goes to one of two places: mass-market sites such as this, or obscure
and sometimes secretive forums.

------
palewery
If you can decode a way to 'beat the market', that means once you start beating
the market someone else is losing. They will adjust their trading techniques,
and your "algorithms" are now wrong.

~~~
adamtmca
I think you misunderstood the efficient market hypothesis.

~~~
palewery
I think you misunderstand 'hypothesis'

------
scrrr
I doubt those hn users that think this is working will say so. ;)

------
andrewcamel
Can someone please link to a download of the OpenFinder module?

------
seanmalarkey
this is pretty fascinating.

