

A new algorithm predicts which Twitter topics will trend hours in advance - denzil_correa
http://web.mit.edu/snikolov/Public/trend.pdf

======
hooande
This seems like a great trending topic detection algorithm, but nothing
revolutionary. The "hours in advance" part probably has more to do with the
scale of Twitter's data and their decisions about when to publish topics on
their website than with the novelty of the approach.

It seems like a great and tested method for developing your own trending
topics algorithm, which could be useful in other contexts. I wonder if this
could be applied to corporate email servers to let management spot issues
before they're brought up.

~~~
ljd
This is something I would spend money on. Does anyone know if such an
Exchange/Gmail plugin exists?

------
VMG
Does the algorithm still work when it includes its own effects?

If an algorithm like this were widely used, the predicted topics would trend
just because they had been picked -- the algorithm would fulfill the prophecy
itself.

~~~
mmishra
Exactly. I was also thinking that financial engineering tries to gain an
advantage from statistical analysis of the stock market based on previous
trends. But I am sure there is no 100% accurate model. Moreover, a statistical
model needs to be changed as behavior starts changing.

I wonder if that will reflect here as well.

In any case, it has some interesting applications, as they mentioned, in
ticketing.

------
denzil_correa
The only concern I have is the way the experiment is conducted. The correct
way to collect data would be to consume the Streaming API to collect the
random sample of tweets provided by Twitter [0]. They should then run their
algorithm to compare how they are doing against Twitter's algorithm.
Currently, they are filtering out certain topics of interest and just
comparing these topics. Therefore, they are filtering a great degree of noise
which is actually observed by Twitter.

[0] <https://dev.twitter.com/docs/api/1.1/get/statuses/sample>

~~~
snikolov
Thanks for the interest! We collected tweets by sampling a small percentage of
all tweets in a time window, to emulate what one might get from the streaming
API (I did this as part of the VI-A <http://vi-a.mit.edu/> masters program at
MIT, and was an employee of Twitter). We did pick a fixed set of topics to
track, but those were randomly sampled (though we did get rid of topics that
trended multiple times in a large time window, like the name of a football
player who scored in multiple matches, in which case we don't know which event
we are trying to detect). One thing the algorithm doesn't do at the moment is
come up with its own trending topics. It just tests prediction of
trending/non-trending on a hold-out set taken from the original set of topics.
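
The hold-out test described above can be sketched roughly like this. It is an
illustration only, not the authors' code: `split_topics`, `evaluate`, the
80/20 split, and the `predict` callable are hypothetical names and choices
made up for this example.

```python
import random


def split_topics(topics, holdout_fraction=0.2, seed=0):
    """Shuffle labeled topics and split them into (reference, holdout).

    `topics` is a list of (features, trended) pairs. The fraction and seed
    are arbitrary illustrative defaults.
    """
    shuffled = list(topics)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def evaluate(predict, holdout):
    """Return (true positive rate, false positive rate) of `predict`.

    `predict` maps features to True (trending) or False (non-trending).
    """
    true_pos = false_pos = positives = negatives = 0
    for features, trended in holdout:
        guess = predict(features)
        if trended:
            positives += 1
            if guess:
                true_pos += 1
        else:
            negatives += 1
            if guess:
                false_pos += 1
    tpr = true_pos / positives if positives else 0.0
    fpr = false_pos / negatives if negatives else 0.0
    return tpr, fpr
```

Any predictor can be plugged in as `predict`; only the held-out topics are
used for scoring, so the reference set never leaks into the reported rates.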

------
Tipzntrix
So this is an algorithm to determine what another algorithm will determine
will be popular? Meta.

------
lucvh
Do I predict this sparking a restriction in data availability in the API?

~~~
mhuffman
That was my initial thought. Might this be a potential way to finally monetize
Twitter?

~~~
klapinat0r
Could you elaborate please? Are you referring to the "early access" to
trending topics, or to restricting limits on the API?

~~~
alanctgardner2
Funnily enough, Twitter could already sell 'early access' without any new
algorithms. Just time shift the availability of results based on payment
status. So for £100, see things that are guaranteed to be trending next hour,
because they're technically trending now, but we're withholding it.

------
danso
I'm bookmarking this to read later, but how much more sophisticated does the
algorithm have to be than:

1) Create a list of 500 - 1000 active, relatively popular Twitter users: this
would eliminate most celebrities who only tweet casually or delegate it to
their PR people...presumably, by the time they tweet something, it's already
huge.

2) Segregate the sample group of Twitter users into cliques

3) When any topic spreads between multiple cliques at an accelerated rate,
that topic will likely trend

In addition, have a list of mega-popular celebrities and assume that most of
what they tweet has a high probability of being a trending topic.

Twitter has some kind of formula for removing constantly-popular topics (or
else Justin Bieber would forever be on the list)...if there's an easy way to
include that, then it seems like predicting trending would be straightforward?
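
A minimal sketch of steps 1-3, under loud assumptions: the clique assignment
is already given as a plain dict, mentions are `(user, topic, timestamp)`
tuples, and "accelerated" means the topic reached more new cliques in the
latest window than in the window before it. The function names, the window
size, and `min_cliques` are all hypothetical and come from neither the paper
nor Twitter.

```python
def cliques_reached_by(mentions, user_clique, topic, cutoff):
    """Return the set of cliques that mentioned `topic` before `cutoff`."""
    return {
        user_clique[user]
        for user, mentioned_topic, timestamp in mentions
        if mentioned_topic == topic
        and timestamp < cutoff
        and user in user_clique  # ignore users outside the sample group
    }


def is_spreading_faster(mentions, user_clique, topic, now,
                        window=60, min_cliques=3):
    """Flag `topic` when its spread across cliques is accelerating.

    Compares how many new cliques were reached in the latest window with how
    many were reached in the window before it.
    """
    two_ago = len(cliques_reached_by(mentions, user_clique, topic, now - 2 * window))
    one_ago = len(cliques_reached_by(mentions, user_clique, topic, now - window))
    current = len(cliques_reached_by(mentions, user_clique, topic, now))
    earlier_rate = one_ago - two_ago
    latest_rate = current - one_ago
    return current >= min_cliques and latest_rate > earlier_rate
```

The celebrity list and the filter for constantly popular topics are left out;
they would be separate checks layered on top of this one.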

~~~
snikolov
I agree with you --- this seems like a perfectly good way to tackle the
problem of trend prediction directly. What we had in mind was something that
would be more generally applicable to any kind of time series data, and we
figured it would be interesting to test it on Twitter trends.

------
sanxiyn
The algorithm itself seems very simple. The paper quotes "The Unreasonable
Effectiveness of Data". The surprising thing is the simple algorithm works
great given large data.
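
The thread doesn't spell the algorithm out, so what follows is only a generic
sketch of one simple data-driven approach: compare a new activity series
against labeled reference series and let closer references vote more heavily.
The squared Euclidean distance, the `gamma` decay, and the threshold of 1.0
are assumptions chosen for illustration, not details confirmed by the paper.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vote_ratio(series, trending_refs, non_trending_refs, gamma=1.0):
    """Ratio of distance-weighted votes: trending over non-trending."""
    pos = sum(math.exp(-gamma * squared_distance(series, r)) for r in trending_refs)
    neg = sum(math.exp(-gamma * squared_distance(series, r)) for r in non_trending_refs)
    if neg == 0:
        # No (or vanishing) negative evidence: decide on positive evidence alone.
        return math.inf if pos > 0 else 0.0
    return pos / neg


def predict_trending(series, trending_refs, non_trending_refs,
                     gamma=1.0, threshold=1.0):
    """Predict True (trending) when the vote ratio exceeds `threshold`."""
    return vote_ratio(series, trending_refs, non_trending_refs, gamma) > threshold
```

With more labeled reference series, each new series is more likely to have a
close match, which is one way a very simple rule can improve as data grows.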

------
philip1209
Is it easy for academic researchers to get firehose access?

~~~
snikolov
Not in general. I was part of the VI-A program (<http://vi-a.mit.edu/>) at
MIT, which allows you to do your thesis at a company.

------
Finster
I think I could use this to gain tremendous karma on reddit...

------
jackinloadup
Just watch out for the upcoming trend #Earthquake

~~~
rorrr
That's what the Italian scientists did.

------
gubatron
Can it be used to predict market trends? Instead of topics, use stock tickers?

~~~
denzil_correa
There is already some work done on this front [0]. Ruiz _et al_ basically
concluded that the stock price of a company is directly proportional to the
number of discussion topics about that company, viz. if there are fewer topics
under discussion for a company X, its stock price would be poor.

[0] <http://www.cs.ucr.edu/~vagelis/publications/wsdm2012-microblog-financial.pdf>

~~~
junto
Thank you for sharing this.

