

How to spot first stories on Twitter using Storm - nkurz
http://micvog.com/2013/09/08/storm-first-story-detection

======
philip1209
> Having such a volume of input tweets streamed in real-time, FSD seemed like
> an ideal use case for Storm framework which can scale up the work by adding
> more resources.

This is not a coincidence - Twitter open-sourced Storm.

~~~
peregrine
Well, technically BackType open-sourced Storm as part of Twitter's buyout of
BackType.

------
sytelus
The method of Locality Sensitive Hashing using random projection is better
described here:
[http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.p...](http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf)
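
For anyone who wants the gist before reading the tutorial, here is a minimal
sketch of random-projection (random hyperplane) hashing. It is my own toy
illustration, not code from the article or the tutorial; the names
`make_hyperplanes` and `lsh_signature`, the Gaussian hyperplanes, and the bit
count are all assumptions chosen for the example.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Hash a vector to one bit per hyperplane: which side of the plane it is on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=4, seed=1)
    a = [1.0, 2.0, 0.5, -1.0]
    b = [1.1, 1.9, 0.4, -1.2]   # points in nearly the same direction as a
    print(lsh_signature(a, planes))
    print(lsh_signature(b, planes))
```

Vectors separated by a small angle tend to agree on most bits, so documents
sharing a signature (a bucket) become candidate near neighbors, and only those
candidates need an exact comparison.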

------
nathanathan
For the problem of filtering out false alarms, this paper seems to have made
a significant improvement over the f-score of the baseline nearest-neighbor
approach:
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221....](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221.1974)
They report an average f-score of 0.541 on a standard TDT corpus. For
comparison, their implementation of the NN algorithm had an f-score of 0.450.
There's still a long way to go on this problem.

------
shriphani
I remember a similar talk by Miles Osborne @ CMU. Any chance you're from his
lab?

~~~
mvogiatzis
Yes, I implemented Petrovic's paper
([http://www.aclweb.org/anthology/N/N10/N10-1021.pdf](http://www.aclweb.org/anthology/N/N10/N10-1021.pdf))
from scratch but scaled it with Storm in a distributed fashion.

