> With such a volume of tweets streaming in real time, FSD seemed like an ideal use case for the Storm framework, which can scale up the work by adding more resources.
This is no coincidence: Twitter open-sourced Storm.
On the problem of filtering out false alarms, this paper appears to make significant gains over the F-score of the baseline nearest-neighbor approach: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.221....
They report an average F-score of 0.541 on a standard TDT corpus; for comparison, their implementation of the nearest-neighbor (NN) algorithm scored 0.450. There's still a long way to go on this problem.
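For readers unfamiliar with the baseline being compared, here is a minimal sketch of a nearest-neighbor first-story detector and the balanced F-score (F1) used to score it. It is illustrative only and not the cited paper's method: it assumes plain bag-of-words cosine similarity, a hypothetical threshold of 0.5, and brute-force comparison against every earlier document. The helper names (`tf_vector`, `nn_first_story`, `f_score`) are made up for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    # Bag-of-words term counts (lowercased, whitespace-tokenized).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nn_first_story(docs, threshold=0.5):
    # Flag a document as a first story when even its most similar
    # earlier document falls below `threshold` (hypothetical value).
    # Brute force: each document is compared with all earlier ones.
    seen, flags = [], []
    for text in docs:
        vec = tf_vector(text)
        best = max((cosine(vec, prev) for prev in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags


def f_score(predicted, actual):
    # Balanced F-score: harmonic mean of precision and recall
    # for the "is a first story" label.
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [
        "earthquake hits city center",        # new event
        "strong earthquake hits city",        # follow-up, shares words
        "tremor damages downtown buildings",  # follow-up, different words
        "new phone released today",           # new event
    ]
    gold = [True, False, False, True]
    flags = nn_first_story(docs)
    print(flags)                            # [True, False, True, True]
    print(round(f_score(flags, gold), 2))   # 0.8
```

The third document shows the false-alarm problem the comment refers to: it reports the same event in different words, so the pure lexical NN baseline calls it a new story, and that one false positive drops precision to 2/3 and F1 to 0.8.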