How to spot first stories on Twitter using Storm (micvog.com)
53 points by nkurz on May 31, 2014 | 6 comments

> Having such a volume of input tweets streamed in real-time, FSD seemed like an ideal use case for Storm framework which can scale up the work by adding more resources.

This is not a coincidence: Twitter open-sourced Storm.

Well, technically BackType open-sourced Storm as part of Twitter's buyout of BackType.

The locality-sensitive hashing (LSH) method based on random projections is described more clearly here: http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.p...
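To make the random-projection idea concrete, here is a minimal Python sketch of random-hyperplane LSH for cosine similarity, assuming tweets are represented as sparse term-count vectors. In each of `num_tables` tables, a tweet's signature is the sign pattern of its projections onto `num_bits` random Gaussian hyperplanes, and tweets that share a signature in at least one table become nearest-neighbor candidates. The class name, the whitespace tokenizer and the default parameters are hypothetical choices, not taken from the tutorial or the article.

```python
import random
from collections import Counter, defaultdict


def term_vector(text):
    # Hypothetical helper: raw term counts from a lowercase whitespace split.
    return Counter(text.lower().split())


class RandomProjectionLSH:
    """Random-hyperplane LSH for cosine similarity over sparse term vectors."""

    def __init__(self, num_bits=8, num_tables=4, seed=0):
        self.num_bits = num_bits
        self.num_tables = num_tables
        rng = random.Random(seed)
        # Hyperplane component for (table, bit, term): drawn from N(0, 1) the
        # first time the term is seen, then reused so signatures stay stable.
        self.planes = defaultdict(lambda: rng.gauss(0.0, 1.0))
        self.tables = [defaultdict(set) for _ in range(num_tables)]

    def signature(self, vec, table):
        """Pack the signs of num_bits random projections into one integer."""
        sig = 0
        for bit in range(self.num_bits):
            dot = sum(w * self.planes[(table, bit, t)] for t, w in vec.items())
            sig = (sig << 1) | int(dot >= 0)
        return sig

    def add(self, doc_id, vec):
        """Store doc_id in its bucket in every table."""
        for table in range(self.num_tables):
            self.tables[table][self.signature(vec, table)].add(doc_id)

    def candidates(self, vec):
        """Return ids that share a bucket with vec in at least one table."""
        found = set()
        for table in range(self.num_tables):
            found |= self.tables[table].get(self.signature(vec, table), set())
        return found


lsh = RandomProjectionLSH(num_bits=8, num_tables=4, seed=42)
lsh.add("t1", term_vector("storm topology processes the twitter stream"))
print(lsh.candidates(term_vector("storm topology processes the twitter stream")))  # {'t1'}
```

Because only the sign of each projection is kept, two vectors at angle θ agree on a given bit with probability 1 - θ/π. More bits per table make buckets more selective; more tables raise the chance that true neighbors share at least one bucket.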

For the problem of filtering out false alarms, this paper seems to have improved significantly on the F-score of the baseline nearest-neighbor approach: http://citeseerx.ist.psu.edu/viewdoc/summary?doi= They report an average F-score of 0.541 on a standard TDT corpus; for comparison, their implementation of the NN algorithm scored 0.450. There's still a long way to go on this problem.
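For reference, a nearest-neighbor baseline for first story detection scores each incoming document by its distance to the most similar document seen so far and flags it when that distance is large. The sketch below does this by brute force with cosine similarity over term counts; the tokenizer and the 0.5 threshold are hypothetical choices, not values from the paper. Note that the per-tweet cost grows linearly with the number of past tweets, which is the cost LSH is meant to avoid.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def novelty_scores(texts):
    """Score each text as 1 - (max cosine similarity to any earlier text)."""
    seen, scores = [], []
    for text in texts:
        vec = Counter(text.lower().split())
        best = max((cosine(vec, old) for old in seen), default=0.0)
        scores.append(1.0 - best)
        seen.append(vec)
    return scores


def first_stories(texts, threshold=0.5):
    """Flag texts whose novelty exceeds a (hypothetical) threshold."""
    return [t for t, s in zip(texts, novelty_scores(texts)) if s > threshold]


stream = ["earthquake hits tokyo", "earthquake hits tokyo now", "new phone released today"]
print(first_stories(stream))  # ['earthquake hits tokyo', 'new phone released today']
```

The second tweet repeats three of the first tweet's terms, so its novelty is only about 0.13 and it is not flagged; the third shares no terms with earlier tweets and scores 1.0.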

I remember a similar talk by Miles Osborne at CMU. Any chance you're from his lab?

Yes, I implemented Petrovic's paper (http://www.aclweb.org/anthology/N/N10/N10-1021.pdf) from scratch, but scaled it out with Storm in a distributed fashion.
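One way such scaling could work, assumed here and not confirmed to be the author's topology, is to partition LSH buckets across parallel tasks so that, as with a Storm fields grouping, every tweet hashed to a given (table, bucket) key reaches the same task. The plain-Python simulation below routes each key to a worker by hashing it, has each worker compare the tweet against its bucket, and keeps the maximum similarity. `BucketWorker`, `PartitionedFSD`, the key layout and the parameters are hypothetical; none of this is Storm's API, and it omits the paper's refinements.

```python
import math
import random
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


class BucketWorker:
    """Stands in for one parallel task that owns some (table, bucket) keys."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def query_then_add(self, key, vec):
        """Best similarity within this bucket, then store the new vector."""
        best = max((cosine(vec, old) for old in self.buckets[key]), default=0.0)
        self.buckets[key].append(vec)
        return best


class PartitionedFSD:
    def __init__(self, num_workers=3, num_tables=4, num_bits=8, seed=0):
        rng = random.Random(seed)
        # Random hyperplane components, drawn lazily per (table, bit, term).
        self.planes = defaultdict(lambda: rng.gauss(0.0, 1.0))
        self.num_tables, self.num_bits = num_tables, num_bits
        self.workers = [BucketWorker() for _ in range(num_workers)]

    def keys(self, vec):
        """Yield one (table, signature) bucket key per LSH table."""
        for table in range(self.num_tables):
            sig = 0
            for bit in range(self.num_bits):
                dot = sum(w * self.planes[(table, bit, t)] for t, w in vec.items())
                sig = (sig << 1) | int(dot >= 0)
            yield (table, sig)

    def novelty(self, text):
        """Route the tweet to one worker per key and combine with max()."""
        vec = Counter(text.lower().split())
        best = 0.0
        for key in self.keys(vec):
            # Same key -> same worker, mimicking a fields grouping on the key.
            worker = self.workers[hash(key) % len(self.workers)]
            best = max(best, worker.query_then_add(key, vec))
        return 1.0 - best


fsd = PartitionedFSD(num_workers=3, seed=1)
print(fsd.novelty("earthquake hits tokyo"))  # 1.0 (nothing seen yet)
print(fsd.novelty("earthquake hits tokyo"))  # 0.0 (exact repeat)
```

Partitioning by bucket key keeps each worker's state disjoint, so adding workers spreads both memory and comparisons; the cost is one message per table for every tweet plus a final max() step.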
