For the problem of filtering out false alarms, this paper appears to make a significant advance in f-score over the baseline nearest neighbor (NN) approach: http://citeseerx.ist.psu.edu/viewdoc/summary?doi= They report an average f-score of 0.541 on a standard TDT corpus, compared with 0.450 for their own implementation of the NN algorithm. There's still a long way to go on this problem.
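
To put those two numbers in perspective, here is a minimal sketch of how an f-score is computed and how large the reported gain is. It assumes the standard balanced F-measure (harmonic mean of precision and recall); the paper may define or average its f-score differently, and the function names are just illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1.0 gives the usual balanced F1 (an assumption here; the
    paper's exact variant is not stated in this answer).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


# The two f-scores quoted above.
reported = 0.541     # the paper's approach
nn_baseline = 0.450  # their nearest neighbor implementation

gain = relative_gain(reported, nn_baseline)
print(f"absolute gain: {reported - nn_baseline:.3f}")
print(f"relative gain: {gain:.1%}")
```

So the improvement is roughly 0.09 in absolute terms, or about 20% relative to the baseline, while the score itself is still far below 1.0.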

