
>notorious webspam/blogspam regurgitation sites like "recode.net"

Actually recode.net does a lot of original reporting in addition to regurgitation. Disclosure: some of my former CNET/CBS colleagues are reporters there.

The problem (and I've seen this when building http://recent.io/ as well) is that every news organization does this to some extent. If organization X has a scoop, Y and Z will "follow" it by summarizing and rewriting it. This has been going on for over a century; the AP, founded in the 1840s, does it very well.

In the most egregious cases, like the one above, it's just a theoretically fair-use excerpt from the original story followed by a "read more" link. But sometimes it serves your users better to link to the followup coverage instead, for example when the original article (FT, WSJ, Economist, etc.) is behind a paywall, or when the followups have more context or additional details.

TL;DR: This is a non-trivial problem.



Even Google News only mostly gets this right: it tries to group stories on the same topic, but sometimes it has multiple buckets for the same topic, and sometimes an unrelated story gets thrown into the wrong bucket.
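
To make the bucketing failure modes concrete, here is a rough sketch of one naive way to group stories: greedy clustering on headline word overlap (Jaccard similarity). This is only an illustration under my own assumptions, not how Google News or recent.io actually works; the function names, the threshold values, and the sample headlines are all made up.

```python
import re

def tokens(text):
    """Lowercased word set for a headline, dropping very short words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_stories(headlines, threshold=0.5):
    """Greedily assign each headline to the first bucket whose first
    headline is similar enough; otherwise start a new bucket.
    (Hypothetical helper; the threshold is an arbitrary assumption.)"""
    buckets = []  # each bucket is a list of (headline, token_set) pairs
    for headline in headlines:
        t = tokens(headline)
        for bucket in buckets:
            if jaccard(t, bucket[0][1]) >= threshold:
                bucket.append((headline, t))
                break
        else:
            buckets.append([(headline, t)])
    return [[h for h, _ in bucket] for bucket in buckets]

# Made-up headlines: the first, second, and fourth cover the same story.
headlines = [
    "Acme buys Widgets Inc for $3 billion",
    "Acme buys Widgets Inc for $3 billion, sources say",
    "Senate passes surveillance reform bill",
    "Acme confirms Widgets acquisition",
]

for bucket in group_stories(headlines):
    print(bucket)
```

With the default threshold, the fourth headline lands in its own bucket even though it is about the same deal, which is the "multiple buckets for the same topic" problem. Lowering the threshold (say, to 0.2) merges it in, but a looser threshold is also how unrelated stories start ending up in the wrong bucket.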



