>notorious webspam/blogspam regurgitation sites like "recode.net"
Actually recode.net does a lot of original reporting in addition to regurgitation. Disclosure: some of my former CNET/CBS colleagues are reporters there.
The problem (and I've seen this when building http://recent.io/ as well) is that every news organization does this to some extent. If organization X has a scoop, Y and Z will "follow" it by summarizing and rewriting it. This has been going on for over a century; the AP, founded in the 1840s, does it very well.
In the most egregious cases, like the one above, it's just a theoretically fair-use excerpt from the original story followed by a "read more" link. But sometimes it serves your users better to link to the follow-up coverage instead: when the original article (FT, WSJ, Economist, etc.) is behind a paywall, or when the follow-ups add context or details.
Even Google News only mostly gets this right: it tries to group stories on the same topic, but sometimes it has multiple buckets for the same topic, and sometimes an unrelated story gets thrown into the wrong bucket.
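To make the bucketing problem concrete, here's a toy sketch of grouping headlines by word overlap. This is only an illustration under my own assumptions, not how Google News or recent.io actually work; the function names, the Jaccard measure, and the 0.3 threshold are all made up for the example.

```python
import re

def tokens(headline):
    """Lowercase word set, ignoring words of one or two characters."""
    return {w for w in re.findall(r"[a-z0-9]+", headline.lower()) if len(w) > 2}

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_headlines(headlines, threshold=0.3):
    """Greedy single pass: each headline joins the first bucket whose
    first headline is similar enough, otherwise it starts a new bucket.
    The threshold is an arbitrary, hypothetical value."""
    buckets = []  # list of (tokens of the bucket's first headline, members)
    for h in headlines:
        t = tokens(h)
        for seed, members in buckets:
            if jaccard(seed, t) >= threshold:
                members.append(h)
                break
        else:
            buckets.append((t, [h]))
    return [members for _, members in buckets]

groups = group_headlines([
    "Apple unveils new iPhone at fall event",
    "Apple unveils new iPhone with bigger screen",
    "Cupertino shows off its latest handset",
    "Senate passes budget bill",
])
for g in groups:
    print(g)
```

The two "Apple unveils" headlines land in one bucket, but the "Cupertino" rewrite of the same story gets its own bucket because it shares no words with them. Lower the threshold to catch it and unrelated stories start getting merged, which is the other failure mode.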
TL;DR: This is a non-trivial problem.