

Keyword tracking goes real-time with new Superfeedr feature - joshfraser
http://www.readwriteweb.com/archives/keyword_tracking_goes_real-time_with_new_superfeed.php

======
dotBen
I'm a big fan of Superfeedr and the work Julien is doing with it... but a
serious problem with a keyword product based on RSS ingest is that many RSS
feeds are not 'full-content' and so the keyword searching is patchy.

In the case of 'short form' RSS feeds, you are not going to get a match if
your keyword is mentioned repeatedly in a given article but not in the first
paragraph, which is all that is published in the RSS feed and thus all that
Superfeedr ingests.
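To make the failure concrete, here is a minimal Python sketch. It assumes a
simple case-insensitive whole-word match (not necessarily how Superfeedr
matches keywords) and uses an invented article whose keyword only appears
after the first paragraph:

```python
import re


def matches_keyword(text: str, keyword: str) -> bool:
    """Case-insensitive whole-word search for keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical article: the keyword only shows up in the second paragraph.
full_article = (
    "The city council met on Tuesday to discuss the budget.\n\n"
    "Later, members debated new bike lanes for Main Street. "
    "Bike lanes drew the most public comment."
)

# A 'short form' feed typically carries only the opening paragraph.
summary = full_article.split("\n\n", 1)[0]

print(matches_keyword(summary, "bike lanes"))       # False: missed
print(matches_keyword(full_article, "bike lanes"))  # True
```

The article is clearly about bike lanes, yet a tracker that only sees the
feed's summary never fires.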

While it is probably fair to say that the majority of RSS feeds Superfeedr is
tracking are 'full-content', the head of the long tail (i.e. the mainstream
news sources, top blogs, etc.), which is where much of the value lies, is the
least likely to offer full-content RSS feeds (they are old-skool in their
thinking and want you to click through to the mother ship).

What is a shame, I think, is that none of these services (there are a few now
in this space) is really upfront about this massive shortcoming, which at a
technical level they must be aware of.

 _Given that I'm taking a bit of a bash at providers in this space, I should
probably disclose that NewsBasis (a company I co-founded and still have an
interest in, but am no longer working with) is in the process of solving this
problem as part of its wider product offering. That is why I've spent time
thinking about this problem, but it may also mean I'm biased. However, I'm
hoping that once this is stable there might be an opportunity to create an API
off the back of the keyword piece._

~~~
mjrusso
There is an easy hack, given a known list of RSS feeds that do not offer full
content and that one wants searched:

Simply run the links in the original feed through something like Arc90's
Readability, produce a new feed from the output, and ingest this new feed
back into Superfeedr.
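A rough sketch of that hack in Python, using only the standard library. The
`extract` callable is hypothetical: it stands in for a Readability-style
extractor and is not Arc90's actual API. The sample feed and page text are
made up, and the sketch assumes RSS 2.0 items that carry a `<link>`, overwriting
each item's `<description>` with the extracted full text:

```python
import xml.etree.ElementTree as ET
from typing import Callable


def build_full_content_feed(rss_xml: str, extract: Callable[[str], str]) -> str:
    """Return a copy of an RSS 2.0 feed whose <description> holds full text.

    `extract` maps an article URL to its main body text. It is a placeholder
    for whatever Readability-style extractor you use (an assumption here).
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # nothing to fetch; leave the item unchanged
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        desc.text = extract(link.strip())  # replace the teaser with full text
    return ET.tostring(root, encoding="unicode")


# Usage with an invented feed and a fake extractor (no network access).
sample = """<rss version="2.0"><channel><title>Example</title>
<item><title>Budget</title><link>http://example.com/budget</link>
<description>The council met on Tuesday.</description></item>
</channel></rss>"""

fake_pages = {
    "http://example.com/budget":
        "The council met on Tuesday. Members debated new bike lanes.",
}

print(build_full_content_feed(sample, fake_pages.__getitem__))
```

The rewritten feed would then be published at its own URL and handed to
Superfeedr like any other feed; every item costs one extra page fetch, which
is the load concern raised in the reply.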

If this hack is being used only for search-related purposes, then I would
hazard that there is enough existing precedent here to claim legality.

~~~
dotBen
On paper it is a nice solution. However, I would assume Arc90 would see
considerable load on their systems (especially from feeds that are not
generating any page views) and shut this off. It also adds another point of
failure and more latency.

NewsBasis is, however, looking to do the same kind of processing Arc90 does,
to ensure a 'full content' payload is held for every source.

