

Launch Day at our Start-Up: Feedscrub - ardell
http://blog.feedscrub.com/2009/01/15/launching-our-start-up-feedscrub/

======
Maro
"It filters out the posts that are least interesting to you and leaves only
the stuff that is relevant."

How well does this service work in practice? I imagine telling spam and
non-spam emails apart is far easier than "predicting" which blog post I will like.

~~~
ardell
Hi Maro, I invite you to try it out yourself. I've released 500 "hackernews"
invites so everyone can give it a try.

You're right, it's a lot easier to filter out spam emails with the word
"V1AGRA" than it is to filter out news posts. You'll want to take some time to
train the filter properly before you unsubscribe from your junk feed.

~~~
Prrometheus
Is there anything fancier than a naive Bayes classifier going on?

Do you do any calculus on the relative values of false positives and false
negatives? It seems that a person with a large number of feeds would put a
greater importance on limiting the number of false positives (non-interesting
messages classified as interesting by your program), whereas a person with a
small number of feeds would be more worried about false negatives (interesting
stories classified as uninteresting).
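
A rough sketch of how that trade-off could be made explicit, assuming the filter outputs a probability that a post is interesting. The function names and the cost values are hypothetical, not anything Feedscrub is known to use. Showing a post beats hiding it when the expected cost of a false negative exceeds the expected cost of a false positive, which gives a simple threshold:

```python
def show_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability above which showing a post has lower expected cost than hiding it.

    Showing costs (1 - p) * cost_false_positive on average;
    hiding costs p * cost_false_negative. Setting them equal and solving for p
    gives the break-even point.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_show(p_interesting: float, cost_fp: float, cost_fn: float) -> bool:
    """Show the post only if its probability clears the cost-based threshold."""
    return p_interesting > show_threshold(cost_fp, cost_fn)


# Hypothetical costs: a reader with many feeds is hurt more by clutter
# (false positives); a reader with few feeds is hurt more by missed stories.
many_feeds = show_threshold(cost_false_positive=3.0, cost_false_negative=1.0)
few_feeds = show_threshold(cost_false_positive=1.0, cost_false_negative=3.0)
```

With these made-up costs the heavy reader only sees posts scored above 0.75, while the light reader sees anything above 0.25.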

Also, see this comment in this thread:
<http://news.ycombinator.com/item?id=435837> . For something like this, I
doubt a machine learning algorithm will work as well as social algorithms. Web
search is a very similar problem (find the most interesting link given input
from the user) and social algorithms won out. Of course, the kind and quality
of input that you are getting from the user is different. However, your
problem is much less well-defined (finding interesting things in general
instead of interesting things related to a particular topic).

~~~
ardell
Interesting points! We are using a naive Bayes classifier right now, and we're
working with some social aspects, especially as they relate to Latent Semantic
Analysis. We've considered doing a mini Netflix Prize for further improving
our algorithm :-).

We've also done a lot of tweaking to get the current filter as smart as
possible (thanks PG for your thoughts in A Plan for Spam!) Tweaked things like
the number of interesting words to count, weighting of title vs body, etc.
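
For readers wondering what those knobs might look like, here is a minimal sketch in the spirit of A Plan for Spam, not Feedscrub's actual code. The class name, the defaults (`top_n=15`, `title_weight=2.0`), and the way the title weight is applied (scaling a word's log-odds) are all assumptions made for illustration; the weighting in particular is a heuristic on top of plain naive Bayes, and class priors are left out.

```python
import math
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z0-9']+")


def words(text):
    """Lowercase the text and return its distinct words."""
    return set(WORD_RE.findall(text.lower()))


class InterestFilter:
    def __init__(self, top_n=15, title_weight=2.0):
        self.top_n = top_n                # how many "interesting" words to count
        self.title_weight = title_weight  # extra weight for words in the title
        self.liked = Counter()            # word -> number of liked posts containing it
        self.disliked = Counter()         # word -> number of disliked posts containing it
        self.n_liked = 0
        self.n_disliked = 0

    def train(self, title, body, liked):
        tokens = words(title) | words(body)
        if liked:
            self.liked.update(tokens)
            self.n_liked += 1
        else:
            self.disliked.update(tokens)
            self.n_disliked += 1

    def token_prob(self, token):
        """Estimate P(liked | token), clamped away from 0 and 1."""
        good = self.liked[token] / max(self.n_liked, 1)
        bad = self.disliked[token] / max(self.n_disliked, 1)
        if good + bad == 0:
            return 0.5  # unseen word: no evidence either way
        return min(0.99, max(0.01, good / (good + bad)))

    def score(self, title, body):
        """Combine the top_n words farthest from 0.5 into one probability."""
        title_words = words(title)
        evidence = []
        for token in title_words | words(body):
            p = self.token_prob(token)
            weight = self.title_weight if token in title_words else 1.0
            evidence.append((abs(p - 0.5), weight * math.log(p / (1 - p))))
        evidence.sort(reverse=True)
        log_odds = sum(term for _, term in evidence[: self.top_n])
        return 1 / (1 + math.exp(-log_odds))


f = InterestFilter()
f.train("Python tips", "python generators explained", liked=True)
f.train("Celebrity gossip", "celebrity news today", liked=False)
```

After those two training posts, `f.score("More python", "generators")` comes out above 0.5 and `f.score("celebrity", "gossip news")` below it; a post made entirely of unseen words scores exactly 0.5.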

We'd love to talk with some machine learning experts, drop me a note if you
are one!

~~~
Prrometheus
I'm not an expert, just interested in the subject.

I thought of an idea, and this is completely fresh so take it with a grain of
salt, but you could use social data from all users subscribed to the same feed
to modify the prior probability that any given post was interesting, and then
use each individual's data to modify the conditional posterior probability
terms.

I'm not sure how effective it would be, but it would create a hybrid sort of
classifier that might be interesting.
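
A small sketch of that hybrid, under my own reading of the idea: the prior comes from how all subscribers of the feed rated the post, and the individual's trained filter contributes a log-likelihood ratio. The function names and the smoothing pseudo-counts are hypothetical.

```python
import math


def social_prior(likes, dislikes, pseudo_likes=1.0, pseudo_dislikes=1.0):
    """Smoothed fraction of subscribers who liked the post.

    The pseudo-counts keep the prior at 0.5 when nobody has voted yet
    and stop it from ever reaching exactly 0 or 1.
    """
    total = likes + dislikes + pseudo_likes + pseudo_dislikes
    return (likes + pseudo_likes) / total


def posterior(prior, log_likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    log_likelihood_ratio is log(P(words | liked) / P(words | disliked)),
    assumed to come from the individual user's own classifier.
    """
    log_odds = math.log(prior / (1 - prior)) + log_likelihood_ratio
    return 1 / (1 + math.exp(-log_odds))


# Eight of eight other subscribers liked the post: the prior is 0.9.
crowd = social_prior(likes=8, dislikes=0)
# This user's own history mildly disagrees (likelihood ratio of 1/9),
# which pulls the posterior back to an even 0.5.
personal = posterior(crowd, math.log(1 / 9))
```

The appeal is that a new user with no training data still gets the crowd's opinion, while a well-trained personal filter can override it.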

I'm studying natural language processing this quarter, so I'll let you know in
three months if I come up with something more clever.

------
PStamatiou
just some unrelated feedback about how i have been dealing with this feed
information overload issue for now: i use feedly which utilizes among other
things google reader friends.. so pretty much i rely on hundreds of other
people to decide for me what is interesting and i check the stories they
upvote/recommend/share

------
soundsop
I think it's an interesting idea. I track about 200 feeds, but probably look
at only a small percentage of the stories. I definitely would appreciate help
in blocking out the least interesting stories.

That said, limiting the free account to only 3 feeds makes it difficult to
judge the site's usefulness. I guess that I would probably want to try the
site out with at least 50 feeds to see if I wanted to pay for the ability to
track even more feeds.

Consider that it may be more valuable to give out fewer invitations but raise
the number of free feeds per user.

~~~
pclark
if you use google reader you can attain this with the postrank plugin in
firefox.

~~~
timdorr
PostRank isn't personalized, which is the big difference with Feedscrub. FS
gives you unique results depending on what you like, not just based on how
much people are talking about something.

------
pclark
no word on how many paid accounts. Seems expensive for what it is.

~~~
ardell
Hey pclark, Feedscrub co-founder here. We're still waiting to see what our
conversion numbers will look like. I imagine it'll take users a few days to
train the filter and see how it works with their feeds. We're thrilled to have
so many new trial users already!

~~~
PStamatiou
hey jason - haven't replied to your email yet but.. you guys going to ATL
Tweetup tonight? I think it's in highlands. there's a page on facebook

