
"Find me stuff I'm interested in" - davewiner
http://scripting.com/stories/2011/02/12/findMeStuffImInterestedIn.html
======
zdw
So, the plan here is:

1\. Read someone's feeds, twitter, etc. and find what they link to

2\. Find other things the person hasn't seen that might be related to what
they like, and deliver it to them as a set of suggestions.

3\. Profit.

#1 and #3 seem easy... #2 seems hard, as there are plenty of things I see in
my own incoming streams (twitter, RSS, etc.) that are curated by those
sources, but that I'm not interested in. Similarly, finding on-topic sources
that I haven't already followed would be difficult after a while - there are
only so many people discussing any one topic at any depth (especially if it's
new and tech related).

This sounds like a "Netflix Challenge" difficulty problem, except with a much
wider, shallower, and more capricious dataset.

~~~
davewiner
My feed shows you what I'm interested in, not just what I link to. There are
words in the headlines, and in my descriptions of the stories. Those words are
a goldmine.

You also have 2500 other people's feeds, who are using the same service. You
have all kinds of ways of matching my interests against the stories they push
through the system. This isn't new territory. It's how Amazon's algorithms
work. They know that people who are interested in 3TB hard drives are also
interested in (or at least like) Snickers bars. They know this because they
have the stats in the database that correlate the two.

In other words, 2 is easier than you think. Lots of prior art.
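
A minimal sketch of the word-matching idea, assuming each feed item is just a
headline string: build a bag of words from the headlines one user has pushed,
then rank other users' stories by cosine similarity to it. The stopword list,
the function names, and the sample headlines are all made up for illustration;
this is not a description of how any real service does it.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not derived from real data).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "for", "and", "is", "on", "with"}

def words(text):
    """Lowercase word counts for a headline, minus stopwords."""
    return Counter(w for w in re.findall(r"[a-z0-9]+", text.lower())
                   if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_candidates(my_headlines, candidate_headlines):
    """Rank candidate headlines by similarity to the words in my own feed."""
    profile = Counter()
    for headline in my_headlines:
        profile.update(words(headline))
    scored = [(cosine(profile, words(c)), c) for c in candidate_headlines]
    return sorted(scored, reverse=True)

# Hypothetical sample data.
mine = ["RSS readers and the open web", "Why RSS feeds still matter"]
others = ["New RSS reader for the open web", "Snickers bars on sale"]
ranked = rank_candidates(mine, others)
```

Here `ranked` puts the RSS story first and gives the unrelated one a score of
zero. Note that "reader" and "readers" count as different words in this
sketch; a real attempt would probably need stemming at the very least.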

~~~
jimminy
I agree with zdw in that it's very much like the Netflix Challenge, in the way
of focus.

If you don't narrow enough, on specific topics, you let irrelevant data in to
the results, if you narrow too much, you cut out posts that were tangential
but possibly even more relevant.

Also, comparing this type of tracking to either Netflix's or Amazon's
recommendations is hard, because they use a fairly static set of items to
analyze, and thus don't suffer from the long-tail analysis that would need to
occur in a constantly updated set of data. For a basic search recommendation
tool, this isn't as hard, but trying to pick up and determine someone's
interests, without over- or undercutting the topics, in a constantly changing
set is very hard.

You also have to deal with shifts in user interest. Amazon gets by with only
having to recommend for the latest few entries, in most cases. To provide
recommendations of new data for a user, you have to be able to refocus as the
user's interests change.
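
One common way to follow drifting interests is to let old signals fade. A
rough sketch, assuming each signal is a (topic, day) pair and that a 30-day
half-life is reasonable; both the data shape and the half-life are
assumptions picked for illustration, not something any of the services
mentioned here are known to use.

```python
from collections import defaultdict

def interest_profile(events, now, half_life_days=30.0):
    """Map topic -> weight, where each event counts for half as much
    every `half_life_days` days. `events` holds (topic, day) pairs."""
    weights = defaultdict(float)
    for topic, day in events:
        age = now - day
        weights[topic] += 0.5 ** (age / half_life_days)
    return dict(weights)

# Hypothetical history: two old "python" links, two recent "photography" ones.
events = [("python", 0), ("python", 10), ("photography", 85), ("photography", 90)]
profile = interest_profile(events, now=90)
```

Both topics have two events, but `profile["photography"]` ends up well above
`profile["python"]` because the photography links are recent.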

Also, as far as the words being a goldmine: not always. Are they a great
resource? Yes, but only slightly more than the source that the user likes.
Sources are great for initial narrowing; words are not.

Two is not really easier, even with prior art. The prior art is great to look
at for inspiration and ideas, but when it comes down to it, it's a very
different set of requirements.

~~~
jamesbritt
"If you don't narrow enough, on specific topics, you let irrelevant data in to
the results, if you narrow too much, you cut out posts that were tangential
but possibly even more relevant."

I like Amazon's recommendation engine, but I'm surprised when I check out
suggested future releases and get no more than a single page. I'm interested
in a whole lot of things, and I think my purchase history and such would
support that, but for whatever reason Amazon sometimes doesn't show me very
much.

I'm guessing it's a side-effect of also marking some items as "not
interested", with the software unable to make reasonable distinctions between
likes and dislikes.

It's not uncommon, for example, to start getting recommendations for damn near
every new DVD release simply because I put a DVD into my wish list. As if
liking "a movie" == "liking movies" => "show all movies."

But if I then mark "Saw XII" as "not interested" it infers "does not like
movies." Or something; it's hard to tell.

The absence of metadata in my choices makes this quite hard. I like some rock
music, but probably dislike most of it. Same for electronica or folk or
whatever. Discerning just what I like about this or that artist or piece is
hard, even for me. Same goes for books and movies. There are clearly common
threads, but deducing them is non-trivial.

In many ways I like the same music and art I liked 20 years ago, but I don't
always like the same musicians or artists, nor do I always like hearing or
seeing things represented or presented in the same way. So, it's the same, but
different.

For example, I loved "Metal Box"-era Public Image Ltd., but do not want to
listen to "Metal Box" forever; I want to know what is the 2011 equivalent of
"Metal Box". But that requires understanding current musical norms and
conventions, not just the tonal and rhythmic qualities of a particular album.

The whole "When are two things the same?" question is at the root of AI, and a
seriously tough nut to crack.

~~~
gwern
> I like Amazon's recommendation engine, but I'm surprised when I check out
> suggested future releases and get no more than a single page. I'm interested
> in a whole lot of things, and I think my purchase history and such would
> support that, but for whatever reason Amazon sometimes doesn't show me very
> much.

My theory is that this is evidence that the engine is using
<http://en.wikipedia.org/wiki/Collaborative_filtering> \- the way I think of
it is as a giant Venn diagram. At the beginning, I fit in almost every one of
the little circles because the service is so ignorant about me. It will
recommend me stuff from just about any overlapping circle. But as I select and
anti-select items, more circles become invalid, and I slowly narrow myself
down into a cramped little circle of just the items I've decided about, until
finally no one else's selections overlap with my own and the service can't
find anyone similar to steal recommendations from to give to me.
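
A toy version of that narrowing, as a sketch only: it assumes ratings are +1
(selected) or -1 (anti-selected) and that another user "overlaps" with me only
if they never contradict me on an item we have both rated. Real collaborative
filtering uses weighted similarity rather than this strict match, and the
users and items below are invented.

```python
def compatible(me, other):
    """True if `other` never contradicts one of my ratings on a shared item."""
    return all(other[item] == rating
               for item, rating in me.items() if item in other)

def recommend(me, others):
    """Items liked (+1) by compatible users that I have not rated yet."""
    recs = set()
    for ratings in others.values():
        if compatible(me, ratings):
            recs.update(i for i, r in ratings.items() if r == 1 and i not in me)
    return recs

# Hypothetical users and their ratings.
others = {
    "ann": {"A": 1, "B": 1, "C": 1},
    "bob": {"A": 1, "B": -1, "D": 1},
    "cat": {"A": -1, "E": 1},
}

# My ratings as I select and anti-select more items.
steps = [
    {},                          # service knows nothing: everyone overlaps
    {"A": 1},                    # cat drops out
    {"A": 1, "B": -1},           # ann drops out
    {"A": 1, "B": -1, "D": -1},  # bob drops out: nobody left
]
sizes = [len(recommend(me, others)) for me in steps]
```

The recommendation pool shrinks from 5 items to 3, then 1, then 0 as the
ratings pile up, which is the cramped-little-circle effect.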

~~~
jamesbritt
Good observation. Also a bit sad, if true, as it suggests that anyone outside
the mainstream will get fewer recommendations.

Of course, it's tricky, and I appreciate an approach that takes into account
that not everyone who likes Foo will like the same things; other likes and
dislikes give important context. But I'd prefer a somewhat larger, if less
precise, set of suggestions over a small set.

------
evansolomon
The "personalized news service" idea is my default example for startups that
people keep trying and users keep ignoring. Entrepreneurs constantly want to
build this thing and it consistently falls on its face like few other ideas.

Maybe it's because they just haven't been good enough, and as soon as someone
cracks the code it will be great. However I think a much more likely answer is
that only a very, very small number of people have a news discovery problem.

~~~
davewiner
The difference between this approach and the ones that startups have been
trying (I see a lot of them) is that this idea will work. They've been trying
to create something that works for everyone. This approach only works for
people who actively forward or retweet links.

~~~
mckoss
I think you'll find lots of attempts at this problem. We tried to do this with
data from Faves.com, our social bookmarking service. In the end we shut down
our public recommendation service because we couldn't keep up with the
spammers.

Any recommendation engine is a honeypot for web marketing types. I love the
concept, but the scalable execution is very hard. I think the best sites
approach this by engaging a motivated community, like HN, Reddit, and the
original Digg. You need some zealous moderators to keep the spam at bay.

------
m0th87
I made a project that did news recommendations a few years ago. One major
problem I came across while developing it is that there's a trade-off between
accuracy and bias. The more accurate recommended articles are to a user's
taste, the more they reinforce biases that he or she has. Republicans like
right-leaning articles, Democrats like left-leaning articles, etc.

------
kaizenfury7
I've been trying to implement this exact idea: <http://www.blazingrails.com/>

Topics describe your interests, and you can subscribe to multiple topics. The
'Best Posts' feed is a list of recommendations ranked by a combination of
matching topics and points.
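
To illustrate what "a combination of matching topics and points" could look
like, here is a purely hypothetical scoring sketch. The formula, the weights,
and the post fields are assumptions made up for this example, not the site's
actual ranking.

```python
import math

def score(post, subscribed, topic_weight=1.0, points_weight=0.5):
    """Hypothetical score: matched-topic count plus damped points.
    Posts that match no subscribed topic score zero."""
    matches = len(set(post["topics"]) & subscribed)
    if matches == 0:
        return 0.0
    return topic_weight * matches + points_weight * math.log1p(post["points"])

# Invented sample posts.
posts = [
    {"title": "Rails caching tips", "topics": ["rails", "performance"], "points": 12},
    {"title": "Celebrity gossip", "topics": ["celebrities"], "points": 900},
    {"title": "Ruby 1.9 notes", "topics": ["ruby"], "points": 40},
]
subscribed = {"rails", "ruby", "performance"}

ranked = sorted(posts, key=lambda p: score(p, subscribed), reverse=True)
best = [p["title"] for p in ranked if score(p, subscribed) > 0]
```

With these numbers `best` lists the Rails post, then the Ruby post, and drops
the high-points off-topic one entirely.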

The end goal is that you start getting a feed of only content you're
interested in.

------
topcat31
Feels like trunk.ly has the data to be able to do this...

I don't know much about recommendation engines but seems fairly easy to be
able to see what links I share, see similar users and then recommend me things
I might have missed.
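
A rough sketch of that flow, assuming each user is just a set of shared links
and using Jaccard overlap as the similarity measure. The user names, links,
and function names are hypothetical, and this does not use the trunk.ly API.

```python
def jaccard(a, b):
    """Overlap between two sets of links: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def missed_links(user, shares, top_n=2):
    """Links shared by my most similar users that I have not shared,
    ordered by the summed similarity of the users who shared them."""
    mine = shares[user]
    neighbours = sorted(
        ((jaccard(mine, links), name)
         for name, links in shares.items() if name != user),
        reverse=True,
    )[:top_n]
    scores = {}
    for sim, name in neighbours:
        for link in shares[name] - mine:
            scores[link] = scores.get(link, 0.0) + sim
    return sorted(scores, key=lambda link: (-scores[link], link))

# Hypothetical sharing data.
shares = {
    "me":    {"example.com/a", "example.com/b", "example.com/c"},
    "alice": {"example.com/a", "example.com/b", "example.com/d"},
    "bob":   {"example.com/c", "example.com/e"},
    "carol": {"example.com/x", "example.com/y"},
}
suggestions = missed_links("me", shares)
```

Here `suggestions` holds alice's extra link first, then bob's; carol shares
nothing with me, so her links never show up.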

~~~
fraserharris
Exactly what I was thinking. They have an API to mine the links data.

<http://trunk.ly/developer/>

------
Stuk
I've used <http://paper.li/> to create a "newspaper" from my personal twitter
feed (<http://paper.li/Stuidge> ). I follow people I'm interested in, and
paper.li picks up their links and gives me an interesting daily summary.

This isn't quite the same idea as the OP, but I have found a lot of
interesting links this way.

------
thisisnotmyname
Google Reader has the "Recommended Items" feature that's pretty close to what
you're describing here.

------
loboman
This is what Popego used to do:
<http://techcrunch.com/2008/09/09/popego-tailors-the-social-graph-to-your-interests/>

------
samatman
Is this not exactly what the "Recommended Items" feed in Google Reader does?

------
joshu
I built this for delicious. Worked great but couldn't get staffing. Wah.

------
sanj
Facebook's Like button provides the data to do this for the rest of the world.

------
rcavezza
This reminds me of hunch.

~~~
davewiner
Yup, it's like hunch but for feeds.

------
mkramlich
I don't have a problem finding things I'd be interested in. Far opposite. I
have the problem of not having enough time, energy and money to act on all of
them. Therefore this feels like a solution without a problem.

------
NY_USA_Hacker
So, Amazon has:

"People who bought this book bought some of these other five books."

This 'recommendation' only requires simple counting and nothing like
'correlation'.
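
The counting really is about this simple. A sketch, assuming the only input
is a list of orders, each a list of item names; the orders and function names
are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(orders):
    """For each item, count how often every other item shares an order with it."""
    counts = {}
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_bought(item, counts, n=5):
    """The n items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(n)]

# Hypothetical order history.
orders = [
    ["3TB drive", "Snickers", "SATA cable"],
    ["3TB drive", "Snickers"],
    ["3TB drive", "SATA cable", "Snickers"],
    ["Snickers", "gum"],
]
counts = co_purchase_counts(orders)
top = also_bought("3TB drive", counts)
```

Raw counts like these favor whatever is popular overall, which is one of the
cons: an item that appears in most orders will show up next to almost
everything.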

The technique has some pros and cons. I can believe that the technique is
helpful for shoppers, but it's a bit too weak for a recommendation engine.

