

Ask HN: Addressing cold start problems in recommender systems? - apurva

Hi All,
I have been working on a recommender engine for a while now and have stumbled across the cold start problem. The problem is that whatever data I collect is only an indicator of the user's likes (e.g., if browsing history is taken as a source, the basic assumption is that people don't browse for what they don't like).
So in such a case, any ideas as to how I can train the system initially for dislikes? I do know that the system will gradually tune itself to the user's preferences with continuous feedback, but I would not like the first run to be very erratic because of randomly chosen dislikes.
Any ideas, folks?
Any help in the matter is greatly appreciated.
======
nostrademons
Many machine-learning systems get bootstrapped by their implementer sitting at
a website clicking "Like" and "Dislike" buttons for a large randomly-chosen
sample of possible data.

If this strikes you as incredibly boring, you can farm it out with Amazon
Mechanical Turk or other crowdsourcing schemes. You could also do cleverer
variants of this, like putting image-recognition or OCR training sets into
CAPTCHAs, submitting possible links to Reddit or Digg, or hosting Internet
surveys with the questions of interest.

~~~
apurva
But the whole idea is that everyone has their own notion of likes and
dislikes. Am I missing something here?

~~~
nostrademons
That's why recommendation systems are hard. :-)

You could try to identify a population of users whose likes and dislikes are
expected to be "similar" to the user in question, though, and then base your
training set off them. I believe that's how actual recommendation engines (e.g.
Amazon, YouTube) work. Of course, then you have to figure out how to identify
similar users, which is another hard problem.
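
As a rough illustration of the "similar users" idea, here is a minimal sketch
that uses cosine similarity over sparse ratings and then borrows dislikes from
the nearest neighbours. The user names, item ids, the +1/-1 rating scale and
both helper functions are made up for the example; nothing here is claimed to
be what Amazon or YouTube actually run.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def most_similar(target, others, k=2):
    """Return up to k user names with positive similarity to `target`."""
    scored = sorted(
        ((cosine(target, r), name) for name, r in others.items()),
        reverse=True,
    )
    return [name for score, name in scored[:k] if score > 0]

def borrowed_dislikes(target, others, k=2):
    """Collect items the nearest neighbours disliked that `target` has not seen."""
    seen = set(target)
    out = set()
    for name in most_similar(target, others, k):
        out |= {i for i, r in others[name].items() if r < 0 and i not in seen}
    return out

# Hypothetical data: +1 = like, -1 = dislike.
ratings = {
    "alice": {"a": 1, "b": 1, "c": -1},
    "bob": {"a": 1, "b": 1, "d": -1},
    "carol": {"c": 1, "e": 1},
}
new_user = {"a": 1, "b": 1}  # only likes are known for the new user

neighbours = most_similar(new_user, ratings, k=2)
seed_dislikes = borrowed_dislikes(new_user, ratings, k=2)
```

With this toy data the new user's neighbours are alice and bob, so the seeded
dislikes are the items those two rated negatively. Real systems would need far
more data and a smarter notion of similarity.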

------
what
I read somewhere about a recommender system for movies (I think) and what they
did is force a user to rate 5 random movies as part of the registration
process. The movies weren't entirely random, but ones they thought were
significant in identifying a user's tastes.

In your example of browsing patterns, maybe you could ask new users whether
they like to read certain types of articles, e.g. are you interested in
technology, sports, entertainment, random pictures of cats, etc., and seed their
profiles based on their expressed level of interest in those things (maybe
including dislikes from people who claimed to have similar interests).
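
A rough sketch of what that seeding could look like, assuming a signup survey
where each topic is rated from 1 (not interested) to 5 (very interested). The
scale, the topic names and both function names are invented for illustration:

```python
def seed_profile(answers):
    """Map 1-5 survey answers to weights in [-1, 1] (3 is neutral)."""
    return {topic: (score - 3) / 2 for topic, score in answers.items()}

def score_article(profile, topics):
    """Average the seeded weights of an article's topics.

    Topics the user was never asked about count as neutral (0.0).
    """
    if not topics:
        return 0.0
    return sum(profile.get(t, 0.0) for t in topics) / len(topics)

# Hypothetical survey answers for a new user.
answers = {"technology": 5, "sports": 1, "entertainment": 3}
profile = seed_profile(answers)

tech_score = score_article(profile, ["technology", "entertainment"])
sports_score = score_article(profile, ["sports"])
```

Low survey answers end up as negative weights, so the profile starts with some
dislike signal before any browsing data exists.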

But I would think that dislikes are not so important in the beginning.
Although I don't know how your algorithm works, if you have a rough idea of
what a person likes, shouldn't you be able to recommend things that they might
like just based on that? When you end up recommending something that they
don't like, you'll get some dislike data and can start factoring that in.

------
AmberShah
My startup faces a variant of this problem. Not exactly a recommender system,
but it will get more accurate as time goes on. I am compensating by putting an
initial value that is a guess, and then it will adjust as time goes on. Sort
of messy but necessary.
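
One common way to do "start with a guess, then adjust" is to treat the guess
as if it were a handful of earlier observations and average real observations
into it. This is only a sketch of that general idea, not a description of the
startup's actual code; the class name and the `prior_weight` parameter are
assumptions:

```python
class SmoothedEstimate:
    """Running average that starts from a guessed prior value."""

    def __init__(self, prior, prior_weight=5.0):
        # The guess counts as `prior_weight` pseudo-observations.
        self.total = prior * prior_weight
        self.weight = prior_weight

    def update(self, observation):
        """Fold one real observation into the estimate."""
        self.total += observation
        self.weight += 1.0

    @property
    def value(self):
        return self.total / self.weight

est = SmoothedEstimate(prior=0.5, prior_weight=5.0)
initial = est.value
for _ in range(5):
    est.update(1.0)  # five real observations, all 1.0
after_five = est.value
```

A larger `prior_weight` makes the estimate trust the guess for longer; a
smaller one lets real data take over sooner.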

It sounds like you're saying that in your case, the value will be different
for each person, so you don't have a way of seeding it correctly for different
people. I sort of think you have to have SOME info to go off of. Sort of like
how Hunch asks you questions, maybe you could do something like that?

~~~
apurva
Well, here is what I am doing instead. I only get a notion of the user's
likes, and since "everything else" is too huge a domain to treat as dislikes,
I rank the user's liked concepts and pick the lowest-ranked ones as the ones
they like less. This is only for initialization; after that I leave the
filter to tune itself through feedback. I know it's not the best approach in
the world, but let me see how this shapes up. Thanks, of course, and any other
ideas are still welcome!
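
In case it helps anyone, a minimal sketch of that initialization, assuming the
ranking is just how often each concept shows up in the browsing history. The
concept names, the counts, the `bottom_fraction` cutoff and the function name
are all made up for the example:

```python
from collections import Counter

def split_by_rank(concept_counts, bottom_fraction=0.25):
    """Split liked concepts into (stronger, weaker) by frequency rank.

    The weaker group stands in for "liked less" until real feedback arrives.
    At least one concept always lands in the weaker group.
    """
    ranked = [c for c, _ in Counter(concept_counts).most_common()]
    n_low = max(1, int(len(ranked) * bottom_fraction))
    return ranked[:-n_low], ranked[-n_low:]

# Hypothetical concept frequencies from one user's browsing history.
counts = {"python": 40, "startups": 25, "databases": 10, "gardening": 2}
stronger, weaker = split_by_rank(counts)
```

Here "gardening" ends up as the weak like that seeds the filter, and
everything else stays a normal like.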

