Hacker News
Building a Recommendation Engine (assembly.com)
69 points by bitsweet on Jan 8, 2015 | hide | past | favorite | 10 comments



For those interested in implementing other similar systems, the O'Reilly book on Collective Intelligence is exceptional: http://shop.oreilly.com/product/9780596529321.do


It's a good book for beginners, but it talks a lot about tags. It was published in 2007, so it's a bit out of date.


Kudos for publishing your approach to recommendations!

Relying on users to explicitly mark objects is not the best approach, though. Each user has their own idea of what the best marks/weights would be, so the input would be inconsistent and the recommendation results would suffer. Not to mention that manual marking is tedious, and as you pointed out, it is hard to apply to old content.

I would suggest using a system that can extract the marks automatically. A couple of years ago I was involved in a project that had such a component (NeuralBrother's Neugs), but there are open-source projects that provide similar functionality as well.


I agree, although with enough users this subjective-marking effect can be somewhat mitigated. I'll give an example: Steam recently allowed people to tag games, which led to a lot of 'hardcore gamers' tagging games they didn't like for having no explicit gameplay as 'Walking Simulator' (Gone Home, Dear Esther, etc.). The tag is meant as an insult, but for people who like games like this it is actually useful: it's a grouping of games that appeal to similar people, which is presumably the knowledge Valve wanted to gather.

I would suggest a mix of automated and manual tagging: use both, and either show the automated tags directly or curate the text that gets displayed. It should be easy enough to cluster the tags too; one man's 'walking simulator' is another's 'story game' or similar.
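One way to cluster synonymous tags like this is by co-occurrence: tags applied to the same games get a high cosine similarity. A minimal sketch with an entirely hypothetical tag-by-game matrix:

```python
import numpy as np

# Hypothetical binary matrix: rows = tags, columns = games,
# 1 = users applied that tag to that game.
tags = ["walking simulator", "story game", "shooter"]
M = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)

# Cosine similarity between tag rows: high when two tags
# tend to be applied to the same set of games.
unit = M / np.linalg.norm(M, axis=1, keepdims=True)
sim = unit @ unit.T

# 'walking simulator' and 'story game' co-occur on the same games,
# so they are far more similar to each other than to 'shooter'.
assert sim[0, 1] > sim[0, 2]
```

With real data you'd threshold or run a proper clustering pass on this similarity matrix, but the co-occurrence signal is the core of it.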


Normalizing the user preference vector is not necessary in this scheme. The magnitude of the preference vector is a constant scale on all of the comparisons to possible recommendable entities, so it drops out.
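To make the point concrete: scaling the user vector by any positive constant scales every item's score by that same constant, so the ranking of candidate entities is unchanged. A quick sketch with made-up vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
user = rng.random(5)             # hypothetical user preference vector
items = rng.random((10, 5))      # hypothetical entity "mark" vectors

scores_raw = items @ user                                 # unnormalized
scores_norm = items @ (user / np.linalg.norm(user))       # normalized

# Same ordering either way: normalization drops out of the comparison.
assert (np.argsort(scores_raw) == np.argsort(scores_norm)).all()
```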

Overall, this is a reasonable starting point for recommending user actions.

When you have more data, you can try using collaborative filtering to estimate user preferences more robustly from sparse data. You could also fit user and entity parameters by maximizing the likelihood of followed recommendations, where each probability is regressed from a function of the user and entity Mark vectors.
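For the collaborative-filtering part, a minimal matrix-factorization sketch (plain SGD over observed entries; all data hypothetical, not the article's implementation):

```python
import numpy as np

# Toy user-by-entity interaction matrix; 0 means unobserved.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k = 2                                              # latent factors
rng = np.random.default_rng(42)
U = rng.normal(scale=0.1, size=(R.shape[0], k))    # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))    # entity factors

lr, reg = 0.01, 0.01
rows, cols = R.nonzero()
for _ in range(2000):                              # SGD on squared error
    for i, j in zip(rows, cols):
        err = R[i, j] - U[i] @ V[j]
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

# U @ V.T now fills in the unobserved cells as predictions.
pred = U @ V.T
train_err = np.abs(pred[rows, cols] - R[rows, cols]).mean()
```

The factorization reconstructs the observed ratings and, more usefully, produces dense user vectors even when each user's raw data is sparse.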


You're right about normalizing the user preference vector: it's not strictly necessary. Still, if we're normalizing bounty vectors, it feels right to normalize everything :).

Another variation I was considering is to generate 'user clusters'. In Mark Vector Space, divide the user vectors into N groups such that the net variance across all clusters is minimized. Then, when a user for whom there is sparse data needs contextual information from other users, I could simply ask how correlated they are to the different clusters. If each cluster's 'center of mass' is a vector, the dot product between a new user and the different cluster vectors could be informative in reconstructing suggestions: the idea being to infer from similar users what a particular user might want.
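That scheme is essentially k-means over user vectors plus a dot product against the centroids. A rough sketch with hypothetical data (the `kmeans` helper here is an illustration, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
users = rng.random((100, 8))         # hypothetical user Mark vectors


def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: picks k centers minimizing within-cluster variance."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each user to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned users.
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels


centers, labels = kmeans(users, k=4)

# For a new sparse-data user, dot products against each cluster's
# 'center of mass' say which cohorts they resemble.
new_user = rng.random(8)
affinity = centers @ new_user
closest = int(np.argmax(affinity))
```

From there you could blend the closest clusters' aggregate preferences, weighted by affinity, to fill in the new user's missing signal.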

I was also wondering whether adding a stochastic component to each user-vector would be interesting.

Thanks for the feedback.


Thank you for the post. I just want to add that there are some machine learning approaches that automatically estimate the mark objects. Shameless self-promotion: https://github.com/GHamrouni/Recommender is a C library for product recommendations/suggestions using collaborative filtering (CF).


I like the approach explained by Etsy engineers in this blog post: https://codeascraft.com/2014/11/17/personalized-recommendati...


Check out http://cortical.io - their idea might offer better results.


So basically they used tags. Welcome to 2004.



