
Building a Recommendation Engine - bitsweet
http://blog.assembly.com/Building-a-Recommendation-Engine/
======
jchung
For those interested in implementing other similar systems, the O'Reilly book
on Collective Intelligence is exceptional:
[http://shop.oreilly.com/product/9780596529321.do](http://shop.oreilly.com/product/9780596529321.do)

~~~
mandeepj
It is a good book for starters, but it talks a lot about tags. It was
published in 2007, so it is a bit out of date.

------
tie_
Kudos for publishing your approach to recommendations!

Relying on users to explicitly mark objects is not the best approach, though.
Each user has their own idea of what the best marks/weights would be,
hence the input would be inconsistent and the recommendation results would
suffer. Not to mention that manual marking is tedious, and as you pointed out
- it is hard to apply to old content.

I would suggest using a system that can extract the marks automatically.
A couple of years ago I was involved in a project that had such a component
(NeuralBrother's Neugs), but there are open-source projects that provide
similar functionality as well.

~~~
ehurrell
I agree, although with enough users this subjective marking effect can be
somewhat mitigated. I'll give an example: Steam recently allowed people to tag
games, which led to a lot of 'hardcore gamers' tagging games they didn't like
because they had no explicit gameplay as 'Walking Simulator' (Gone Home, Dear
Esther, etc.). This tag is meant as an insult, but for people who like games
like this it is actually useful: it's a grouping of games that are interesting
to similar people, which is the knowledge Valve presumably wanted to gather.

I would suggest a mix of automated and manual tagging: use both, and either
show the automated tags or curate the text shown. It should be easy enough to
cluster the tags too: one man's 'walking simulator' is another's 'story game'
or similar.

------
brandonr05
Normalizing the user preference vector is not necessary in this scheme. The
magnitude of the preference vector is a constant scale on all of the
comparisons to possible recommendable entities, so it drops out.
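
To illustrate the point, here is a minimal sketch, assuming entities are ranked by the dot product between the user preference vector and each entity's Mark vector. The entity names and numbers are made up for the example and are not from the post.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank(user, entities):
    """Return entity names sorted by descending dot-product score."""
    return sorted(entities, key=lambda name: dot(user, entities[name]), reverse=True)


# Hypothetical entity Mark vectors (illustrative values only).
entities = {
    "bounty_a": [0.9, 0.1, 0.0],
    "bounty_b": [0.2, 0.7, 0.1],
    "bounty_c": [0.0, 0.3, 0.95],
}

# An un-normalized user preference vector and its unit-length version.
user = [3.0, 1.0, 0.5]
norm = sum(x * x for x in user) ** 0.5
user_unit = [x / norm for x in user]

ranking_raw = rank(user, entities)
ranking_unit = rank(user_unit, entities)
```

Dividing the user vector by its norm divides every score by the same positive constant, so `ranking_raw` and `ranking_unit` come out in the same order. Normalizing the entity vectors is a different matter, since each entity has its own magnitude.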

Overall, this is a reasonable starting point for recommending user actions.

When you have more data, you can try using collaborative filtering to estimate
user preferences more robustly from sparse data. You could also try optimizing
user and entity parameters by maximizing a likelihood function on followed
recommendations composed of probabilities regressed from a function of the
user and entity Mark vectors.
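
One possible reading of the likelihood idea, as a rough sketch: assume the probability that a user follows a recommendation is a logistic function of the dot product between the user vector and the entity's Mark vector, then fit the user vector by gradient ascent on the log-likelihood. The logistic form, the fixed entity vectors, the learning rate, and the data are all assumptions made for illustration. Optimizing entity parameters as well would follow the same pattern.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(user, observations):
    """Sum of log-probabilities of the observed follow / no-follow outcomes."""
    total = 0.0
    for entity, followed in observations:
        p = sigmoid(dot(user, entity))
        total += math.log(p) if followed else math.log(1.0 - p)
    return total


def fit_user(observations, dim, steps=200, lr=0.5):
    """Gradient ascent on the log-likelihood, with entity vectors held fixed."""
    user = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for entity, followed in observations:
            err = (1.0 if followed else 0.0) - sigmoid(dot(user, entity))
            for i, x in enumerate(entity):
                grad[i] += err * x
        user = [u + lr * g for u, g in zip(user, grad)]
    return user


# Hypothetical history: (entity Mark vector, was the recommendation followed?)
observations = [
    ([1.0, 0.0], True),
    ([0.9, 0.1], True),
    ([0.8, 0.2], False),
    ([0.0, 1.0], False),
    ([0.1, 0.9], False),
    ([0.2, 0.8], True),
]

fitted = fit_user(observations, dim=2)
baseline = log_likelihood([0.0, 0.0], observations)
improved = log_likelihood(fitted, observations)
```

With this toy history the fitted vector ends up weighting the first mark positively and the second negatively, and `improved` is higher than `baseline`.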

~~~
barisser
You're right about normalizing the user preference vector: not strictly
necessary. Still, if we're normalizing bounty vectors, it feels right to
normalize everything :).

Another variation that I was considering is to generate 'user clusters'. In
Mark Vector Space, divide user vectors into N groups such that the net
variance across all clusters is minimized. Then, when a user for whom there
is sparse data needs contextual information from other users, I could simply
ask how correlated he is to the different clusters. If each cluster's 'center
of mass' is a vector, the dot product between a new user and the different
cluster vectors could be informative in reconstructing suggestions: the idea
being to infer from similar users what a particular user might want.
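
A rough sketch of the clustering idea, assuming plain k-means (Lloyd's algorithm) as the way to split users into N groups. The comment does not name a specific algorithm, k-means only finds a local minimum of the within-cluster variance, and the vectors below are invented for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean, i.e. the 'center of mass' of a cluster."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            groups[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def cluster_affinity(user, centroids):
    """Dot product of a user vector with each cluster's center of mass."""
    return [dot(user, c) for c in centroids]


# Hypothetical user vectors in a 2-dimensional Mark Vector Space.
users = [
    [1.0, 0.0], [0.0, 1.0], [0.9, 0.1],
    [0.1, 0.9], [0.8, 0.2], [0.2, 0.8],
]
centroids = kmeans(users, k=2)

# A new user with sparse data: see which cluster they correlate with most.
new_user = [0.7, 0.1]
affinity = cluster_affinity(new_user, centroids)
best = max(range(len(centroids)), key=lambda i: affinity[i])
```

Here `best` indexes the cluster whose members could be used to fill in suggestions for `new_user`. In practice the seeding would be randomized and the number of clusters tuned.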

I was also wondering whether adding a stochastic component to each user-vector
would be interesting.

Thanks for the feedback.

------
JacobiX
Thank you for the post. I just want to add that there are some machine
learning approaches that automatically estimate the marks for objects.
Shameless self-promotion:
[https://github.com/GHamrouni/Recommender](https://github.com/GHamrouni/Recommender)
It is a C library for product recommendations/suggestions using collaborative
filtering (CF).

------
vishalzone2002
I like the approach explained by Etsy engineers in the blog post
[https://codeascraft.com/2014/11/17/personalized-recommendations-at-etsy/](https://codeascraft.com/2014/11/17/personalized-recommendations-at-etsy/)

------
java-man
Check out [http://cortical.io](http://cortical.io) - their idea might offer
better results.

------
slaction
So basically they used tags, welcome to 2004

