
Why You Should Not Build a Recommendation Engine - sorenbs
http://www.datacommunitydc.org/blog/2013/05/recommendation-engines-why-you-shouldnt-build-one
======
lars
Disagree with most of this. You don't need a massive inventory for
recommendations to be useful, and you don't need a massive data set of usage
data to be able to do them. What really matters is how sparse the user x item
matrix is, and I know from experience that you can give ok recommendations
even in cases with extremely sparse data.

I also don't like the idea it takes mysterious, scary "hefty data science" to
be able to do recommendations.

\- If you're recommending based on one thing (e.g. "people who viewed this
also viewed.."), you'll be doing cosine similarity on the vectors of viewers
(i.e. the columns of the user x item matrix).

\- If you're recommending based on many things (e.g. "recommended for you"),
you'll be doing a matrix factorization of the user x item matrix. Pick SVD or
NMF, depending on how sparse the data is.

\- You probably won't be doing content-based recommendations without doing
breakthrough machine learning research. For a lot of content, no one really
knows how to do this.
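
A minimal sketch of the first bullet ("people who viewed this also viewed"), assuming a tiny binary user x item view matrix. The item names, the data, and the helper names (`column`, `cosine`, `also_viewed`) are made up for illustration; a real system would likely use a sparse matrix library rather than plain lists.

```python
from math import sqrt

# Toy user x item view matrix (rows = users, columns = items); 1 = viewed.
# Item names and data are hypothetical.
ITEMS = ["kettle", "teapot", "mug", "drill"]
VIEWS = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

def column(matrix, j):
    """Return column j of the matrix: the vector of viewers for item j."""
    return [row[j] for row in matrix]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def also_viewed(item, top_n=2):
    """Rank the other items by cosine similarity of their viewer vectors."""
    j = ITEMS.index(item)
    target = column(VIEWS, j)
    scores = [
        (other, cosine(target, column(VIEWS, k)))
        for k, other in enumerate(ITEMS)
        if k != j
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

print(also_viewed("kettle"))
```

Here "teapot" ranks above "mug" for "kettle" because it shares more viewers; the "recommended for you" case in the second bullet would factorize the same matrix instead of comparing columns pairwise.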

------
bsdpython
I think the key insight in this article is "A recommendation engine is a
feature (not a product)". As a side project I built an e-commerce
recommendation engine part time over 2-3 months. I didn't want to build out a
massive product database and I didn't want to ask users to fill out lengthy
profiles so I leveraged existing product and social APIs to handle both. It
wasn't very difficult to get something that worked reasonably well - though
competing solutions were mostly far worse so maybe I'm selling myself short.
In the end it was decently well received (had some press, good user feedback,
thousands of initial users) but it failed as a standalone business. Cost of
customer acquisition vs customer lifetime value just wasn't there. I've seen
no one else succeed in the space so I think the net takeaway is what I
referenced at the top - it's really more of a feature than a standalone
product. Building the world's best recommendation engine is really just a
piece of the puzzle.

------
solve
Directly giving the users predictions that are based on latent factors has a
huge problem -- is there ever really a case where the latent factors have very
high prediction ability, but an analyst can't simply see what's driving the
preferences by looking at those factors, and then create something better by
targeting that interaction directly?

Better to analyze the latent factors, and then use that analysis to gain
engineering-type insights into the structure of the problem, and target
specifically what's driving people to like something. Exactly as Netflix has
been doing in recent years. Exactly as people have done when writing books or
developing other entertainment content for centuries.

Just dumping unsupervised clustering on the end users is a poor technique that
should stay in the 1990s where it belongs.

~~~
numlocked
Is this true? Really interesting. Any links to write-ups from Netflix on how
their approach has shifted? Or from other big guys?

------
numlocked
I disagree. My company (ePantry) has only hundreds, not thousands, of distinct
products, but making recommendations is incredibly important. Showing users
the right product at the right time ensures they get the items they want, and
it moves the needle on average basket size.

I tried a number of off-the-shelf recommendation systems, none of which worked
for the comparatively dense matrix we have (since we only have a few hundred
products). I finally rolled one in-house, which took a few days of work
(thanks, redis!) and has a bunch of custom filters that OTS systems wouldn't
ever have.

I hate building stuff like this when OTS is available, but everything was
overkill, or too hard to configure, or initially gave bad results and required
too much black-box tweaking. It's been a success and was way easier than I
thought.

------
muglug
Title is a fudge – the article should really be titled "Why you should not
build a recommendation engine unless you have lots of existing users and a vast
amount of content". That message is a lot more common-sensical, from a stats
perspective.

------
mattmanser
This is wrong. Think of all the companies that start on day 1 with tens of thousands or
millions of products. Hotels, restaurants, gigs, events, posts, people, other
users, questions, articles, songs, etc.

Manually picking stuff is time consuming and might become stale quickly while
a basic recommendation engine will take you a day or two. The whole point is
to simply expose your consumer to more of your other products in the hope that
they engage with one. If you have a wide scope of products, presenting the
same tiny number of manual picks would be insane. Editors' picks only work if
you have a tiny number of products.

So build one if you want. And don't be afraid to make a slightly shit one
either, as it's often better than tying up your time manually picking stuff
out when you could be doing something more useful.

------
nitwit005
I have to disagree with the definition: "A recommendation engine is a feature
(not a product) that filters items by predicting how a user might rate them."

The feature bit is correct, but the rating bit might not be. People very often
give high ratings to things they are not interested in. People may highly rate
jewelry that they cannot afford, or give five-star ratings to movies
considered classics that they don't actually want to view.

What you are trying to optimize is ultimately purchasing, continued
subscription, ad views, or some other profit metric. That can be hard to
measure directly, so something like ratings can be a stand-in, but you may
accidentally optimize for something deeply undesirable, like recommending
to everyone expensive jewelry that they will never buy.
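
To make the gap between ratings and the profit metric concrete, here is a toy sketch. Everything in it is hypothetical: the catalogue, the predicted ratings, the purchase probabilities (`p_buy`), the prices, and the choice of expected revenue as the stand-in for "profit metric".

```python
# Hypothetical catalogue: predicted rating (1-5), estimated purchase
# probability for this user, and price. All numbers are invented.
CATALOGUE = [
    {"item": "diamond necklace", "rating": 4.9, "p_buy": 0.0001, "price": 5000.0},
    {"item": "classic film", "rating": 4.8, "p_buy": 0.01, "price": 4.0},
    {"item": "phone case", "rating": 3.9, "p_buy": 0.2, "price": 15.0},
]

def rank_by(key):
    """Return item names sorted from best to worst under the given scoring key."""
    return [row["item"] for row in sorted(CATALOGUE, key=key, reverse=True)]

# Ranking by predicted rating puts the unaffordable necklace first.
by_rating = rank_by(lambda row: row["rating"])

# Ranking by expected revenue (purchase probability times price) does not.
by_expected_revenue = rank_by(lambda row: row["p_buy"] * row["price"])

print(by_rating)
print(by_expected_revenue)
```

With these made-up numbers the two rankings disagree on the top item, which is the failure mode described above.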

------
dmichulke
If someone told me to build a recommendation engine, I'd basically use a kNN
approach.

You'd need to define

\- a similarity measure for users (based on, e.g., what they bought and what
they looked at), and then

\- you just recommend the top items the k nearest users bought (weighted by
distance), minus the items already bought by the user in question and with
explicit items filtered out.
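
The two steps above can be sketched roughly as follows. This assumes Jaccard overlap of purchase sets as the user similarity and uses that similarity directly as the neighbour weight; both are illustrative choices, and the data and function names are hypothetical.

```python
from collections import defaultdict

# Toy purchase history: user -> set of items bought. Made-up data.
PURCHASES = {
    "ann": {"soap", "sponge", "towel"},
    "bob": {"soap", "sponge", "brush"},
    "cat": {"soap", "towel", "candle"},
    "dan": {"drill", "saw"},
}

def jaccard(a, b):
    """Similarity of two item sets: |intersection| / |union| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, k=2, top_n=3, excluded=frozenset()):
    """Recommend items bought by the k most similar users, weighted by similarity."""
    mine = PURCHASES[user]
    neighbours = sorted(
        (
            (jaccard(mine, items), other)
            for other, items in PURCHASES.items()
            if other != user
        ),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        if sim == 0:
            continue  # a user with no overlap tells us nothing
        for item in PURCHASES[other]:
            # Skip what the user already owns and anything explicitly filtered out.
            if item not in mine and item not in excluded:
                scores[item] += sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

print(recommend("ann"))
print(recommend("dan"))
```

A user like "dan", who overlaps with nobody, gets no recommendations at all, which is the "starts really bad" phase in miniature.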

Of course, the system would start out really bad but it would get gradually
better. This sounds neither too complicated nor too simplistic, nor like
"heavy data science".

Unfortunately there are no real technical insights in the article justifying
why I shouldn't do this, so can anyone tell me why my approach would be bad or
good?

------
graycat
Nonsense. The OP sets up a straw man just to knock it down. The descriptions
of _recommendation engines_ are naive, uninformed, narrow, and not at all
comprehensive.

What can be done with good applied math for making _recommendations_ has
variety nearly beyond belief; the approaches outlined by the OP are, of
course, just silly.

For the _cold start problem_, that need not be very difficult -- just start
in niches.

------
ffn
In more mathematical terms, the value a rec engine provides you looks like:

value = k1 * users + k2 * products + k3 * products * users

Prioritize building your rec engine feature after you've built user
onboarding features and product onboarding features because without users and
products, rec engines contribute 0 value.
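
As a quick arithmetic check of the formula, here is a small sketch. The constants `k1`, `k2`, `k3` are not given in the comment, so the defaults below are placeholder assumptions chosen only to keep the numbers easy to follow.

```python
def rec_engine_value(users, products, k1=1, k2=1, k3=1):
    """value = k1 * users + k2 * products + k3 * products * users.

    The k constants are unknown in practice; the defaults are placeholders.
    """
    return k1 * users + k2 * products + k3 * products * users

# With no users and no products, every term vanishes.
print(rec_engine_value(users=0, products=0))

# The products * users term dominates once both sides have grown.
print(rec_engine_value(users=100, products=50))
```

The second call is dominated by the `k3 * products * users` term (5000 of the total 5150 under these placeholder constants), which is one way to read the advice to grow users and products first.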

------
wizard_class
I've been looking for an engine that can handle content-based recommendation.

~~~
thehal84
I might be able to help with that. email: hello at theenginuity.com

