
Building a Recommendation Engine with NumPy - J3L2404
http://software-carpentry.org/4_0/matrix/recommend/
======
physcab
I spent a great deal of time writing a recommendation engine for a major web
property, so I'll just offer some brief pointers for anyone interested in this
stuff:

\- If you know nothing about how recommenders might work, a good introductory
book is Programming Collective Intelligence.

\- To do recommendations at scale, your best bet is to write them in Java and
run them on Hadoop using MapReduce. You can write them in Python too, I
suppose, and use Hadoop Streaming.

\- NumPy is awesome. It's a great way to prototype your ideas, and if you come
from a Matlab background (as I did), it's very similar. I have not yet run
anything in production using NumPy though.

\- If you want a recommender that works out of the box, check out Apache
Mahout (used with Hadoop) or the Weka project.
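
To make the Hadoop Streaming option above concrete, here is a minimal sketch of a mapper and reducer that count how often two items are liked by the same user, a common building block for "people who liked X also liked Y". The input format (`user_id<TAB>item1,item2,...`) and the function names `mapper` and `reducer` are assumptions made for this illustration, not something from the comment. In a real streaming job each function would read `sys.stdin` and print its output; here they take and yield lines so they can be tried locally.

```python
from itertools import combinations, groupby


def mapper(lines):
    """Emit 'itemA,itemB<TAB>1' for every pair of items one user liked.

    Assumed input format (hypothetical): 'user_id<TAB>item1,item2,...'.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _user, items = line.split("\t", 1)
        # Sort so that (a, b) and (b, a) always produce the same key.
        unique_items = sorted(set(items.split(",")))
        for a, b in combinations(unique_items, 2):
            yield f"{a},{b}\t1"


def reducer(sorted_lines):
    """Sum the counts for each item pair; input must be sorted by key."""
    keyed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for pair, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{pair}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Local simulation: sorted() stands in for Hadoop's shuffle/sort phase.
    sample = ["u1\ta,b,c", "u2\tb,a", "u3\tc"]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```

The reducer relies on its input arriving sorted by key, which is what the framework's shuffle step provides between the map and reduce phases.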

~~~
withoutfriction
Is Ruby appropriate for writing recommendation engines? Or is there far more
support for writing them in Python?

~~~
drats
As far as I am aware Ruby has no real equivalent to NumPy and SciPy for
Python. This is why I swapped to Python from Ruby; the Ruby community gets
extremely thin indeed as soon as you stray from web-related things.

------
stcredzero
Others in these comments have discussed or questioned if Python/NumPy is the
best language for recommendations, due to scaling/speed. What if there was a
project to translate Python libraries to run on LuaJIT? I suspect a lot of
work could be done using syntax-directed automated translations, and there's
also Lunatic Python as a fallback, in case a needed Python library hasn't been
translated yet.

Does this seem to be a worthwhile project?

------
rmc
Is NumPy the best way to do recommendation engines in Python? Has anyone done
a recommendation library?

~~~
jchonphoenix
NumPy is used in rec engines because it has an extremely fast matrix library.
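
As an illustration of why the matrix library matters, here is a minimal sketch of an item-based recommender in which all pairwise item similarities come from a single matrix product. The ratings matrix is made up, and the names `item_similarity` and `recommend` are hypothetical helpers for this example; it is one simple approach (cosine similarity), not a description of any particular production system.

```python
import numpy as np

# Toy ratings matrix (made up for illustration): rows are users,
# columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [1, 0, 4, 5],
], dtype=float)


def item_similarity(r):
    """Cosine similarity between item columns, via one matrix product."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unrated items
    sim = (r.T @ r) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)  # an item should not vote for itself
    return sim


def recommend(r, user, top_n=2):
    """Rank the items a user has not rated by similarity-weighted score."""
    scores = item_similarity(r) @ r[user]
    scores[r[user] > 0] = -np.inf  # hide items the user already rated
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]


if __name__ == "__main__":
    print(recommend(ratings, user=0))
```

For user 0, who rated only the first two items, the two unrated items are ranked by how similar they are to what that user already rated. Treating 0 as "not rated" inside the cosine computation is a simplification; a real system would likely handle missing ratings more carefully.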

If you're doing non-matrix (or representing a matrix without a 2D array)
recommendation engines then NumPy could be completely useless.

I'm doing research on this exact problem at Carnegie Mellon and we are using a
graph to do things instead. We aren't using basic techniques like kNN however,
so that may have something to do with it. Instead, we have someone who has
done heavy research in submodularity and we're using an approximation
algorithm for the submodular function optimization problem.
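
For readers unfamiliar with submodular optimization, here is a minimal sketch of the classic greedy approximation on a set-coverage objective. This is an assumption chosen for illustration, not the algorithm the research group above actually uses, and the paper and topic data are hypothetical. Coverage is a monotone submodular function, and for such functions under a size limit the greedy choice is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
def greedy_max_coverage(candidates, budget):
    """Pick up to `budget` candidates, each time taking the one that adds
    the most not-yet-covered elements (the largest marginal gain)."""
    covered = set()
    chosen = []
    remaining = dict(candidates)  # copy so the caller's dict is untouched
    for _ in range(budget):
        best = max(
            remaining,
            key=lambda name: len(remaining[name] - covered),
            default=None,
        )
        if best is None or not (remaining[best] - covered):
            break  # nothing left adds new coverage
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered


# Hypothetical data: papers mapped to the topics they cover.
papers = {
    "p1": {"ml", "graphs", "ranking"},
    "p2": {"ml", "graphs"},
    "p3": {"databases", "ranking"},
    "p4": {"hci"},
}

if __name__ == "__main__":
    print(greedy_max_coverage(papers, budget=2))
```

The point of the diminishing-returns structure is visible in the second pick: "p2" looks strong on its own but adds nothing once "p1" has been chosen, so the greedy step skips it.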

~~~
calanya
Sounds very interesting. Can you point to a paper in the area?

~~~
jchonphoenix
I help Khalid and Carlos with their research (I'm an undergrad). Here's
Khalid's thesis proposal. The paper recommendation problem is section 2.2
(Beyond Keyword Search).

<http://www.cs.cmu.edu/~kbe/proposal.pdf>

------
danbmil99
Please indicate when a post is a video.

~~~
J3L2404
This is their new layout, which I like very much, because you don't have to
watch the video and can just scroll down for each screenshot of code on the
left and the transcript on the right. I hope more sites adopt this style.

