

Simple recommendation system written in Ruby - otobrglez
http://otobrglez.opalab.com/ruby/2014/03/23/simple-ruby-recommendation-system.html
Nothing fancy, just a simple tag/word-based recommendation algorithm implemented in Ruby.
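The approach the post describes, Jaccard similarity over tag sets, can be sketched in a few lines of Ruby (the tag arrays below are made up for illustration):

```ruby
require 'set'

# Jaccard similarity: |A ∩ B| / |A ∪ B|, here over tag sets.
def jaccard(tags_a, tags_b)
  a, b = Set.new(tags_a), Set.new(tags_b)
  return 0.0 if a.empty? && b.empty?
  (a & b).size.to_f / (a | b).size
end

jaccard(%w[ruby rails redis], %w[ruby redis search])
# => 0.5 (2 shared tags out of 4 distinct)
```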
======
manish_gill
Programming Collective Intelligence is an excellent book for learning these
sorts of things. The first chapter is a recommendation engine! :)

~~~
mazelife
I second this. And if you decide you want to go really in-depth on recommender
systems, I suggest taking a look at "Recommender Systems Handbook"
([http://www.springer.com/computer/ai/book/978-0-387-85819-7](http://www.springer.com/computer/ai/book/978-0-387-85819-7)).
It's basically a collection of scholarly articles on the topic, so the
approach is academic. But it's also the best resource I'm aware of for
understanding what's state-of-the-art across a range of aspects of recommender
systems. (Also, although the price is a somewhat hair-raising $179, there are
PDF copies of the whole thing floating around that are easy to find with a
Google search.)

------
emehrkay
I did something very similar in the past except I used Cosine Similarity
([http://en.wikipedia.org/wiki/Cosine_similarity](http://en.wikipedia.org/wiki/Cosine_similarity)).
It allowed me to give each tag a "weight", and when comparing the tag clouds, I
would zero out any tags that weren't found in both. It works really well.
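A minimal sketch of that idea, assuming tag clouds are hashes of tag => weight; tags absent from the other cloud simply contribute zero to the dot product:

```ruby
# Cosine similarity over weighted tag clouds (Hash of tag => weight).
# Tags missing from either cloud contribute zero to the dot product.
def cosine(a, b)
  dot = a.sum { |tag, w| w * b.fetch(tag, 0.0) }
  norm = ->(h) { Math.sqrt(h.values.sum { |w| w * w }) }
  na, nb = norm.(a), norm.(b)
  return 0.0 if na.zero? || nb.zero?
  dot / (na * nb)
end

cosine({ 'ruby' => 2.0, 'redis' => 1.0 }, { 'ruby' => 1.0, 'search' => 3.0 })
```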

------
wfn
Good stuff, and a nice writeup/explanation!

To impudently hijack the thread: for a very similar approach (Jaccard
similarity coefficient, Ruby) which has a nice abstracted implementation for
background workers, take a look at David Celis's 'recommendable'. Here's him
introducing the same system:
[http://davidcel.is/blog/2012/02/07/collaborative-filtering-with-likes-and-dislikes/](http://davidcel.is/blog/2012/02/07/collaborative-filtering-with-likes-and-dislikes/)
and the gem itself:
[http://davidcel.is/recommendable/](http://davidcel.is/recommendable/) I
believe it's been discussed on HN before.

Redis is used to store the binary votes and to compute similarity
coefficients. Since Redis is very good with set operations (intersections on
multi-million-member sets (and more) are crazy fast), it's quite the natural
choice for the db backend. One of the cases where a NoSQL solution seems to be
the right tool for the job, as a matter of fact!
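For illustration, here's a pure-Ruby (no Redis) sketch of the kind of likes/dislikes similarity such a system computes; the exact formula is an assumption, not recommendable's actual code:

```ruby
require 'set'

# Similarity over binary votes: agreements minus disagreements,
# normalized by every item either user has rated. Range: -1.0..1.0.
# (Formula is an illustrative assumption, not recommendable's exact code.)
def vote_similarity(likes_a, dislikes_a, likes_b, dislikes_b)
  la, da = Set.new(likes_a), Set.new(dislikes_a)
  lb, db = Set.new(likes_b), Set.new(dislikes_b)
  rated = la | da | lb | db
  return 0.0 if rated.empty?
  agree    = (la & lb).size + (da & db).size
  disagree = (la & db).size + (da & lb).size
  (agree - disagree).to_f / rated.size
end
```

In Redis proper, the set unions and intersections above map straight onto SUNIONSTORE / SINTERSTORE plus SCARD, which is why the set-operation speed matters.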

I've used recommendable (incl. in production code) in the past, it works very
well, is reliable, robust, and easily hackable for whatever needs. (e.g. it's
meant to integrate with Rails, but it's quite simple to make it work on
barebones ruby, with (e.g.) Sinatra as a lightweight web app exposing vote
functionality, and so on.)

~~~
otobrglez
Thanks for the reference. David's post looks awesome!

------
sixtosugarman
You could use Levenshtein distance to get better results by taking word
variations into account.

You could also enhance it by using semantic similarity scores for strings.
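A standard dynamic-programming Levenshtein implementation, which could be used to merge near-duplicate tags (e.g. simple misspellings) before computing set similarity:

```ruby
# Classic dynamic-programming Levenshtein distance between two strings.
# Keeps only the previous row, so memory is O(len(t)).
def levenshtein(s, t)
  prev = (0..t.length).to_a
  s.each_char.with_index(1) do |sc, i|
    curr = [i]
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      # insertion, deletion, substitution
      curr << [curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

levenshtein('recommend', 'recomend') # => 1
```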

~~~
mazelife
If you're willing to get into actual NLP, then semantic similarity would
certainly be one way to go. Is there any equivalent to Stanford (Java) or NLTK
(Python) in Ruby land? But I'm not sure that Levenshtein will necessarily get
you better results than the bag-of-words approach the author is taking with
Jaccard distance, if all you're doing is document classification.

~~~
jbranchaud
As far as NLP libraries in Ruby land go, there are both
[treat](https://github.com/louismullie/treat)
and [Ruby bindings to Stanford CoreNLP](https://github.com/louismullie/stanford-core-nlp).

~~~
otobrglez
I've used OpenNLP with JRuby for my NLP experiment. Check out
[https://github.com/otobrglez/politiki-ner](https://github.com/otobrglez/politiki-ner) to get an idea of how to mix them.

------
jamra
Here is a cool approach to the subject by LinkedIn:
[http://engineering.linkedin.com/open-source/cleo-open-source-technology-behind-linkedins-typeahead-search](http://engineering.linkedin.com/open-source/cleo-open-source-technology-behind-linkedins-typeahead-search)

Here is my HN-obligatory, self-written golang version:
[https://github.com/jamra/gocleo](https://github.com/jamra/gocleo)

I went with this author's approach of using Jaccard to rank the results;
however, I like this approach better:
[https://neil.fraser.name/writing/patch/](https://neil.fraser.name/writing/patch/)
It basically takes the distance to the beginning of the text into account.
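A toy illustration of that position-weighting idea (the decay constant and scoring function are made up for the example, not taken from Fraser's article):

```ruby
# Hypothetical scoring: rank substring matches higher when they occur
# closer to the start of the text, echoing the match-location idea.
def positional_score(text, query, decay: 0.1)
  idx = text.downcase.index(query.downcase)
  return 0.0 unless idx
  1.0 / (1.0 + decay * idx) # match at position 0 scores 1.0, later matches decay
end

positional_score('Ruby recommendation system', 'ruby') # => 1.0 (match at 0)
positional_score('A simple Ruby system', 'ruby')       # lower, match starts later
```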

------
xmpir
I recommend reading [http://nlp.stanford.edu/IR-book/](http://nlp.stanford.edu/IR-book/) on this topic.

------
razvvan
If you want something that scales better, use MinHash. You get a similarity
that approximates Jaccard but with a lower memory and CPU footprint.
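A rough MinHash sketch in Ruby; the seeded `Array#hash` trick is an illustrative stand-in for a proper hash family, and signatures are only stable within a single process:

```ruby
# MinHash: k independent hash functions; the fraction of matching minima
# across two signatures estimates the Jaccard similarity of the sets.
class MinHash
  def initialize(num_hashes: 64)
    @seeds = (1..num_hashes).to_a
  end

  # Fixed-size signature: one minimum hash value per seed.
  def signature(items)
    @seeds.map { |seed| items.map { |x| [x, seed].hash }.min }
  end

  # Fraction of positions where the two signatures agree.
  def similarity(sig_a, sig_b)
    matches = sig_a.zip(sig_b).count { |a, b| a == b }
    matches.to_f / sig_a.size
  end
end

mh = MinHash.new(num_hashes: 128)
sig = mh.signature(%w[ruby rails redis])
mh.similarity(sig, sig) # => 1.0
```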

------
joshcrowder
How well does this perform compared to your PostgreSQL example?

Also - we are looking for a Ruby / Backbone.js developer; drop me an email:
josh@seriousfox.co.uk :)

~~~
otobrglez
Thanks for your comment, Josh. In production I actually implemented the PG
version and it's still in use today. I thought it would scale better and last
longer. I didn't do any benchmarking; I think it would really depend on the
size of your dataset. For something serious, compute recommendations in the
background and store them in the database. I believe that's how the "big boys"
do it. :)

------
snkcld
For an approach using Neo4j, check out cadet! (My project.) Cadet is really
just a JRuby wrapper around Neo4j, but one can use it to interact with Neo4j
(and thus come up with recommendations) without touching a line of Java, or
even Cypher.

Still in progress, and I'd love any input!
[http://github.com/karabijavad/cadet](http://github.com/karabijavad/cadet)

[http://github.com/karabijavad/congress-graph](http://github.com/karabijavad/congress-graph)

------
chrismealy
I remember seeing a nice little system that involved taking the square root of
something, but I can't remember what it was. Anybody know it?

