Simple tf-idf in 30 lines of Idiomatic Clojure (thecomputersarewinning.com)
23 points by ithayer on June 28, 2011 | 3 comments

Apparently, everyone knows that tf-idf stands for "term frequency-inverse document frequency". I had no idea, and the article didn't have time to include a link to http://en.wikipedia.org/wiki/Tf%E2%80%93idf or even type out the acronym.

Two remarks:

1. Don't 'earmuff' your stopwords (i.e. don't name the var *stopwords*), since you don't intend them to be rebound. The relevant guideline can be found here: http://dev.clojure.org/display/design/Library+Coding+Standar...

2. You could replace (remove nil? (map db (tokenize raw-text))) with (keep db (tokenize raw-text)), since keep already drops nil results.
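
To make both remarks concrete, here is a minimal sketch. The stopword set, the tokenize function, the db map, and the *verbose* var are hypothetical stand-ins for illustration, not the article's actual code.

```clojure
(require '[clojure.string :as str])

;; Remark 1: a constant that is never rebound gets a plain name.
;; (Hypothetical stopword set, for illustration only.)
(def stopwords #{"the" "a" "of"})

;; Earmuffs are conventionally reserved for dynamic vars that callers
;; are expected to rebind with `binding`. (Hypothetical var.)
(def ^:dynamic *verbose* false)

;; Assumed tokenizer: lower-case, split on non-word characters,
;; then drop blanks and stopwords.
(defn tokenize [raw-text]
  (->> (str/split (str/lower-case raw-text) #"\W+")
       (remove str/blank?)
       (remove stopwords)))

;; Assumed lookup table from term to id; a map acts as a function
;; of its keys and returns nil for unknown terms.
(def db {"clojure" 1 "idf" 2})

(def raw-text "The idf of Clojure code")

;; Remark 2: these two expressions yield the same sequence.
(def via-remove (remove nil? (map db (tokenize raw-text))))
(def via-keep   (keep db (tokenize raw-text)))
```

Both forms skip "code" here because db has no entry for it; keep just does the map and the nil filtering in one step.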


