Simple tf-idf in 30 lines of Idiomatic Clojure (thecomputersarewinning.com)
23 points by ithayer 1615 days ago | 3 comments

Apparently, everyone knows that tf-idf stands for "term frequency-inverse document frequency". I had no idea, and the article didn't have time to include a link to http://en.wikipedia.org/wiki/Tf%E2%80%93idf or even type out the acronym.


Two remarks:

1. Don't 'earmuff' your stopwords var, since you don't intend it to be rebound. The relevant guideline is here: http://dev.clojure.org/display/design/Library+Coding+Standar...

2. You could replace (remove nil? (map db (tokenize raw-text))) with (keep db (tokenize raw-text))
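
To make both remarks concrete, here is a minimal sketch. The stopword set, the tokenize function, the db map and the sample text are all made-up stand-ins, not the article's actual definitions; only the names db, tokenize and raw-text are taken from the expression quoted above.

```clojure
(require '[clojure.string :as str])

;; Remark 1: no earmuffs, because this var is a constant that is never
;; rebound with `binding`. (Hypothetical stopword set.)
(def stopwords #{"the" "a" "of"})

;; Hypothetical tokenizer: lowercase, split on non-word characters,
;; then drop empty strings and stopwords.
(defn tokenize [raw-text]
  (->> (str/split (str/lower-case raw-text) #"\W+")
       (remove str/blank?)
       (remove stopwords)))

;; Hypothetical term -> id lookup. A map is a function of its keys and
;; returns nil for a missing key.
(def db {"clojure" 0, "tf" 1, "idf" 2})

(def raw-text "The tf and idf of Clojure, a Lisp")

;; Remark 2: both forms drop the nils produced by unknown tokens.
(def ids-verbose (remove nil? (map db (tokenize raw-text))))
(def ids-concise (keep db (tokenize raw-text)))
```

Under these assumptions both forms yield the same sequence. keep drops only nil results (it would retain false), which is exactly what remove nil? does, so the substitution should not change behavior.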



