Hacker News
Is there a good English wordlist with common words, for free download?
2 points by pramodbiligiri 647 days ago | comments
We are doing entity extraction for documents specific to a domain. Unfortunately our domain-specific index contains many common English words, and we would like to take them out or weight them much lower.

I'm trying to choose between WordNet, Google Ngrams (too big!), and Moby Wordlist from Sheffield University. Any suggestions?
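
To make the "weight them much lower" option concrete, here is a minimal sketch, assuming a background word-count table has already been built from whichever general English list or corpus gets chosen. The function name `term_weights`, the add-one style smoothing, and the sample counts are all illustrative assumptions, not something taken from WordNet, Ngrams, or Moby.

```python
import math
from collections import Counter


def term_weights(domain_tokens, background_counts, background_total, smoothing=1.0):
    """Score each domain term by log(P(term | domain) / P(term | background)).

    Common English words score near or below zero; terms that are rare in
    the background list but frequent in the domain score high.
    """
    domain_counts = Counter(domain_tokens)
    domain_total = sum(domain_counts.values())
    # Reserve one extra vocabulary slot for words unseen in the background.
    vocab_size = len(background_counts) + 1

    weights = {}
    for term, count in domain_counts.items():
        p_domain = count / domain_total
        p_background = (background_counts.get(term, 0) + smoothing) / (
            background_total + smoothing * vocab_size
        )
        weights[term] = math.log(p_domain / p_background)
    return weights


# Hypothetical sample data, for illustration only.
tokens = ["the", "the", "the", "kinase", "kinase", "protein"]
background = {"the": 1000, "protein": 5}
weights = term_weights(tokens, background, background_total=1005)
for term, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}\t{weight:.2f}")
```

With a scheme like this the common-word list does not need a hard cutoff: anything with a weight below some threshold can be dropped, and the rest ranked.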



bediger4000 647 days ago | link

Look at the file named "eign" in the GNU troff distribution. I use it as a "stop word" list and it seems to work pretty well.
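
A rough sketch of how a list like this could be applied, assuming the file holds one word per line (check your copy) and that matching should be case-insensitive. The helper names `load_stop_words` and `remove_stop_words` and the sample words are made up for illustration.

```python
def load_stop_words(lines):
    """Build a lowercase stop-word set from an iterable of lines, skipping blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(terms, stop_words):
    """Return the terms that are not in the stop-word set (case-insensitive)."""
    return [term for term in terms if term.lower() not in stop_words]


# In-memory stand-in for the file contents; with a real file it would be
# something like:  with open(path_to_list) as f: stop_words = load_stop_words(f)
sample_lines = ["the\n", "of\n", "and\n", "\n"]
stop_words = load_stop_words(sample_lines)

index_terms = ["The", "kinase", "of", "yeast"]
print(remove_stop_words(index_terms, stop_words))
```

Holding the words in a set keeps each lookup constant-time, so the same code works unchanged for a list of a few thousand words.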

-----

pramodbiligiri 647 days ago | link

Oh, I had never heard of this. It looks good for extremely common short words, but the version I have has only 133 words. I'm looking for something in the range of a few thousand words at least.

-----

mindcrime 647 days ago | link

How about:

http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-...

Also, see previous discussion at:

http://stackoverflow.com/questions/1218335/stop-words-list-f...

-----



