

Is there a good English wordlist with common words, for free download? - pramodbiligiri

We are doing entity extraction for documents specific to a domain. Unfortunately, our domain-specific index contains many common English words, and we would like to remove them or weight them much lower.

I'm trying to choose between WordNet, Google Ngrams (too big!), and Moby Wordlist from Sheffield University. Any suggestions?
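
To make the "weight them much lower" option concrete, here is a minimal Python sketch. It assumes the index can be reduced to a flat list of tokens and that the common-word list is already loaded as a set; the function name `domain_weights` and the `penalty` factor of 0.1 are made up for illustration and are not taken from any of the wordlists mentioned.

```python
from collections import Counter

def domain_weights(domain_tokens, common_words, penalty=0.1):
    """Weight each term by its count, scaled down if it is a common word.

    `penalty` is an arbitrary illustrative factor, not a recommended value.
    """
    counts = Counter(token.lower() for token in domain_tokens)
    return {
        term: count * (penalty if term in common_words else 1.0)
        for term, count in counts.items()
    }

# Hypothetical sample data standing in for a real index and wordlist.
tokens = ["enzyme", "the", "the", "enzyme", "substrate"]
common = {"the"}
weights = domain_weights(tokens, common)
```

Setting `penalty` to 0 amounts to removing the common words outright, so the same function covers both options.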
======
bediger4000
Look at the file named "eign" in the GNU troff distribution. I use it as a
"stop word" list and it seems to work pretty well.
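
A rough Python sketch of using such a list as a filter, assuming the file holds one word per line (worth checking against your copy). The helper names `load_stop_words` and `remove_stop_words` are invented for this example, and the inline sample stands in for the real file contents.

```python
def load_stop_words(lines):
    """Build a lowercase stop-word set from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}

def remove_stop_words(terms, stop_words):
    """Keep only the terms whose lowercase form is not a stop word."""
    return [term for term in terms if term.lower() not in stop_words]

# Inline sample; with a real file, pass the open file object instead, e.g.
#   with open(path_to_list) as f: stop_words = load_stop_words(f)
sample_lines = ["the\n", "of\n", "and\n", "\n", "a\n"]
stop_words = load_stop_words(sample_lines)

filtered = remove_stop_words(["The", "kinase", "of", "ATP"], stop_words)
```

The comparison is case-insensitive, but the surviving terms keep their original casing.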

~~~
pramodbiligiri
Oh, had never heard of this. This looks good for extremely common short words.
The version I have has only 133 words though. I'm looking for something in the
range of a few thousand words at least.

------
mindcrime
How about:

[http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-...](http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop)

Also, see previous discussion at:

[http://stackoverflow.com/questions/1218335/stop-words-list-f...](http://stackoverflow.com/questions/1218335/stop-words-list-for-english)

