We are doing entity extraction on domain-specific documents. Unfortunately, our domain-specific index contains many common English words, and we would like to either remove them or weight them much lower.
I'm trying to choose between WordNet, Google Ngrams (too big!), and Moby Wordlist from Sheffield University. Any suggestions?
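
To make the goal concrete, here is a rough sketch of the filtering step, assuming the index is a simple term-to-weight mapping and the common-word list has already been loaded from whichever resource we end up choosing. The function name `downweight_common`, the `penalty` parameter, and the sample data are all made up for illustration; they do not come from any of the resources above.

```python
from typing import Dict, Iterable


def downweight_common(
    index_weights: Dict[str, float],
    common_words: Iterable[str],
    penalty: float = 0.1,
) -> Dict[str, float]:
    """Scale down (or drop) index terms that appear in a common-word list.

    penalty == 0.0 removes a common term entirely;
    0 < penalty < 1 keeps it at a reduced weight.
    """
    # Case-insensitive lookup set built from the chosen word list.
    common = {w.lower() for w in common_words}
    result: Dict[str, float] = {}
    for term, weight in index_weights.items():
        if term.lower() in common:
            if penalty > 0.0:
                result[term] = weight * penalty
            # penalty == 0.0: skip the term, i.e. remove it from the index
        else:
            result[term] = weight
    return result


# Hypothetical sample data, not taken from a real index or word list.
index_weights = {"the": 120.0, "stent": 8.0, "patient": 40.0, "angioplasty": 5.0}
common_words = ["the", "patient", "of", "and"]

reweighted = downweight_common(index_weights, common_words, penalty=0.1)
filtered = downweight_common(index_weights, common_words, penalty=0.0)
```

With `penalty=0.0` the common terms are taken out; with a small positive value they stay in the index at a lower weight. What we are really asking is which resource makes the best source for `common_words`.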