We are doing entity extraction for documents specific to a domain. Unfortunately, our domain-specific index contains many common English words, and we would like to remove them or weight them much lower.
I'm trying to choose between WordNet, Google Ngrams (too big!), and Moby Wordlist from Sheffield University. Any suggestions?
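
To make the "weight them much lower" idea concrete, here is a minimal sketch of one possible approach: score each indexed term by the log ratio of its relative frequency in the domain corpus to its relative frequency in a general-English background list, with add-one smoothing. The function name, the `background_counts` dictionary (standing in for whichever word list ends up being used, assuming it provides counts), and the toy numbers are all hypothetical.

```python
import math
from collections import Counter


def downweight_common_terms(domain_counts, background_counts, smoothing=1.0):
    """Weight each domain term by log(P_domain / P_background).

    Terms that are common in general English get a low or negative weight;
    terms that are mostly seen in the domain corpus get a high weight.
    """
    domain_total = sum(domain_counts.values())
    background_total = sum(background_counts.values())
    # Shared vocabulary size, used for additive smoothing in both corpora.
    vocab_size = len(set(domain_counts) | set(background_counts))

    weights = {}
    for term, count in domain_counts.items():
        p_domain = (count + smoothing) / (domain_total + smoothing * vocab_size)
        p_background = (background_counts.get(term, 0) + smoothing) / (
            background_total + smoothing * vocab_size
        )
        weights[term] = math.log(p_domain / p_background)
    return weights


# Toy, made-up counts purely for illustration.
domain_counts = Counter({"the": 50, "patient": 30, "stent": 20})
background_counts = Counter({"the": 5000, "patient": 40, "stent": 1, "house": 300})

weights = downweight_common_terms(domain_counts, background_counts)
ranked = sorted(weights, key=weights.get, reverse=True)
```

With these toy counts, "the" ends up with a negative weight and sinks to the bottom of `ranked`, while the domain terms stay on top. A hard cutoff on the weight would give the "take them out" variant instead.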