I'm working on a personal project involving automated text classification for a variety of purposes, including identifying the author of a text sample, given training samples from a very large number of authors.
I'm far from the first to use machine learning algorithms to identify the author of a text, but I think I have something a little better than most of the research projects I've read about and the open-source tools I've tested. Initial results show significantly greater accuracy and a couple of orders of magnitude more speed in situations involving thousands of possible authors.
I can imagine potentially good uses for this sort of tech, ranging from keeping banned users out of an online community to identifying the author of a ransom note, death threat or the like. I can also imagine evil uses, such as identifying political dissidents to persecute.
I'm not sure how I feel about releasing such a thing into the world (as open source or as a product), knowing that it will be used for both good and evil. Any comments?
At some point, progress requires that you shrug your shoulders and say "I don't know how the good and bad uses will weigh against each other, but I'm going to go ahead anyway".