Text mining is one of my specialties, and I have had similar ideas for a business. One thing that has stopped me is the awesome OpenCalais service (free for about 50K API calls a day), which does entity extraction and identifies some relationships between entities in the input text.
For document clustering, there are many good open-source tools that people and companies can use. The commercial LingPipe product does a good job at sentiment analysis.
Obtaining, scrubbing, and generally curating the data is a pain point that users of this system may still need to worry about.
I wish this new business good luck, but there are definitely some real problems to work around. Perhaps we should go into business together :-)
True, but problems are there to be solved. :)
We've done a lot of work around data normalization/scrubbing from a multitude of sources as part of a sister project, so I'm fairly confident about this aspect.
Curation and classification are another issue, but we have a few ideas.
As for business, you never know. Just let me try to get off this ramen-based diet first. ;)