

High demand for Text Mining tools? But why? - zeratul

I got this email from TextMinr.com:<p><pre><code>   At the moment, we are seeing extraordinary
   demand for our beta. To manage the growth of
   the user base and scale without melting down,
   there might be a wait of 3-6 weeks before we
   can send you an invite. We are very sorry for
   the wait, but you will get your invite as soon
   as possible, and we operate a "first come,
   first served"-policy, so your place in the
   queue is assured.
</code></pre>
This is good news, but I'm surprised. It suggests people see great value in text mining, but why? It's not intuitive how much useful data can be extracted from text. How do you even know it's doable? OK, someone is making money from Twitter sentiment, but that's been done. What will you make?<p>Another question is whether TextMinr.com is up to the task. Text mining is not like other data mining. It is MUCH more difficult. The Google Prediction API is just silly because it gives no control over feature engineering, feature selection, instance selection, or model selection. This leaves a lot of untapped market for predictive modeling startups IF (big if!) the demand is for profit, not for fun or for academic work.
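
To make "control over feature engineering, feature selection, and model selection" concrete, here is a rough sketch of a tiny sentiment classifier where each of those steps is a separate function you can swap out. Everything in it is an assumption for illustration: the function names, the `min_df` threshold, the toy training sentences, and the choice of naive Bayes. It has nothing to do with how TextMinr or the Google Prediction API actually work.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Feature engineering: lowercase word unigrams (hypothetical choice)."""
    return re.findall(r"[a-z']+", text.lower())


def select_features(docs, min_df=2):
    """Feature selection: keep tokens that occur in at least min_df documents."""
    doc_freq = Counter()
    for text in docs:
        doc_freq.update(set(tokenize(text)))
    return {tok for tok, n in doc_freq.items() if n >= min_df}


def train_naive_bayes(docs, labels, vocab):
    """Model selection: here, multinomial naive Bayes token counts per class."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        token_counts[label].update(t for t in tokenize(text) if t in vocab)
    return class_counts, token_counts


def predict(text, model, vocab):
    """Return the class with the highest add-one-smoothed log probability."""
    class_counts, token_counts = model
    total_docs = sum(class_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data, purely for illustration.
TRAIN_DOCS = [
    "great product, love it",
    "love the great support",
    "terrible product, hate it",
    "hate the terrible support",
]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]

VOCAB = select_features(TRAIN_DOCS, min_df=2)
MODEL = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS, VOCAB)

print(predict("love this great tool", MODEL, VOCAB))
print(predict("I hate it, terrible", MODEL, VOCAB))
```

With a black-box API you cannot change `tokenize`, raise `min_df`, or replace the model; here each one is a single function you can edit and re-run.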
======
xavian
In the immortal words of Cave Johnson, "Why not?!" :-) Dealing with structured
data is easy compared to unstructured data, but there is a vast wealth of
unstructured content just waiting to be tapped. I suspect there is so much
interest BECAUSE TextMinr seems like a tool to explore a topic most folks find
difficult to grasp. I'm a search engine architect and it's not a "solved"
issue by any means. I think there is a lot of curiosity and a lot of untapped
content that has yet to be leveraged. I'm certainly interested. :)

------
devs1010
I'm working on something similar, in Java, but so far I have only written
text mining tools that meet my own specific "wants". Just scraping Google
alone yields a vast amount of data that can be extracted and evaluated
against certain parameters. It's really something that is "need-driven",
though, so it's all about finding the specific need and then going from
there.

