I'll write a Ruby API wrapper if you give me an agreed-upon amount of usage when you settle on pricing.
Feel free to email me (HN name @ gmail) if you're interested or just want to follow up for customer development purposes.
Best of luck!
I really want this to work out, because my application needs this kind of technology to be reliable and accurate. I just processed the article at the link below, and the word "Tesla" was not captured as one of the topics.
Also, no pricing => not cool.
We also expose all these results to a Prolog interpreter on our backend, which lets you add custom logic to mash up and extend all of our results and makes integration much easier.
Totally agree with you on the pricing front, we're still finalising the details there. We're aiming to be fully transparent with both the technical and business side of things.
Why use TextRazor and pay for it?
Also, we found the Stanford tools (and the other open source NLP tools) difficult to integrate into "production" apps for various reasons. One big one was performance: we aim to run the full parsing and extraction pipeline on an average news story in a few hundred milliseconds, which can be an order of magnitude faster than those tools.
Edit: To be specific, it looks very similar. What do you have that Calais doesn't?
One thing that Open Calais does that I really like is that it attempts to assign a single URI that uniquely identifies each recognized named entity. This is useful because, for example, when it recognizes President Bill Clinton, you get a reference to a unique URI even if his name or title differs between processed texts.
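To make the idea concrete, here's a minimal Python sketch of resolving different mentions of the same person to one canonical URI. It assumes a hand-built alias table and a short list of titles to strip; the example.org URIs and the `ALIASES`, `normalize`, and `resolve` names are hypothetical placeholders, not Calais's actual identifiers or API. A real system would disambiguate using context rather than a fixed lookup table.

```python
import re
from typing import Optional

# Hypothetical alias table: normalized surface forms -> one canonical entity URI.
# These example.org URIs are placeholders, not real Open Calais identifiers.
ALIASES = {
    "bill clinton": "http://example.org/entity/bill-clinton",
    "william jefferson clinton": "http://example.org/entity/bill-clinton",
    "william j. clinton": "http://example.org/entity/bill-clinton",
}

# Hypothetical titles to drop so "President Bill Clinton" matches "Bill Clinton".
TITLES = ("president", "former president", "mr.", "mr")


def normalize(mention: str) -> str:
    """Lowercase, collapse whitespace, and strip one leading title."""
    text = re.sub(r"\s+", " ", mention.lower().strip())
    # Try longer titles first so "former president" wins over "president".
    for title in sorted(TITLES, key=len, reverse=True):
        if text.startswith(title + " "):
            text = text[len(title) + 1:]
            break
    return text


def resolve(mention: str) -> Optional[str]:
    """Return the canonical URI for a mention, or None if it is unknown."""
    return ALIASES.get(normalize(mention))


if __name__ == "__main__":
    mentions = ["President Bill Clinton", "William Jefferson Clinton", "Bill Clinton"]
    # All three surface forms collapse to a single URI.
    print({resolve(m) for m in mentions})
```

The point is that downstream code keys on the URI rather than the raw string, so counting or linking mentions across documents works even when the wording varies.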
Thomson Reuters bought ClearForest several years ago, thus acquiring Calais. If you are interested in text mining and haven't experimented with Open Calais, please put it on your TODO list.
You can demo it with some text here: http://viewer.opencalais.com/
Obvious uses would be any kind of CMS. Investigative journalism is another.
I've recently started exploring the legal informatics field. The problems in it are huge and typically involve adding some structure to lots and lots of unstructured text.
Also, Peter, if you do end up reading this, great work on the stuff you do :) Big fan here!
Drop me a line if I can be of any help!
Haven't devoted much time to it over the last couple months, but this provides a lot of inspiration.
Btw, anyone else read the title as "Trent Reznor"? I need a coffee...