
Show HN: TextRazor, a scriptable text mining API - tcwc
http://www.textrazor.com
======
adelevie
This looks great. I'm building <http://dokket.aws.af.cm>. It's a database
filled with documents from the Federal Communications Commission. From day 1,
I've been looking for smart ways to make use of the thousands of documents of
unstructured text. The customization you offer seems to be the killer feature
for me.

I'll write a Ruby API wrapper if you give me an agreed-upon amount of usage
when you settle on pricing.

Feel free to email me (HN name @ gmail) if you're interested or just want to
follow up for customer development purposes.

Best of luck!

~~~
richardofyork
I am also building an application that could use this service. So I tested
TextRazor and the results were not good on two random articles I processed
with the service.

I really want this to work out, because my application needs this kind of
technology to be reliable and accurate. I just processed the article at the
link below, and the word "Tesla" was not captured as one of the topics.
<http://www.teslamotors.com/blog/most-peculiar-test-drive-follow>

~~~
jsmcgd
It works for me.

~~~
richardofyork
I just tried it again and it worked this time. That is interesting: I don't
know if they tweaked it a bit or not :), but I am feeling a bit better now,
because my application really needs a reliable, accurate service like this.

~~~
tcwc
No tweaking, promise :) It's possible you hit an inconsistent server the first
time around; I'll have a dig on our end. Let me know if you have any other
problems - toby@textrazor.com.

------
steeve
Cool demo, but how do you guys compare to, say, Alchemy API?

Also, no pricing => not cool.

~~~
tcwc
Hey steeve, we thought there were a few things missing in the competition.
We've built a bunch of extra functionality such as more extensive relation and
dependency parsing and contextual entailment generation, and use all that to
build much more accurate entity and topic recognition, an area we think the
others can be greatly improved on.

We also expose all these results to a Prolog interpreter on our backend and
allow you to add custom logic to mash up and extend all of our results, as well
as providing a much easier integration experience.

Totally agree with you on the pricing front, we're still finalising the
details there. We're aiming to be fully transparent with both the technical
and business side of things.

~~~
ismaelc
You can add billing/pricing tiers to your API now using Mashape
<http://www.mashape.com/> (Disclaimer: I work for Mashape. Let me know if you
need help!)

------
anilshanbhag
This is just like the Stanford Parser: <http://nlp.stanford.edu:8080/parser/>

Why use TextRazor and pay for it?

~~~
mark_l_watson
The Stanford NLP tools are very good, and also GPLed, which works for a lot of
projects. If the GPL doesn't work for you, the Apache OpenNLP project is also
good.

~~~
mark_l_watson
BTW, it is not just having software packages to use: it is a ton of work
obtaining and preparing training data. That said, Stanford NLP and OpenNLP
tools come out of the box with trained models for tagging, entity name
recognition, etc. For lots of uses, these pre-trained models will work well
for you.

------
thejosh
Having no pricing at all on the website is fishy, even for anyone who just
wants to use the free plan.

~~~
mark_l_watson
I offer something similar, and I do have a pricing page:
<http://kbsportal.com/pricing> :-)

------
abaer
Really impressive results. I put in some music reviews, and it did an
excellent job of identifying artists, genres, labels, etc. From an API
perspective, there's not much out there that competes with this - and nothing
with a modern API.

~~~
typpo
There are a number of other competing APIs. One of the best is OpenCalais,
which is owned by Thomson Reuters[1].

You can demo it with some text here: <http://viewer.opencalais.com/>

[1] <http://www.opencalais.com>

------
polskibus
That's exactly what I need to start working again on my algo trader! Seems to
be working well with a sample of financial news extracts. Will definitely look
into it further, thanks!

------
movingahead
This is very impressive stuff. I ran a news article through the demo, and the
entity recognition was very impressive. Waiting for them to reveal more
details on pricing.

------
habosa
Wow, this seems incredible, signed up immediately. My mind is already spinning
with all of the cool apps I could make with this. How hard would it be to
allow this functionality offline for paid users? You could have some sort of
packaged library which phones home to count requests used, but does the
processing offline to take network latency out of the equation. Not sure if
that's feasible, but it would be great.
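
The "process offline, phone home only to count requests" idea above could look roughly like the sketch below. Everything in it is hypothetical: `MeteredProcessor`, `process_locally`, and `report_usage` are illustrative names, not part of TextRazor or any real library, and the local analysis is a stand-in for a real text-processing step.

```python
class MeteredProcessor:
    """Runs analysis locally and reports only usage counts, in batches."""

    def __init__(self, process_locally, report_usage, batch_size=100):
        self._process = process_locally   # hypothetical offline analysis function
        self._report = report_usage       # hypothetical callback that phones home
        self._batch_size = batch_size
        self._pending = 0                 # requests processed but not yet reported

    def analyze(self, text):
        # The text itself never leaves the machine; only a count does.
        result = self._process(text)
        self._pending += 1
        if self._pending >= self._batch_size:
            self.flush()
        return result

    def flush(self):
        # Report the outstanding count, then reset it.
        if self._pending:
            self._report(self._pending)
            self._pending = 0


# Stand-ins for illustration: whitespace tokenising as the "analysis",
# and a plain list that records what would have been sent home.
reported = []
meter = MeteredProcessor(lambda text: text.split(), reported.append, batch_size=2)
for sample in ["first document", "second document", "third document"]:
    meter.analyze(sample)
meter.flush()
print(reported)
```

Batching the reports keeps network latency off the per-request path, which is the point of the suggestion; a real client would also need to persist the pending count and handle a failed report.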

------
ses
Being a big Prolog fan, I think this looks like an awesome product. This sort
of textual analysis will become more important as time goes on. As the
interest in search technologies grows, I think intelligent search (contextual
queries, query answering, clustering, recommendations, meta-data extraction
etc.) will start to appear in more end-user products. One question...
whereabouts in the UK are you based?

~~~
tcwc
Thanks, great to see the other Prolog fans coming out of the woodwork! Based
in London.

------
petercooper
Question: What are people actually doing with technology like this right now?
(i.e. who are the people who see this and think.. yay, I'll sign up now!)

~~~
JPKab
It's certainly not going to be ideal for your typical CRUD app. Think about
all of the information that is locked inside of unstructured text (MS Word
docs and PDFs come to mind), and then imagine if you can scan through thousands
of documents, find the named entities, and then start connecting them together
in queries.

Obvious uses would be any kind of CMS. Investigative journalism is another.
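
As a rough sketch of "find the named entities, then start connecting them together in queries": assuming some extractor has already produced a list of entities per document, an inverted index and a co-occurrence count are enough to answer simple questions. The document ids and entity names below are made up for illustration, and no particular extraction API is implied.

```python
from collections import defaultdict
from itertools import combinations


def build_entity_index(doc_entities):
    """Map each entity to the set of document ids that mention it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def docs_mentioning_all(index, *entities):
    """Return the ids of documents that mention every given entity."""
    sets = [index.get(entity, set()) for entity in entities]
    return set.intersection(*sets) if sets else set()


def co_occurrences(doc_entities):
    """Count how many documents each pair of entities shares."""
    counts = defaultdict(int)
    for entities in doc_entities.values():
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical extractor output: document id -> entities found in it.
doc_entities = {
    "doc-1": ["Acme Corp", "Jane Doe", "London"],
    "doc-2": ["Acme Corp", "Paris"],
    "doc-3": ["Jane Doe", "London", "Acme Corp"],
}

index = build_entity_index(doc_entities)
print(sorted(docs_mentioning_all(index, "Acme Corp", "London")))
print(co_occurrences(doc_entities)[("Acme Corp", "Jane Doe")])
```

With thousands of documents the same structure would normally live in a database or search index rather than in memory, but the queries keep the same shape.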

------
atrilla
Great work! But I missed the "sentiment analysis" flavour that used to be so
popular some years ago with the NLP bunch... In this sense, I did something
similar:

<http://dtminredis.housing.salle.url.edu:8080/EmoLib/>

and

<http://nlptools.atrilla.net/web/omsa.php>

Drop me a line if I can be of any help!

------
GotAnyMegadeth
This looks amazing. Though when I put my name in it came up with: DBPedia
types: Person Athlete Agent GolfPlayer owl#Thing

------
doktrin
Very cool. This is essentially what I envisioned when I started work on
<http://www.textalyze.com>.

Haven't devoted much time to it over the last couple months, but this provides
a lot of inspiration.

~~~
wiradikusuma
Hi man, I tried my website (<http://www.ngajakjalan.com>) but the analyzer
probably reads my inlined JS and doesn't read the "done" version (I use
AngularJS). Maybe you can use something like PhantomJS to extract a website's
content "as seen by a human"?

------
thehodge
I don't know if you're just being slammed right now but I posted a request for
pricing and haven't got anything.. it looks like a nice API to integrate into
our system but it really needs clearer pricing..

------
jsmcgd
Awesome stuff. It would be great to use something like this for suggesting
relevant tags of content. Perhaps a WordPress or Drupal plugin to get the ball
rolling.
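
One way tag suggestion could work, as a minimal sketch: assuming an entity extractor returns `(entity, confidence)` pairs for a post, sum the confidence per entity and keep the top few. The function name, thresholds, and sample mentions are all hypothetical and not taken from any real plugin or API.

```python
from collections import defaultdict


def suggest_tags(mentions, max_tags=5, min_score=0.5):
    """Rank entities by summed confidence and return the best as tags."""
    scores = defaultdict(float)
    for entity, confidence in mentions:
        scores[entity] += confidence
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [entity for entity, score in ranked if score >= min_score][:max_tags]


# Hypothetical extractor output for one blog post.
mentions = [
    ("Drupal", 0.9),
    ("WordPress", 0.8),
    ("plugin", 0.3),
    ("WordPress", 0.7),
]
print(suggest_tags(mentions, max_tags=2))
```

A CMS plugin would call something like this on save and offer the result as suggestions for the author to accept or reject.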

------
PaulHoule
This is the best Wikipedia-backed namex I've seen and I've seen a lot of them,
worked on code for one, and even designed one. Awesome!

------
beebs93
Wow, nice demo. I agree with the others that the lack of pricing is a bit
disconcerting.

Btw, anyone else read the title as "Trent Reznor"? I need a coffee...

~~~
snake_plissken
totes on the Trent Reznor part.

------
sinzone
Hi, would love to have this API listed on Mashape

------
eurodance
Well done.

