
Show HN: An easy-to-use Text Analysis API – NLP and Machine Learning - parsabg
http://aylien.com/text-api-demo
======
gklitt
Cool stuff! It's nice to see platforms like this which abstract out good
algorithms, so that developers can worry about thinking of interesting
applications. Open source libs are even better, but pragmatically speaking, I
think these types of platforms probably move faster and get better results.

One major competitor (well known for anyone who's looked into this stuff) is
Alchemy [1]. I tried a New York Times link [2] on Aylien and Alchemy, and
Alchemy performed much better -- in fact, Aylien didn't even successfully find
the article body. I'm sure you guys will be iterating on improving the
algorithms, but just wanted to flag that as a potential turnoff for anyone
comparing your website demo with Alchemy.

Best of luck!

[1]
[http://www.alchemyapi.com/products/demo/](http://www.alchemyapi.com/products/demo/)

[2]
[http://www.nytimes.com/2014/02/18/world/middleeast/bombings-...](http://www.nytimes.com/2014/02/18/world/middleeast/bombings-
in-syria-force-wave-of-civilians-to-flee.html?hp&_r=0)

~~~
parsabg
thanks for the feedback! as you may know, NYT articles are behind a paywall
and fetching them can be problematic. so I believe Alchemy uses the NYT API to
fetch articles, which is something we'll look into in future.

------
fnl
Seen quite a few times (NLP web APIs), and my opinion is that this kind of
stuff tends to not be scalable: to be useful, such web APIs have to be able to
do entire articles in just a fraction of a second. Although I am not
sure (because of the HN storm the API is down), it does not seem this tool
will live up to those expectations, either. In the end, my choice always has
been to include/wrap an off-the-shelf tool in your own pipeline rather than
relying on an external service that might be too slow for end-users and mass
mining alike...

~~~
tropicalmug
What tools would you suggest for doing this? Or even what algorithms to
implement for doing this sort of work?

------
drakaal
This is a much better Noun Phrase / Entity extractor.

[https://www.mashape.com/stremor/noun-entity-extraction-
noun-...](https://www.mashape.com/stremor/noun-entity-extraction-noun-phrase-
part-of-speech-tagger-alpha)

We don't rely on CoreNLP, or NLTK, we have our own sentence disambiguation,
and our own part of speech tools. So we are a lot faster.

Our other APIs let you piece together a lot of cool NLP projects with very
little code.

------
mattmcknight
These sorts of things are typically better offered as libraries, particularly
as the training is usually specific to a corpus, or a particular context.

It would be nice to offer a library with a bootstrapped training set.

~~~
phillmv
Not to mention either the sensitivity behind the data, the sheer volume behind
it, or the effort involved in customizing it for a particular algorithm or
input - only for it to shut down and take your data with it.

Machine Learning as a Service seems Hella Neat, tho.

~~~
LambdaAlmighty
Sorry, don't understand your last sentence.

It seems to contradict the paragraph before -- ML as a service seems a
terrible idea for the reasons you just listed (among others). What's "Hella
Neat" about that?

~~~
phillmv
The problem mostly stems from the vast risk you take on from making a large
investment in an unstable/unproven platform vendor.

Servers are relatively fungible, given ops automation; it's painful but not
the end of the world if you have to migrate away.

But the technology is still relatively immature in that building your own ML
service in house - and having it scale, etc - is still a big pain.

I would immensely prefer it if we first brought ML libraries up to a higher
level of maturity - as simple as apt-get install and adding `includes
ActiveLearning::Bayes` to your models.

But if a client came to me tomorrow and said "there's this great Amazon API
that we're thinking of using" I wouldn't consider that insane on first
principles.

------
kenshiro_o
Unfortunately the web site is still analyzing the example Techcrunch link
(it's been 3 min already).

Is something broken? Maybe you could cache some recurring analyses.

~~~
parsabg
sorry, our servers are melting :-) spawning new machines.

~~~
adrenalinup
You could cache the results of the examples that are on the right ;]
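
A minimal sketch of that caching idea, assuming the analysis is a pure function of the URL. `slow_analyze` is a hypothetical stand-in for the real (expensive) call, not part of any actual API; only `functools.lru_cache` is a real library function here.

```python
from functools import lru_cache

calls = {"count": 0}  # tracks how often the slow path actually runs


def slow_analyze(url: str) -> dict:
    """Hypothetical stand-in for an expensive analysis call."""
    calls["count"] += 1
    return {"url": url, "length": len(url)}


@lru_cache(maxsize=128)
def cached_analyze(url: str) -> dict:
    # Repeated requests for the same URL are served from memory.
    return slow_analyze(url)
```

With this in place, clicking the same example link twice would only hit the backend once. The cached dict is shared between callers, so it should be treated as read-only.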

~~~
parsabg
good idea!

------
zvanness
Hey guys! Congrats, NLP is a huge problem that needs as many minds working on
it as possible.

Just tried a few links:

[http://arstechnica.com/security/2014/02/dear-asus-router-
use...](http://arstechnica.com/security/2014/02/dear-asus-router-user-youve-
been-pwned-thanks-to-easily-exploited-flaw/)

[http://blog.algore.com/2011/07/the_great_lakes_are_in_danger...](http://blog.algore.com/2011/07/the_great_lakes_are_in_danger.html)

Am I missing something here? It seems like it's just parsing text, I'm not
seeing any context (keywords, categories, summaries).

edit: It's giving fantastic results when pasting the raw text! :)

Are you guys using DBpedia? It's giving very similar results to a system I was
working on in the past:
[http://www.zachvanness.com/nanobird_relevancy_engine.pdf](http://www.zachvanness.com/nanobird_relevancy_engine.pdf)

~~~
parsabg
thanks for the feedback. can't reproduce the first issue, what happens when
you click on Analyze? do you mind sending us a screenshot?

we do use DBPedia in our Concept Extraction. please have a look at the docs:
[http://aylien.com/text-api-doc](http://aylien.com/text-api-doc)

~~~
zvanness
You're welcome!

Sure thing (when running it on the URLs, I don’t get any keywords):
[http://i.cubeupload.com/zubo4G.png](http://i.cubeupload.com/zubo4G.png)

~~~
parsabg
thanks, keywords are under "Entities".

------
blueblob
What do you use for the extraction of entities (if you don't mind saying)? I
entered "The Cat in the Hat" is a good book. It didn't recognize any entities.
Are you using an ontology for named entity resolution, or just extracting NPs?

~~~
parsabg
a combination of different techniques (NPs, statistical models, dictionary
based matching) are used in our EE endpoint.
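
For readers unfamiliar with the dictionary-based part, here is a rough sketch of what gazetteer matching can look like. It is an illustration only, not Aylien's implementation; the `GAZETTEER` entries and the `match_entities` helper are made up for this example.

```python
import re

# Hypothetical gazetteer: lowercased surface form -> entity type.
GAZETTEER = {
    "the cat in the hat": "Book",
    "new york times": "Organization",
}


def match_entities(text: str, gazetteer: dict = GAZETTEER) -> list:
    """Return (surface form, type) pairs, trying longer entries first."""
    found = []
    taken = [False] * len(text)  # marks characters already claimed by a match
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                found.append((m.group(0), gazetteer[name]))
    return found
```

Under these assumptions, `match_entities('"The Cat in the Hat" is a good book.')` picks up the title as a `Book`, which a pure noun-phrase extractor could easily miss.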

------
analytically
Another player in this space, from Oxford, UK:
[http://apidemo.theysay.io/](http://apidemo.theysay.io/)

------
imperio59
It does really poorly analyzing a Wiktionary entry like
[http://en.wiktionary.org/wiki/run](http://en.wiktionary.org/wiki/run) or with
a Wikipedia article like
[http://en.wikipedia.org/wiki/Big_O_notation](http://en.wikipedia.org/wiki/Big_O_notation)

------
bane
Playing around with it, I seem to have killed it by pasting the text from
this WP article ([http://pastebin.com/AtCU7E8H](http://pastebin.com/AtCU7E8H))
in and hitting analyze. It's been spinning for a while.

 _edit_ I see from another response that the server room is on meltdown, I'll
wait for a bit.

------
crypto5
Maybe somebody will find my pet project useful and relevant:
[https://github.com/crypto5/wikivector](https://github.com/crypto5/wikivector).
It uses machine learning with Wikipedia data as the training set, supports 10
languages, and is completely open source.

------
syllogism
Do you publish accuracy figures? Any information about what domains your
training data is from?

~~~
parsabg
> Do you publish accuracy figures?

we'd love to, but unfortunately some of our main competitors have restrictive
terms in their ToS (e.g.
[http://www.alchemyapi.com/company/terms.html](http://www.alchemyapi.com/company/terms.html))
that prevent us from doing so. we will publish what we can though.

> Any information about what domains your training data is from?

they're mostly trained on general news and social media content (with lots of
manual and automated cleanup). drop us an email if you need more details:
hello@aylien.com

~~~
yen223
I'm curious - how does a competitor's ToS prevent your company from doing
anything?

~~~
Blahah
The competitors don't allow you to benchmark their services, so while you can
benchmark your own product you can't compare it to others. For example, from
the Alchemy API:

YOU MAY NOT ACCESS THE SERVICES FOR PURPOSES OF MONITORING THEIR AVAILABILITY,
PERFORMANCE OR FUNCTIONALITY, OR FOR ANY OTHER BENCHMARKING OR COMPETITIVE
PURPOSES

~~~
malkung
Also this: "publish or perform any benchmark or performance tests or analysis
relating to the Service or the use thereof without express authorization from
AlchemyAPI;"

Suppose I am evaluating their service, before I decide to buy. I would be
breaking these ToS, I guess.

------
polskibus
There are more and more text analysis APIs; would you mind comparing your
feature set to something like Textrazor
([http://www.textrazor.com](http://www.textrazor.com)) or Open Calais?

What is special about your project ?

~~~
bduerst
I would also like a comparison. I used Open Calais two years ago for a
project, and would definitely use it again if needed.

Edit: A quick glance at the API also shows that there doesn't appear to be
much in the way of machine learning. Does this build models for you or is it
just to dissect text?

------
skiplecariboo
Super nice !

This is a very interesting area... Good to see something new apart from
Alchemy and opencalais !

~~~
parsabg
thanks for the feedback. there's a lot of room for improvement in this space.

------
cliveowen
"There was a time when men could roam free on earth, free from concrete and
tarmac. Now it's all gone to shit."

Classification: arts, culture and entertainment - architecture. (WTF?)

Polarity: positive. (Nope)

Polarity confidence: 0.9994709276706056. (Well...)

Looks pretty rough to me.

~~~
guptaneil
Why does that classification elicit a WTF? That seems like a reasonable
classification, given how little context the algorithm has about the snippet.
It's entirely plausible for that quote to be from a book about how "concrete
and tarmac" have impacted modern architecture. There aren't really any other
hints about what it could be about.

There's no excuse for the polarity though. "Gone to shit" should be a pretty
good indicator about the sentiment.
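
To make that concrete, here is a toy lexicon-based scorer in which a strong idiom outweighs mildly positive words such as "free". The `LEXICON` entries and weights are invented for illustration; this is a sketch of the general technique, not a claim about how the API under discussion works.

```python
import re

# Hypothetical weighted lexicon; entries and weights are illustrative only.
LEXICON = {
    "gone to shit": -3.0,
    "terrible": -2.0,
    "free": 1.0,
    "fantastic": 2.0,
}


def polarity_score(text: str, lexicon: dict = LEXICON) -> float:
    """Sum the weights of every lexicon phrase found in the text."""
    score = 0.0
    for phrase, weight in lexicon.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        score += weight * len(hits)
    return score


def polarity(text: str) -> str:
    """Map the summed score to a coarse label."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

On the quote above, the two occurrences of "free" contribute +2 and "gone to shit" contributes -3, so this toy scorer lands on "negative". With unweighted words only, the same text would score as positive.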

------
ksk
A bunch of TA libraries (Stemmers, Wordbreakers, etc) ship "free" with Windows
that support a ton of different languages. I wish MS would open up the API a
bit more.

------
elwell
Clearly broken. Says news.ycombinator.com sentiment is "Positive". All jokes
aside, really cool; love the accessibility of the demo.

------
cglace
I posted a couple of paragraphs from a financial blog and the tool interpreted
SEC to mean Southeastern Conference.

~~~
philipp-spiess
MVP - Most Valuable Player - it even suggested this as a hashtag,
although I meant Minimum Viable Product

------
moron4hire
Should I have not tried it with a 3000 word essay I wrote? It has been
beachballing for the last 5 minutes or so.

~~~
nomadcoop
I probably shouldn't have tried a 12,000 word short story...

------
adventured
How is this superior to Alchemy?

[http://www.alchemyapi.com/](http://www.alchemyapi.com/)

~~~
parsabg
in at least two main areas:

- corpuses: we update our indexes frequently + use higher quality /
handpicked corpuses.

- features: our API provides Summarization and Hashtag Suggestion.

and future plans, obviously. hope that helps.

~~~
LambdaAlmighty
The plural is "corpora".

~~~
ivan_ah
I looked this up recently and corpuses is also OK, though corpora is by far
the most common usage.

------
mrg3_2013
I tried bbc.com and nothing shows up. Is it supposed to work on top level
links and summarize ?

~~~
parsabg
not really, it works best on homogeneous pieces of content.

~~~
mrg3_2013
OK. Summarizing top-level content (parse headlines and generate nugget
summary) would be a very useful feature, if you can do it.

------
Houshalter
I can't get it to work, can someone tell me what it's supposed to do?

------
parsabg
thanks for the feedback folks. FWIW, here's the documentation (/ NLP crash
course!): [http://aylien.com/text-api-doc](http://aylien.com/text-api-doc)

------
iamwithnail
Annnnnnd that's my thesis sorted. Part of it anyway.

------
afshinmeh
One of the most stunning things I've seen. Good job.

------
lukasm
HN - the ultimate DDOS machine

~~~
bertil
The upside to that is that the online help asked me if there was anything
it could do; I responded “The site seems slow.” and I got a perfectly
appropriate answer.

------
jhbellz
pretty cool - what languages does your API support?

~~~
parsabg
you mean programming or human languages?

~~~
geoffroy
I'd be interested to know which human languages you support

~~~
parsabg
ATM all endpoints except Language Detection (which supports 76 languages) only
support English. 8 new languages are on the roadmap.

~~~
geoffroy
Thanks ! Hope French is on this roadmap !

------
mm0
sell it to a bank $$$

------
jackson1988
This is incredible!

------
hamed_r
Interesting!

