Cool stuff! It's nice to see platforms like this which abstract away good algorithms, so that developers can focus on thinking of interesting applications. Open source libs are even better, but pragmatically speaking, I think these types of platforms probably move faster and get better results.
One major competitor (well known to anyone who's looked into this stuff) is Alchemy [1]. I tried a New York Times link [2] on Aylien and Alchemy, and Alchemy performed much better -- in fact, Aylien didn't even successfully find the article body. I'm sure you guys will be iterating on improving the algorithms, but I just wanted to flag that as a potential turnoff for anyone comparing your website demo with Alchemy. Best of luck!
Thanks for the feedback! As you may know, NYT articles are behind a paywall, and fetching them can be problematic. I believe Alchemy uses the NYT API to fetch articles, which is something we'll look into in the future.
I've seen quite a few of these (NLP web APIs), and my opinion is that this kind of thing tends not to be scalable: to be useful, such web APIs have to be able to process entire articles in a fraction of a second. Although I'm not sure (the API is down because of the HN storm), it doesn't seem this tool will live up to those expectations either. In the end, my choice has always been to include/wrap an off-the-shelf tool in my own pipeline rather than rely on an external service that might be too slow for end-users and mass mining alike...
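In practice the wrap-it-yourself approach looks something like this (a minimal sketch, assuming NLTK as the off-the-shelf tool, with its tokenizer and tagger models already downloaded):

```python
# Minimal sketch: wrap a local off-the-shelf tool (NLTK here, assuming
# its punkt and tagger models are downloaded) instead of a remote API,
# so each article costs local CPU time rather than a network round trip.
import time
import nltk

def tag_article(text):
    # Tokenize and POS-tag locally -- no HTTP call, no rate limits.
    return nltk.pos_tag(nltk.word_tokenize(text))

article = "Developers keep rebuilding the same NLP plumbing. " * 50
start = time.time()
tag_article(article)
print("tagged one article in %.3fs" % (time.time() - start))
```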
These sorts of things are typically better offered as libraries, particularly as the training is usually specific to a corpus, or a particular context.
It would be nice to offer a library with a bootstrapped training set.
Not to mention the sensitivity of the data, the sheer volume of it, or the effort involved in customizing it for a particular algorithm or input - only for the service to shut down and take your data with it.
Machine Learning as a Service seems Hella Neat, tho.
It seems to contradict the paragraph before -- ML as a service seems a terrible idea for the reasons you just listed (among others). What's "Hella Neat" about that?
The problem mostly stems from the vast risk you take on from making a large investment in an unstable/unproven platform vendor.
Servers are relatively fungible, given ops automation; it's painful but not the end of the world if you have to migrate away.
But the technology is still relatively immature in that building your own ML service in house - and having it scale, etc - is still a big pain.
I would immensely prefer it if we first brought ML libraries up to a higher level of maturity - as simple as apt-get install and adding `includes ActiveLearning::Bayes` to your models.
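Python with scikit-learn is arguably closest to that already (the `ActiveLearning::Bayes` bit above is hypothetical, of course). Roughly the level of simplicity I mean:

```python
# Sketch of the maturity level I mean: pip install scikit-learn and a
# Bayes text classifier is a few lines (toy data for illustration).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["great product, loved it", "terrible, a total waste of money"]
labels = ["pos", "neg"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["what a waste"]))  # -> ['neg']
```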
But if a client came to me tomorrow and said "there's this great Amazon API that we're thinking of using" I wouldn't consider that insane on first principles.
I contacted them about this using the live chat on the site. Their servers are melting down but it sounds like they're on it spinning up new instances etc.
What do you use for the extraction of entities (if you don't mind saying)? I entered "The Cat in the Hat" is a good book. It didn't recognize any entities. Are you using an ontology for named entity resolution, or just extracting NPs?
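For what it's worth, here's an illustrative sketch (not necessarily what Aylien does) of why a statistical NER model would likely miss that example while a gazetteer/ontology lookup catches it, using NLTK:

```python
# Illustrative only: statistical NER vs. a toy gazetteer lookup.
# Assumes NLTK with its tokenizer, tagger and NE chunker data installed.
import nltk

text = '"The Cat in the Hat" is a good book.'

# Statistical NER, trained mostly on newswire, tends to miss or
# mislabel book titles -- they aren't PERSON/ORG/GPE-style mentions.
tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
print([sub for sub in tree if isinstance(sub, nltk.Tree)])

# Gazetteer/ontology lookup: a known-titles table catches it directly.
gazetteer = {"The Cat in the Hat": "Book"}
for title, kind in gazetteer.items():
    if title in text:
        print(title, "->", kind)
```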
I was playing around with it and seem to have killed it by pasting in the text from this WP article (http://pastebin.com/AtCU7E8H) and hitting analyze. It's been spinning for a while.
Edit: I see from another response that the servers are melting down; I'll wait for a bit.
Maybe somebody will find my pet project useful and relevant: https://github.com/crypto5/wikivector .
It uses machine learning with Wikipedia data as the training set, supports 10 languages, and is completely open source.
We'd love to, but unfortunately some of our main competitors have restrictive terms in their ToS (e.g. http://www.alchemyapi.com/company/terms.html) that prevent us from doing so. We will publish what we can, though.
> Any information about what domains your training data is from?
They're mostly trained on general news and social media content (with lots of manual and automated cleanup). Drop us an email if you need more details: hello@aylien.com
The competitors don't allow you to benchmark their services, so while you can benchmark your own product you can't compare it to others. For example, from the Alchemy API:
YOU MAY NOT ACCESS THE SERVICES FOR PURPOSES OF MONITORING THEIR AVAILABILITY, PERFORMANCE OR FUNCTIONALITY, OR FOR ANY OTHER BENCHMARKING OR COMPETITIVE PURPOSES
Also this: "publish or perform any benchmark or performance tests or analysis relating to the Service or the use thereof without express authorization from AlchemyAPI;"
Suppose I am evaluating their service, before I decide to buy. I would be breaking these ToS, I guess.
There are more and more text analysis APIs out there -- would you mind comparing your feature set to something like Textrazor (http://www.textrazor.com) or Open Calais?
I would also like a comparison. I used Open Calais two years ago for a project, and would definitely use it again if needed.
Edit: A quick glance at the API also shows that there doesn't appear to be much in the way of machine learning. Does this build models for you or is it just to dissect text?
Why does that classification elicit a WTF? That seems like a reasonable classification, given how little context the algorithm has about the snippet. It's entirely plausible for that quote to be from a book about how "concrete and tarmac" have impacted modern architecture. There's not really any other hints about what it could be about.
There's no excuse for the polarity, though. "Gone to shit" should be a pretty good indicator of the sentiment.
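Even a plain lexicon-based scorer gets this one. A quick sketch with VADER (via NLTK, assuming the vader_lexicon data is installed):

```python
# Sketch: a simple lexicon-based scorer flags the phrase as negative.
# Requires nltk.download('vader_lexicon') beforehand.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The neighborhood has gone to shit."))
# the 'compound' score comes out clearly negative
```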
A bunch of text analysis libraries (stemmers, word breakers, etc.) that support a ton of different languages ship "free" with Windows. I wish MS would open up the API a bit more.
The upside to that is that the online help asked me if there was anything it could do; I responded "The site seems slow." and got a perfectly appropriate answer.
[1] http://www.alchemyapi.com/products/demo/
[2] http://www.nytimes.com/2014/02/18/world/middleeast/bombings-...