

A review of free sparse sequence taggers for NLP - fnl
http://fnl.es/a-review-of-sparse-sequence-taggers.html

======
sqrt17
Did you test on newspaper text? A 3% error rate suggests so, but the caveats
would be that

(i) you want to tag other texts, and

(ii) a license for the Penn Treebank (i.e. the standard training set for
English newspaper text) sets you back by about $3k,

plus another couple thousand if you want a commercial license for the Stanford
tools (although GPLv2 means you can use them in a SaaS without getting one).

~~~
fnl
By the way, regarding your other points (although I think they are a bit off
topic):

As for the cost of corpora: That number is extremely variable, and in many
cases (outside of newswire, in particular) you might have your own training
data. And then there is non-English NLP, too. Last but not least, tagging text
is not the only thing you can apply a CRF (or any graphical model) to...

Regarding paying for a Stanford (or any other software) license: That is
precisely what the linked text explains: you do not need to pay for such
things. Not only are there free tools around, some of them are far better
than the commercially restricted ones.

