
Sorry, but POS tagging is pretty much solved already. It is already at 97+% accuracy [1]. Current papers mostly improve it by less than one percent.

1. http://nlp.stanford.edu/pubs/CICLing2011-manning-tagging.pdf
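As a concrete aside (not from the linked paper), here's a minimal sketch of what off-the-shelf tagging looks like with NLTK; it assumes the "punkt" and "averaged_perceptron_tagger" resources have been downloaded:

    import nltk

    # One-time setup (assumed already done):
    # nltk.download("punkt")
    # nltk.download("averaged_perceptron_tagger")

    sentence = "The old man the boats."  # classic garden-path example
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
    # Off-the-shelf taggers usually label "man" as a noun here, even though
    # it is the verb -- the kind of case the residual 3% tends to hide.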




It's true that POS tagging works fairly well. But consider that a sentence involves more than one word. Even at 97% accuracy per word, the probability of correctly tagging every word in a short sentence of only ten words is still just 0.97^10 ≈ 0.74. And sentences are generally longer than ten words.

And since POS tagging is usually only done as preprocessing for some other task, such as syntactically parsing a text (which itself is usually preprocessing for yet another task), 97% accuracy per word is not as good as it sounds. Parsers have to work with incorrect input for every second or third sentence.
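A quick sanity check of that arithmetic, assuming independent per-token errors (the idealized case):

    # Probability of tagging an entire sentence correctly, given a fixed
    # per-token accuracy and assuming independent per-token errors.
    def sentence_accuracy(per_token_accuracy, sentence_length):
        return per_token_accuracy ** sentence_length

    print(sentence_accuracy(0.97, 10))  # ~0.74
    print(sentence_accuracy(0.97, 20))  # ~0.54, for a more typical sentence length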


Indeed: the first paragraph of the linked paper says "Current good taggers have sentence accuracies around 55–57%".

(This surprises me. I would expect accuracy for different words in a sentence to be correlated: you either make no errors or several.)


Whoops, I didn't even look into the paper. Kind of makes my comment superfluous …


For the record, 97% on canonical test datasets with little recent progress doesn't mean that it's a solved problem. Admittedly, part of the problem is that elementary school POS categories aren't a great model of natural language.

More generally, for true syntax highlighting I think you do need the parse tree, and parsers (as opposed to taggers) definitely aren't at 97%.


They definitely aren't at 97%, but they aren't that bad either. For English they are at around 92% labelled attachment score, i.e. right head + right label (see for example http://arxiv.org/pdf/1603.04351.pdf).

If you are interested only in the syntactic tag, not in the structure of the tree, the number is somewhat higher.
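For anyone unfamiliar with the metric, a rough sketch of how unlabelled vs. labelled attachment scores are computed over predicted (head, label) pairs; the data here is made up for illustration:

    # Each token is scored on its predicted (head index, dependency label).
    # Example sentence: "She saw stars."
    gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]  # last label is wrong

    # UAS: fraction of tokens with the correct head.
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    # LAS: fraction of tokens with the correct head *and* the correct label.
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    print(uas, las)  # 1.0 0.666...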


Yes, you can get 97-98%, but only when evaluating on data from the same corpus you trained on. If you evaluate on data from a different corpus, you immediately see a pretty big drop in performance. One person in the field I've talked to even went so far as to say that competing on state-of-the-art performance here is fundamentally a question of "who is best at overfitting".

There's basically no part of NLP that's a solved problem. Even something as superficially simple as segmenting running text into sentences and tokens is decidedly non-trivial.
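As a toy illustration of the sentence-splitting point, a naive split on periods (a sketch, not how any real segmenter works) already trips over ordinary abbreviations:

    import re

    text = "Dr. Smith arrived at 5 p.m. He was late."
    # Naive rule: a sentence ends at a period followed by whitespace.
    print(re.split(r"\.\s+", text))
    # ['Dr', 'Smith arrived at 5 p.m', 'He was late.']
    # Two sentences come out as three fragments with mangled boundaries.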


To be fair, I'm not a computational linguist (though some of my friends did their PhDs in the field). From what I remember, one of the most glaring issues in the field is that the most widely used corpus is a collection of Wall Street Journal issues, which is a very specific kind of text.

The 80% figure was quoted from some talk I heard a few years ago, so I concede it's almost certainly improved since then.


On the other hand, 97% sounds impressive, but it would also amount, on average, to slightly less than one error in your post (and more than one in this one).

Graded as a school exercise, 97% wouldn't be that good, I think.



