

Named Entity Recognition: Examining the Stanford NER Tagger - jmilinovich
http://blog.urx.com/urx-blog/2015/7/28/named-entity-recognition-examining-the-stanford-ner-tagger

======
nxb
Next, try the taggers in a more realistic setting than the standard corpora
-- e.g. a product review that compares several products, and you'll instantly
see how incredibly poor the current state-of-the-art NER is.

Technology is really going to advance once we have anything that comes close
to human-level performance on NER and relation extraction. Kind of like
self-driving cars: the basic ideas have been around for decades, but
performance in realistic adverse conditions remains awful almost everywhere
that it could theoretically be used.

~~~
boomzilla
That is because the taggers are not trained on the same data. You can't expect
taggers trained on Wikipedia data to do well on anything but other Wikipedia
articles. On the other hand, if one has access to Amazon review data (with
links to the product catalog), I am pretty sure a tagger that does well on
Amazon data can be trained.

~~~
zeerakw
Well, that depends: if you somehow manage to link well across different
domains, it can be done. Take a look at the Lowlands project from Copenhagen
University ([http://lowlands.ku.dk](http://lowlands.ku.dk)), which deals
specifically with cross-domain adaptation.

You are right that reasonable domains are required, though.

------
zeerakw
It's always nice to know that your master's programme requires more of you in
just one exam (building a Relation Extraction pipeline including POS tagging
and an NER system).

Having said that, it's been shown pretty well that CRFs outperform the
Stanford Parser with simple features (it can get even better with better
features - particularly for organisations), and that they also beat out HMMs.
It could be interesting to see how neural networks would do, though.

~~~
lfowles
Where did you see this was for a master's programme?

Also, it depends. When I was working on my master's (which I didn't finish
:) ), there were three options:

* Thesis
* Project
* Extra Courses

