
SyntaxNet: Neural Models of Syntax - taigeair
https://github.com/tensorflow/models/tree/master/syntaxnet
======
mwnivek
HN discussion 3 days ago:
[https://news.ycombinator.com/item?id=11686029](https://news.ycombinator.com/item?id=11686029)

------
ktRolster
If anyone is interested in understanding NLP, starting with little knowledge,
I found that the NLTK book was quite helpful:
[http://www.nltk.org/book/](http://www.nltk.org/book/)

~~~
pyvpx
this (SyntaxNet) is also about natural language understanding (NLU)

------
vagabondvector
[http://arxiv.org/abs/1603.06042](http://arxiv.org/abs/1603.06042)

this is the paper.

the idea builds on the incremental perceptron coupled with beam search:
several hypotheses are kept alive at the same time while the model learns a
policy that constructs them and selects the best one

[http://dl.acm.org/citation.cfm?id=1218970](http://dl.acm.org/citation.cfm?id=1218970)

At the time, the incremental perceptron was the fastest approach to
structured prediction, and its implementation is extremely simple.
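To make the idea concrete, here is a toy sketch (not SyntaxNet's actual code) of an incremental structured perceptron with beam search, in the spirit of Collins & Roark (2006). The tag set, feature templates, and tiny corpus are invented for illustration:

```python
# Toy incremental structured perceptron with beam search for POS tagging.
# Tags, features, and data below are made up; real systems use far richer
# feature sets and early-update/violation-fixing training.
from collections import defaultdict

TAGS = ["D", "N", "V"]

def features(words, i, prev_tag, tag):
    # Emission and transition indicator features for one decision.
    return [("w", words[i], tag), ("t", prev_tag, tag)]

def score(weights, feats):
    return sum(weights[f] for f in feats)

def beam_decode(weights, words, beam_size=2):
    # Each hypothesis is (score, tag sequence); expand and prune per word.
    beam = [(0.0, [])]
    for i in range(len(words)):
        cands = []
        for s, tags in beam:
            prev = tags[-1] if tags else "<s>"
            for t in TAGS:
                cands.append(
                    (s + score(weights, features(words, i, prev, t)),
                     tags + [t]))
        beam = sorted(cands, key=lambda x: -x[0])[:beam_size]
    return beam[0][1]

def train(data, epochs=5, beam_size=2):
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = beam_decode(weights, words, beam_size)
            if pred != gold:
                # Standard perceptron update: reward gold features,
                # penalize predicted ones.
                for i in range(len(words)):
                    gp = gold[i - 1] if i else "<s>"
                    pp = pred[i - 1] if i else "<s>"
                    for f in features(words, i, gp, gold[i]):
                        weights[f] += 1.0
                    for f in features(words, i, pp, pred[i]):
                        weights[f] -= 1.0
    return weights

data = [("the dog barks".split(), ["D", "N", "V"]),
        ("the cat sleeps".split(), ["D", "N", "V"])]
w = train(data)
print(beam_decode(w, "the dog sleeps".split()))
```

The whole learner is an update rule plus a beam, which is what makes the approach so fast and simple to implement.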

Other approaches include conditional random fields, hidden Markov models,
maximum-margin Markov networks, maximum-entropy Markov models, and the
approach they cite, SEARN (a learning-to-search method, in the same family as
DAgger (Ross et al., 2011), AggreVaTe (Ross & Bagnell, 2014), and LOLS (Chang
et al., 2015)).

[http://arxiv.org/abs/1502.02206](http://arxiv.org/abs/1502.02206)

The SyntaxNet team dismisses SEARN, implying it suffers from label bias, but
that dismissal isn't justified.

Their approach approximates 0/1 (Hamming) loss with log-loss and works only
on problems where the output decomposes over a sequence of decisions, each of
which can be assigned an incremental loss.

LOLS can work with arbitrary loss functions. For example, say you are
translating a text from English to Chinese: what is the loss when you've
translated half a document? It's hard, if not impossible, to decompose that
loss over individual decisions. (It's easy for part-of-speech tagging, where
each tag is either correct or not, and for dependency parsing, where each
parent word is either correct or not.)
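The POS-tagging case makes the decomposition easy to see: the per-decision costs literally sum to the sequence loss. A minimal sketch (tags and sequences invented for illustration):

```python
# Sketch: for POS tagging, Hamming loss decomposes over decisions, so each
# tagging action has a well-defined incremental cost. For a task like
# document translation, no such per-decision cost exists.
def hamming_loss(pred_tags, gold_tags):
    # Sequence-level loss: number of wrong tags.
    return sum(p != g for p, g in zip(pred_tags, gold_tags))

def incremental_cost(action, gold_tag):
    # Cost of one decision, independent of the rest of the sequence.
    return 0 if action == gold_tag else 1

gold = ["D", "N", "V"]
pred = ["D", "V", "V"]
total = sum(incremental_cost(a, g) for a, g in zip(pred, gold))
assert total == hamming_loss(pred, gold)  # per-step costs sum to the loss
print(total)
```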

LOLS is a far superior method. It can also be combined with any binary
classifier to guide the decisions: an SVM with any kernel, a perceptron,
logistic regression.
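A compressed sketch of the learning-to-search loop behind LOLS: roll in with the learned policy, try each one-step deviation, roll out to completion, and turn the resulting costs into classification examples for an off-the-shelf classifier. The toy task and features are invented, and for brevity this version rolls out with the reference policy only, whereas real LOLS mixes reference and learned roll-outs:

```python
# Toy learning-to-search loop in the style of LOLS (Chang et al., 2015).
# Simplification: reference-policy roll-outs only; the classifier here is a
# perceptron-style update, but any cost-sensitive learner could be plugged in.
from collections import defaultdict

TAGS = ["D", "N", "V"]
GOLD = {"the": "D", "dog": "N", "barks": "V", "cat": "N", "sleeps": "V"}

def policy(weights, word):
    # Greedy learned policy: highest-scoring tag for this word.
    return max(TAGS, key=lambda t: weights[(word, t)])

def rollout_cost(words, prefix, i, action):
    # Take `action` at step i, complete the sequence with the reference
    # policy, and measure the resulting Hamming loss.
    tags = prefix + [action] + [GOLD[w] for w in words[i + 1:]]
    return sum(t != GOLD[w] for t, w in zip(tags, words))

def lols_train(sentences, epochs=3):
    weights = defaultdict(float)
    for _ in range(epochs):
        for words in sentences:
            prefix = []  # roll-in with the learned policy
            for i, word in enumerate(words):
                # One-step deviations: cost of each action under roll-out.
                costs = {a: rollout_cost(words, prefix, i, a) for a in TAGS}
                best = min(costs, key=costs.get)
                pred = policy(weights, word)
                if pred != best:
                    # Cost-sensitive classification update.
                    weights[(word, best)] += 1.0
                    weights[(word, pred)] -= 1.0
                prefix.append(policy(weights, word))
    return weights

sents = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
w = lols_train(sents)
print([policy(w, x) for x in ["the", "dog", "sleeps"]])
```

The point of the reduction is that `rollout_cost` only needs the *end-to-end* loss of a completed output, which is why the framework handles losses that don't decompose over decisions.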

SyntaxNet uses a non-recurrent network to learn the parsing policy, and the
same network could be used with the LOLS approach.

Structured prediction is an incredibly interesting problem. For example,
doing POS tagging and dependency parsing at the same time can improve
performance on both tasks; the same holds for recognizing named entities
while simultaneously extracting the relationships between them.

It's very nice to see past insights applied with heavier machinery. Exciting
times!

