
SLING: A Natural Language Frame Semantic Parser - runesoerensen
https://research.googleblog.com/2017/11/sling-natural-language-frame-semantic.html
======
syllogism
Disappointed to see zero mention of related work, and no comparison against
the many other papers that do transition-based SRL, many of which can produce
arbitrary output graphs.

The authors use a transformation of the PropBank data, so their numbers aren't
directly comparable against the literature. That's okay --- the PropBank
format itself isn't that great, and we'd like a better balance of usability
and learnability. However there have been dozens of systems published against
a different graph transformation of the PropBank data, e.g. the CoNLL 2008 and
2009 shared tasks:
[http://ufal.mff.cuni.cz/conll2009-st/#task](http://ufal.mff.cuni.cz/conll2009-st/#task)

There should be no problem comparing these systems against the system in this
paper. The unfair comparison would be to take a system that produces PropBank
output, and transform its output into this format. This comparison would be
unfair because the system would be optimising a different measure. A fairer
comparison would be to take a system that can produce arbitrary graphs, and
train it on the authors' format.

The transition system looks quite interesting, and pretty easy to try -- so
I'm actually tempted to implement this and give it a go. However, with no
meaningful evaluation we have no way of knowing whether it outperforms the
many alternatives that have been proposed, e.g.
[https://www.semanticscholar.org/search?year%5B%5D=2017&year%...](https://www.semanticscholar.org/search?year%5B%5D=2017&year%5B%5D=2017&q=incremental%20semantic%20parser&sort=relevance)
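To make the "pretty easy to try" point concrete, here is a toy sketch of a SLING-style transition system as I read it from the tech report. The action names (SHIFT, EVOKE, CONNECT) come from the paper; the state representation, frame types, and the oracle action sequence below are my own simplification, not the authors' implementation.

```python
# Toy sketch of a transition system that builds a frame graph over tokens.
# This is an illustration of the general idea, not SLING's actual code.

class State:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0            # current token index
        self.frames = []        # evoked frames, in creation order
        self.edges = []         # (source_frame, role, target_frame)

    def shift(self):
        """SHIFT: advance to the next input token."""
        self.pos += 1

    def evoke(self, frame_type, length=1):
        """EVOKE: create a frame of the given type, anchored to the span
        of `length` tokens starting at the current position."""
        span = tuple(self.tokens[self.pos:self.pos + length])
        self.frames.append({"type": frame_type, "span": span})
        return len(self.frames) - 1

    def connect(self, source, role, target):
        """CONNECT: add a role edge between two existing frames."""
        self.edges.append((source, role, target))

# Run a gold-standard oracle action sequence for "John loves Mary".
state = State(["John", "loves", "Mary"])
john = state.evoke("PERSON")
state.shift()
pred = state.evoke("/pb/love-01")
state.connect(pred, "ARG0", john)
state.shift()
mary = state.evoke("PERSON")
state.connect(pred, "ARG1", mary)
state.shift()

print(state.edges)  # [(1, 'ARG0', 0), (1, 'ARG1', 2)]
```

A learned parser would replace the hand-written oracle with a classifier that predicts the next action from the current state, which is exactly why evaluation against other transition-based systems would be so easy to set up.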

The accompanying paper is described as a tech report, so arguably it shouldn't
be held to the same standard. But still. I'd have hoped to see more leadership
from the anchor author Fernando Pereira -- who has written plenty of relevant
literature on this, starting in the 80s.

The field is already very difficult to follow, even when papers are
forthcoming about how they fit into previous work. Without that context, the
claims in each paper become very difficult to evaluate. I would therefore
suggest rejecting this paper if I received it for blind review.

~~~
nl
I think this comes across as a little too critical.

I agree with everything you have said. But to me it looks like this came from
an industrial implementation which they decided to open source.

That means they weren't aiming for SOTA on standard datasets, but for what
worked best on their internal benchmarks. The evaluation procedure in the
published report looks like it was derived from how they train it internally.

That means they probably had to justify all the time they were spending on
open sourcing it, and writing a tech report.

So while it would be great to see standard datasets tried, I much prefer
seeing the code open sourced.

Also, there's nothing stopping someone else from publishing a nice short paper
on the CoNLL datasets using this software.

~~~
syllogism
I'm not sure why you think this was an industrial implementation. It's
built on DRAGNN, which is pretty new, so that doesn't give much time for it to
be in production. Also, if it were in production the blog post would surely
say that -- instead it's quite clearly marked as an experimental system, for
research. The lead author seems to be a researcher, too.

~~~
nl
The whole thing feels like it is bits of code extracted or rewritten out of an
existing system.

Take the whole Myelin JIT compiler. That's a pretty weird thing to build just
for research into frame parsing.

------
setzer22
I know this is a blog post and not a research paper, but I see two big flaws
in it that may mislead the reader:

 _(i)_ failing to acknowledge that this same task (semantic role labelling /
semantic graph generation) is performed by many existing NLP tools and not
even hinting at some comparison results. One would think Semantic Graphs (c)
are a novel thing invented at Google (TM).

 _(ii)_ Presenting the fact that all the intermediate features are magically
computed inside a black box as an improvement. This is, to me, a huge loss.
There is great value in the classical NLP pipeline they criticise so much,
especially in terms of explainability: "Why is _mail_ an agent in the semantic
graph? Because the tagger wrongly tagged it as a _noun_ ". I can (easily
enough) correct that in a classical pipeline, while I'm left wondering how I
should steer those neurons in Google's implementation to fix it.
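The debugging story above can be made concrete with a tiny sketch. The rule-based tagger and role assigner here are hypothetical stand-ins for real pipeline stages; the point is only that each stage's output is inspectable and patchable.

```python
# Minimal sketch of a "classical pipeline": every intermediate result is
# visible, so an error in the final graph can be traced to its stage.

def tag(tokens):
    # Hypothetical lexicon-based POS tagger.
    lexicon = {"the": "DET", "mail": "NOUN", "arrived": "VERB"}
    return [(t, lexicon.get(t, "NOUN")) for t in tokens]

def roles(tagged):
    # Naive rule: the last NOUN before the VERB becomes the agent.
    out = {}
    for word, pos in tagged:
        if pos == "VERB":
            break
        if pos == "NOUN":
            out["agent"] = word
    return out

tokens = ["the", "mail", "arrived"]
tagged = tag(tokens)    # inspectable intermediate output
graph = roles(tagged)

# Why is "mail" the agent? Because the tagger marked it NOUN. If that
# tag were wrong, we could patch the lexicon and rerun -- a fix an
# end-to-end black box doesn't offer.
print(tagged)   # [('the', 'DET'), ('mail', 'NOUN'), ('arrived', 'VERB')]
print(graph)    # {'agent': 'mail'}
```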

Anyway, this ended up a bit rantier than I expected. I actually am looking
forward to trying this out and seeing how well it works. I think it's very
positive that Google open-sources its research projects, as I find this is
surprisingly uncommon in academia.

------
nl
If you are interested in this, the AllenAI group has an implementation of
their model for a very closely related task (semantic role labeling) online:
[http://demo.allennlp.org/semantic-role-labeling/](http://demo.allennlp.org/semantic-role-labeling/)

------
PaulHoule
In contrast to most of the other NLP tasks people train neural networks to do,
this one is actually useful (e.g. the frame corresponds to some action that
can be passed on to conventional software).
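To spell out why frames are directly actionable: a parsed frame is structured data that conventional code can dispatch on. The frame shape, type names, and handlers below are invented for illustration, not taken from SLING.

```python
# Sketch: dispatching a parsed frame to conventional software. The frame
# type selects a handler; the role fillers become its arguments.

def handle_reminder(frame):
    return f"reminder set: {frame['ARG1']} at {frame['time']}"

def handle_message(frame):
    return f"sending '{frame['ARG1']}' to {frame['ARG2']}"

HANDLERS = {
    "/pb/remind-01": handle_reminder,   # hypothetical frame types
    "/pb/message-01": handle_message,
}

def execute(frame):
    return HANDLERS[frame["type"]](frame)

frame = {"type": "/pb/remind-01", "ARG1": "pay rent", "time": "9am"}
print(execute(frame))  # reminder set: pay rent at 9am
```

By contrast, a sentiment score or a language-model probability has no such direct hook into downstream program logic, which is the contrast the comment is drawing.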

~~~
DonaldPShimoda
> In contrast to most of the other NLP tasks people train neural networks to
> do

Could you give some examples of these types of tasks (and maybe associated
projects/papers) that you think are not actually useful?

~~~
PaulHoule
Sentiment analysis over broad domains, for one. Predicting the next token on
the Penn Treebank, for another.

------
zbyte64
First time I heard of DRAGNN, my mind is blown:
[https://github.com/tensorflow/models/blob/master/research/sy...](https://github.com/tensorflow/models/blob/master/research/syntaxnet/g3doc/DRAGNN.md)

~~~
syllogism
Mostly DRAGNN is a lot of hard work to compensate for how difficult TensorFlow
makes your life if you're trying to write this type of model. If you use a
library like Chainer, PyTorch or DyNet, the problem it solves simply never
occurs.
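The "problem never occurs" claim is about define-by-run execution: in those libraries the computation graph is just ordinary control flow, so a model whose structure varies per input needs no special machinery. Plain Python stands in for the framework in this sketch; a real model would use framework tensors and learned cells.

```python
# Sketch of the define-by-run idea: the recursion *is* the computation
# graph. A different input tree yields a different computation, with no
# static graph declared up front (the machinery DRAGNN exists to provide
# on top of static-graph TensorFlow).

def encode(tree):
    if isinstance(tree, (int, float)):   # leaf: stands in for an embedding
        return float(tree)
    left, right = tree
    # Stand-in for a learned composition cell over child encodings.
    return 0.5 * (encode(left) + encode(right))

print(encode((1, (2, 3))))  # 1.75
```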

~~~
Eridrus
This may be true for training, but I have not heard of a real deployment story
for low-latency NLP models in any of these define-by-run frameworks, which is
what this attempts to solve.

~~~
syllogism
I haven't benchmarked, but I think you'll find DyNet to be significantly
faster than DRAGNN in that setting.

I'm also not sure DRAGNN is a good direction to head in, if that's the
problem. You'll never ever be able to change anything in it, so if performance
isn't good you're stuck. Writing the forward pass isn't very difficult, so I'd
much rather be able to replace parts of a network with optimised code if
necessary.

~~~
Eridrus
Would be good to see some non-batch inference benchmarks for this. I couldn't
find any numbers online.

