SLING: A Natural Language Frame Semantic Parser (googleblog.com)
150 points by runesoerensen on Nov 15, 2017 | 16 comments



Disappointed to see zero mention of related work, and no comparison against the many other papers that do transition-based SRL, many of which can produce arbitrary output graphs.

The authors use a transformation of the PropBank data, so their numbers aren't directly comparable against the literature. That's okay -- the PropBank format itself isn't that great, and we'd like a better balance of usability and learnability. However, dozens of systems have been published against a different graph transformation of the PropBank data, e.g. the CoNLL 2008 and 2009 shared tasks: http://ufal.mff.cuni.cz/conll2009-st/#task

There should be no problem comparing these systems against the one in this paper. The unfair comparison would be to take a system that produces PropBank output and transform its output into this format; that would be unfair because the system would be optimising a different measure. A fairer comparison would be to take a system that can produce arbitrary graphs and train it on the authors' format.

The transition system looks quite interesting, and pretty easy to try -- so I'm actually tempted to implement this and give it a go (a sketch of what I have in mind is below). However, with no meaningful evaluation we have no way of knowing whether it outperforms the many alternatives that have been proposed, e.g. https://www.semanticscholar.org/search?year%5B%5D=2017&year%...
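
Here's a minimal Python sketch of the kind of state those transitions operate on. The operation names (SHIFT, EVOKE, CONNECT) follow the tech report, but the data structures and the type/role strings are my own simplification, not the authors' implementation:

    # Sketch of a SLING-style transition state. SHIFT/EVOKE/CONNECT follow
    # the tech report; everything else here is a deliberate simplification.

    class Frame:
        def __init__(self, ftype):
            self.type = ftype
            self.slots = []        # (role, target Frame) pairs

    class ParserState:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0           # next input token
            self.attention = []    # most recently touched frames first

        def shift(self):
            """SHIFT: advance to the next input token."""
            self.pos += 1

        def evoke(self, ftype, length):
            """EVOKE: create a frame over the next `length` tokens and put
            it at the front of the attention buffer."""
            frame = Frame(ftype)
            frame.span = (self.pos, self.pos + length)
            self.attention.insert(0, frame)
            return frame

        def connect(self, source, role, target):
            """CONNECT: link two attended frames, refocus on the source."""
            source.slots.append((role, target))
            self.attention.insert(0, self.attention.pop(self.attention.index(source)))

    # The report's "John loves Mary" running example, roughly:
    state = ParserState(["John", "loves", "Mary"])
    john = state.evoke("/saft/person", 1); state.shift()
    loves = state.evoke("/pb/love-01", 1); state.shift()
    mary = state.evoke("/saft/person", 1)
    state.connect(loves, "/pb/arg0", john)
    state.connect(loves, "/pb/arg1", mary)

The interesting (hard) part, of course, is learning to predict the transition sequence, which this skips entirely.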

The accompanying paper is described as a tech report, so arguably it shouldn't be held to the same standard. But still. I'd have hoped to see more leadership from the anchor author, Fernando Pereira -- who has written plenty of relevant literature on this, starting in the 80s.

The field is already very difficult to follow, even when papers are forthcoming about how they fit into previous work. Without that, the claims in each paper become very difficult to understand. I would therefore suggest rejecting this paper if I received it for blind review.


I think this comes across as a little too critical.

I agree with everything you have said. But to me it looks like this came from an industrial implementation that they decided to open-source.

That means they weren't aiming for SOTA on standard datasets, but for what worked best on their internal benchmarks. The evaluation procedure in the published report looks like it was derived from how they train it internally.

It also means they probably had to justify all the time they were spending on open-sourcing it and writing a tech report.

So while it would be great to see standard datasets tried, I much prefer seeing the code open-sourced.

Also, there's nothing stopping someone else from publishing a nice short paper on the CoNLL datasets using this software.


I'm not sure why you think this was an industrial implementation. It's built on DRAGNN, which is pretty new, so that doesn't leave much time for it to have been in production. Also, if it were in production the blog post would surely say so -- instead it's quite clearly marked as an experimental system, for research. The lead author seems to be a researcher, too.


The whole thing feels like bits of code extracted from, or rewritten out of, an existing system.

Take the whole Myelin JIT compiler. That's a pretty weird thing to build just for research into frame parsing.


It's especially disappointing as the author of SEMAFOR, Dipanjan Das, now works for Google too.


I know this is a blog post and not a research paper, but I see two big flaws in it that may mislead the reader:

(i) Failing to acknowledge that this same task (semantic role labelling / semantic graph generation) is performed by many existing NLP tools, and not even hinting at any comparison results. One would think Semantic Graphs (c) are a novel thing invented at Google (TM).

(ii) Presenting the fact that all the intermediate features are magically computed inside a black box as an improvement. This is, to me, a huge loss. There is great value in the classical NLP pipeline they criticise so much, especially in terms of explainability: "Why is mail an agent in the semantic graph? Because the tagger wrongly tagged it as a noun." I can (easily enough) correct the above in a classical pipeline (a sketch of that kind of intervention is below), while I'm left wondering how I should stir those neurons in Google's implementation to fix it.
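
To make that concrete, here's a sketch of the inspect-and-patch debugging a staged pipeline allows. NLTK is just a stand-in (it assumes its tokenizer and tagger models have been downloaded); any pipeline with visible intermediate output works the same way:

    # Debugging a classical pipeline: inspect the POS tags, patch the one
    # the tagger got wrong, re-run the downstream stage. Assumes NLTK's
    # punkt tokenizer and perceptron tagger data are installed.
    import nltk

    tokens = nltk.word_tokenize("Mail the report to the board")
    tagged = nltk.pos_tag(tokens)
    print(tagged)  # suppose 'Mail' comes out as NN -- wrongly a noun

    # The fix is local and transparent: correct the single bad tag...
    patched = [("Mail", "VB")] + tagged[1:]

    # ...and the downstream chunker now sees an imperative verb phrase.
    chunker = nltk.RegexpParser("VP: {<VB><DT>?<NN.*>+}")
    print(chunker.parse(patched))

With the end-to-end neural system there is no equivalent seam at which to intervene.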

Anyway, this ended up a bit more ranty than I expected. I'm actually looking forward to trying this out and seeing how well it works. I think it's very positive that Google open-sources its research projects, as I find this is surprisingly uncommon in academia.


If you are interested in this, the AllenAI group has an implementation of their model for a very closely related task (semantic role labeling) online: http://demo.allennlp.org/semantic-role-labeling/
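
If you'd rather script it than click around the demo, something along these lines should work with AllenNLP's Predictor interface (the archive path below is a placeholder -- take the real one from their docs; the interface itself is my assumption about their current API):

    # Sketch of calling the AllenNLP SRL model programmatically. The
    # archive path is a placeholder; check the AllenNLP docs for the
    # published model URL.
    from allennlp.predictors.predictor import Predictor

    predictor = Predictor.from_path("/path/to/srl-model.tar.gz")  # placeholder
    result = predictor.predict(sentence="The keys were left in the car.")

    # SRL output is one labelled argument structure per detected verb.
    for verb in result["verbs"]:
        print(verb["description"])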


In contrast to most of the other NLP tasks people train neural networks to do, this one is actually useful (e.g. the frame corresponds to some action that can be handed off to conventional software).
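
As a toy illustration of that hand-off (an invented example -- the frame shape and role names here have nothing to do with SLING's actual output format):

    # Toy illustration: once a parser emits a frame, ordinary software can
    # dispatch on it. The frame shape and role names are invented.

    def book_flight(origin, destination):
        print("Booking flight %s -> %s" % (origin, destination))

    frame = {
        "type": "/pb/book-01",
        "arg0": "user",
        "arg1": {"origin": "SFO", "destination": "JFK"},
    }

    HANDLERS = {
        "/pb/book-01": lambda f: book_flight(f["arg1"]["origin"],
                                             f["arg1"]["destination"]),
    }

    HANDLERS[frame["type"]](frame)  # -> Booking flight SFO -> JFK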


> In contrast to most of the other NLP tasks people train neural networks to do

Could you give some examples of these types of tasks (and maybe associated projects/papers) that you think are not actually useful?


Sentiment analysis over broad domains, for one. Predicting the next token on the Penn Treebank, for another.


First time I heard of DRAGNN, my mind is blown: https://github.com/tensorflow/models/blob/master/research/sy...


Mostly, DRAGNN is a lot of hard work to compensate for how difficult TensorFlow makes your life if you're trying to write this type of model. If you use a library like Chainer, PyTorch or DyNet, the problem it solves simply never occurs.
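
For anyone who hasn't used a define-by-run library, the point is that per-sentence control flow is just ordinary Python. A sketch (not a real parser -- the sizes and the 4-action inventory are made up):

    # Define-by-run sketch: the graph is rebuilt per sentence, so
    # variable-length input and state-dependent branching need no special
    # machinery. Sizes and the action inventory are made up.
    import torch
    import torch.nn as nn

    cell = nn.LSTMCell(input_size=64, hidden_size=128)
    scorer = nn.Linear(128, 4)

    def parse(token_embeddings):        # (n_tokens, 64) tensor
        h = torch.zeros(1, 128)
        c = torch.zeros(1, 128)
        actions = []
        for tok in token_embeddings:    # ordinary Python loop
            h, c = cell(tok.unsqueeze(0), (h, c))
            actions.append(scorer(h).argmax(dim=1).item())
            # a real transition parser would branch on the action here
        return actions

    print(parse(torch.randn(5, 64)))

In static-graph TensorFlow, expressing that loop and branch requires exactly the kind of machinery DRAGNN builds.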


This may be true for training, but I haven't heard of a real deployment story for low-latency NLP models in any of these define-by-run frameworks, which is what this attempts to solve.


I haven't benchmarked, but I think you'll find DyNet to be significantly faster than DRAGNN in that setting.

I'm also not sure DRAGNN is a good direction to head in, if that's the problem. You'll never ever be able to change anything in it, so if performance isn't good you're stuck. Writing the forward pass isn't very difficult, so I'd much rather be able to replace parts of a network with optimised code if necessary.


Would be good to see some non-batch inference benchmarks for this. I couldn't find any numbers online.
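
Absent published numbers, a single-sentence latency harness like this would settle it (`parse` is a hypothetical stand-in for whichever system is being measured):

    # Simple non-batch latency harness. `parse` is a hypothetical stand-in
    # for whichever system (SLING, a DyNet model, ...) is under test.
    import statistics
    import time

    def benchmark(parse, sentences, warmup=50):
        for s in sentences[:warmup]:   # warm caches/JIT before timing
            parse(s)
        latencies = []
        for s in sentences:
            t0 = time.perf_counter()
            parse(s)
            latencies.append((time.perf_counter() - t0) * 1000.0)
        print("p50=%.2fms p95=%.2fms" % (
            statistics.median(latencies),
            sorted(latencies)[int(0.95 * len(latencies))]))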


Posted here: https://news.ycombinator.com/item?id=13913962 8 months ago by @Katydid but got zero traction.

I think we're perhaps at peak NN :/



