
Your tl;dr by an ai: a deep reinforced model for abstractive summarization - etiam
https://metamind.io/research/your-tldr-by-an-ai-a-deep-reinforced-model-for-abstractive-summarization
======
rubyn00bie
While this is pretty neat, FWIW I've always been blown away by the
summarization tool built into macOS. You just select text, hit summarize, and
adjust the length. It works wonderfully -- I used it in college all the time for
annotated bibliographies. To be honest, I've always found it good enough, and
it's a wildly simple tool (or so it looks) compared to using AI.

~~~
kccqzy
I agree. But it has been pretty much neglected by Apple these days.

------
alexcnwy
Awesome!

P.S. Richard Socher, one of the authors of the paper, taught a great Stanford
course, 'CS224d: Deep Learning for Natural Language Processing', with videos and
notes available here:

[http://cs224d.stanford.edu/](http://cs224d.stanford.edu/)

------
lngnmn
According to linguists, there is an inevitable gap between syntax and
semantics, due to the fundamental principle that the phonemes used are arbitrary.
To put it simply, any sequence of sounds could be associated with any meaning
(given a distinct semantics). Morphemes, however, even though they can in some
cases serve as direct pointers to meaning, still require one or even
several contexts to be interpreted correctly.

TL;DR - in principle, there is no way to get proper semantics from mere syntax
without mastering the appropriate contexts (domain knowledge). One would get
certain word patterns, but not the corresponding _deep structure_ (the
intended meaning). Summarization would be arbitrary.

Try to summarize the Sermon on the Mount.

~~~
skoocda
Some would further argue that learning underlying meaning (which can only be
derived from context, a.k.a. pragmatics) is completely impossible without
_agency_. Until these AI systems can grok information relative to some sort of
spiritual/philosophical/metaphysical "self", they won't be able to make
quality judgements.

------
du_bing
Is there any open source code to try? It's wonderful!

~~~
phreeza
I have been wondering if it would be good for someone to set up a patreon and
develop clean open source implementations of popular/current machine learning
papers, and get paid for every model he/she puts out.

~~~
EvgeniyZh
...and find out that most authors forgot to mention some critical details :)

~~~
nazka
Yeah, like the unoptimized implementation of WaveNet when DeepMind published it.
But hey, they published, so it's cool!

~~~
p1esk
You should ask for your money back.

------
projectorlochsa
I don't think reinforcement learning is equivalent to optimizing a joint loss.

I mean, their model executes X steps, then they calculate the loss using
supervised data and use that loss to learn.

The same is done with machine translation models when they optimize over
BLEU. It's still supervised learning, because to calculate the loss you need
reference data.

~~~
andreyk
It is RL because the loss is non-differentiable - they don't do standard
backprop, but use a "self-critical policy gradient training algorithm" (a form
of RL). You could argue it's supervised in the sense that there is ground
truth data, but then again RL also has 'ground truth' in the form of a score
function - they don't provide the ground-truth sentence to the model but a
different metric based on the accumulated outputs of the model, so if you
squint you can see how it fits into classic RL terms (though the starting state
is always the same, the action/state space is ridiculous, etc.).
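
In pseudo-PyTorch, the self-critical trick looks roughly like this (a sketch
under my own assumptions; `model.sample`, `model.greedy_decode`, and
`reward_fn` are hypothetical placeholders, not the paper's actual code):

    import torch

    def self_critical_loss(model, article, reference, reward_fn):
        # Sample a summary stochastically, keeping the log-probs of the sampled tokens.
        sampled_tokens, log_probs = model.sample(article)
        # Greedy decoding acts as the "self-critical" baseline; no gradient needed.
        with torch.no_grad():
            greedy_tokens = model.greedy_decode(article)
        # The reference only enters through a scalar score (e.g. ROUGE),
        # which is non-differentiable, hence the policy gradient.
        advantage = reward_fn(sampled_tokens, reference) - reward_fn(greedy_tokens, reference)
        # REINFORCE: raise the probability of samples that beat the greedy baseline.
        return -advantage * log_probs.sum()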

~~~
projectorlochsa
Well, BLEU is non-differentiable and not decomposable over the sequence of
translation decisions. Yet I wouldn't call such methods reinforcement learning
just because the loss is tricky.

But yeah, I guess there's more to it than meets the eye.

~~~
andreyk
I suspect (I have not read that much NLP literature) that BLEU is typically
used for evaluation only, not as the training loss. E.g., the paper "Google's
Neural Machine Translation System: Bridging the Gap between Human and Machine
Translation" mentions directly optimizing for BLEU, but again via RL and not
supervised learning. It certainly is a quirky example of RL, though... I guess
that's the pace at which new ideas/approaches get introduced these days.
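
To make the non-differentiability concrete, here is a toy sentence-level BLEU
computation with NLTK (an illustrative sketch; the example sentences are made
up): the metric returns one scalar over discrete token sequences, so there is
nothing to backprop through.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference  = [["the", "cat", "sat", "on", "the", "mat"]]
    hypothesis = ["the", "cat", "is", "on", "the", "mat"]

    # One scalar score over discrete tokens; nothing here is differentiable
    # with respect to the model's per-token probabilities.
    score = sentence_bleu(reference, hypothesis,
                          smoothing_function=SmoothingFunction().method1)
    print(round(score, 3))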

------
mark_l_watson
I heard a similar talk in June 2016 at NAACL - this paper is probably a good
improvement. I have spent time writing extractive summarization code, which
is a much easier problem to solve. The ability to ingest text, form an
internal representation, and then generate a summary is impressive.
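
As a rough illustration of why the extractive variant is easier, a toy
frequency-based sentence scorer can already do a passable job (a sketch for
illustration only, not the code mentioned above):

    import re
    from collections import Counter

    def extractive_summary(text, num_sentences=3):
        # Split into sentences and score each by average word frequency.
        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
        freq = Counter(re.findall(r'\w+', text.lower()))

        def score(sentence):
            tokens = re.findall(r'\w+', sentence.lower())
            return sum(freq[t] for t in tokens) / (len(tokens) or 1)

        # Keep the top-scoring sentences, in their original order.
        ranked = sorted(range(len(sentences)),
                        key=lambda i: score(sentences[i]), reverse=True)
        return " ".join(sentences[i] for i in sorted(ranked[:num_sentences]))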

~~~
pouta
My startup does exactly that. I wonder if you'd be available to share some more
thoughts on this topic.

~~~
mark_l_watson
Sure, I would enjoy talking with you. My email is in my profile.

------
jasonkostempski
That looks pretty long.

