On Extractive and Abstractive Summarization with Transformer Language Models (arxiv.org)
106 points by hirundo 9 days ago | 12 comments

It's definitely cute that the abstract was generated by the model, but I wouldn't give that too much weight because it's the definition of cherry-picking. In this case, you can pick your data (the contents of the paper) to match a desirable output from the model (the abstract).

I think it would be a lot of effort to keep changing the paper until you get the perfect abstract. It would be easier to train different models or do random sampling from the predicted distribution.
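Sampling from the predicted distribution, as suggested above, is straightforward to sketch. The function below is a hypothetical illustration (not from the paper) of temperature sampling over a model's next-token logits; `sample_next_token` and its parameters are names chosen here for the example.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index from next-token logits.

    Lower temperatures sharpen the distribution (approaching greedy
    decoding); higher temperatures flatten it, yielding more varied
    abstracts from the same trained model.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()            # normalize to a probability distribution
    return rng.choice(len(probs), p=probs)
```

Rerunning this with different seeds or temperatures gives different candidate abstracts from a single model, which is the cheap alternative to retraining.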

This is impressive. They trained it on 200k articles from arXiv, 130k from PubMed, and over 1 million each from Newsroom and BigPatent. They have comparisons of generated abstracts versus actual abstracts of some landmark NLP papers.

My only gripe is that I would have liked to see (maybe in an appendix) examples of papers on completely different topics, say one each in biology, math, and physics. It would be difficult to pick good examples, sure. But it would significantly strengthen at least my impression of the transferability.

Would be interesting to see what kind of paper it writes.

I really wish that the code or at least a working demo was a requirement to make such claims public.

Playing the devil's advocate here: might I ask why? The paper seems pretty thorough w.r.t. the description of the corpora, models, and hyperparameters used. They even point to the exact implementation of their evaluation scoring and include a few examples in the paper itself. Even if they put up a demo instance with the required infrastructure, it would be dead as soon as it hit HN and, as research code goes, would likely be a security hazard wherever it's hosted.

In my view there seems to be enough here to replicate and validate the claims yourself if you wanted to. With a basic level of trust in academic integrity I'm completely fine with this paper.
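For what it's worth, the evaluation scoring mentioned above (ROUGE-style overlap metrics) is simple enough to sketch. This is a hypothetical, simplified ROUGE-1 F1 for illustration only; published scores come from a reference implementation with its own tokenization and stemming, so numbers from this sketch would not match the paper's.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Unigram-overlap ROUGE-1 F1 between a candidate summary
    and a reference summary (simplified: whitespace tokenization,
    lowercasing, no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared unigram at most
    # min(candidate count, reference count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Comparing a generated abstract against the author-written one with a metric like this is exactly the kind of replication the paper's reported setup should allow.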

From how they set up the training, I think this is a nontrivial task. Also, from a casual read-through, it looks like it is generally focused on arXiv papers.

To their credit, the authors included the models used and the metrics they used to validate their model. They also have detailed notes on the training architecture which, at a quick glance, doesn't look easy to replicate unless you can borrow some GPUs in the cloud.

It focused on arXiv because you need a large set of labelled data (i.e. long documents with summaries). There are not many datasets of that kind out there.

Imagine a future where 'this abstract was generated by the model' is in the training material for future papers https://twitter.com/jonathanfly/status/1171551688668471297

Now if that is true, that is one badass summary!

(The submitted title was "This abstract was generated by one of the models presented in this paper".)

