
Can science writing be automated?
http://news.mit.edu/2019/can-science-writing-be-automated-ai-0418
======
eatbitseveryday
Reminds me of this earlier work from students also at MIT, called SCIgen [1].
You can type in author names and it’ll generate some seemingly coherent text
for you as a “research paper”. They even got one accepted to a conference.

[1]
[https://pdos.csail.mit.edu/archive/scigen/](https://pdos.csail.mit.edu/archive/scigen/)

------
ssivark
Can deep learning paper writing be automated? :P

These neural network hype articles are barely one step removed from Buzzfeed
listicles in what they project versus what they claim versus what they have
actually done. Pick from a list of really ambitious goals, scan a list of
available datasets (God forbid, no scientist should have to actually collect
and clean new data pertaining to the problem!), throw a bunch of neural
network architectures at it leveraging cloud compute, and publish a paper
based on whatever sticks. The process isn't complete without a title involving
the word "deep" and a PR announcement generated by a neural network trained on
Buzzfeed articles.

Maybe I'm slightly too harsh. The actual paper in this case is not making
crazy grandiose claims, but this article is certainly doing it no favors. If
this is the quality of research reporting, then it might as well be automated.

~~~
bonoboTP
But this is not neutral research reporting. It's an MIT press release about
MIT research. It's PR, of course.

~~~
azeotropic
Considering that the vast majority of 'research reporting' involves rephrasing
press releases, this is a distinction without a difference.

While the article is about generating abstracts rather than press releases, I
don't see why the same approach wouldn't work for both problems. I'm sure a
neural net could be trained to write in press release style instead of
abstract style.

------
userbinator
Maybe it's because I've seen too much of the masses of automatically generated
"articles" that pollute my search results, but I really hope stuff like this
doesn't proliferate, because I think it'll just lead to even more of that spam
--- and be even harder to separate from the useful stuff.

In general, I have the same opinion of this as I do things like automatically
generated comments in code: "If it can be generated by a tool, it's not worth
reading."

~~~
sitkack
If the output is much smaller than the source text, summarization can
definitely be a useful tool. Being able to TL;DR a wall of text would be
awesome; doing it with a journalistic inverted pyramid would be even better
[1].

[1]
[https://en.wikipedia.org/wiki/Inverted_pyramid_(journalism)](https://en.wikipedia.org/wiki/Inverted_pyramid_\(journalism\))
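To sketch what I mean, here's a bare-bones frequency-based extractive TL;DR in
pure Python stdlib (the function name and scoring are mine, just to illustrate
the idea - nothing to do with the paper's model):

```python
import re
from collections import Counter

def tldr(text, n=2):
    """Naive extractive summary: score each sentence by the corpus
    frequency of its words, keep the top-n in original order."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Word frequencies over the whole text.
    freq = Counter(re.findall(r'[a-z]+', text.lower()))
    # Rank sentence indices by total word-frequency score, descending.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w]
                           for w in re.findall(r'[a-z]+', sentences[i].lower())),
    )
    # Re-sort the kept indices so the summary reads in document order.
    keep = sorted(ranked[:n])
    return ' '.join(sentences[i] for i in keep)
```

Real abstractive summarization (what the article is about) is a different
beast, but even this crude scoring gets you a usable TL;DR surprisingly often.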

------
bonoboTP
Could this very article have been written simply by applying some model to the
paper text? I doubt it. Look at it, it's full of quotes from the authors, full
of commentary and explaining the context to a lay audience (with a healthy
dose of self-promotion from the side of the university).

Although research papers do contain some context (on a more specialized level,
and with many references to other papers instead of repeating the idea), they
are more about the technical approach, which is less useful for direct use in
the popular article.

One goal of popular science writing is to inspire people, to tell us why we
should be excited about a topic, to see what's going on in fields distant from
ours etc. This is far from "research paper in -> pop sci article out".

TL;DR: _Summarizing_ a research paper is not the same as writing an
interesting pop sci article about a research project.

------
stubish
It would be interesting if authors used a tool like this to see whether the
automated summary matches the point they are trying to make. If the tool gets
it wrong, maybe your paper needs improving.

~~~
userbinator
_If the tool gets it wrong, maybe your paper needs improving._

Or maybe it's the tool that needs improving...

~~~
sitkack
Symbiotic coevolution.

------
yiyus
Summarizing scientific articles and writing them are not the same thing. I
think this is very interesting work (although I have not read the paper and my
knowledge of AI and NLP is very superficial), but the headline really puts me
off.

------
hownottowrite
Every scientific paper already contains a summary. It’s called the abstract.

------
no_identd
That (there) leaves "science writing" ill-defined, and as such this (here)
might seem to yield a problem that is not well-posed.

Anyway. Here is a very closely related conference paper from 2017 which I
found last night:

[https://dl.acm.org/citation.cfm?id=3107452](https://dl.acm.org/citation.cfm?id=3107452)
"Discovering Inconsistencies in PubMed Abstracts through Ontology-Based
Information Extraction"

Found via the Google Scholar profile of Nisansa de Silva:

[https://scholar.google.com/citations?user=OgPD66oAAAAJ](https://scholar.google.com/citations?user=OgPD66oAAAAJ)

Whose works I looked into after finding this:

[https://cs.uoregon.edu/area-exam/sensing-puns-its-own-reword-automatic-detection-paronomasia](https://cs.uoregon.edu/area-exam/sensing-puns-its-own-reword-automatic-detection-paronomasia)

Because I suspect multilingual Paronomasia, Polysemy & Syllepsis can serve to
decrease the minimum description length of highly complex topics.

For example:

The German noun "Begier" ('Desire', but with an element of greed, i.e. 'greed'
with the prefix be-, so, literally, "begreed" or possibly "bygreed") sounds
almost identical to a word easily constructible in English, from the stem
"beg" (as in "to beg") and the suffix -ia (as in "Paronomasia"): Beg-ia, or
perhaps Beggia. Such phonetic similarity can serve as a powerful mnemonic,
something widely exploited in the Torah, and I'll leave explaining that to a
Brazilian statistics professor:

[https://link.springer.com/article/10.1007/s11841-017-0592-y](https://link.springer.com/article/10.1007/s11841-017-0592-y)

(Open Access. Read the PDF version if you want to actually stand a chance of
visually parsing the Hebrew/Aramaic characters.)

Note, however, that some scholars might find the claim made in that paper
controversial - mostly because one can't find the proposed allusive elision
(as opposed to repetition) in this list of types/kinds of paronomasia &
polysemy from Noegel's Encyclopedia of Hebrew Language & Linguistics:

For Paronomasia:

* Homoeoprophoron

* Homoioteleuton

* Epanastrophe

* Parasonance

* Homonymic paronomasia

* Bilingual paronomasia

* Anagramic paronomasia

* Hendiadic paronomasia

* Rhyme

* Farrago

* Geminate parallelism and clustering

For Polysemy:

* Contronymic Polysemy

* Double entendres

* Antanaclasis

* Unidirectional polysemy

* Janus parallelism (pivotal polysemy)

* Double polysemy

* Bilingual polysemy

* Polysemy clustering

* Numerical polysemy

* Gematria

* Notarikon

* Acronymy

* Acrostics [abecedarius] (also Telestichs and Mesostichs)

* Atbash

* Amphiboly (Amphibology)


