This is a short overview of the state of NLG by Robert Dale, co-author of "Building Natural Language Generation Systems", which is basically the book for NLG.
He gives a list of commercial providers and concludes that most of them just offer smart templates. This is the important part:
> To the extent that you can tell from the clues to functionality that are surfaced by these various products, all the tools are ultimately very similar in terms of how they work, which might be referred to as ‘smart template’ mechanisms. There is a recognition that, at least for the kinds of use cases we see today, much of the text in any given output can be predetermined and provided as boilerplate, with gaps to be filled dynamically based on per-record variations in the underlying data source. Add conditional inclusion of text components and maybe some kind of looping control construct, and the resulting NLG toolkit, as in the case of humans and chimpanzees, shares 99% of its DNA with the legal document automation and assembly tools of the 1990s, like HotDocs (https://www.hotdocs.com). As far as I can tell, linguistic knowledge, and other refined ingredients of the NLG systems built in research laboratories, is sparse and generally limited to morphology for number agreement (one stock dropped in value vs. three stocks dropped in value).
That sounds pretty negative, but he emphasizes that putting an easy-to-use UI on well-understood technology meets real business needs.
At the end he briefly touches on GPT-2 and related technology.
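To make the "smart template" mechanism from the quoted passage concrete, here's a minimal Python sketch of the idea: boilerplate with gaps, conditional inclusion, a looping construct, and just enough morphology for number agreement. All the function and field names are my own invention, not any vendor's actual API.

```python
# Minimal 'smart template' sketch: boilerplate with gaps, conditional
# inclusion, a looping construct, and naive number agreement.
# All names here are illustrative, not any vendor's actual API.

def pluralize(count: int, noun: str) -> str:
    """The kind of morphology Dale mentions: number agreement only."""
    return f"{count} {noun}" if count == 1 else f"{count} {noun}s"

def render_report(record: dict) -> str:
    parts = [f"Portfolio update for {record['name']}."]
    losers = [s for s in record["stocks"] if s["change"] < 0]
    if losers:  # conditional inclusion of a text component
        parts.append(f"{pluralize(len(losers), 'stock')} dropped in value:")
        for s in losers:  # looping control construct
            parts.append(f"{s['ticker']} fell {abs(s['change']):.1f}%.")
    else:
        parts.append("No stocks dropped in value.")
    return " ".join(parts)

print(render_report({
    "name": "Example Fund",
    "stocks": [{"ticker": "ABC", "change": -2.4},
               {"ticker": "XYZ", "change": 1.1}],
}))
# -> Portfolio update for Example Fund. 1 stock dropped in value: ABC fell 2.4%.
```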
I think that most use cases for NLG are unethical.
I provide raw data for some news services, and that data could be shown as tables or simple dashboards that are faster for users to read, but my customers insist on generating as much text as possible from, sometimes, just two values.
The incentives are totally misaligned: news services try to extract as much time from their readers as possible. It's quite depressing.
I very much disagree. There are use cases, such as weather reporting and electronic medical report summarisation, that make more sense as a textual representation than as bundles of graphs and tables. In fact, textual summarisation has been shown to lead to better decision making [1].
Good data-to-text NLG applications not only summarise data but can also provide insight into the causal relations behind events by leveraging domain knowledge.
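As a hedged illustration of what "leveraging domain knowledge" could look like in such a pipeline: a hand-written rule that pairs co-occurring data events with a causal explanation. The event names, the rules, and the wording below are all invented for illustration.

```python
# Toy sketch: domain knowledge as hand-written causal rules that turn
# detected data events into an explanatory sentence. Everything here
# (event names, rules, wording) is invented for illustration.
CAUSAL_RULES = [
    ({"temperature_drop", "wind_increase"},
     "Temperatures fell sharply, likely due to wind chill from a passing front."),
    ({"temperature_drop"},
     "Temperatures fell sharply."),
]

def explain(events: set) -> str:
    for trigger, sentence in CAUSAL_RULES:  # most specific rule first
        if trigger <= events:
            return sentence
    return "No notable weather events."

print(explain({"temperature_drop", "wind_increase"}))
```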
I agree with you, Saad; I worded that comment poorly. Some use cases are going to be used to keep people reading a lot of algorithmic gibberish for no reason other than increasing revenue, for example sports articles.
I think NLG is really interesting; the problem is that there are incentives to create long-form content that doesn't add any value for readers.
I did some experiments last summer trying to get GPT-3 to author quizbowl questions, a type of trivia question. As expected, GPT-3's loose relationship with the truth made the results unacceptable. But a system that really could generate new questions from existing facts would be immensely useful in this domain.
Quizbowl questions have a particular structure that makes them a poor fit for these other approaches. For instance, a generator needs to produce paragraph-long questions whose clues run from most to least difficult. That is a hard skill for a human to learn, much less a machine.
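To make that structural constraint concrete, here's a toy sketch of the easy part, the ordering: given clues with difficulty scores, a pyramidal question runs hardest-first. The clue texts and scores are invented; the hard part, generating factually correct clues in the first place, is exactly what GPT-3 failed at.

```python
# Toy sketch of the pyramidal structure: clues ordered hardest-first,
# joined into one paragraph-long question. Difficulty scores are invented;
# generating correct clues in the first place is the unsolved part.
clues = [
    ("This author wrote The Old Man and the Sea.", 0.1),       # easy giveaway
    ("He drove an ambulance in World War I.", 0.6),
    ("His 'iceberg theory' favors omission over exposition.", 0.9),
]
ordered = sorted(clues, key=lambda c: c[1], reverse=True)  # hardest first
question = " ".join(text for text, _ in ordered) + " For 10 points, name this writer."
print(question)
```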
Usage of these algorithms and heuristics is starting to become quite obvious in Spanish sports journalism. Every other day I stumble across articles that are basically AI gibberish generated from a short quote.
I'd expect the only rational decision would be to ban the content mill domains from the crawler. Is there really that much value for the search customer in being directed to a generated page?
Domains auto-generating junk or derivative content should be detectable by both human annotation and automated means. If a site appears with low click-through rates, low traffic, and extreme volumes of text, it's worth re-examination by people.
In particular, at search time Google has a vested interest in limiting both the number of links to a particular domain and the occurrence of content-farm links when Wikipedia would suffice. Content farms are pretty detectable in any objective relevance annotation workflow.
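As a rough illustration of the kind of signal being described, a toy flagging heuristic might look like the sketch below. The thresholds and field names are made up and bear no relation to anything any search engine actually runs.

```python
# Toy heuristic: flag domains that pair extreme text volume with low
# engagement for human re-examination. Thresholds and field names are
# invented; this is not anything any search engine actually uses.

def needs_human_review(stats: dict) -> bool:
    low_ctr = stats["click_through_rate"] < 0.01
    low_traffic = stats["monthly_visits"] < 1_000
    extreme_volume = stats["words_published_per_day"] > 100_000
    return extreme_volume and (low_ctr or low_traffic)

print(needs_human_review({
    "click_through_rate": 0.004,
    "monthly_visits": 600,
    "words_published_per_day": 250_000,
}))  # True -> queue the domain for manual review
```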
It wasn't that long ago that really poor-quality auto-generated content was clearly working for SEO purposes, so I'm not convinced Google is ahead of the current state of the art.
As an engineer who worked briefly on the core product of one of the start-ups discussed here, I think Robert's critique of the technology driving the current generation of template-based commercial solutions is absolutely spot-on. The results are useful, but the template-based approach, along with the really lackluster tooling that both my employer and competitors in the space had cobbled together, made self-service nigh impossible. His assessment that these companies are really professional-services companies dealing in bespoke one-off software solutions, despite their efforts to market themselves as robust AI products that magically turn some arbitrary JSON schema into prose, is very fair, I think.
I am eager to see whether neural approaches develop to better handle constraints, but part of me thinks the level of control over generated prose that template-based approaches provide is essential for customer-facing text, and that part of the market will persist even when every other recipe in Google search results is GPT-3-generated rubbish wrapped around the actual damn list of ingredients.
I've been using a finetuned T5 model for commercial text generation for the past six months. The volume of discussion around the model on GitHub and elsewhere leads me to believe others are as well, although people tend to be circumspect about implementation details.
Check out the T5 models at Hugging Face for this. The main NLG use cases with T5 are translation, summarization, and question generation. The latter is sophisticated, nothing trivial, and genuinely "NLG", so yeah.
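For anyone wanting to poke at this, a minimal summarization example with the Hugging Face transformers library, using the public t5-small checkpoint and the standard "summarize:" task prefix; a commercial setup like the one described above would swap in a finetuned checkpoint instead.

```python
# Minimal T5 summarization with Hugging Face transformers. Uses the public
# t5-small checkpoint and the standard "summarize:" task prefix; a real
# deployment would load a finetuned checkpoint instead.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "summarize: " + "Your long source document goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_length=60, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```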
ML and NLP surveys are outdated basically the minute the author hits send these days. With GPT-3, all of these old-school NLG companies seem quaint. Much of their functionality can be replicated in minutes with GPT-3, even by non-technical users familiar with prompt design. As soon as OpenAI offers fine-tuning out of the box (coming soon), NLP/G will be all but solved for 99% of use cases.
I don't want to suggest you shot off a random comment without actually reading the linked survey, but it spends a whole section discussing exactly that, and why it doesn't see commercial use. Here's a particularly relevant bit:
> Sometimes the results produced by GPT-2 and its ilk are quite startling in their apparent authenticity. More often than not they are just a bit off. And sometimes they are just gibberish. As is widely acknowledged, neural text generation as it stands today has a significant problem: driven as it is by information that is ultimately about language use, rather than directly about the real world, it roams untethered to the truth. While the output of such a process might be good enough for the presidential teleprompter, it would not cut it if you want the hard facts about how your pension fund is performing. So, at least for the time being, nobody who develops commercial applications of NLG technology is going to rely on this particular form of that technology.
I agree with the author on GPT-2. But GPT-3, which became available shortly after this was published, is quite a bit more powerful and there are many commercial applications being built on it now.
I do not see how GPT-3 could solve the basic architectural problem that the parent comment quotes, namely, that "driven as it is by information that is ultimately about language use, rather than directly about the real world, it roams untethered to the truth".
As an experiment I used a GPT-3-powered website [1] to see what GPT-3 has to say about bears, and the first answer was:
> "Weird that every day, there are so many cute/funny/entertaining bears to enjoy online but hardly any on the ground."
When asked about beards, the first answer has no relation to beards at all:
> "If a person doesn’t constantly outwit, outplay, outlast, others, the strong eat the weak."
And then there's that time when GPT-3 told someone to kill themselves [2].
While funny and (mostly) grammatically correct, these "thoughts" are nonsense, and no amount of extra parameters is going to bridge the disconnect between GPT-3 and reality. I imagine you could condition GPT-3 to generate text for a specific piece of data in a way that guarantees the correctness of its output, but at that point you might as well throw GPT-3 away and write a rule-based system.
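To sketch that last point, here's a toy version of "condition the model and verify": check every number in the generated draft against the source record, and fall back to a deterministic template on any mismatch. The generate() stub stands in for a GPT-3 call, and all the names here are invented.

```python
# Toy faithfulness check: keep a model-generated draft only if every number
# it states matches the source record; otherwise fall back to a template.
# generate() is a stand-in for a GPT-3 call; all names here are invented.
import re

def generate(record: dict) -> str:
    # Pretend LM call; imagine it sometimes hallucinates figures.
    return f"{record['ticker']} surged {record['change'] + 3}% today."

def faithful(text: str, record: dict) -> bool:
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", text)]
    return all(n == record["change"] for n in numbers)

def describe(record: dict) -> str:
    draft = generate(record)
    if faithful(draft, record):
        return draft
    return f"{record['ticker']} moved {record['change']}% today."  # safe fallback

print(describe({"ticker": "ABC", "change": 2.0}))  # falls back to the template
```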
Let's just wait a couple of years to see whether GPT-3 is any good in applications. It doesn't matter what we think; what matters is whether it is viable.
Its younger sibling, DALL-E, is capable of language grounded in images, and I expect the next version to be multi-modal as well. On another line of research, there's an effort to tame the horse (GPT) by attaching a secondary neural net. This can monitor language, topic, style, and bias, and ensure increased accuracy on tasks by auto-learning good prompts. That would make developing applications much easier, because the base model, which was super expensive to train, can be reused many times, while the secondary net is small and fast to train. Other efforts involve putting a search engine in the inner loop, so the language model can query large collections. There's also an open effort to create a huge text corpus, so far 800 GB (The Pile), which improves on the GPT-3 training corpus in some categories that were lacking.
I think it's safe to say the article is well behind the current state of research.
Even GPT-3 knows nothing about the real world; it's merely trained to repeat the words that most often followed the prompt in its training data. That's obviously not useful for news...if a fact is in the training data, it's not news. It's not useful for "hard facts about how your pension fund is performing" unless you want to know how it performed a long time ago.
But I agree there are some applications it is useful for, like education.
> Even GPT-3 knows nothing about the real world; it's merely trained to repeat the words that most often followed the prompt in its training data.
I don't know why that would imply that it knows nothing about the real world, unless the data corpus it is trained on likewise bears no relation to reality...
I know, I know. But I don't say this lightly. I've worked in NLP/G for nearly 15 years and after spending the last 6 months working with GPT-3 I feel the writing is on the wall.
But it isn't. I am still actively involved in NLG, both professionally and academically, and neural NLG systems have great promise but are still far from delivering tangible solutions in areas such as data-to-text NLG. Inaccuracy and hallucinations are still highly problematic.