Hacker News
Natural language generation: The commercial state of the art in 2020 (cambridge.org)
90 points by polm23 on Jan 11, 2021 | 51 comments

This is a short overview of the state of NLG by Robert Dale, co-author of "Building Natural Language Generation Systems", which is basically the book for NLG.

He gives a list of commercial providers and concludes that most of them just offer smart templates. This is the important part:

> To the extent that you can tell from the clues to functionality that are surfaced by these various products, all the tools are ultimately very similar in terms of how they work, which might be referred to as ‘smart template’ mechanisms. There is a recognition that, at least for the kinds of use cases we see today, much of the text in any given output can be predetermined and provided as boilerplate, with gaps to be filled dynamically based on per-record variations in the underlying data source. Add conditional inclusion of text components and maybe some kind of looping control construct, and the resulting NLG toolkit, as in the case of humans and chimpanzees, shares 99% of its DNA with the legal document automation and assembly tools of the 1990s, like HotDocs (https://www.hotdocs.com). As far as I can tell, linguistic knowledge, and other refined ingredients of the NLG systems built in research laboratories, is sparse and generally limited to morphology for number agreement (one stock dropped in value vs. three stocks dropped in value).

That sounds pretty negative, but he emphasizes that putting an easy-to-use UI on well understood technology is meeting real business needs.

At the end he briefly touches on GPT-2 and related technology.
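For readers who haven't seen one, here is a minimal sketch of what a "smart template" in the quoted sense might look like: boilerplate with gaps, conditional inclusion, a loop over records, and number agreement. The `StockMove` record and every function name are made up for illustration; no vendor's product is claimed to work exactly this way.

```python
from dataclasses import dataclass


@dataclass
class StockMove:
    # Hypothetical per-record data; negative change_pct means the stock dropped.
    name: str
    change_pct: float


def pluralize(count: int, singular: str, plural: str) -> str:
    """Number agreement: pick the noun form that matches the count."""
    return singular if count == 1 else plural


def number_word(count: int) -> str:
    """Spell out small counts and fall back to digits for the rest."""
    words = {1: "one", 2: "two", 3: "three"}
    return words.get(count, str(count))


def render_summary(moves: list[StockMove]) -> str:
    dropped = [m for m in moves if m.change_pct < 0]
    sentences = []
    # Conditional inclusion: the "dropped" text only appears when it applies.
    if dropped:
        n = len(dropped)
        noun = pluralize(n, "stock", "stocks")
        sentences.append(f"{number_word(n).capitalize()} {noun} dropped in value.")
        # Looping construct: one boilerplate sentence per record.
        for m in dropped:
            sentences.append(f"{m.name} fell {abs(m.change_pct):.1f}%.")
    else:
        sentences.append("No stocks dropped in value.")
    return " ".join(sentences)
```

Calling `render_summary([StockMove("ACME", -2.5)])` gives "One stock dropped in value. ACME fell 2.5%." All the linguistic knowledge lives in `pluralize` and `number_word`, which is roughly the point the article makes.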

Smart templates aren't bad at all given that for most uses you want to enforce style and attributes.

Generation is easier than understanding. Systems like GPT-3 are not capable of respecting constraints, which is the basis for practical creativity.

I think that most use cases for NLG are unethical.

I provide raw data for some news services and that data could be shown as tables or simple dashboards that are faster to read for users, but my customers insist on generating as much text as possible from, sometimes, 2 values.

The incentives are totally misaligned: news services try to get as much time from their readers as possible. It's quite depressing.

I very much disagree. There are use cases, such as weather reporting and electronic medical report summarisation, that make more sense as a textual representation than as bundles of graphs and tables. In fact, textual summarisation has been shown to lead to better decision making[1].

Good data-to-text NLG applications not only summarise data but can also provide insight into the causal relations behind events by leveraging domain knowledge.

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656014/

I agree with you, Saad. I worded that comment poorly. There are some use cases that are gonna be used to keep people reading a lot of algorithmic gibberish for no reason other than increasing revenue, for example sports articles.

I think NLG is really interesting, the problem is that there are incentives to create long form content that doesn't add any value to readers.

I tend to agree. Technologically it's a marvel, but I do have the feeling that robocallers and NLG customers are hanging out in the same place.

I did some experiments last summer trying to get GPT-3 to author quizbowl questions, a type of trivia question. As expected, GPT-3's loose relationship with the truth made these results not acceptable. But a system that really could generate new questions from existing facts would be immensely useful in this domain.

I'm not surprised GPT is awful at that, but there are good ways to do it. Look for papers on "factoid question generation".

Quizbowl questions have a particular structure that makes them generally not work with these other approaches. For instance, the system needs to generate paragraph-long questions with clues that go from the most to the least difficult. This is a hard skill for a human to learn, much less a machine.
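To make the ordering constraint concrete, here is a rough sketch that assumes the hard part is already done: each clue arrives with a difficulty score. The `Clue` type and its `difficulty` field are hypothetical; producing good clues and scoring them is exactly what a generator would struggle with.

```python
from dataclasses import dataclass


@dataclass
class Clue:
    text: str
    difficulty: float  # hypothetical score: higher means more obscure


def assemble_question(clues: list[Clue], answer_prompt: str) -> str:
    """Order clues from hardest to easiest, then end with the giveaway prompt."""
    ordered = sorted(clues, key=lambda c: c.difficulty, reverse=True)
    parts = [c.text for c in ordered] + [answer_prompt]
    return " ".join(parts)
```

The assembly step is trivial. The structure described above only pushes the difficulty into writing the clues and estimating how obscure each one is.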

Ah, that does sound difficult.

Usage of these algorithms and heuristics is starting to become quite obvious in Spanish sports journalism. I stumble into articles that are basically AI gibberish generated from a short quote every other day.

Google is probably going to struggle with SEO content/link mills that now produce something better than spun gibberish.

I'd expect the only rational decision would be to ban the content mill domains from the crawler. Is there really that much value for the search customer in being directed to a generated page?

That's the dilemma. NLG of sufficient quality[1] won't be easily detectable as machine generated.

[1] "Quality" here meaning how it looks to Google's crawler, not actual quality content.

Domains auto-generating junk or derivative content should be detectable by either human annotation or automated means. If a site with low click-through rates, low traffic and extreme volumes of text appears, it's worth a re-examination by people.

In particular at search time, Google has a vested interest in limiting both the number of links to a particular domain and the occurrence of content farm links when Wikipedia would suffice. Content farms are pretty detectable in any objective relevance annotation workflow.
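A toy version of the triage rule described above (low click-through, low traffic, extreme text volume) might look like the sketch below. The `DomainStats` fields and all three thresholds are invented for illustration; nothing here reflects how Google actually does it.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    # Hypothetical aggregate signals for one domain.
    domain: str
    pages_indexed: int
    avg_words_per_page: float
    click_through_rate: float  # clicks / impressions, between 0 and 1
    monthly_visits: int


# Illustrative thresholds only; real values would need tuning on labeled data.
MAX_CTR = 0.01
MAX_VISITS = 1_000
MIN_TOTAL_WORDS = 5_000_000


def needs_human_review(s: DomainStats) -> bool:
    """Flag domains that publish huge amounts of text but get little engagement."""
    total_words = s.pages_indexed * s.avg_words_per_page
    return (
        s.click_through_rate < MAX_CTR
        and s.monthly_visits < MAX_VISITS
        and total_words >= MIN_TOTAL_WORDS
    )
```

A rule like this only queues a domain for human re-examination. It does not decide on its own that the content is machine generated.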

It wasn't that long ago that really poor quality auto-generated content was clearly working for SEO purposes, so I'm not convinced Google is ahead of the current state of the art.

It is working as of now.

As an engineer who has worked briefly with one of the start-ups focused on here on their core product, Robert's critique of the technology driving the current generation of template-based commercial solutions is absolutely spot-on. The results are useful but the template-based approach along with the really lackluster tooling both my employer and competitors in the space had cobbled together made self-service nigh impossible. His assessment that these companies are really professional services companies dealing in bespoke one-off software solutions despite their efforts to market as robust AI products that magically turn some arbitrary JSON schema into prose is very fair, I think.

I am eager to see if neural approaches develop to better handle constraints, but part of me thinks the level of control over generated prose that template based approaches provide is essential for customer-facing text and that part of the market will persist even when every other recipe in Google search results is GPT-3 generated rubbish around the actual damn list of ingredients.

Curious if there are any examples of using it for text generation, since it has a different model structure from GPT.

I've been using a finetuned T5 model for commercial text generation for the past six months. The volume of discussion around the model on GitHub and elsewhere leads me to believe others are as well, although people tend to be circumspect about implementation details.

How are you using it, through huggingface or directly?

Check out the T5 models at Hugging Face for this. The main NLG use cases with T5 are translation, summarization and question generation. The latter is sophisticated, far from trivial, and properly "NLG", so yeah.
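For anyone who hasn't tried it, here is a minimal sketch of running a stock T5 checkpoint through the Hugging Face `transformers` library. It assumes `transformers`, `torch` and `sentencepiece` are installed. The `t5-small` checkpoint is used only to keep the download small, and `build_t5_input` and `run_t5` are illustrative helper names, not part of the library. Question generation is left out because it needs a fine-tuned checkpoint.

```python
def build_t5_input(task: str, text: str, target_language: str = "German") -> str:
    """Prefix the input with the task description the original T5 checkpoints expect."""
    if task == "summarize":
        return f"summarize: {text}"
    if task == "translate":
        return f"translate English to {target_language}: {text}"
    raise ValueError(f"unsupported task: {task}")


def run_t5(prompt: str, model_name: str = "t5-small", max_length: int = 64) -> str:
    """Generate text with a pretrained T5 checkpoint (downloads weights on first use)."""
    # Imported lazily so the prompt helper works without the library installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage would be something like `run_t5(build_t5_input("translate", "The weather is nice today."))`. Everything is text in, text out, and the task is selected by the prefix.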

From my experience T5 is currently the best publicly available model for NLG.

Did you use it already in some production environment?

Not yet but I am working on it!

Nice, will you use it via huggingface?

Yeah currently experimenting with T5 large in FP16. Will then use the ONNX conversion script to decrease inference time and deploy it!

Is there any place where NLG guys hang around? Most groups seem to have NLP as the focus.

On the academic side the SIGGEN mailing list is where a lot of the activity in the NLG community goes on: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=SIGGEN

EleutherAI on Discord. https://discord.gg/vtRgjbM

Wonder the same thing or just an active discord for these topics...

What are the commercial uses of the GPT-3 API?

I must completely lack imagination, because I don’t know what to use it for if it doesn’t give me access to the weights.

The state of NLG as of June 2020. A long time ago.

Apparently it was published just one day before GPT-3.

ML and NLP surveys are outdated basically the minute the author hits send nowadays. With GPT-3 all of these old school NLG companies seem quaint. Much of their functionality can be replicated in minutes with GPT-3, even by non-technical users familiar with prompt design. As soon as OpenAI offer fine tuning OOTB (coming soon) NLP/G will be all but solved for 99% of use cases.

I don't want to suggest you shot off a random comment without actually reading the linked survey. But it spends a whole section discussing that, and why it doesn't see commercial use. Here's a particularly relevant bit:

> Sometimes the results produced by GPT-2 and its ilk are quite startling in their apparent authenticity. More often than not they are just a bit off. And sometimes they are just gibberish. As is widely acknowledged, neural text generation as it stands today has a significant problem: driven as it is by information that is ultimately about language use, rather than directly about the real world, it roams untethered to the truth. While the output of such a process might be good enough for the presidential teleprompter, it would not cut it if you want the hard facts about how your pension fund is performing. So, at least for the time being, nobody who develops commercial applications of NLG technology is going to rely on this particular form of that technology.

I agree with the author on GPT-2. But GPT-3, which became available shortly after this was published, is quite a bit more powerful and there are many commercial applications being built on it now.

I do not see how GPT-3 could solve the basic architectural problem that the parent comment quotes, namely, that "driven as it is by information that is ultimately about language use, rather than directly about the real world, it roams untethered to the truth".

As an experiment I used a GPT-3-powered website [1] to see what GPT-3 has to say about bears, and the first answer was:

> "Weird that every day, there are so many cute/funny/entertaining bears to enjoy online but hardly any on the ground."

When asked about beards, the first answer has no relation to beards at all:

> "If a person doesn’t constantly outwit, outplay, outlast, others, the strong eat the weak."

And then there's that time when GPT-3 told someone to kill themselves [2].

While funny and (mostly) grammatically correct, these "thoughts" are nonsense and no amount of extra parameters is going to solve the disconnection between GPT-3 and reality. I imagine you could condition GPT-3 to generate text for a specific piece of data in such a way that guarantees the correctness of its output, but at that point you might as well throw GPT-3 away and write a rule-based system.

[1] https://thoughts.sushant-kumar.com/bears

[2] https://www.nabla.com/blog/gpt-3/
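To illustrate the "you might as well write a rule-based system" point, here is a rough sketch of the simplest kind of check one could bolt onto a generator: accept candidate text only if every number it mentions appears in the source record. The record layout and function names are hypothetical. The check ignores signs and dates, and it says nothing about whether the words around the numbers are true.

```python
import re

# Unsigned integers and decimals; deliberately naive.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(text: str) -> set[float]:
    """Collect every number mentioned in the text."""
    return {float(m) for m in NUMBER.findall(text)}


def is_grounded(generated: str, record: dict) -> bool:
    """Accept the text only if every number it mentions appears in the record."""
    allowed = {abs(float(v)) for v in record.values() if isinstance(v, (int, float))}
    return numbers_in(generated) <= allowed
```

With `record = {"fund": "Example Growth", "return_pct": 4.2, "years": 3}`, the sentence "The fund returned 4.2% over 3 years." passes and "The fund returned 7.5% over 3 years." is rejected. The more rules like this you add, the closer the whole thing gets to a template system with a language model attached.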

Is language use not inherently shaped by the real world?

The site you tried is a tweet generator, not a question answering site. I prompted GPT-3 with "Bears and beards are different because" and got...

"Bears and beards are different because they are not the same thing.

Bears are animals. Beards are facial hair.

Bears are dangerous. Beards are not.

Bears live in the woods. Beards live on your face.

Bears eat people. Beards do not."

But my original point was mainly that this field is moving fast and the old school NLG companies (I created one back in the day!) are toast.

There is more to "the real world" than the definition of words, which is the most that you can expect a language model to learn.

Yes it is true that bears are animals.

No it is not true that, as GPT-3 said, "There aren't any on the ground"

Let's just wait a couple of years to see if GPT-3 was any good in applications. Doesn't matter what we think, what matters is if it is viable.

Its younger sibling DALL-E is capable of language grounded in images; I expect the next version to be multi-modal as well. On another line of research there's effort to tame the horse (GPT) by attaching a secondary neural net. This can monitor language, topic, style and bias and ensure increased accuracy in tasks by auto-learning good prompts. It would make development of applications much easier because the base model, which was super expensive to train, can be reused many times while the secondary net is small and fast to train. Other efforts are related to including a search engine on an inner loop, to make the language model able to query large collections. Also, there's an open effort to create a huge text corpus, so far 800GB (The Pile). It improves on the GPT-3 training corpus on some categories that were lacking.

I think it's safe to say the article is way off the current research level.

Even GPT-3 knows nothing about the real world; it's merely trained to repeat the words that most often followed the prompt in its training data. That's obviously not useful for news...if a fact is in the training data, it's not news. It's not useful for "hard facts about how your pension fund is performing" unless you want to know how it performed a long time ago.

But I agree there are some applications it is useful for, like education.

> Even GPT-3 knows nothing about the real world; it's merely trained to repeat the words that most often followed the prompt in its training data.

I don't know why that would imply that it knows nothing about the real world, unless the data corpus it is trained on likewise bears no relation to reality...

> unless the data corpus it is trained on likewise bears no relation to reality

It’s trained on Reddit, so I wouldn’t rule that out.

"Solving" NLP is AGI-complete. GPT-3 is great at superficially correct syntax but breaks at deep / consistent / meaningful semantics.

How well can GPT-3 distinguish between human generated and GPT-3 generated?

Flying cars, buddy! You just wait...

I know, I know. But I don't say this lightly. I've worked in NLP/G for nearly 15 years and after spending the last 6 months working with GPT-3 I feel the writing is on the wall.

But it isn’t. I am still actively involved in NLG both professionally and academically, and neural NLG systems have great promise but are still far from actively delivering tangible solutions in areas such as data-to-text NLG. Inaccuracy and hallucinations are still highly problematic.

Can you recommend any good tutorials on prompt design for GPT-3? Or on how to use GPT-3 in general?

