* 2017 - the Attention Is All You Need paper proposes transformers, suggests language translation as their primary application
* 2019 - GPT-2 is presented to the public, demonstrating transformer-based LLMs and their emergent capabilities
* 2020 - GPT-3 is released and shows that throwing more compute at LLMs yields significantly better LLMs
* 2022 - ChatGPT is openly released to the public, showcasing the versatility of an LLM-based chatbot
In my experience transformers have been all the rage in the researcher/enthusiast scene since 2019. The technology has just gradually matured enough to become viable for consumer use, which is why you see the industry rushing to adopt it. ChatGPT was the watershed moment for the tech because suddenly anyone in the world could sign up for free, open a chat dialogue and start getting legible LLM output without needing to understand the tech or prompt engineering.
The technology has been a while coming: language models have long been a research area within machine learning, with recurrent models such as vanilla RNNs and LSTMs being an earlier approach, since they allow the model to process a (language) sequence of arbitrary length.
Problems and limitations of recurrent models led to other approaches being tried, using "attention" as a way to let earlier parts of a sequence influence later predictions, culminating in the 2017 "Attention Is All You Need" paper, which introduced the "Transformer" architecture that all the current LLMs are based on.
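For the curious, the core operation that paper introduced, scaled dot-product attention, fits in a few lines. A minimal NumPy sketch (the dimensions and toy inputs are illustrative, not from any real model):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    Q, K, V: (seq_len, d) arrays."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (seq, seq) pairwise similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

# toy example: 4 tokens, model dimension 8, self-attention (Q = K = V)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (4, 8)
```

Each output position is a learned weighted average over every input position, which is exactly the "let earlier parts of the sequence impact prediction" idea, done without any recurrence.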
From there it was a matter of scale - scaling up both the models and the amount of data they were trained on. Nobody knew how well this Transformer architecture would perform at scale, but early signs were promising enough to keep pushing to see how much better it could get. OpenAI in particular have been very aggressive in pursuing this scaling with their GPT-N (N=1/2/3..) models. They themselves expressed some surprise at the capabilities of GPT-2, which led to the much larger GPT-3 that is the basis of ChatGPT.
Both OpenAI and others had been leery of publicly releasing these very capable LLMs for fear of ways they might be misused, but eventually OpenAI released GPT-3 (with a bit of human-feedback polish) in the guise of the chatbot ChatGPT, which was the first time the public had seen what the tech was capable of.
The sudden impact of ChatGPT belies the incremental improvements that brought us to this point. It seems to have hit so hard largely because the public had never seen or experienced the steps that got us here, partly because of the highly accessible packaging of the tech as a web-based chatbot, and perhaps partly because it was released without much explanation from OpenAI as to what it was or how it works - they seem quite happy for the public to do what they've done and anthropomorphise it as an AI assistant.
Transformers aren't really a wonderful architecture in the sense of a great fit between the architecture and what we know about the task. (For comparison, I think convolutional networks are.)
What makes Transformers great is:
1. They can handle long sequences without a large increase in the number of parameters to be trained.
2. They parallelize better than previous sequence models, i.e. LSTMs. If we could train LSTMs of the same size and on the same amount of data as current Transformers, they'd probably be just as good.
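To make point 2 concrete, here's a toy sketch (hypothetical weights, NumPy) contrasting the sequential loop a recurrent model is forced into with the single matrix product self-attention uses:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 6, 4
X = rng.normal(size=(seq_len, d))       # input sequence of 6 token vectors
Wx = rng.normal(size=(d, d)) * 0.1      # toy recurrent weights
Wh = rng.normal(size=(d, d)) * 0.1

# Recurrent model: each hidden state depends on the previous one,
# so the time steps MUST run one after another.
h = np.zeros(d)
states = []
for x in X:
    h = np.tanh(x @ Wx + h @ Wh)
    states.append(h)

# Self-attention: every position attends to every other position via
# batched matrix products -- there is no sequential dependency, so all
# positions can be computed at once (and in parallel on a GPU).
scores = X @ X.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
out = weights @ X                        # (seq_len, d), computed in one go
```

The RNN loop has a data dependency between iterations; the attention path is just matrix multiplies over the whole sequence, which is what makes training at scale so much cheaper per token.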
Transformers were used to train models on text without needing labeled data. People realized that simply scaling up the data and the models meant better performance. When they scaled even further, emergent capabilities started appearing and the models began dominating every known task. Now everyone wants an LLM.
Transformers are currently the state of the art, i.e. the problems that transformers are currently solving can't be solved better by any other known technique/algorithm. That's the main reason why they are so popular at the moment. LLMs are popular right now because a recent transformer-based neural network has proven to be fun and useful, i.e. GPT-3/ChatGPT - a lot more useful than previous language models, at least.