I'm struggling to understand what these models are actually learning. They can be applied to all sorts of problems, but what fundamental things are being encoded in the model?
They take a sequence of tokens and do their best to continue the sequence in a plausible way. That's the basic idea: given the text so far, figure out what's likely to follow.
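A minimal sketch of that loop, using the Hugging Face transformers library with GPT-2 (the model choice, prompt, and sampling settings are all just illustrative assumptions on my part):

    # A toy sketch of "continue the sequence plausibly": the model assigns a
    # probability to every possible next token, and we sample from that
    # distribution repeatedly to extend the text.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=10, do_sample=True, top_k=50)
    print(tokenizer.decode(out[0]))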
Do they need to learn rules, though, or could they memorize enough to approximate the probabilities of the most likely continuations? To my mind the blurry-jpeg metaphor[1] is very much spot-on. It's speculation, but it seems to me that LLMs are de facto doing (lossy) memorization, and that neural networks in general are good at "local consistency". That framing is consistent with most of the model behaviour we observe.
The next step of training is Reinforcement Learning from Human Feedback (RLHF). The model gets rewarded for certain outputs and penalized for others. This is how they learn to be agreeable, to attempt to answer people's questions, to not write Hitler speeches, etc.
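For the curious, here's a toy sketch of just the reward-modelling stage of RLHF (in a real system the reward head sits on top of the LLM itself; the dimensions and the random "hidden states" below are made up purely for illustration):

    import torch
    import torch.nn as nn

    # Reward-model training sketch: learn a scalar score such that the
    # human-preferred output of a pair scores higher than the rejected one.
    class RewardModel(nn.Module):
        def __init__(self, hidden=768):
            super().__init__()
            self.head = nn.Linear(hidden, 1)  # hidden state -> scalar reward

        def forward(self, h):
            return self.head(h).squeeze(-1)

    reward_model = RewardModel()
    opt = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

    # Stand-in embeddings for a batch of (preferred, rejected) output pairs.
    h_chosen, h_rejected = torch.randn(4, 768), torch.randn(4, 768)

    # Bradley-Terry style loss: push the preferred output's score above the
    # rejected output's score.
    loss = -torch.nn.functional.logsigmoid(
        reward_model(h_chosen) - reward_model(h_rejected)
    ).mean()
    loss.backward()
    opt.step()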
What you're describing is how to turn an LLM into a chatbot like ChatGPT. OP is asking about LLMs, which by themselves don't need any reinforcement learning.
Yeah, but to do anything useful (say, to classify, or to do sequence labelling like "color proper names red") you usually do need a second stage of training. The remarkable thing is that the unsupervised training on a large corpus transfers so well to those later stages.
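To make that concrete, here's a minimal sketch of such a second stage: fine-tuning a pretrained transformer for token labelling. The model name, the two-class label scheme, and the all-zero toy labels are my assumptions for illustration only:

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # Second-stage training: a pretrained encoder plus a small classification
    # head, fine-tuned on a token-labelling task (e.g. tagging proper names).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=2  # 0 = other, 1 = proper name
    )

    inputs = tokenizer("Ada Lovelace wrote the first program.", return_tensors="pt")

    # Toy labels sized to the tokenized input; real training would align
    # word-level labels to subword tokens and mask the special tokens.
    labels = torch.zeros_like(inputs["input_ids"])

    loss = model(**inputs, labels=labels).loss  # supervised loss on pretrained weights
    loss.backward()  # transfer from pretraining means few labelled examples go far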
To be pedantic, "predict the next token" was what we were trying to do with RNNs 7-8 years ago. People are now training transformers on "mask out 15% of the words randomly and guess what they were", which is a big difference because that task is symmetrical in the forward and backward directions, whereas the single-direction nature of RNNs was a major limitation. For example, when they start out they have no state, so when a model was writing fake abstracts for clinical case reports (something I tried), it decided what disease the patient had based on whatever letters or words it happened to pick early on. It really should have started from a "latent state" that included the characteristics of the patient, including the disease, the same way the clinical encounter did and the way the author did when they wrote the abstract.
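That masked objective is easy to play with directly. A quick sketch using the fill-mask pipeline from Hugging Face (BERT and the example sentence are just illustrative choices):

    from transformers import pipeline

    # The masked-language-model objective in action: the model sees context on
    # BOTH sides of the blank, unlike a left-to-right RNN.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    for pred in fill("The patient was diagnosed with [MASK] after the biopsy."):
        print(pred["token_str"], round(pred["score"], 3))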