The goal of the initial pretraining phase is to make the model good at predicting the next token. The rest of the training process is aimed at making it (1) helpful and (2) as correct as possible.
I think some people oversimplify things by calling LLMs "next token predictors", leaving out the later tuning towards helpfulness and correctness.
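For concreteness, here's a minimal sketch of what the pretraining objective actually is, assuming PyTorch and a toy embedding-plus-linear model (the sizes and model are illustrative only, not anything from the talk): the model sees tokens up to position t and is trained to assign high probability to the token at t+1.

```python
# Toy sketch of the next-token prediction objective used in pretraining.
# The model and vocabulary sizes are arbitrary assumptions for illustration.
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64            # toy sizes, not real LLM settings
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),       # produces logits over the vocabulary
)

tokens = torch.randint(0, vocab_size, (1, 16))   # a fake token sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t

logits = model(inputs)                            # shape: (1, 15, vocab_size)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()   # pretraining minimizes exactly this next-token loss
```

The later stages (instruction tuning, RLHF and similar) keep the same next-token machinery but change what the model is rewarded for producing, which is where the helpfulness and correctness come from.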
That's a great talk, thanks. It was enlightening to learn about base models and the fine-tuning for an assistant personality. It's also interesting that neither the inner workings of the base LLM nor its interaction with the assistant model is fully understood, according to Karpathy.