
That's the first step of training.

The next step of training is Reinforcement Learning from Human Feedback (RLHF). They get rewarded for certain outputs and penalized for others. This is how they learn to be agreeable, to attempt to answer people's questions, to not write Hitler speeches, etc.
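To make "rewarded for certain outputs and punished for other outputs" concrete, here is a toy sketch, not how any production system does it. It assumes a "policy" that only chooses between a fixed set of canned responses, with hand-assigned rewards, and applies a plain REINFORCE-style update. Real RLHF pipelines typically train a reward model from human preference data and optimize a full language model with something like PPO; the names `train_toy_policy` and `rewards` are made up for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_policy(rewards, steps=500, lr=0.1, seed=0):
    """Toy REINFORCE loop over a fixed set of candidate outputs.

    rewards[i] is a hand-assigned score for output i (an assumption here;
    in practice the score would come from a learned reward model).
    Returns the final probability the policy gives to each output.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # start with no preference
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an output from the current policy.
        choice = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[choice]
        # Gradient of log pi(choice) w.r.t. logit i is 1[i == choice] - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


if __name__ == "__main__":
    # Output 0 = "helpful answer" (rewarded), output 1 = "bad answer" (penalized).
    print(train_toy_policy([1.0, -1.0]))
```

The point is only the shape of the loop: sample an output, score it, and nudge the probabilities toward outputs that scored well.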




What you're describing is how to turn an LLM into a chatbot like ChatGPT. OP is asking about LLMs, which by themselves don't need any reinforcement learning.


Yeah, but to do anything useful (say, to classify text or produce sequence labels like "color proper names red") you usually do need a second stage of training. The remarkable thing is that the unsupervised training on a large corpus transfers so well to those later stages.

To be pedantic, "predict the next token" was what we were trying to do with RNNs 7-8 years ago. People are now training transformers on "mask out 15% of the words at random and guess what they were" (BERT-style masked language modeling), which is a big difference because that task is symmetrical in the forward and backward directions. The single-direction nature of RNNs was a major limitation: they start out with no state. I tried having a model write fake abstracts for clinical case reports, and it decided what disease the patient had based on whichever letters or words it happened to pick early on. It really should start out with a "latent state" that includes the characteristics of the patient, including the disease, the same way the clinical encounter did and the way the author did when they wrote the abstract.
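For anyone who wants to see the two training objectives side by side, here is a minimal sketch of how the examples are built. It works on whitespace-split words rather than a real tokenizer, and `mask_tokens`, `next_token_pairs`, and the `[MASK]` placeholder are illustrative names, not any library's API. Actual masked-LM recipes also add details this skips, such as sometimes swapping in a random word instead of the mask.

```python
import random

MASK = "[MASK]"  # placeholder symbol; an assumption for this sketch


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Masked-LM style example: hide a random ~15% of positions.

    Returns (masked_tokens, targets), where targets maps each hidden
    position to the original token the model would have to guess.
    The model sees context on BOTH sides of each hidden position.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


def next_token_pairs(tokens):
    """Next-token style examples: predict token i from tokens[:i] only.

    The model only ever sees context to the LEFT of the target.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    words = "the patient presented with fever and a persistent dry cough".split()
    masked, targets = mask_tokens(words)
    print(masked)
    print(targets)
    for context, target in next_token_pairs(words)[:3]:
        print(context, "->", target)
```

In the masked version the words after a gap are available when guessing it, which is the forward/backward symmetry mentioned above; in the next-token version every guess is made from the left context alone.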



