Hacker News new | past | comments | ask | show | jobs | submit login

Another little tidbit about RLHF and InstructGPT is that the training scheme is by far dominated by supervised learning. There is a bit of RL sprinkled on top, but the term is scaled down by a lot and 8x more compute time is spent on the supervised loss terms.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: