
It says

> We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003

Which leads to self-instruct https://github.com/yizhongw/self-instruct

At a glance, they used an LM to generate and filter instructions and then trained the model on them, which IMHO is very similar to RLHF
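To make the comparison concrete, here's a minimal toy sketch of the self-instruct bootstrap loop described in that repo: sample from a task pool, ask an LM to propose a new instruction, filter near-duplicates, and grow the pool. The names `generate` and `is_too_similar` are stand-ins for the LM call and the similarity filter (the paper uses ROUGE overlap), not the actual implementation.

```python
import random

def self_instruct(seed_tasks, generate, is_too_similar, rounds=3):
    # Toy self-instruct loop: `generate` stands in for prompting an LM
    # (e.g. text-davinci-003) with in-context example tasks; the filter
    # drops candidates that are too close to anything already in the pool.
    pool = list(seed_tasks)
    for _ in range(rounds):
        examples = random.sample(pool, min(2, len(pool)))
        candidate = generate(examples)
        if not any(is_too_similar(candidate, task) for task in pool):
            pool.append(candidate)
    return pool
```

The point is that every step here is supervised data generation; no reward signal is computed anywhere in the loop.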




No, it is not RLHF because there is no reward model involved. See also OpenAI's explanation here: https://platform.openai.com/docs/model-index-for-researchers
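A toy sketch of the structural difference, with illustrative names (neither function is real Alpaca or OpenAI code): supervised fine-tuning imitates demonstrations directly, while RLHF needs a separate reward model to score sampled outputs and push the policy toward higher-reward responses.

```python
def supervised_finetune(model, demos):
    # Alpaca-style SFT: imitate (instruction, response) pairs directly.
    # No scoring or preference signal is involved at any point.
    for instruction, response in demos:
        model[instruction] = response
    return model

def rlhf_step(model, instruction, candidates, reward_model):
    # RLHF: a separate reward model (trained on human preferences)
    # ranks sampled outputs, and the policy is updated toward the
    # higher-reward response. This component has no Alpaca analogue.
    best = max(candidates, key=reward_model)
    model[instruction] = best
    return model
```

Usage: `supervised_finetune({}, [("2+2?", "4")])` just copies the demonstration, whereas `rlhf_step({}, "greet", ["hi", "hello there"], len)` picks the candidate a (toy, length-based) reward model scores highest.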



