
It says

> We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003

Which leads to self-instruct https://github.com/yizhongw/self-instruct

At a glance, they used an LM to generate and filter instructions and then trained the model on them, which IMHO is very similar to RLHF
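To make the comparison concrete, here's a minimal toy sketch of the self-instruct bootstrap loop described in that repo: sample from a task pool, ask an LM to propose a new instruction, filter near-duplicates, and grow the pool. The names `generate` and `is_too_similar` are stand-ins for the LM call and the similarity filter (the paper uses ROUGE overlap), not the actual implementation.

```python
import random

def self_instruct(seed_tasks, generate, is_too_similar, rounds=3):
    # Toy self-instruct loop: `generate` stands in for prompting an LM
    # (e.g. text-davinci-003) with in-context example tasks; the filter
    # drops candidates that are too close to anything already in the pool.
    pool = list(seed_tasks)
    for _ in range(rounds):
        examples = random.sample(pool, min(2, len(pool)))
        candidate = generate(examples)
        if not any(is_too_similar(candidate, task) for task in pool):
            pool.append(candidate)
    return pool
```

The point is that every step here is supervised data generation; no reward signal is computed anywhere in the loop.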




No, it is not RLHF because there is no reward model involved. See also OpenAI's explanation here: https://platform.openai.com/docs/model-index-for-researchers
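A toy sketch of the structural difference, with illustrative names (neither function is real Alpaca or OpenAI code): supervised fine-tuning imitates demonstrations directly, while RLHF needs a separate reward model to score sampled outputs and push the policy toward higher-reward responses.

```python
def supervised_finetune(model, demos):
    # Alpaca-style SFT: imitate (instruction, response) pairs directly.
    # No scoring or preference signal is involved at any point.
    for instruction, response in demos:
        model[instruction] = response
    return model

def rlhf_step(model, instruction, candidates, reward_model):
    # RLHF: a separate reward model (trained on human preferences)
    # ranks sampled outputs, and the policy is updated toward the
    # higher-reward response. This component has no Alpaca analogue.
    best = max(candidates, key=reward_model)
    model[instruction] = best
    return model
```

Usage: `supervised_finetune({}, [("2+2?", "4")])` just copies the demonstration, whereas `rlhf_step({}, "greet", ["hi", "hello there"], len)` picks the candidate a (toy, length-based) reward model scores highest.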



