> We train the Alpaca model on 52K instruction-following demonstrations generated in the style of self-instruct using text-davinci-003
That points to Self-Instruct: https://github.com/yizhongw/self-instruct
At a glance, they used an LM to generate and classify instructions, then trained the model on the result, which IMHO is very similar to RLHF
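
For anyone curious, here is a rough sketch of that bootstrapping loop as I understand it, not the repo's actual code. The names `bootstrap`, `toy_propose` and `toy_is_classification` are made up for illustration, the two "LM" functions are hard-coded stand-ins for real model calls, and `difflib` replaces the ROUGE-L overlap filter that I believe the paper uses.

```python
from difflib import SequenceMatcher


def too_similar(candidate, pool, threshold=0.7):
    """True if candidate is a near-duplicate of any instruction in pool.

    Assumption: difflib's ratio is a stand-in for a ROUGE-L style filter.
    """
    return any(
        SequenceMatcher(None, candidate.lower(), existing.lower()).ratio() >= threshold
        for existing in pool
    )


def bootstrap(seed_instructions, propose, is_classification, rounds=2):
    """Grow an instruction pool from seeds and return the newly accepted tasks."""
    pool = list(seed_instructions)
    accepted = []
    for _ in range(rounds):
        for candidate in propose(pool):
            if too_similar(candidate, pool):
                continue  # drop near-duplicates of anything already in the pool
            pool.append(candidate)
            accepted.append(
                {
                    "instruction": candidate,
                    "is_classification": is_classification(candidate),
                }
            )
    return accepted


def toy_propose(pool):
    # Hypothetical stub: a real pipeline would prompt an LM with samples from pool.
    return [
        "Write a haiku about autumn!",
        "Classify the review as positive or negative.",
    ]


def toy_is_classification(instruction):
    # Hypothetical stub: a real pipeline would ask the LM to make this call.
    return instruction.lower().startswith("classify")


demo = bootstrap(["Write a haiku about autumn."], toy_propose, toy_is_classification)
print(demo)
```

The accepted instructions would then get LM-generated inputs and outputs and be used as ordinary supervised fine-tuning data.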