I agree I left out option 0, but I think the other two were presented correctly?
- Black-box distillation uses the teacher's direct answers to questions, along with its conversation style. This is less useful because you still have to do supervised fine-tuning on those answers, which may be wrong, and it doesn't lead to greater insights (which reinforcement learning does)
- RLAIF relies on preferences and values to judge answers. These don't need supervised fine-tuning, and they help guide the new model toward better answers rather than just correcting its answers to specific, previously asked questions
Well, I mean you mixed up "fine-tuning" and "reinforcement learning" a bit when describing these options.
Regarding the value of these options, SFT communicates more information to the model being trained, but there's a risk of overfitting. So I'd guess they might use both - do a bit of SFT and then finish with RLAIF.
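To make the "more information" point concrete, here is a minimal sketch of the two training signals, not anyone's actual pipeline. It assumes SFT is scored with token-level cross-entropy against the teacher's answer, and that the preference side uses a Bradley-Terry pairwise loss, which is a common way to fit a reward model on AI-labelled comparisons before the RL step. The function names and all the numbers are hypothetical.

```python
import math


def sft_loss(token_probs):
    """Mean negative log-likelihood of the teacher's tokens under the student.

    token_probs: probability the student assigns to each token of the
    teacher's answer (hypothetical values, one per token).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


# SFT: one target per token, so a 4-token answer yields 4 supervised signals.
student_probs = [0.9, 0.5, 0.8, 0.25]  # hypothetical
print("SFT loss:", sft_loss(student_probs))

# Preference side: a whole pair of answers yields a single comparison.
print("Preference loss:", preference_loss(1.0, 0.0))  # hypothetical rewards
```

The SFT term pushes the student toward every token of one specific answer, which is where both the denser signal and the overfitting risk come from. The preference term only says which of two answers the judge liked better.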