I agree I left out option 0, but I think the other two were presented correctly?...

killerstorm · 2026-06-26T09:49:10 1782467350

Well, I mean you mixed up "fine-tuning" and "reinforcement learning" a bit when describing these options.

Regarding the value of these options, SFT communicates more information to the model being trained, but there's a risk of overfitting. So I'd guess they might use both - do a bit of SFT and then finish with RLAIF.