Over RLAIF, which basically makes the model less diverse and more and more like the seed content, which they call the "Constitution" in their papers. The seed content is available here[1]. You can clearly see it is awful, has no diversity of opinion, and was basically generated by a team who only knows the textbook definition of ethics.
Well, to me the fact that everyone is complaining about refusals no matter how they change the prompt shows RLAIF works pretty well. It seems to be prepared to refuse things no matter how they are formulated. If you want to make sure an LLM doesn't say stupid things, this is a great method. The only problem is that Anthropic banned too many topics.
When I don't trigger a refusal, I get better conversation style from Claude than from GPT-4. I often exhaust my Claude quota and have to move over to GPT-4, which is dry and no fun. Maybe Claude just knows how to suck up to users better than GPT-4, but I don't get annoyed, because before it congratulates me on something, it clearly explains what it understood from my last message, and it gets it really well.
[1]: https://huggingface.co/datasets/Anthropic/hh-rlhf