The Poison of (LLMs) Alignment (arxiv.org)
17 points by throwaway888abc 8 months ago | 3 comments



It should not be surprising that asking an AI not to tell the truth leads to the quality of its answers deteriorating.

I did not say they asked the AI to lie. I said they asked it not to give a correct and truthful response.

Maybe this explains much of the deterioration of ChatGPT that has been reported.


It is known that the alignment tax affects smaller models, but in larger models (>~100B parameters) the "tax" starts to become negative, at least when trained with RLHF: https://arxiv.org/pdf/2204.05862.pdf

The largest Llama2 model has 70B parameters. They ran the experiment with 7B Llama2.


I would expect any pre-prompt or fine-tuning that is relevant to the subsequent prompts to improve the results (relative to no pre-prompting or tuning).

Likewise, any pre-prompt or fine-tuning that is NOT relevant to the subsequent prompts adds unhelpful and highly non-contextual complexity to the model's job, so it is likely to reduce the quality of the response.
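
A minimal sketch of how one might compare the two cases, assuming access to a Llama-2 chat model through Hugging Face transformers (the checkpoint name, system prompts, and question below are illustrative, not taken from the paper):

    # Sketch: compare answers under a relevant vs. an irrelevant pre-prompt.
    # Assumes access to the gated Llama-2 chat checkpoint (hypothetical choice).
    from transformers import pipeline

    generate = pipeline(
        "text-generation",
        model="meta-llama/Llama-2-7b-chat-hf",  # assumed model; any chat model works
    )

    question = "What is the boiling point of water at sea level?"

    # Relevant pre-prompt: context that matches the question's domain.
    relevant = "You are a chemistry tutor. Answer factual questions concisely."
    # Irrelevant pre-prompt: context unrelated to the question.
    irrelevant = "You are a medieval poetry critic. Discuss rhyme and meter."

    for system in (relevant, irrelevant):
        # Llama-2 chat prompt format: system prompt wrapped in <<SYS>> tags.
        prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"
        out = generate(prompt, max_new_tokens=100, do_sample=False)
        print(system, "->", out[0]["generated_text"])

In this toy setup the expectation is that the relevant system prompt conditions the model toward a direct factual answer, while the irrelevant one imposes non-contextual constraints the model must reconcile with the question.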



