The Poison of (LLMs) Alignment (arxiv.org)
17 points by throwaway888abc 8 months ago | 3 comments



It should not be surprising that asking an AI not to tell the truth leads to the quality of its answers deteriorating.

I did not say they asked the AI to lie. I said they asked it not to give a correct and truthful response.

Maybe this explains much of the deterioration of ChatGPT that has been reported.


It is known that the alignment tax affects smaller models, but in larger models (>~100B parameters) the "tax" starts to become negative, at least when trained with RLHF: https://arxiv.org/pdf/2204.05862.pdf

The largest Llama2 model has 70B parameters. They ran the experiment with 7B Llama2.


I would expect any pre-prompt or fine-tuning that is relevant to the subsequent prompts to improve the results (relative to no pre-prompting or tuning).

Likewise, any pre-prompt or fine-tuning that is NOT relevant to the subsequent prompts adds unhelpful and highly non-contextual complexity to the model's job, so it is likely to reduce the quality of the response.
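
A minimal sketch of how one might compare the two cases, assuming access to a Llama-2 chat model through Hugging Face transformers (the checkpoint name, system prompts, and question below are illustrative, not taken from the paper):

    # Sketch: compare answers under a relevant vs. an irrelevant pre-prompt.
    # Assumes access to the gated Llama-2 chat checkpoint (hypothetical choice).
    from transformers import pipeline

    generate = pipeline(
        "text-generation",
        model="meta-llama/Llama-2-7b-chat-hf",  # assumed model; any chat model works
    )

    question = "What is the boiling point of water at sea level?"

    # Relevant pre-prompt: context that matches the question's domain.
    relevant = "You are a chemistry tutor. Answer factual questions concisely."
    # Irrelevant pre-prompt: context unrelated to the question.
    irrelevant = "You are a medieval poetry critic. Discuss rhyme and meter."

    for system in (relevant, irrelevant):
        # Llama-2 chat prompt format: system prompt wrapped in <<SYS>> tags.
        prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{question} [/INST]"
        out = generate(prompt, max_new_tokens=100, do_sample=False)
        print(system, "->", out[0]["generated_text"])

In this toy setup the expectation is that the relevant system prompt conditions the model toward a direct factual answer, while the irrelevant one imposes non-contextual constraints the model must reconcile with the question.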



