Direct Preference Optimization: Your Language Model Is a Reward Model | Hacker News


Direct Preference Optimization: Your Language Model Is a Reward Model (arxiv.org)
1 point by rahimnathwani on Jan 12, 2024 | 2 comments

rahimnathwani on Jan 12, 2024

Previously mentioned by a few commenters:

https://news.ycombinator.com/item?id=36139342

https://news.ycombinator.com/item?id=36423421

https://news.ycombinator.com/item?id=38659012

Newly interesting because Andrew Ng raved about it in his latest newsletter: https://www.deeplearning.ai/the-batch/issue-231/

amai on Jan 14, 2024

See also https://hn.algolia.com/?q=Direct+Preference+Optimization
