
I don’t buy LeCun’s argument. Once you get good RL going (as we are now seeing with reasoning models), you can give the model a reward function that rewards a correct answer most highly, rewards an “I’m sorry, but I don’t know” less highly than that, penalizes a wrong answer, and penalizes a confidently wrong answer even more severely. As the RL learns to maximize reward, I would expect it to discover the strategy of saying it doesn’t know whenever it can’t find an answer it deems to have a high probability of being correct.
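
A minimal sketch of that reward shaping, in Python. Everything here is illustrative: the reward values, the regex-based abstain/confidence detection, and the function name are my assumptions, not any lab's actual recipe.

    import re

    # Hypothetical reward values; the ordering is what matters:
    # correct > abstain > wrong > confidently wrong.
    REWARD_CORRECT = 1.0            # right answer: highest reward
    REWARD_ABSTAIN = 0.2            # "I don't know": small positive reward
    PENALTY_WRONG = -1.0            # wrong answer: penalized
    PENALTY_CONFIDENT_WRONG = -2.0  # confidently wrong: penalized hardest

    ABSTAIN = re.compile(r"\b(i don't know|i'm not sure)\b", re.I)
    CONFIDENT = re.compile(r"\b(definitely|certainly|without a doubt)\b", re.I)

    def reward(answer: str, is_correct: bool) -> float:
        """Score one model answer against ground truth for RL training."""
        if ABSTAIN.search(answer):
            return REWARD_ABSTAIN
        if is_correct:
            return REWARD_CORRECT
        # Wrong answer: punish confident phrasing harder than a hedged guess.
        if CONFIDENT.search(answer):
            return PENALTY_CONFIDENT_WRONG
        return PENALTY_WRONG

Under a scheme like this, abstaining strictly dominates guessing whenever the model's estimated probability of being correct is low enough, which is exactly the behavior the parent comment is describing.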


How do you define the "correct" answer?


Certainly not possible in all domains, but just as certainly possible in some. There’s not much controversy about the height of the Eiffel Tower or how to concatenate two numpy arrays.
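
And the numpy case isn’t just uncontroversial, it’s mechanically checkable. A quick sketch using the standard np.concatenate call:

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    # np.concatenate joins arrays along an existing axis; there is
    # one verifiably correct result, so a grader can check it exactly.
    c = np.concatenate([a, b])
    print(c)  # [1 2 3 4 5 6]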


Obviously the truth is whatever is most popular. /s



