Self-discovery > Self-consistency > Medprompt > Chain of thought
I think the leaderboard, as it is devised, is a bit silly: it rewards failure of the base model and success of the prompt on top of it, which is not how we want to be using this style of prompting. We should look at it the way the gorilla code nudging metric does, where both the base model score and the increase from the prompt style matter.
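To make the point concrete, here is a rough sketch of why ranking on the prompt-induced gain alone is misleading. The numbers and the names (`Result`, `report`) are made up for illustration and are not taken from any real leaderboard or from the gorilla metric itself.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    base: float      # score of the base model with a plain prompt
    prompted: float  # score of the same model with the prompt style applied

    @property
    def gain(self) -> float:
        # Improvement attributable to the prompt style
        return self.prompted - self.base


# Hypothetical scores, chosen only to illustrate the argument
results = [
    Result("weak model + prompt style", base=0.40, prompted=0.60),
    Result("strong model + prompt style", base=0.80, prompted=0.85),
]

# Ranking by gain alone puts the weak base model on top
by_gain = sorted(results, key=lambda r: r.gain, reverse=True)

# Ranking by final score puts the strong base model on top
by_final = sorted(results, key=lambda r: r.prompted, reverse=True)


def report(rows):
    # Show base score and gain side by side so neither is hidden
    return [(r.name, r.base, round(r.gain, 2)) for r in rows]


for row in report(by_final):
    print(row)
```

Reporting the base score and the gain together, rather than collapsing them into one number, keeps a weak base model from looking good just because it had more room to improve.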