
I particularly like their usage of LLM-as-a-judge. They don't go "hey chatgpt, sort these from best to worst based on vibes"; rather, they extract a set of ground truths and check how the answer compares against them, a task that SOTA LLMs can do kind of reliably. It's a very smart way to circumvent the problems introduced by pure LLM-as-a-judge methods.
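
To make that concrete, here's a rough sketch of the general idea as I understand it, not their actual pipeline. The names `score_answer` and `keyword_judge` are made up for illustration, and the judge is passed in as a plain function so the example runs without any API. In a real setup you'd swap the toy judge for an LLM call that answers one narrow yes/no question per fact.

```python
from typing import Callable, List

# Hypothetical judge signature: (fact, answer) -> True if the answer contains the fact.
# In practice this would wrap an LLM call; it is injected here so the sketch is runnable.
Judge = Callable[[str, str], bool]


def score_answer(answer: str, ground_truths: List[str], judge: Judge) -> float:
    """Return the fraction of ground-truth facts the answer is judged to contain."""
    if not ground_truths:
        raise ValueError("need at least one ground-truth fact")
    hits = sum(1 for fact in ground_truths if judge(fact, answer))
    return hits / len(ground_truths)


def keyword_judge(fact: str, answer: str) -> bool:
    """Toy stand-in for an LLM judge: naive case-insensitive substring match."""
    return fact.lower() in answer.lower()


# Assumed example data: two short "facts" extracted ahead of time.
facts = ["paris", "seine"]
score = score_answer("Paris is the capital of France.", facts, keyword_judge)
print(score)
```

The point is that the judge only ever makes a small per-fact check, and the ranking falls out of the counts instead of a single holistic "which one is better" call.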



