
What you propose isn't a meaningful benchmark: these models already excel at spinning bullshit, so they'd ace it every time.

