Hacker News

I love obscure benchmarks, and I feel like I can trust their results a lot more. After all, they (probably) weren't benchmaxxed. RuneBench[0] is another good example (it measures how well LLMs can play RuneScape).

[0] https://maxbittker.github.io/runebench/
