Hacker News

I'd hope anyone using LLMs in production is testing them directly against their own use case.

Benchmarks do make for a good first pass, though, for figuring out which ones to test.


