Hacker News
Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions (github.com/agi-eval-official)
3 points by jinqueeny 3 months ago
Stop benchmarking LLMs. Make them fight (github.com/agi-eval-official)
2 points by jinqueeny 3 months ago
