Hacker News
New method for testing AI using real work-flows (github.com/assimilatedhuman)
2 points by ballista2026 4 days ago
LLM INQUISITOR: Evaluating how AI models handle long, realistic tasks (github.com/assimilatedhuman)
1 point by ballista2026 11 days ago
