jonnylaw's comments

Attaching to the app is an impractical way to catch regressions in production. LLMs are probabilistic, which means you can have a regression without changing the code or making a new deployment.

A metric to alert on could be task-completion rate, measured either with an LLM as a judge or with synthetic tests that run on a schedule. The other metrics you mentioned are then useful for debugging the problem once the alert fires.
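A rough sketch of the scheduled synthetic-test idea, in case it helps. Everything here is made up for illustration: `SyntheticTask`, `completion_rate`, `should_alert`, the 0.9 threshold and the number of samples per task are all assumptions, and the `judge` callable stands in for whatever check you use (a string match here, an LLM-as-a-judge call in practice). Each task is sampled several times because the output is probabilistic.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SyntheticTask:
    """One synthetic test case (hypothetical structure)."""
    prompt: str
    # Returns True if the app's output counts as a completed task.
    # In practice this could wrap an LLM-as-a-judge call.
    judge: Callable[[str], bool]


def completion_rate(
    tasks: Sequence[SyntheticTask],
    run_app: Callable[[str], str],
    samples_per_task: int = 5,
) -> float:
    """Run every task several times and return the fraction judged complete."""
    if not tasks or samples_per_task < 1:
        raise ValueError("need at least one task and one sample per task")
    passed = 0
    total = 0
    for task in tasks:
        # Sample repeatedly: a single run says little about a probabilistic system.
        for _ in range(samples_per_task):
            total += 1
            if task.judge(run_app(task.prompt)):
                passed += 1
    return passed / total


def should_alert(rate: float, threshold: float = 0.9) -> bool:
    """Alert when the completion rate drops below the (assumed) threshold."""
    return rate < threshold


if __name__ == "__main__":
    # Stub app for demonstration only; replace with a call to the real system.
    def fake_app(prompt: str) -> str:
        return "4" if "2+2" in prompt else "I don't know"

    tasks = [
        SyntheticTask("What is 2+2?", lambda out: "4" in out),
        SyntheticTask("Capital of France?", lambda out: "Paris" in out),
    ]
    rate = completion_rate(tasks, fake_app)
    print(f"completion rate: {rate:.2f}, alert: {should_alert(rate)}")
```

You would run something like this from cron or whatever scheduler you already have, and send `should_alert` to your paging system. That way the alert does not depend on a deployment having happened.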



ooh, that's useful. thanks.

