Attaching to the app is impractical for catching regressions in production. LLMs are probabilistic, which means you can have a regression without changing the code or making a new deployment.
A metric to alert on could be task-completion rate, scored by an LLM-as-judge, or synthetic tests run on a schedule. The other metrics you mentioned are then useful for debugging once an alert fires.
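Roughly what I mean, as a minimal sketch combining both ideas (scheduled synthetic tests scored by an LLM judge). `run_task` and `send_alert` are hypothetical hooks into your own app and paging setup, and the judge model/prompt and threshold are just examples:

```python
# Sketch: run synthetic tasks on a schedule, score them with an LLM judge,
# alert when task-completion rate drops below a threshold.
from openai import OpenAI

client = OpenAI()

SYNTHETIC_TASKS = [
    {"prompt": "Book a table for two at 7pm tomorrow.", "expected": "a confirmed reservation"},
    {"prompt": "Cancel my most recent order.", "expected": "the order is cancelled"},
]

def run_task(prompt: str) -> str:
    # Hypothetical hook: call your deployed app/agent and return its response.
    raise NotImplementedError

def send_alert(message: str) -> None:
    # Hypothetical hook: replace with PagerDuty/Slack/whatever you use.
    print("ALERT:", message)

def judge(task: dict, response: str) -> bool:
    # LLM-as-judge: did the response complete the task?
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Task: {task['prompt']}\n"
                f"Expected outcome: {task['expected']}\n"
                f"Response: {response}\n"
                "Answer PASS or FAIL only."
            ),
        }],
    )
    return "PASS" in verdict.choices[0].message.content.upper()

def scheduled_check(threshold: float = 0.9) -> None:
    # Run from cron / your scheduler of choice.
    results = [judge(t, run_task(t["prompt"])) for t in SYNTHETIC_TASKS]
    rate = sum(results) / len(results)
    if rate < threshold:
        send_alert(f"Task-completion rate dropped to {rate:.0%}")
```

The point is the alert fires on the outcome metric (completion rate), and only then do you dig into latency, token counts, tool-call errors, etc. to find the cause.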