I am doing similar experiments for in-browser user testing, where the user is basically Claude. Incredible results with this simple pipeline:
1. Claude tests a feature
2. Notes down all friction and pain points
3. Convert those into a prioritized todo list
4. Use Claude Code or similar to action the todo list
There is some friction between steps 3 and 4, but nothing that can't be solved by running `claude --chrome` via the CLI instead of the Chrome extension.
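To make step 3 concrete, here is a rough sketch of turning friction notes into a prioritized todo list. Everything in it is an assumption for illustration: the `FrictionNote` fields, the 1 to 5 severity scale, and the severity-times-frequency score are hypothetical, not something the pipeline above prescribes.

```python
from dataclasses import dataclass


@dataclass
class FrictionNote:
    """One pain point recorded during a test session (hypothetical shape)."""
    description: str
    severity: int   # assumed scale: 1 (cosmetic) to 5 (blocks the task)
    frequency: int  # number of test runs in which the issue showed up


def score(note: FrictionNote) -> int:
    # Assumed heuristic: issues that are both severe and frequent rank highest.
    return note.severity * note.frequency


def prioritize(notes: list[FrictionNote]) -> list[FrictionNote]:
    # sorted() is stable, so notes with equal scores keep their input order.
    return sorted(notes, key=score, reverse=True)


def to_todo_markdown(notes: list[FrictionNote]) -> str:
    # Render a checkbox list, highest-priority item first.
    return "\n".join(
        f"- [ ] {note.description} (score {score(note)})"
        for note in prioritize(notes)
    )


if __name__ == "__main__":
    notes = [
        FrictionNote("Tooltip text is truncated", severity=1, frequency=1),
        FrictionNote("Submit button does nothing on first click", severity=5, frequency=2),
        FrictionNote("Search results load slowly", severity=3, frequency=1),
    ]
    print(to_todo_markdown(notes))
```

The printed list could be saved to a file and handed to whichever coding agent handles step 4.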
Agreed. No one should ever have to touch YAML for writing unit tests for LLMs. Ever. Most people writing agents and LLM applications are Python developers/data scientists/ML engineers.