azzarcher's comments

I am doing similar experiments for in-browser user testing, where the user is basically Claude. Incredible results with this simple pipeline:

1. Claude tests a feature.
2. Claude notes down all friction and pain points.
3. Convert those into a prioritized todo list.
4. Use Claude Code or similar to work through the todo list.

There is some friction between steps 3 and 4, but nothing that can't be solved by running `claude --chrome` via the CLI instead of using the Chrome extension.
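A minimal Python sketch of the hand-off from step 3 to step 4, assuming Claude's friction notes have already been collected as structured records. The `FrictionNote` class, the three-level severity scale, and the Markdown checklist format are hypothetical choices for illustration; the pipeline described above does not say how notes are stored or ranked.

```python
from dataclasses import dataclass

# Hypothetical severity scale (lower rank = higher priority); not part of the
# original pipeline, which does not specify how friction is ranked.
SEVERITY_ORDER = {"blocker": 0, "major": 1, "minor": 2}


@dataclass
class FrictionNote:
    """One pain point Claude noted while testing a feature (step 2)."""
    feature: str
    description: str
    severity: str  # must be one of SEVERITY_ORDER's keys


def to_todo_list(notes: list[FrictionNote]) -> str:
    """Step 3: turn friction notes into a prioritized Markdown checklist."""
    ranked = sorted(notes, key=lambda n: SEVERITY_ORDER[n.severity])
    lines = [f"- [ ] [{n.severity}] {n.feature}: {n.description}" for n in ranked]
    return "\n".join(lines)


if __name__ == "__main__":
    notes = [
        FrictionNote("signup", "Password rules only shown after a failed submit", "major"),
        FrictionNote("signup", "Submit button hidden below the fold on small screens", "blocker"),
        FrictionNote("search", "No feedback while results load", "minor"),
    ]
    # Step 4 (assumed): save this text to a file and point Claude Code at it.
    print(to_todo_list(notes))
```

Sorting by an explicit rank keeps the priority order deterministic, and because Python's `sorted` is stable, notes with equal severity stay in the order they were reported.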


It would be neat if it had a headless mode.



Appreciate it - part of why I put this list up is so that people can add to it lol


How does this stand out from https://benchllm.com/?


I really dislike BenchLLM's use of YAML files for test cases - I'd rather they be in code.

"""
input: What's 1+1? Be very terse, only numeric output
expected:
  - 2
  - 2.0
"""
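For comparison, a sketch of how the same YAML case might look as plain Python with a pytest-style test function. `ask_model` and `passes` are hypothetical names: `ask_model` is a stub that always answers "2" and would in practice wrap whatever LLM client you use. This illustrates the "tests in code" idea; it is not BenchLLM's API.

```python
from typing import Callable


def ask_model(prompt: str) -> str:
    """Hypothetical stub standing in for a real LLM call."""
    return "2"


def passes(model: Callable[[str], str], prompt: str, expected: list[str]) -> bool:
    """True if the model's whitespace-trimmed answer matches any acceptable output."""
    return model(prompt).strip() in expected


def test_one_plus_one():
    # Same input and expected outputs as the YAML case, as an ordinary assert.
    assert passes(ask_model, "What's 1+1? Be very terse, only numeric output", ["2", "2.0"])
```

pytest collects any function named `test_*`, so no extra config file is needed, and `passes` can be replaced with any Python check (a regex, a numeric tolerance) when exact string matching is too strict.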


Agreed. No one should ever have to touch YAML to write unit tests for LLMs. Ever. Most people writing agents and LLM applications are Python developers, data scientists, or ML engineers.

