peter94's comments

peter94 · 2026-03-05T16:58:24

What's the point of having the testing be done by Claude Code via a skill, rather than just hard-coding the set of tests to be run?

Bullhorn9268 · 2026-03-05T17:33:09

Some of the rules are impossible to code generally enough, right? Like "no defensive coding": I can't imagine what kind of AST shenanigans you would have to do to enforce that without basically banning `if`.
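To make the point concrete, here is a minimal sketch of what a naive hard-coded check might look like, using Python's standard `ast` module. The heuristic itself (flag `if x is None` tests and bare or `except Exception` handlers) and the function name `flag_defensive_checks` are hypothetical, not anything from the thread. Note that it flags a perfectly ordinary cache-miss branch just as readily as a genuinely defensive catch-all.

```python
import ast


def flag_defensive_checks(source: str) -> list[int]:
    """Hypothetical heuristic: return line numbers of `if ... is None` /
    `if ... is not None` tests and of bare `except:` / `except Exception:`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If) and isinstance(node.test, ast.Compare):
            # An identity comparison (`is` / `is not`) against a literal None
            uses_is = any(isinstance(op, (ast.Is, ast.IsNot)) for op in node.test.ops)
            against_none = any(
                isinstance(c, ast.Constant) and c.value is None
                for c in node.test.comparators
            )
            if uses_is and against_none:
                flagged.append(node.lineno)
        elif isinstance(node, ast.ExceptHandler):
            # Bare `except:` has type None; otherwise look for `except Exception:`
            if node.type is None or (
                isinstance(node.type, ast.Name) and node.type.id == "Exception"
            ):
                flagged.append(node.lineno)
    return sorted(flagged)


# A legitimate cache-miss branch, not defensive coding
legit = """
def lookup(cache, key):
    value = cache.get(key)
    if value is None:
        value = compute(key)
    return value
"""

# A catch-all that silently swallows errors
defensive = """
def parse(x):
    try:
        return int(x)
    except Exception:
        return 0
"""

print(flag_defensive_checks(legit))      # [4]  (false positive)
print(flag_defensive_checks(defensive))  # [5]
```

Both snippets get flagged, so a rule like this either bans useful code or has to grow endless special cases, which is roughly the argument for leaving this kind of judgment to the model.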

parad0x0n · 2026-03-05T17:12:39

That's another option, of course. But it's definitely easier to set up all these checks and linter tests with a skill file than with git hooks and actions.

peter94 · 2026-02-23T16:33:06

Personally, I think Claude Code played it a little too safe here, which is why we didn't put more emphasis on its precision. Note that 100% precision is also easy to achieve in this case: only match a trial to a paper when the paper explicitly mentions that trial, which a regex can detect. So clearly we have to pay attention to both precision and recall. We just happened to go with F1 as the more or less canonical measure that takes both into account, but I agree that, depending on your use case, you may be interested in other accuracy measures.
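A quick sketch of why precision alone is not enough. It assumes, purely for illustration, that trials are identified by ClinicalTrials.gov-style IDs (`NCT` followed by 8 digits); the papers, trial IDs, and gold links below are made-up toy data, not results from the study. The regex matcher never makes a wrong link, but it misses every paper that refers to a trial without citing its ID.

```python
import re

# Assumption: trials carry ClinicalTrials.gov-style IDs ("NCT" + 8 digits)
NCT_ID = re.compile(r"\bNCT\d{8}\b")


def regex_matches(papers: dict[str, str]) -> set[tuple[str, str]]:
    """Link a paper to a trial only when the paper quotes the trial ID verbatim."""
    return {(pid, tid) for pid, text in papers.items() for tid in NCT_ID.findall(text)}


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data
papers = {
    "p1": "Results of NCT01234567 are reported here.",
    "p2": "A follow-up analysis of the ACME-2 trial.",   # trial named, no ID
    "p3": "Secondary endpoints from the same cohort.",   # trial only implied
}
gold = {("p1", "NCT01234567"), ("p2", "NCT07654321"), ("p3", "NCT07654321")}

predicted = regex_matches(papers)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=1.00 recall=0.33 f1=0.50
```

F1 is the harmonic mean of the two, so a matcher that is perfectly precise but finds only a third of the links still lands at 0.5.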