Hacker News
DougBTX
3 days ago
on:
Qodo CLI agent scores 71.2% on SWE-bench Verified
Absolutely fine, as long as the success flag is predicted by the model ensemble under test. That's how Claude Code works, for example: it keeps iterating until it succeeds (or gives up with a failure at a certain point).
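To make the distinction concrete, here is a rough sketch of that kind of loop. It is not Claude Code's or Qodo's actual implementation; `Attempt`, `run_agent`, `propose`, and `toy_model` are all made-up names for illustration. The point is only that the stop condition is the model's own predicted success flag, and the benchmark's hidden tests are never consulted inside the loop.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    patch: str
    # The model's own judgment that the patch works (hypothetical field).
    # This is NOT the benchmark's ground-truth test result.
    predicted_success: bool


def run_agent(
    propose: Callable[[int, Optional[Attempt]], Attempt],
    max_attempts: int = 5,
) -> Optional[Attempt]:
    """Iterate until the model itself predicts success, or give up."""
    previous: Optional[Attempt] = None
    for step in range(max_attempts):
        attempt = propose(step, previous)
        if attempt.predicted_success:
            # Submit once; the hidden tests grade it only after this point.
            return attempt
        previous = attempt
    return None  # gave up with failure after max_attempts


def toy_model(step: int, previous: Optional[Attempt]) -> Attempt:
    # Stand-in for a real model: it believes it has succeeded on its third try.
    return Attempt(patch=f"patch-v{step}", predicted_success=(step == 2))
```

With the toy model, `run_agent(toy_model)` returns the third attempt, while `run_agent(toy_model, max_attempts=2)` returns `None`. If the loop instead stopped on the benchmark's real pass/fail signal, it would effectively be a pass@k measurement and not a single-submission score.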