siva7 | 22 hours ago | on: Some critical issues with the SWE-bench dataset
o3-mini and gpt-4o are so piss poor in agent coding compared to claude that you don't even need a benchmark
jbellis | 22 hours ago
o3-mini-medium is slower than claude but comparable in quality. o3-mini-high is even slower, but better.
danielbln | 22 hours ago
Claude really is a step above the rest when it comes to agentic coding.
dr_kiszonka | 11 hours ago
When I used it with Open Hands, it was great but also quite expensive (~$8/hr). In Trea, it was pretty bad, but free. Maybe it depends on how the agents use it? (I was writing the same piece of software in both: a simple web crawler for a hobby RAG project.)