This is not a good comparison for real-world coding tasks.
Based on my own experience and anecdotes, it's worse than Claude 3.5 and 3.7 Sonnet for actual coding tasks on existing projects. It is very difficult to control the model's behavior.
I will probably write a blog post on real-world usage.