The premier open-weight models don't perform well compared to frontier models, even on the public benchmarks. And that's assuming at least some degree of benchmark contamination for the open-weight models.
While I don't think they're completely useless (though it's close), calling them fantastic replacements feels like an egregious overstatement of their value.
EDIT: Also wanted to note that I think this becomes as much an expectations-setting exercise as it is an evaluation of raw programming performance. Some people are incredibly impressed by the ability to assist in building simple web apps, others not so much. Experience will vary across that continuum.
Yeah, in my experience comparing DeepSeek Coder V2 Lite (the best coding model I can find that’ll run on my 4090) to Claude Sonnet under aider…
DeepSeek Lite was essentially useless. It was too slow, and its edits were too low quality.
I’ve been programming for about 17 years, so the things I want aider to do are a little more specific than building simple web apps. Larger models are just better at it.
I could run the full DeepSeek Coder model on some cloud provider and probably get very acceptable results, but then it’s no longer local.