Although you are correct, Nvidia is limited in total output. They can't produce 50XX cards fast enough, and it's naive to think that isn't at least partially due to the wild number of AI GPUs they are producing.
No, but the supply constraints are part of what is driving the insane prices. Every chip they allocate to consumer-grade instead of commercial-grade products is a potential loss of income.
A bit better at coding than GPT-4o but not better than o3-mini - there is a chart near the bottom of the page that is easy to overlook:
- GPT-4.5 on SWE-bench Verified: 38.0%
- GPT-4o on SWE-bench Verified: 30.7%
- OpenAI o3-mini on SWE-bench Verified: 61.0%
BTW Anthropic Claude 3.7 is better than o3-mini at coding, scoring around 62-70% [1]. This means that I'll stick with Claude 3.7 for the time being for my open source alternative to Claude-code: https://github.com/drivecore/mycoder
Does the benchmark reflect your experience with 3.7? I've been using 3.7 via Cursor and it's noticeably worse than 3.5. I've heard the standalone model works fine, but I haven't had a chance to try it yet.
I don't see Claude 3.7 on the official leaderboard. The top performer on the leaderboard right now is o1 with a scaffold (W&B Programmer O1 crosscheck5) at 64.6%: https://www.swebench.com/#verified.
If Claude 3.7 achieves 70.3%, that's quite impressive; it's not far from the 71.7% claimed by o3, at (presumably) much, much lower cost.
> BTW Anthropic Claude 3.7 is better than o3-mini at coding, scoring around 62-70% [1]. This means that I'll stick with Claude 3.7 for the time being for my open source alternative to Claude-code
That's not a fair comparison as o3-mini is significantly cheaper. It's fine if your employer is paying, but on a personal project the cost of using Claude through the API is really noticeable.
> That's not a fair comparison as o3-mini is significantly cheaper. It's fine if your employer is paying...
I use it via Cursor editor's built-in support for Claude 3.7. That caps the monthly expense at $20. There is probably a limit in Claude for these queries, but I haven't run into it yet, and I am a heavy user.
To put that in context, Claude 3.5 Sonnet (new), a model we have had for months now and which from all accounts seems to have been cheaper to train and is cheaper to use, is still ahead of GPT-4.5 at 36.1% vs 32.6% in SWE-Lancer Diamond [0]. The more I look into this release, the more confused I get.
There is always room for more tools. How many databases exist? Front-end frameworks? Languages? Backend frameworks? Analytics packages?
To think that in this space there is only one solution and all others are outright failures or not worth doing is odd thinking; that isn't normally how it works. There are usually multiple niches and success/revenue strategies.
I strongly think this is the future of software development. And thus there will be many winners here.
It actually works with bun, pnpm, yarn, etc - any standard Node package manager.
I use pnpm personally and that is evident in the repo setup itself, but npm is sort of the standard, so I put that in the docs rather than mentioning a long list of alternatives.
On average I've been spending $25 a day on Claude credits since this has been up and fully running. That is cheaper than hiring another developer in just about any country, and it greatly boosts my productivity.
If you use threads / chains of messages in any form, I strongly encourage you to check out caching. The cost savings are crazy ($0.05 / 1M tokens for cache reads instead of $3 / 1M input tokens).
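A rough back-of-the-envelope sketch of why this matters for multi-turn threads, using just the two rates quoted above ($3/1M regular input, $0.05/1M cache reads); cache-write surcharges and output tokens are ignored for simplicity, and the thread sizes are made-up example numbers:

```python
# Rates quoted above (USD per million input tokens)
INPUT_RATE = 3.00        # regular input tokens
CACHE_READ_RATE = 0.05   # cached input tokens

def thread_cost(system_tokens: int, turns: int,
                tokens_per_turn: int, cached: bool) -> float:
    """Input-token cost of a multi-turn thread.

    Each turn resends the system prompt plus all prior turns.
    With caching, the already-seen prefix is billed at the
    cache-read rate; only the new turn is billed at full price.
    """
    cost = 0.0
    for turn in range(turns):
        prefix = system_tokens + turn * tokens_per_turn  # already-seen context
        new = tokens_per_turn                            # this turn's new tokens
        if cached:
            cost += prefix * CACHE_READ_RATE / 1e6 + new * INPUT_RATE / 1e6
        else:
            cost += (prefix + new) * INPUT_RATE / 1e6
    return cost

# Example: 10k-token system prompt, 50 turns of 2k tokens each
print(f"uncached: ${thread_cost(10_000, 50, 2_000, cached=False):.2f}")
print(f"cached:   ${thread_cost(10_000, 50, 2_000, cached=True):.2f}")
```

For that hypothetical 50-turn thread the cached run comes out roughly 20x cheaper on input tokens, which matches the "crazy" savings above.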
Except that the future of LLM-assisted programming means I can also make my own implementation of aider pretty easily. So there's going to be an explosion of software that does basically the same thing but is private or just not widely shared - not because I don't want to share, but because starting and supporting an open source project is a PITA and I just want to build this one little cool thing and be done with it.
Wait, are you saying that, because you know TypeScript but not Python, you can't modify software that is meant to do AI-assisted development for you?
Auto-coders, which is what I call this tech, are great, but they screw up complex tasks, so you need to be able to step in when they are screwing one up. I view them as a team of junior devs.
This will probably change at some point, but for now they require supervision and corrections.
If you do not actually know what you are doing, these things can create a mess. But that is just the next challenge to overcome and I suspect we'll get there relatively soon.