bhouston's comments

Altman's claim and NVIDIA's consumer launch supply problems may be related - OpenAI may be eating up the GPU supply...

OpenAI is not purchasing consumer 5090s... :)

Although you are correct, Nvidia is limited on total output. They can't produce 50XXs fast enough, and it's naive to think that isn't at least partially due to the wild amount of AI GPUs they are producing.

No, but the supply constraints are part of what is driving the insane prices. Every chip they allocate to consumer-grade instead of commercial-grade products is potential income lost.

They do have coding benchmarks; I summarized them here: https://news.ycombinator.com/item?id=43197955

A bit better at coding than ChatGPT 4o but not better than o3-mini - there is a chart near the bottom of the page that is easy to overlook:

- ChatGPT 4.5 on SWE-bench Verified: 38.0%

- ChatGPT 4o on SWE-bench Verified: 30.7%

- OpenAI o3-mini on SWE-bench Verified: 61.0%

BTW Anthropic's Claude 3.7 is better than o3-mini at coding, at around 62-70% on the same benchmark [1]. This means I'll stick with Claude 3.7 for the time being for my open source alternative to Claude-code: https://github.com/drivecore/mycoder

[1] https://aws.amazon.com/blogs/aws/anthropics-claude-3-7-sonne...


Does the benchmark reflect your opinion on 3.7? I've been using 3.7 via Cursor and it's noticeably worse than 3.5. I've heard using the standalone model works fine, didn't get a chance to try it yet though.

Personal anecdote: Claude Code is the best LLM devx I've had.

I don't see Claude 3.7 on the official leaderboard. The top performer on the leaderboard right now is o1 with a scaffold (W&B Programmer O1 crosscheck5) at 64.6%: https://www.swebench.com/#verified.

If Claude 3.7 achieves 70.3%, that's quite impressive: it's not far from the 71.7% claimed by o3, at (presumably) much, much lower cost.


I doubt o3's costs will be lower for that performance. They juice their benchmark results by letting it spend $100k in thinking tokens.

>BTW Anthropic Claude 3.7 is better than o3-mini at coding at around 62-70% [1]. This means that I'll stick with Claude 3.7 for the time being for my open source alternative to Claude-code

That's not a fair comparison as o3-mini is significantly cheaper. It's fine if your employer is paying, but on a personal project the cost of using Claude through the API is really noticeable.


> That's not a fair comparison as o3-mini is significantly cheaper. It's fine if your employer is paying...

I use it via the Cursor editor's built-in support for Claude 3.7. That caps the monthly expense at $20. There is probably a limit in Claude for these queries, but I haven't run into it yet. And I am a heavy user.


Agentic coders (e.g. aider, Claude-code, mycoder, codebuff, etc.) use a lot more tokens, but they write whole features for you and debug your code.
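To make the token usage concrete: these tools are essentially a loop around the model's tool-use API. The model asks to run tools (read files, apply edits, run tests), the harness executes them and feeds the output back, and the loop repeats until the model stops requesting tools, so a single feature can burn through many model calls. Below is a minimal sketch of that loop using the Anthropic TypeScript SDK; the run_command tool, the model id, and the overall structure are illustrative assumptions, not how any of the tools above is actually implemented.

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { execSync } from "node:child_process";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// A single illustrative tool: run a shell command and return its output.
const tools: Anthropic.Tool[] = [
  {
    name: "run_command",
    description: "Run a shell command in the project and return its output.",
    input_schema: {
      type: "object",
      properties: { command: { type: "string" } },
      required: ["command"],
    },
  },
];

function runCommand(command: string): string {
  try {
    return execSync(command, { encoding: "utf8", timeout: 60_000 });
  } catch (err) {
    return String(err); // feed failures back so the model can debug them
  }
}

async function agentLoop(task: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: task }];

  while (true) {
    const response = await client.messages.create({
      model: "claude-3-7-sonnet-latest", // assumed model id
      max_tokens: 4096,
      tools,
      messages,
    });

    // When the model stops requesting tools, the task is done (or it gave up).
    if (response.stop_reason !== "tool_use") return response;

    messages.push({ role: "assistant", content: response.content });

    // Execute every requested tool call and send the results back.
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const { command } = block.input as { command: string };
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: runCommand(command),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
}
```

A real agentic coder layers more on top (file-editing tools, context management, cost tracking), but a loop like this is where the extra tokens go: every tool result gets appended to the conversation and re-sent as input on the next call.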

If OpenAI offers a more expensive model (4.5) and a cheaper model (o3-mini) and both are worse, it starts to be a fair comparison.

It's the other way around on their new SWE-Lancer benchmark, which is pretty interesting: GPT-4.5 scores 32.6%, while o3-mini scores 10.8%.

To put that in context, Claude 3.5 Sonnet (new), a model we have had for months now and which from all accounts seems to have been cheaper to train and is cheaper to use, is still ahead of GPT-4.5 at 36.1% vs 32.6% in SWE-Lancer Diamond [0]. The more I look into this release, the more confused I get.

[0] https://arxiv.org/pdf/2502.12115


This guy says he is Aleph Null: http://richard-parkins.free.nf/

I thought Aleph Null was countably infinite?

There is always room for more tools. How many databases exist? Front-end frameworks? Languages? Backend frameworks? Analytics packages?

Thinking that in this space there is only one solution and all others are outright failures or not worth doing is odd; that isn't normally how it works. There are usually multiple niches and success/revenue strategies.

I strongly think this is the future of software development. And thus there will be many winners here.


It actually works with bun, pnpm, yarn, etc - any standard Node package manager.

I use pnpm personally and that is evident in the repo setup itself, but npm is sort of the standard, so I put that in the docs rather than mentioning a long list of alternatives.


GP is likely referring to this concern, "The Great npm Garbage Patch":

https://news.ycombinator.com/item?id=41178258


On average I've been spending $25 a day on Claude credits since this was up and fully running. That is cheaper than hiring another developer in just about any country, and it greatly boosts my productivity.

If you use threads / chains of messages in any form, I strongly encourage you to check out prompt caching. The cost savings are crazy ($0.05 per 1M cache-read tokens instead of $3 per 1M input tokens).
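For anyone wanting to wire this up: with the Anthropic API, caching means marking the large, stable prefix of the prompt (system prompt, project context) with cache_control so later calls in the same chain read it from the cache instead of paying the full input price. Here is a minimal sketch with the TypeScript SDK; the model id and the placeholder context string are assumptions for illustration.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Hypothetical placeholder for the big, unchanging prefix (system prompt,
// repo map, coding guidelines) that every request in the chain shares.
const projectContext = "...long project context...";

const response = await client.messages.create({
  model: "claude-3-7-sonnet-latest", // assumed model id
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: projectContext,
      // Mark the prefix as cacheable; subsequent calls that reuse the same
      // prefix are billed at the cache-read rate instead of the input rate.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Add a unit test for the parser." }],
});

// usage reports cache_creation_input_tokens and cache_read_input_tokens,
// which is how you can verify the savings per call.
console.log(response.usage);
```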

Okay, I have token caching and token costing implemented in a PR. It will go live tomorrow. Thanks for the suggestion!

Isn't it much cheaper to just use Copilot with GPT-4o?

I find Claude is significantly better at coding than OpenAI's tech, especially in agentic tool-using workflows.

I will have a look! Thx!

Aider is Python. Claude Code is closed source; this is open source and TypeScript.

I will investigate Aider. I wrote this tool from idea to now in just four weeks without referencing existing tools, so now I need to do that.

Yeah, you're going to find out you should have just used Aider, I'm afraid...

Except that the future of LLM-assisted programming means I can also make my own implementation of Aider pretty easily. So there's going to be an explosion of software that does basically the same thing but is private or just not widely shared: not because I don't want to share, but because starting and supporting an open source project is a PITA, and I just want to build this one little cool thing and be done with it.

I will be launching a version of this on GitHub as an app to help open source developers. So open source is also going to get a boost.

Aider is Python. That is annoying for me, as I like to modify things. This is TypeScript.

Wait, are you saying that, because you know TypeScript but not Python, you can't make modifications to software intended to develop for you using AI?

Auto-coders, which is what I call this tech, are great but they screw up complex tasks, so you need to be able to step in when they are screwing one up. I view it as a team of junior devs.

This will probably change at some point, but for now they require supervision and corrections.

If you do not actually know what you are doing, these things can create a mess. But that is just the next challenge to overcome and I suspect we'll get there relatively soon.

