More

impulser_ · 2026-02-07T01:27:38 1770427658

I'm glad more people are catching onto lightweight CLI tools and using skills to give llms more tools. It's way better than MCP. I been doing this for awhile now and it's just the best way to gets LLMs to do things with APIs built for humans.

humanperhaps · 2026-02-07T03:25:46 1770434746

From my perspective, this also marks the shift towards agent skills instead of MCP, since agent skills rely on CLI tools. To me, this is also better than MCP since third-party developers can easily reuse existing APIs and libraries instead of needing official MCP support.

andreagrandi · 2026-02-09T07:22:52 1770621772

same! I personally released a couple of CLIs (written using Claude Code) which I regularly use for my work: logbasset (to access Scalyr logs) and sentire (to access Sentry issues). I never use them manually, I wrote them to be used well by LLMs. I think they are lighter compared to an MCP.

impulser_ · 2026-02-07T00:50:26 1770425426

They plan to use to for "Code Mode" which mean the LLM will use this to run Python code that it writes to run tools instead of having to load the tools up front into the LLM context window.

DouweM · 2026-02-07T01:19:44 1770427184

(Pydantic AI lead here) We’re implementing Code Mode in https://github.com/pydantic/pydantic-ai/pull/4153 with support for Monty and abstractions to use other runtimes / sandboxes.

The idea is that in “traditional” LLM tool calling, the entire (MCP) tool result is sent back to the LLM, even if it just needs a few fields, or is going to pass the return value into another tool without needing to see the intermediate value. Every step that depends on results from an earlier step also requires a new LLM turn, limiting parallelism and adding a lot of overhead.

With code mode, the LLM can chain tool calls, pull out specific fields, and run entire algorithms using tools with only the necessary parts of the result (or errors) going back to the LLM.

These posts by Cloudflare: https://blog.cloudflare.com/code-mode/ and Anthropic: https://platform.claude.com/docs/en/agents-and-tools/tool-us... explain the concept and its advantages in more detail.

pama · 2026-02-07T13:12:37 1770469957

I like your effort. Time savings and strict security are real and important. In modern orchestration flows, however, a subagent handles the extra processing of tool results, so the context of the main agent is not poluted.

impulser_ · 2026-02-06T01:04:32 1770339872

"Max subscribers are hitting their 5 hour usage limits in 30-40 minutes with a single instance doing light work"

This has not been my experience at all. The only time I even got close to this is multiple long sessions that had multiple compacts.

The key is if you hit compact, start a new session.

namelosw · 2026-02-06T04:55:38 1770353738

This hasn't been my experience either. I personally find the max plan is very generous for day-to-day usage. And I don't even use compact manually.

However, when I tried out the SuperPower skill and had multiple agents working on several projects at the same time, it did hit the 5-hour usage limit. But SuperPower hasn't been very useful for me and wastes a lot of tokens. When you want to trade longer running time for high token consumption, you only get a marginal increase in performance.

So people, if you are finding yourself using up tokens too quickly, you probably want to check your skills or MCPs etc.

bicepjai · 2026-02-06T16:46:58 1770396418

As a regular user, I hit these walls so often. I am experimenting with local model and open code. I am hoping to see some good results with qwen3 coder

mcast · 2026-02-06T02:29:44 1770344984

It's known that Anthropic's $20 Pro subscription is a gateway plan to their $100 Max subscription, since you'll easily burn your token rate on a single prompt or two. Meanwhile, I've had ample usage testing out Codex on the basic $20 ChatGPT Plus plan without a problem.

As for Anthropic's $100 Max subscription, it's almost always better to start new sessions for tasks since a long conversation will burn your 5-hour usage limit with just a few prompts (assuming they read many files). It's also best to start planning first with Claude, providing line numbers and exact file paths prior, and drilling down the requirements before you start any implementation.

deaux · 2026-02-06T03:07:43 1770347263

> It's known that Anthropic's $20 Pro subscription is a gateway plan to their $100 Max subscription, since you'll easily burn your token rate on a single prompt or two.

I genuinely have no idea what people mean when I read this kind of thing. Are you abusing the word "prompt" to mean "conversation"? Or are you providing a huge prompt that is meant to spawn 10 subagents and write multiple new full-stack features in one go?

For most users, the $20 Pro subscription, when used with Opus, does not hit the 5-hour limit on "a single prompt or two", i.e. 1-2 user messages.

pastel8739 · 2026-02-06T07:28:37 1770362917

Today I literally gave Claude a single prompt, asking it to make a plan to implement a relatively simple feature that spanned a couple different codebases. It churned for a long time, I asked a couple very simple follow up questions, and then I was out of tokens. I do not consider myself to be any kind of power user at all.

deaux · 2026-02-06T08:18:38 1770365918

The only time I've ever seen this happen is when you give it a massive codebase, without any meaningful CLAUDE.md to help make sense of it and no explicitly @ mentioning of files/folders to guide, and then ask it for something with huge cross-cutting.

> spanned a couple different codebases

There you go.

If you're looking to prevent this issue I really recommend you set up a number of AGENTS.md files, at least top-level and potentially nested ones for huge, sprawling subfolders. As well as @ mentioning the most relevant 2-3 things, even if it's folder level rather than file.

Not just for Claude, it greatly increases speed and reduces context rot for any model if they have to search less and more quickly understand where things live and how they work together.

visarga · 2026-02-06T10:55:04 1770375304

I have a tool that scans all code files in a repo and prints the symbols (AST based), it makes orienting around easy, it can be scoped to a file or folder.

fragmede · 2026-02-06T14:19:37 1770387577

> spanned a couple different codebases

It's either that, or you have a lot of skills loaded or something. I use Claude for hours a day and usually don't run out of tokens.

mcast · 2026-02-06T15:28:05 1770391685

I should note this only happens in Claude Code, not the web UI. Since CC is agentic and spawns subagents on prompts that require a lot of thinking.

It will spend a lot of time grokking the codebase, which would consume more tokens on larger projects.

Foobar8568 · 2026-02-06T06:52:28 1770360748

I am on $100 max subscription, and I rarely hit the limit, I used to but not anymore, but then again, I stopped building two products at the same time and concentrate to finish up the first/"easiest" one.

8cvor6j844qw_d6 · 2026-02-06T14:02:03 1770386523

I'm considering dropping the Max plan for the API.

Using the Max plan with tools like OpenClaw violates Anthropic's ToS [1].

The API gives you the same flexibility without the risk of getting your account suspended.

[1] https://www.anthropic.com/legal/consumer-terms (Section 3-7)

ipaddr · 2026-02-06T17:50:42 1770400242

At ten times the price. At least you keep the ability for them to charge you a hundred a month.

runako · 2026-02-06T03:25:22 1770348322

> you'll easily burn your token rate on a single prompt or two

My experience has been that I can usually work for a few hours before hitting a rate limit on the $20 subscription. My work time does not frequently overlap with core business hours in PDT, however. I wonder whether there is an aspect of this that is based on real-time dynamic usage.

cantalopes · 2026-02-06T08:51:36 1770367896

i never had these issues with gemini cli using google vertex endpoint, and i never even reached $50 per month

i don't want to think about how to hack a tool i'm paying for not locking me out because "i promped wrong"

daliusd · 2026-02-06T13:22:02 1770384122

I wonder what do you mean by "if you hit compact". Claude Code does not show used tokens.

ben_w · 2026-02-06T13:49:38 1770385778

When I used it before Christmas (free trial), it very visibly paused for a bit every so often, telling me that it was compressing/summarising its too-full context window.

I forget the exact phrasing, but it was impossible to miss unless you'd put everything in the equivalent of a Ralph loop and gone AFK or put the terminal in the background for extended periods.

bavell · 2026-02-06T16:21:14 1770394874

Run /usage or configure your statusline

novaleaf · 2026-02-06T14:47:10 1770389230

if you enable verbose mode, it does.

However I run like 3x concurrent sessions that do multiple compacts throughout, for like 8hrs/day, and I go through a 20x subscription in about 1/2 week. So I'm extremely skeptical of these negative claims.

Edit: However I stay on top of my prompting efficiency, maybe doing some incredibly wasteful task is... wasteful?

darqis · 2026-02-06T19:31:20 1770406280

It's my experience however.

impulser_ · 2026-02-05T06:20:58 1770272458

I honestly think OpenAI needs a new leader, they are so lost. They have no vision, it seems like they are just always chasing and not leading anymore.

Anthropic has a clear vision, they want to build the best agentic models so they can build products for humans to use like Claude Cowork, and Claude Code.

Google has a clear vision, they want to build the smartest model possible to give to consumers to use and to implement into their products.

OpenAI doesn't have any vision. GPT 5.2 alone has 5 different version that are all essentially the same. They are slow, take forever to do anything, and aren't smart in any area. They released Sora then just forgot about like everyone did. They released Atlas and forgot abotu like everyone did. They release GPT Image which honestly is probably the main reason they still have users they have.

They honestly might be in trouble, Oracle is fucking themselves over with the amount of debt they have to build out infrastructure for them. Microsoft and Nvidia are backing away from putting more eggs in the basket.

It actually no surprise they are trying to find every way to make money because if they don't they might be the first to fall.

Anthropic might not be winning the consumers, but it almost like you don't want to win consumers yet until you figure out how to make enough money to support them as Anthropic and OpenAI don't have business producing 100b in net income every year.

impulser_ · 2026-01-23T15:07:49 1769180869

My favorite thing is just being able to talk through the code and problem and have someone right there too response even if it not 100% right it still gets you to think and it nice to have it push back on things you ask ect. It basically a co worker you can bug all day and everytime they are still happy to help.

impulser_ · 2025-12-24T22:41:52 1766616112

Are they buying them to try and slow down open source models and protect the massive amounts of money they make from OpenAI, Anthropic, Meta ect?

It quite obvious that open source models are catching up to closed source models very fast they about 3-4 months behind right now, and yeah they are trained on Nvidia chips, but as the open source models become more usable, and closer to closed source models they will eat into Nvidia profit as these companies aren't spending tens of billion dollars on chips to train and run inference. These are smaller models trained on fewer GPUs and they are performing as good as the pervious OpenAI and Anthropic models.

So obviously open source models are a direct threat to Nvidia, and they only thing open source models struggle at is scaling inference and this is where Groq and Cerberus come into the picture as they provide the fastest inference for open source models that make them even more usable than SOTA models.

Maybe I'm way off on this.

Workaccount2 · 2025-12-24T22:54:07 1766616847

Shy of an algo breakthrough, open source isn't going to catch up with SOTA, their main trick for model improvement is distilling the SOTA models. That's why they they have perpetually been "right behind".

impulser_ · 2025-12-24T23:03:03 1766617383

They don't need to catch up. They just need to be good enough and fast as fuck. Vast majority of useful tasks of LLMs has nothing to do with how smart they are.

GPT-5 models have been the most useless models out of any model released this year despite being SOTA, and it because it slow as fuck.

aschobel · 2025-12-24T23:22:55 1766618575

For coding I don’t use any of the previous gen models anymore.

Ideally I would have both fast and SOTA; if I would have to pick one I’d go with SOTA.

There a report by OpenRouter on what folks tend to pay for it; it generally is SOTA in the coding domain. Folks are still paying a premium for them today.

There is a question if there is a bar where coding models are “good enough”; for myself I always want smarter / SOTA.

wyre · 2025-12-25T00:38:34 1766623114

FWIW coding is one of the largest usages for LLM's where SOTA quality matters.

I think the bar for when coding models are "good enough" will be a tradeoff between performance and price. I could be using Cerebras Code and saving $50 a month, but Opus 4.5 is fast enough and I value the piece-of-mind I have knowing it's quality is higher than Cerebras' open source models to spend the extra money. It might take a while for this gap to close, and what is considered "good enough" will be different for every developer, but certainly this gap cannot exist forever.

KronisLV · 2025-12-26T13:12:03 1766754723

I just use a mix of Cerebras Code for lots of fast/simpler edits and refactoring and Codex or Claude Code for more complex debugging or planning and implementing new features, works pretty well. Then again, I move around so many tokens that doing everything with just one provider would need either their top of the line subscriptions or paying a lot per-token some months. And then there's the thing that a single model (even SOTA) can never solve all problems, sometimes I also need to pull out Gemini (3 is especially good) or others.

gejose · 2025-12-25T02:43:15 1766630595

> just need to be good enough and fast as fuck

Hard disagree. There are very few scenarios where I'd pick speed (quantity) over intelligence (quality) for anything remotely to do with building systems.

ssivark · 2025-12-25T05:50:39 1766641839

If you thought a human working on something will benefit from being "agile" (building fast, shipping quickly, iterating, getting feedback, improving), why should it be any different from AI models?

Implicit in your claim are specific assumptions about how expensive/untenable it is to build systemic guardrails and human feedback, and specific cost/benefit ratio of approximate goal attainment instead of perfect goal attainment. Rest assured that there is a whole portfolio of situations where different design points make most sense.

nkmnz · 2025-12-25T07:23:43 1766647423

> why should it be any different from AI models?

1. law of diminishing returns - AI is already much, much faster at many tasks than humans, especially at spitting out text, so becoming even faster doesn’t always make that much of a difference. 2. theory of constraints - throughput of a system is mostly limited by the „weakest link“ or slowest part, which might not be the LLM, but some human-in-the-loop, which might be reduced only by smarter AI, not by faster AI. 3. Intelligence is an emergent property of a system, not a property of its parts - with other words: intelligent behaviour is created through interactions. More powerful LLMs enable new levels of interaction that are just not available with less capable models. You don’t want to bring a knife, not even the quickest one in town, to a massive war of nukes.

jameshush · 2025-12-25T03:00:21 1766631621

I agree with you for many use cases, but for the use case I'm focused on (Voice AI) speed is absolutely everything. Every millisecond counts for voice, and most voice use cases don't require anything close to "deep thinking. E.g., for inbound customer support use cases, we really just want the voice agent to be fast and follow the SOP.

nkmnz · 2025-12-25T07:27:59 1766647679

If you have a SOP, most of the decision logic can be encoded and strictly enforced. There is zero intelligence involved in this process, it’s just if/else. The key part is understanding the customer request and mapping it to the cases encoded in the SOP - and for that part, intelligence is absolutely required or your customers will not feel „supported“ at all, but be better off with a simple form.

jacquesm · 2025-12-25T14:54:05 1766674445

As a customer when confronted with such a system I hang up and never ever do business with that company again. Regardless of polish, they're useless.

nkmnz · 2025-12-25T16:39:51 1766680791

What do you mean by "such a system"? One that uses AI to funnel your natural language request into their system of SOP? Or one that uses SOPs to handle cases in general? SOP are great, they drastically reduce errors, since the total error is the square root of the sum of squares of random error and bias – while bias still occurs, the random error can and should be reduced by SOPs, whenever possible. The problem is that SOPs can be really bad: "Wait, I will speak to my manager" -> probably bad SOP. "Wait, I will get my manager so that you can speak to them" -> might be a better SOP, depending on the circumstances.

jacquesm · 2025-12-25T17:08:44 1766682524

It never works. You always just get the digital equivalent of a runaround and there simply isn't a human in the loop to take over when the AI botches it (again). So I gave up trying, this crap should not be deployed unless it works at least as good as a person. You can't force people to put up with junk implementations of otherwise good ideas in the hope that one day you'll get it right, customer service should be a service because on the other end of the line is someone with a very high probability of being already dissatisfied with your company and/or your product. For me this is not negotiable, if my time is less valuable to you, the company, than it is to actually put someone on to help then my money will go somewhere else.

nkmnz · 2025-12-25T17:27:56 1766683676

I'm still not sure if you're speaking of SOP in general or AI-interfaces to them. Why don't you answer that simple question before ranting on?

CuriouslyC · 2025-12-25T14:20:06 1766672406

Speed is great for UI iteration or any case where a human must be in the loop.

gessha · 2025-12-25T05:25:53 1766640353

As long as the faster tech is reliable and I understand its quirks, I can work with it.

Aurornis · 2025-12-25T02:07:12 1766628432

> They don't need to catch up. They just need to be good enough

The current SOTA models are impressive but still far from what I’d consider good enough to not be a constant exercise in frustration. When the SOTA models still have a long way to go, the open weights models have an even further gap distance to catch up.

nl · 2025-12-24T23:27:59 1766618879

GPT 5 Codex is great - the best coding model around except maybe for Opus.

I'd like more speed but prefer more quality than more speed.

echelon · 2025-12-25T02:30:37 1766629837

This. You can distill a foundation model into open source. The Chinese will be doing this for us for a long time.

We should be glad that the foundation model companies are stuck running on treadmills. Runaway success would be bad for everyone else in the market.

Let them sweat.

AmazingTurtle · 2025-12-25T10:34:36 1766658876

I'd prefer a 30 minute response from GPT-5 over a 10 minute Response from {Claude/Google} <whatever their SOTA model is> (yes, even gemini 3)

Reason is: while these models look promising in benchmarks and seem very capable at an affordable price, I *strongly* felt that OpenAI models perform better most of the times. I had to cleanup Gemini mess or Claude mess after vibe coding too much. OpenAI models are just much more reliable with large scale tasks, organizing, chomping tasks one by one etc. That takes its time but the results are 100% worth it.

Demiurge · 2025-12-25T01:25:11 1766625911

I get GPT 5.2 responses on copilot faster than for any other model, almost instantly. Are you sure they’re slow as fuck?

dontwannahearit · 2025-12-24T23:19:49 1766618389

Confused. Is ‘fuck’ fast or slow? Or both at the same time? Is there a sort of quantum superposition of fuck?

ThrowawayTestr · 2025-12-24T23:32:18 1766619138

It's an intensifier

blackoil · 2025-12-25T08:38:26 1766651906

Wasn't that supposed to be 'ass'

imcritic · 2025-12-25T11:06:41 1766660801

Then how would double intensifier look like?

867-5309 · 2025-12-24T23:47:07 1766620027

well, it's not slow as fuck! it's quick as lightning and speedy as hell

nineteen999 · 2025-12-24T23:06:08 1766617568

Bullseye.

mistercheph · 2025-12-25T02:07:49 1766628469

Too bad, so sad for the Mister Krabs secret recipe-pilled labs. Shy of something fundamental changing, it will always be possible to make a distillation that is 98% as good as a frontier model for ~1% of the cost of training the SOTA model. Some technology just wants to be free :)

blackoil · 2025-12-25T08:39:10 1766651950

We trust in our lord and savior China and Zuck to keep the peasants fed.

_fizz_buzz_ · 2025-12-25T01:11:48 1766625108

> their main trick for model improvement is distilling the SOTA models

Could you elaborate? How is this done and what does this mean?

MobiusHorizons · 2025-12-25T01:18:24 1766625504

I am by no means an expert, but I think it is a process that allows training LLMs from other LLMs without needing as much compute or nearly as much data as training from scratch. I think this was the thing deepseek pioneered. Don’t quote me on any of that though.

tensor · 2025-12-25T05:25:51 1766640351

No, distillation is far older than deepseek. Deepseek was impressive because of algorithmic improvements that allowed them to train a model of that size with vastly less compute than anyone expected, even using distillation.

I also haven’t seen any hard data on how much they do use distillation like techniques. They for sure used a bunch of synthetic generated data to get better at reasoning, something that is now commonplace.

MobiusHorizons · 2025-12-25T08:00:14 1766649614

Thanks it seems I conflated.

tickerticker · 2025-12-25T04:56:55 1766638615

Yes. They bounced millions of queries off of ChatGPT to teach/form/train their DeepSeek model. This bot-like querying was the "distillation."

orbital-decay · 2025-12-25T07:41:24 1766648484

They definitely didn't. They demonstrated their stuff long before OAI and the models were nothing like each other.

SirMaster · 2025-12-25T06:16:51 1766643411

Why would OpenAI allow someone to do that?

qcnguy · 2025-12-26T19:37:02 1766777822

They don't anymore. They introduced ID verification shortly after, but it's hard to stop completely while also scaling fast.

MadnessASAP · 2025-12-25T07:15:04 1766646904

They didn't, but how do you stop it? Presuming the scale that OpenAI is running at?

Kiboneu · 2025-12-25T05:04:11 1766639051

>Are they buying them to try and slow down open source models

The opposite, I think.

Why do you think that local models are a direct threat to Nvidia?

Why would Nvidia let a few of their large customers have more leverage by not diversifying to consumers? Openai decided to eat into Nvidia's manufacturing supply by buying DRAM; that's concretely threatening behavior from one of Nvidia's larger customers.

If Groq sells technology that allows for local models to be used better, why would that /not/ be a profit source for Nvidia to incorporate? Nvidia owes a lot of their success on the consumer market. This is a pattern in the history of computer tech development. Intel forgot this. AMD knows this. See where everyone is now.

Besides, there are going to be more Groqs in the future. Is it worth spending ~20B for each of them to continue to choke-hold the consumer market? Nvidia can afford to look further.

It'd be a lot harder to assume good faith if Openai ended up buying Groq. Maybe Nvidia knows this.

deaux · 2025-12-25T05:49:26 1766641766

> Besides, there are going to be more Groqs in the future.

And likely some of them are going to be in countries that won't let them sell out to Nvidia.

nl · 2025-12-24T23:25:57 1766618757

NVIDIA release some of the best open source models around.

Almost all open source models are trained and mostly run on NVIDIA hardware.

Open source is great for NVIDIA. They want more open source, not less.

Commoditize your complement is business 101.

impulser_ · 2025-12-24T23:31:56 1766619116

Then why are they spending $20 billion dollars to handicap an inference company that giving open source models a major advantage over closed source models?

gpapilion · 2025-12-25T03:41:24 1766634084

Realistically groq is a great solution but has near impossible requirements for deployment. Just look at how many adapters you need to meet the memory requirements of a small llm. SRAM is fast but small.

I would guess their interconnect technology is what NVIDIA wants. You need something like 75 adapters for an 8b parameter model they had some really interesting tech to make the accelerator to accelerator communication work and scale. They were able to do that well before nvl 72 and they scale to hundreds of adapters since large models require more adapters still.

We will know in a few months.

credit_guy · 2025-12-24T23:56:21 1766620581

> to handicap an inference company

That's a non-charitable interpretation of what happened. The are not "spending $20 billion to handicap Groq". They are handing Groq $20 billion to do whatever they want with it. Groq can take this money and build more chips, do more R&D, hire more people. $20 billion is truly a lot of money. It's quite hard to "handicap" someone by giving them $20 billion.

wmf · 2025-12-25T00:08:21 1766621301

Groq doesn't have any employees. They can't do R&D because there's no one to do it. The $20B goes to Groq's investors.

credit_guy · 2025-12-25T02:49:46 1766630986

From the article:

  > Groq added that it will continue as an “independent company,” led by finance chief Simon Edwards as CEO.

The $20B does not go to Groq's investors. It goes to Groq. You can say that Groq is owned by its investors, and this is the same thing, but it's not. In order for the money to go to the investors, Groq needs to disburse a dividend, or to buy back shares. There is no indication that this will happen. And what's more, the investors don't even need this to happen. I'm sure any investor that wants to sell their shares in Groq will now find plenty of buyers at a very advantageous price.

wmf · 2025-12-25T03:24:55 1766633095

Let's bet on this shit. Where's the Polymarket.

p1esk · 2025-12-25T01:14:53 1766625293

they spending $20 billion dollars to handicap an inference company

Inference hardware company

nl · 2025-12-25T03:31:47 1766633507

> handicap

Your words.

Because it's very good tech for inference?

It doesn't even do training.

And most inference providers for Open Source models use NVIDIA eg Fireworks, Basten, TogetherAI etc.

Most NVIDIA sales go to training clusters. That is changing but it'd be an interesting strategy to differentiate the training and inference lines.

ilaksh · 2025-12-24T23:51:18 1766620278

Yes, you are way off, because Groq doesn't make open source models. Groq makes innovative AI accelerator chips that are significantly faster than Nvidia's.

LoganDark · 2025-12-25T00:04:35 1766621075

For inference, but yes. Many hundreds of tokens per second of output is the norm, in my experience. I don't recall the prompt processing figures but I think it was somewhere in the low hundreds of tokens per second (so slightly slower than inference).

zamalek · 2025-12-24T23:59:40 1766620780

> Groq makes innovative AI accelerator chips that are significantly faster than Nvidia's.

Yeah I'm disappointed by this, this is clearly to move them out of the market. Still, that leaves a vacuum for someone else to fill. I was extremely impressed by Groq last I messed about with it, the inference speed was bonkers.

wmf · 2025-12-25T00:06:22 1766621182

If it's that good Nvidia can just keep selling it.

allovertheworld · 2025-12-25T05:00:20 1766638820

more like now Nvidia wants to release their own ASIC to combat google

fragmede · 2025-12-25T05:01:01 1766638861

Umm... no one tell them, okay?

PunchyHamster · 2025-12-25T09:04:01 1766653441

You still need hardware to run open source models. It might eat into OpenAI profit but I doubt it will eat into NVIDIA's

If anything more companies in making models business the higher NVIDIA chip demand will be, till we get some proper competition at least. We badly need some open CUDA equivalent so moving off to competition isn't a problem

HPsquared · 2025-12-25T11:38:01 1766662681

Nvidia's dream would be for everyone to buy a personal DGX H100 for private local inference. That's where open source could lead. Datacenters are much more efficient in their use of chips.

xnx · 2025-12-25T14:20:42 1766672442

Exactly. Efficiency use of their chips is the enemy of Nvidia.

dTal · 2025-12-25T15:54:40 1766678080

Doubt it:

https://en.wikipedia.org/wiki/Jevons_paradox

heavyset_go · 2025-12-25T00:34:23 1766622863

Nvidia just released their Nemotron models, and in my testing, they are the best performing models on low-end consumer hardware in both terms of speed and accuracy.

ymck · 2025-12-24T23:29:16 1766618956

I'd say that it's probably not a play against open source, but more trying to remove/change the bottlenecks in the current chip production cycle. Nvidia likely doesn't care who wins, they just want to sell their chips. They literally can't make enough to meet current demand. If they split off the inference business (and now own one of the only purchasable alternatives) they can spin up more production.

That said, it's completely anti-competitive. Nvidia could design a inference chip themselves, but instead the are locking down one of the only real independents. But... Nobody was saying Groq was making any real money. This might just be a rescue mission.

SkyPuncher · 2025-12-24T22:48:35 1766616515

They need to vertically integrate the entire stack or they die. All of the big players are already making plans for their own chips/hardware. They see everyone else competing for the exact same vendor’s chips and need to diversify.

ramoz · 2025-12-25T01:00:50 1766624450

They acquired in order to have an ASICs competitor to Google TPU.

karmasimida · 2025-12-25T08:25:02 1766651102

With RAM/memory price this high, open source is not going to catch up with closed source.

The open source economy relies on the wisdom of crowds. But that implies and equal access to experimentation platforms. The democratization of PC and consumer hardware brings the previous open source era that we all love, I am afraid the tech mongols had identified the chokehold of LLM ecosystem and found ways to successfully monopolized it

baconner · 2025-12-28T04:35:28 1766896528

NVIDIA makes money no matter if the model is open weights or not. I don't think open is a concern for them and they'd very much like to be servicing China and their batch of open models I think. what's concerning them more likely is

A. The inevitable breakdown of their massive head start with CUDA and data center hardware. A serious competitor at real scale.

B. Anything that'll cool off the massive data center buildouts that are fueling them.

Seems clear that locking up a major potential competitor especially the minds behind it solves for A. And their ongoing machinations with circular funding of companies funding data centers is all about B - keeping the momentum before it fizzles.

nurettin · 2025-12-26T04:57:31 1766725051

> It quite obvious that open source models are catching up to closed source models very fast they about 3-4 months behind

> Maybe I'm way off on this.

If by open source, you mean downloadable from huggingface and SOTA you mean opus 4.5, yes you are way off.

epolanski · 2025-12-25T10:34:49 1766658889

I don't see where is the benefit for Nvidia to limit the open source models.

The more competition, the more shovels they sell.

It's like saying that Intel would've benefited if only Dell and few others sold servers because they brought in multiple billions per year.

vachina · 2025-12-25T07:32:36 1766647956

More like they’re trying to snuff out potential competitors. Why work as hard to push your own products if NVIDIA gave you money to retire for the rest of your life?

AmazingTurtle · 2025-12-25T10:31:52 1766658712

Show me an affordable open source coding model thats closet to GPT-5.2-codex capabilities. Note: I do not have tons of HBM lying around

mr_toad · 2025-12-25T09:16:06 1766654166

The constant threat of open source (and other competitors) is what keeps the big fish from getting complacent. It’s why they’re spending trillions on new data centers, and that benefits Nvidia. When there’s an arms-race on it’s good to be an arms dealer.

matthewfcarlson · 2025-12-24T22:47:11 1766616431

Idk- cheaper inference seems to be a huge industry secret and providing the best inference tech that only works with nvidia seems like a good plan. Makes nvidia the absolute king of compute against AWS/AMD/Intel seems like a no brainer.

__mharrison__ · 2025-12-24T23:02:49 1766617369

How does this work considering the Nemotron models?

jayanmn · 2025-12-25T10:59:23 1766660363

China may take over the open source part. That is the only country with exposure to hardware, software and political might.

nbardy · 2025-12-25T09:33:54 1766655234

Your way off, this reads more like anti capitalist political rhetoric than real reasoning.

Look at Nvidia nemotron series. They hav become a leading open source training lab themselves and they’re releasing the best training data, training tooling, and models at this point.

impulser_ · 2025-12-15T00:48:57 1765759737

Don't worry, you can use these tools and not be an idiot. Just read and confirm what it does. It's that simple.

fHr · 2025-12-15T01:08:28 1765760908

Did you even read? "but I'm more worried about this happening to those running services that I rely on" The problem is some AI god agentic weaving high techbro sitting at Cloudflare/Google/Amazon not us reasonable joes on our small projects.

fwipsy · 2025-12-15T01:39:08 1765762748

They were responding to the first part of the comment, not the second. Doesn't mean they didn't read the second part.

impulser_ · 2025-12-15T02:02:37 1765764157

You think Cloduflare, Google, and Amazon are allowing engineers to plug Claude Code into production services? You think these companies are skipping code reviews and just saying fuck it let it do whatever it wants? Of course they aren't.

rsynnott · 2025-12-15T18:28:45 1765823325

I mean, I'm not sure I want to base my peace of mind on the thesis that no-one at a FAANG would ever do something stupid.

cyberax · 2025-12-15T02:04:32 1765764272

> You think these companies are skipping code reviews and just saying fuck it let it do whatever it wants?

Yes.

alex1138 · 2025-12-15T01:34:53 1765762493

[flagged]

sunaookami · 2025-12-15T09:36:35 1765791395

Then you should start thinking critically first.

impulser_ · 2025-12-15T00:47:32 1765759652

Rule 1: Never ever run any of these tools in automatic mode.

impulser_ · 2025-12-13T02:41:11 1765593671

Yeah I think it will probably go down as the biggest mistake Tesla has made.

They could have spent all the effort building EV delivery trucks with built in self driving which would help them collect even more data for FSD to tell them rollout robotaxis.

ModernMech · 2025-12-13T17:16:09 1765646169

Camera only is still a bigger mistake because without LiDAR, the EV delivery trucks with built in self driving will not work.

ben_w · 2025-12-16T12:00:39 1765886439

Even if camera-only does work*, it's still a mistake, because it was a bet that LiDAR wouldn't get cheap.

* I think it will, eventually, but "eventually" can be a long time, and the point is that this no longer even matters because of how cheap LiDAR is now.

impulser_ · 2025-12-11T19:32:44 1765481564

The thing about OpenAI is their models never fit anywhere for me. Yes they maybe smart or even the smartest models but they are alway so fucking slow. The ChatGPT web app is literally usable for me. I ask simple task and it does most extreme shit jsut to get an answer that the same as Claude or Gemini.

For example, I asked ChatGPT to take a chart and convert into a table. It went and cut up the image and zoomed in for literally 5 mins to get the a worst answer than Claude which did it in under a minute.

I see people talk about Codex like it better than Claude Code, and I go and try it and it takes a lifetime to do thing and it return maybe an on par result as Opus or Sonnet but it takes 5mins longer.

I just tried out this model and it the same exact thing. It just take ages for it to give you an answer.

I don't get how these models are useful in the real world.

What am I missing, is this just me?

I guess it truly an enterprise model.

wetoastfood · 2025-12-11T19:51:06 1765482666

Are you using 5.1 Thinking? I tended to prefer Claude before this model.

I use models based on the task. They still seem specialized and better at specific tasks. If I have a question I tend to go to it. If I need code, I tend to go to Claude (Code).

I go to ChatGPT for questions I have because I value an accurate answer over a quick answer and, in my experience, it tends to give me more accurate answers because of its (over) willingness to go to the web for search results and question its instincts. Claude is much more likely to make an assumption and its search patterns aren't as thorough. The slow answers don't bother me because it's an expectation I have for how I use it and they've made that use case work really well with background processing and notifications.