I'm curious what the mental calculus was that a $5k laptop would competitively b...

ekjhgkejhgk · 2025-12-21T22:17:29 1766355449

I agree with everything you said, and yet I cannot help but respect a person who wants to do it himself. It reminds me of the hacker culture of the 80s and 90s.

slicktux · 2025-12-21T23:17:38 1766359058

Agreed, Everyone seems to shun the DIY hacker now a days; saying things like “I’ll just pay for it”. It’s not about just NOT paying for it but doing it yourself and learning how to do it so that you can pass the knowledge on and someone else can do it.

davidw · 2025-12-21T23:19:40 1766359180

I loathe the idea of being beholden to large corporations for what may be a key part of this job in the future.

Eupolemos · 2025-12-22T09:04:31 1766394271

And we all know that enshittyfication is coming.

ekjhgkejhgk · 2025-12-22T09:20:26 1766395226

Exactly. Google doesn't show you what it knows is the most appropriate answer, it shows you a compromise between the most appropriate answer and the one that makes them the most money.

Same thing will happen with these tools, just a matter of time.

smcleod · 2025-12-21T22:52:38 1766357558

My 2023 Macbook Pro (M2 Max) is coming up to 3 years old and I can run models locally that are arguably "better" than what was considered SOTA about 1.5 years ago. This is of course not an exact comparison but it's close enough to give some perspective.

menaerus · 2025-12-22T13:04:43 1766408683

OpenAI released GPT-4o in May 2024, and Anthropic released Claude 3.5 Sonnet in June 2024.

I haven't tried the local models as much but I'd find it difficult to believe that they would outperform the 2024 models from OpenAI or Anthropic.

The only major algorithmic shift was done towards the RLVR and I believe it was already being applied during the 2023-2024.

wyldfire · 2025-12-22T02:19:02 1766369942

Is that really the case? This summer there was "Frontier AI performance becomes accessible on consumer hardware within a year" [1] which makes me think it's a mistake to discount the open weights models.

[1] https://epoch.ai/data-insights/consumer-gpu-model-gap

hu3 · 2025-12-22T03:49:27 1766375367

Open weight models are neat.

But for SOTA performance you need specialized hardware. Even for Open Weight models.

40k in consumer hardware is never going to compete with 40k of AI specialized GPUs/servers.

Your link starts with:

> "Using a single top-of-the-line gaming GPU like NVIDIA’s RTX 5090 (under $2500), anyone can locally run models matching the absolute frontier of LLM performance from just 6 to 12 months ago."

I highly doubt a RTX 5090 can run anything that competes with Sonnet 3.5 which was released June, 2024.

Lapel2742 · 2025-12-22T09:58:12 1766397492

> I highly doubt a RTX 5090 can run anything that competes with Sonnet 3.5 which was released June, 2024.

I don't know about the capabilities of a 5090 but you probably can run a Devstral-2 [1] model locally on a Mac with good performance. Even the small Devstral-2 model (24b) seems to easily beat Sonnet 3.5 [2]. My impression is that local models have made huge progress.

Coding aside I'm also impressed by the Ministral models (3b, 8b and 14b) Mistral AI released a a couple of weeks ago. The Granite 4.0 models by IBM also seem capable in this context.

[1] https://mistral.ai/news/devstral-2-vibe-cli

[2] https://www.anthropic.com/news/swe-bench-sonnet

menaerus · 2025-12-22T13:06:40 1766408800

> 40k in consumer hardware is never going to compete with 40k of AI specialized GPUs/servers.

For general purpose LLM probably yes. For something very domain-specialized not necessarily.

cmrdporcupine · 2025-12-22T04:21:34 1766377294

With RAM prices spiking, there's no way consumers are going to have access to frontier quality models on local hardware any time soon, simply because they won't fit.

That's not the same as discounting the open weight models though. I use DeepSeek 3.2 heavily, and was impressed by the Devstral launch recently. (I tried Kimi K2 and was less impressed). I don't use them for coding so much as for other purposes... but the key thing about them is that they're cheap on API providers. I put $15 into my deepseek platform account two months ago, use it all the time, and still have $8 left.

I think the open weight models are 8 months behind the frontier models, and that's awesome. Especially when you consider you can fine tune them for a given problem domain...

satvikpendem · 2025-12-21T22:46:11 1766357171

> I'm curious what the mental calculus was that a $5k laptop would competitively benchmark against SOTA models for the next 5 years was.

Well, the hardware remains the same but local models get better and more efficient, so I don't think there is much difference between paying 5k for online models over 5 years vs getting a laptop (and well, you'll need a laptop anyway, so why not just get a good enough one to run local models in the first place?).

Workaccount2 · 2025-12-22T05:06:05 1766379965

Even if intelligence scaling stays equal, you'll lose out on speed. A sota model pumping 200 tk/s is going to be impossible to ignore with a 4 year old laptop choking itself at 3 tk/s.

Even still, right now is when the first gen of pure LLM focused design chipsets are getting into data centers.

lelanthran · 2025-12-22T11:46:39 1766403999

> Even if intelligence scaling stays equal, you'll lose out on speed. A sota model pumping 200 tk/s is going to be impossible to ignore with a 4 year old laptop choking itself at 3 tk/s.

Unless you're YOLOing it, you can review only at a certain speed, and for a certain number of hours a day.

The only tokens/s you need is one that can keep you busy, and I expect that even a slow 5token/sec model utilised 60s in every minute, 60m of every hour and 24 hours of every day is way more than you can review in a single working day.

The goal we should be moving towards is longer-running tasks, not quicker responses, because if I can schedule 30 tasks to my local LLm before bed, then wake up in the morning and schedule a different 30, and only then start reviewing, then I will spend the whole day just reviewing while the LLM is generating code for tomorrow's review. And for this workflow a local model running 5 tokens/s is sufficient.

If you're working serially, i.e. ask the LLM to do something, then review what it gave you, then ask it to do the next thing, then sure, you need as many tokens per second as possible.

Personally, I want to move to long-running tasks and not have to babysit the thing all day, checking in at 5m intervals.

satvikpendem · 2025-12-22T05:12:37 1766380357

At a certain point, tokens per second stop mattering because the time to review stays constant. Whether it shits out 200 tokens a second versus 20, it doesn't much matter if you need to review the code that does come out.

brulard · 2025-12-22T00:12:12 1766362332

If you have inference running on this new 128GB RAM Mac, wouldn't you still need another separate machine to do the manual work (like running IDE, browsers, toolchains, builders/bundlers etc.)? I can not imagine you will have any meaningful RAM available after LLM models are running.

satvikpendem · 2025-12-22T05:10:37 1766380237

No? First of all you can limit how much of the unified RAM goes into VRAM, and second, many applications don't need that much RAM. Even if you put 108 GB to VRAM and 16 to applications, you'll be fine.

brulard · 2025-12-22T12:26:44 1766406404

How about the rest of the resources? CPU/GPU? Would your work not be affected by inference running?

thefourthchime · 2025-12-22T05:26:48 1766381208

I completely agree. I can't even imagine using a local model when I can barely tolerate a model one tick behind SOTA for coding.

littlestymaar · 2025-12-22T11:50:58 1766404258

> Local models are purely for fun, hobby, and extreme privacy paranoia

I always find it funny when the same people who were adamant that GPT-4 was game-changer level of intelligence are now dismissing local models that are both way more competent and much faster than GPT-4 was.

ekianjo · 2025-12-22T02:58:16 1766372296

That's the kind of attitude that removes power from the end user. If everything becomes SAAS you don't control anything anymore.