I could have used this article before I spent the weekend arriving at the same conclusion!
Same laptop, and my contrived test was having it fix 50 or so lint errors in a small vibe-coded C++ repo. I wanted it to be able to handle a bunch of small tasks without getting stuck too often.
GPT OSS 20B was usable but slow, and actually frequently made mistakes like adding or duplicating statements unnecessarily, listing things as fixed without editing the code, and so on.
Qwen 3.5 9B with Opencode was much faster and actually able to work through a majority of the lint warnings without getting stuck, even through compaction, and it fixed every warning with a correct edit.
I tried 4-bit MLX quants of Qwen 3.5 9B, but it eventually would crash due to insufficient memory. I switched to GGUF, which I run with llama.cpp, and it runs without crashing.
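For anyone wanting to reproduce a setup like this, a llama.cpp invocation along these lines is what I mean; the model filename, context size, and port are placeholders, so adjust for your own quant and hardware:

```shell
# Serve a GGUF quant with llama.cpp's OpenAI-compatible server.
# The model path is a placeholder; -ngl 99 offloads all layers to
# the GPU (Metal on a Mac), and -c sets the context window size.
llama-server \
  -m ./qwen-9b-q4_k_m.gguf \
  -ngl 99 \
  -c 16384 \
  --port 8080
```

Opencode (or any OpenAI-compatible client) can then be pointed at http://localhost:8080.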
It is absolutely not comparable to frontier models. It’s way slower and gets basic info wrong and really can’t handle non trivial tasks in one go. I asked it for an architecture summary of the project and it claimed use of a library that isn’t present anywhere in the repo. So YMMV, but it’s still nice to have and hopefully the local LLM story can get much better on modest hardware over time.
Very different from my experience: Gemma 31b just solved a physics problem Opus 4.7 gave up on. I definitely don't think they're equivalent in general. Opus for sure is way smarter and way more likely to get things right on the edge, but it's still quite likely to get things wrong too, which means it isn't that useful for a lot of stuff either. Conversely, there are so many things that you would use an LLM for that they will both reliably oneshot. Especially in agentic mode, where you have ground-truth feedback between turns, the difference gets quite small for a lot of tasks.
That all being said, I've spent hundreds (maybe thousands?) of hours on this stuff over the past few years, so I don't see a lot of the rough edges. I really believe the capability is there. Gemma 4 31B is a useful agent for all sorts of stuff, and anything you can reasonably expect an LLM to oneshot, Qwen 3.6 35b MoE will handle at like 90 tok/sec, absolutely fantastic for tasks that don't require a huge amount of precision.
It may surprise you but over thousands of hours I have actually gathered more than one sample.
EDIT: Here's another sample for ya. I went to the store to buy mixers, and while I was out, Gemma 4 31b got pretty far along with reverse engineering the Bluetooth protocol of a desk thermometer I have. I forgot to turn on the web search tool, so it just went at it, writing more and more specific diagnostic logging/probing tools over the course of like 8 turns. It connected to the thermometer, scanned the characteristics, and had made a dump of the Bluetooth notification data. When I got back, it was theorizing about how the data might be encoded in the Bluetooth characteristics, and it got into an infinite loop. (Local models aren't perfect, and I never said they were.) I turned on the websearch tool and told it to "pick up the project where it left off"; it read the directory, did a couple googles, and had a working script to print temperature, humidity, and battery state in like 3 turns. Reading back through its chain of thought, I'm pretty sure it would have been able to get it eventually without googling.
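For illustration, decoding that kind of notification payload mostly comes down to guessing a struct layout from the dumped bytes. A minimal Python sketch, assuming a hypothetical little-endian encoding (signed centidegrees, a humidity byte, battery millivolts); the actual layout for any given thermometer will differ:

```python
import struct

def decode_notification(payload: bytes):
    """Decode a 5-byte sensor notification, assuming a hypothetical
    little-endian layout: int16 temperature in 0.01 degC, uint8
    relative humidity in %, uint16 battery voltage in mV."""
    temp_centi, humidity, battery_mv = struct.unpack("<hBH", payload)
    return temp_centi / 100.0, humidity, battery_mv

# Example payload: 23.45 degC, 56% RH, 3000 mV
sample = struct.pack("<hBH", 2345, 56, 3000)
print(decode_notification(sample))  # (23.45, 56, 3000)
```

This is exactly the sort of hypothesis a model (or a human) tests against a dump: pack a guess, compare against known readings, adjust the format string.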
idk, I thought I was a cool and smart engineer type for being able to do stuff like this, if my GPUs being able to do this more or less unsupervised isn't impressive I guess fuck me lol.
Not the person you asked, but on a medium bug that spanned a few Python files, I found the MoE too enthusiastic about trying things without first trying to understand the issue, whereas the dense model thought hard and added debug statements to understand how to fix it. But the dense model is quite slow (Q4_K_M quant, MI50 32GB, llama.cpp, pi)
At least in my experience, local models are very far away from models like Opus 4.7 or ChatGPT 5.5 in coding and problem solving areas.
I find them useful in basic research and learning and question asking tasks. Although at the same time, a Wikipedia page read or a few Google searches likely could accomplish the same and has been able to for decades.
I think you're doing it wrong. Use the frontier models for the research, planning, etc., and once you have a plan, give it to a local model for implementation.
I have seen way too many people who are overly optimistic about local LLMs.
Having spent a decent amount of time playing with them on consumer Nvidia GPUs, I understand well that they're not going to be widely usable any time soon. Unfortunately, not many people share that view.
It's just not there yet. I have tried all the models from April, including the Gemma 4 variants.
These are so far from Opus it's not even funny. They are not close to being in the same league. Gemma might be like a frontier model from a couple years ago, but with much worse performance in long context chats.
Correct, they aren't Opus. They are Sonnet with a little hand-holding. They also run on a single GPU at 40 tps.
No one is saying a local model will give you Anthropic's business in a 5-minute download. People are saying, "hmm, maybe I should do this one locally". People are also saying, "this is surprisingly good enough for me, given the trade-offs".
If your time is worth nothing, sure; even triaging that question costs time.
Unless you have fanatical needs for data privacy, or really don't have Internet, running local models almost certainly results in negative ROI overall.
Not to mention that you need to have decent hardware (that is getting expensive by the day) to even have this conversation in the first place.
People in this post talk as if everyone has a Mac with 24GB or 32GB of RAM, when the reality is that most people use a Windows laptop with a crappy integrated GPU.
Hm. I think there is a bit of a shifting goalpost dynamic at play here. Those April releases, even the fast MoE versions, are better than big cloud models from 18 months ago. I remember when everyone was gushing about Sonnet 3.7 and what a transformative experience development was using it. So was it useful or wasn’t it? A tool doesn’t lose its usability just because a better one comes along.
To me, these small local LLMs are highly useful (and thus "usable") even though they don't match the output of today's frontier models.
Not this. Let's reframe the problem. How many years behind do you think they are? By all accounts Gemma 4 is better than a frontier model from 3 years ago. Back then we were wowed by frontier models but when the local model reaches the same performance it's no good anymore, because you moved the target?
Relatively speaking local models might always be behind the curve compared to frontier ones. You can tell by the hardware needed to run each. But in absolute terms they're already past the performance threshold everyone praised in the past.
Right now in a lab somewhere there's a model that's probably better than anything else. There's a ChatGPT 5.6, an Opus 4.8. Knowing that do you suddenly feel a wave of disappointment at the current frontier models?
A local model is as good as a frontier model at responding in a single thread with you, which requires basic tool calling.
A local model is as good as a frontier model at writing a joke.
A local model is as good as a frontier model at responding to an email.
Not sure how often this needs to be said: no one with a clue is playing around with a local model setup while completely ignoring frontier models and their capabilities.
That's totally fine and dandy as there is a very big, very vocal, very brainwashed crowd that dramatically overstates the capabilities of remote LLMs on HN as well.
I'm like 50% convinced that these people are paid by Apple to promote their products, because the conversation is always about just being able to execute models (even larger ones) on Mac hardware with unified memory, but nobody ever mentions that inference speed is unusably slow.
You can have good local LLM performance through agents, but you need fast inference. Generally, 2x 3090s or, at minimum, 2x 3080s (you need two to speed up prefill processing to build the KV cache). Ironically, you also need to be good at prompt engineering, which has a real-world analogue in being able to manage low-skilled people to complete tasks.
Honestly surprised to hear that GPT OSS 20B runs slow on mac hardware. It's absolutely one of the fastest models I've run on local GPUs for its size, but only tried Nvidia cards.
Edit: TIL it is MoE and only has 3.6B active, explains a lot.
Yeah, I'm probably wrong there. GPT OSS 20B is certainly much faster than some other models I've tried. I actually gave GPT OSS 20B a few prompts just now and it seems to respond as fast or faster than Qwen 3.5 9B. But I needed many more prompts for GPT OSS 20B to complete my contrived task, so progress felt much slower.
Well, the TI-83/84 are called graphing calculators for a reason: you can plot equations and datasets with them and look at them right there[1]. Looking at graphs is huge for learning, or at least it was for me, and school isn't just about plugging things in and getting an answer (or shouldn't be, at least).
Doesn't mean it's not overpriced, but that's one reason and you can get a used TI-83/84 for like $30 or less. They pretty much never break.
-----
1. Okay, the Casio can QR-code-link you to a graph, but if I have internet/smartphone there are better graphing tools anyway, like Desmos.
I mean, a laptop running Windows can use the old PowerToy calculator or something like SpeedCrunch to do graphing, and I'm sure Linux has countless others, with Chromebooks probably having even more for free online as well, I can only assume.
A graphing calculator is a fraction of the cost, has no security updates, is standardized, isn't connected to the internet, ... There is value in a thing that does one thing well.
Writing the code hasn't been the bottleneck to developing software for a long time.
Code may not be the bottleneck, but writing it absolutely does consume time.
Especially with solo game dev, I can prototype ideas, try them out, and then refine or scrap them at a rate I could never do without AI. This type of experimentation is a perfect use-case for AI. It’s actually super fun, and if I pay attention and give the AI decent instructions, I don’t really lose out on code quality.
If you’re asking about a population decrease then, no, Austin has not had a declining population count for decades, and not recently either, although growth has slowed. So it’s not a case of decreased demand.
You are comparing it to other Apple laptops but you should be comparing with its competition at a $600 price point. The aluminum enclosure, touchpad, battery life, display, and performance are all best in class (or near enough) at this price point.
They don’t because of at-will employment. It’s just sort of the more moral, empathetic, right thing to do instead of leaving them with no income, no insurance, etc.
There are, and most of them don't have good bread. (Baguettes are about the only good bread that you can reliably expect to find in them. Sometimes they have San Francisco-style sourdough, which in my experience, tastes like someone dumped a shot of lemon vinegar into it. Just because a bread uses sourdough starter doesn't mean it needs to taste sour. I feel much the same way about hops and beer.)
Regularly visiting the bakery is, for reasons I've mentioned, a lot of friction for one purchase.
My closest one carries... Weird specialty hipster breads (because it is more focused on tarts and pastries and sweets - bread is just an afterthought for it).
The one I'd go to, if my closest grocery weren't stocking them, is way out of my way. I would not be making that trip twice a week.
> Regularly visiting the bakery is, for reasons I've mentioned, a lot of friction for one purchase.
That is still not "really hard to come by" as per your original claim. It's very common (not just in large cities!) to have a local bakery where you can get good bread. Whether you choose to go or not, it is available to you.
I mean, let’s at least discuss this in good faith.
“Good” bread according to the majority and bread that is specifically up to your standards are probably two very different things.
My grocery store’s bakery sells many types of fresh bread: sourdough, white, rye, croissants, ciabatta, buns, rolls, bagels, and so on. Many grocery stores in my city have a bakery section with a selection of fresh bread like this. (Even Walmart I think, but I don’t shop there).
It’s not the best bread I’ve ever eaten, but it’s fresh, good, tasty bread. It’s not “mushy garbage” and it’s not “cake” like you described in your original comment. It’s not “weird specialty hipster” bread. It’s just simple, real, fresh bread.
My family pricing went up by 20%, from $59.88 USD to $71.88 per year.
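That 20% figure checks out; a quick sanity check on the arithmetic:

```python
# Percentage increase from the old to the new annual family price.
old_price = 59.88
new_price = 71.88
increase_pct = (new_price - old_price) / old_price * 100
print(f"{increase_pct:.1f}%")  # 20.0%
```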
I like 1Password a lot. I've used it for 10 years. It's never lost a single thing, and I don't recall any downtime that impacted me. It's easy to set up and 99% hassle-free. It works on my various device types (Windows, Mac, iOS). It supports passkeys and 2FA codes. I like having shared and private vaults. I love the ability to share an auto-expiring, one-time-view link to a password. And the billing is a simple subscription fee.
I could do without some bloat. Watchtower feels like an enterprise need that is otherwise low-value and (by default) noisy for individuals/families. I obviously don't need "AI" forced into my password manager. I didn't love the version 7 to 8 transition that required a new app/extension to be installed. But all of that is really not so bad.
So yeah, I don't feel like I'm getting any additional value that justifies the price increase, but it's still more than worth it for me.
Oh true. Considering inflation, $60 in 2016 is about $80 in 2026 so really the price has gone down in real terms.
(Not actually sure about the price history of the family plan or when family was introduced. I was originally on the individual plan and it was $35 then, and switched to the family plan in 2022. I don’t think prices have changed though)
1Password 8 looks like it was released around 2022. 1Password 7, which seemed to get support until sometime in 2023, supported local vaults and syncing yourself (via Dropbox or whatever).
So it’s really only been about 3 years since people were forced to get accounts with subscriptions, and now it’s going up 33%.
I still have the zip archive of 1Password 7 in my applications folder that the v8 upgrade created. It hasn’t been very long.
From my vault, I can see I got 1Password 7 in 2018. Using 2016 as the price anchor seems generous when subscriptions weren’t required in 2016.
He is 25 years old and trying to cope with a hard life event. Let’s not act like it doesn’t affect him. It affects everyone around her and the strong reaction from him is really a positive reflection on her, isn’t it?
His post is written and edited to garner sympathy and support. I don’t mind that for a naive but noble cause. And there is always a slim chance of success.