Hacker News | butILoveLife's comments

Meanwhile last time I checked, Android bug bounty is higher.

iPhone makes you an easy target. Sorry Bezos, security through obscurity was a bad idea... but you should have known better.


Sorry who?

Hierarchies can punish this. Note that the legislative and judicial branches exert their own power. The Epstein files got released, if you need proof.

(However, if we are International Systems Realists, some effects are inevitable. I have a feeling even Biden/Harris would be in Iran right now.)


Some got released, and in the way the Executive wanted them to be.

This proves the opposite IMO - while the Legislative is co-opted, the Judicial branch has shown it is quite inadequate at controlling or punishing the Executive.


I unsubbed because ChatGPT was no longer SOTA. They def got cheap.

Reminds me of that graph where late customers are abused. OpenAI is already abusing the late customers.

Claude is pretty great.


It's odd because I no longer really like ChatGPT. For chat-type requests, I prefer Claude, or if it's knowledge-intensive then Gemini 3 Pro (which is better for history, old novels, etc).

But GPT 5.3 Codex is great. Significantly better than Opus, in the TUI coding agent.


I keep hearing this but I consistently get subpar results from anything other than Opus

I don't know about Opus, but Codex suddenly got a lot better to the point that I prefer it over Sonnet 4.6. Claude takes ages and comes up with half baked solutions. Codex is so fast that I miss waiting. It also writes tests without prompting.

Maybe I'll try Codex on your suggestion. I was recently let down by its regular thinking.

ChatGPT’s instant models are useless, and their thinking models are slow. This makes Claude more pleasant to use, despite them not being SOTA.

But ChatGPT is still SOTA in search and hard problem solving.

GPT-5.2 Pro is the model people are using to solve Erdos problems after all, not Claude or Gemini. The Thinking models are noticeably better if you have difficult problems to work on, which justifies my sub even if I use Claude for everything else. Their Codex models are also much smarter, but also less pleasant to use.


IME ChatGPT is pretty mid at search. Grok although significantly dumber, is really strong at diligently going through hundreds of search results, and is much more tuned to rely on search results instead of its internal knowledge (which depending on the case can be better or worse). It's the only situation where Grok is worth using IMO.

Gemini is really good with many topics. Vastly superior to ChatGPT for agronomy.

You should always use the best model for the job, not just stick to one.


I'd be friends with you. Wish you had contact info in your profile.

Maybe; it's opaque how it's calculated.

But you are keeping people on high alert, refueling further away, etc...


We better get a liberal democratic Iran government out of this.

We better remove and halt nuclear powers for the rest of my life.

I suppose pick either, and it was successful.

My personal polymarket says we won't get either. Trump and Israel ruin their reputations. But reputation matters close to 0 in international relations, which is why they don't care.


There's next to no chance that whatever comes out of the end of this will be a "liberal democratic Iran government". Obama started a route in that direction with the lowered sanctions and the Joint Comprehensive Plan of Action from 2015. Iran having a democratic government doesn't really help the GOP war hawks so of course they trashed it. The same happened with North Korea in the 90s with the Agreed Framework that had some promise before GWB torpedoed it to please his oinking base.

I also think that nuclear powers mean regional stability. Ukraine gave up its nukes in the 90s and we saw what happened there.


> We better get a liberal democratic Iran government out of this.

> We better remove and halt nuclear powers for the rest of my life.

Neither of those things is a guaranteed outcome of this. Depending on who you ask, it's not even a likely outcome.

The IRGC remains the most powerful group in Iran. Probably a military junta is a more likely outcome, plus or minus a civil war to establish it.


Unfortunately, I think "Theocratic Iran with the bomb" is on the "good" side of the distribution of potential outcomes here.

You're right. It is unfortunate that you think that.

> We better get a liberal democratic Iran government out of this.

I doubt it. US intervention seems to have a habit of creating weakened nations for its rivals to benefit from. In Iraq's case: Iran and in Iran's case maybe the Taliban in Afghanistan.


I'd be happy with the permanent removal of US bases from the Middle East.

The Middle East does not understand democracy. It will just be another strongman in power. The diaspora is pushing for a new shah.

>unified memory

This is just marketing speak. Stop repeating marketing. It isn't a walled garden, it's a walled prison.

Unified memory is just regular memory. There is nothing special about integrated GPUs.


Isn’t that what it’s called, though? PS4/PS5 and Xbox consoles all referred to it like that on the spec sheets.

>Time to first token measured with an 8K-token prompt using a 14-billion parameter model with 4-bit quantization

Oh dear 14B and 4-bit quant? There are going to be a lot of embarrassed programmers who need to explain to their engineering managers why their Macbook can't reasonably run LLMs like they said it could. (This already happened at my fortune 20 company lol)


I don’t really get why people are smack talking this, are there other laptops available that can do better?

I wonder if Apple has foresight into locally running LLMs becoming sufficiently useful.

It won’t handle serious tasks but I have Gemma 3 installed on my M2 Mac and it is good for most of my needs—-esp data I don’t want a corporation getting its hands on.

What kind of tasks are you using it for? I haven't really found any uses for small models.

I run Qwen 3.5 30B MOE and it’s reasonable at most tasks I would use a local model for - including summarizing things. For instance I auto update all my toolchains automatically in the background when I log in and when finished I use my local model to summarize everything updated and any errors or issues on the next prompt rendering. It’s quite nice b/c everything stay updated, I know whats been updated, and I am immediately aware of issues. I also use it for a variety of “auto correct” tasks, “give me the command for,” summarize the man page and explain X, and a bunch of tasks that I would rather not copy and paste etc.

Nothing like coding, just like relatively basic stuff. Idk its hard to explain but I use AI so frequently for work that I have a sense for what it is capable of.

They do! "You're holding it wrong."

Yeah no it didn’t. If you have a fully specced-out M3/M4 MacBook with enough memory, you’re running pretty decent models locally already. But no one is using local models anyway.

I run a local model on the daily. I have it making tickets when certain emails come in, and made a small button I can click to approve ticket creation. It follows my instructions and has a nice chain-of-thought process trained. Local LLMs are starting to become very useful. Not OpenClaw crap.

What vram you running to allow both a capable model to run and also everything else the device needs to run?

If your company can afford a fully specced-out M3/M4 MacBook, then it can also afford cloud AI costs.

> Yeah no it didn’t

What is "it" and what didn't it do?


With OpenClaw and powerful local models like Kimi 2.5, these specs make a lot of sense.

K2.5 isn't remotely a local model

Technically you can get most MoE models to execute locally because RAM requirements are limited to the active experts' activations (which are on the order of active param size), everything else can be either mmap'd in (the read-only params) or cheaply swapped out (the KV cache, which grows linearly per generated token and is usually small). But that gives you absolutely terrible performance because almost everything is being bottlenecked by storage transfer bandwidth. So good performance is really a matter of "how much more do you have than just that bare minimum?"
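To make that arithmetic concrete, here's a back-of-envelope sketch (the model shapes are hypothetical, loosely in the ~1T-total / ~32B-active class being discussed; real numbers depend on the checkpoint):

```python
def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight storage in GB at a given quantization.

    1e9 params * (bits / 8) bytes, divided by 1e9 bytes/GB,
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits / 8

# Hypothetical MoE: ~1T total params, ~32B active per token, 4-bit quant.
total_gb = weight_gb(1000, 4)   # all experts: what mmap must be able to page in
active_gb = weight_gb(32, 4)    # per-token hot set: the rough RAM floor
print(f"disk/mmap footprint: {total_gb:.0f} GB, hot-RAM floor: {active_gb:.0f} GB")
```

Anything between those two numbers is what's being traded against storage bandwidth, which is exactly the bottleneck described above.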

Oh sure it is! I’ve helped set up an AI cluster rack with four K2.5s.

With some custom tooling, we built our own local enterprise setup:

- Support ticketing system
- Custom chat support powered by our trained software-support model
- Resolved repository with detailed step-by-step instructions
- User-created reports and queries
- Natural language-driven report generation (my favorite — no more dragging filters into the builder; our (Secret) local model handles it for clients)
- In-application tools (C#/SQL/ASP.NET) to support users directly, since our software runs on-site and offline due to PPI
- A cool repair tool: an import/export “support file packet patcher” that lets us push fixes live to all clients or target niche cases

Qwen3 with LoRA fine-tuning is also incredible — we’re already seeing great results training our own models.

There’s a growing group pushing K2.5s to run on consumer PCs (with 32GB RAM + at least 9GB VRAM) — and it’s looking very promising. If this works, we’ll be retooling everything: our apps and in-house programs. Exciting times ahead!


of course it's not remotely local: remote and local are literally antonyms

You can totally run it locally. If you have 500GB of RAM.

For anyone who has been watching Apple since the iPod commercials, Apple really, really operates in a grey area with the honesty of its marketing.

And not even diehard Apple fanboys deny this.

I genuinely feel bad for people who fall for their marketing thinking they will run LLMs. Oh well, I got scammed on runescape as a child when someone said they could trim my armor... Everyone needs to learn.


Yesterday I ran qwen3.5:27b on an M1 Max with 64 GB of RAM. I have even run Llama 70B, back when llama.cpp came out. These run sufficiently well, if somewhat slowly, and with the improvements in the M5 Max it will be a much faster experience.

my mac mini m4 is getting to be a good substitute for claude for a lot of use cases. LM Studio + qwen3.5, tailscale, and an opencode CLI harness. It doesn't do well with super long context or complexity but it has gotten production quality code out for me this week (with some fairly detailed instructions/background).

I don't know that there would be a huge overlap between the people who would fall for this type of marketing and the people who want to run LLMs locally.

There definitely are some who fit into this category, but if they're buying the latest and greatest on a whim then they've likely got money to burn and you probably don't need to feel bad for them.

Reminds me of the saying: "A fool and his money are soon parted".


In retrospect, was there a better place to learn about the cruelty of the world than runescape? Must've got scammed thrice before I lost the youthful light in my eye

There used to be a polite way to call this out, the "Steve Jobs's reality distortion field".

Now that every CEO has their own reality distortion field I wonder if it's even worth calling out any more.

No current CEO has a RDF comparable to Jobs.

Musk is probably closest, but he’s become so involved in partisan politics it makes his field far less effective at distorting reality.


Musk is leading the build of the biggest objects we have ever sent to space. It does give him some sort of aura that is hard to dismantle, let's be honest.

He can do and say a lot of shit because he will still be viewed as real-life Iron Man, because in some ways he kind of is.


Most are not nearly as smooth and successful at the distorting.

Somehow Tim Cook's years-long position that the Lightning port was very important to Apple vs USB-C fell flat as a parsec-wide pancake.

(It didn't help that they couldn't point to a single user facing feature.)

Or that the App Store lock-in is for our safety. When anyone who wanted that particular safety could choose to continue using their store exclusively.

Etc.

He just does not have it. No field. No spiraling eyes. Perhaps he should grow a beard and wave around a tobacco pipe. Works for some.


I run local models on my M1 Max. there are a number of them that are quite useful.

Your first 2 points make me extra bitter about COVID.

Fewer store hours. Higher prices. Inflation. People in school got a terrible education and it affected my workforce. (But hey, 1% of people died, as predicted if we did nothing at all...)

It only reinforces the importance of competition over protectionism.

I used to be a walmart fan, but my local store is cheaper now. I didn't bother to look at prices until things were getting silly.


> (But hey 1% of people died, as predicted if we did nothing at all... )

You're at a football stadium with 100k people. A thousand of them die suddenly. Do you feel safe?

> Less store hours. Higher prices. Inflation.

At this point, that's just greed. They figured out what the market would bear.


> But hey 1% of people died, as predicted if we did nothing at all

Nope. Compare the death rates of Sweden vs its neighbours in the Nordics (the closest comparisons we have with similar weather/culture/etc.). Or if you don't care about minimising variables, in the US between states that did lockdowns and mask mandates and those that didn't. In every comparable (e.g. excluding rural vs urban) case, there were more deaths in "doing nothing" than implementing the same basic public health axioms that have held true for centuries.

> Inflation

That was also helped by Russia invading Ukraine, which increased global prices of multiple important raw materials. But yes, inflation after a period of deflation/economic contraction/restricted travel and consumption was to be expected.

> People in school got a terrible education and it affected my workforce

It's definitely a bigger issue for them than it is for you. And yeah, it sucks for them. Would have been pretty terrible to tell teachers (who overwhelmingly skew older) they should risk their lives just to keep kids occupied too.

> It only reinforces the importance of competition over protectionism.

What has that got to do with COVID?


The thing too many forget is that if we didn't flatten the curve our entire medical system was going to collapse. It's insane that people don't yet understand this concept and can't even empathize with medical professionals. Yes, we all struggled, but try talking to medical professionals to see how they did.

When something doesn't happen because enough measures were taken, then it wasn't worth it because it didn't happen?


> The thing too many forget is that if we didn't flatten the curve our entire medical system was going to collapse

Yep, if things were going well there wouldn't have been makeshift morgues with refrigerated trucks, sick people having to be moved around to different countries, the military deploying field hospitals, corpses piling in the streets. Those examples are from a variety of countries, which shows how bad the situation was globally.


> Compare the death rates of Sweden

As a New Zealander, I like to chuck out our achievement of a negative death rate. Covid lockdowns resulted in fewer New Zealanders dying than usual.

But, like elsewhere, economic and social harm were both high.


You had 6 weeks of staying at home, and then quarantines for international travellers after that. In return, you had no COVID-19 at all for several years. Seems a fair trade.

> negative death rate.

Norway had that too; without lockdown. Curfews would require a change in the constitution and the last time they happened was during WWII which makes them doubly unpopular.


Sweden all-cause mortality was indeed higher if an immediate pre-pandemic year is taken as a base. However, pre-pandemic years in Sweden show a substantial dip in all-cause mortality, something that neighboring countries did not see. It is not that simple.

I mean sure more people died than were necessary, but think of the shareholder value that was created!

I think it's just marketing, and the marketing is working. Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily)

I don't mind it, I own Apple stock. But I'm def not buying into their rebranding of integrated GPU under the guise of Unified Memory.


> Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily)

Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS, the only platform which can programmatically interface with iMessage and other Apple ecosystem stuff? It has nothing to do with the hardware really.

Still, buying a brand new Mac Mini for that purpose seems kind of pointless when a used M1 model would achieve the same thing.


It’s exactly that. They are buying the base model just for that. You are not going to do much local AI with those 16GB of RAM anyway; it could be useful for small things, but the main purpose of the Mini is being able to interact with the Apple apps and services.

16GB should be enough for TTS/Voice models running locally no ? I was thinking about having a home assistant setup like that where the voice is local and the brain is API based

I run ministral for my home knowledge database on 24G iMac and some other non-agentic LLM things.

Sure, that’s why I said maybe it’s useful for a few things. But the main reason people were recommending the Mini was for its price (base model) and having access to the Apple services for clawdbot to leverage. Not precisely for local AI.

No one is buying a base model Mac for local LLM. Everyone is forgetting that PC prices have drastically increased due to RAM and SSD. Meanwhile, Macs had no such price change… at least for the models that didn’t just drop today. Macs are just a good deal at the moment.

> Meanwhile, Macs had no such price change

Yeah because Mac upgrade prices were already sky high, long before the component shortage. 32GB of DDR5-6000 for a PC rocketed from $100 to $500, while the cost of adding 16GB to a Mac was and still is $400.


I'm kind of curious how Apple's supply contracts actually work, because it's currently more attractive to buy a Mac with a lot of RAM than it usually is, relative to a PC. So if it's "we negotiated a price and you give us as much RAM as we sell machines" the company supplying the RAM is getting soaked because they're having to supply even more RAM to Apple for a below-market price.

But if the contract was for a specific amount of RAM and then people start coming to Apple more for high RAM machines, they're going to exhaust their contract sooner than usual and run out of cheap memory to buy. Then they have to decide if they want to lower their margins or raise the already-high price up to nosebleed levels.


https://www.linkedin.com/pulse/memory-supply-chain-ai-disrup...

  Apple has accepted a 100% price increase for Samsung's LPDDR5X memory, with DRAM supply commitments secured only through the first half of 2026. Tim Cook acknowledged during the Q1 FY2026 earnings call that storage price increases would significantly impact Q2 gross margins.. Apple is evaluating ChangXin Memory Technologies (CXMT) and Yangtze Memory Technologies (YMTC) as new supply sources, attempting to rebuild pricing leverage through supply chain diversification.

the new models cost $200 more for each 8GB of RAM you add. Ouch...

That's been the case for years. Not new to the M5's

There are so few used Mac Minis around; those are all gone, and what's left is to buy new.

Worse than that, they hold their value, so buying a used M1 mini is still a few hundred bucks, and saving $200-300 by purchasing a 5 generation older mini seems like a bad deal in comparison.

Someone came to me excited that they got a "deal" on the newest Intel Mac Mini for hosting OpenClaw. 8GB model for $300. I kind of regret bursting their bubble by telling them you could walk over to Costco (the nearest one at the time of discussion was walking distance) and pay $550 for one with an M4 and 16GB of RAM.

Up until a week ago, the base m4 mini (16gb ram/256gb ssd) was $399 at microcenter, now $499. Pretty shocking how good of a value that is IMO.

Damn. Would be awesome to network a bunch over thunderbolt.

Just like with GPUs and Bitcoin they'll be a flood of old hardware on the market eventually.

Can't they simply run MacOS on a VM on existing Mac hardware?

You aren’t going to run a network connected 24/7 online agent from a laptop because it’s battery powered and portable.

Not if you want it to be able to use the hardware identifiers to register for use with iMessage.

I have it running in a macos VM using lume & BlueBubbles on a throwaway iCloud account. A lot of hoops to jump through, though

https://cua.ai/docs/lume
https://docs.openclaw.ai/channels/bluebubbles



> Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS

That's likely only part of the reason. The Mac Mini is now "cheap" because everything else exploded in price. RAM and SSD etc. have all gone up massively. Not to mention the Mac Mini is an easy out-of-the-box experience.


It's not cheap, though. Two weeks ago I bought a computer with a similar form factor (GMKtec G10). Worse CPU and GPU but same 16GB memory and a larger SSD for 40% the price of a base mac mini ($239 vs $599). It came with Windows preinstalled, but I immediately wiped that to install linux. Even a used (M-series) mac mini is substantially more expensive. It will cost me about an extra penny per day in electricity costs over a mac mini, but I won't be alive long enough for the mac mini to catch up on that metric.

I considered the mac mini at the time, but the mac mini only makes sense if you need the local processing power or the apple ecosystem integration. It's certainly not cheaper if you just need a small box to make API calls and do minimal local processing.


It's cheap for what you get.

If you just need "a small box to make API calls and do minimal local processing" you can also just buy an RPi for a fraction of the price of the GMKtec G10.

All 3 serve a different purpose; just because you can buy a slower machine for less doesn't mean the price:performance of the M1 Mac Mini changes.


> you can also just buy an RPi for a fraction of the price of the GMKtec G10.

Sadly not really. The Pi 5 8gb canakit starter set, which feels like a more true price since it's including power supply, MicroSD card, and case, is now $210. The pi5 8gb by itself is $135.

A 16gb pi5 kit, to match just the RAM capacity to say nothing of the difference in storage {size, speed, quality} and networking, is then also an eye watering $300


>Sadly not really. The Pi 5 8gb canakit starter set, which feels like a more true price since it's including power supply, MicroSD card, and case, is now $210. The pi5 8gb by itself is $135.

At that point buy a used macbook air m1.


>you can also just buy an RPi for a fraction of the price

lol. you need to look at rpi 5 prices again. they are insane.


If you need the CPU power in the Mac Mini then it is a pretty good price-to-performance ratio.

> It came with Windows preinstalled, but I immediately wiped that to install linux.

Do you really need Openclaw now? And not claude code + zapier or Claude code + cron?

That's the point. If you have a worse CPU and GPU, Windows will be sluggish (it's bloated).


Bro. The used M1 mini and studio are all gone. I was thinking of buying one for local AI before openclaw came out and went back to look and the order book is near empty. Swappa is cleared out. eBay is to the point that the m1 studio is selling for at least a thousand more.

This arb you’re talking about doesn’t exist. An m1 studio with 64 gb was $1300 prior to openclaw. You’re not getting that today.

I would have preferred that too since I could Asahi it later. It’s just not cheap any more. The m4 is flat $500 at microcenter.


yes, and it's funny that all these critical people don't know this

Why not? The integrated GPUs are quite powerful, and having access to 32+ GB of GPU memory is amazing. There's a reason people buy Macs for local LLM work. Nothing else on the market really beats it right now.

My M4 MacBook Pro for work just came a few weeks ago with 128 GB of RAM. Some simple voice customization started using 90GB. The unified memory value is there.

Jeff Geerling had a video of using 4 Mac Studios each with 512GB RAM connected by Thunderbolt. Each machine is around $10K so this isn't cheap but the performance is impressive.

https://www.youtube.com/watch?v=x4_RsUxRjKU


If 40k is the barrier to entry for impressive, that doesn't really sell the usecase of local LLMs very well.

For the same price in API calls, you could fund AI driven development across a small team for quite a long while.

Whether that remains the case once those models are no longer subsidized, TBD. But as of today the comparison isn't even close.


It’s what a small business might have paid for an onprem web server a couple of decades ago before clouds caught on. I figure if a legal or medical practice saw value in LLMs it wouldn’t be a big deal to shove 50k into a closet

You would still have to do some pretty outstanding volume before that makes sense over choosing the "Enterprise" plan from OpenAI or Anthropic if data retention is the motivation.

Assuming, of course, that your legal team signs off on their assurance not to train on or store your data with said Enterprise plans.


At least with the server you know what you are buying.

With Anthropic you're paying for "more tokens than the free plan" which has no meaning


Sure, but now double the team size. Double it again.

Suddenly that $40k is quite reasonable because you’ll never pay another dollar for at least 2-3 years.


With an M3 Max with 64GB of unified RAM you can code with a local LLM, so the bar is much lower.

But why? Spending several thousand dollars to run sub-par models when the break-even point could still be years away seems bizarre for any real usecase where your goal is productivity over novelty. Anyone who has used Codex or Opus can attest that the difference between those and a locally available model like Qwen or Codestral is night and day.

To be clear, I totally get the idea of running local LLMs for toy reasons. But in a business context the sell on a stack of Mac Pros seems misguided at best.


Sometimes you can't push your working data to third party service, by law, by contract, or by preference.

I started doing it to hedge myself against the inevitable disappearance of cheap inference.

I ran the qwen 3.5 35b a3b q4 model locally on a ryzen server with 64k context window and 5-8 tokens a second.

It is the first local model I've tried which could reason properly, similar to Gemini 2.5 or Sonnet 3.5. I gave it some tools to call and asked Claude to order it around (download quotes, print charts, set up a GNOME extension); even Claude was sort of impressed that it could get the job done.

Point is, it is really close. It isn't opus 4.5 yet, but very promising given the size. Local is definitely getting there and even without GPUs.

But you're right, I see no reason to spend right now.


Getting Opus to call something local sounds interesting, since that's more or less what it's doing with Sonnet anyway if you're using Claude Code. How are you getting it to call out to local models? Skills? Or paying the API costs and using Pi?

I just start llama.cpp serve with the gguf which creates an openai compatible endpoint.

The session so far is stored in a file like /tmp/s.json messages array. Claude reads that file, appends its response/query, sends it to the API and reads the response.

I simply wrapped this process in a python script and added tool calling as well. Tools run on the client side. If you have Claude, just paste this in :-)
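A minimal sketch of that wrapper in Python, assuming llama.cpp's default port and the /tmp/s.json session file described above (the HTTP call is a sketch of an OpenAI-compatible request; only the message plumbing is shown concretely):

```python
import json
import urllib.request

SESSION = "/tmp/s.json"  # messages array, as described above
ENDPOINT = "http://localhost:8080/v1/chat/completions"  # llama.cpp serve default

def load_messages(path: str = SESSION) -> list:
    """Read the session so far; start fresh if the file doesn't exist."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def append_message(messages: list, role: str, content: str) -> list:
    """Append one turn to the running messages array."""
    messages.append({"role": role, "content": content})
    return messages

def build_payload(messages: list, model: str = "local") -> dict:
    """OpenAI-compatible chat completion request body."""
    return {"model": model, "messages": messages}

def send(messages: list) -> str:
    """POST the conversation and return the assistant's reply (network sketch)."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(messages)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Client-side tool calling can be layered on top by inspecting each reply before appending it back into /tmp/s.json.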


It's not. I've got a single one of those 512GB machines and it's pretty damn impressive for a local model.

Assuming you ran the gamut up from what you could fit on 32 or 64GB previously, how noticeable is the difference between models you can run on that vs. the 512GB you have now?

I've been working my way up from a 3090 system and I've been surprised by how underwhelming even the finetunes are for complex coding tasks, once you've worked with Opus. Does it get better? As in, noticeably and not just "hallucinates a few minutes later than usual"?


I'm not really into AI and LLMs. I personally don't like anything they output. But the people I know who are into it and into running their own local setups are buying Studios and Minis for their at home local LLM set ups. Really, everyone I personally know who is doing their build your own with local LLMs are doing this. I don't know anyone anymore buying other computers and NVIDIA graphics cards for it.

The biggest problem with personal ML workflows on Mac right now is the software.

I'm curious to know what software you're referring to.


I think people buying those don't realize the requirements to run something as big as Opus; they think those gigabytes of memory on a Mac Studio/Mini are a lot, only to find out that it's "meh" in the context of LLMs. Plus most buy it as a gateway into the Apple ecosystem for their Claws, iMessage for example.

> But I'm def not buying into their rebranding of integrated GPU under the guise of Unified Memory.

But it is Unified Memory? Thanks to Intel iGPUs, the term has been tainted for a long time.


I've tried to use a local LLM on an M4 Pro machine and it's quite painful. Not surprised that people into LLMs would pay for tokens instead of trying to force their poor MacBooks to do it.

Local LLM inference is all about memory bandwidth, and an M4 pro only has about the same as a Strix Halo or DGX Spark. That's why the older ultras are popular with the local LLM crowd.
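The bandwidth point sets a rough ceiling on decode speed: each generated token has to stream the (active) weights through memory about once. A back-of-envelope sketch, using approximate published bandwidth figures:

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/sec if every token reads the weights once."""
    return bandwidth_gb_s / weights_gb

weights_gb = 14 * 4 / 8  # dense 14B model at 4-bit quant: ~7 GB of weights
for name, bw in [("M4 Pro, ~273 GB/s", 273), ("M1/M2 Ultra, ~800 GB/s", 800)]:
    print(f"{name}: <= {decode_ceiling_tok_s(bw, weights_gb):.0f} tok/s ceiling")
```

Real throughput lands well below the ceiling, but the ratio is why the older Ultras stay popular for this workload despite newer Pro chips.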

Qwen 3.5 35B-A3B and 27B have changed the game for me. I expect we'll see something comparable to Sonnet 4.6 running locally sometime this year.

Could be, but it likely won't be able to support the massive context window required for performance on par with sonnet 4.6

This would be an absolute game changer for me. I am dictating this text right now via a local model, and I think this is the way to go. I want to have everything locally. I'm not opposed to AI or LLMs in general, but I think that sending everything over the pond is a no-go. And even if it were European, I still wouldn't want to send everything to some data center. So I think this would be a good development, and I would even buy an Apple device for the first time since the iPod just for that.

I’m super happy with it for embedding, image recog, and semantic video segmentation tasks.

What are the other specs, and how does your setup look? You need a minimum of 24GB of RAM to run models of 16GB or less.

Tokens per second is abysmal no matter how much RAM you have.

Some models run worse than others but I have gotten reasonable performance on my M4 Pro with 24 GB of RAM

This is typically true.

And while it is stupid slow, you can run models off hard drive or swap space. You wouldn’t do it normally, but it can be done to check an answer in one model versus another.


48 GB MacBook Pro. All of the models I've tried have been slow and also offered terrible results.

Try software called TG Pro; it lets you override fan settings. Apple likes to let your Mac burn in an inferno before the fans kick in. It gives me more consistent throughput. I have less RAM than you and I can run some smaller models just fine, with reasonable performance. GPT20b was one.

Local LLMs are useful for stuff like tool calling

What models are you using? I’ve found that SOTA Claudes outperform even gpt-5.2 so hard on this that it’s cheaper to just use Sonnet because num output tokens to solve problem is so much lower that TCO is lower. I’m in SF where home power is 54¢/kWh.

Sonnet is so fast too. GPT-5.2 needs reasoning tuned up to get tool calling reliable and Qwen3 Coder Next wasn’t close. I haven’t tried Qwen3.5-A3B. Hearing rave reviews though.

If you’re using successfully some model knowing that alone is very helpful to me.

