The bubble pops when Apple releases an iPhone that runs a good-enough-for-most-things LLM locally. At that point cloud hardware investment will plateau (unless some new GPU-melting use case comes along), and investors will move from Nvidia and AMD into Apple.
Based on the recently released graph of how people are using ChatGPT, ~80% of use cases (practical guidance, seeking information, writing) could presumably run on a local model.
What's the advantage of that, exactly? Why would you want something very compute-intensive to run on your phone instead of just calling an API backed by data centers with great economies of scale?
My assumption is that most users won't actually care whether the LLM is in the cloud or on the device. That said, quite a few people have iPhones, and Apple's only way into the AI race is to play to its strength: a billion-plus hardware devices that it designs the silicon for. It will produce a phone that runs a local LLM and market it as private and secure. People upgrade every couple of years (phones get lost or break), so this will drive adoption. I'm not saying people will vibe code on their iPhones.
I've been using Qwen3:32b on a 32GB M1 (Asahi) and it does most of what I need. It's a bit slow, but not slow enough that I'd pay monthly for remote ad delivery.
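For context, that setup is just a local HTTP call; a minimal sketch assuming an Ollama server on its default port with qwen3:32b already pulled (prompt is only an example):

```python
# Query a locally running Ollama server (default port 11434).
# Assumes `ollama serve` is running and `ollama pull qwen3:32b` has been done.
import json
import urllib.request

def ask_local(prompt: str, model: str = "qwen3:32b") -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local("Summarize the tradeoffs of local vs cloud inference."))
```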
I suspect this huge splurge on hardware is partly an attempt to starve the market of cheap RAM and thus limit companies from releasing 128GB/256GB standalone LLM boxes.
The models running on $50k GPUs will get better, but the models running on commodity hardware will hit an inflection point where they are good enough for most use cases.
If I had to guess, I'd say that's probably 10 or 15 years away for desktop-class hardware and longer for mobile (maybe another 10 years after that).
Maybe the frontier models of 2040 will be used for more advanced things like medical research rather than generating CRUD apps or photos of kittens. In that case, the average person will likely be using commodity models that are either free or extremely cheap.
OK, you can technically upload all your photos to Google's cloud and get the same semantic labeling features as the iOS Photos app, but having local, always-available, and fast inference is arguably more useful and valuable to the end user.
What's the benefit of running LLMs locally? The data is already remote, and LLM inference isn't particularly constrained by Internet latency, so you just get worse models, worse performance, and worse battery life. Local compute on a power-constrained mobile device is needed for applications that require low latency or significant data throughput, and LLM inference is neither.
30k in a month is an enormous amount of tokens with Claude through AWS Bedrock. And companies already commonly trust AWS with their most sensitive data.
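For anyone curious, Claude through Bedrock is just a normal AWS SDK call; a minimal sketch with boto3, assuming credentials are configured and model access has been enabled in the account (the model ID below is illustrative):

```python
# Call a Claude model via the AWS Bedrock Converse API using boto3.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Review this function for bugs: ..."}]}],
    inferenceConfig={"maxTokens": 1024},
)
print(response["output"]["message"]["content"][0]["text"])
```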
The data you need is mostly not remote. A friend works at a software development company; they can use LLMs, but only local ones (local as in their own datacenter), and those can only be trained on their code base. Customer service LLMs need to be trained on in-house material, not generic Internet sources.
The general advantage is that you know you're not leaking information, because there's nowhere to leak it to. You know the exact input, because you provided it. You also get the benefit of on-device encryption: the data is no good to anyone in the datacenter if it's encrypted.
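A minimal sketch of that on-device encryption idea, using Fernet from the `cryptography` package; the key never leaves the device, so whatever lands in the datacenter is opaque (names here are illustrative):

```python
# Encrypt data on-device before it ever leaves; only ciphertext is stored remotely.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # kept locally, e.g. in the device keychain
f = Fernet(key)

document = b"internal design notes fed to the local model"
ciphertext = f.encrypt(document)     # this is all the datacenter ever sees

# Later, on-device only:
assert f.decrypt(ciphertext) == document
```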