
The bubble pops when Apple releases an iPhone that runs an LLM locally that's good enough for most things. At that point cloud hardware investments will plateau (unless some new GPU-melting use case comes out). Investors will move from Nvidia and AMD into Apple.




As a local LLM enthusiast, I can tell you that it's useless for most real work, even on desktop form factors. Phones catching up is even farther out.

Based on the recently released graph of how people are using ChatGPT, ~80% of use cases (practical guidance, seeking information, writing) could presumably run on a local model.

For video use cases, which will become increasingly popular, we are a long way away.

Wan runs on local GPUs and looks amazing.

Sora 2 takes a lot of visual shortcuts. The innovation is how it does the story planning, vocals, music, and lipsync.

We'll have that locally in 6 months.


What's the advantage of that, exactly? Why would you want something very compute-intensive to run on your phone instead of just using an API to data centers with great economies of scale?

My assumption is that most users won't actually care whether the LLM is in the cloud or on the device. That said, quite a few folks have iPhones, and Apple's only way into the AI race is to play to its strength: 1B+ hardware devices that they design the silicon for. They will produce a phone that runs a local LLM and market it as private and secure. People upgrade every couple of years (phones get lost or break), so this will drive adoption. I'm not saying people will vibe code on their iPhones.

Price, for one. I don't mind running a local model at half the speed if all it costs is electricity.

A local model basically allows me to experiment with running an agent 24x7, 365 days a year with continuous prompting.

SaaS won't be able to match that.
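
For the curious, a minimal sketch of what that kind of always-on loop can look like, assuming an Ollama server on its default local port; the model name, prompt, and interval are placeholders, not anyone's actual setup:

    # Continuous prompting against a local model; the only marginal cost is electricity.
    # Assumes Ollama is running locally on its default port (11434).
    import time
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"
    MODEL = "qwen3:32b"  # any model you've already pulled locally

    def ask(prompt: str) -> str:
        resp = requests.post(
            OLLAMA_URL,
            json={"model": MODEL, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    while True:
        # Placeholder task: the agent reviews notes and suggests next steps.
        print(ask("Summarize today's notes and list next steps."))
        time.sleep(60)  # pause between iterations; run around the clock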


Or just a Mac mini configured with 128GB or 256GB by default.

I've been using Qwen3:32b on a 32GB M1 (Asahi) and it does most of what I need. It's a bit slow, but not slow enough that I'd pay monthly for remote ad delivery.
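
Back-of-the-envelope on why a 32B model fits in 32GB of unified memory (assuming a ~4-bit quant; rough numbers, not measurements):

    params = 32e9              # Qwen3 32B parameter count
    bits_per_weight = 4.5      # typical 4-bit quantization plus overhead
    weights_gb = params * bits_per_weight / 8 / 1e9
    print(f"~{weights_gb:.0f} GB for weights")  # ~18 GB, leaving room for the KV cache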

I suspect this huge splurge of hardware spending is partially an attempt to starve the market of cheap RAM and thus limit companies releasing 128GB/256GB standalone LLM boxes.


Why do you think LLMs will get good enough that they can run locally, but the ones requiring Nvidia GPUs will not get better?

The models running on $50k GPUs will get better but the models running on commodity hardware will hit an inflection point where they are good enough for most use cases.

If I had to guess I would say that's probably 10 or 15 years away for desktop class hardware and longer for mobile (maybe another 10 years).

Maybe the frontier models of 2040 are being used for more advanced things like medical research and not generating CRUD apps or photos of kittens. That would mean that the average person is likely using the commodity models that are either free or extremely cheap to use.


OK, you can technically upload all your photos to Google's cloud for all the same semantic-labeling features as the iOS Photos app, but having local, always-available, and fast inference is arguably more useful and valuable to the end user.
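
A rough sketch of what that local labeling looks like, using an openly available CLIP checkpoint via Hugging Face transformers (the labels and image path are made up; Apple's actual on-device pipeline isn't public):

    # Zero-shot photo labeling that runs entirely on-device.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    labels = ["a dog", "a beach", "a birthday party", "a receipt"]
    image = Image.open("photo.jpg")  # placeholder path

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=1)[0]

    # Nothing leaves the device; inference and labels stay local.
    for label, p in zip(labels, probs.tolist()):
        print(f"{label}: {p:.2f}")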

The new iPhones barely got 12GB of RAM. The way Apple is going, iPhones will have enough RAM for LLMs in about 100 years.

Trying to compare RAM size and CPU cores is so yesterday. Apple owns the entire stack; they can make anything fit into their silicon if they so desire.

that's... some years... from now

What's the benefit to running LLMs locally? Data is already remote, and LLM inference isn't particularly constrained by Internet latency. So you get worse models, worse performance, and worse battery life. Local compute on a power-constrained mobile device is warranted for applications that require low latency or significant data throughput, and LLM inference is neither.

> What's the benefit to running LLMs locally?

At work:

That I don't rent $30,000 a month of PTUs from Microsoft. That I can put more restricted data classifications into it.

> LLM inferencing isn't particularly constrained by Internet latency

But user experience is


$30k a month is an enormous amount of tokens with Claude through AWS Bedrock. And companies already commonly trust AWS with their most sensitive data.

The data you need is mostly not remote. A friend works at a software development company; they can use LLMs, but only local ones (local as in their own datacenter), and those can only be trained on their code base. Customer-service LLMs need to be trained on in-house material, not generic Internet sources.

The general advantage is that you know you're not leaking information, because there's nowhere to leak it to. You know the exact input, because you provided it. You also get the benefit of on-device encryption; the data is no good to a datacenter if it's encrypted.


Local as in datacenter is the key there. The original comment was about end user devices.


