The bubble pops when Apple releases an iPhone that runs a good-enough-for-most-things LLM locally. At that point cloud hardware investment will plateau (unless some new GPU-melting use case comes along), and investors will move from Nvidia and AMD into Apple.
Based on the recently released graph of how people are using ChatGPT, ~80% of use cases (practical guidance, seeking information, writing) could presumably run on a local model.
What's the advantage of that, exactly? Why would you want something very compute-intensive to run on your phone instead of just calling an API backed by data centers with great economies of scale?
My assumption is that most users won't actually care whether the LLM is in the cloud or on the device. That said, quite a few people have iPhones, and Apple's only way into the AI race is to play to its strength: a billion-plus hardware devices that it designs the silicon for. It will produce a phone that runs a local LLM and market it as private and secure. People upgrade every couple of years (phones get lost or break), so this will drive adoption. I'm not saying people will vibe code on their iPhones.
I've been using Qwen3:32b on a 32GB M1 (Asahi) and it does most of what I need. It's a bit slow, but not slow enough that I'd pay monthly for remote ad delivery.
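For context, that setup is just a local HTTP call; a minimal sketch assuming an Ollama server on its default port with qwen3:32b already pulled (prompt is only an example):

```python
# Query a locally running Ollama server (default port 11434).
# Assumes `ollama serve` is running and `ollama pull qwen3:32b` has been done.
import json
import urllib.request

def ask_local(prompt: str, model: str = "qwen3:32b") -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_local("Summarize the tradeoffs of local vs cloud inference."))
```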
I suspect this huge splurge on hardware is partly an attempt to starve the market of cheap RAM and thus limit companies from releasing 128GB/256GB standalone LLM boxes.
The models running on $50k GPUs will get better, but the models running on commodity hardware will hit an inflection point where they are good enough for most use cases.
If I had to guess, I'd say that's probably 10 or 15 years away for desktop-class hardware and longer for mobile (maybe another 10 years after that).
Maybe the frontier models of 2040 will be used for more advanced things like medical research rather than generating CRUD apps or photos of kittens. In that case, the average person will likely be using commodity models that are either free or extremely cheap.
OK, you can technically upload all your photos to Google's cloud and get the same semantic labeling features as the iOS Photos app, but having local, always-available, and fast inference is arguably more useful and valuable to the end user.
What's the benefit of running LLMs locally? The data is already remote, and LLM inference isn't particularly constrained by Internet latency, so you just get worse models, worse performance, and worse battery life. Local compute on a power-constrained mobile device is needed for applications that require low latency or significant data throughput, and LLM inference is neither.
30k in a month is an enormous amount of tokens with Claude through AWS Bedrock. And companies already commonly trust AWS with their most sensitive data.
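For anyone curious, Claude through Bedrock is just a normal AWS SDK call; a minimal sketch with boto3, assuming credentials are configured and model access has been enabled in the account (the model ID below is illustrative):

```python
# Call a Claude model via the AWS Bedrock Converse API using boto3.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Review this function for bugs: ..."}]}],
    inferenceConfig={"maxTokens": 1024},
)
print(response["output"]["message"]["content"][0]["text"])
```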
The data you need is mostly not remote. A friend works at a software development company; they can use LLMs, but only local ones (local as in their own datacenter), and those can only be trained on their code base. Customer service LLMs need to be trained on in-house material, not generic Internet sources.
The general advantage is that you know you're not leaking information, because there's nowhere to leak it to. You know the exact input, because you provided it. You also get the benefit of on-device encryption: the data is no good to anyone in the datacenter if it's encrypted.
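A minimal sketch of that on-device encryption idea, using Fernet from the `cryptography` package; the key never leaves the device, so whatever lands in the datacenter is opaque (names here are illustrative):

```python
# Encrypt data on-device before it ever leaves; only ciphertext is stored remotely.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # kept locally, e.g. in the device keychain
f = Fernet(key)

document = b"internal design notes fed to the local model"
ciphertext = f.encrypt(document)     # this is all the datacenter ever sees

# Later, on-device only:
assert f.decrypt(ciphertext) == document
```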