
The paper was definitely cool, but it doesn't let you run particularly large LLMs on iPhones. It lets you run a certain kind of LLM (sparse, ReLU-based LLMs) whose weights take up somewhat less than 2x the available RAM. So Falcon 7B works, but the competitive-with-gpt-3.5-turbo LLMs are still out of reach (and aren't ReLU-based, although maybe that could change in the future). And nothing is competitive with GPT-4 right now.
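
For a rough sense of that "less than 2x RAM" limit, here is a back-of-envelope sketch. It assumes 16-bit weights (2 bytes per parameter) and decimal gigabytes. The function names and the 8 GB device are hypothetical and chosen only for illustration; real limits also depend on the OS, the KV cache, and activation memory, none of which are modeled here.

```python
# Back-of-envelope check of the "weights under ~2x RAM" rule of thumb.
# Assumptions (not from the paper): 2 bytes per parameter, decimal GB,
# and a hypothetical device with 8 GB of RAM.

def weights_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def fits_with_flash_offload(
    n_params: float,
    ram_gb: float,
    bytes_per_param: float = 2.0,
    max_ratio: float = 2.0,
) -> bool:
    """True if the weights are smaller than max_ratio times the RAM."""
    return weights_gb(n_params, bytes_per_param) < max_ratio * ram_gb


if __name__ == "__main__":
    hypothetical_ram_gb = 8.0  # illustrative value, not a claim about any phone
    for name, n_params in [("7B model", 7e9), ("70B model", 70e9)]:
        size = weights_gb(n_params)
        ok = fits_with_flash_offload(n_params, hypothetical_ram_gb)
        print(f"{name}: {size:.0f} GB of weights, fits: {ok}")
```

Under those assumptions a 7B model needs about 14 GB for its weights, which is under the 16 GB ceiling for an 8 GB device, while a 70B model at about 140 GB is nowhere close.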

Of course, in the long run I think it will happen: smaller and more efficient models are getting better regularly, and Apple can also just ship new iPhones with more RAM. But I'd be very surprised if there were GPT-4 level intelligence running locally on an iPhone within the next couple of years. A model of that size is still so big, even with significant memory optimizations, and I think distilling it down to iPhone size would be very hard even if you had access to the weights (and Apple doesn't). More likely there will be small models that run locally but fall back to large models running on servers somewhere for complex tasks, at least for the next couple of years.

Yeah, but it's likely to be better than the current iteration of Siri even in that state.

They can still outsource to much larger LLMs on their servers for anything that can't be done locally, as they do now.

> And nothing is competitive with GPT-4 right now.

You mean nothing available? Or do you mean nothing that the public knows exists? The answers to those two questions are different. There are definitely upcoming products that aren't available yet, but that the public knows exist, that are in GPT-4's ballpark.

I mean nothing that is able to be benchmarked and validated by third parties is GPT-4 quality. I know there are upcoming releases that are hyped as being equal to GPT-4, e.g. Gemini Ultra, which I am very excited to get my hands on — but regardless, Ultra is not small enough to run on phones, even using the sparse ReLU flash memory optimization. And we'll see how it benchmarks once it's released; according to some benchmarks Gemini Pro has somewhat underperformed GPT-3.5-Turbo [1], despite Google's initial claims. (Although there are criticisms of that benchmarking, and it does beat the current 1106 version of GPT-3.5-Turbo on the Chatbot Arena leaderboard [2], although it slightly underperforms the previous 0613 version.)

1: https://arxiv.org/pdf/2312.11444.pdf

2: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

Easy to claim but harder to prove. Name one.

I heard rumours of these claims a few weeks ago, and I assume they are talking about the same thing. Nothing concrete, but it came from a reputable person, and honestly, with how well Mixtral performs on the Chatbot Arena Elo board, I wouldn't be surprised if it's true.

