As OP says, it shines in constrained environments where the model is transforming user-owned data. Definitely less useful for anything more open-ended.
Yeah, I don't recommend treating Chrome's Prompt API as a good example of local LLMs. It's fine and stuff, but it's really weak. 8B models from a year ago are better in some ways, and a lot of the recent model drops are meaningfully better.
It's based on a Gemma 3n model, and yeah it's not the best. But if you have a use case that needs constrained JSON output for example, it's pretty neat.
Maybe it would do better with the new Gemma 4 models, which the Chrome devs have been hinting at moving to. And why the API doesn't let you introspect / pick the model, I'm still not sure.
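For the constrained-JSON use case, here is a minimal sketch. It assumes the global `LanguageModel` object, its `availability()` method, and the `responseConstraint` option to `prompt()` as described in Chrome's Prompt API docs. The schema and `summarizeReview` are hypothetical examples. Outside Chrome, or with the flags off, it just returns `null`.

```typescript
// Sketch: structured output from Chrome's on-device model.
// Assumes the LanguageModel global and prompt(..., { responseConstraint }).
// The schema and function name are hypothetical.

type ReviewSummary = { sentiment: "positive" | "negative" | "neutral"; topics: string[] };

// JSON Schema the model's reply is constrained to.
const reviewSchema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
    topics: { type: "array", items: { type: "string" } },
  },
  required: ["sentiment", "topics"],
};

async function summarizeReview(review: string): Promise<ReviewSummary | null> {
  // Feature-detect: no LanguageModel global means no Prompt API here.
  const LM = (globalThis as any).LanguageModel;
  if (!LM) return null;
  if ((await LM.availability()) === "unavailable") return null;

  const session = await LM.create();
  try {
    // Output is constrained to reviewSchema, so it should parse as JSON.
    const raw: string = await session.prompt(
      `Summarize this product review:\n${review}`,
      { responseConstraint: reviewSchema },
    );
    return JSON.parse(raw) as ReviewSummary;
  } finally {
    session.destroy(); // release the model session
  }
}
```

The model only fills in a fixed shape from text the user already has, which is the "transforming user-owned data" niche where a small model is good enough.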
> I've got some demos of what the new Prompt API can do:
> Use surrounding context to rewrite your ad copy:
Yup, that's the plan. No local model, no webpage; more, better and cheaper adtech extortion/surveillance for vendors while everyone else pays for the juice and hardware degradation.
So you're running an LLM to do data transformation that deterministic processes would be much better suited for, and running a 1,000-watt power supply to do so. Wild.
Chromium mostly does not support this, because it doesn't have the binary blob required to run the inference. However, it does still download the model weights and expose the LanguageModel API, because that part is hooked up.
Packagers might eventually disable that, but I tested this behaviour in Chromium 148 a few hours ago: it downloaded the weights but had trouble running them.
Chromium doesn't support this API because it needs a binary blob to run the inference, although in theory it may still be configured to download the weights:
Author here. After trying out the Prompt API over the last week, I wrote up some details on the Chromium internals and how to use the API, and made some toy demos.
It's a 4 GB model that can be used to run on-device inference.
If Chrome has the #optimization-guide-on-device-model and #prompt-api-for-gemini-nano flags enabled (for example because it's part of some Origin Trial / Early Stable Release), web pages get access to the new Prompt API, which lets any webpage initiate the one-time download of the ~2.7 GiB CPU model or the ~4.0 GiB GPU model by calling LanguageModel.create().
When Chrome 148 releases tomorrow, this will be the default behaviour on desktop.
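A sketch of how a page might check the model's status before triggering that download. It assumes the `availability()` states and the `create({ monitor })` / `downloadprogress` hook shown in Chrome's Prompt API docs; `getModelStatus` and `createWithProgress` are hypothetical helper names.

```typescript
// Sketch: check model status, then create a session while reporting
// download progress. Assumes Chrome's global LanguageModel with
// availability() and create({ monitor }); helper names are hypothetical.

type ModelAvailability = "unavailable" | "downloadable" | "downloading" | "available";

async function getModelStatus(): Promise<ModelAvailability> {
  const LM = (globalThis as any).LanguageModel;
  // No global at all: not Chrome, or the flags are disabled.
  if (typeof LM === "undefined") return "unavailable";
  return (await LM.availability()) as ModelAvailability;
}

async function createWithProgress(onProgress: (fraction: number) => void): Promise<any> {
  if ((await getModelStatus()) === "unavailable") return null;
  const LM = (globalThis as any).LanguageModel;
  // When the status is "downloadable", this call is what starts the one-time download.
  return LM.create({
    monitor(m: any) {
      m.addEventListener("downloadprogress", (e: any) => {
        onProgress(e.loaded); // reported as a fraction between 0 and 1
      });
    },
  });
}

// Example: log progress as a percentage.
createWithProgress((f) => console.log(`Downloaded ${Math.round(f * 100)}%`)).then((session) => {
  if (session === null) console.log("Prompt API not available here");
});
```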
Before downloading, it should check for 22 GiB of free disk space on the volume holding your Chrome data dir, and for free space of at least double the model size in your tmp dir.
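The two free-space conditions, written out as a sketch. This restates the rule as the comment describes it; it is not Chrome's actual check, and `canDownloadModel` is a hypothetical name.

```typescript
const GiB = 1024 ** 3;
const PROFILE_VOLUME_MIN_FREE = 22 * GiB; // free space wanted on the Chrome data dir's volume

// Hypothetical restatement of the described rule, not Chrome's real code.
function canDownloadModel(profileFreeBytes: number, tmpFreeBytes: number, modelBytes: number): boolean {
  const profileOk = profileFreeBytes >= PROFILE_VOLUME_MIN_FREE;
  const tmpOk = tmpFreeBytes >= 2 * modelBytes; // at least double the model size in tmp
  return profileOk && tmpOk;
}

// ~4.0 GiB GPU model, 30 GiB free on the profile volume, 6 GiB free in tmp:
console.log(canDownloadModel(30 * GiB, 6 * GiB, 4 * GiB)); // false: tmp would need 8 GiB
```

So a machine can pass the 22 GiB profile-volume check and still be refused because of a small tmp partition.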
First the tabs came for the RAM and i did not protest, for i had plenty.
Then they came for the chip and i did not protest, for it was dark silicon anyway.
Then they came for the HDD.
The more severe problem is that Google installs model weight files on a per-user basis, meaning Chrome occupies 4 more GB of space for every OS user on your device.
The company I work at has several environments and hundreds of VDI users in each environment. Chrome is the default browser in all of them. By my rough napkin math, this one small change by Google will eat up at least 15 terabytes of new disk space in total. (I sure hope we are using deduplication at the physical storage layer...)
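The napkin math, spelled out. The environment and user counts below are hypothetical stand-ins for "several" and "hundreds", chosen to land near the 15 TB figure, and it assumes every user profile ends up with its own copy. Without deduplication, storage scales with the number of OS users rather than the number of machines.

```typescript
// Napkin math for per-user model copies on shared VDI infrastructure.
// All counts are hypothetical inputs, not figures from the comment.
function totalFootprintTB(environments: number, usersPerEnv: number, modelGB: number): number {
  const copies = environments * usersPerEnv; // one copy per OS user profile
  return (copies * modelGB) / 1000; // GB -> TB, decimal units
}

// 5 environments x 750 users x 4 GB
console.log(totalFootprintTB(5, 750, 4)); // 15
```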
Serious question: why do you use Google Chrome in the first place, when there are better alternatives, e.g. Brave (the crypto stuff can easily be disabled) or Vivaldi, both with built-in ad blockers?
4GB, $0.10 (whatever the HD price) that is the equivalent of a High School level intelligent brain that can perform many cognitive tasks (and in the future even PhD level intelligence) for free?
Oh, the horror!!!
Wait, let me pay my HVAC guy the $500 he deserved for coming all the way from his home to replace a fuse.
It doesn't make sense to apply wholesale prices for mass storage. People are running Chrome on specific devices that they already own. Storage is not fungible in this way.
As the saying goes, gp didn't pay $500 to have the fuse replaced, he paid $500 for the training and experience that was required to know that the fuse had to be replaced.
> 4GB, $0.10 (whatever the HD price) that is the equivalent of a High School level intelligent brain that can perform many cognitive tasks for free?
How is this better than my current solution, an actual human with master's-degree intelligence performing all my cognitive tasks for free? I mean, I'm the first to admit I'm extremely lazy, and even I'm over here like "really??"
The problem is that some of us are still on connections that charge per GB in rural areas. Here in Montana it's very common to pay about $0.25 per GB regardless of how much you use, so this is a $1 additional cost per desktop device. Places like public school districts have hundreds of computers and this will be somewhat significant for them.
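The per-device figure, plus a hypothetical 300-machine school district, assuming the $0.25/GB rate and the roughly 4 GB download from the comment:

```typescript
// Cost of pulling the model over a per-GB metered link.
function downloadCostUSD(modelGB: number, pricePerGB: number, devices = 1): number {
  return modelGB * pricePerGB * devices;
}

console.log(downloadCostUSD(4, 0.25));      // 1: one desktop
console.log(downloadCostUSD(4, 0.25, 300)); // 300: hypothetical 300-machine district
```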
I was thinking a similar thing. Many of our customers have purpose-built computers that rarely see a fixed-line internet connection but need a modern browser (many chose Chrome on their own; we never recommended it).
They're going to get blasted with cellular data charges when they fire up their computer in the field.
Google's updater service also currently ignores the Windows 11 metered connection hint. It will gladly download that model over your cell connection even if you have a data cap.
This is infuriating behavior.
Silicon Valley must wake up and understand the entire world does not live like them.
They live in a bubble, and not a lot of the surrounding world gets through to them. I know it is hyperbolic, but I lived there for a while and I stand by that opinion.
That said, you might be surprised to learn that some of the models from 3b-9b could probably replace 80% of the things nonvibe coders use chatgpt for.
It's a good idea to run small models locally, if your computer can host them, for privacy and cost-saving reasons. But how can you trust Google to autoinstall one on your machine in 2026? I just couldn't do it.
Sure, local models good and yes, there's no way we can trust Google.
We can be positive the entire motivation of Chrome is user behavior surveillance. There's not a nano-chance in all the multiverses that Chrome model is doing anything privately. They've gone to extraordinary lengths to accomplish this. It's not for free.
It is entirely about user surveillance as well as pushing their product on to their users because they have the install base. Google Chrome has become Microsoft IE6 in hostile user behavior.
A claim about as useful then as it is now. They never wanted to be anything but, once Sergey left. The Schmidt era had them publicly declare one thing while doing something else entirely behind the curtain.
I don't trust them either, but the same Google makes Gemma 4 available to run as locally and privately as you want, and those models are pretty amazing for their size.
Ooh, this is interesting. There's nothing stopping them from sending jobs down to local machines. That's some 3 billion nodes. We went through this with coin mining and spam botting.
Nothing stopping it except your ire if it's discovered.
> But how can you trust Google to autoinstall one on your machine
Why are AI models something I'd be uniquely unable to trust Google to install, compared to all the other code included in Chrome updates? Is your point just that you shouldn't trust Chrome in general?
Yes, I would not trust Google or Chrome. They have a history of class action lawsuits for doing shady things to users. Enabling them to condense data on your machine and transmit it however they want, should they choose to, is suspect to me.
Yeah, so it's unclear why yet again everyone is so quickly running for the pitchforks & torches. The model doesn't do anything, it's just a sandbox.
I'm really tired of such overinflated, ridiculous shrillness against Google. Yes, there are very real tensions with this company, and their ad business is scary as heck.
But folks don't seem capable of processing duality, don't seem able to do much but ad hominem until they pass out. It's really so exhausting having such empty energy charging in every single time, and it keeps obstructing any ability to think straight or assess.
I was waiting for Google to pull a local LLM onto Chrome/Android devices. It opens up some revenue streams that weren't easily possible before: for example the often memed "I was talking about cigars with my wife one single time and now all I see are adsense ads for cigars" gets much easier with a local model doing speech to text and topic classification.
The point is that what you're "sick of" isn't actually authentic human thought; in reality you're responding to a recent European-driven propaganda campaign with the goal of deriding anything and everything related to US tech.
Everyone who implemented or approved this should be prosecuted under the Computer Fraud and Abuse Act (18 U.S.C. § 1030). If I was on a jury, I wouldn't hesitate to send them to prison where they belong.
A fair and impartial jury is a fundamental part of freedom. I genuinely cannot believe that we have been reduced to wanting to destroy the jury system to punish companies we don’t agree with. At this point, this is less activism and more weaponized disrespect for fundamental freedoms.
I am amused when people fret about not using Chrome. I get it but… I have literally NEVER used Chrome. Perhaps I just don’t know what I am missing but the web seems to work just fine for me without it?
> That said, you might be surprised to learn that some of the models from 3b-9b could probably replace 80% of the things nonvibe coders use chatgpt for.
Really? I'm a total amateur when it comes to doing anything with local models, but I've tried a few in this range using ollama at this point, and they didn't seem to know much about anything. I couldn't figure out how to get them to search the web or run other tools, so that was where the experiment ended.
A small local model that can use bash would be a bit of a game-changer for me.
Local models are improving quickly so if you keep an eye open you’ll find something soon enough. But from experience, I’ll warn you that local models can lose the plot very quickly. Their little self arguments when they get stuck usually come down to:
- It failed? This must be a mistake, I’ll try it again. It failed? This must be a mistake, I’ll try it again because then I will complete the task (repeat about every six seconds until you rescue it).
- You know, the best way to deal with a permissions problem is to erase the entire system. That’ll definitely solve those pesky permissions and I’ll complete the task.
The latest small models are now reliable enough at simple tools like web search I think. It's just afaik none of the user friendly harnesses like ollama or LMStudio have a real one-click setup flow for this. You'll need to download models and do a fair bit of tool configuration.
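For a sense of the "tool configuration" part, here is a sketch of a request to Ollama's `/api/chat` endpoint that declares one tool, plus a helper that reads tool calls out of the reply. The `web_search` tool is hypothetical: the model only emits a request to call it, and your own harness has to run the search and send the result back as a `role: "tool"` message. Field names follow Ollama's API docs as I understand them, so check them against your version.

```typescript
// Sketch of an Ollama /api/chat request that declares one tool, plus a helper
// to read tool calls out of a non-streaming response. The web_search tool is
// hypothetical; the model only asks for it, your code must actually run it.

type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

const webSearchTool = {
  type: "function",
  function: {
    name: "web_search", // hypothetical tool name
    description: "Search the web and return the top results as plain text.",
    parameters: {
      type: "object",
      properties: { query: { type: "string", description: "What to search for" } },
      required: ["query"],
    },
  },
};

function buildChatRequest(model: string, messages: ChatMessage[]) {
  // stream: false so the reply arrives as a single JSON object.
  return { model, messages, tools: [webSearchTool], stream: false };
}

type ToolCall = { name: string; args: Record<string, unknown> };

function extractToolCalls(response: any): ToolCall[] {
  const calls: any[] = response?.message?.tool_calls ?? [];
  return calls.map((c) => ({ name: c.function.name, args: c.function.arguments }));
}
```

You would POST the body as JSON to `http://localhost:11434/api/chat` (Ollama's default port), run each extracted call yourself, append the results, and ask again. That loop is the part the one-click harnesses don't set up for you.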
I find models of this size (I haven't tested this one specifically) to be very good at simple data extraction from user input. Think of things like parsing the date and time of an event from a description, or parsing a human-typed description of a repeating-event rule.
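A sketch of the checking half of that workflow: validating the JSON a model returns for an event before trusting it. The `ParsedEvent` shape and `validateEvent` are hypothetical; the idea is that the model's output stays narrow enough for ordinary code to verify.

```typescript
type Freq = "daily" | "weekly" | "monthly";

interface ParsedEvent {
  title: string;
  start: string; // local date-time, "YYYY-MM-DDTHH:MM"
  recurrence: { freq: Freq; interval: number } | null;
}

const FREQS: string[] = ["daily", "weekly", "monthly"];

// Returns the event if the model's JSON has the expected shape, else null.
function validateEvent(raw: string): ParsedEvent | null {
  let obj: any;
  try {
    obj = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (obj === null || typeof obj !== "object" || typeof obj.title !== "string") return null;
  // Require a fixed format, then let Date.parse reject values it can't make sense of.
  if (typeof obj.start !== "string" || !/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}$/.test(obj.start)) return null;
  if (isNaN(Date.parse(obj.start))) return null;
  const r = obj.recurrence;
  if (r !== null) {
    // A missing recurrence field is rejected too; one-off events must send null.
    if (typeof r !== "object" || FREQS.indexOf(r.freq) === -1) return null;
    if (typeof r.interval !== "number" || Math.floor(r.interval) !== r.interval || r.interval < 1) return null;
  }
  return obj as ParsedEvent;
}
```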
this is considered a large model. i think you might be surprised how many "small" models chrome has already pulled down on your disk.
but to answer your question: one of the services that uses a small model: PermissionsAIv4
"""
Use the Permission Predictions Service and the AIv4 model to surface permission notification requests using a quieter UI when the likelihood of the user granting the permission is predicted to be low. Requires `Make Searches and Browsing Better` to be enabled. – Mac, Windows, Linux, ChromeOS, Android
"""
I'm on my phone now so I can't check if something has changed, but what you want to protect from change is the directory, not the files. A file can be deleted and created again if the process can write the directory.
Searching about:flags for model comes up with a whole bunch:
#omnibox-ml-url-scoring-model
#omnibox-on-device-tail-suggestions
#optimization-guide-on-device-model
#text-safety-classifier
#prompt-api-for-gemini-nano
#writer-api-for-gemini-nano
#rewriter-api-for-gemini-nano
#proofreader-api-for-gemini-nano
#summarizer-api-for-gemini-nano
#on-device-model-litert-lm-backend
Then, related to Gemini but not caught by the search for "model":
#skills (maybe? I think this is implied by "gemini in chrome"?)
edit: I don't see a carte blanche AI disabling option. As much as I dislike Mozilla's growing obsession with AI, at least they give me a top-level option to disable all AI stuff. I only keep Chrome around for occasional testing.
So my understanding is that the download happens only when sites call the Prompt API, right?
Because my Chrome stable has been updated to v148 now, and I don't see any AI models in my user profile folder. My profile size is only 328 MB, with the Code Cache subfolder occupying the most space (135 MB).
Next step: Invoke the prompt API from within online ads and run a "p2p" AI inference provider which forwards incoming LLM queries to website visitors. :-)
I believe webpages that use the API must ask the user for access to the Prompt API via a system permissions dialog, according to the docs from a few months ago.
Depends on where you get it. By default the flags will be enabled, but some packagers may choose to disable them. I haven't seen a major distro release chromium 148 yet.
Weirdly though, chromium won't be able to actually use the model even though it can download it, because the inference engine is a closed-source blob.
Cmd+Shift+V - Stacked clipboard, you can start typing to search or hit a number to choose what to paste (keeps everything you've copied/cut inside jetbrains for a while)
Cmd+Shift+E - Recent locations, you can start typing to search - shows little buffers of where you've been recently
Cmd+Shift+A - Action tab of the command palette - fuzzy search for any command (really the only shortcut you need, other than maybe Shift+Shift for main command palette shortcut)
--- Through the Action bar...
Local History / Local History of Selection - you can start typing to search quite far back the history of all changes of the current file or selection - you can also right click a folder or the project and do the same. Much finer grained than git.
The general concept of being able to search for something and edit directly in the buffer of the search results.
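The stacked clipboard described above is essentially a bounded, most-recent-first list with search. A minimal sketch, not JetBrains' implementation; the capacity of 20 and the move-to-top-on-recopy behaviour are assumptions of this sketch.

```typescript
class ClipboardStack {
  private items: string[] = [];
  private readonly capacity: number;

  constructor(capacity = 20) {
    this.capacity = capacity;
  }

  // Newest first; re-copying an existing entry moves it to the top.
  push(text: string): void {
    this.items = [text, ...this.items.filter((t) => t !== text)].slice(0, this.capacity);
  }

  // 1-based, like pressing a number in the popup.
  pick(n: number): string | undefined {
    return this.items[n - 1];
  }

  // Case-insensitive substring filter standing in for the IDE's fuzzy search.
  search(query: string): string[] {
    const q = query.toLowerCase();
    return this.items.filter((t) => t.toLowerCase().indexOf(q) !== -1);
  }
}
```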
Hero! I had not done my homework/have not been aware, but these all look fantastic! The stacked clipboard is something I periodically mentally complain about (Why is clipboard on every OS/tool I've used single item?)
I'll add a few that are possibly better known:
- ctrl + shift + F: Find text in any file
- ctrl + N: Find types (structs, classes etc)
- ctrl + shift + N: Find any file by name or path
Windows has had a pretty usable stacking clipboard for a while! You just have to activate it. Since you can pin things into it, it's also quite useful as a rough and ready way to type special characters you use frequently.
Mathematica is the earliest thing I am aware of with this feature where it was Alt+. to expand selection in their notebook interface starting in the early 90s. But the thing I miss most that I still can't shake the muscle memory of after almost a decade of not using much Mathematica, is that single/double/triple/n-click scaled this way as well. So double-click selected a whole word (as in all editors), triple-click selected all the comma-separated multiple args of a function, 4-click for f(a,r,g,s), and so on.
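The expand-selection idea boils down to "grow to the smallest syntax node that strictly contains the current selection". A sketch with hand-written node ranges for `f(a,r,g,s)`; a real editor would get the ranges from its parser, and `expandSelection` is a hypothetical name.

```typescript
type Span = { start: number; end: number }; // half-open [start, end)

function expandSelection(sel: Span, nodes: Span[]): Span {
  let best: Span | null = null;
  for (const n of nodes) {
    const contains = n.start <= sel.start && n.end >= sel.end;
    const larger = n.end - n.start > sel.end - sel.start;
    // Keep the smallest node that contains the selection and is strictly bigger.
    if (contains && larger && (best === null || n.end - n.start < best.end - best.start)) {
      best = n;
    }
  }
  return best ?? sel; // nothing bigger left: selection stays as it is
}

const src = "f(a,r,g,s)";
// Hand-written node ranges; a real editor would take these from its syntax tree.
const nodes: Span[] = [
  { start: 0, end: 10 }, // f(a,r,g,s)  the whole call
  { start: 2, end: 9 },  // a,r,g,s     the argument list
  { start: 2, end: 3 },  // a
  { start: 4, end: 5 },  // r
  { start: 6, end: 7 },  // g
  { start: 8, end: 9 },  // s
];

let sel: Span = { start: 4, end: 4 }; // caret just before "r"
for (let i = 0; i < 3; i++) {
  sel = expandSelection(sel, nodes);
  console.log(src.slice(sel.start, sel.end)); // r, then a,r,g,s, then f(a,r,g,s)
}
```

A Mathematica-style n-click could be modelled the same way, as n-1 repeated expansions starting from the clicked word.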
I work on AST based revision control. I have a stack of ideas on how to achieve the same Ctrl+W effect with commits/diffs/cherry-picks. All still in flux. If you have some thoughts to share, please do.
I use it constantly in helix too. The vscode one is meh. I think I saw a discussion in github once about switching to tree-sitter, which would improve AST-related actions. I don't think it went anywhere though.
I love AST-aware editing. I think it's one reason it's always been so nice to edit Lisps. Stuff that is complicated to describe in JavaScript (and doesn't have LSP support) pretty much requires a whole AST parser, but in Lisp it's just a simple list operation. When I go back to TypeScript after a weekend of Clojure, I reeaally miss slurp! and other paredit commands.
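To make "just a simple list operation" concrete, here is a sketch of paredit-style forward slurp on nested arrays: the element right after the inner list gets pulled inside it, so `(a (b c) d)` becomes `(a (b c d))`. The `path` argument (indices from the root to the inner list) is this sketch's stand-in for the cursor position; it is not how paredit itself is implemented.

```typescript
type Sexp = string | Sexp[];

function slurpForward(root: Sexp[], path: number[]): Sexp[] {
  const copy: Sexp[] = JSON.parse(JSON.stringify(root)); // leave the input untouched
  if (path.length === 0) return copy; // the root list has no following sibling
  let parent: Sexp[] = copy;
  for (const i of path.slice(0, -1)) parent = parent[i] as Sexp[];
  const idx = path[path.length - 1];
  const inner = parent[idx];
  if (!Array.isArray(inner) || idx + 1 >= parent.length) return copy; // nothing to slurp
  const [next] = parent.splice(idx + 1, 1); // remove the following sibling...
  inner.push(next); // ...and append it to the inner list
  return copy;
}

// Print a tree back in Lisp syntax.
function show(s: Sexp): string {
  return Array.isArray(s) ? `(${s.map(show).join(" ")})` : s;
}

console.log(show(slurpForward(["a", ["b", "c"], "d"], [1]))); // (a (b c d))
```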
I tried it, but it just was too clumsy. Sometimes refactoring/editing needs to go through phases where the AST is invalid, and MPS makes that just too clumsy.
To me, it feels like Zed and VsCode perform most operations in a general way on the text; they don't seem to (in Python and Rust at least) have an understanding of the code structure in the way JB does. (And based on some digging on Ki the way it does as well?) So, I would bet they are using that text-based model, which would be hit/miss here.
I had this issue too, so I remapped Ctrl-W/Shift-Ctrl-W to Ctrl-\/Shift-Ctrl-\ .
(Also git operations became two-key sequences, starting with Ctrl-G and that damn Ctrl-K stopped being the shortcut for commit.)
To keep on-top of tabs in Firefox, I use 'Auto Tab Discard' [1] to discard tabs after a certain amount of inactivity. Then when I need to clean up my list of tabs, I click on any discarded tabs I want to keep, and then use my extension 'Close Discarded Tabs' [2] to clear the rest.
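A sketch of what a "close discarded tabs" action can look like, assuming the WebExtensions `tabs.query()` filter with `discarded` and `pinned`, and `tabs.remove()` accepting an array of tab IDs. This is not the code of the extension in [2]; the tabs API is passed in as a parameter so the logic can be exercised with a stub.

```typescript
interface TabsApi {
  query(filter: { discarded?: boolean; pinned?: boolean }): Promise<{ id?: number }[]>;
  remove(tabIds: number[]): Promise<void>;
}

async function closeDiscardedTabs(tabs: TabsApi): Promise<number> {
  // Leaving pinned tabs alone is a choice of this sketch, not of the original extension.
  const discarded = await tabs.query({ discarded: true, pinned: false });
  const ids = discarded
    .map((t) => t.id)
    .filter((id: number | undefined): id is number => id !== undefined);
  if (ids.length > 0) await tabs.remove(ids);
  return ids.length;
}

// In a real extension background script this would be called as
// closeDiscardedTabs(browser.tabs), e.g. from a toolbar button handler.
```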