NixOS just got tabbyml[1], which is built on llama-cpp. I'm working on systemd services this weekend and updating to the latest tabbyml release, which supports ROCm in addition to CUDA.
One thing about LLMs is that they are 6GB+ (and much larger for "smart" ones) just sitting in the background. They suck power and produce heat like nothing else, and they are finicky, especially at smaller sizes.
Running one as a background desktop assistant is a whole different animal than calling a Microsoft API.
At least with a GPU that can do power save that's not the case. I have a box with some 3090s in it; each card idles at <50W when it's not doing inference, even with the weights loaded into VRAM. Only when I ask it to do inference does it spin up and start consuming 300-400W.
> One thing about LLMs is that they are 6GB+ (and much larger for "smart" ones) just sitting in the background. They suck power and produce heat like nothing else, and
Huh? That's not at all true. It only uses processing power (CPU) while it actually generates text; otherwise it sits and waits. Yes, it occupies memory (RAM or VRAM) if you don't unload it, but you can configure it to start up when you need it and shut down when you don't.
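For what it's worth, the "start up when you need it, shut down when you don't" part can be sketched in a few lines of Python. This is only an illustration, not how any particular runtime does it: `LazyModel`, `loader` and `idle_seconds` are made-up names, and `loader` stands in for whatever actually loads the weights.

```python
import time

class LazyModel:
    """Load a model on first use and drop it after a period of inactivity."""

    def __init__(self, loader, idle_seconds=300.0, clock=time.monotonic):
        self._loader = loader              # hypothetical callable returning the loaded model
        self._idle_seconds = idle_seconds  # how long to keep an unused model in memory
        self._clock = clock                # injectable so the timeout can be tested
        self._model = None
        self._last_used = None

    @property
    def loaded(self):
        return self._model is not None

    def get(self):
        """Return the model, loading it first if it is not in memory."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self):
        """Call periodically (e.g. from a timer); returns True if it unloaded."""
        if self._model is not None and self._clock() - self._last_used >= self._idle_seconds:
            self._model = None  # dropping the reference lets the memory be freed
            return True
        return False
```

The trade-off is the obvious one: the first request after an unload pays the full load time again.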
If one uses llama.cpp in CPU mode, the models are mmapped, so they do not really occupy memory when not in use (they just sit in reclaimable page cache).
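A rough Python illustration of the mmap idea (this is not llama.cpp's loader; the file name and the 16 MiB size are invented for the demo): mapping a file read-only does not copy it into the process, pages are pulled in on first access and live in page cache that the kernel can reclaim.

```python
import mmap
import os
import tempfile

def map_readonly(path):
    """Map a whole file read-only; pages are loaded lazily on first access."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mapping stays valid after the
        # file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Stand-in for a model file (hypothetical name): a 4-byte marker plus 16 MiB of zeros.
tmp_dir = tempfile.mkdtemp()
model_path = os.path.join(tmp_dir, "fake-model.gguf")
with open(model_path, "wb") as f:
    f.write(b"GGUF")
    f.write(b"\0" * (16 * 1024 * 1024))

weights = map_readonly(model_path)
magic = weights[:4]   # touching these bytes faults in only the pages needed
size = len(weights)   # the full file is addressable without having been read
weights.close()
```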
Does anyone actually use the CPU for anything besides testing? Last time I tried it, it was so horribly slow compared to the GPU that there wasn't really any point in using it for me, besides getting access to more memory.
On a Mac Studio with NixOS based Asahi Linux and 128GB of RAM, mixtral 8x7b uses 49GB of RAM. At the same time I load Airflow tasks that deal with worldwide datasets (using ~60GB on 16 parallel streams with the performance cores); the format is Parquet, also mmapped.
Computer still has 8 efficiency cores and the whole GPU for visualizing the maps using lonboard / browsing / etc.
The computer uses 8-10W when idle, ~100W when running jobs or actively using the LLM, and around 200W when really using the GPU.
This makes it very efficient energy-wise in my book compared to keeping a beast of a modern CPU and nvidia GPU on when idle. My electricity bill is unaffected.
./mixtral-8x7b-instruct-v0.1.Q8_0.llamafile --cli -t 16 -n 200 -p "In terms of Lasso"
I got 15 tokens per second for prompt evaluation and 8 tokens per second for regular eval.
The same hardware can run things much faster on OSX, or if you use more quantization, but I prefer to run things at Q8 or f16 even if they are slow. In the future I hope to use the GPU, ANE and the crazy 1.58 or 0.68 bit quantization, but for now this does the trick handsomely.
Some of us like to experiment with new technology but don't physically own the kind of hardware that is ideal for it. So yes, I've actually gotten passable results running on CPU (on a 2019 laptop at that).
True, but as engineers we should sometimes sacrifice and live in the future a bit to maximize opportunity; in the future, hardware will adapt to software, as it always has.
I'd love this to be true, and it might be for some specific well tested situations with a narrow set of data that you can be confident about. But that's a bit wishful, isn't it?
That is an example of a good, narrow task area that a small model could be good at, with current tech, which differs from general AI assistants like GPT-4. Using mixture of experts with task specific fine-tuning, I can see it being possible, but I was mainly saying Phi 2 ain't it. It may be a good starting place! Also, a code completion model could totally end up easily installed in a major Linux distro's default package manager soon, if not already.
None. Microsoft has Copilot in preview mode in Windows and it's not very integrated apart from a chat window. I doubt GNOME/KDE will be able to dedicate enough resources to adding an assistant that is well integrated with the desktop environment any time soon.
A search in Fedora yields a single GSoC project[0] limited in scope to NetworkManager and it's not clear if anyone actually is working on that.
If the use case you're interested in is actually having the LLM doing things for you in SaaS applications, that wouldn't need deep integration but, considering Google is yet to deliver a Google Drive client for Linux, I wouldn't hold my breath waiting for a native Linux AI-assisted assistant.
Your best option right now is to interface with the assistants through their web interface and hope they have plugins/extensions to interact with things you want.
Other than that, some people have built prototypes running LLMs locally that talk to things like Home Assistant. But again, no deep desktop integration.
Given that one can control damned near everything from the command line in Linux, and the command line is a much more stable interface than a GUI, I'd guess that there's a great deal more potential for assistants on Linux than on Windows.
The other day I wanted to figure out how to turn my dock red if I dropped the vpn in gnome. I found the file that controlled my wireguard gnome shell extension and with the help of gpt3.5 and some very rudimentary js knowledge (I'm a backend dev, don't hate me), I was able to add a js function to toggle the color on vpn up / down events. This didn't even take me an hour to do and I'd never even thought to try it before GPT.
Sure, things are janky now, but the future potential of LLMs with linux and OSS is huge.
Recently came across https://www.warp.dev/ on HN, which includes an AI part for your terminal. That's a paid feature that doesn't run locally, but it's a start I suppose.
"I wouldn't hold my breath waiting for a native Linux AI-assisted assistant"
A simple chat window and an automated script to install an existing small model should be doable, but that doesn't sound very exciting to me.
But mid term, having a locally run LLM integrated into the OS that scans my files and can summarize folders for me would be nice. I have big folders with mixed stuff; an AI would be nice for sorting that. I do believe some people are working on something like this, but the bulk of it is not OS specific. And not OSS.
The idea is to have a personal AI that is trained on my content, my files. My emails, my pictures, projects, notes, etc. How that can best be implemented is not my expertise, but I believe it is the subject of heavy research.
Not OP, but when searching for files, spelling something wrong, or using the wrong synonym is a big problem. We're just used to computers being inflexible.
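The misspelling half of that doesn't strictly need an LLM. As a minimal sketch, Python's standard `difflib` can already do approximate name matching (`fuzzy_find` and the example names in the note below are invented for illustration):

```python
import difflib

def fuzzy_find(query, names, limit=3, cutoff=0.6):
    """Return names that approximately match query, best match first."""
    # Compare case-insensitively but return the names as originally spelled.
    lowered = {}
    for name in names:
        lowered.setdefault(name.lower(), name)
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[h] for h in hits]
```

With names like `University_Projects`, `tax_returns_2019` and `holiday_photos`, a query such as `univercity_projcts` should rank the first one on top. The wrong-synonym case is the harder part: string similarity can't help there, since it knows nothing about meaning.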
It sounds like you want... folders. Genuinely. Or a tag system. Or some other metadata.
Like, take this query for example: show me the folder(s), where my old University projects are stored. How would an AI, however powerful, know what are "university projects" if they aren't tagged as such? And if they were, why is the AI necessary?
One approach I've tried before is: if you have a folder /projects/ with so many project folders in it that you don't even know what is what anymore, you just create a text file called /projects/index.txt and write the name of each folder in there and what it's for, so you don't forget later.
Well, but they ain't tagged or sorted. And an AI could know, because of the context. If it knew what years I studied and what courses I took, etc., then this information would be enough to separate content. But I am aware that this tech ain't there yet.
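The index.txt habit is easy to bootstrap with a small script. A sketch only: `write_index`, the `descriptions` dict and the `TODO` placeholder are my own choices for illustration, and note that it overwrites any existing index.txt.

```python
from pathlib import Path

def write_index(projects_dir, descriptions=None):
    """Write index.txt listing every subfolder with a one-line description."""
    root = Path(projects_dir)
    descriptions = descriptions or {}
    lines = []
    for entry in sorted(p for p in root.iterdir() if p.is_dir()):
        # Folders without a known description get a placeholder to fill in by hand.
        note = descriptions.get(entry.name, "TODO: describe")
        lines.append(f"{entry.name}: {note}")
    index_path = root / "index.txt"
    index_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return index_path
```

Something like `write_index("/projects", {"thesis": "university final project"})` would then leave one line per folder to edit later.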
> I wouldn't hold my breath waiting for a native Linux AI-assisted assistant.
On Mac when I press Command + Space, it brings up Spotlight search
Couldn't an equivalent, with some kind of LLM prompt, easily be added on GNOME/KDE/XFCE?
I don't quite know what you'd ask it/do with it that would be of much value? Seems like a quicker way/a wrapper around either asking an LLM questions via CLI or basically Electron wrapping HTML (like this https://github.com/lencx/ChatGPT)?
Unrelated, but is there something like Bonzi Buddy for linux? Not the spyware part, just the friendly looking clippy-esque character that can tell you about your new e-mails, weather, or whatever? I kind of wish I had something like that.
Why? They take lots of space and lots of computing power. Linux has always been about lightweight and a bundle only containing essential things. You can always install one if you need it but as it stands right now LLMs are not useful enough to warrant their bundling in a distro. Just my 2 cents
> Linux has always been about lightweight and a bundle only containing essential things.
I really don't think that's true. There have always been distros that are based on being tiny, of course, but I think most of the normal distros are concerned with hitting a happy medium of size and features. Otherwise I can't imagine why anything would be shipping GNOME or KDE over LXDE, or why libreoffice would be installed by default. So the question is more where LLMs are on cost/benefit... which granted, may not be there yet, but I could easily see it turning into a checkbox at install time - "this machine has 16+GB of RAM; add SomeLLM?"
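The install-time checkbox could be gated on something as simple as the following. This is purely a sketch of the idea, not anything an installer actually does: `should_offer_llm` and the threshold are made up, and the `sysconf` names are the ones available on Linux.

```python
import os

GIB = 1024 ** 3

def total_ram_bytes():
    """Physical RAM in bytes via sysconf (works on Linux; other OSes may differ)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

def should_offer_llm(ram_bytes, min_gib=16):
    """Decide whether to show the optional 'add SomeLLM?' checkbox."""
    return ram_bytes >= min_gib * GIB
```

An installer would call `should_offer_llm(total_ram_bytes())`. One caveat: the reported figure can come out slightly under the nominal size of the installed memory, so a real check would probably want a threshold a bit below 16.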
You can't really compare the utility of an LLM against libreoffice or similar. There is no comparison. Libreoffice is something that you would definitely use, unlike an LLM.
Wouldn't it be ironic if chatgpt assistant on linux gives linux end-user desktop dominance? lol... Microsoft would be like, "you fools, you've doomed us all"
Edit: And Arch packages ollama officially - https://archlinux.org/packages/?sort=&q=llama&maintainer=&fl... - and a few things in the AUR - https://aur.archlinux.org/packages?O=0&K=llama