Use llamafile [1]. It can be as simple as downloading a file (for Mixtral, [2]), making it executable, and running it. The repo README has all the info; it's simple, and downloading the model is what takes the most time.
In my case I hit the runtime detection issue (explained in the README's "gotcha" section). I solved it by running "assimilate" [3] on the downloaded llamafile.
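For anyone who would rather script the "make it executable and run it" steps than type them, here is a rough Python sketch. The file name in the usage comment is a hypothetical placeholder, not the real download name, and the sketch assumes a Unix-like system where the llamafile starts when launched with no arguments. It does not cover the "assimilate" workaround.

```python
import stat
import subprocess
from pathlib import Path


def make_executable(path: Path) -> int:
    """Add the execute bits for user, group and others (roughly `chmod +x`).

    Returns the resulting permission bits.
    """
    current = path.stat().st_mode
    path.chmod(current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(path.stat().st_mode)


def run_llamafile(path: Path) -> subprocess.Popen:
    """Mark the downloaded file executable and start it as a child process.

    Assumption: the file is launched with no extra arguments.
    """
    make_executable(path)
    return subprocess.Popen([str(path.resolve())])


# Usage (hypothetical file name, replace with whatever you downloaded):
#   proc = run_llamafile(Path("model.llamafile"))
#   proc.wait()
```

Keeping the chmod step in its own function means you can check permissions without launching a multi-gigabyte model.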
Either https://lmstudio.ai (a desktop app with a nice GUI) or https://ollama.com (a command-line tool, more like a Docker container, that you can also hook up to a web UI via https://openwebui.com) should be super straightforward to get running.
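If you go the ollama route, you can also talk to it from a script once it is running, without any web UI. The sketch below assumes ollama's local HTTP API on its default port (11434) and a model named "mistral" that you have already pulled; the model name and the helper names `build_request` and `generate` are illustrative, not something from the thread.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "mistral",
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Usage (only works while the server is up and the model is pulled):
#   print(generate("Explain what a llamafile is in one sentence."))
```

Setting `stream` to false keeps the example short: the server returns one JSON object instead of a stream of partial chunks.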
I am the author of Msty [1]. My goal is to make it as straightforward as possible with just one click (once you download the app). If you try it, let me know what you think.
Looks great. Can you recommend what GPU to get to just play with the models for a bit? (I want it to run fast, otherwise I lose interest too quickly.)
Are consumer GPUs like the RTX 4080 Super sufficient, or do I need anything else?
Why is this both free and closed source? Ideally, when you advertise privacy-first, I’d like to see a GitHub link with real source code. Or I’d rather pay for it to ensure you have a financial incentive to not sell my data.
There’s incredible competition in this space already - I’d highly recommend outright stating your future pricing plans, instead of a bait-and-switch later.
Check out PrivateGPT on GitHub. It pretty much just works out of the box. I got Mistral 7B running on a GTX 970 in about 30 minutes flat, first try. Yep, that's the triple-digit GTX 970.