You can leverage those big CPUs while still loading both GPUs with a 65B model.
... If you are feeling extra nice, you should set that up as an AI Horde worker whenever you run koboldcpp to play with models. It will serve API requests for others in the background whenever it's not crunching your own requests, and in return you get priority access to models other hosts are running: https://aihorde.net/
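Once koboldcpp is running, it also exposes a KoboldAI-compatible HTTP API on localhost (port 5001 by default), so you can script your own requests against it while the Horde work runs in the background. A minimal sketch, assuming the default port and the /api/v1/generate endpoint (check your version if the fields differ):

    import requests

    # Minimal sketch: request a completion from a locally running koboldcpp
    # instance. Assumes the default port (5001) and the KoboldAI-style
    # /api/v1/generate endpoint; adjust if your setup differs.
    resp = requests.post(
        "http://localhost:5001/api/v1/generate",
        json={
            "prompt": "Explain the AI Horde in one sentence.",
            "max_length": 80,
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["results"][0]["text"])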
If you're just looking to play with something locally for the first time, this is the simplest project I've found and has a simple web UI: https://github.com/cocktailpeanut/dalai
It works for 7B/13B/30B/65B LLaMA and Alpaca (a fine-tuned LLaMA, which definitely works better). At least the smaller models should run on pretty much any computer.
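For a rough sense of why the smaller ones run almost anywhere: with 4-bit quantization the weights take about half a byte per parameter, plus some working memory. A back-of-the-envelope sketch (the ~20% overhead factor is just an assumed allowance for context and runtime buffers):

    # Ballpark RAM needed for 4-bit quantized LLaMA models.
    # 0.5 bytes per parameter, plus an assumed ~20% overhead for context
    # and runtime buffers -- a rough estimate, using GB = 10^9 bytes.
    def approx_ram_gb(params_billion: float,
                      bytes_per_param: float = 0.5,
                      overhead: float = 1.2) -> float:
        return params_billion * bytes_per_param * overhead

    for size in (7, 13, 30, 65):
        print(f"{size}B: ~{approx_ram_gb(size):.1f} GB")
    # 7B: ~4.2 GB, 13B: ~7.8 GB, 30B: ~18.0 GB, 65B: ~39.0 GB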
May I ask why you have such an amazing machine, and two nice graphics cards? Feel free to tell me it's none of my business, it's just very interesting to me :-)
Career dev who had the cash and wanted to experiment with anything that can be done concurrently: my language of choice lately, which is built for high concurrency (https://elixir-lang.org/), these LLMs, and anything else that can run in a massively parallel fashion (which is, perhaps surprisingly, only a minority of possible computer work, but it still means I can run many apps without much slowdown!)
I originally had two 2080 Tis partly to experiment with virtio/Proxmox GPU passthrough (you need one for the host and one for any VM you run). I never got that working at the time, but then Proton got really good, which made it moot: I mainly just wanted to run Windows games fast in a VM. Later on I upgraded one of them to a 3080 Ti.
How can I play with open source LLMs locally?