You can leverage those big CPUs while still loading both GPUs with a 65B model.
... If you are feeling extra nice, you should set that up as an AI Horde worker whenever you run koboldcpp to play with models. It will serve API requests for others in the background whenever it's not crunching your own requests, and in return you get priority access to models other hosts are running: https://aihorde.net/
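Once koboldcpp is running, it also exposes a KoboldAI-compatible HTTP API on localhost (port 5001 by default), so you can script your own requests against it while the Horde work runs in the background. A minimal sketch, assuming the default port and the /api/v1/generate endpoint (check your version if the fields differ):

    import requests

    # Minimal sketch: request a completion from a locally running koboldcpp
    # instance. Assumes the default port (5001) and the KoboldAI-style
    # /api/v1/generate endpoint; adjust if your setup differs.
    resp = requests.post(
        "http://localhost:5001/api/v1/generate",
        json={
            "prompt": "Explain the AI Horde in one sentence.",
            "max_length": 80,
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["results"][0]["text"])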
If you're just looking to play with something locally for the first time, this is the simplest project I've found and has a simple web UI: https://github.com/cocktailpeanut/dalai
It works for 7B/13B/30B/65B LLaMA and Alpaca (a fine-tuned LLaMA, which definitely works better). At least the smaller models should run on pretty much any computer.
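For a rough sense of why the smaller ones run almost anywhere: with 4-bit quantization the weights take about half a byte per parameter, plus some working memory. A back-of-the-envelope sketch (the ~20% overhead factor is just an assumed allowance for context and runtime buffers):

    # Ballpark RAM needed for 4-bit quantized LLaMA models.
    # 0.5 bytes per parameter, plus an assumed ~20% overhead for context
    # and runtime buffers -- a rough estimate, using GB = 10^9 bytes.
    def approx_ram_gb(params_billion: float,
                      bytes_per_param: float = 0.5,
                      overhead: float = 1.2) -> float:
        return params_billion * bytes_per_param * overhead

    for size in (7, 13, 30, 65):
        print(f"{size}B: ~{approx_ram_gb(size):.1f} GB")
    # 7B: ~4.2 GB, 13B: ~7.8 GB, 30B: ~18.0 GB, 65B: ~39.0 GB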
May I ask why you have such an amazing machine, and two nice graphics cards? Feel free to tell me it's none of my business, it's just very interesting to me :-)
Career dev who had the cash and wanted to experiment with anything that can be done concurrently: my language of choice lately, which is built for high concurrency (https://elixir-lang.org/), these LLMs, and anything else that can run in a massively parallel fashion (which is, perhaps surprisingly, only a minority of possible computer work, but it still means I can run many apps without much slowdown!)
I originally had two 2080 Tis partly to experiment with virtio/Proxmox GPU passthrough (you need one for the host and one for any VM you run). I never got that working at the time, but then Proton got really good, which made it moot: I mainly just wanted to run Windows games fast in a VM. Later on I upgraded one of them to a 3080 Ti.
How can I play with open source LLMs locally?