Hacker News

AFAIK you want a model that will fit within the GPU's 24 GB of VRAM while leaving a couple of gigabytes free for context. Once you start spilling into system RAM on a PC, you're smoked: it'll still run, but you'll hate your life.
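A quick back-of-the-envelope sketch of that sizing logic (the bits-per-weight figure and the 2 GB context reserve below are illustrative assumptions, not measured numbers):

```python
# Rough VRAM budget check for a quantized local model.
# Assumption: ~4.5 effective bits/weight for a typical 4-bit quant,
# and ~2 GB reserved for KV cache / context and runtime overhead.

def model_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a quantized model."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

weights_gb = model_vram_gb(13, 4.5)   # a 13B model at ~4.5 bits/weight
budget_gb = 24 - 2                    # 24 GB card minus context headroom
print(f"weights ~{weights_gb:.1f} GB, fits: {weights_gb <= budget_gb}")
```

By this estimate a 4-bit 13B model takes roughly 7 GB of weights, leaving plenty of room on a 24 GB card; the same arithmetic shows why a 70B model at 4 bits (~37 GB) spills into system RAM.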

Have you ever run a local LLM at all? If not, getting one running well is still a little annoying. I would start here:

https://www.reddit.com/r/LocalLLaMA/


