- I wouldn't use anything larger than a 7B model if you want decent speed.
- Quantize to 4-bit to save RAM and speed up inference.
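As a rough illustration of why 4-bit helps, here is the back-of-the-envelope memory math for the weights alone (a sketch that ignores activation and KV-cache overhead; the ~4.5 bits/weight figure is an assumption reflecting that 4-bit quantization formats store extra per-block scale metadata):

```python
def model_ram_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate RAM needed to hold just the weights, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

params = 7e9  # a 7B-parameter model

fp16 = model_ram_gib(params, 16)    # full 16-bit weights
q4 = model_ram_gib(params, 4.5)     # ~4.5 bits/weight incl. quantization scales

print(f"fp16: {fp16:.1f} GiB, 4-bit: {q4:.1f} GiB")
# roughly 13 GiB vs under 4 GiB
```

So a quantized 7B model fits comfortably in 8 GB of RAM, where the fp16 version would not.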