threeducks, 7 hours ago, on: Llama.cpp AI Performance with the GeForce RTX 5090...

Performance for text generation is limited by memory bandwidth, so the lack of native fp8 support does not matter. You have more than enough compute left over to do the math in whichever floating-point format you fancy.
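A back-of-the-envelope roofline estimate makes the argument concrete. This is a minimal sketch under stated assumptions: a hypothetical 8B-parameter dense model with 1-byte (fp8) weights, batch size 1, and round GPU figures of about 1.8 TB/s memory bandwidth and 100 TFLOPS FP32. These figures are illustrative, not quoted specs. The sketch ignores KV-cache traffic, activations, and kernel overheads, and it does not reflect how llama.cpp actually implements its kernels. `extra_flops_per_weight` is a hypothetical stand-in for the cost of converting each fp8 weight to a wider format in software.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    # Illustrative round numbers (assumptions), not measured or official specs.
    mem_bandwidth_bytes_per_s: float
    fp32_flops_per_s: float


def decode_step_times(gpu: Gpu, n_params: float, bytes_per_weight: float,
                      extra_flops_per_weight: float = 0.0) -> tuple[float, float]:
    """Roofline-style estimate for generating one token at batch size 1.

    Each weight is read from memory once per token (matrix-vector products)
    and contributes one multiply-add (2 FLOPs), plus any extra operations
    spent converting it, e.g. a software fp8 -> fp32 upcast (hypothetical cost).
    Returns (time spent streaming weights, time spent on arithmetic) in seconds.
    """
    bytes_read = n_params * bytes_per_weight
    flops = n_params * (2.0 + extra_flops_per_weight)
    t_memory = bytes_read / gpu.mem_bandwidth_bytes_per_s
    t_compute = flops / gpu.fp32_flops_per_s
    return t_memory, t_compute


gpu = Gpu(mem_bandwidth_bytes_per_s=1.8e12, fp32_flops_per_s=100e12)
# Hypothetical 8B-parameter model stored in fp8, charging 4 extra ops per
# weight for the conversion to fp32.
t_mem, t_cmp = decode_step_times(gpu, n_params=8e9, bytes_per_weight=1.0,
                                 extra_flops_per_weight=4.0)
print(f"memory: {t_mem * 1e3:.2f} ms, compute: {t_cmp * 1e3:.2f} ms, "
      f"ratio: {t_mem / t_cmp:.1f}x")
```

Under these assumptions, streaming the weights takes roughly nine times longer than the arithmetic, even after charging extra operations per weight for the conversion, so the conversion cost hides behind memory traffic. The gap shrinks at larger batch sizes, where each weight read is reused across many tokens.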