Performance is good enough for non-reasoning models even if they're FP8 or FP4. Check the Phoronix article: the difference between the 3090 and 4090 is rather small.

There's weight-only FP8 in vLLM on NVIDIA Ampere: https://docs.vllm.ai/en/latest/features/quantization/fp8.htm...
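To put rough numbers on why weight-only FP8 (or FP4) matters on a 24 GB card like the 3090 or 4090, here's a back-of-the-envelope sketch. The 8B parameter count is just a hypothetical example, and `weight_gib` is a helper I made up for illustration. It only counts the weights and ignores KV cache, activations, and quantization scales, so treat the results as a lower bound.

```python
def weight_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Hypothetical 8B-parameter model (an assumption, not from the article).
N_PARAMS = 8e9

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{weight_gib(N_PARAMS, bits):.1f} GiB of weights")
```

Halving the bits per weight halves the footprint, which is the whole appeal of weight-only quantization on cards without native FP8 compute.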