There's weight-only FP8 support in vLLM on NVIDIA Ampere: https://docs.vllm.ai/en/latest/features/quantization/fp8.htm...
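
If it helps, a minimal sketch of what enabling it looks like (assuming the standard vLLM Python API; the model name is only an example):

    from vllm import LLM

    # quantization="fp8" requests FP8 quantization of the checkpoint at load time.
    # On Ampere, which lacks native FP8 compute, this ends up as weight-only FP8
    # (weights stored in FP8, compute in higher precision).
    llm = LLM(model="facebook/opt-125m", quantization="fp8")

    # Sanity check that generation still works after quantization.
    print(llm.generate("Hello, my name is"))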