Dunno. It could also just mean so-called "quantization-aware training", where your weights, activations and gradients stay in bf16 and only get quantized to int8 right before use, the same way you would at inference.
That gives you the same "no mismatch between training and inference" property, and it was a standard technique back in the vision days (~2018).
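Roughly what that looks like in PyTorch, as a sketch: I'm assuming symmetric per-tensor int8 and a straight-through estimator, and names like fake_quant_int8 / QATLinear are just illustrative, not from any particular library.

```python
import torch

def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int8 "fake quantization": quantize then dequantize,
    # so the forward pass sees the same rounding error it would see at
    # inference time, while the stored values stay in high precision.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127)
    x_q = q * scale
    # Straight-through estimator: gradients flow as if rounding were identity.
    return x + (x_q - x).detach()

class QATLinear(torch.nn.Linear):
    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # Weights and activations are kept in bf16/fp32, but both go through
        # the fake-quant op before the matmul, matching the int8 inference path.
        return torch.nn.functional.linear(
            fake_quant_int8(input), fake_quant_int8(self.weight), self.bias
        )
```

Since the forward pass already includes the quantization error, there's nothing new for the model to adapt to when you actually deploy it in int8.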