
Dunno. It could also just mean so-called "quantization-aware training" (QAT), where your weights, activations, and gradients are still bf16, and just before use they get quantized to int8 the same way you'd do it during inference.

This gives you the same "no mismatch between training and inference" property, and it was a standard technique back in the vision days (~2018). A rough sketch of the idea follows.
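A minimal sketch of that fake-quantization step, assuming PyTorch (the helper name fake_quant_int8 is mine, not from any library): quantize to the int8 grid on the forward pass, but let gradients pass through unchanged so the bf16 master weights keep training.

  import torch

  def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
      # Symmetric per-tensor quantization: snap to the int8 grid,
      # then dequantize so downstream math stays in bf16.
      scale = x.abs().max().clamp(min=1e-8) / 127.0
      q = torch.clamp(torch.round(x / scale), -128, 127)
      x_q = q * scale
      # Straight-through estimator: forward uses the quantized values,
      # backward treats the rounding as identity, so full-precision
      # gradients still reach the bf16 master weights.
      return x + (x_q - x).detach()

  w = torch.randn(4, 4, dtype=torch.bfloat16, requires_grad=True)
  a = torch.randn(4, 4, dtype=torch.bfloat16)
  out = fake_quant_int8(a) @ fake_quant_int8(w)  # int8-grid values, bf16 math
  out.sum().backward()  # gradients flow to the bf16 weights

Since the forward pass already sees int8-rounded values, swapping in real int8 kernels at inference time changes nothing numerically, which is the whole point.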



