
Typically you need some tricks to pre-train in lower precision (fine-tuning seems to tolerate low precision); with FP16, for example, you need loss scaling. With MX, you can train in 6 bits of precision without any tricks and hit the same loss as FP32.
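For anyone unfamiliar with the "loss scaling" trick: a minimal sketch of what it looks like in practice, using PyTorch's standard AMP utilities (`autocast` and `GradScaler`). The toy model and training loop below are purely illustrative, not from the MX work; the point is that FP16 training needs this extra machinery to keep small gradients from underflowing, which is exactly the kind of trick the parent says MX avoids.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Toy model/optimizer, just to show where the scaler fits in the loop.
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()  # dynamically adjusts the loss-scale factor

for step in range(100):
    x = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad()
    with autocast():                   # run forward pass in FP16 where safe
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()      # scale loss up so FP16 grads don't underflow
    scaler.step(optimizer)             # unscales grads; skips step on inf/NaN
    scaler.update()                    # grows/shrinks the scale factor over time
```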


