
It doesn't matter; the operations are the same in forward and backward mode.

"Made for inference" just means "too slow for training" if you're pessimistic, or "optimized for power efficiency" if you're optimistic.

Otherwise, training and inference are basically the same.

You can do inference pretty easily with 8-bit fixed-point weights. Now try doing the same during training.
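To illustrate the asymmetry, here's a minimal numpy sketch (my own toy example, not from the thread): quantizing a layer's weights to 8-bit fixed point with a per-tensor scale barely moves the inference result, while the gradients needed for training have no comparable low-precision representation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # full-precision weights
x = rng.normal(size=(4,)).astype(np.float32)    # one input vector

# Per-tensor 8-bit fixed-point quantization: map [-max|w|, max|w|] onto int8.
scale = np.abs(w).max() / 127.0
w_q = np.round(w / scale).astype(np.int8)

y_ref = w @ x                              # full-precision inference
y_q = (w_q.astype(np.float32) * scale) @ x  # inference with dequantized int8 weights

# The two outputs agree to within the quantization step -- good enough
# for inference, but a weight *update* of magnitude < scale/2 would be
# rounded away entirely in the int8 representation.
```

The rounding error per weight is at most `scale / 2`, which is tolerable for a forward pass but would silently swallow typical per-step weight updates if you tried to train directly in int8.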

Training and inference are only similar at a high level, not in actual application.

... because the gradient being followed may have a smaller magnitude than the lower precision can represent.

You also need a few other operations for training, such as transpose, which may or may not be fast in a particular implementation.
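Concretely, even for a plain linear layer the backward pass introduces transposes that the forward pass never needs. A minimal numpy sketch (my own illustration, assuming the upstream gradient is all ones):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3)).astype(np.float32)  # batch of 2 inputs
w = rng.normal(size=(3, 4)).astype(np.float32)  # layer weights

y = x @ w                  # forward pass: one plain matmul, no transpose
dy = np.ones_like(y)       # upstream gradient (all ones for illustration)

dx = dy @ w.T              # backward pass needs w transposed...
dw = x.T @ dy              # ...and x transposed
```

Hardware that only has to run `x @ w` can lay weights out in one fixed order; training hardware must also make `w.T` and `x.T` fast, which is exactly the kind of operation an inference-oriented design may neglect.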

(ETA: In case it's not obvious, I'm agreeing with david-gpu's comment, and adding more reasons that training currently differs from inference.)
