
Yes. The optimizer keeps a higher-precision copy of the weights. When it comes to training, it's likely slower and needs more memory than an equivalent full-precision model. I'd also imagine it takes several epochs to get the equivalent of one full-precision epoch, because a weight will need several goes before it lands on the right choice between three states, rather than just moving a little bit in the right direction each time.
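To make that concrete, here's a toy sketch of what I mean. It's not any particular library's implementation; `ternarize`, `train_step`, the 0.5 threshold and the learning rate are all made up for illustration. It assumes a one-weight model, squared error, and a straight-through gradient (the quantizer is treated as the identity on the backward pass).

```python
def ternarize(w, threshold=0.5):
    """Map a latent full-precision weight to one of three states: -1, 0, +1.

    The threshold is an illustrative assumption, not taken from any paper.
    """
    if w > threshold:
        return 1
    if w < -threshold:
        return -1
    return 0


def train_step(latent, x, target, lr=0.1):
    """One SGD step on the toy model y = ternarize(w) * x with squared error.

    The forward pass sees only the ternary weight. The gradient is applied
    to the latent full-precision weight (straight-through assumption).
    """
    q = ternarize(latent)
    pred = q * x
    grad = 2.0 * (pred - target) * x  # d(loss)/d(q), passed through to latent
    return latent - lr * grad


# Track (step, latent weight, ternary state) across a few updates.
latent = 0.0
history = []
for step in range(5):
    history.append((step, latent, ternarize(latent)))
    latent = train_step(latent, x=1.0, target=1.0)
history.append((5, latent, ternarize(latent)))

for step, w, q in history:
    print(step, round(w, 3), q)
```

In this toy run the latent weight creeps up on every update, but the ternary state the forward pass actually uses stays at 0 for the first three updates and only then flips to +1. A full-precision weight would have improved the prediction a little on each of those steps. You also have to hold both the latent copy and the quantized one during training, which is where the extra memory comes from.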


