Show HN: IEEE-754-Conformant FP64 on Metal (Apple Silicon) (github.com/guyfischman)
1 point by guyfischman 29 days ago | 1 comment


Bit-exact software-emulated FP64 on Metal, 5–11× faster than hardware FP64 on the CPU.

I was learning about RandomX and wanted to play with the algorithm on a Mac, then discovered that Metal has no FP64 math. I further discovered that this has been a frustration for a lot of people in ML, science, and gaming.

I went down a rabbit hole. The naive implementation had ~10% of the throughput of hardware CPU FP64 on the same machine. After obsessively squeezing every bit of juice out of the GPU, the final version is 5–11× faster than a 14-thread CPU hardware-FP64 baseline on arithmetic, and 10–35× faster on conversions and comparisons (M4 Pro, 20 GPU cores).
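
For anyone wondering what "software-emulated" means in practice, here is a minimal sketch of one of the simpler operations, an IEEE 754 less-than comparison done purely with 32-bit integer operations. It is an illustration only and not the project's code: the `F64Bits` struct, `to_bits`, `is_nan`, `f64_lt`, and `demo` are hypothetical names, and it is written as plain host-side C++ (the Metal Shading Language is C++-based) so it can be checked against native doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical representation: one FP64 value stored as two 32-bit words,
// which a GPU without a native double type can still load and compare.
struct F64Bits {
    uint32_t hi; // sign (1 bit), exponent (11 bits), top 20 mantissa bits
    uint32_t lo; // low 32 mantissa bits
};

// Host-side helper, for illustration only: split a native double into words.
inline F64Bits to_bits(double d) {
    uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return F64Bits{static_cast<uint32_t>(u >> 32), static_cast<uint32_t>(u)};
}

// NaN: exponent bits all ones and a nonzero mantissa.
inline bool is_nan(F64Bits x) {
    uint32_t abs_hi = x.hi & 0x7FFFFFFFu;
    return abs_hi > 0x7FF00000u || (abs_hi == 0x7FF00000u && x.lo != 0u);
}

// IEEE 754 "a < b" using only 32-bit integer operations.
inline bool f64_lt(F64Bits a, F64Bits b) {
    if (is_nan(a) || is_nan(b)) return false;            // NaN is unordered
    uint32_t a_abs = a.hi & 0x7FFFFFFFu;
    uint32_t b_abs = b.hi & 0x7FFFFFFFu;
    if ((a_abs | a.lo | b_abs | b.lo) == 0u) return false; // +0 equals -0
    bool a_neg = (a.hi >> 31) != 0u;
    bool b_neg = (b.hi >> 31) != 0u;
    if (a_neg != b_neg) return a_neg;                     // negative < positive
    // Same sign: compare magnitudes word by word.
    bool mag_lt = a_abs < b_abs || (a_abs == b_abs && a.lo < b.lo);
    bool mag_gt = a_abs > b_abs || (a_abs == b_abs && a.lo > b.lo);
    return a_neg ? mag_gt : mag_lt;                       // flip for negatives
}

// Cross-check a few cases against the hardware comparison.
inline void demo() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double inf = std::numeric_limits<double>::infinity();
    const double vals[] = {-inf, -2.0, -1.0, -0.0, 0.0, 5e-324, 1.0, inf, nan};
    for (double x : vals)
        for (double y : vals)
            assert(f64_lt(to_bits(x), to_bits(y)) == (x < y));
}
```

Comparisons are the easy end of the problem: arithmetic additionally needs exponent alignment, 64-bit-wide mantissa handling, and correct rounding, which is presumably where most of the tuning effort described above goes.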

I hope you find this useful.



