Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yeah standalone GPUs do indeed have more bandwidth, but most of these Copilot PCs that have NPUs just have shared memory for everything I think.

fetching 16 8 bit values vs 32 4 bit values is the same, this is the form they are stored in memory. Doing some unpacking into more registers and back is more or less free anyway, if you are memory bandwidth bound. Largely on these lower end machines everything is memory bound not compute bound, although the CPUs cant often use the full memory bandwidth in some systems (eg the Macs) but the GPU can.



Yes, agree. Probably the main thing is the NPU is just a dedicated unit without the generality / complexity of a CPU and so able to crunch matmuls more efficiently.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: