
Certainly big for China, but if it can't run CUDA then it's not going to help them catch up in the AI race




Macs can't run CUDA but everyone is buying those for AI anyway.

GP was likely referring to training, not inference.

Only people who don't know or don't care about prompt-processing speed are buying Macs for LLM inference.

Not knowing and not caring are definitely things I could be guilty of, but it also makes sense for people who want to keep their lookups private.

Even 40 tokens per second is plenty fast enough for real-time use. The average person reads at ~4 words per second, and 40 tokens per second works out to roughly 15-20 words per second.

Even useful models like gemma3 27b are hitting 22 t/s on 4-bit quants.

You aren't going to be reformatting gigabytes of PDFs or anything, but for a lot of common use cases, those speeds are fine.
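For anyone who wants to sanity-check those numbers, here's a rough back-of-the-envelope sketch in Python. The tokens-per-word ratio is an assumption, not a measured figure; it varies by tokenizer and language.

```python
# Rough sketch: convert generation speed (tokens/s) into reading speed (words/s).
# The tokens-per-word ratio is assumed; real values depend on the tokenizer and
# the language (English prose is often quoted somewhere around 1.3-2).

def tokens_per_sec_to_words_per_sec(tokens_per_sec: float, tokens_per_word: float = 2.0) -> float:
    """Estimate words per second from a token generation rate."""
    return tokens_per_sec / tokens_per_word

if __name__ == "__main__":
    for tps in (22, 40):
        wps = tokens_per_sec_to_words_per_sec(tps)
        print(f"{tps} tok/s ~ {wps:.0f} words/s (vs ~4 words/s reading speed)")
```

With the assumed 2 tokens per word, 40 t/s lands around 20 words per second, which is the upper end of the parent's estimate and still several times faster than reading speed.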


Maybe they'll make a better abstraction layer than CUDA.

Vulkan already achieves ~95% of CUDA's performance, with the remaining 5% mostly coming down to scheduling.

Dunno. Training, maybe, but for inference PyTorch and llama.cpp seem more important.
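For what it's worth, most inference code is written against PyTorch's generic device API rather than CUDA directly, so a vendor that ships a working PyTorch backend inherits that code. A minimal sketch using only the stock device names (cuda, mps, cpu), nothing vendor-specific assumed:

```python
# Minimal sketch: code written against torch.device doesn't care whether the
# backend is CUDA, Apple's MPS, or plain CPU; a new accelerator just needs to
# show up behind one of these (or its own) device strings.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():          # NVIDIA (ROCm builds also reuse the "cuda" namespace)
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(4, 8, device=device)
w = torch.randn(8, 2, device=device)
print(device, (x @ w).shape)
```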

I thought they were already working around CUDA?


