
Certainly big for China, but if it can't run CUDA then it's not going to help them catch up in the AI race




Macs can't run CUDA but everyone is buying those for AI anyway.

GP was likely referring to training, not inference.

Only people who don't know or don't care about prompt-processing speed are buying Macs for LLM inference.

Not knowing and not caring are definitely things I could be guilty of, but it also makes sense for people who want to keep their lookups private.

Even 40 tokens per second is plenty fast enough for real-time use. The average person reads at ~4 words per second, and 40 tokens per second works out to roughly 15-20 words per second.

Even useful models like gemma3 27b are hitting 22 t/s on 4-bit quants.

You aren't going to be reformatting gigabytes of PDFs or anything, but for a lot of common use cases, those speeds are fine.
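For anyone who wants to sanity-check those numbers, here's a rough back-of-the-envelope sketch in Python. The tokens-per-word ratio is an assumption, not a measured figure; it varies by tokenizer and language.

```python
# Rough sketch: convert generation speed (tokens/s) into reading speed (words/s).
# The tokens-per-word ratio is assumed; real values depend on the tokenizer and
# the language (English prose is often quoted somewhere around 1.3-2).

def tokens_per_sec_to_words_per_sec(tokens_per_sec: float, tokens_per_word: float = 2.0) -> float:
    """Estimate words per second from a token generation rate."""
    return tokens_per_sec / tokens_per_word

if __name__ == "__main__":
    for tps in (22, 40):
        wps = tokens_per_sec_to_words_per_sec(tps)
        print(f"{tps} tok/s ~ {wps:.0f} words/s (vs ~4 words/s reading speed)")
```

With the assumed 2 tokens per word, 40 t/s lands around 20 words per second, which is the upper end of the parent's estimate and still several times faster than reading speed.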


Maybe they'll make a better abstraction layer than CUDA.

Vulkan already achieves ~95% of CUDA's performance, with the remaining 5% mostly coming down to scheduling.

Dunno. Training, maybe, but for inference PyTorch and llama.cpp seem more important.
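For what it's worth, most inference code is written against PyTorch's generic device API rather than CUDA directly, so a vendor that ships a working PyTorch backend inherits that code. A minimal sketch using only the stock device names (cuda, mps, cpu), nothing vendor-specific assumed:

```python
# Minimal sketch: code written against torch.device doesn't care whether the
# backend is CUDA, Apple's MPS, or plain CPU; a new accelerator just needs to
# show up behind one of these (or its own) device strings.
import torch

def pick_device() -> torch.device:
    if torch.cuda.is_available():          # NVIDIA (ROCm builds also reuse the "cuda" namespace)
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(4, 8, device=device)
w = torch.randn(8, 2, device=device)
print(device, (x @ w).shape)
```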

I thought they were already working around CUDA?


