
Well, that’s disappointing, since the Mac Studio with 128GB is $3,499. If Apple happens to launch a Mac Mini with 128GB of RAM, it would eat the Nvidia Spark's lunch every day.




Only if it runs CUDA; MLX/Metal isn't comparable as an ecosystem.

People who keep pushing Apple gear tend to forget that Apple has decided that what the industry considers industry standards, proprietary or not, isn't made available on its hardware.

Even if Metal is actually a cool API to program for.


It depends what you're doing. I can get valuable work done with the subset of Torch supported on MPS and I'm grateful for the speed and RAM of modern Mac systems. JAX support is worse but hopefully both continue to develop.

CUDA is equally proprietary and not an industry standard though, unless you were thinking of Vulkan/OpenCL, which doesn't bring much in this situation.

Yes, it is an industry standard; there is even a technical term for it.

It is called a de facto standard, which you can check in your favourite dictionary.


CUDA isn't the industry standard? What is then?

Agreed. I also wonder why they chose to test against a Mac Studio with only 64GB instead of 128GB.

Hi, author here. I crowd-sourced the devices for benchmarking from my friends. It just happened that one of my friends has this device.

FYI you should have used llama.cpp to do the benchmarks. It performs almost 20x faster than ollama for the gpt-oss-120b model. Here are some sample results on my Spark:

  ggml_cuda_init: found 1 CUDA devices:
    Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes
  | model                          |       size |     params | backend    | ngl | n_ubatch | fa |            test |                  t/s |
  | ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | --------------: | -------------------: |
  | gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | CUDA       |  99 |     2048 |  1 |          pp4096 |       3564.31 ± 9.91 |
  | gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | CUDA       |  99 |     2048 |  1 |            tg32 |         53.93 ± 1.71 |
  | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | CUDA       |  99 |     2048 |  1 |          pp4096 |      1792.32 ± 34.74 |
  | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | CUDA       |  99 |     2048 |  1 |            tg32 |         38.54 ± 3.10 |
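
These numbers are from llama-bench; pp4096 is prompt processing (input tokens) and tg32 is token generation (output tokens). A rough sketch of the invocation matching the columns above, with the model path as a placeholder:

  # minimal llama-bench sketch for the ngl/n_ubatch/fa/pp4096/tg32 configuration shown above;
  # the GGUF path is a placeholder for your local copy of the model
  llama-bench -m gpt-oss-120b-mxfp4.gguf \
    -ngl 99 -ub 2048 -fa 1 \
    -p 4096 -n 32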

Is this the full-weight model or a quantized version? The GGUFs distributed on Hugging Face labeled as MXFP4 quantization have layers that are quantized to int8 (q8_0) instead of the bf16 suggested by OpenAI.

For example, looking at blk.0.attn_k.weight, it's q8_0, among other layers:

https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/tree/main?s...

For example, the same weight on Ollama is BF16:

https://ollama.com/library/gpt-oss:20b/blobs/e7b273f96360
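
If anyone wants to check a local file themselves, the gguf Python package ships a gguf-dump script that lists each tensor with its quantization type; something like this (the file name is a placeholder for whatever GGUF you downloaded) should show whether blk.0.attn_k.weight is Q8_0 or BF16:

  # list tensor types from a local GGUF and filter for the attention K weights;
  # "gpt-oss-20b.gguf" is a placeholder path
  pip install gguf
  gguf-dump gpt-oss-20b.gguf | grep "attn_k.weight"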


I see! Do you know what's causing the slowdown for ollama? They should be using the same backend...

Dude, ggerganov is the creator of llama.cpp. Kind of a legend. And of course he is right, you should've used llama.cpp.

Or you can just ask the ollama people about the ollama problems. Ollama is (or was) just a Go wrapper around llama.cpp.


Was. They've been diverging.

Now this looks much more interesting! Is the top one input tokens and the second one output tokens?

So 38.54 t/s on 120B? Have you tested filling the context too?



Makes sense you have one of the boxes. What's your take on it? [Respecting any NDAs/etc/etc of course]

Curious how this compares to running on a Mac.

TTFT on a Mac is terrible and only gets worse as the context grows; that's why many are selling their M3 Ultra 512GB.

So, so many… an eBay search shows only 15 results, 6 of them being ads for new systems…

https://www.ebay.com/sch/i.html?_nkw=mac+studio+m3+ultra+512...


Just don't try to run NCCL.

Wouldn't you be able to test NCCL if you had two of these?

What kind of NCCL testing are you thinking about? Always curious what’s hardest to validate in people’s setups.

Not with Mac Studio(s), but yes: multi-host NCCL over RoCE with two DGX Sparks, or over PCIe with one.
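
For the two-node case, the standard nccl-tests binaries over MPI are the usual starting point; a rough sketch, with hostnames and the NIC name as placeholders for your setup:

  # run an NCCL all-reduce sweep across two DGX Sparks over the RoCE link;
  # spark-1/spark-2 and enp1s0f0 are placeholders for your hosts and interface
  mpirun -np 2 -H spark-1:1,spark-2:1 \
    -x NCCL_DEBUG=INFO -x NCCL_SOCKET_IFNAME=enp1s0f0 \
    ./build/all_reduce_perf -b 8 -e 1G -f 2 -g 1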


