Well, that’s disappointing, since the Mac Studio with 128GB is $3,499. If Apple happens to launch a Mac Mini with 128GB of RAM, it would eat the Nvidia Spark’s lunch every day.
Only if it ran CUDA; MLX / Metal isn't comparable as an ecosystem.
People who keep pushing Apple gear tend to forget that Apple has decided that what the industry considers standard, proprietary or not, isn't made available on its hardware.
Even though Metal is actually a cool API to program for.
It depends on what you're doing. I can get valuable work done with the subset of Torch supported on MPS, and I'm grateful for the speed and RAM of modern Mac systems. JAX support is worse, but hopefully both continue to develop.
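For anyone curious what that looks like in practice, a minimal sketch of targeting the MPS backend with a CPU fallback (the matmul is just an illustrative workload):

```python
# Minimal sketch: use the Apple-GPU (MPS) backend when available, else CPU.
# Ops that MPS doesn't support yet can be routed to CPU by setting
# PYTORCH_ENABLE_MPS_FALLBACK=1 in the environment before importing torch.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(4096, 4096, device=device)
y = x @ x.T  # runs on the Apple GPU when device is "mps"
print(device, y.mean().item())
```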
FYI, you should have used llama.cpp to do the benchmarks. It performs almost 20x faster than ollama for the gpt-oss-120b model. Here are some sample results on my Spark:
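(For anyone wanting to reproduce this kind of run rather than rely on my numbers, a minimal sketch wrapping llama.cpp's bundled llama-bench tool; the binary is assumed to be built and on PATH, and the model path is a placeholder:)

```python
# Minimal sketch: invoke llama.cpp's llama-bench on a GGUF and print its report.
import subprocess

result = subprocess.run(
    [
        "llama-bench",
        "-m", "gpt-oss-120b.gguf",  # placeholder path to the GGUF
        "-p", "512",                # prompt-processing tokens
        "-n", "128",                # generation tokens
        "-ngl", "99",               # offload all layers to the GPU
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```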
Is this the full-weight model or a quantized version? The GGUFs distributed on Hugging Face labeled as MXFP4 quantization have layers quantized to int8 (q8_0) instead of the bf16 suggested by OpenAI.
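(If you want to check a file yourself, a minimal sketch using llama.cpp's gguf-py package (`pip install gguf`); the filename is a placeholder and the enum values shown are just examples:)

```python
# Minimal sketch: list per-tensor quantization types in a GGUF file.
from gguf import GGUFReader

reader = GGUFReader("gpt-oss-120b-MXFP4.gguf")
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum, e.g. Q8_0, F32
    print(tensor.name, tensor.tensor_type.name)
```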
For example, looking at blk.0.attn_k.weight, it's q8_0, among other layers: