
I know Nvidia has never really cared much about TDP, but this still seems unbelievable to me. How could a relatively new design beat a 3090 with 200W less power while having to share a die with a CPU? It just doesn't seem possible.


Unless the M1 Ultra is actually magic I don't think it is possible.

My guess is they're putting a lot of weight on the phrase "relative power" and hoping you assume it means "relative to each other" and not "relative to their previous generation" (i.e. M1 Ultra -> M1 Max and RTX 3090 -> RTX 2080Ti) or "relative to the stock power profile".

Put bluntly, if the M1 Ultra were capable of achieving performance parity with an RTX 3090 on any GPU-style benchmark, then Nvidia (who are experts in making GPUs) would have captured this additional performance. Bear in mind the claim seems to be (on the surface) that the M1 Ultra is achieving with 64 GPU cores and 800GB/s memory bandwidth what the RTX 3090 is achieving with 10,496 GPU cores and 936.2GB/s memory bandwidth.
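
For a rough sense of scale, here's a back-of-envelope on peak FP32 throughput. Note those core counts aren't comparable units: each Apple GPU "core" packs roughly 128 FP32 ALUs, while a CUDA "core" is a single ALU. The clock speeds below are approximate public figures, so treat this as a sketch, not a benchmark:

    # Back-of-envelope peak FP32 throughput (approximate public specs)
    def peak_tflops(alus, clock_ghz, flops_per_cycle=2):  # 2 = fused multiply-add
        return alus * clock_ghz * flops_per_cycle / 1000

    m1_ultra = peak_tflops(64 * 128, 1.3)  # ~21.3 TFLOPS, in line with Apple's claim
    rtx_3090 = peak_tflops(10496, 1.7)     # ~35.7 TFLOPS
    print(f"M1 Ultra ~{m1_ultra:.1f} vs RTX 3090 ~{rtx_3090:.1f} TFLOPS")

So even on paper the 3090 has roughly 1.7x the raw compute, before you get anywhere near drivers and software maturity.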


It's actually kinda magic, but in the opposite way. The M1 Ultra is absolutely massive: 114 billion transistors. The 3090? A measly 28 billion. Now granted, the M1 also has CPU cores on there, but even so, it seems safe to say that the M1 Ultra has more transistors spent on the GPU than a 3090 does. More transistors does often mean faster when it comes to GPUs.

But you'll almost certainly see the 3090 win far more benchmarks than the M1 Ultra does, and by a landslide. Because Nvidia is really, really fucking good at this, and they spend an absurd amount of money working with external projects to fix their stuff. Like contributing a CUDA backend for TensorFlow. Or tons of optimizations in the driver to handle game-specific issues.

Meanwhile Apple is mostly in the camp of "well, we built Metal 2, what's taking y'all so long to port it to our tiny-marketshare platform that historically had terrible GPU drivers?"


Given that things like AMD's Epyc chips sit at around 40 billion transistors, I would assume a (roughly) even split.

It is also worth noting that the M1 Ultra is an SoC, so it'll have more than just the CPU/GPU on it. By the looks of things it has a hefty amount of cache, and it'll also have a few IP blocks like a PCIe controller, a memory controller, and an SSD controller (the current "SSDs" look to be just raw storage modules).

All told, it likely still has somewhere in the region of 30-40 billion transistors for the GPU. Each GPU core being physically bigger than the 3090's is probably good for some workloads and not so good for others. Generally GPUs benefit from having a huge number of tiny cores for processing in parallel, rather than a small number of massive cores.
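
Plugging this thread's guesses into numbers (every figure here is an assumption except Apple's stated 114B total; none of this comes from a die shot):

    # Rough M1 Ultra transistor budget (guesses, not measurements)
    total       = 114e9  # Apple's stated transistor count
    cpu_guess   = 40e9   # assuming an Epyc-sized share for the CPU complex
    other_guess = 35e9   # SLC cache, memory/PCIe/SSD controllers, NPU, media engines
    gpu_guess   = total - cpu_guess - other_guess
    print(f"GPU share: ~{gpu_guess / 1e9:.0f}B transistors (whole GA102 die: ~28B)")

Which lands in that same 30-40 billion ballpark.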

Current benchmarks put it at roughly the performance of an RTX 3070, which is good for its power consumption, but not even close to the 3090. As I mentioned in the previous post, it just doesn't have the cores or memory bandwidth needed for the types of workloads that GPUs are built for (although unified memory being physically closer can help here, of course), certainly not enough to make it a competitor for something like a 3090.

Edit: Oh also, for massively parallel workloads (like what GPUs do), more cores, and the bandwidth to feed those cores, are among the biggest performance drivers. You can get more performance by making those cores bigger (and therefore faster), but you need to crank the transistor count up a _lot_ to match the kind of throughput that many tiny cores can deliver.
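
To make that concrete, here's a toy model of the trade-off. The scaling assumptions are illustrative only: per-core speed grows with the square root of its transistor share (a Pollack's-rule-style guess), and throughput scales linearly with core count for embarrassingly parallel work:

    import math

    # Toy model: fixed transistor budget split across N cores.
    # Assumes per-core speed ~ sqrt(per-core transistors) (Pollack's-rule-ish)
    # and perfectly parallel work, so aggregate throughput = N * per-core speed.
    def throughput(num_cores, budget=28e9):
        per_core = budget / num_cores
        return num_cores * math.sqrt(per_core)  # arbitrary units

    ratio = throughput(10496) / throughput(64)
    print(f"many tiny cores: ~{ratio:.0f}x the aggregate throughput")  # ~13x

Under those (crude) assumptions the many-small-cores design wins by an order of magnitude on fully parallel work, which is exactly why GPUs are built that way.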



