It's not even clear that the M1's big leap is due to ARM vs. x86 rather than, say, 5nm vs. 7nm (AMD) or 14nm (Intel), or design ideas such as big/little cores and more specialized accelerators (which, ironically, cut against the RISC philosophy that people cite as the reason ARM beats x86 and hence the reason the M1 does well).


Specialized accelerators don't explain it, because these benchmarks are mostly measuring general-purpose CPU tasks.

Big/little is good for power consumption, not so much for performance, yet the performance here is still good.

There's a lot of microarchitectural goodness here beyond ARM, though. Apple got lots of little details right, and the fat connection to memory helps too. Being on the leading fab node doesn't hurt either.


Having dedicated silicon for the most frequently used primitives (specialized accelerators) gets that work out of the way, so the main core's pipeline can run predictably fast.
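
As a concrete illustration of that offload pattern (a minimal sketch, not something these particular benchmarks necessarily exercise): on Apple silicon, a BLAS call routed through the Accelerate framework can reportedly be serviced by Apple's undocumented AMX matrix blocks rather than by ordinary NEON code on the main pipeline.

  /* Illustrative only: the library, not this code, decides whether
   * the work lands on AMX or on the main core's vector units.
   * Build on macOS with: clang matmul.c -framework Accelerate */
  #include <Accelerate/Accelerate.h>

  int main(void) {
    enum { DIM = 512 };
    static float a[DIM * DIM], b[DIM * DIM], c[DIM * DIM];
    /* C = 1.0*A*B + 0.0*C; Accelerate picks the execution unit. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                DIM, DIM, DIM, 1.0f, a, DIM, b, DIM, 0.0f, c, DIM);
    return 0;
  }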


That makes no sense here. For compilation workloads and most of the other tests where we're seeing benefits, basically any machine can give well over 99% of the CPU to the task. How exactly do you think dedicated silicon is helping clang compile benchmarks, etc.?


> fat connection to memory

It's the same "thickness" as desktop (or good laptop) DDR4, i.e. 128 bits. Apple is running a very high clock, though, while other manufacturers have been quite conservative with memory tuning even on laptops with soldered memory, basically running the JEDEC spec. Maybe now they'll feel the kick in the ass and overclock their RAM already.
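
For scale, here's the back-of-the-envelope peak-bandwidth arithmetic; the M1 figure assumes the widely reported LPDDR4X-4266, compared against a JEDEC-spec DDR4-3200 laptop part, both on a 128-bit bus:

  /* Peak bandwidth = transfer rate (MT/s) * bus width in bytes. */
  #include <stdio.h>

  static double peak_gbs(double mt_per_s, int bus_bits) {
    return mt_per_s * (bus_bits / 8) / 1000.0; /* GB/s */
  }

  int main(void) {
    printf("M1, LPDDR4X-4266, 128-bit: %.1f GB/s\n", peak_gbs(4266, 128));
    printf("JEDEC DDR4-3200, 128-bit:  %.1f GB/s\n", peak_gbs(3200, 128));
    return 0;
  }

That works out to roughly 68 GB/s vs. 51.2 GB/s: about a third more peak bandwidth from clocks alone, at the same bus width.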


The top-level things like process node, ISA, and memory controller are big. But a lot of it boils down to being able to shape the entire chip design exclusively around system-level traces of real Mac workloads. Intel needs to factor many different kinds of traces into its chip design; even Windows vs. macOS makes a huge difference.


So your prediction is that the chip will be bad at running Linux and Windows?

To me it seems a priori quite unlikely that the patterns of macOS, Windows, and Linux are so different that this would be a major win. There may be a few specific things, but any CPU that optimizes too heavily for some particular OS would have big problems running CPU-intensive, userspace-only workloads.


I'm out of my league here, but I've seen references to 8-bit cores that can run at a couple of giga-instructions per second. It's hard to overstate the performance-per-watt that cores like that are capable of. Also sub-nanosecond interrupt latency.

Think of a small coprocessor with local memory that's pulling commands out of a queue and managing an I/O controller. A couple of wins: lower power consumption, fewer context switches, and less cache pressure.
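
A minimal sketch of that pattern, with all names hypothetical: a lock-free single-producer/single-consumer command ring shared between a big core and the coprocessor, so submitting I/O needs no syscall or interrupt on the host side.

  /* Hypothetical sketch: the host CPU enqueues I/O commands into a
   * shared ring; the little coprocessor drains it and programs the
   * controller, so big cores take no interrupts and keep caches warm. */
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define RING_SIZE 256  /* must be a power of two */

  struct io_cmd { uint32_t opcode; uint64_t addr; uint32_t len; };

  struct cmd_ring {
    struct io_cmd slots[RING_SIZE];
    _Atomic uint32_t head;  /* advanced by the host (producer) */
    _Atomic uint32_t tail;  /* advanced by the coprocessor (consumer) */
  };

  /* Host side: enqueue a command with two atomic ops, no kernel entry. */
  static bool ring_push(struct cmd_ring *r, struct io_cmd cmd) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
      return false;                       /* ring full, caller retries */
    r->slots[head & (RING_SIZE - 1)] = cmd;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
  }

  /* Coprocessor side: poll the ring and hand commands to the I/O block. */
  static bool ring_pop(struct cmd_ring *r, struct io_cmd *out) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
      return false;                       /* ring empty */
    *out = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
  }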



