I may be out of date or wrong, but I recall when the M1 came out there was some ...

Remnant44 · 2025-08-25T23:58:32 1756166312

ARM instructions are fixed size, while x86 are variable. This makes a wide decoder fairly trivial for ARM, while it is complex and difficult for x86.

However, this doesn't really hold up as the cause for the difference. The Zen4/5 chips, for example, source the vast majority of their instructions out of their uOp trace cache, where the instructions have already been decoded. This also saves power - even on ARM, decoders take power.

People have been trying to figure out the "secret sauce" since the M chips have been introduced. In my opinion, it's a combination of:

1) The apple engineers did a superb job creating a well balanced architecture

2) Being close to their memory subsystem with lots of bandwidth and deep buffers so they can use it is great. For example, my old M2 Pro macbook has more than twice the memory bandwidth than the current best desktop CPU, the zen5 9950x. That's absurd, but here we are...

3) AMD and Intel heavily bias on the costly side of the watts vs performance curve. Even the compact zen cores are optimized more for area than wattage. I'm curious what a true low power zen core (akin to the apple e cores) would do.

mycall · 2025-08-26T02:24:11 1756175051

When limited to 5 watts, the Ryzen HX 370 works pretty darn well. In some low-power user cases, my GPD Pocket 4 is more power efficient than my M3 MBA.

aurareturn · 2025-08-26T06:33:03 1756189983

We are going to need to see some numbers for your claim. That’s not believable.

ZiiS · 2025-08-26T06:59:05 1756191545

A 8.8" screen takes a lot less power.

aurareturn · 2025-08-26T07:04:51 1756191891

When you say efficiency, I assume you’re factoring in performance of the device as well?

Maybe run Geekbench 6 and see.

ZiiS · 2025-08-26T11:26:32 1756207592

I am not the original commenter; but they said "low-power user cases" i.e. very much not when running Geekbench; rather when it is near idle.

mycall · 2025-08-31T22:57:02 1756681022

For my use case, I plug in a external SSD drive, setup an MP3 playlist, close the lid then put the computer into my backpack for low wattage listening. The M3 MBA uses about 50% more power than the P4 under this use case (usbc watt meter shows this).

aurareturn · 2025-08-26T11:41:41 1756208501

FYI, AMD chips are notoriously bad at idle.

happymellon · 2025-08-26T07:09:37 1756192177

We will need some citations on that as the GPD Pocket 4 isn't even the most power efficient pocket pc.

Closest I've seen is an uncited Reddit thread talking about usb c charging draw when running a task, conflating it with power usage.

mycall · 2025-08-31T22:59:19 1756681159

When the computers are at 100% charge, the usbc watt meter between device and charging brick is power draw.

mmcnl · 2025-08-26T06:50:15 1756191015

How about single-core performance?

ozgrakkurt · 2025-08-26T07:27:24 1756193244

But is the uOp trace cache free? It surely doesn’t magically decode and put stuff in there without cost

Remnant44 · 2025-08-26T07:47:46 1756194466

For sure.. for what it's worth though, I have run across several references to arm also implementing uop caches as a power optimization versus just running the decoders, so I'm inclined to say that whatever it's cost it pays for itself. I am not a chip designer though!

hajile · 2025-08-26T18:19:06 1756232346

Apple never used a uop cache in their designs. ARM dropped uop caches when they removed 32-bit support. Qualcomm also skipped uop cache.

uop made sense with 32-bit support because the 32-bit ISA was so complex (though still simple compared to x86). Once they went to a simplified instruction design, the cost to decode every single time was lower than the cost of maintaining the uop cache.

saati · 2025-08-26T07:43:22 1756194202

Zens don't have a trace cache, just an uop cache.

astrange · 2025-08-26T03:58:28 1756180708

They can always catch up, it may just take a while. x86's variable size instructions have performance advantages because they fit in cache better.

ARM has better /security/ though - not only does it have more modern features but eg variable length instructions also mean you can reinterpret them by jumping into the middle of one.

scarface_74 · 2025-08-25T23:44:47 1756165487

No one ever said that. The M1 was not the fastest laptop when it was introduced. It was a nice balance of speed/battery life/heat