
Is ARM really the reason the M1 is so much better than the current x86 CPUs? If you have a look at the competition (Qualcomm), it sounds to me that it's Apple's expertise at designing a SoC and using a more advanced node that makes it such a good CPU, not really the architecture.

Said another way, I highly doubt we will see a non-Apple ARM CPU outclass the x86 competition in such a way anytime soon, so saying "never x86 again" might not be very wise. ARM is not a magic bullet.




100% agree.

A lot of people asking for ARM-based machines do not really realize that what they want isn't ARM per se, it's faster/more efficient machines. To make an analogy for HN frontend sw devs: imagine a non-technical marketing person going to a developer and saying "I want this website in AMP, because AMP is fast." And sure, a lot of sites built with the AMP framework are fast... but that doesn't mean you have to build a site in AMP to make it fast. You could build a simple site following similar principles and get it fast too. Similarly, not all ARM laptops are fast. And even if the laptop's hw is fast, you want something that can efficiently run x86 code as well, for compatibility purposes, for many years.


It’s true that ARM alone isn’t the reason for the M1’s performance, but it’s definitely a significant factor. x86 is old — modern x86 chips are still backwards-compatible with the original 8086 from 1978 — and it’s stuck with plenty of design decisions that might have been the correct choice at some point in the past 45 years but aren’t anymore. The M1, by contrast, only implements AArch64, a complete redesign of the ARM architecture from 2012, so it doesn’t have to deal with that legacy architectural baggage. (We’ve known x86 was the wrong design since the 80’s — hence why there are no Intel chips in smartphones — but it hasn’t been realistic for anybody except Apple to spend 10 years and billions of dollars to make a high-performance non-x86 chip.)

Some examples:

- x86 guarantees strong memory ordering on multi-processor systems, which adds completely unnecessary overhead to every memory access. arm64 uses a weak memory model instead, providing atomic instructions with relaxed or acquire/release semantics (see https://youtu.be/KeLBd2EJLOU?t=28m19s for a more detailed discussion). This significantly improves performance across the board, but especially for reference counting operations, which are extremely common and often a bottleneck in code written in ObjC/Swift (there’s a sketch of the pattern after this list): https://twitter.com/Catfish_Man/status/1326238434235568128

> fun fact: retaining and releasing an NSObject takes ~30 nanoseconds on current gen Intel, and ~6.5 nanoseconds on an M1

- x86 instruction decode is pretty awful, a significant bottleneck, and not parallelizable due to the haphazardly-designed variable-length CISC instruction set. arm64’s instruction set is highly regular and easy to decode, so Apple can decode up to 8 instructions per clock (as opposed to 4 for x86 chips). Most sources agree this is why the M1 can have such a big out-of-order-execution window and achieve such high instruction-level parallelism compared to Intel/AMD.

- x86_64 has only 16 architectural registers, compared to 32 for arm64. This means the compiler has a much harder time generating efficient, parallelizable code and must resort to spilling registers much more often.
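
To make the refcounting point concrete, here is a minimal sketch of the textbook relaxed-increment / release-decrement pattern in plain C11 atomics (this is not Apple’s actual ObjC/Swift runtime, and object_t/retain/release are made-up names). On a strongly-ordered ISA like x86, every atomic read-modify-write effectively pays for a full barrier anyway, while on arm64 the weaker orderings below can map to cheaper instruction sequences:

    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct {
        atomic_size_t refcount;
        /* ...object payload... */
    } object_t;

    static void retain(object_t *obj) {
        /* The increment only needs atomicity, not ordering. */
        atomic_fetch_add_explicit(&obj->refcount, 1, memory_order_relaxed);
    }

    static void release(object_t *obj) {
        /* Release ordering publishes this thread's writes to the object;
           the acquire fence on the final decrement makes every other
           thread's writes visible before the object is freed. */
        if (atomic_fetch_sub_explicit(&obj->refcount, 1,
                                      memory_order_release) == 1) {
            atomic_thread_fence(memory_order_acquire);
            free(obj);
        }
    }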


The issue for me is that ARM is also really old now. I mean, just look at the ISA Apple has to use to run their MacOS on it: it's littered with NEON extensions and more cruft than you can shake a stick at. Simply put, Apple's implementation of ARM is decidedly CISC. On top of this, I'm still dumbfounded by the fact that they didn't go for a chiplet design where ARM could truly shine: if Apple had gone the chiplet route, the M1 could have had a much higher IO ceiling and might have a shot at addressing more than 16 gigs of RAM.

Apple has a much bigger issue, though. ARM doesn't scale: it's a fundamental conceit of the architecture, one that a lot of people are probably willing to take on a laptop that will mostly be used for Twitter and YouTube. This presents issues for the rest of the market though, and it will be fascinating to see how Apple retains their pro userbase while missing out on the high-performance hardware sector entirely.

I think x86 is pretty terrible too, if it's any consolation, but really it's the only option you've got as a programmer in the 21st century. I hopped on the Raspberry Pi bandwagon when I was still in middle school, I grew up rooting for the little guy here. Looking out on the future landscape of computer hardware though, I really only see RISC-V. ARM is an improvement on x86, but I don't think it's profound enough to make people care. RISC-V, on the other hand, blows both of them out of the water. On consumer hardware, it's able to accelerate pretty much any workload while sipping a few mW. On professional hardware, you can strap a few hundred of those cores together and they'll work together to create highly complex pipelines for data processing. On server hardware, it will probably move like gangbusters. Even assuming that cloud providers pocket half the improvements, a 5x price/performance increase will have the business sector racing to support it.

So yeah, it is a pretty complex situation. Apple did a cool thing with the M1, but they have a long way to go if they want to dethrone x86 in its entirety.


Where to start?

> ARM is really old now.

Well, AArch64 was announced in 2011, so not really that old.

> Apple’s implementation of ARM is decidedly CISC.

CISC is a description of the instruction set, not the implementation.

> ARM doesn’t scale.

No idea what this means; you can get 128-core Arm CPUs and address huge amounts of memory, but perhaps you have another definition of scaling.

And so on.


As far as I understand it, “CISC” doesn’t mean “has a lot of instructions”, it means the individual instructions are themselves complex/composable/expressing more than one hardware operation. For instance, on x86 you can write an instruction like ‘ADD [rax + 0x1234 + 8*rbx], rcx’ that performs a multi-step address calculation with two registers, reads from memory, adds a third register, and writes the result back to memory — and you can stick on prefix bytes to do even more things. ARM doesn’t have anything like that; it is a strict load/store architecture where every instruction is fixed-width with a regular format and either accesses memory or performs a computation on registers.
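
As a rough illustration (my own sketch; the exact instructions depend on the compiler and flags), a single read-modify-write of memory in C can compile to one “do everything” instruction on x86-64, but becomes a fixed-width load/add/store sequence on arm64:

    /* bump() is just an illustrative name. */
    void bump(long *arr, long i, long x) {
        arr[i] += x;
        /* x86-64: one CISC instruction does the address math, the load,
           the add, and the store:
               add [rdi + rsi*8], rdx
           arm64: strict load/store, three fixed-width instructions:
               ldr x3, [x0, x1, lsl #3]
               add x3, x3, x2
               str x3, [x0, x1, lsl #3]                                  */
    }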

Hardware primitives for AES/SHA, or the FJCVTZS “JavaScript instruction”, don’t make a processor CISC just because they’re specialized. They all encode trivial, single-cycle hardware operations that would otherwise be difficult to express in software (even though they may be a bit more specialized than something like “add”, they’re not any more complex). x86 is CISC because the instruction encoding is more complicated, specifying many hardware operations with one software instruction.
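
For a sense of how awkward that “trivial” operation would be without the instruction, here is a rough C approximation of the JavaScript-style double-to-int32 conversion that FJCVTZS accelerates (my own sketch of the ECMAScript ToInt32 rules: truncate toward zero, wrap modulo 2^32, NaN/infinity become 0; it is not ARM’s exact pseudocode, and js_to_int32 is a made-up name):

    #include <math.h>
    #include <stdint.h>

    static int32_t js_to_int32(double d) {
        if (!isfinite(d))
            return 0;                           /* NaN, +/-Inf -> 0 */
        double t = trunc(d);                    /* round toward zero */
        double m = fmod(t, 4294967296.0);       /* wrap modulo 2^32 */
        if (m < 0)
            m += 4294967296.0;
        return (int32_t)(uint32_t)(uint64_t)m;  /* reinterpret as signed */
    }

In hardware that is one instruction (plus, as I understand it, a flag indicating whether the conversion was exact); in portable C it takes a handful of floating-point operations, on what is usually a hot path in a JS engine.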

I’m not exactly sure what all the “cruft” is in ARM that you’re referring to. The M1 only implements AArch64, which is less than 10 years old and is a completely new architecture that is not backwards-compatible with 32-bit ARM (it has been described as being closer to MIPS than to arm32). NEON doesn’t strike me as a good example of cruft because SIMD provides substantial performance gains for math-heavy programs, and in any case 10 years of cruft is much better than 45.

I’m curious as to why RISC-V is different or better? I don’t know much about RISC-V — but looking at the Wikipedia article, it just looks like a generic RISC similar to MIPS or AArch64 (and it’s a couple years older than AArch64 as well). Is there some sort of drastic design difference I’m missing?


The only advantage I’ve heard put forward for RISC-V on single threaded applications is the existence of compressed instructions - which could reduce cache misses albeit at the expense of a slightly more complex decoder. I’m a bit sceptical as to whether this is material though as cache sizes increase.

Of course the flexibility of the RISC-V model allows approaches such as that being pursued by Esperanto [1] with lots and lots of simpler cores.

[1] https://www.esperanto.ai/wp-content/uploads/2021/08/HC2021.E...


ARM had THUMB, which definitely improved performance back in the GameBoy days — but they dropped that with AArch64, so presumably they decided it wasn’t beneficial anymore.


Indeed and IIRC the increased code density got them into Nokia phones too.

I find it hard to believe that they dropped Thumb from AArch64 without a lot of analysis of the impact on performance.


> On top of this, I'm still dumbfounded by the fact that they didn't go for a chiplet design where ARM could truly shine: if Apple had gone the chiplet route, the M1 could have had a much higher IO ceiling and might have a shot at addressing more than 16 gigs of RAM.

Remember that the M1 is just a mobile SoC meant for the iPad/MacBook Air. It's exceptionally good, so people tend to assume it's targeted at the higher end. 16GB max is fine for a mobile SoC in 2021. I can't wait for the M1X.


If you don't think ARM can scale any further, why do you think x86 can? They could easily double all the specs in the "M2" and slap two+ of them into a Mac Pro.


ARM isn’t _the_ reason. It is a reason.

If we were to go back in time to before Apple introduced its own SoC, and before it had acquired chip design start-ups, even back then ARM was on the table, as that is what the iPhone ran.

RISC-V wasn’t a thing back then, so the alternatives were MIPS, Power, or maybe a home-grown instruction set.

So we just have to go with what the reality is now: Apple silicon has been well optimised for performance per watt, and it runs on an ARM instruction set that Apple itself helped design.


On the competition (like Qualcomm): many Apple engineers (IIRC several tens of them) from the chip design team left and formed a new company called Nuvia, which Qualcomm then acquired at the beginning of this year. Apple has had a big lead on SoCs for some years, but I wonder when/if Qualcomm’s acquisition will start to erode that lead.


Agreed that architecture is only a (possibly small) part but Arm’s business model and the other things that Arm brings to the table are also relevant. For example the small / large core approach that Apple uses hasn’t previously been an x86 feature (although clearly that is changing).

Also, non-Apple Arm CPUs do outclass x86 in a huge segment of the CPU marketplace - on smartphones and tablets.

I don’t think that anyone has previously thrown the amount of cash at laptop / desktop / server Arm designs that Intel spends on x86 designs - we are now seeing a number of firms having a serious go (AWS / Ampere / Qualcomm / Nvidia possibly), so with Intel fighting back on its process issues it will be an interesting few years!


I'm curious to see how Alder Lake's clone of the ARM big.LITTLE design performs. I wouldn't be surprised if at the very least it lets Intel beat out AMD in the mobile CPU space. (Which would be great simply because Ryzen 4000 laptops are near impossible to get)


ARM is not a magic bullet but the M1 stems from the trend of phone processors growing up to be competitive in the PC market.

Since almost all (all?) mobile phones use ARM, I can see a similar process of adapting other ARM processors to the PC market.

But of course Apple is so far ahead that we really don't know whether and when there will be a competitive alternative, so it may as well be a streamlined x64, or a new ISA, sure.

It seems to all depend on some very experienced CPU design teams, and it seems Apple actually lost some of them in the last year. I would track that news to get a better model for predicting the future than the ARM/x64 divide.


From listening to interviews with Apple tech folks, they have also done a lot of optimization of their CPUs based on calls frequently used by macOS. I don’t recall specific examples, but the fact that they control both the hardware and the software allows them to profile and tune the system as a whole.


The M1 has very fast memory and bus access. It takes money and is not so sexy for marketing, but it works.



