
ARM processors like A12X are nearing performance parity with desktop processors - kristianp
https://reveried.com/article/arm-processors-nearing-performance-parity-with-x86
======
phire
CISC vs RISC is almost completely irrelevant these days.

They both borrowed so many ideas from each other that the architectures are
nearly identical at this point. Neither modern ARM nor modern x86 deserves to
be called RISC or CISC.

Still, there are a few areas where the A64 architecture is theoretically
"better" than x64, due to the lack of legacy.

The first is instruction decoding: x86 has to deal with a whole bunch of weird
instruction-length modifiers, prefix bytes, odd ModRM encodings and years of
extensions. To decode the 4 or 5 instructions per cycle that modern out-of-order
microarchitectures demand, Intel CPUs have to attempt decoding instructions at
every single byte offset in a 16-byte buffer, throwing away the unwanted
decodings. That's got to waste transistor and power budgets. In comparison,
when an ARM CPU is in A64 mode, all instructions are the same length, making
decoding multiple instructions per cycle trivial.
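
To make that concrete, here is a toy C sketch (real decoders are hardware, not
software, and insn_length below is just a made-up stand-in for x86's real
length rules): with fixed 4-byte instructions the boundaries are known before
any decoding happens, so a wide decoder can work on several in parallel; with
variable lengths, where instruction i+1 starts is only known after instruction
i has been at least partially decoded, unless you speculatively decode at every
byte offset and throw most of it away.

    #include <stddef.h>
    #include <stdint.h>

    /* Fixed-width ISA (A64 style): instruction i starts at byte 4*i, so a
       wide decoder can pull several instructions out of the fetch buffer at
       once without looking at their contents first. */
    void decode_fixed(const uint32_t *buf, size_t n_insns) {
        for (size_t i = 0; i < n_insns; i++) {
            uint32_t insn = buf[i];   /* boundary known up front */
            (void)insn;               /* ...decode fields here... */
        }
    }

    /* Made-up stand-in for x86 length determination; the real rules involve
       prefixes, opcode maps, ModRM/SIB bytes, immediates, etc. */
    static size_t insn_length(const uint8_t *p) {
        return 1 + (p[0] & 0x7);      /* purely illustrative, 1..8 bytes */
    }

    /* Variable-width ISA (x86 style): each boundary depends on the previous
       decode, so finding several boundaries per cycle means guessing, or
       decoding at every offset and discarding the misses. */
    void decode_variable(const uint8_t *buf, size_t n_bytes) {
        size_t off = 0;
        while (off < n_bytes) {
            size_t len = insn_length(buf + off);
            /* ...decode the instruction at buf + off... */
            off += len;
        }
    }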

The second area is memory consistency guarantees. x64 has relatively strong
guarantees, allowing simpler assembly code but at the cost of a more complex
memory subsystem between cores. A64 has much weaker ordering guarantees, which
saves on hardware complexity, but requires the programmer (and/or compiler) to
insert memory fences whenever stricter ordering is needed.
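
As a rough illustration (just a minimal C11-atomics sketch, nothing specific to
either vendor): the release/acquire pair below is what expresses the required
ordering. On a strongly ordered x86 target the compiler can typically emit
plain loads and stores for it, while an AArch64 compiler has to emit
load-acquire/store-release instructions or explicit barriers, which is the
hardware-vs-software trade-off described above.

    #include <stdatomic.h>

    static int payload;            /* plain, non-atomic data                */
    static atomic_int ready;       /* flag; zero-initialized by default     */

    void producer(void) {
        payload = 42;                                     /* 1. write data  */
        atomic_store_explicit(&ready, 1,                  /* 2. publish     */
                              memory_order_release);
    }

    int consumer(void) {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                                             /* wait for flag  */
        return payload;            /* the acquire/release pair orders this  */
    }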

This is all theoretical, I have no idea what difference this makes at the
practical level.

~~~
ajross
> Still, there are a few areas where the A64 architecture is theoretically
> "better" than x64, due to the lack of legacy.

Every extant A64 processor needs to deal with ARMv8 instructions, Thumb
instructions, Thumb2 instructions, shift encoding, microcoded multiple
load/store instructions... It's true that the subset of the ISA that makes up
the overwhelming bulk of instructions actually executed is very simple. But
frankly that's true of x86 as well.

> when an ARM CPU is in A64 mode

That's... not the way hardware works. Those transistors are still there, it's
not like you can make them go away by switching "modes". They still are ready
to switch every cycle, and in any case having transistors that "aren't needed"
doesn't make anything faster, because they all execute in parallel (or at
worst in an extra pipeline stage or two) anyway.

Either you have a simple architecture or you don't. A64 is a simple ISA. Real
CPUs have complicated legacy architectures.

~~~
repiret
It's not meaningful to distinguish Thumb and Thumb2 - Thumb2 was just a bunch
of new instructions added to Thumb.

A typical ARMv8-A processor needs to decode the A64, A32 and T32 instruction
sets, but that doesn't strike me as a significant burden. A32 and T32 are
essentially just different encodings of the same instruction set - there are
very few A32 or T32 instructions that don't have an equivalent in the other
instruction set. A64 has more and wider registers, but is otherwise broadly
similar in capabilities. I would expect that most ARMv8-A implementations
unify the three instruction sets very early in the decoding process.

Retaining the system level aspects of AArch32 strikes me as more expensive,
especially support for short page tables, the subtly different system register
layout, the more complex relationship between PSTATE.M and the security state,
and the banked system registers between the secure and non-secure states. I'm
surprised ARM-designed cores haven't pushed harder to eliminate AArch32 at the
higher exception levels (although I'm aware of some cores designed by ARM
Architecture licensees that do so). Perhaps the fact that ARM has retained
AArch32 all the way up to EL3 is evidence that they believe doing so isn't very
expensive.

~~~
als0
> I'm surprised ARM-designed cores haven't pushed harder to eliminate AArch32
> at the higher exception levels

It seems like they are very slowly phasing it out and heading to 64-bit only.
A32 ISA is only supported in userspace mode (EL0) on the Cortex-A76[1].

[1]
[https://www.theregister.co.uk/2018/05/31/arm_cortex_a76/](https://www.theregister.co.uk/2018/05/31/arm_cortex_a76/)

------
clarry
Ok, where is the benchmark that shows the performance parity? All I've ever
seen is comparisons to low power mobile chips. And then extrapolation from
there.

You can't just extrapolate and assume your 2W (or whatever) phone CPU will be
like a 95W desktop if you just stick on a heatsink & fan and feed higher
voltage & clocks to it.

If it were that simple, the CPU manufacturers could fire a whole lot of
engineers.

It's like, you know, my Honda Accord is reaching performance parity with your
Bugatti. (If I extrapolate based on how much performance I think I'd get by
sticking in a big turbo, a new exhaust, an intercooler, and a higher RPM
redline. It's that simple, right?! No, actually it's not..)

~~~
klodolph
Intel and ARM are just approaching the same points from different directions,
that's all.

We can equally say that you can't extrapolate and assume that your 45W x86,
with lowered clock and voltage, will fit in the power budget of a mobile phone
and still give good performance. But all these companies are throwing a lot of
engineering effort at it and making solid progress. There's not some reason
why x86, the architecture, _should_ be faster than ARM. Intel has been
enjoying a process advantage for many years now, and an enviable R&D budget,
but that R&D budget is spent fighting against diminishing returns and in the
meantime, competitors are catching up. As long as everyone can buy roughly the
same node (a big "if", but looks like we're close enough), diminishing returns
will bring everyone closer to parity.

But, benchmarks show this a bit better. Geekbench (I'm looking at Single
Core):

[https://browser.geekbench.com/mac-
benchmarks](https://browser.geekbench.com/mac-benchmarks)

[https://browser.geekbench.com/ios-
benchmarks](https://browser.geekbench.com/ios-benchmarks)

iPad Pro 11-inch (iPad8,1) has the A12X at 2.5 GHz, score ~5000.

iMac 27-inch retina (iMac18,3) has the i7-7700K at 4.2 GHz, score ~5700.

This is not a cherry-picked comparison... this is just a comparison of
whatever happens to be top of the line in both categories. Note that multi-
core benchmarks will paint a slightly different picture, but that's very
natural, since you can get a Mac with 18 core Xeons. Presumably, adding more
cores to a mobile processor when you switch to desktop and can handle
proportionally higher TDP, while not trivial, is not especially difficult
either.

I assume that different benchmarks give different results as well. This is
just one benchmark I know has results for both platforms.

~~~
MR4D
You make a great point. I remember when I got my 12.9" iPad Pro (gen 2), and
the graphics were fast as hell - Waaaaaay faster than my poor Mac Mini could
do.

Based on that (and yes, both devices are now "old"), I'd trade my Mac Mini
performance for the performance of my iPad without hesitation. Bring it, ARM !

~~~
HeWhoLurksLate
I was going to post a text-heavy reply to you about that, but I decided not
to. Essentially, my family got an iPad 4 and a Bluetooth keyboard case. A few
years later, when Apple was prepping for their iPad Pro and positioning
tablets as desktop replacements, I found that the work that I did was much
easier on the iPad than it was on the Mac Mini (quad-core) that we had,
including doing text editing and stuff of the sort.

Personally, if someone put an ARM laptop in front of me with a workable, non-
spyware OS, I would take it in a heartbeat.

------
ch_123
I really think people put more focus on the RISC vs CISC dichotomy than it
deserves. ARMv8 CPUs have microcode, a mixture of instruction sizes,
instructions which take more than one clock cycle to complete, and a very
large number of instructions overall - i.e. they probably don't fit most
traditional definitions of RISC.

The challenges that ARM faces while competing with x86 are software maturity,
and moving away from low-cost, low-power designs towards larger and more
performant designs (which will consume more power and cost more than
traditional ARM designs).

~~~
baybal2
I am frequently asked about that. This is what I tell people: ARM feels free
to do a complete ISA revision from the ground up once or twice a decade, while
x86, on the other hand, is still bound to incremental improvement on a frail
foundation laid in the seventies. x86 has only had three major revisions:
16-bit to 32-bit to 64-bit (and the last one wasn't even done by Intel itself).

x86's biggest weakness is its dependence on Wintel, which precludes the
much-needed major ISA revisions.

If you look at Atom dies, the overcomplicated decoder and other x86 vestiges
take up more area than the rest of the core.

The A12x is remarkable in that it gets close to 15-watt Intel CPUs with _LESS_
die area and lower power consumption. And if you remove the useless things like
the NPU, DRM stuff, security coprocessor, and other peripherals from the
calculation, the comparison will really begin to look dire for Intel.

Adding to that, even taking into consideration that Intel is still on 14nm
and the A12x is a 7nm part, the A12x would still win even if Intel did a die
shrink to 7nm. And you also have to consider that Intel has squeezed everything
it can in terms of power efficiency from the 14nm node after 5 years of active
development on it, while Apple went with the very first baseline revision of
TSMC 7nm.

Moreover, what I hear from the scene here in Shenzhen is that Apple did not
really put much into power saving in the A12x: it has nothing comparable to
Intel's complex runtime power management, power and clock gating, separate
power domains, and on-package smart DC-DC converters. If Apple commits itself
to squeezing more power efficiency from its chips with equal zeal, I believe
they'd add an additional 25-35% to their power advantage.

~~~
userbinator
_If you look at atom dies, the overcomplicated decoder and other x86 vestiges
take more area than the rest of the core._

The Atom is a bit of an edge-case since it has barely any cache (and the
performance is exactly what you'd expect from that), yet it still takes up a
significant amount of the die; in all other CPUs, the caches are far bigger.

------
robocat
From a CloudFlare article: "In our analysis, we found that even if Intel gave
us the chips for free, it would still make sense to switch to ARM, because the
power efficiency is so much better."

[https://www.datacenterknowledge.com/design/cloudflare-
bets-a...](https://www.datacenterknowledge.com/design/cloudflare-bets-arm-
servers-it-expands-its-data-center-network)

~~~
incompatible
Any tips on good/cheap ARM-based boards for a desktop system that would pay
for themselves in electricity savings?

~~~
robocat
CloudFlare have unusual needs where ARM is very competitive, unlike desktop
usage. From article:

“Every request that comes in to Cloudflare is independent of every other
request, so what we really need is as many cores per Watt as we can possibly
get,” Prince explained. “The only metric we spend time thinking about is cores
per Watt and requests per Watt.” The ARM-based Qualcomm Centriq processors
perform very well by that measure. “They've got very high core counts at very
lower power utilization in Gen 1, and in Gen 2 they're just going to widen
their lead.”

~~~
snaky
So the ideal arch for that would be a non-CPU raw state machine, like Silego
GreenPAK5 ASM cores, but they would be better off asking Dialog to remove all
the peripherals, make the state machines a bit bigger, and pack thousands of
ASM cores into one chip. And then sign an NDA, get the low-level proprietary
format docs, and write the state machine code generator in Haskell or Coq.

~~~
fb03
Sounds fun, but still, in your hypothetical scenario: all that complexity, the
proprietary cores and the NDAs would essentially tie them to the chip supplier,
choking off the evolution of the whole CF architecture in the long run.

If the race is for requests per watt alone, then you're probably right, but
there's always real-world grittiness that needs to be addressed. That's how
Google succeeded, right? Leveraging common platforms and hardware coupled with
their highly specialized software.

------
chx
The title is misleading:

> ARM processors like the A12X Bionic are nearing performance parity with
> high-end desktop processors

the reality is

> ARM processors from Apple like the A12X Bionic are nearing performance
> parity with high-end desktop processors

There are no other ARM CPUs that are this fast. Not even close. Yes, Geekbench
is not the best, but still, if you look at
[https://browser.geekbench.com/android-
benchmarks](https://browser.geekbench.com/android-benchmarks) vs
[https://browser.geekbench.com/ios-
benchmarks](https://browser.geekbench.com/ios-benchmarks) the difference is
staggering, in multicore the difference at the top is close to 100%. And no, I
am not an Apple fanboy, couldn't be further from the truth, see
[http://drupal4hu.com/future/freedom](http://drupal4hu.com/future/freedom) my
post from almost a decade ago.

------
jbk
From what we are seeing with the AV1 video decoder dav1d, where we wrote a lot
of assembly by hand, the A12X can do 40 fps while a desktop (4 cores) can go
beyond 120 fps.

So, it is getting closer than ever before, but we're still quite far from
closing the gap.

~~~
ynniv
You're probably not using a desktop CPU with the same wattage. If Apple makes
a 40 watt ARM laptop CPU, it might behave similarly to the x86.

~~~
SomeHacker44
I am not a hardware engineer, but my guess is that making a high-performing
single-digit-watt processor may take different skills than doing the same
thing for a 15-28-45-90-135-160W (or more) sustained-TDP CPU.

My only point is that assuming Apple makes the best low power CPUs should not
necessarily imply they can make good high power CPUs. They may have to build
up the competency over time just as they did in the initial iterations of the
A series.

~~~
Symmetry
Once you're making wide out of order application processors the skills are
pretty much the same for either. But it does take quite a while to do a new
architecture from scratch and you would almost need to do just that to re-
design the A12 for such high power targets.

------
sliken
There are quite a few fallacies in this article. The worst offenders are that
performance scales with clock rate, that different CPU designs can scale to the
same clock rate, and that power use/heat is anywhere near linear with respect
to clock rate.

However, it is still impressive that the A12X could hit 80% of a fairly
aggressive Intel design at 60% of the clock.

Certainly active cooling would help ARM chips sustain the performance they can
get for short periods of time without active cooling.

So Arm is closing the gap and the A12x is a pretty impressive chip. Certainly
plenty for many use cases met today by Intel Desktops/Laptops.

But hitting the same clock speeds Intel is using might well require Apple to
add an extra stage to the pipeline and/or run the cache at a lower fraction of
the CPU clock. Not to mention that increasing clock speeds without decreasing
memory latency will also hurt IPC. Any of these changes would hurt IPC and make
it that much harder to reach 100% parity with Intel.

~~~
gpderetta
> Arm is closing the gap and the A12x

s/ARM/Apple/ really. They have a great chip development team, but they are
focusing, obviously, on mobile and do not seem to be interested yet in
desktop, and even less in servers, except possibly as a byproduct of their
mobile development.

~~~
pault
> but they are focusing, obviously, on mobile

Isn't it pretty much taken as a given that Apple is going to produce an ARM
powered laptop in the next couple of years?

~~~
monocasa
It's not an obvious win like the last couple of architecture switches because
of the end of Moore's law. Like the PowerPC chips they were initially using
ran 68k code quicker in a non-JITing emulator than any 68k they could buy.
They had to switch to a JIT with Rosetta, but that same perf distinction was
still true for the high end for PowerPC/Intel during their switch.

Running x86 code faster in an emulator than on a real chip might not ever
happen.

And in not too long the x86-64 patents will have expired all the way through
SSE4... I think Apple making their own x86 chips is just as likely as
switching to ARM.

------
imtringued
I really don't understand the logic behind this article. First you test two
chips, one tuned for sustained loads and one tuned for short bursts. You pick
tests that prioritize short bursts and do not require cooling, which then of
course produce similar performance numbers. The gap in power/cooling
requirements is then considered impressive but then you turn around and
extrapolate how much faster or better the ARM chip could be if it had the same
level of cooling as the desktop chip. Except this completely defies the logic
in the first part of the paragraph that the power and cooling do not affect
the performance in the short burst tests. Those ARM chips won't get faster,
they will just have the same speed as they have today.

~~~
w0utert
>> _Except this completely defies the logic in the first part of the paragraph
that the power and cooling do not affect the performance in the short burst
tests. Those ARM chips won't get faster, they will just have the same speed
as they have today._

I think the logic is that with better cooling, you could have the same ARM
chip running at higher voltage and clock speeds, under sustained load, and the
result would compare favorably for the ARM chip both on 'burst performance'
and 'sustained performance' metrics.

What makes you think this is not true? Is there anything in fast ARM chip
designs that makes them only optimized for burst loads and hence inherently
unusable for sustained loads? In a sense, you could make the same argument for
x86 desktop CPUs, seeing as they are also not able to maintain boost clocks
very long under sustained loads either.

The article specifically addresses this point: current ARM chips are mostly
held back by the passive cooling of phones and tablets, which is a property of
the device itself and not of the possible performance you could theoretically
get from the CPU.

~~~
LoSboccacc
> the logic is that with better cooling, you could have the same ARM chip
> running at higher voltage and clock speeds

so it's netburst all over again, see how _that_ turned out.

~~~
detaro
No, the fact that small, passive-cooled devices have to be thermally limited,
and the same chips with better cooling can run at high speeds longer is not
"Netburst all over again".

~~~
LoSboccacc
but that's the whole point, ain't it? there's zero indication that making
cooling better would make the chip faster.

higher voltage could, with cooling being a consequence. at best, cooling could
stave off throttling.

but increasing voltage at 7nm gets you massive leakage currents real fast, so
you hit a wall in scaling that's not just about cooling it better.

so yes, basically all the misconceptions that netburst had about power, density
and scaling, resurfacing as "but no! it's about the cooling"

basically everyone defending this is assuming power, cooling, clock speed and
transistor density are independent variables, which they aren't.

> "the fact"

yeah.

~~~
detaro
Look at modern laptops/tablets/mini-PCs using Intel CPUs. The exact same part,
with different cooling systems and accordingly set power profiles, is used in
different devices, with performance differing accordingly. Variants of
throttling (or temporary boost, but that's the same with different names)
based on cooling performance happen in lots of compact devices (x86 or ARM),
which better cooling can delay or completely remove. Of course there's limits
to that, and a 5W part won't just scale up to a good 50W part, but there is
room there. These approaches weren't really a thing for Netburst, which was
firmly a desktop architecture from the start and got pushed higher and higher.

~~~
LoSboccacc
> The exact same part

they have less cache on board, fewer cores, a different integrated gpu; they
don't support higher frequency ddr and they have limits on memory bandwidth.

are you sure you don't want to check in with any of those facts of yours
before pursuing further conversation?

modern cpu architectures are built to fit their own constraints maximally. once
you start changing voltage and making some part of the cpu hotter than the
original spec, you might very well find that whole parts of the chip need to
be shifted around or redesigned to spread the load differently.

of course a slower chip performs better watt for watt, the whole point is
that the relationship is not linear! that doesn't mean you can just upclock
the chip by adding cooling, nor that upclocking a chip won't require a
significant redesign.

those parts are already pushing their envelope, or are you implying Apple is
specifically wasting money on their chips?

this seems a good time to remind how clock speed and chip features are
intertwined with yield and impurities
[https://en.wikipedia.org/wiki/Overclocking#Factors_allowing_...](https://en.wikipedia.org/wiki/Overclocking#Factors_allowing_overclocking)

it's not like "just add cooling to the cpu and it'll tolerate a higher
voltage" - not at all.

~~~
w0utert
_> > those parts are already pushing their envelope, or are you implying Apple
is specifically wasting money on their chips?_

I would say the envelope Apple is pushing with their designs is currently
_almost exclusively_ bound by the working environment their SoCs run in:
limited cooling and limited battery. They probably spend more time optimizing
their software to make more efficient use of their chips than optimizing their
cooling solution, because there simply is no room for fans and airflow.

That does not say anything about how suitable these chips _could_ be with
better cooling though. The fact that they are optimized for low power does not
mean they cannot run, or could not be minimally redesigned to run, at higher
clock speeds. In fact, that's _exactly_ what Apple is already doing, by using
virtually identical variations of their SoCs across iPhone, iPad and Apple TV,
running at different clock speeds.

 _> > are you sure you don't want to check in with any of those facts of yours
before pursuing further conversation?

>> this seems a good time to remind how clock speed and chip features are
intertwined with yield and impurities_

I don't know why you need to be so dismissive and aggressive in your comments,
especially since so far you have not brought up anything countering any of the
arguments made by anyone else in this thread.

Maybe you can address the observation already made by detaro about Intel
making literally 20 different variations of the same CPUs, scaling from the
ULV end with low clock speeds and limited cooling options, all the way up to
the HPC end where clock speeds, TDP, etc. are large multiples of what goes
into the ULV parts? What makes you think this is only possible with x86 chips
and not with the ARM-based designs Apple uses? Do you think each variation of
an Intel x86 chip from the same generation is a completely different design
that was built from the ground up to fit that particular use case?

You seem to be stuck on equating having the option of a better cooling
solution so you can push the design of e.g. an A12 chip to higher clock speeds
and close the already small gap with x86 chips, with going full-scale Netburst
architecture with low IPC compensated by crazy clock speeds and ultra-deep
pipelines, by means of nothing more than pushing an imaginary turbo button and
calling it a day. Nobody suggested that but you.

~~~
0815test
> Maybe you can address the observation already made by dotaro about Intel
> making literally 20 different variations of the same CPU's, scaling from the
> ULV end with low clock speeds and limited cooling options, all the way up to
> the HPC end where clock speeds, TDP, etc. are large multiples of what goes
> into the ULV parts?

Just because you _can_ scale down an HEDT- or HPC-focused design to the point
of making it run as a ULV chip under very challenging thermals, doesn't mean
that the resulting chip will perform very well. We've seen this time and time
again with x86 vendors trying and failing to enter the lucrative "mobile"
segment. And it's not clear why we should expect a different outcome when
mobile-focused vendors try the reverse play, by attempting to "scale up" their
existing designs. One size very much doesn't fit all in the semiconductor
industry.

~~~
Symmetry
Exactly. Part of the reason that Apple's chips do so well against Intel at a
given power level is that Apple is operating at the frequencies their chips
are designed for, whereas Intel is far away from its sweet spot. You can cover
a variety of power targets with one microarchitecture, but you're going to do
so less efficiently when you're far away from your design point.

And while I bet you could overclock an Apple core somewhat if you used liquid
nitrogen or whatever, it will still have more logic between clock latches than
an Intel processor does. Intel's deeper pipelining means that it will be able
to clock higher than Apple can for any given process/voltage/temperature
combination. Apple has some very talented CPU architects and I'm sure they
could design a high-performance chip. But it won't be the same one that runs
in iPhones.

------
DCKing
The article has its problems, but regardless of that, it seems clear that
Apple has had the technology to launch desktop class hardware based on ARM for
some time now. I'm not just talking about laptops - Apple's microarchitecture
expertise makes it entirely likely that their fabled next Mac Pro is entirely
ARM-based. If you vastly relax the power, die size and IO constraints on this
microarchitecture, it could easily become a better choice than Xeons based
on sheer performance.

It's not as easy as it sounds, both technically and business wise (does Apple
have enough economy of scale just on the high-end desktop compared to Intel?
Doubtful.), but it's entirely feasible.

~~~
dm3730
> Apple's microarchitecture expertise makes it entirely likely that their
> fabled next Mac Pro is entirely ARM-based

I would like to understand this statement better. In 2012, I read this article
saying ARM chips were matching x86 chips and that we'd be seeing ARM desktops,
ARM servers, and ARM laptops within 2-3 years.
[https://liliputing.com/2012/02/fastest-arm-chips-are-
compara...](https://liliputing.com/2012/02/fastest-arm-chips-are-comparable-
to-intels-slowest-atom-chips.html)

But it is now 2019. Aside from my phone, access point, and tablet, which are
ARM based, everything else is still x86-64. Am I an outlier? What data leads
you to believe that "entirely likely that their fabled next Mac Pro is
entirely ARM-based"?

> does Apple have enough economy of scale just on the high-end desktop
> compared to Intel? Doubtful

I don't understand this statement. Is the volume of chips "manufactured" by
Apple significantly lower than the volume of chips manufactured by Intel?

~~~
DCKing
I would argue that x86's dominance is actually Intel's dominance. Intel just
has a very good tactical position in terms of high-performance hardware. Intel
is a better source for hardware reliability, platform reliability, logistics
and being able to deliver in volume, mostly already has existing contracts in
place, and is still one of the leaders in performance. Even AMD - which makes
x86 hardware and should be the easiest to transition to - has trouble
getting more than single-digit market share gains despite making a mostly
superior product performance-wise for the first time in more than a decade.

Apple is affected the least by this Intel lock-in, since they are the biggest
seller of high-performance hardware to consumers and are capable of running
their own support infrastructure. Moreover, they have been heavily investing
in and building up a very successful processor unit of their own. Finally,
Apple has famously transitioned their entire hardware platform multiple times
already, when it felt like their current hardware platform didn't suit them
strategically. They've shown themselves capable of supporting their own oddball
hardware platforms before Intel, when they were a lot smaller still. Given
Apple's level of control over their own technology and Intel's recent
stagnation, I think it's very likely Apple _wants to move in this direction_
if they are capable of doing it from a business perspective.

The Mac Pro might be a bold product to begin with, since it's so focused on
professional users and Apple is really particular about having a bold vision on
the high-end desktop. It would also be a clear signal of their strategy to
transition the Mac Pro to this new architecture. The reason I threw in that
sentence about economy of scale is that the high-end desktop is a small market
for Apple, and it might not be worth it for them to make large workstation-class
chips for such a market. Especially considering that Intel's high-end Xeon
W line is based on the exact same silicon as Intel's high-end servers, and
therefore Intel just makes a lot more high-end chips than Apple would.

Of course, Apple would be able to work around this with a chiplet architecture
like AMD is doing, but I feel that we're already too far in speculation
territory.

------
throw2016
Anything serious on ARM runs the risk of throttling. Upping performance will
mean more power usage and heat.

The even bigger problem is ARM's closed nature, with no support for the open,
off-the-shelf culture that has made the PC industry what it is today. ARM is
about closed SoCs, closed drivers and closed vendors, and this in effect closes
up the driver and software ecosystem.

Something like Linux and the open source movement would not have happened with
this hardware model, and it is a paradox of our times that it is Linux,
developed because of the open culture of x86, that is used to support this
closed model. There is something ironic, even parasitic, about this.

There are large forces of centralization and control currently in play and
getting excited about a closed ecosystem becoming mainstream seems
shortsighted for the tech ecosystem and consumers who have benefited from
widespread choice, competition and the open source movement in x86.

------
nickpsecurity
One thing the author misses is standard cell vs full-custom design. Most ARMs
that I’m aware of are standard cell. Intel and AMD do their x86’s with
full-custom design. Standard cell lets you write in a high-level language that’s
synthesized into a low-level, logical form using combinations of building blocks
(“standard cells”). Like high-level programming vs assembly, there are all kinds
of performance costs to this vs making a custom, low-level solution ideal for
the problem and the process it runs on. Like doing huge apps in assembly, you
need specialists that might cost more, doing work that will take way, way, way
longer, with more difficulties in verification (more rework).

I don’t know if Apple’s ARM is fully custom. It wouldn’t surprise me if the
fast-path parts are. Standard cell designs can be pretty fast due to constant
advances in synthesis. They’ll always be behind full-custom on the same process
node just because the latter puts more optimization effort in. Most choose
standard cell since it’s faster to develop (time to market) and cheaper. Those
wanting max performance or lowest energy will be using full-custom if they can
afford it. Also worth noting that the Apple A12 is 7nm vs the Core i7’s 14nm per
Intel’s site. An apples-to-apples comparison would put that design on 14nm or a
node with similar performance to it.

Btw, there’s detailed analysis below of the A12 with specs, parts breakdown,
and die shot.

[https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-
re...](https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-review-
unveiling-the-silicon-secrets/2)

[https://en.wikipedia.org/wiki/Standard_cell](https://en.wikipedia.org/wiki/Standard_cell)

[https://en.wikipedia.org/wiki/Application-
specific_integrate...](https://en.wikipedia.org/wiki/Application-
specific_integrated_circuit)

------
FavouriteColour
Great article! It certainly supports the idea that Apple switching to ARM
processors for their laptops isn’t crazy talk. Perhaps they’d retain an
Intel-compatible CPU for a few generations to execute Intel binaries until the
shift is complete.

BTW, the article uses the acronym IPC without explaining it. It stands for
Instructions Per Cycle. CPUs can and do execute multiple instructions per
clock cycle so this is just a measure of how many.

~~~
pjc50
Apple are more likely to choose "fat binaries" again. They're possibly the
only company who could announce an architecture switch, OS revision bump, and
corresponding changes to development tools all at once.

~~~
gruturo
Also, they're known for having pulled this off successfully, twice (68000 to
PPC, PPC to Intel), which certainly would lend them credibility if they
decided to go for it.

I would buy such a laptop with zero qualms about them bungling the migration.
If it had a decent keyboard and got rid of the touch bar. I'm much more
concerned about having my experience ruined by those.

~~~
whynotminot
Aside from missing the escape key when I'm in VIM, the Touch Bar hasn't
"ruined" my experience with a Macbook Pro.

I'll readily admit the thing is more cool than genuinely useful. It's just not
such a gimmick as to actually "ruin" an experience. And the gain from TouchID
offsets my pain from not having a physical escape key.

In this respect, the MacBook Air gets things right.

WRT the keyboard... your mileage will vary. I hated it at first. Pretty used
to it now.

~~~
gruturo
I don't hate the keyboard. I tried it in a store and after getting over the
surprise of the extremely short key travel distance, it is quite usable - but
the failure rate, the annoyances, the potential out-of-warranty cost turn it
into a complete dealbreaker. It is simply not reliable enough, despite the
redesign. Maybe now at the second redesign it's finally OK but no way I'm
paying for one until we're certain, and that certainty will take a couple
years of real world usage to materialize.

Regarding the touch bar: Indeed the Escape key is what breaks the deal.
Honestly if it began _after_ ESC I'd tolerate it - expensive, close to
useless, but tolerable.

------
bryanlarsen
The A12X is a 10 billion transistor 7 nm 12W chip.

The i7-6700K is a 3 billion transistor 14 nm 95W chip.

If you assume linear scaling on all three metrics (bad assumption, but rough
rule of thumb) you get 10/3 * 14/7 * 12/95 -> 85%, roughly in line with
benchmark results.
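
For what it's worth, spelling that back-of-the-envelope arithmetic out (with
the same caveat as above, that linear scaling is only a rough rule of thumb):

    #include <stdio.h>

    int main(void) {
        double transistors = 10.0 / 3.0;   /* A12X vs 6700K transistor count */
        double process     = 14.0 / 7.0;   /* 14 nm vs 7 nm                  */
        double power       = 12.0 / 95.0;  /* 12 W vs 95 W                   */
        printf("%.0f%%\n", 100.0 * transistors * process * power);  /* ~84%   */
        return 0;
    }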

~~~
Entalpi
The A12X includes far more circuits (due to it being an entire SoC) than a
typical Intel CPU.

I believe the GPUs are very different as well, and that could play a large role
given how GPUs love to eat up mm².

~~~
DCKing
The i7 6700K is also a SoC and includes a GPU (weaker than the A12X's) and
many other components that are also included in the A12X. It doesn't have
quite the same level of integration as the A12X, but characterizing one as a
'SoC' and the other as a 'CPU' is inaccurate.

~~~
Symmetry
It's all a matter of degree. On the 6700K the CPU+GPU take up around 80% of
the die whereas on the A12X it's around 40%.

[https://thinkcomputers.org/intel-skylake-die-layout-
detailed...](https://thinkcomputers.org/intel-skylake-die-layout-detailed/)

[https://www.techinsights.com/uploadedImages/Public_Website/C...](https://www.techinsights.com/uploadedImages/Public_Website/Content_-
_Primary/TechInsights_2017/Technology_Blogs/APL1W81_TMJA46P_floorplan.jpg)

~~~
DCKing
That's a die shot of the Apple A12, not the A12X (which has four big cores).
Curious what the extra space goes to though.

------
gumby
BTW ARM does not stand for “Advanced RISC Machines” as the author says but “
_Acorn_ RISC Machines”. ARM was originally a joint venture between Acorn and
Apple.

~~~
detaro
And that joint venture was called "Advanced RISC Machines Ltd." Acorn RISC
machines was the project name before that cooperation.

~~~
gumby
Yeah, I remembered some apple folks working on the 610 at Acorn but that was
before the JV was formed. The JV I was talking about did, as you say, use
"advanced".

------
asdfrouge
I recently found this interview from 2012 where Amazon's VP explains why he
thinks mobile CPU architectures will take over the server space. Seven years
later, we are getting closer.

[https://www.youtube.com/watch?v=BOYdKht1YwE](https://www.youtube.com/watch?v=BOYdKht1YwE)

~~~
akhilcacharya
This is a really interesting video, not only for the content but because AWS
ended up canceling their relationship with AMD to buy Annapurna Labs to build
their first ARM CPU.

------
tonyedgecombe
It doesn't really matter whether they are comparable to desktop processors.
What's important is whether they are good enough. Judging by the performance of
the iPad Pro, the answer is almost certainly yes for many or even most users.

------
raesene9
Worth noting that you can buy Windows ARM laptops now, and that there are ARM
apps available for them.

[https://www.neowin.net/news/lenovo-yoga-c630-review-
windows-...](https://www.neowin.net/news/lenovo-yoga-c630-review-windows-on-
arm-in-a-real-laptop-with-8gb-ram/)

~~~
akhilcacharya
The problem for me is they only support Windows currently - I'd love for a
fast 8CX machine that ran Linux. The closest thing right now is the OP1/RK3399
in some of the mainline Chromebooks but I'm not sure if those have full Linux
support yet.

------
altmind
The article already mentions this, but I want to re-state that Geekbench
results do not correlate with real world performance and systematically favor
iOS.

The community never received an explanation from the authors on these
discrepancies, so everybody should be cautious of the promises that these
biases are eliminated in GB4.

~~~
dep_b
> Geekbench results do not correlate with real world performance and
> systematically favor iOS.

So what benchmark would be fair to you? Even Cinebench doesn't closely
approach a daily workload for the typical Cinema 4D user, it's just a
different way to tax the system at full.

Also, throttling is less of (or even completely not?) an issue on iPads than
on iPhones. You see that on benchmarks that typically tend to reach the
throttling limits on iPhones like AnTuTu. The performance gap is much wider
while the SoC isn't that much more powerful.

A desktop system with an ARM processor should closely match the synthetic
results of GeekBench, since GeekBench measures maximum performance, not
sustained performance.

------
sigi45
As long as it is not much faster and cheaper than x86, there is no reason for
me to switch. I think I can't even switch at the moment.

But I don't mind more competition on the market as long as it doesn't cost me
headaches like 'this package doesn't compile on ARM' or whatever.

------
falsedan
Really looking forward to an ARM-based JS co-processor in my desktop that just
runs a bunch of V8 VMs.

------
AmVess
The A12X has reached parity or better with Intel in terms of integer
performance, but what about floating point? Even if they are way behind Intel
in that regard, I could see Apple making a desktop version with a beefy FP unit
that a phone SoC wouldn't need.

~~~
baybal2
Floating point is not a totem animal of the computing world anymore. If you
have a lot of FP math to do, a specialised accelerator will do it 100 times
faster than a CPU.

Integer math, on the other hand, is what most of the programs you use every
day are made of. And their complex, branch-heavy code is near impossible to
feed to any specialised DSP.

Modern CPUs must be compared on integer math and logical operation
performance, followed by their IO performance.

What an average user understands as performance today is really (integer perf
+ logic op perf) * effective I/O throughput.

~~~
jononor
That accelerator for floating point is usually the GPU.

------
jinpan
Given that the A12X runs at < 10% of the TDP of an Intel i7 chip, would it be
feasible to utilize many of these A12Xs on a 10+ socket motherboard to
significantly speed up NUMA-friendly workloads?

------
russellbeattie
Question: One of the points in the article is that in order to compare a
desktop x86 chip to an ARM chip, you have to take clock speed into account. I
assume both architectures are optimized for certain speeds at a low level
(like literally the electricity flowing through the transistors), so you can't
just overclock the hell out of an ARM (with cooling, etc.) to reach desktop
speeds, nor vice versa to make a low-powered x86 chip.

Is this correct? Is the final output of the silicon from each architecture
fundamentally different?

------
markhahn
insight-free. no one has cared about RISC vs CISC for decades; they care about
TCO (which includes the network effects of their preferred software stack,
etc). ARM's big problem is that it's been overpromising (to the desktop and
server world) for years. ultimately, joules-per-flop is going to be the same,
no matter whether the FPU is wrapped inside x86 or ARM. so the question
becomes: how cheap are decent/high-end ARM chips going to be? it's not nice to
be "ahead" in a race to the bottom.

------
nottorp
So can you build a system with an A12X or equivalent that can saturate a SATA
3 SSD? Not to mention NVMe.

Can you put 32+ GB of dual-channel RAM in there?

It's not only the CPU performance that matters...

~~~
Synaesthesia
The iPhones have high-speed SSDs and it’s a big part of their performance.
Yeah they absolutely can saturate it.

------
sneak
> _To be clear, this doesn't indicate that ARM chips are slower, just that
> they aren’t natively supported by desktop OSes like Windows and MacOS._

This made me chuckle.

------
ahartmetz
Now if Apple could be bothered to sell its fine CPUs for Linux machines. Not
going to happen of course, so we have to wait for Qualcomm, AMD (K12), Huawei,
...

------
kbumsik
What about the impact of architecture-specific optimizations such as AVX vs
NEON? I'm not sure, but I suppose many desktop applications currently assume
the users are on x86, so they never think about ARM SIMD instructions.
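
For illustration, a minimal sketch of the kind of per-architecture SIMD path
this implies (a hypothetical add_floats using the standard AVX and NEON
intrinsics headers); code written against only one of these gets no vector
path at all on the other architecture:

    #include <stddef.h>

    #if defined(__AVX__)
    #include <immintrin.h>
    #elif defined(__ARM_NEON)
    #include <arm_neon.h>
    #endif

    /* Hypothetical example: element-wise addition of two float arrays,
       vectorized with AVX on x86 and NEON on ARM, scalar fallback otherwise. */
    void add_floats(float *dst, const float *a, const float *b, size_t n) {
        size_t i = 0;
    #if defined(__AVX__)
        for (; i + 8 <= n; i += 8) {            /* 8 floats per 256-bit register */
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
        }
    #elif defined(__ARM_NEON)
        for (; i + 4 <= n; i += 4) {            /* 4 floats per 128-bit register */
            float32x4_t va = vld1q_f32(a + i);
            float32x4_t vb = vld1q_f32(b + i);
            vst1q_f32(dst + i, vaddq_f32(va, vb));
        }
    #endif
        for (; i < n; i++)                      /* scalar tail / fallback        */
            dst[i] = a[i] + b[i];
    }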

------
C1sc0cat
The Intel processor compared is a 14nm part from 2015; how old is this blog
post?

~~~
Synaesthesia
Intel CPUs haven’t really made huge leaps in performance or power consumption
since then, so it’s a fair comparison.

~~~
C1sc0cat
The i9-9960X is over 40% faster, and it's a 16-core/32-thread part vs a
4-core/8-thread one.

------
thisisit
I have heard that even actual desktop-level processors like the ThunderX2
aren't as good as Intel processors. So, what chance does the A12X really have?

~~~
Veedrac
ThunderX2 isn't made by Apple.

------
PaulHoule
Note that high end phones sell for more than low end PCs, and as long as that
is the case there will not be a push to ARMify desktop machines.

------
simula67
I hope it's a RISC-V processor that wins out and not an ARM one. RISC-V is
more open than ARM.

Make it happen, Apple!

~~~
detaro
Unlikely. Apple has tons of ARM expertise now, while RISC-V will still take
lots of work to get into the same league, and they aren't so price-sensitive
that the ARM license costs are a big problem.

~~~
childintime
Apple is performance-sensitive though, which matters a few years down the
road. RISC-V has some pretty compelling technical advantages in that
department: overall simplicity, compressed code density. Now add easy
customization and the fact that the open ecosystem will gravitate towards it,
and it is likely a winner regardless. So after 5-10 years Apple may have to
abandon ARM.

------
throwaway2048
Call me when there is something to show besides Geekbench synthetic scores
(which heavily favor Apple iOS devices, going as far as utilizing ASICs and
specialized instructions for stuff like compression and JavaScript) and
JavaScript benchmarks which run for milliseconds and are super prone to highly
specific tuning (see the general uselessness of stuff like SunSpider for
actually measuring anything a user would consider to be performance).

from the article:

    
    
        The next issue I want to address are fallacies 
        I've seen permeate discussions around ARM performance: 
        that a benchmark is not at all useful because it is 
        flawed or not objective in some way. I understand the
        reaction to a degree (I once wrote an article 
        criticising camera benchmarks that reduce complex data 
        into single scalar numbers), but I also believe that 
        it’s possible to understand the shortcomings of a 
        benchmark, not to approach it objectively, and to 
        properly consider what it might indicate, rather than 
        dismissing it out of hand. 
    

This is some pretty insipid hand-waving; they then go on to address exactly
none of the shortcomings, and keep pretending that these benchmarks
generalize meaningfully.

~~~
BeeOnRope
Actual application benchmarks and many Spec2006 sub-tests show up right in the
same ballpark.

Where Intel still has a big lead is in multi-threaded performance and heavy
SIMD use.

