
Arm unveils 7nm Cortex-A76 CPU - signa11
https://www.anandtech.com/show/12785/arm-cortex-a76-cpu-unveiled-7nm-powerhouse
======
supernova87a
I'm not a hardware-knowledgeable guy, but would love to read/hear an intro
about what chip design is all about (at a lay person's level I guess).

I understand that ARM sells core designs, and that each company assembles them
/ designs them in the way that suits their performance needs, and then gets them
manufactured?

But what does that really mean? What is the "user" doing? Are they arranging
them like kids' Lego blocks on a die surface? Dragging and dropping blocks of
code? Are they tweaking the voltage settings? Are they combining them in ways
that do sequential operations special to them? What's the added step here?
Like, why doesn't ARM just create the chips that the end OEMs want?

Is it like cooking or chemistry? What's the closest analogy? I would love to
get a more intuitive feel about what chip design is all about.

Thanks!

~~~
fpgaminer
You've got the right general idea already :)

> I understand that ARM sells core designs, and that each company assembles
> them / designs them in the way suits their performance needs, and then gets
> them manufactured?

Most consumers of ARM cores are simply interested in integrating them into a
larger design; usually an SoC. Not so much in tweaking performance, though
certainly they'll choose whether to prioritize performance or power in their
application.

> But what does that really mean? What is the "user" doing? Are they arranging
> them like kids' Lego blocks on a die surface? Dragging and dropping blocks
> of code?

Pretty much like Lego blocks. Most shops just want an SoC that does X, Y, and
Z with Foo requirements. So they grab an ARM core, an HDMI 2.0 RX core, and
an H.265 core, and glue them together.

Depending on what tools they're using this "gluing" is specified in different
ways. You can design it somewhat abstractly in a block diagram, where you
specify all the cores you want, specify what (virtual) pins from each core
connect to what others, and maybe tweak a few parameters on some of the cores.
It looks kinda like this: [https://www.altera.com/content/dam/altera-www/global/en_US/i...](https://www.altera.com/content/dam/altera-www/global/en_US/images/support/examples/quartus/images/altpll_fig2.gif)

Or there are even higher level tools that let you specify and hook these
things together in a GUI specifically designed for building designs like this.
They look like this:
[https://i.stack.imgur.com/yud23.png](https://i.stack.imgur.com/yud23.png)

But those and other higher level tools all basically just have a compiler that
converts specifications into code (Verilog or VHDL).
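To give a flavor of what that generated code step looks like, here's a toy sketch of turning a block diagram into Verilog-ish text. Everything in it (core names, port names, the single shared bus) is invented for illustration; real generators emit clocks, resets, address maps, and much more.

```python
# Toy sketch of the "block diagram -> HDL" step. All module and port
# names below are hypothetical; real tools are far more involved.

def emit_verilog(top_name, instances, wires):
    """instances: list of (module, instance_name, {port: wire}).
    wires: names of the nets connecting the blocks."""
    lines = [f"module {top_name};"]
    for w in wires:
        lines.append(f"  wire {w};")
    for module, inst, ports in instances:
        conns = ", ".join(f".{p}({w})" for p, w in sorted(ports.items()))
        lines.append(f"  {module} {inst} ({conns});")
    lines.append("endmodule")
    return "\n".join(lines)

# Hypothetical SoC: a CPU, an HDMI receiver, and a video decoder
# all hanging off one bus net.
design = emit_verilog(
    "soc_top",
    [
        ("arm_core", "cpu0",  {"axi_m": "bus0"}),
        ("hdmi_rx",  "hdmi0", {"axi_s": "bus0"}),
        ("h265_dec", "vdec0", {"axi_s": "bus0"}),
    ],
    ["bus0"],
)
print(design)
```

The point is just that the GUI's boxes and lines boil down to module instantiations and wire declarations in the output HDL.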

That is then handed to another compiler which creates a netlist. Think of this
like assembly code but for hardware. It specifies the whole design at the
level of logical operations. 2-bit AND here, 4-bit Full Adder there, etc.
Finally that netlist is thrown through _another_ compiler which does final
place and route. Place and route is where the netlist is converted into the
transistors and wires for the actual die, and then rendered out into all the
layer masks that will be sent to the fab. (I'm glossing over a few details
here. E.g. place and route actually works from a library of transistor
designs for each possible logic gate, pre-designed by the silicon fab that
you're going to send your masks to.) It sounds simple, but until place and
route your design is just an abstract spaghetti of logical operations and the
connections between them. Place and route has to solve an NP-complete
optimization problem to figure out where, on the physical die, all the
transistors will go, given a set of constraints (transistors need to be close
enough to their neighbors to meet the performance requirements).
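To make the placement half of that concrete, here's a toy sketch under heavily simplified, invented assumptions: three hypothetical gates, a 2x2 grid of die sites, cost = total Manhattan wirelength, and exhaustive search instead of the heuristics real tools must use at scale.

```python
# Toy placement: assign gates to physical sites so that connected
# gates end up close together. Real P&R handles millions of cells,
# routing congestion, and timing; this is just the shape of the problem.
from itertools import permutations

gates = ["and1", "add1", "ff1"]
nets  = [("and1", "add1"), ("add1", "ff1")]   # connectivity from the netlist
slots = [(0, 0), (0, 1), (1, 0), (1, 1)]      # physical sites on the die

def wirelength(placement):
    # Total Manhattan distance across all nets.
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1])
               for a, b in nets)

# Exhaustive search: try every assignment of gates to distinct slots.
best = min((dict(zip(gates, p)) for p in permutations(slots, len(gates))),
           key=wirelength)
print(wirelength(best))   # connected gates land in adjacent sites
```

Each net costs at least 1 here, so the optimum is 2; the NP-completeness shows up when "three gates" becomes millions and exhaustive search is hopeless.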

Anyway, stepping back, the design phase is where you specify various
parameters for the cores. These parameters typically change things like
enabling/disabling features of the core. For example on an ARM core you might
disable the math co-processor to save space/power if you don't need it; make
the pipelines smaller, etc. These are configuration options; you aren't
editing the ARM core's code. The core itself specifies how its code changes
depending on configuration options.
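One way to picture those configuration options: the core ships as a generator whose output adapts to the parameters you set. This is a hypothetical sketch; the option names and block list are invented, not ARM's actual configuration interface.

```python
# Hypothetical core generator: flipping configuration options changes
# what the core emits, without the licensee editing the core's code.

def generate_core(with_fpu=True, pipeline_stages=8):
    """Return the list of blocks this configuration would produce."""
    blocks = ["fetch", "decode", "alu"]
    blocks += [f"stage{i}" for i in range(pipeline_stages)]
    if with_fpu:
        blocks.append("fpu")   # the optional math co-processor
    return blocks

full  = generate_core()
small = generate_core(with_fpu=False, pipeline_stages=4)  # smaller, lower power
print(len(full), len(small))
```

The licensee only picks the parameters; the generator decides how the design adapts.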

The place and route phase is where you have some control over
power/performance. You can tell the tools to focus on power if your design
needs to be low power. It'll then design the transistors in such a way that
it'll use less power but also lose some performance. Or the opposite.

Of course how the cores themselves are designed is important to power versus
performance, and ARM cores probably have some configuration options that
affect their behavior in that regard.

And then the silicon you target is of the utmost importance for design trade
offs. Targeting 7nm fabs versus something larger will usually mean a more
power efficient and performant design, but will have _much_ higher up-front
costs (the physical masks you deliver to the fab cost tens of millions of
dollars or more).

To be clear, this is the level at which _most_ consumers of ARM cores are
working. They're just putting Lego blocks together. The ARM core is just a
black box in that regard. ARM delivers the core's "code" to you as a pre-
processed netlist. You can't see its real code (you just see a chaotic
spaghetti of logical operations). But some companies have more special needs
and want to tweak the core in specific ways (e.g. Apple). They'll have special
deals with ARM that give them access to the core's source code where they can
make custom tweaks. But this is rare.

Some companies have their own IP which they might integrate into a design.
Their own cores, coded from scratch. These are coded in Verilog or VHDL, for
the most part.

> Like, why doesn't ARM just create the chips that the end OEMs want?

ARM cores get used in TONS of custom silicon. Wifi routers, drones, cell phone
chips, cell phone coprocessor chips, etc. So part of it is that ARM just
couldn't possibly design and build all these different kinds of chips.

It's also that ARM does what ARM does best: design ARM cores. That's a job
that takes an entire company all to itself to accomplish. Anything else is
just beyond the scope of their company (for now).

~~~
fpgaminer
Addendum:

I suppose one way to think about this is to imagine old-school computers. I'm
talking about the ones built from TTL logic chips; pre-6502/8080/etc.

ARM is basically selling a virtual "board" with their CPU implemented using
those logic chips. You, as the designer, can then connect their board up to
other boards to have other functionality you want. A graphics board, a sound
board, etc.

The difference between those days and today is that these are all virtual. So
after you've plugged the boards together a compiler can come through and
optimize everything into a final single "board". Which is actually a set of
masks used to fab chips on a single piece of silicon.

And these boards are somewhat abstractly specified, letting you enable and
disable whole portions of it and have the design adapt accordingly (disabling
instructions/functionality/etc).

This analogy isn't far from the truth, since a netlist is really just a list
of logical operations, aka just like TTL logic chips, and their connections.
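That "list of logical operations plus connections" can be shown directly. Here's a toy 1-bit full adder expressed as a netlist and evaluated by walking the list; the signal names are invented, and it assumes the list is already in topological order.

```python
# A netlist really is just gates plus the wires between them.
OPS = {"AND": lambda a, b: a & b,
       "XOR": lambda a, b: a ^ b,
       "OR":  lambda a, b: a | b}

# 1-bit full adder: each entry is (output, op, inputs).
netlist = [
    ("s1",   "XOR", ("a", "b")),
    ("sum",  "XOR", ("s1", "cin")),
    ("c1",   "AND", ("a", "b")),
    ("c2",   "AND", ("s1", "cin")),
    ("cout", "OR",  ("c1", "c2")),
]

def evaluate(netlist, inputs):
    signals = dict(inputs)
    for out, op, ins in netlist:   # assumes topological order
        signals[out] = OPS[op](*(signals[i] for i in ins))
    return signals

r = evaluate(netlist, {"a": 1, "b": 1, "cin": 1})
print(r["sum"], r["cout"])   # 1 1  (1 + 1 + 1 = 0b11)
```

Swap the lambdas for TTL chips and the tuples for solder, and you've got one of those old boards.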

Modern chip development is simply an evolved form of these primordial design
techniques. We've replaced manual place and route with "compilers" and
optimization algorithms (the original 6502's masks were _hand drawn_.
Engineers crawling over giant plastic sheets making cuts to draw all the
transistors and wires). We've replaced manually specifying netlists with
higher level languages like Verilog and VHDL. We replaced TTL level CPUs with
integrated CPUs. And then eventually replaced whole boards with SoCs.

So if you want to learn chip design, start from the beginning; history is very
illuminating. Transistors -> TTL logic chips -> 6502/Z80 designs -> SoCs.

~~~
supernova87a
Oh a brief followup question if you have interest -- How does one know when
it's time to design your own chip? Versus take something off the shelf? At
what level of company or product maturity is this realization even likely to
be discovered?

------
awill
I do find these articles frustrating. They continue to rave about ARM (or
Qualcomm) improvements, while casually mentioning that they're at least 2
years behind Apple. Being 2-3 years behind Apple should be front and centre of
the article. It's a big deal! Unfortunately, as an Android user, I don't have
a choice, and I suspect my next phone will have an SD845.

You didn't see that kind of leniency when AMD was releasing slow/inefficient
CPUs pre Zen.

~~~
solarkraft
The difference is that Apple's chips are only in Apple devices and not on the
market for anyone to build a system around, so the other chips are all we
have.

~~~
astrodust
It's more likely that Apple doesn't release a lot of technical information
about their chips that sites like Anandtech can rework into some kind of
article.

"Apple releases new X series chip, 40% faster, internal details unknown" is
not really a compelling story.

------
nadioca
I wanna see a realistic benchmark that compares it against intel's x86 i"X"
(6th, 7th, 8th gen) to decide if it can be called a laptop-class processor in
the first place... BTW, it's interesting to see how good apple is in designing
their custom Arm processors.

~~~
signa11
> it's interesting to see how good apple is in designing their custom Arm
> processors.

ooh, probably because apple acquired a company called pa-semi (300m iirc)
which had a bunch of cpu dudes doing power-efficient chips for quite a
while...

~~~
nadioca
Looking online about this company: Apple acquired pa-semi for $278m in 2008,
which translated into having the fastest arm processor by a wide margin a
decade later. Definitely paid off very well.

~~~
simonh
It took a lot less than a decade, they were already ahead when the A7 launched
in 2013. So they took the lead in 5 years, and now have held it for another 5.

------
Brakenshire
I wonder whether attempting to move into a laptop form factor will translate
into more emphasis on getting these reference SoCs fully on the mainline kernel.
You'd have thought it would make it a lot easier for any non-Windows products
to be launched if that was in place.

------
djrogers
So the newest Cortex will roughly match the 2016 Apple A10 CPU? No wonder so
many android tablets are painfully slow...

~~~
baybal2
You need to compare a few things other than raw performance. The latest Apple
chips are humongous for ARM-based SoCs.

This is in large part due to them having bigger caches than some server CPUs.

~~~
strmpnk
I was curious about this claim so I looked some numbers up. The latest A11
chip has 8MiB of L2 cache. I could only find a small handful of xeon models
with 6MiB of cache while almost all of them contain equal or greater amounts
of L2 cache. The chip does not have L3 cache so there is nothing to compare
here. The A10 has total package cache less than 8MiB (split into L2 and L3).
So I'm not sure this is a good explanation of the performance or size
difference.

From what I can tell, while cache is still a large portion of the area, GPU
die space is quite large now. The CPU performance seems to be due to expanded
execution ports leading to more dispatched instructions per cycle, which is
generally a good idea as long as one can keep the execution ports fed with a
good amount of speculation in other stages of the pipeline.

~~~
baybal2
You also have to add the SRAM in the GPU, the decoders, etc., which adds up to 13MB.

------
d33
Any idea if it would be affected by Spectre?

~~~
tambre
It's no longer vulnerable to variants 2 and 3.[1]

[1] [https://developer.arm.com/support/arm-security-updates/specu...](https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability)

