Intel x86 documentation has more pages than the 6502 has transistors (righto.com)
300 points by ingve on Dec 7, 2016 | 133 comments

Some comments miss the point. I don't think it's a suggestion that x86 has too much documentation or too many transistors. It just gives a picture of how the hardware has exploded in capacity, complexity and so on. Which is great (though obviously it comes with a cost).

I disagree that it's great. The immense complexity associated with creating a competitive x86 processor gives Intel a huge competitive moat. You can't make a competing processor without billions of dollars of R&D.

As a result Intel is able to deploy highly unfavorable features like the Management Engine and consumers don't feel like they have an alternative. A reasonable alternative like Talos costs $3700.

If startups could reasonably compete with traditional volumes of funding, we'd probably have better hardware.

The catch is in the economy of scale. You can't make a Power8 cost comparably to an i7 unless you also produce it in huge numbers.

ARM chips are cheap not (only) because the R&D is excessively cheap but because the chips are produced in hundreds of millions. Were it not for the explosion of mobile phones, they'd stay in the "nice but expensive for the performance" niche.

Hundreds of millions is peanuts for ARM. https://www.arm.com claims "ARM’s partners shipped 14.9 billion ARM-based chips in 2015."

That's about two CPUs for everyone on earth and likely two more this year.

People don't realise how many things have CPUs these days. PPC (not POWER) and MIPS are also still within the same order of magnitude as x86 chips (hundreds of millions/year). They both used to sell more than x86 because they were more widespread in embedded niches (networking, automotive, set-top boxes), but I don't know whether that's still true.

It's not economies of scale that give Intel its sustained advantage so much as the huge margins that let them keep re-investing in their fabrication process advantage, which in turn protects their margins by keeping them on top in the only segments of the CPU market that are at once high price, high margin (at least for the top SKUs) and high unit volume.

> Were it not for the explosion of mobile phones, they'd stay in the "nice but expensive for the performance" niche.


Reportedly only 1.5B phones were sold in 2015, compared to 15B ARM cores.

There is apparently a huge market of small cheap ARM microcontrollers, to the point that I heard they are sometimes getting cheaper than simple 8bit micros.

Economies of scale are governed by how difficult it is to optimize something, and how accessible the research is.

A huge part of the cost of a chip is the design. You can get custom fab jobs for under $10,000. But if the design requires you to implement hundreds of op codes, the chip will be nowhere near that cheap.

With simpler specs, a lot of this design cost is reduced. You may not be able to get 11nm chips, but you don't need that to be competitive. You just need to be close enough that your performance hit is not a deal breaker, at a price that's not a deal breaker.

It seems to me more and more that Intel has relied way too much on x86 to shut out competition in the last 10-15 years. The world is currently moving on to greener pastures (lower power / higher performance per watt) and Intel is losing more and more of their market at the bottom and top ends. Xeon Phi and their failed x86 mobile adventures are to me the prime indications that they bet too much on one horse.

Nah. Intel tried several times to quit the x86. They couldn't because the market rejected anything that wasn't backwards compatible.

The market rejected it for desktops and servers, but not in other markets. Which is why Intel is still selling in the hundreds of millions of CPUs, while ARM licensees are selling in the billions, and MIPS and PPC licensees are also selling in the hundreds of millions (but reap far less revenue than Intel, as more of them are low-price units going into embedded systems).

I have a feeling that the market rejected it because rewriting a ton of software didn't make sense for a negligible performance "upgrade", not because it wasn't backwards compatible. There were also emulators available to run x86 code.

> When first released in 2001, Itanium's performance was disappointing compared to better-established RISC and CISC processors. - Wikipedia

I think the backwards compatibility problem wasn't that it lacked backwards compatibility, but rather that the proposition was to pay more to run your existing binaries slower. I recall Pentium Pro having a similar problem with 16-bit code.

well, did they try it for the mobile or the HPC market?

You mean with XScale and Itanium? Yes they tried.

From what I've read, it seems that the Itanium was Intel's attempt to get away from x86 but it didn't work out as they had hoped. It seems to me that it is market-forces that have kept x86 as the incumbent instruction set.

People keep bringing this up when talking about Intel's botched mobile strategy, but IMO it just doesn't apply. Itanium was marketed based on features for servers that frankly weren't really popular. People always wanted one of the following things:

* performance per dollar

* performance per watt

* highest single threaded performance

* lowest possible power draw

Itanium AFAIK improved none of these, or did so only for very narrow use cases. What Intel needed was either (a) an ARM competitor and/or (b) a processor with large vectors and a SIMT-like parallelisation model (i.e. a Tesla competitor). In both of these cases they didn't think twice before just throwing x86 at the problem until the pain went away... which it never did.

Just because the Itanium was a failure doesn't change the fact that Intel have tried several times to get away from x86. So it's unfair to claim that Intel have relied purely on x86 in order to shut out competition. Of course, they're not going to quit x86 until another instruction set has been proven more popular to the market and they have made attempts to create that instruction set.

Agreed, some people just don't realize how many arches were tried at Intel, i432, i860, i960 etc...

Still missing the point. Which non-x86 architectures targeted at low power or HPC (the main growth markets, if you count computer graphics as a subset of HPC) were tried in the last 15 years?

Why try to win people over on compatibility if they have to rewrite their software anyway for other reasons, especially after ARM took the largest market share in mobile?

I think you look at it the other way round -- it is not so much Intel who wanted to win customers over compatibility, but rather the customers who had to port their software, and suddenly saw a possibility to not do that.

It took Intel a while to accept that the mobile-app providers wouldn't invest the same effort in porting their apps from ARM to x86 given the market penetration, but I don't believe it was a surprising outcome to them.

Is the x86 architecture itself limited in its adaptability compared to others such as ARM? If yes, why and if no, why are Intel's mobile/low-power attempts failing?

The way I understand it, x86 does quite a lot in hardware that could be done just as well in software (by the compiler). That includes backwards compatibility with a whole host of instruction sets / extensions. All this seems closely linked to the IBM PC standard and Microsoft's software strategy. All of this costs power. The result is that you can either have a mobile x86 that's competitive on power or on performance, but not both.

Hmm, I don't believe that's true. The power budget of the backwards compatibility is very small.

Intel's problem is a case of optimization I think - their design teams have been optimizing towards producing a processor with an effective power budget of whatever the cooler could dissipate, rather than optimizing for low power, and that's a very different task.

There's no one processor that will fit all workloads.

I think compatibility to IBM PC has lots of hidden costs. One is that it seems to make Intel much less nimble at producing SoCs with features that are specific for each usecase. It's not just the ISA, it's basically the whole ecosystem around PC that's not well adapted. That's why we don't have Dell, HP and Intel phones and hardly any Microsoft ones today. x86 may not be the only reason, but from what I can see it's at least one of the major ones.

So if Intel made worse hardware, it would have competition. Yes, that is true.

On the other hand, it's a completely useless situation for everybody who actually wants to use hardware, instead of getting rich in the hardware business.

Would you rather we were in a world where the 6502 was still current-generation hardware and any hillbilly with a fab in their shed could crank out a top-end CPU?

Intel's got a lock on top-end performance, but don't think for a second they're in control of the CPU market.

You can get a custom ASIC fabricated for under $10K in low volumes using process that's a generation behind. You can get things fabbed on the current process if you're willing to pay more, but it's on the order of $1-2M or so. It's not often people make new CPUs, but look at how the Bitcoin space took advantage of low ASIC prototyping and production costs to produce high-performance hashing devices.

The cost of producing a 6502 chip in that era was basically the cost of building your own foundry. Now we have places like TSMC that will make chips for anyone who can afford the tape-out costs.

> Would you rather we were in a world where the 6502 was still current-generation hardware and any hillbilly with a fab in their shed could crank out a top-end CPU?

In a heartbeat!

Play more Fallout then. That's what that world looks like.

I'd rather live here where if you wanted you could tape out some 60nm chips for less than it costs to print some posters.

That moat is accompanied by staggering amounts of risk. They spend well over $1 billion on each new fab plant, with no guarantee of breaking even. I hate the ME too, by the way. But we still have a choice. No one forces us to use Intel.

On this topic, Microsoft just partnered with Qualcomm to use their chips in new devices.


Won't MIT-licensed RISC hardware fix that issue to some degree?

Here is one of those interesting metrics, one I came across while researching the Cortex-M: at modern process node geometries, the CPU part of a Cortex-M chip (not including the on-chip peripherals, RAM, and Flash) easily fits inside a single bond pad of an 8080A. As the 8080A had 40 pins, that is 40 Cortex-M CPUs' worth of silicon that Intel "threw away" by depositing a square of gold on it.

But in terms of documentation that is related directly to the transistors, it would be interesting to evaluate number of lines of VHDL to the number of inferred transistors. I know you get a report after you have finished place and route on your typical work flow but has anyone rolled that up to "2.5 lines of VHDL per 100 transistors" or something?

Someone here did the math before. Motorola 68k was fabricated on 3.5um node. If it were made today with 14nm node, it (the whole 68k) would fit on an area of a single transistor from the original 68k made with 3.5um node. That's 68,000 Motorola 68000s inside an original Motorola 68000! With newer nodes, even more.
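A rough sanity check of that claim, as a sketch under idealized assumptions: area shrinks with the square of the linear feature size, the node names are taken as literal dimensions (modern node names are largely marketing labels), and the 68000 is assumed to have about 68,000 transistors, the commonly quoted figure.

```python
def area_shrink_factor(old_node_nm: float, new_node_nm: float) -> float:
    """Idealized area scaling: the square of the linear shrink."""
    linear = old_node_nm / new_node_nm
    return linear ** 2

# 3.5 um (3500 nm) down to 14 nm is a 250x linear shrink.
factor = area_shrink_factor(3500, 14)

# Assumption: roughly 68,000 transistors in the original 68000.
M68K_TRANSISTORS = 68_000

# Area of a whole shrunken 68000, measured in "original transistor areas".
shrunken_chip_in_old_transistors = M68K_TRANSISTORS / factor

print(factor)                            # 62500.0
print(shrunken_chip_in_old_transistors)  # a little over 1
```

Under those assumptions a 14nm 68000 would need slightly more than the area of one original transistor, so "about 68,000 of them inside an original 68000" is the right order of magnitude.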

They continue to add transistors to newer processors. It is a good sign that someone keeps track of them and writes documentation on how to use all these new transistors.

I heard they added a couple of transistors to the latest Intel chips. One of the transistors hasn't been documented yet, though, which is unlike Intel.

I'm imagining the documentation for an individual transistor that extends all the way up the stack... and thinking about Escher for some reason.

This makes me wonder how many undocumented transistors are on the chips -- there because nobody knows why.

Apparently Nvidia had to drop VBIOS a bit early because it started having issues but they no longer had enough people who knew how it worked to continue supporting the feature.

So this post is supposed to show a relationship between transistors and documentation. Fine.

x86 has 4181 pages of documentation. Quad-core Skylake has 1.75 billion transistors. This is 418560 transistors per page.

The 6502 had 12 pages of documentation. It had 3,510 transistors. This is 292 transistors per page.
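For anyone who wants to redo the division, here is a small sketch using only the page and transistor counts quoted in this comment (the figures are the commenter's, not independently checked):

```python
# (transistors, pages of documentation), as quoted in the comment above.
chips = {
    "x86 (quad-core Skylake)": (1_750_000_000, 4181),
    "6502": (3510, 12),
}

def transistors_per_page(transistors: int, pages: int) -> float:
    """Transistors covered by each page of documentation."""
    return transistors / pages

for name, (transistors, pages) in chips.items():
    print(f"{name}: {transistors_per_page(transistors, pages):,.1f} transistors/page")
```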

Advantage x86.

You're comparing a single-core to a quad-core. Moreover, the Skylake is an SoC that contains stuff other than the CPU cores.

Yes, Skylake is superscalar, hyperthreaded, multicore, SIMD, OOO, speculative and a few other things too.

Yes, Skylake contains GPU stuff in its transistors and also in its documentation.

Cache is the majority.

SPARC M7 wins at 10,000,000,000 transistors.

That's because the SPARC M7 has a significantly larger die size. It's a very specialized chip that costs an order of magnitude more than consumer Intel hardware and is manufactured with tech a generation behind.

If Intel were to manufacture a chip as big as that, it would have 20-30 billion transistors.

AFAIK their biggest one is Knights Landing at 8B transistors.

But if you have a box of 10k 6502s you can use the same 12 pages of docs and get 2,925,000 transistors per page.

But it's only fair to compare Skylake with at most a dozen or two of 6502s because that's how many cores are included in the Intel chip.

I'd be keen to see a comparison of transistor counts between modern and older processors with caches removed (I could be wrong, but I thought built-in caches made up the majority of transistor counts nowadays). That is, how much more complex is a single cache-less core now vs then? Not as much as the overall transistor count would indicate, I suspect.

Annotated die photo of a single-core VIA Nano chip from some 10 years ago:


As you can see, caches are merely half of the chip and half of the rest is about dynamic instruction reordering, branch prediction, register renaming and stuff.

These things have pipelined execution units so they can start a new instruction before the previous one is finished executing, enough duplication to start executing two or three instructions per cycle (sometimes even of the same kind, say two floating point SIMD additions) and logic to schedule instructions wrt data dependencies, not program order, so that instructions which need input data not yet available can wait for a few cycles while other, later instructions are executing.

And all of this has to be done with some degree of appearance of executing instructions serially, so if say some instruction causes a page fault and a jump to the OS fault handling code, the CPU has to cancel all later instructions which may have already finished executing :)
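To make the "schedule by data dependencies, not program order" idea concrete, here is a toy sketch. It is not how any real CPU is built: the `Instr` type, the latencies and the two-wide issue limit are all made up for illustration, and it ignores register renaming details, in-order retirement and fault handling entirely.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: str       # register written
    srcs: tuple     # registers read
    latency: int    # cycles until the result is available

def schedule(program, issue_width=2):
    """Return {instruction name: cycle issued} for a toy dataflow scheduler."""
    # Resolve each source register to the earlier instruction producing it.
    last_writer = {}
    deps = []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i

    done_at = {}        # instruction index -> cycle its result is ready
    issue_cycle = {}
    waiting = list(range(len(program)))
    cycle = 0
    while waiting:
        issued_now = []
        for i in waiting:  # oldest first, but later ones may overtake
            if len(issued_now) == issue_width:
                break
            if all(d in done_at and done_at[d] <= cycle for d in deps[i]):
                issued_now.append(i)
        for i in issued_now:
            issue_cycle[program[i].name] = cycle
            done_at[i] = cycle + program[i].latency
            waiting.remove(i)
        cycle += 1
    return issue_cycle

program = [
    Instr("load", "r1", (), 4),           # slow memory load
    Instr("add", "r2", ("r1", "r1"), 1),  # must wait for the load
    Instr("mul", "r3", ("r4", "r5"), 3),  # independent of the load
    Instr("sub", "r6", ("r3", "r4"), 1),  # depends only on the mul
]
print(schedule(program))
```

In this made-up program the `sub` issues before the `add` even though it comes later in program order, because its input is ready first.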

And, btw, this is not in any way specific to x86. POWER, high-end ARM, they all do it.

Fascinating. Where do you learn this stuff? Any recommended reading you can point me to as a complete novice with regard to CPU architecture?

That's somewhat hard to answer because I've been accumulating knowledge from many sources over many years, and it started with some dead-tree book from the '90s :)

Maybe Agner's site, in particular his microarchitecture manual, would be a reasonable place to start:


There are "software optimization manuals" from CPU vendors, but these may not be particularly novice-friendly. I think I've used Wikipedia at times for general CPU-agnostic concepts, though it has a tendency to use jargon with little explanation. Occasionally somebody submits something to HN.

On the lowest level, it may be helpful to know some simple digital circuits (decoders, multiplexers, adders, flip-flops, ...?) just to have an idea of what kind of things can be done in hardware.

Many CS departments teach from Patterson and Hennessy: Computer Organization and Design for the intro course, Computer Architecture: A Quantitative Approach for the advanced one.

They're pretty complex. For example, the modern x86 (afaik) uses lookup tables to make multiplication faster. The 6502 didn't even have integer multiplication.

Yes, we rolled it ourselves with ASLs (Arithmetic Shift Left) and ADCs (ADd with Carry) and the like, typically hardcoding it if we knew one of the operands. Particularly fun with pretty much always having to be prepared to deal with overflow since the registers are only 8 bit. I love the 6502, but I can't say I miss that part.
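For readers who never had to do this, here is a sketch of the shift-and-add idea in Python rather than 6502 assembly. It models the algorithm only: a real routine would use ASL/ROL/LSR/ADC on 8-bit registers and carry the result across two bytes by hand, and the function name is just an illustration.

```python
def mul8(a: int, b: int) -> int:
    """Multiply two unsigned 8-bit values using only shifts and adds.

    The result is kept to 16 bits, the way a 6502 routine would store
    it in two bytes.
    """
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    result = 0
    multiplicand = a
    for _ in range(8):                       # one pass per bit of b
        if b & 1:                            # low bit set: add shifted a
            result = (result + multiplicand) & 0xFFFF
        multiplicand = (multiplicand << 1) & 0xFFFF  # like ASL/ROL
        b >>= 1                              # like LSR
    return result

print(mul8(13, 11))    # 143
print(mul8(255, 255))  # 65025
```

Hardcoding a known operand, as mentioned above, amounts to unrolling this loop and keeping only the shifts and adds for the bits that are set.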

Probably be good to add (2013) to the title, as that was both when it was written and the date of the latest x86 documentation at the time. Since then, it looks like the manual has grown to 4670 pages[0], which surpasses the 6502's "all transistors but ROM/PLA" count.

At this rate, 1419 pages over 3 years, the Intel x86 documentation's page count will exceed all transistor counts of the 6502 around 2020.

[0]: https://software.intel.com/en-us/articles/intel-sdm#combined

Somebody posted a great educational video narrated by William Shatner in the late 1970s: https://www.youtube.com/watch?v=VJmero_L7g0 (14 mins).

Shatner plays it pretty straight here, and he does a great job of making the subject matter interesting to audiences of the day. What's interesting about the video now is that every time he promises us "Thousands of transistors on the head of a pin," you can remind yourself that what we actually got was billions of transistors.

It's humbling from a software engineering perspective to contemplate how poorly we're taking advantage of the semiconductor industry's Promethean gift. My computer still looks a lot like the Apple IIs and TRS-80s did in that video, and the same is true for my workflow.

Now just imagine how many tens of thousands of pages Intel has on internal documentation...

And still no one knows why x86 control registers are CR0, CR2 and CR3. What was CR1 supposed to be used for?!?

(This is actually true. I asked x86 architects when I met some).

According to http://www.pagetable.com/?p=364 (which also shows part of the reason the x86 needs so much documentation, by the way), it started its life "reserved", and never got a real role in life.

Sounds like someone meant to use CR1 after CR0, but someone bit-swapped the CR-number bus so you have to encode 4 in the instruction to get 1 where it needs to go.

Yes, but: "instead of overflowing the new bits into CR1, Intel decided to skip it and open up CR4 instead – for unknown reasons."

Sounds like someone took "reserved" a bit too seriously and/or couldn't figure out who had marked it reserved or why, and decided the latter was the safer option.

"We don't talk about CR1". ;-)

Maybe I'm naive, but it seems incredibly impressive that the 6502 only has that many transistors!

<showing my age here>

When I remember all the stuff you could do on an AppleII (and a BBC Micro) with their tiny four odd thousand transistor cpus and 4 whole KB of ram - and consider how much time I spend waiting for this laptop with its billion-or-so transistor cpu and 16 GB of ram - it's almost enough to make you weep about the profligate waste of resources of the entire software engineering profession... ;-)

I sometimes think that. I am amazed at when I look back at the BBC Micro and think of the software I used to run on there, and how they did it with such tiny amounts of RAM.

I do realise that the last 20 years of GUI progress has stalled and that you could take a Mac from yesteryear or PC from ~1991 and know your way around it without any trouble at all.

Of course software development strategies have changed and languages now let us express ourselves in previously unimaginable ways, but we've come so far and not far at all.

I am particularly struck with the craze over the last 5+ years with regard of "cloud" and shoving data to the other side of the world, particularly given the microcomputer revolution and the lack of need to shove your data elsewhere. That's what the microcomputer is for!

Well, it's a pretty simple CPU. It only had 56 instructions, 40 pins, and 8 bytes of registers. It assumes direct, single-cycle access to memory. It was originally supposed to have a ROtate-Right instruction, but it had a bug, so the instruction was not included in the manual. Also, invalid instructions are not detected, leading to the discovery of ROtate-Right and some accidental instructions, as well as the creation of some cool extra-instruction-trapping hardware.

The LGP-21 [0] has the fewest transistors for any mass-market computer I've read about - 460, and 300 diodes.

[0] https://en.wikipedia.org/wiki/LGP-30#LGP-21

I would say that roughly 25% of the documentation applies to ancient modes of operation like real mode and protected mode. Unless you REALLY need to know the fine details of these modes you can skip right to the long mode stuff.

Windows still uses a lot of programs in emulated protected mode, so it's pretty relevant still.

Does that include all the secret documentation for stuff like LOADALL, ICEBP etc?



You will love this paragraph:

"Unlike the 286 LOADALL, the 386 LOADALL is still an Intel top secret. I do not know of any document that describes its use, format, or acknowledges its existence. Very few people at Intel will acknowledge that LOADALL even exists in the 80386 mask. The official Intel line is that, due to U.S. Military pressure, LOADALL was removed from the 80386 mask over a year ago. However, running the program in Listing-2 demonstrates that LOADALL is alive, well, and still available on the latest stepping of the 80386."

Just imagine what's in Intel chips now due to NSA pressure :/

For those curious about the "what" and who don't know x86 opcodes from memory, from the first link and in its earliest incarnation,

"LOADALL restores the microprocessor state from the State Save Map that is saved during the transition from user mode to ICE mode. LOADALL loads enough of the microprocessor state to ensure return to any processor operating mode."

Why would the US military be pressuring Intel to remove obscure instructions in ancient CPU designs?

Not sure what to love here, it's a debug feature which, according to your source, Intel promised the US Mil to remove for some reasons but ultimately didn't.

There certainly are undocumented debug facilities in modern CPUs. For one example, the leaked Socket AM3 datasheet clearly shows a JTAG interface, though I don't know if it's operational in production silicon.

Hopefully, debug capabilities cannot be used to pwn the CPU from unprivileged code without external debug hardware which could pwn the CPU anyway by itself. It's not even clear if they are enabled in production chips at all.

LOADALL for example worked only in RING0 and got ultimately removed early in the 486 days so it seems Intel cared about security somewhat (and probably also about future compatibility, to be honest, it's not fun when software relies on features you want to change in the next generation).

Nowadays they should care even more - if software backdoors were available and leaked to the public, the magnitude of shit happening in all those cloud companies would be monumental.

> if software backdoors were available and leaked to the public


conveniently "discovered" by a 3-letter agency's favorite principal contractor (Battelle Memorial Institute - have fun researching them) employee just after everybody switched to the next (fixed) CPU generation.

I doubt that this can be used for VM escape because it requires access to the physical LAPIC and afaik hypervisors wouldn't allow VMs to touch this.

It also doesn't work from userspace so pretty much all you can do with it is hacking SMM from a kernel running on the bare metal. Maybe useful for rootkits, but truth be told 3LAs seem to have no problem making non-SMM malware undetectable by commercial AVs. See stuxnet :)

> conveniently "discovered" by a 3-letter agency's favorite principal contractor

Not sure what you are alluding to. 3LAs wouldn't want this to be known if it was their job, methinks.

I think he's saying that the 3LAs knew about it for a long time, but publicly "discovered" the flaw when it was no longer useful to them (after everyone had upgraded)

It is probably not difficult to create a new x86 version that is user mode compatible with most modern programs but lacks things like segmentation and real mode. New OS versions would be required, but most modern user mode programs would work with few if any modifications.

Things like Linux should work fine with a CPU that strips 16 bit mode entirely (32 bit too, possibly? not sure) as long as you have a BIOS / boot loader that can handle it and - as of when I last looked at the Linux kernel initialisation code over a decade ago - change / strip out a handful of lines that took care of changing the mode.

It'd be interesting, but I don't think it'd save all that much unless you strip 32 bit compatibility as well, and even then it might be less than you think or they probably would have tried to see if the market would want it...

A lot of microcode is about things like segmentation and TSS.

Somebody needs to come up with a law that correlates the size of a processor's documentation to the number of transistors on that processor.

Mark Papermaster[0] should formulate that law.

[0] https://en.wikipedia.org/wiki/Mark_Papermaster

Moore's law for documentation?

Not sure about that one, seems like an inverse law to me, computer games used to come with big booklets with documentation and backstory, now nothing (other than user made wikis of course).

Or a mobile dumbphone came with a manual explaining all the menus and options. Now the only paperwork with a smartphone is legal and warranty.

Oh, I swear it is not inverse for chip documentation (unless it's from a Chinese manufacturer, in which case you have to make do with a 2-page leaflet for an 80-pin chip). But that doesn't necessarily mean it is exhaustive, high-quality documentation.

First, we have to acknowledge that most texts (this goes for law too) are very diluted now compared to a few decades ago. There is a lot of blah-blah that doesn't convey information. Information density has decreased.

Then, there are docs that are so big (many thousands of pages) that I am sure no editor can read them fully. They pile up copy-paste from older or similar models without checking whether it applies to the chip. They don't write a clean doc specifically for the chip. So as a user you can throw away parts of the doc. The problem is that you don't know which ones.

Since they don't print manuals any more, they don't have to care about fitting the doc in the book, it's no-limit.

Heh - the first computer in my home was an Osborne "portable" - a Z80 CP/M machine. Its user manual came with a wiring diagram!

(Which I used as a ~12 year old to work out how to connect a home-built joystick, with 4 micro switches and 2 push buttons, to the printer port, so I could write blocky text/graphic versions of video games I wanted to play.)

How can I order these books?

You can download them from Intel's website for free.

Source: I've got the x86 and x64 instruction set manuals from there, which run to thousands of pages in PDF. Rootkits ain't gonna write themselves =)

Ordering details for the Intel architecture documentation are at https://software.intel.com/en-us/articles/intel-sdm - the volumes range from $8 to $23 depending on size. I think the documentation was free when I got it but times have changed. Edit: you can download the PDF (for free) from that link too.

It's quite a surprise to go to HN and see my post from 2013 here, by the way.

If anyone's interested in the 6502, or CPU design in general, this is a very good simulator: http://visual6502.org/JSSim/index.html

Thanks for this link. I have no idea what it's doing but it is an interesting start!

Is the situation appreciably better with ARMv8? How about ARMv7?

Most ARMv8 CPUs in the wild are backwards compatible with 32-bit ARM and Thumb (able to switch modes on CPU exceptions), but AArch64 is a near-complete redesign of the ISA to eliminate cruft. It's very nice that ARM removed things like the conditional execution, weird behavior of the "pc" register, and most of the barrel shifter complexity. It is not an architectural requirement for ARMv8 processors to implement the 32-bit ARM ISA (though of course for practical reasons they do today). So, eventually, if we start seeing ARMv8 processors that eliminate the 32-bit compatibility, we may see a nice architectural simplification (though I wouldn't be surprised if the transistor count doesn't decrease that much).

I think Apple will be the first to drop 32-bit compatibility, since they've been requiring 64-bit support in all iOS App Store submissions for a while now.

Definitely. As a dominant platform owner they can push everyone to bend over backwards all the time. And they will, since any fab and power savings will be totally worth it for the user, even if they are marginal.

So AArch64 is simpler? Could we expect a simpler architecture to take a performance lead in future?

> Could we expect a simpler architecture to take a performance lead in future?

That didn't happen in the past and I doubt it will be true in the future; in fact I'd say one of the reasons ARM remained competitive is because of conditional execution, the "free" barrel shifter, and Thumb mode, which really help with code density (directly related to cache usage) and avoiding branches.

AArch64 looks very much like MIPS, an ISA that hasn't really been known for anything other than being cheap and a good simple pedagogical aid (despite plenty of people being convinced it would easily outperform x86 at a fraction of the cost.) I'd guess any perceived performance increases over AArch32 are primarily due to the widening to 64 bits, and in any case much of the benchmark performance rests on the special functional units (SIMD, crypto, etc.)

> in fact I'd say one of the reasons ARM remained competitive is because of conditional execution, the "free" barrel shifter

No compiler developer would agree with you. The conditional execution wreaks havoc with dependencies, and branches are very cheap if correctly predicted. The barrel shifter is not as useful as you would think (what fraction of instructions are shl or shr?)

Thumb mode does help code density, but not as much as you might think due to Thumb-1 not being practical and Thumb-2 being fairly large. AArch64 is quite a bit denser than x86-64 already.

It is true that the ISA doesn't matter too much from a performance point of view. But why not take advantage of the necessary compatibility break to clean things up? There's a lot of needless complexity in our ISAs from the programmer's point of view, and cleaning it up is just good engineering practice. Let's not saddle future generations with the mistakes of the 1980s.

> The barrel shifter is not as useful as you would think (what fraction of instructions are shl or shr?)


The immediate value encoding is still there. What's gone is the barrel shifter on arithmetic instructions, other than those that explicitly mention that they perform a shift.

If you go by pages in the specification https://people-mozilla.org/~sstangl/arm/AArch64-Reference-Ma... is 5183 pages, which is about 10% longer than the Intel manual, which currently has 4670 pages.

2,036 of those pages are for AArch32, though. In contrast to ARM, AMD didn't introduce a new instruction set for 32-bit x86.
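Putting the page counts from these two comments side by side, as a quick sketch. Treating everything outside the AArch32 pages as "the rest of the ARM manual" is a simplification of mine, since some of that material is presumably shared between the two execution states.

```python
ARM_TOTAL_PAGES = 5183     # ARM reference manual, as quoted above
ARM_AARCH32_PAGES = 2036   # pages attributed to AArch32, as quoted above
INTEL_PAGES = 4670         # Intel manual, as quoted above

def percent_longer(a: int, b: int) -> float:
    """How much longer a is than b, in percent."""
    return (a / b - 1) * 100

# Assumption: whatever is not AArch32 is counted as the 64-bit side.
arm_without_aarch32 = ARM_TOTAL_PAGES - ARM_AARCH32_PAGES

print(round(percent_longer(ARM_TOTAL_PAGES, INTEL_PAGES), 1))  # whole ARM manual vs Intel
print(arm_without_aarch32)                                     # ARM pages excluding AArch32
print(round(arm_without_aarch32 / INTEL_PAGES, 2))             # as a fraction of Intel's count
```

On these figures the ARM manual minus its AArch32 pages comes to roughly two thirds of the Intel page count.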

Personally, I prefer ARM's move, because while it's a complexity increase now, it paves the way to someday drop support for 32-bit ARM, which would be a major architectural simplification. It also means that, as a programmer, when you're in 64-bit mode you aren't burdened by all of that weird backwards compatibility stuff going back decades—you get a relatively clean ISA.

(I do have some gripes with AArch64, to be fair: there are too many addressing modes and the condition codes are unnecessary. But I'll take anything that moves us in a more RISC-y direction.)

Yes, it's significantly simpler. I would be very glad to see that yield a performance increase, though I suspect the difference would be small.

Is the situation with x86 bad to begin with?

Granted, I didn't read the full documentation provided online for my hardware before I powered it on. Honestly, I didn't read any documentation and it just works, kind of.

x86-64 has tons of cruft: complex instruction encoding (ModR/M and SIB bytes), bloated instruction encoding thanks to REX prefixes, real mode, virtual 8086 mode, odd SIMD limitations, pointless instructions like XLAT, binary coded decimal, a non-orthogonal instruction set with some 3-register instructions (LEA, IMUL, AVX2) mixed with a bunch of other 2-register instructions, individually addressable low and high bytes of certain registers but not others…

Most of that cruft consumes minimal die area and results in absolutely minimal slowdown compared to an "optimized" minimal architecture. Fun exercise: do a histogram of instructions in some large program's binary. Not a ton of weirdness in practice.
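For anyone who wants to try the histogram exercise, here is a minimal sketch in Python. It assumes you already have disassembly text in the layout GNU `objdump -d` prints (address, tab, hex bytes, tab, mnemonic). The function name and the sample lines are made up for illustration, and prefixes such as `rep` or `lock` are counted as their own token, which is a simplification:

```python
import re
from collections import Counter

# Matches one instruction line of objdump-style output, e.g.
#   "  401004:\t48 89 d8             \tmov    %rbx,%rax"
# and captures the first token after the hex bytes (the mnemonic).
INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\t(?:[0-9a-f]{2} )+\s*\t(\S+)")

def mnemonic_histogram(disassembly: str) -> Counter:
    """Count instruction mnemonics in objdump-style disassembly text."""
    counts = Counter()
    for line in disassembly.splitlines():
        match = INSN_LINE.match(line)
        if match:  # section headers, labels and byte-only lines are skipped
            counts[match.group(1)] += 1
    return counts

# Hypothetical four-instruction sample in objdump's layout.
sample = (
    "0000000000401000 <main>:\n"
    "  401000:\t55                   \tpush   %rbp\n"
    "  401001:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "  401004:\t48 89 d8             \tmov    %rbx,%rax\n"
    "  401007:\tc3                   \tret\n"
)

histogram = mnemonic_histogram(sample)
for mnemonic, count in histogram.most_common():
    print(f"{mnemonic:8} {count}")
```

To run it on a real program, feed it the output of `objdump -d` for that binary instead of `sample`.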

I'm not disputing that. I'm talking about complexity of the programming model (number of pages in the manual), not performance.

What was the main reason to add the REX prefix for 64-bit? Why not widen the register fields in the encoding so they can address 16 registers? Was it for easier binary-to-binary translation?

Because AMD was deathly afraid of AMD64 going the way of Itanium. So they went out of their way to make their architecture as similar to 32-bit x86 as possible, right down to reusing the encoding.

They also probably figured that they could reuse decoding logic.
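To make the "reusing the encoding" point concrete, here is a rough sketch of how a REX prefix supplies the fourth register bit on top of the old 3-bit ModR/M fields. The helper names are mine, and it only handles the register-direct case (mod == 3); with a SIB byte, REX.B extends the SIB base instead:

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix (bit pattern 0100WRXB) into its four flag bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # 4th bit for ModR/M.reg
        "X": (byte >> 1) & 1,  # 4th bit for SIB.index
        "B": byte & 1,         # 4th bit for ModR/M.rm (register-direct case)
    }

def decode_modrm(byte: int, rex: dict) -> dict:
    """Split ModR/M into mod/reg/rm, widening reg and rm with the REX bits."""
    return {
        "mod": byte >> 6,
        "reg": ((byte >> 3) & 0b111) | (rex["R"] << 3),
        "rm": (byte & 0b111) | (rex["B"] << 3),
    }

# The bytes 4d 89 c8 encode `mov r8, r9` (opcode 0x89 is MOV r/m64, r64):
# rm selects the destination register, reg selects the source.
rex = decode_rex(0x4D)
modrm = decode_modrm(0xC8, rex)
print(rex)
print(modrm)
```

The 32-bit decoder only ever sees 3-bit register numbers; the prefix bolts the extra bit on from outside, which is why the old ModR/M layout could stay untouched.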

The paper on the design of RISC-V has a really good survey of existing ISAs like x86:


To show the complexity of x86-64, it is best to look at the boot process. Your processor starts in 16-bit mode, then is upgraded to 32-bit and then to 64-bit mode. You want to make a call to the BIOS now? You have to downgrade through 32-bit mode to 16-bit mode to do that, and then go back up to handle the response.

And that is just a very small part of the cruft that x86-64 has.
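A toy model of that round trip, just to count the mode switches. This is a deliberately simplified sketch: it treats the three modes as a ladder you climb one rung at a time, as the comment above describes, and the names are made up for illustration:

```python
# Simplified model: the three modes form a ladder, one rung per switch.
MODES = ["real (16-bit)", "protected (32-bit)", "long (64-bit)"]

def mode_path(start: str, target: str) -> list:
    """Return every mode visited going from start to target, inclusive."""
    i, j = MODES.index(start), MODES.index(target)
    step = 1 if j >= i else -1
    return [MODES[k] for k in range(i, j + step, step)]

def bios_call_from_long_mode() -> list:
    """Sequence of states for one BIOS call issued from 64-bit code."""
    down = mode_path("long (64-bit)", "real (16-bit)")
    up = mode_path("real (16-bit)", "long (64-bit)")
    # The call itself runs in real mode, so skip the duplicate first entry.
    return down + ["<BIOS call>"] + up[1:]

sequence = bios_call_from_long_mode()
mode_switches = len([s for s in sequence if s in MODES]) - 1
print(" -> ".join(sequence))
print("mode switches:", mode_switches)
```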

The rate at which the complexity of the amd64 boot process is increasing is quite alarming.

UEFI is an overcomplicated, buggy monstrosity, but that's just the tail end of the "boot process". Nowadays, to get an x86 CPU to execute a single opcode, you need to have a Management Engine (or Platform Security Processor, in AMD-speak) firmware blob resident in the firmware flash chip. More modern CPUs, for Intel, say, oblige you to use Intel-provided "memory reference code" and other "firmware support package" blobs just to initialize the CPU in the early stage. AFAIK, Intel isn't even bothering to document the details of its CPU and chipset initialization sequences anymore, in favour of just making people use unexplained blobs. These are just some of the issues the coreboot project is having to deal with. It really feels like at least in the world of x86, the window is rapidly closing on projects like coreboot being able to accomplish anything useful, although there are at least some major users like Chromebooks.

And then of course we have things like SMM, and the way in which secure firmware updates are facilitated (which relies on things like flash write protect functionality)...

Those blobs are run by the BIOS/UEFI correct? Like Grub/the Linux kernel don't need those Intel blobs just to get booting do they?

It's correct that these blobs are loaded way before GRUB or a Linux kernel gets booted. To be precise they are part of the firmware image; UEFI refers to a boot protocol specification. So for example with coreboot, you can select one of many "payloads". Payloads include UEFI boot, MBR boot, etc. So it's probably best to distinguish between the boot protocol and the firmware package as a whole.

The ME firmware is loaded by the CPU itself before anything begins executing; there's a header in the firmware image stored in the flash chip to let the CPU find it. These are cryptographically signed, so all projects like Coreboot can do is incorporate the binaries provided by Intel.

The MRC/FSP blobs are executed by the x86 firmware; they're x86 code which runs very early. Theoretically projects like Coreboot could replace these blobs with their own code, but it would require reverse engineering these blobs to figure out what they're doing. The fact that this would be a major effort is a testament to the complexity of the initialization routines implemented in these blobs.

The order is basically something along the lines of:

1. CPU loads ME firmware, verifies signature, starts it running on the ME coprocessor.

2. First x86 opcode is executed; this is part of the 3rd party firmware (Coreboot, AMI, etc.)

3. The 3rd party firmware will probably start by executing the Intel MRC/FSP blob. (Possibly this blob even expects to be the reset vector now, wouldn't surprise me; I'm not an expert on this.)

4. The memory controllers/chipset/etc. are now setup. The 3rd party firmware can do what it likes at this point.

5. Typically, firmware will implement a standard boot protocol like MBR boot or UEFI boot. Coreboot executes a payload at this stage.

I should add that microcode is another (signed, encrypted) blob. Modern x86 CPUs are so buggy out of the factory that they're often unable to even boot an OS unless a microcode upgrade is applied, so 3rd party firmware often performs a microcode upgrade before booting. Historically I don't believe it was uncommon for the OS kernel to perform a microcode upgrade, if configured to do so because a newer microcode was available than was incorporated in the firmware; Linux has functionality to do this. However I seem to recall that late (kernel boot or later) microcode application is being phased out; recent x86 CPUs want microcode updates to be completed very early, before kernel boot.

This particular issue is overhyped IMO. 64-bit UEFI mostly bypasses it. Sure, the firmware entry point has some bootstrapping to do, but this isn't a big deal.

The really weird initial state of SMM is a bigger deal since it happens at runtime.

Depends on how much of the complexity described in those thousands of pages of documentation is actually necessary, and how much could be eliminated with a better design.

Lots of it isn't so much bad design as an unwillingness to give up backwards compatibility. Some examples:

- the old floating point register stack and its 80-bit registers

- I don't know how many iterations on SIMD instructions (MMX, a few iterations of SSE, a few iterations of AVX, various prefixes to make older instructions use newer registers)

If you got rid of those, you also could get rid of quite a few prefix instructions, maybe a configuration bit here and there, etc.

It also doesn't help that, at times, Intel and AMD independently added stuff to the x86.

ARMv7 was already approaching x86 levels of complexity. Something like 1200 instructions and 30 years of cruft.

Tried to compile it, it worked, but it segfaults on executing `./vm`

I'm guessing you meant to comment on the vm thread.

Yes, my bad.

That documentation has more letters than sunny days in Phoenix. Bullshit statistic in action.

The point is that this is wrong. It's hardware; hardware should be simple. It's operating systems, languages, libraries and applications that (if anything) should have the proverbial "wall of manuals", not the machine architecture.

Power on reset, shift, decode, execute, repeat.

Intel loves complexity, which is why they invented USB: another tree killer.

The processor doesn't do anything. In all that silicon and its pages of documentation, you can't even find a parser for assembly language; you need software for that.

In spite of 4000+ pages of documentation, printing "Hello, world" on a screen requires additional hardware, and a very detailed program. Want a linked list, or regex pattern matching? Not in the 4000 pages; write the code.

And this is just the architecture manual for software developers. This is not documentation of the actual silicon. What it contains:

This document contains all seven volumes of the Intel 64 and IA-32 Architectures Software Developer's Manual: Basic Architecture, Instruction Set Reference A-L, Instruction Set Reference M-Z, Instruction Set Reference, and the System Programming Guide, Parts 1, 2 and 3. Refer to all seven volumes when evaluating your design needs.

Instruction set references and system programming guide; that's it!

Note also that this is not the programming documentation for a system on a chip (SoC). There is nothing in this 4000+ page magnum opus about any peripheral. No serial ports, no real-time clocks, no Ethernet PHYs, no A/D or D/A converters; nothing. Just CPU.

"Intel loves complexity"

Intel loves performance, because people want performance. Complexity is the cost of increased performance. As an example, I would guess that of the ~2000 pages of the instruction set reference, at a minimum 1000 pages document the various SIMD instructions. You don't need those, or the floating point operations, or SHA instructions, but I don't see any harm done by making them available.

Don't you technically just need mov, which is said to be Turing complete?

It's not clear to me how [simple unconditional] mov could possibly do the job alone. I believe it could only work if it incorporates "magic" memory locations - e.g., storing at location x executes math combining location x and location y in some way and alters location z. This simply begs the question by moving logic behind the curtain.

I think the single instruction which can do the entire job without any magic assist is subneg x, y, z:

Subtract location x from location y; store the result in location y; and branch to location z if result is less than 0; else proceed to next.

Or various trivial variations of the same idea.

Any complication beyond this is no more than syntactical sugar and performance optimization.
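A minimal interpreter for the subneg machine described above, as a sketch in Python. The semantics follow the description (mem[y] -= mem[x], branch to z if the result is negative). The memory layout, the "negative address halts" convention, and the step limit are my own assumptions, not part of any standard:

```python
def run_subneg(memory: list, pc: int = 0, max_steps: int = 10_000) -> list:
    """Run a subneg program in place; jumping to a negative address halts."""
    steps = 0
    while pc >= 0 and steps < max_steps:
        x, y, z = memory[pc], memory[pc + 1], memory[pc + 2]
        memory[y] -= memory[x]                 # mem[y] = mem[y] - mem[x]
        pc = z if memory[y] < 0 else pc + 3    # branch only if result < 0
        steps += 1
    return memory

# Addition built from subtraction alone: B = B + A, using a zeroed scratch cell.
program = [
    9, 11, 3,     # Z -= A   -> Z = -A; the next instruction is at 3 either way
    11, 10, 6,    # B -= Z   -> B = B + A; the next instruction is at 6 either way
    12, 13, -1,   # T -= ONE -> T = -1, which is negative, so jump to -1 (halt)
    7,            # address 9:  A
    5,            # address 10: B
    0,            # address 11: Z (scratch, starts at zero)
    1,            # address 12: ONE
    0,            # address 13: T (scratch for the halt trick)
]

result = run_subneg(list(program))
print("B =", result[10])   # B now holds 7 + 5
```

Code and data share one address space, which is what lets a single instruction do everything: the branch target and both operands are just memory cells.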

Once I checked out the reference at the end of README.md, I like it. I could try to object that the "magic" has been moved into the addressing modes of the mov, but that would be a bit arbitrary.

If you focus only on direct memory addressing (no indirect or indexed), mine does still work, but mov doesn't. I think.

1000 pages comes close to ANSI Common Lisp [1994], which is still criticized by some bitter old Lisp trolls for its size.

It's just some CPU instructions!

Maybe they aren't doing a good job of describing them succinctly?

They probably aren't describing them very succinctly, because there isn't really a benefit to dropping information in the complete instruction set reference. If you don't want all of the information, look at the instruction set summary instead (~40 pages in volume 1).

These things aren't written to be brief, they are written to be complete.

> The processor doesn't do anything.

It's difficult to argue with someone starting from this level of wrong.

If you take words literally, you will find it difficult simply to converse with humans.

So what did you mean? The processor clearly does everything, so why say that it does nothing? You're railing against "complexity" that you show no sign of understanding.

Whereas a processor whose programming view can be documented in only 200 pages doesn't do everything. Gotcha!
