
Intel x86 documentation has more pages than the 6502 has transistors - ingve
http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html
======
dpc_pw
Some comments miss the point. I don't think it's a suggestion that x86 has too
much documentation or too many transistors. It just gives a picture of how
the hardware exploded in capacity, complexity and so on. Which is great
(though it obviously comes at a cost).

~~~
Taek
I disagree that it's great. The immense complexity associated with creating a
competitive x86 processor gives Intel a huge competitive moat. You can't make a
competing processor without billions of dollars of R&D.

As a result Intel is able to deploy highly unfavorable features like the
Management Engine and consumers don't feel like they have an alternative. A
reasonable alternative like Talos costs $3700.

If startups could reasonably compete with traditional volumes of funding, we'd
probably have better hardware.

~~~
nine_k
The catch is in the economy of scale. You can't make a Power8 cost comparably
to an i7 unless you also produce it in _huge_ numbers.

ARM chips are cheap not (only) because the R&D is excessively cheap but
because the chips are produced in hundreds of millions. Were it not for the
explosion of mobile phones, they'd stay in the "nice but expensive for the
performance" niche.

~~~
Someone
Hundreds of millions is peanuts for ARM.
[https://www.arm.com](https://www.arm.com) claims _" ARM’s partners shipped
14.9 billion ARM-based chips in 2015."_

That's about two CPUs for everyone on earth and likely two more this year.

~~~
vidarh
People don't realise how many things have CPUs these days. PPC (not POWER) and
MIPS are also still within the same magnitude as x86 chips (hundreds of
millions/year) - they both used to be selling more than x86 because they were
more widespread in embedded niches (networking, automotive, set-top boxes),
but I don't know whether that's still true.

It's not economies of scale that gives Intel its sustained advantage as much
as it is _huge_ margins that allow them to continue re-investing in their
fabrication process advantage, which again protects their margins by keeping
them on top in the only segments of the CPU market that are both high price,
high margin (at least for the top SKUs) _and_ high volume.

------
ChuckMcM
It is one of those interesting metrics, and one I came across while researching
the Cortex-M: at modern process node geometries the CPU part of a Cortex-M
chip (not including all of the on-chip peripherals, RAM, and Flash) easily
fits inside the _bond pad_ of an 8080A. As the 8080A had 40 pins, that is 40
Cortex-M CPUs' worth of silicon that Intel "threw away" by depositing a square
of gold on it.

But in terms of documentation that is related directly to the transistors, it
would be interesting to compare the number of lines of VHDL to the number of
inferred transistors. I know you get a report after place and route in the
typical workflow, but has anyone rolled that up into "2.5 lines of VHDL per 100
transistors" or something?

~~~
Keyframe
Someone here did the math before. The Motorola 68k was fabricated on a 3.5um
node. If it were made today on a 14nm node, it (the whole 68k) would fit in the
area of a single transistor of the original 3.5um 68k. That's 68,000 Motorola
68000s inside an original Motorola 68000! With newer nodes, even more.
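The arithmetic behind that, as a minimal sketch: it assumes naive ideal scaling, where area shrinks with the square of the node name (modern node names are not literal feature sizes, so treat this as an approximation). The 68,000-transistor figure is the commenter's.

```python
def area_scale(old_nm: float, new_nm: float) -> float:
    """Area shrink factor if every linear dimension scales by old_nm / new_nm."""
    return (old_nm / new_nm) ** 2

shrink = area_scale(3500, 14)  # 3.5 um -> 14 nm
print(shrink)                  # 62500.0
# Compare with the comment's 68,000-transistor figure: the ratio comes out near 1.
print(round(shrink / 68_000, 2))  # 0.92
```

So under this idealized assumption the "whole 68k per original transistor" claim holds to within about 10%.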

------
static_noise
They continue to add transistors to newer processors. It is a good sign that
someone keeps track of them and writes documentation on how to use all these
new transistors.

~~~
Cyph0n
I heard they added a couple of transistors to the latest Intel chips. One of
the transistors hasn't been documented yet, though, which is unlike Intel.

~~~
ethbro
I'm imagining the documentation for an individual transistor that extends all
the way up the stack... and thinking about Escher for some reason.

~~~
mycall
This makes me wonder how many undocumented transistors are on the chips --
there because nobody knows why.

~~~
andromeduck
Apparently Nvidia had to drop VBIOS a bit early because it started having
issues, but they no longer had enough people who knew how it worked to continue
supporting the feature.

------
CalChris
So this post is supposed to show a relationship between transistors and
documentation. Fine.

x86 has 4181 pages of documentation. Quad-core Skylake has 1.75 billion
transistors. This is 418560 transistors per page.

The 6502 had 12 pages of documentation. It had 3,510 transistors. This is 292
transistors per page.

Advantage x86.
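The ratios above reproduce with plain integer division, using only the figures given in the comment:

```python
def transistors_per_page(transistors: int, pages: int) -> int:
    """Whole transistors per page of documentation."""
    return transistors // pages

skylake = transistors_per_page(1_750_000_000, 4181)  # quad-core Skylake vs. the x86 manual
mos6502 = transistors_per_page(3_510, 12)            # 6502 vs. its 12 pages
print(skylake, mos6502)          # 418560 292
print(round(skylake / mos6502))  # 1433
```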

~~~
Narishma
You're comparing a single-core to a quad-core. Moreover, the Skylake is an SoC
that contains stuff other than the CPU cores.

~~~
CalChris
Yes, Skylake is superscalar, hyperthreaded, multicore, SIMD, OOO, speculative
and a few other things too.

Yes, Skylake contains GPU stuff in its transistors and also in its
documentation.

------
petercooper
I'd be keen to see a comparison of transistor counts between modern and older
processors with caches removed (I could be wrong, but I thought built-in
caches made up the majority of transistor counts nowadays). That is, how much
more complex is a single cache-less core now vs then? Not as much as the
overall transistor count would indicate, I suspect.

~~~
qb45
Annotated die photo of a single-core VIA Nano chip from some 10 years ago:

[https://en.wikipedia.org/wiki/File:VIA_Isaiah_Architecture_d...](https://en.wikipedia.org/wiki/File:VIA_Isaiah_Architecture_die_plot.jpg)

As you can see, caches are merely half of the chip and half of the rest is
about dynamic instruction reordering, branch prediction, register renaming and
stuff.

These things have pipelined execution units so they can start a new
instruction before the previous one is finished executing, enough duplication
to start executing two or three instructions per cycle (sometimes even of the
same kind, say two floating point SIMD additions) and logic to schedule
instructions wrt data dependencies, not program order, so that instructions
which need input data not yet available can wait for a few cycles while other,
later instructions are executing.
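A toy model of scheduling by data dependencies rather than program order. This is a minimal sketch, assuming registers are already renamed (every instruction writes a distinct register), fixed latencies, and a hypothetical 2-wide issue; it compares strict in-order issue with issuing whatever is ready.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    name: str
    dest: str               # assumed already renamed: no two instructions share a dest
    srcs: tuple[str, ...]
    latency: int            # cycles until the result can be used (>= 1)

def schedule(program, width=2, in_order=False, max_cycles=10_000):
    """Return {name: issue cycle}. Registers no instruction writes are ready at cycle 0."""
    ready_at = {insn.dest: float("inf") for insn in program}  # not produced yet
    issued = {}
    cycle = 0
    while len(issued) < len(program):
        if cycle > max_cycles:
            raise RuntimeError("deadlock: some source is never produced")
        slots = width
        for insn in program:  # scan oldest first
            if insn.name in issued:
                continue
            if slots == 0:
                break
            if all(ready_at.get(r, 0) <= cycle for r in insn.srcs):
                issued[insn.name] = cycle
                ready_at[insn.dest] = cycle + insn.latency
                slots -= 1
            elif in_order:
                break  # in-order: a stalled instruction blocks all younger ones
        cycle += 1
    return issued

program = [
    Insn("i0", "r1", (), 4),            # load: result after 4 cycles
    Insn("i1", "r2", ("r1", "r0"), 1),  # needs the load
    Insn("i2", "r3", ("r4", "r5"), 3),  # independent multiply
    Insn("i3", "r6", ("r4", "r5"), 1),  # independent add
]
print(schedule(program, in_order=True))  # {'i0': 0, 'i1': 4, 'i2': 4, 'i3': 5}
print(schedule(program))                 # {'i0': 0, 'i2': 0, 'i3': 1, 'i1': 4}
```

The dependent add waits for the load either way; the difference is that the independent work no longer waits behind it.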

And all of this has to be done while preserving the appearance of executing
instructions serially, so if, say, some instruction causes a page fault and a
jump to the OS fault handling code, the CPU has to cancel all later
instructions, which may have already finished executing :)
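One common way to keep up that appearance is a reorder buffer: results may be computed in any order, but they only become architecturally visible ("retire") oldest first. A minimal sketch, assuming each instruction's finish cycle is known in advance and a hypothetical retire width of 2:

```python
from collections import deque

def retire_in_order(insns, retire_width=2):
    """insns: (name, cycle its execution finishes, faults?) in program order.

    Returns (retired names, faulting name or None, squashed names).
    """
    rob = deque(insns)  # reorder buffer: oldest instruction at the left
    retired = []
    cycle = 0
    while rob:
        for _ in range(retire_width):
            if not rob or rob[0][1] > cycle:
                break  # oldest not finished yet: everything younger waits
            name, _, faults = rob.popleft()
            if faults:
                # Precise exception: discard younger results, even ones already computed.
                return retired, name, [n for n, _, _ in rob]
            retired.append(name)
        cycle += 1
    return retired, None, []

# i1 page-faults late; i2 and i3 finished earlier but were never made visible.
print(retire_in_order([("i0", 1, False), ("i1", 5, True), ("i2", 2, False), ("i3", 3, False)]))
# (['i0'], 'i1', ['i2', 'i3'])
```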

And, btw, this is not in any way specific to x86. POWER, high-end ARM, they
all do it.

~~~
72deluxe
Fascinating. Where do you learn this stuff? Any recommended reading you can
point me to as a complete novice with regard to CPU architecture?

~~~
qb45
That's somewhat hard to answer because I've been accumulating knowledge from
many sources over many years, and it started with some dead-tree book from the
'90s :)

Maybe Agner's site, in particular his microarchitecture manual, would be a
reasonable place to start:

[http://www.agner.org/optimize/](http://www.agner.org/optimize/)

There are "software optimization manuals" from CPU vendors, but these may not
be particularly novice-friendly. I think I've used Wikipedia at times for
general CPU-agnostic concepts, though it has a tendency to use jargon with
little explanation. Occasionally somebody submits something to HN.

On the lowest level, it may be helpful to know some simple digital circuits
(decoders, multiplexers, adders, flip-flops, ...?) just to have an idea of
what kind of things can be done in hardware.

------
tetrep
It would probably be good to add (2013) to the title, as that was both when it
was written and the date of the latest x86 documentation at the time. Since
then, it looks like the manual has grown to 4670 pages[0], which surpasses the
6502's "all transistors but ROM/PLA" count.

At this rate, 1419 pages over 3 years, the Intel x86 documentation's page
count will exceed all transistor counts of the 6502 around 2020.
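That projection is a straight-line extrapolation. A minimal sketch, assuming a constant 1419 pages per 3 years starting from 4670 pages in 2016 (three years after the 2013 article); the target count is left as a parameter, since the comment doesn't state which 6502 total it has in mind.

```python
def first_year_exceeding(target_pages, pages=4670.0, year=2016, pages_per_year=1419 / 3):
    """Linear projection: first year in which the page count is strictly above target_pages."""
    while pages <= target_pages:
        pages += pages_per_year
        year += 1
    return year

print(first_year_exceeding(6000))  # hypothetical target of 6000 transistors -> 2019
```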

[0]: [https://software.intel.com/en-us/articles/intel-sdm#combined](https://software.intel.com/en-us/articles/intel-sdm#combined)

------
CamperBob2
Somebody posted a great educational video from the late 1970s, narrated by
William Shatner:
[https://www.youtube.com/watch?v=VJmero_L7g0](https://www.youtube.com/watch?v=VJmero_L7g0)
(14 mins).

Shatner plays it pretty straight here, and he does a great job of making the
subject matter interesting to audiences of the day. What's interesting about
the video now is that every time he promises us "Thousands of transistors on
the head of a pin," you can remind yourself that what we actually got was
_billions_ of transistors.

It's humbling from a software engineering perspective to contemplate how
poorly we're taking advantage of the semiconductor industry's Promethean gift.
My computer still looks a lot like the Apple IIs and TRS-80s did in that
video, and the same is true for my workflow.

------
the_duke
Now just imagine how many tens of thousands of pages Intel has on internal
documentation...

~~~
bonzini
And still no one knows why x86 control registers are CR0, CR2 and CR3. What
was CR1 supposed to be used for?!?

(This is actually true. I asked some x86 architects when I met them.)

~~~
Someone
According to
[http://www.pagetable.com/?p=364](http://www.pagetable.com/?p=364) (which also
shows part of the reason the x86 needs so much documentation, by the way), it
started its life "reserved", and never got a real role in life.

~~~
bonzini
Yes, but: "instead of overflowing the new bits into CR1, Intel decided to skip
it and open up CR4 instead – for unknown reasons."

~~~
vidarh
Sounds like someone took "reserved" a bit too seriously and/or couldn't figure
out who had marked it reserved or why, and decided the latter was the safer
option.

------
Waterluvian
Maybe I'm naive, but it seems incredibly impressive that the 6502 only has
that many transistors!

~~~
bigiain
<showing my age here>

When I remember all the stuff you could do on an Apple II (and a BBC Micro)
with their tiny four-thousand-odd-transistor CPUs and 4 whole KB of RAM - and
consider how much time I spend waiting for this laptop with its billion-or-so
transistor CPU and 16 GB of RAM - it's almost enough to make you weep about
the profligate waste of resources of the entire software engineering
profession... ;-)

~~~
72deluxe
I sometimes think that. I am amazed when I look back at the BBC Micro and
think of the software I used to run on it, and how they did it with such
tiny amounts of RAM.

I do realise that the last 20 years of GUI progress has stalled and that you
could take a Mac from yesteryear or PC from ~1991 and know your way around it
without any trouble at all.

Of course software development strategies have changed and languages now let
us express ourselves in previously unimaginable ways, but we've come so far
and not far at all.

I am particularly struck by the craze over the last 5+ years with regard to
"cloud" and shoving data to the other side of the world, particularly given
the microcomputer revolution and the lack of need to shove your data
elsewhere. That's what the microcomputer is for!

------
en4bz
I would say that roughly 25% of the documentation applies to ancient modes of
operation like real mode and protected mode. Unless you REALLY need to know
the fine details of these modes you can skip right to the long mode stuff.

~~~
johncolanduoni
Windows still uses a lot of programs in emulated protected mode, so it's
pretty relevant still.

------
rasz_pl
Does that include all the secret documentation for stuff like LOADALL, ICEBP
etc?

[http://www.drdobbs.com/undocumented-corner/184410285](http://www.drdobbs.com/undocumented-corner/184410285)

[http://www.rcollins.org/articles/loadall/tspec_a3_doc.html](http://www.rcollins.org/articles/loadall/tspec_a3_doc.html)

You will love this paragraph:

"Unlike the 286 LOADALL, the 386 LOADALL is still an Intel top secret. I do
not know of any document that describes its use, format, or acknowledges its
existence. Very few people at Intel will acknowledge that LOADALL even exists
in the 80386 mask. The official Intel line is that, due to U.S. Military
pressure, LOADALL was removed from the 80386 mask over a year ago. However,
running the program in Listing-2 demonstrates that LOADALL is alive, well, and
still available on the latest stepping of the 80386."

Just imagine what's in Intel chips now due to NSA pressure :/

~~~
qb45
Not sure what to love here, it's a debug feature which, according to your
source, Intel promised the US Mil to remove for some reasons but ultimately
didn't.

There certainly are undocumented debug facilities in modern CPUs. For one
example, the leaked Socket AM3 datasheet clearly shows a JTAG interface,
though I don't know if it's operational in production silicon.

Hopefully, debug capabilities cannot be used to pwn the CPU from unprivileged
code without external debug hardware which could pwn the CPU anyway by itself.
It's not even clear if they are enabled in production chips at all.

LOADALL for example worked only in RING0 and got ultimately removed early in
the 486 days so it seems Intel cared about security somewhat (and probably
also about future compatibility, to be honest, it's not fun when software
relies on features you want to change in the next generation).

Nowadays they should care even more - if software backdoors were available and
leaked to the public, the magnitude of shit happening in all those cloud
companies would be monumental.

~~~
rasz_pl
> if software backdoors were available and leaked to the public

[https://www.blackhat.com/us-15/briefings.html#the-memory-sin...](https://www.blackhat.com/us-15/briefings.html#the-memory-sinkhole-unleashing-an-x86-design-flaw-allowing-universal-privilege-escalation)

conveniently "discovered" by a 3 letter agency favorite principle contractor
(Battelle Memorial Institute - have fun researching them) employee just after
everybody switched to the next (fixed) CPU generation.

~~~
qb45
I doubt that this can be used for VM escape because it requires access to the
physical LAPIC and afaik hypervisors wouldn't allow VMs to touch this.

It also doesn't work from userspace so pretty much all you can do with it is
hacking SMM from a kernel running on the bare metal. Maybe useful for
rootkits, but truth be told 3LAs seem to have no problem making non-SMM
malware undetectable by commercial AVs. See stuxnet :)

> conveniently "discovered" by a 3 letter agency favorite principle contractor

Not sure what you are alluding to. 3LAs wouldn't want this to be known if it
was their job, methinks.

~~~
redblacktree
I think he's saying that the 3LAs knew about it for a long time, but publicly
"discovered" the flaw when it was no longer useful to them (after everyone had
upgraded).

------
yuhong
It is probably not difficult to create a new x86 version that is user mode
compatible with most modern programs but lacks things like segmentation and
real mode. New OS versions would be required, but most modern user mode
programs would work with few if any modifications.

~~~
vidarh
Things like Linux should work fine with a CPU that strips 16 bit mode entirely
(32 bit too, possibly? not sure) as long as you have a BIOS / boot loader that
can handle it and - as of when I last looked at the Linux kernel
initialisation code over a decade ago - change / strip out a handful of lines
that took care of changing the mode.

It'd be interesting, but I don't think it'd save all that much unless you
strip 32 bit compatibility as well, and even then it might be less than you
think, or they probably would have tried it to see if the market wanted it...

~~~
yuhong
A lot of microcode is about things like segmentation and TSS.

------
hota_mazi
Somebody needs to come up with a law that correlates the size of a processor's
documentation to the number of transistors on that processor.

~~~
andrewbinstock
Mark Papermaster[0] should formulate that law.

[0]
[https://en.wikipedia.org/wiki/Mark_Papermaster](https://en.wikipedia.org/wiki/Mark_Papermaster)

------
amelius
Moore's law for documentation?

~~~
Aardwolf
Not sure about that one; it seems like an inverse law to me. Computer games
used to come with big booklets of documentation and backstory; now nothing
(other than user-made wikis, of course).

Or a mobile dumbphone came with a manual explaining all the menus and options.
Now the only paperwork with a smartphone is legal and warranty.

~~~
wott
Oh, I swear it is not inverse for chip documentation (unless it's from a
Chinese manufacturer, in which case you have to make do with a 2-page leaflet
for an 80-pin chip). But that doesn't necessarily mean it is exhaustive,
high-quality documentation.

First, we have to acknowledge that most texts (it works for law too) are very
diluted now compared to a few decades ago. There is a lot of blah-blah that
doesn't bring information. Information density decreased.

Then, there are docs that are so big (many many thousands pages), that I am
sure no editor can read them fully. They pile up copy-paste from older or
similar models without checking if it applies to the chip. They don't write a
clean doc specifically for the chip. So as a user you can trash parts of the
doc. Problem is that you don't know which ones.

Since they don't print manuals any more, they don't have to care about fitting
the doc into a book; there's no limit.

------
jianina
How can I order these books?

~~~
user5994461
You can download them from intel website for free.

Source: I got the x86 and x64 instruction set manuals from there, which run to
thousands of pages in PDF. Rootkits ain't gonna write themselves =)

------
sebcat
If anyone's interested in the 6502, or CPU design in general, this is a very
good simulator:
[http://visual6502.org/JSSim/index.html](http://visual6502.org/JSSim/index.html)

~~~
72deluxe
Thanks for this link. I have no idea what it's doing but it is an interesting
start!

------
mwcampbell
Is the situation appreciably better with ARMv8? How about ARMv7?

~~~
static_noise
Is the situation with x86 bad to begin with?

Granted, I didn't read the full documentation provided online for my hardware
before I powered it on. Honestly, I didn't read any documentation and it just
works, kind of.

~~~
Kubuxu
To show the complexity of x86-64, it is best to look at the boot process. Your
processor starts in 16-bit mode, then is switched to 32-bit and then to 64-bit
mode. You want to make a call to the BIOS now? You have to drop back through
32-bit mode to 16-bit mode to do that, and then go back up to handle the
response.

And that is just a very small part of the cruft that x86-64 has.
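The mode shuffle can be pictured as a tiny graph search. This is a deliberately simplified, hypothetical model (it lumps compatibility mode and legacy protected mode together as "protected"), assuming only the direct switches listed below are possible:

```python
from collections import deque

# Simplified model: long mode is entered from, and left via, 32-bit protected mode;
# real mode only switches to and from protected mode.
DIRECT_SWITCHES = {
    "real": ["protected"],
    "protected": ["real", "long"],
    "long": ["protected"],
}

def switch_path(start, goal):
    """Shortest sequence of modes from start to goal (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in DIRECT_SWITCHES[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def bios_call_round_trip():
    """Modes visited when 64-bit code calls the 16-bit BIOS and comes back."""
    down = switch_path("long", "real")
    up = switch_path("real", "long")
    return down + up[1:]

print(switch_path("real", "long"))  # boot: ['real', 'protected', 'long']
print(bios_call_round_trip())       # ['long', 'protected', 'real', 'protected', 'long']
```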

~~~
hlandau
The rate at which the complexity of the amd64 boot process is increasing is
quite alarming.

UEFI is an overcomplicated, buggy monstrosity, but that's just the tail end of
the "boot process". Nowadays, to get an x86 CPU to execute a single opcode,
you need to have a Management Engine (or Platform Security Processor, in AMD-
speak) firmware blob resident in the firmware flash chip. More modern CPUs,
for Intel, say, oblige you to use Intel-provided "memory reference code" and
other "firmware support package" blobs just to initialize the CPU in the early
stage. AFAIK, Intel isn't even bothering to document the details of its CPU
and chipset initialization sequences anymore, in favour of just making people
use unexplained blobs. These are just some of the issues the coreboot project
is having to deal with. It really feels like at least in the world of x86, the
window is rapidly closing on projects like coreboot being able to accomplish
anything useful, although there are at least some major users like
Chromebooks.

And then of course we have things like SMM, and the way in which secure
firmware updates are facilitated (which relies on things like flash write
protect functionality)...

~~~
djsumdog
Those blobs are run by the BIOS/UEFI, correct? Like, GRUB/the Linux kernel
don't need those Intel blobs just to get booting, do they?

~~~
hlandau
It's correct that these blobs are loaded way before GRUB or a Linux kernel
gets booted. To be precise they are part of the firmware image; UEFI refers to
a boot protocol specification. So for example with coreboot, you can select
one of many "payloads". Payloads include UEFI boot, MBR boot, etc. So it's
probably best to distinguish between the boot protocol and the firmware
package as a whole.

The ME firmware is loaded by the CPU itself before anything begins executing;
there's a header in the firmware image that lets the CPU find it. These are
cryptographically signed, so all projects like Coreboot can do is incorporate
the binaries provided by Intel.

The MRC/FSP blobs are executed by the x86 firmware, they're x86 code which
runs very early. Theoretically projects like Coreboot could replace these
blobs with their own code, but it would require reverse engineering these
blobs to figure out what they're doing. The fact that this would be a major
effort is a testament to the complexity of the initialization routines
implemented in these blobs.

The order is basically something along the lines of:

1\. CPU loads ME firmware, verifies signature, starts it running on the ME
coprocessor.

2\. First x86 opcode is executed; this is part of the 3rd party firmware
(Coreboot, AMI, etc.)

3\. The 3rd party firmware will probably start by executing the Intel MRC/FSP
blob. (Possibly this blob even expects to be the reset vector now, wouldn't
surprise me; I'm not an expert on this.)

4\. The memory controllers/chipset/etc. are now set up. The 3rd party firmware
can do what it likes at this point.

5\. Typically, firmware will implement a standard boot protocol like MBR boot
or UEFI boot. Coreboot executes a payload at this stage.
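The ordering above can be sketched as a chain where nothing on the x86 side runs unless the ME image is accepted first. This is a hypothetical model, not Intel's actual formats: a bare SHA-256 comparison stands in for the real public-key signature check, and the stage names are just labels.

```python
import hashlib

# Stand-in for signature verification: real ME images are signed, not hash-pinned.
TRUSTED_ME_DIGEST = hashlib.sha256(b"example ME firmware").hexdigest()

def boot(me_image: bytes, payload: str = "UEFI boot") -> list[str]:
    """Return the stages that run, in order; raise if the ME image is rejected."""
    # 1. The ME image must verify before any x86 code runs.
    if hashlib.sha256(me_image).hexdigest() != TRUSTED_ME_DIGEST:
        raise RuntimeError("ME firmware rejected; no x86 opcode is ever executed")
    stages = ["ME firmware running on ME coprocessor"]
    # 2. First x86 opcode: third-party firmware at the reset vector.
    stages.append("third-party firmware (coreboot, AMI, ...)")
    # 3. Vendor memory/chipset initialization blob.
    stages.append("MRC/FSP blob")
    # 4-5. Firmware is free to do what it likes; typically hands off to a boot protocol.
    stages.append(payload)
    return stages

print(boot(b"example ME firmware"))
```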

I should add that microcode is another (signed, encrypted) blob. Modern x86
CPUs are so buggy out of the factory that they're often unable to even boot an
OS unless a microcode update is applied, so 3rd party firmware often performs
a microcode update before booting. Historically I don't believe it was
uncommon for the OS kernel to perform a microcode update, if configured to do
so, when a newer microcode was available than the one incorporated in the
firmware; Linux has functionality to do this. However, I seem to recall that
late (kernel boot or later) microcode application is being phased out; recent
x86 CPUs want microcode updates to be completed very early, before kernel
boot.

------
_RPM
Tried to compile it, it worked, but it segfaults on executing `./vm`

~~~
andars
I'm guessing you meant to comment on the vm thread.

~~~
_RPM
Yes, my bad.

------
husky_voice
That documentation has more letters than there are sunny days in Phoenix.
_bullshit statistic in action_

------
kazinator
The point is that this is wrong. It's _hardware_; hardware should be simple.
It's operating systems, languages, libraries and applications that (if
anything) should have the proverbial "wall of manuals", not the machine
architecture.

Power on reset, shift, decode, execute, repeat.

Intel loves complexity, which is why they invented USB: another tree killer.

The processor doesn't do anything. In all that silicon and its pages of
documentation, you can't even find a parser for assembly language; you need
software for that.

In spite of 4000+ pages of documentation, printing "Hello, world" on a screen
requires additional hardware, and a very detailed program. Want a linked list,
or regex pattern matching? Not in the 4000 pages; write the code.

And this is just the architecture manual for _software developers_. This is
not documentation of the actual silicon. Here is what it contains:

_This document contains all seven volumes of the Intel 64 and IA-32
Architectures Software Developer's Manual: Basic Architecture, Instruction
Set Reference A-L, Instruction Set Reference M-Z, Instruction Set Reference,
and the System Programming Guide, Parts 1, 2 and 3. Refer to all seven volumes
when evaluating your design needs._

Instruction set references and system programming guide; that's it!

Note also that this is not the programming documentation for a system on a
chip (SoC). There is nothing in this 4000+ page _magnum opus_ about any
peripheral. No serial ports, no real time clocks, no Ethernet PHYs, no A/D or
D/A converters; nothing. Just CPU.

~~~
andars
"Intel loves complexity"

Intel loves performance, because people want performance. Complexity is the
cost of increased performance. As an example, I would guess that of the ~2000
pages of the instruction set reference, at a _minimum_ 1000 pages document the
various SIMD instructions. You don't need those, or the floating point
operations, or SHA instructions, but I don't see any harm done by making them
available.

~~~
flamedoge
Don't you technically just need mov, which is said to be Turing complete?

~~~
fnj
It's not clear to me how [simple unconditional] mov could possibly do the job
alone. I believe it could only work if it incorporates "magic" memory
locations - e.g., storing at location x executes math combining location x and
location y in some way and alters location z. This simply begs the question by
moving logic behind the curtain.

I think the single instruction which can do the entire job without any magic
assist is _subneg x, y, z_ :

Subtract location x from location y; store the result in location y; and
branch to location z if result is less than 0; else proceed to next.

Or various trivial variations of the same idea.

Any complication beyond this is no more than syntactical sugar and performance
optimization.
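A minimal interpreter for _subneg_ exactly as described, assuming code and data share one memory array, each instruction is three consecutive words x, y, z, and (a convention of this sketch) the machine halts when it branches to a negative address. The hypothetical program adds A into B using nothing but subneg:

```python
def run_subneg(mem, max_steps=10_000):
    """Execute a subneg program in place; halt when pc becomes negative."""
    pc = 0
    for _ in range(max_steps):
        if pc < 0:
            return mem
        x, y, z = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[y] -= mem[x]                  # y := y - x
        pc = z if mem[y] < 0 else pc + 3  # branch to z if the result is negative
    raise RuntimeError("step limit reached")

def add_program(a, b):
    """Memory image computing B := A + B; data lives right after the code."""
    A, B, Z, ONE, T = 12, 13, 14, 15, 16  # data addresses
    code = [
        A, Z, 3,     # Z := 0 - A (either outcome continues at 3)
        Z, B, 6,     # B := B - (-A) = A + B
        Z, Z, 9,     # Z := 0 again (never negative, falls through)
        ONE, T, -1,  # T := 0 - 1 < 0, so branch to -1: halt
    ]
    return code + [a, b, 0, 1, 0]

mem = run_subneg(add_program(7, 5))
print(mem[13])  # 12
```

Even an unconditional jump (the last instruction) has to be built from a subtraction that is guaranteed to go negative, which is the "syntactical sugar" point in miniature.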

~~~
pjc50
See
[https://github.com/xoreaxeaxeax/movfuscator](https://github.com/xoreaxeaxeax/movfuscator)

~~~
fnj
Once I checked out the reference at the end of README.md, I like it. I could
try to object that the "magic" has been moved into the addressing modes of the
mov, but that would be a bit arbitrary.

If you focus only on direct memory addressing (no indirect or indexed), mine
does still work, but mov doesn't. I _think_.

