
X86 Register Encoding - eklitzke
https://eklitzke.org/x86-register-encoding
======
pcwalton
LLVM doesn't actually prefer the lower registers when doing register
allocation. IIRC, sunfish told me that GCC doesn't either. It would be
interesting to add features that try to minimize code size to the register
allocator, but no compiler I know of actually does this.

Partially as a consequence of this, the REX prefixes take up a lot of space in
most x86-64 instruction streams. In fact, the average size of each instruction
is almost exactly 4 bytes, exactly the same as in classic 32-bit RISC
architectures. (This is why I dislike it when people link to that old Linus
post about how x86 is better than RISC architectures because of code size; it
may have been true then, but not now.)
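
To make the size cost concrete, here is a minimal sketch of how the REX prefix enters the encoding of a register-to-register MOV (opcode 0x89 /r). The function names and the `wide` parameter are made up for illustration; it only covers register-direct operands and is not a general assembler.

```python
def rex(w, r, x, b):
    """Build a REX prefix byte, laid out as 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def encode_mov_reg_reg(dst, src, wide=False):
    """Encode MOV dst, src (opcode 0x89 /r), register-direct only.

    dst and src are register numbers 0-15 (0 = AX, 1 = CX, ... 8 = R8).
    wide=True selects 64-bit operands (REX.W); otherwise 32-bit.
    """
    r = src >> 3  # high bit of ModRM.reg is carried in REX.R
    b = dst >> 3  # high bit of ModRM.rm is carried in REX.B
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)
    out = bytearray()
    if wide or r or b:  # the prefix is only emitted when a REX bit is needed
        out.append(rex(int(wide), r, 0, b))
    out += bytes([0x89, modrm])
    return bytes(out)


print(encode_mov_reg_reg(0, 1).hex())             # mov eax, ecx
print(encode_mov_reg_reg(8, 9).hex())             # mov r8d, r9d
print(encode_mov_reg_reg(0, 1, wide=True).hex())  # mov rax, rcx
```

The first encoding is two bytes (89 c8); the other two are three bytes each (45 89 c8 and 48 89 c8), because either a high register or a 64-bit operand size forces the prefix.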

~~~
mikeash
Wouldn't you also have to look at the average number of instructions, not just
the average instruction size? x86 instructions tend to do more, too.

~~~
pcwalton
The binary sizes were also similar when I last measured. Keep in mind that (a)
ARM and AArch64 have quite a few addressing modes as well; (b) ARM has
LDMIA/STMDB and AArch64 has LDP/STP, which compress function prologs and
epilogs; (c) more registers means you don't have to spill as much; (d)
three-address instructions are often more compact than a MOV plus a
two-address instruction, which helps with typical register allocation
algorithms.

~~~
mikeash
Thanks for the info, to you and all the other replies!

------
stephencanon
N.B. the "Volatile?" column is specific to the _Windows_ calling conventions.
Under the Sys V calling conventions (i.e. what the world outside of Redmond
uses), RDI and RSI are volatile (and used for passing the first two integer
arguments).
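
A small sketch of the difference, restricted to the integer general-purpose registers (XMM registers are left out). The names `is_volatile`, `"sysv"` and `"win64"` are labels invented for this example.

```python
# Integer argument registers, in order.
SYSV_ARGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
WIN64_ARGS = ["rcx", "rdx", "r8", "r9"]

# Registers a callee must preserve (non-volatile).
SYSV_CALLEE_SAVED = {"rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"}
WIN64_CALLEE_SAVED = SYSV_CALLEE_SAVED | {"rdi", "rsi"}


def is_volatile(reg, abi):
    """Return True if a callee may clobber `reg` under the given ABI."""
    saved = {"sysv": SYSV_CALLEE_SAVED, "win64": WIN64_CALLEE_SAVED}[abi]
    return reg not in saved


for reg in ("rdi", "rsi", "rbx"):
    print(reg, "sysv:", is_volatile(reg, "sysv"),
          "win64:", is_volatile(reg, "win64"))
```

RDI and RSI are the only general-purpose registers whose volatility differs between the two conventions in this table.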

------
pklausler
The OP perpetuates the mistaken assumption that x86-64 looks as it does
because it extends good ol' 32-bit x86 encodings, which one might assume still
work so well that one could run 32-bit code in 64-bit mode and have it still
work.

Which is not the case at all. Those REX prefix bytes used to be perfectly good
32-bit x86 instructions that now simply don't work in 64-bit mode with their
original encodings. So the "compatibility" between 32-bit and 64-bit modes is
mythical -- the Opteron could have had a nice shiny new 64-bit programming
model that was far less confusing than the dog's breakfast that is x86-64, but
just didn't.
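
Concretely, the bytes 0x40-0x4F are the one-byte INC/DEC register instructions in 32-bit mode and REX prefixes in 64-bit mode. A rough sketch of that reinterpretation follows; `decode_40_4f` is a hypothetical helper, and it assumes the default 32-bit operand size in 32-bit mode.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_40_4f(byte, long_mode):
    """Describe how a byte in 0x40-0x4F is interpreted in each mode."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("only bytes 0x40-0x4F are handled")
    if long_mode:
        # 64-bit mode: the low four bits are the W, R, X, B flags.
        flags = (("W", 8), ("R", 4), ("X", 2), ("B", 1))
        bits = "".join(name for name, mask in flags if byte & mask)
        return "REX." + bits if bits else "REX"
    # 32-bit mode: 0x40+r is INC r32, 0x48+r is DEC r32.
    op = "inc" if byte < 0x48 else "dec"
    return f"{op} {REG32[byte & 7]}"


for b in (0x41, 0x48):
    print(hex(b), "|", decode_40_4f(b, long_mode=False),
          "|", decode_40_4f(b, long_mode=True))
```

INC and DEC are still available in 64-bit mode through the longer FF /0 and FF /1 encodings, so the instructions survive even though their short forms do not.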

~~~
ant6n
They probably wanted to re-use most of the x86-32 decoder that they had on the
chip anyway.

Kind of strange: nowadays, an embedded ARMv8 CPU will come with multiple
decoders (ARM32, Thumb-2, AArch64).

~~~
lisper
This. In today's world it is easy to support multiple instruction sets in the
same silicon using the same registers and ALU. So why in the name of all that
is holy does the x86 not have a nice clean 64-bit instruction set that you can
swap in? It's all RISC under the hood anyway, why not give us access to it?
The only explanation I can think of is that Intel _wants_ to keep things
complicated as a barrier to entry for competition.

~~~
pcwalton
> It's all RISC under the hood anyway, why not give us access to it?

Presumably because it doesn't matter that much, so it isn't worth the
investment for Intel.

It feels to me like people have trouble accepting that both of these are true:
(1) RISC vs. x86 doesn't matter that much in practice; (2) technically
speaking, RISC is a superior design to x86.

~~~
nullc
> Presumably because it doesn't matter that much, so it isn't worth the
> investment for Intel.

Intel invested in a new instruction set (IA64). AMD did not.

The market chose AMD's offering.

~~~
pcwalton
And the market also chose ARM's AArch64 instruction set, despite the ARM
installed base being wider than x86's _and_ AArch64 being a
backwards-incompatible break from the past. The secret? Supporting both ISAs
and switching instruction sets on exceptions.

The lesson I take from history is that there is no need to maintain ISA
compatibility as long as the old mode is accessible on a process level.

~~~
imtringued
> The secret? Supporting both ISAs and switching instruction sets on
> exceptions.

The secret is that Apple controls its hardware and App Store, and Android apps
are ISA independent.

~~~
adrianratnapala
Whot?

Didn't the AMD64 architecture spank Itanium in the marketplace long before the
iPhone became a big deal?

------
alexanderstocko
The REX prefix:
[http://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix](http://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix)

------
bogomipz
I had a question about this statement in the article:

"The C calling convention on x86 systems specifies that callees need to save
certain registers."

Practically speaking, is this the prologue that the C runtime (crt0.o)
provides automatically/implicitly?

~~~
aji
no, the compiler does this when generating code for functions and function
calls

crt0.o is the glue between how the kernel loads a program into memory and how
main() expects things to work. it does basic setup tasks that can vary between
platforms, but is generally things like collecting command line arguments and
setting up the stack. it will also invoke exit() if main() returns, since
that's how the kernel expects the process to be destroyed

~~~
bogomipz
Thanks for the response.

When you say "how the kernel loads a program into memory" I assume you are
referring to ld-linux.so.2? Is that correct?

I imagine then that ld-linux.so.2 calls __start in crt0.o and crt0.o jumps to
main(). Is this correct?

~~~
cnvogel
[http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.g...](http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/binfmt_elf.c)
load_elf_binary()

maps the executable to memory, and _if_ an elf interpreter (/lib/ld-linux.so)
is specified (which it normally is) also the elf interpreter.

It then jumps to the entry point of the raw binary or of the elf interpreter.
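
For anyone who wants to see which interpreter a binary asks for, here is a minimal sketch that reads the PT_INTERP program header from an ELF image. It assumes a 64-bit little-endian file and does no bounds checking; `find_interp` is a name invented here, not a kernel or libc function.

```python
import struct

PT_INTERP = 3  # program header type that names the ELF interpreter


def find_interp(image):
    """Return the PT_INTERP path from an ELF64 little-endian image.

    Returns None when there is no PT_INTERP segment, as in a static binary.
    """
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("sketch only handles ELFCLASS64, little-endian")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # ELF64 program header starts with p_type, p_flags, p_offset.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", image, off)
        if p_type == PT_INTERP:
            (p_filesz,) = struct.unpack_from("<Q", image, off + 0x20)
            raw = image[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Calling `find_interp(open(path, "rb").read())` on a dynamically linked executable should report the same path that `readelf -l` lists as the requested program interpreter.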

~~~
bogomipz
Thanks, right, if there is a PT_INTERP header in the binary that specifies ELF
then load_elf_binary() will be called.

It's easy to sometimes conceptually think or talk about "a loader" as if it's
some standalone entity. However, this can be misleading (at least to me,
anyway), because in reality it's linked into every dynamically-linked binary,
along with crt0.o, which as the other poster mentioned takes care of some ABI
requirements and the setting up of the stack.

For anyone else who might be interested, this is also an illuminating source
code file to read: libc-start.c, which would be part of crt0.o

[http://repo.or.cz/glibc.git/blob/HEAD:/csu/libc-start.c#l105](http://repo.or.cz/glibc.git/blob/HEAD:/csu/libc-start.c#l105)

I don't have to think at this low a level on any kind of regular basis, but
it's a great thought exercise to do so from time to time.

