
Assembly Is Too High Level: SIB Doubles - ingve
http://xlogicx.net/?p=456
======
__michaelg
The cool thing about the address calculation is that there is a LEA
instruction (load effective address), which gives you the resulting address.
I.e. instead of actually loading the value from the calculated address in
memory, it gives you the address itself without accessing memory. This allows
you to do "complex" calculations in just one instruction. This doesn't really
help performance-wise in modern CPUs, but it used to in the old 386 days.
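
To make the address calculation concrete, here is a minimal Python sketch of what LEA computes in 32-bit mode: base + index*scale + displacement, truncated to the register width. The function name and parameters are made up for illustration; it only models the arithmetic, not the instruction itself.

```python
def effective_address(base, index, scale, disp=0, bits=32):
    """Model the x86 address calculation: base + index*scale + disp.

    Hypothetical helper for illustration only. The result wraps to the
    register width, as the hardware does, and no memory is touched.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only supports scale factors 1, 2, 4 and 8")
    mask = (1 << bits) - 1
    return (base + index * scale + disp) & mask


# Something like lea eax, [ebx + ecx*4 + 8] with ebx=0x1000, ecx=3
addr = effective_address(0x1000, 3, 4, 8)

# Something like lea eax, [eax + eax*4] with eax=7, i.e. a multiply by 5
times_five = effective_address(7, 7, 4)
```

With these inputs `addr` is 0x1014 and `times_five` is 35, which is the "complex calculation in one instruction" trick: the address hardware does a shift and two adds without reading memory.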

~~~
igravious
Yes, a lot of these, let's call them `assembler hacks', were true at one time in
the 32-bit x86 era but are probably not true now. Also, the x86 processor
market is quite big now, from Pentiums and Core Ms and Core Is to AMD and god
knows who else. There used to be assembler books that could teach you all this
in the 32-bit era; do these exist for the 64-bit era? It's been a while since
I did any low-level bit twiddling.

~~~
_wmd
Agner Fog's PDFs are the closest thing I know of

------
raimue
I would say you cannot deduce much from that micro-benchmark. On x86, loop
timing depends on the alignment of the labels and branch instructions due to
the branch prediction logic. You can actually make some loops faster by
inserting nops inside the loop.

Since the result of the xor is never read and immediately overwritten by the
following mov, I wonder if clever register renaming could actually detect that
and discard the instruction.

~~~
acqq
Yes, I'm also suspicious about his claims of "this is faster than that." He
doesn't demonstrate that he actually knows how something like this should be
correctly measured.

------
al2o3cr
My first thought was: is this a caching / instruction-fetch artifact? The two
instruction sequences generate blocks of 8 bytes (for the eax+eax version) vs
11 bytes (for the 2*eax +0 version).
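
As a rough illustration of why the two spellings differ in size, this Python sketch hand-assembles a single `mov eax, [...]` in both forms for 32-bit mode. The helper names are made up, and it covers only this one instruction, not the exact sequences measured in the article. The relevant encoding rule is that a SIB byte with no base register (base field 101 with mod 00) must be followed by a 32-bit displacement, even if that displacement is zero.

```python
EAX = 0           # register number of eax
RM_SIB = 0b100    # ModRM r/m value meaning "a SIB byte follows"
NO_BASE = 0b101   # SIB base value meaning "no base, disp32 follows" (mod=00)


def modrm(mod, reg, rm):
    """Pack the ModRM byte: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def sib(scale, index, base):
    """Pack the SIB byte: scale (2 bits), index (3 bits), base (3 bits)."""
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (scale_bits << 6) | (index << 3) | base


def mov_eax_from_eax_plus_eax():
    # mov eax, [eax+eax]: base=eax, index=eax, scale=1, no displacement
    return bytes([0x8B, modrm(0, EAX, RM_SIB), sib(1, EAX, EAX)])


def mov_eax_from_eax_times_2():
    # mov eax, [eax*2]: index=eax, scale=2, no base, so a zero disp32 is forced
    head = bytes([0x8B, modrm(0, EAX, RM_SIB), sib(2, EAX, NO_BASE)])
    return head + (0).to_bytes(4, "little")


short_form = mov_eax_from_eax_plus_eax()
long_form = mov_eax_from_eax_times_2()
```

Here `short_form` is the 3 bytes `8B 04 00` and `long_form` is the 7 bytes `8B 04 45 00 00 00 00`, so the same address costs four extra bytes of instruction stream in the scaled form.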

~~~
cfallin
That's just about the only possibility, I think -- both forms should generate
the same sequence of uops.

------
igravious
Could someone inform an old fogey if "super ignorant" (down at the end of the
article) means that something is actually very good?

Oh, and cool article btw. I think I have a hazy recollection from a long, long
time ago that Microsoft's assembler would do the same -- as in pick the
smallest machine opcode in the case of multiple assembler possibilities.

~~~
Kristine1975
A lot of assemblers did that. Which is why in self-modifying code you inserted
an instruction to load a large integer constant (e.g. 0xDEADBEEF) that the
assembler could not optimize into a smaller instruction. Then at runtime you
would replace that integer constant with whatever value, without the risk of
overwriting part of the next instruction.
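
A small Python sketch of that placeholder trick, operating on a byte buffer rather than live code. It uses `add eax, imm32` instead of a load purely as an assumption for illustration, because that instruction has both a short sign-extended 8-bit form and a full 32-bit form; the function name and offsets are hypothetical.

```python
import struct

# add eax, 5 as a size-optimizing assembler would emit it: 83 C0 ib
SHORT_FORM = bytes([0x83, 0xC0, 0x05])

# add eax, 0xDEADBEEF: too large for imm8, so the full form 05 id is used
LONG_FORM = bytes([0x05]) + struct.pack("<I", 0xDEADBEEF)


def patch_imm32(code, offset, value):
    """Overwrite the 4-byte little-endian immediate at `offset` in place.

    Only safe if the instruction really reserved 4 bytes for the
    immediate; with the short form this would clobber what follows.
    """
    code[offset:offset + 4] = struct.pack("<I", value & 0xFFFFFFFF)


# Placeholder instruction followed by ret (C3)
code = bytearray(LONG_FORM + b"\xC3")
patch_imm32(code, 1, 0x12345678)
```

After the patch the buffer holds `05 78 56 34 12 C3`, with the trailing `ret` intact. Had the assembler emitted the 3-byte short form, the same 4-byte write at offset 1 would have run over the opcode bytes and the `ret`.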

Those were the days...

~~~
0xdeadbeefbabe
Indeed. The bubbles virus really confused me the first time I stepped through
a self-modifying section using DOS's debug program.

------
PaulHoule
This guy needs an ontology.

