
ARM Assembly Is Too High Level: ROR and RRX - ingve
http://xlogicx.net/?p=673
======
jlarcombe
I always thought this was quite elegant; in the original ARM ISA most ALU
operations and register moves (and even some load/store indexing) could pass
through the barrel shifter at no extra cost, so a 'ROR' was just a move from a
register to itself with a pass through the shifter. This made up (somewhat)
for the low code density implied by fixed-length instructions and uniform
load/store architecture. AArch64 removes this capability from most of the
arithmetic operations I believe.
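Here is a minimal C model of the "move through the shifter" idea. It assumes a 32-bit register and covers only the immediate-shift forms with amounts 1..31. Carry-out and the special meaning of amount 0 are ignored. `barrel_shift`, `arm_add_shifted` and `arm_mov_ror` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the classic ARM barrel shifter on operand2.
   Only immediate shifts by 1..31 are modelled; carry-out is ignored. */
typedef enum { SH_LSL, SH_LSR, SH_ASR, SH_ROR } shift_kind;

uint32_t barrel_shift(uint32_t value, shift_kind kind, unsigned amount)
{
    assert(amount >= 1 && amount <= 31);
    switch (kind) {
    case SH_LSL: return value << amount;
    case SH_LSR: return value >> amount;
    case SH_ASR: /* replicate the sign bit into the vacated top bits */
        return (value >> amount) | ((value & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case SH_ROR: return (value >> amount) | (value << (32 - amount));
    }
    return value; /* not reached for valid kinds */
}

/* ADD r0, r1, r2, LSL #2 computes r1 + (r2 << 2) in a single instruction. */
uint32_t arm_add_shifted(uint32_t r1, uint32_t r2, shift_kind kind, unsigned amount)
{
    return r1 + barrel_shift(r2, kind, amount);
}

/* "ROR r0, #n" is just "MOV r0, r0, ROR #n": a move through the shifter. */
uint32_t arm_mov_ror(uint32_t r0, unsigned amount)
{
    return barrel_shift(r0, SH_ROR, amount);
}
```

For example, `arm_add_shifted(1, 3, SH_LSL, 2)` gives 13. This is the "scaled index for free" pattern the shifter made cheap.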

The design of the original ARM ISA is very interesting historically, mainly
informed by many man-years of hand-optimising 6502. As such it was quite
idiosyncratic, nice to write by hand, and rather awkward for compilers...

~~~
ajross
> could pass through the barrel shifter at no extra cost

There was always a cost. You had to spend those bits in the instruction
encoding that could have been used for more registers, more instructions, more
operands, or (the x86 choice) the ability to fit more (smaller) instructions
in the instruction cache. The existence of this extra thing in the instruction
data path meant you needed an extra cycle or two in the execute pipeline for
every instruction, not just the ones with shifts. You also had to have the
single-cycle barrel shifter implemented in hardware (this is something that
smaller microcontrollers used to skip).

In fact that weird ARM shift field is broadly held to have been a mistake.
Note that A64 skips it.

~~~
jlarcombe
Well, yeah, run-time cost, I meant. Originally this was nil (unless you had a
register-specified shift) on the 'classic' 3-stage pipeline in the original
ARM (1/2/3/6/7). As the pipeline got deeper this was more problematic, as you
say, and higher frequencies make the silicon implementation awkward too. But
even in A64, the barrel shifter is available on the second register operand of
the non-arithmetic data processing operations.

------
XlogicX
To let everyone in on a little !secret, the 'objections' to these encodings are
satire, something that was laid on a bit thick at the end of this post. There
have been no real objections to ARM so far. x86 is a different story, the
AAD/AAM instructions being the biggest example. In that case, the point is being
able to do something at the machine level that the assembly level abstracts
away (converting bases other than base 10). Regardless of any kind of
usefulness, any non-1-to-1 mapping between abstractions highly interests me.

~~~
ajross
The BCD instructions aren't "too high level", these are[1] real hardware
operations that had real utility to real problems. In the late 70's, the
modular math required to format decimal numbers for display could be a big
chunk of your ROM budget, and these instructions eliminated the problem.

This is like saying SSE is "too high level" because you could just do all the
operations independently with scalar math.

[1] Were, anyway. They're surely microcoded on modern processors.
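Here is a rough C model of the display chore described above, assuming the documented AAM behaviour (encoded D4 0A): AH = AL / 10, AL = AL % 10. For a value below 100, that split yields exactly its two decimal digits. Flags are not modelled, and `aam`/`two_digits` are illustrative names. AAM and AAD are not valid instructions in 64-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of x86 AAM (D4 ib): AH = AL / base, AL = AL % base. Flags ignored. */
typedef struct { uint8_t ah, al; } ax_pair;

ax_pair aam(uint8_t al, uint8_t base)
{
    assert(base != 0);  /* an immediate of 0 raises a divide error on real hardware */
    ax_pair r = { (uint8_t)(al / base), (uint8_t)(al % base) };
    return r;
}

/* Format 0..99 as two ASCII digits using a single AAM-style split. */
void two_digits(uint8_t value, char out[3])
{
    assert(value < 100);
    ax_pair r = aam(value, 10);
    out[0] = (char)('0' + r.ah);
    out[1] = (char)('0' + r.al);
    out[2] = '\0';
}
```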

~~~
XlogicX
I have no objection to the utility of something like AAD. What I'm saying is
that this very same instruction can do more at the machine level. AAD assembles
to D5 0A: D5 is the part that refers to AAD, and 0A is an immediate hardcoded
for base 10. One could machine code something like D5 08 (to have base 8
conversions). You can really do just about any base. Even the Intel manual
states you can do this, you just have to do it at the machine code level, you
can't do it with the assembly level instruction of AAD (it's too high level or
abstracted). This is all I meant by my comment.
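Here is a minimal C model of this point, assuming the documented AAD behaviour AL = (AL + AH * imm8) & 0xFF, AH = 0. The byte after D5 is just a multiplier, so a hand-encoded D5 08 combines two octal digits. `aad` and `digits_to_byte` are hypothetical helper names, and flags are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of x86 AAD (D5 ib): AL = (AL + AH * imm8) & 0xFF, AH = 0.
   Returns the new AX value. */
uint16_t aad(uint8_t ah, uint8_t al, uint8_t imm8)
{
    uint8_t new_al = (uint8_t)(al + ah * imm8);  /* result truncated to 8 bits */
    return (uint16_t)new_al;                     /* AH is cleared */
}

/* What a hand-encoded D5 <base> gives you: two base-`base` digits -> binary. */
uint8_t digits_to_byte(uint8_t hi, uint8_t lo, uint8_t base)
{
    assert(hi < base && lo < base);
    return (uint8_t)aad(hi, lo, base);
}
```

With the assembler's fixed 0A, AH=3, AL=7 gives 37. With D5 08, the same registers give 31, and D5 10 turns two hex digits into a byte.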

------
pharrington
ROR r0, #0

0 isn't permitted with ROR.

[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cjacbgca.html)

xlogicx should read the manual one more time.
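Some context on why "ROR #0" has no encoding of its own. In the classic A32 data-processing encoding, an immediate shift is a 2-bit type (00 LSL, 01 LSR, 10 ASR, 11 ROR) plus a 5-bit amount. An amount of 0 is reinterpreted for the right shifts:

- LSR #0 and ASR #0 encode shifts by 32.
- ROR #0 encodes RRX, a rotate right by one through the carry flag.

The C sketch below models that decoding. `decode_imm_shift` is an illustrative name, not an ARM API.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t result; int carry_out; } shifter_out;

/* type: 0 = LSL, 1 = LSR, 2 = ASR, 3 = ROR; imm5: the 5-bit amount field. */
shifter_out decode_imm_shift(uint32_t rm, unsigned type, unsigned imm5, int carry_in)
{
    assert(type < 4 && imm5 < 32);
    shifter_out o = { rm, carry_in };          /* LSL #0: value and carry pass through */
    switch (type) {
    case 0: /* LSL #1..31 */
        if (imm5 != 0) {
            o.carry_out = (int)((rm >> (32 - imm5)) & 1u);
            o.result = rm << imm5;
        }
        break;
    case 1: /* LSR; an amount of 0 encodes LSR #32 */
        o.carry_out = (int)((imm5 != 0 ? rm >> (imm5 - 1) : rm >> 31) & 1u);
        o.result = imm5 != 0 ? rm >> imm5 : 0u;
        break;
    case 2: /* ASR; an amount of 0 encodes ASR #32 */
        if (imm5 == 0) {
            o.carry_out = (int)(rm >> 31);
            o.result = (rm >> 31) ? 0xFFFFFFFFu : 0u;
        } else {
            o.carry_out = (int)((rm >> (imm5 - 1)) & 1u);
            o.result = (rm >> imm5) | ((rm >> 31) ? ~(0xFFFFFFFFu >> imm5) : 0u);
        }
        break;
    case 3: /* ROR; an amount of 0 encodes RRX: rotate right by one through carry */
        if (imm5 == 0) {
            o.result = ((uint32_t)(carry_in & 1) << 31) | (rm >> 1);
            o.carry_out = (int)(rm & 1u);
        } else {
            o.result = (rm >> imm5) | (rm << (32 - imm5));
            o.carry_out = (int)(o.result >> 31);
        }
        break;
    }
    return o;
}
```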

~~~
XlogicX
lol "isn't permitted", "unsupported", "undefined" are all trigger words for
me; when I see them, trying it anyway is the only thing I can think of doing. I
get that more than 90% of the time something fucky is going to happen, but that
doesn't stop me from wanting to know exactly what will happen. And sometimes,
rarely, something
really cool happens. In this case, doing ROR r0, #0 is just useless (as
documented), and my 'objections' to it are satire. With that context, my
'rant' at the end of the blog should be more clear. And here I thought my
satire was obvious. That said, can't say I don't love the serious technical
discussion in these comments ;)

~~~
TFortunato
Hah! Glad to know I'm not the only one who reads "undefined" as "why don't you
try it and see ;-)"

I have to be a bit more careful though with this one lately, when I'm working
on industrial robots vs. the embedded boards I'm used to!

------
andybak
> Note: If you prefer video format to reading stuff, there’s a companion video
> for this

Thanks. I much prefer this format and can't really comprehend why anyone would
prefer a video to a well laid out and illustrated blog post.

However, I find your font size pretty uncomfortable to read.

~~~
make3
ADD

~~~
MisterTea
I have ADD and video has its place, just not in this case.

A technical article such as this one is ideally disseminated using a regular
web page. Pictures, text, code all in easy view and scrolled conveniently at
will.

Compare that to a video which in essence is an auto-scrolling page that can
only be paused or slightly slowed down. You wind up pausing, rewinding and
skipping parts. Annoying.

------
simias
I don't quite understand the objection here, this is fairly elegant in my
opinion, you have a single opcode used to encode several operations by using
"special cases". In my experience it's rather common in RISC IAs.

If the author doesn't like this they shouldn't look into MIPS because it goes
well beyond that. You see, MIPS has a special "R0" register that's always 0
(AArch64 does as well, by the way), so you can always use it as a placeholder
in other instructions.

As such, there's no real MOVE instruction; it's just an assembler mnemonic
that assembles down to `OR $target, $src, $R0`. NOP? It's by convention `SLL
$R0, $R0, 0` (which has the nice property of encoding as 0x00000000). You want
to negate a number? `SUB $target, $R0, $src`.
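These aliases can be checked against the MIPS R-type layout: opcode, rs, rt, rd, shamt and funct fields of 6/5/5/5/5/6 bits. The C sketch below encodes the three examples above. It assumes the register numbers $t0 = 8 and $t1 = 9, and it follows the comment in expanding `move` to OR (ADDU with $0 would work just as well). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* funct codes for opcode 0 ("SPECIAL") and the register numbers used below. */
enum { FUNCT_SLL = 0x00, FUNCT_SUB = 0x22, FUNCT_OR = 0x25 };
enum { REG_ZERO = 0, REG_T0 = 8, REG_T1 = 9 };

/* Pack an R-type instruction: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6). */
uint32_t r_type(unsigned rs, unsigned rt, unsigned rd, unsigned shamt, unsigned funct)
{
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return ((uint32_t)rs << 21) | ((uint32_t)rt << 16) | ((uint32_t)rd << 11)
         | ((uint32_t)shamt << 6) | (uint32_t)funct;
}

uint32_t enc_nop(void)                      { return r_type(REG_ZERO, REG_ZERO, REG_ZERO, 0, FUNCT_SLL); } /* sll $0,$0,0 */
uint32_t enc_move(unsigned rd, unsigned rs) { return r_type(rs, REG_ZERO, rd, 0, FUNCT_OR); }              /* or rd,rs,$0 */
uint32_t enc_neg(unsigned rd, unsigned rs)  { return r_type(REG_ZERO, rs, rd, 0, FUNCT_SUB); }             /* sub rd,$0,rs */
```

For example, `enc_move(REG_T0, REG_T1)` yields 0x01204025 and `enc_nop()` yields 0.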

Since all instructions are 32 bits wide, you can't load a 32-bit immediate value
in a single instruction; instead, the assembler's "LI" mnemonic generates a
pair of instructions (LUI/ORI) for large immediate values (ARM prefers
PC-relative loads).
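The LUI/ORI split works because ORI zero-extends its 16-bit immediate. The upper and lower halves can therefore be used as-is, with no carry correction. Here is a minimal C sketch, assuming a 32-bit register and using hypothetical helper names. A real assembler would also emit a single instruction when the value fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t lui_imm; uint16_t ori_imm; } li_pair;

/* What the expanded pair computes at run time (32-bit register assumed). */
uint32_t run_lui_ori(li_pair p)
{
    uint32_t rt = (uint32_t)p.lui_imm << 16;  /* lui rt, hi: low half cleared */
    rt |= p.ori_imm;                           /* ori rt, rt, lo: zero-extended */
    return rt;
}

/* Split a 32-bit immediate into the LUI and ORI operands. */
li_pair split_li(uint32_t imm)
{
    li_pair p = { (uint16_t)(imm >> 16), (uint16_t)(imm & 0xFFFFu) };
    assert(run_lui_ori(p) == imm);  /* no carry fix-up is ever needed with ORI */
    return p;
}
```

`split_li(0xabcdef)` gives the operands 0xab and 0xcdef, matching the `lui`/`ori` pair in the disassembly of this comment.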

You have a whole bunch of mnemonics in MIPS that are just aliases around other
instructions. I always thought it was pretty clever.

In summary, you can write this assembler listing:

    
    
        1:
            sll $0, $0, 0
            or  $t0, $t1, $0
            li  $t0, 0xabcdef
            sub $t0, $0, $t1
            j   1b
    

That will disassemble to:

    
    
            nop
            move    t0,t1
            lui     t0,0xab
            ori     t0,t0,0xcdef
            j       0x0
            neg     t0,t1
    

The only operation here that I would qualify as "high level" is the reordering
of the "neg" instruction into the delay slot (note that j is no longer the
last instruction). Everything else is straightforward substitution, and if
the assembler didn't support these mnemonics we could implement them with
trivial macros.
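The delay-slot reordering is the one genuinely non-trivial step, so here is a toy C model of it. It is a deliberate simplification, not a description of what GNU as actually does:

- Instructions are plain text plus a branch flag.
- Any non-branch instruction directly before a branch is assumed safe to move. That holds for a plain `j`, which reads no registers, but not in general.
- Labels are ignored.
- A `nop` is emitted when nothing can be moved.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *text; int is_branch; } insn;

/* Toy delay-slot filler. Writes the reordered stream to out, which must be a
   separate buffer with room for 2*n entries, and returns its length. */
size_t fill_delay_slots(const insn *in, size_t n, insn *out)
{
    assert(in != out);
    size_t len = 0;
    int prev_movable = 0;  /* is out[len-1] an ordinary insn not already in a slot? */
    for (size_t i = 0; i < n; i++) {
        if (!in[i].is_branch) {
            out[len++] = in[i];
            prev_movable = 1;
        } else if (prev_movable) {
            /* branch goes first; the preceding insn becomes its delay slot */
            out[len] = out[len - 1];
            out[len - 1] = in[i];
            len++;
            prev_movable = 0;
        } else {
            out[len++] = in[i];
            out[len++] = (insn){ "nop", 0 };  /* nothing safe to move */
            prev_movable = 0;
        }
    }
    return len;
}
```

Fed the expanded listing from this comment, the `neg` ends up after the `j`, as in the disassembly.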

Note that even x86 assemblers do that to some extent, for instance "nop"
assembles down to an instruction with no side effect (typically `xchg eax,
eax`). Furthermore there are a bunch of mnemonics for the same encoding, for
instance JAE (jump if above or equal), JNB (jump if not below) and JNC (jump
if not carry). Overall instruction encoding is also massively more complicated
in x86 (and even more for amd64) so the assembler needs to handle many more
corner cases than the simple substitutions of ARM and MIPS. As a brain teaser,
consider the following similar-looking amd64 instructions that store %eax to
the 32-bit location pointed at by a register (the only difference is that the
first one dereferences the pointer in %rax, the second in %r12):

    
    
            mov %eax, (%rax) # assembles to 89 00
            mov %eax, (%r12) # assembles to 41 89 04 24
    

I can't even be bothered to walk you through this, but basically it has to do
with the fact that %r12 happens to be encoded as %rsp + 8 (because registers
r8 to r15 are effectively a hack: x86 only supported 8 GPRs, so the extra
register bit lives in the REX prefix), and in this addressing mode the r/m
value shared by %rsp and %r12 means "a SIB byte follows", which mandates a
different, longer encoding; otherwise you'd end up with an ambiguous instruction.
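Here is the brain teaser walked through anyway, as a C sketch:

- Only the low three bits of a register number fit in the ModRM byte; the fourth comes from the REX.B prefix bit.
- An r/m value of 100 (rsp's number, and therefore also r12's) means "a SIB byte follows".

The sketch encodes `mov %eax, (%base)` for the no-displacement case only. `encode_store_eax` is a hypothetical helper. rbp and r13 are deliberately rejected, since their r/m value 101 means RIP-relative addressing when mod is 00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "mov %eax, (%base)" (MOV r/m32, r32 = opcode 0x89) for a 64-bit base
   register numbered 0..15 (rax=0, ..., rsp=4, rbp=5, ..., r12=12, r13=13),
   mod = 00 only. Returns the number of bytes written (at most 4). */
size_t encode_store_eax(unsigned base, uint8_t out[4])
{
    assert(base < 16);
    assert((base & 7) != 5);         /* rbp/r13 with mod=00 would mean RIP-relative */
    size_t n = 0;
    if (base & 8)
        out[n++] = 0x41;             /* REX.B: supplies the 4th bit of the base */
    out[n++] = 0x89;                 /* MOV r/m32, r32 */
    unsigned rm = base & 7;          /* only the low 3 bits fit in ModRM.rm */
    out[n++] = (uint8_t)rm;          /* mod=00, reg=000 (eax), rm */
    if (rm == 4)                     /* rm=100 means "a SIB byte follows" */
        out[n++] = (uint8_t)((4u << 3) | 4u);  /* scale=1, index=none, base=100 */
    return n;
}
```

This reproduces the bytes in the comment: 89 00 for %rax and 41 89 04 24 for %r12. Plain %rsp also needs the SIB byte, giving 89 04 24.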

Yeah, I think in retrospect we can give ARM a pass for their ROR shenanigans.

~~~
megaremote
> I don't quite understand the objection here, this is fairly elegant in my
> opinion, you have a single opcode used to encode several operations by using
> "special cases".

Sounds like CISC.

~~~
kccqzy
I don't think this characterizes an instruction set as CISC at all. In any
case, having those "special cases" means that if an operation can be subsumed
by another operation, the former is just an alias of the latter on the
instruction encoding level, thereby reducing the actual number of
instructions. Think of it as syntactic sugar.

I still find this classic to be the best explanation of the technical
characteristics of CISC/RISC:
[https://userpages.umbc.edu/~vijay/mashey.on.risc.html](https://userpages.umbc.edu/~vijay/mashey.on.risc.html)

Here's a short summary:

Most RISCs:

- Have 1 size of instruction in an instruction stream
- And that size is 4 bytes
- Have a handful (1-4) addressing modes
- Have NO indirect addressing in any form (i.e., where you need one memory
  access to get the address of another operand in memory)
- Have NO operations that combine load/store with arithmetic, i.e., like add
  from memory, or add to memory.
- Have no more than 1 memory-addressed operand per instruction
- Do NOT support arbitrary alignment of data for loads/stores
- Use an MMU for a data address no more than once per instruction
- Have >= 5 bits per integer register specifier
- Have >= 4 bits per FP register specifier

~~~
simias
IMO RISC is more a philosophy than a technical term; your definition is
something that was created post facto to try to come up with a definition. It's
more like "all currently accepted RISC IAs have the following characteristics",
but I disagree that they're an appropriate definition. For instance:

>Have 1 size of instruction in an instruction stream and that size is 4 bytes

So that means that Thumb isn't RISC because it has 16-bit instructions and a
few double-width opcodes? Even though its instruction set is effectively even
more restricted than ARM's? That doesn't make sense to me.

>Do NOT support arbitrary alignment of data for loads/stores

MIPS has SWL/SWR LWL/LWR, does that count? I suppose you could say that RISC
has no support for arbitrary alignment in regular load and store instructions
but again, is that really enough to disqualify an IA? What if I made a tweaked
MIPS CPU with an identical instruction set with the only difference being that
unaligned LW/SW would work as intended instead of raising an exception, would
it stop being RISC?
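The idea behind LWL/LWR is that an unaligned word can be assembled from the two aligned words that straddle it. The C sketch below shows that technique on a little-endian byte buffer. It is an illustration, not a bit-exact model of LWL/LWR: their byte merging depends on endianness and leaves part of the destination register untouched. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Aligned 32-bit little-endian load from a byte buffer. */
uint32_t load_aligned_le(const uint8_t *mem, size_t addr)
{
    assert(addr % 4 == 0);
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

/* Unaligned load built from at most two aligned loads plus shifts and an OR.
   The caller must ensure both aligned words lie inside the buffer. */
uint32_t load_unaligned_le(const uint8_t *mem, size_t addr)
{
    size_t base = addr & ~(size_t)3;
    unsigned off = (unsigned)(addr & 3);
    uint32_t lo = load_aligned_le(mem, base);
    if (off == 0)
        return lo;                   /* also avoids an undefined shift by 32 */
    uint32_t hi = load_aligned_le(mem, base + 4);
    return (lo >> (8 * off)) | (hi << (32 - 8 * off));
}
```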

>Have >= 5 bits per integer register specifier, Have >= 4 bits per FP register specifier

That actually disqualifies ARM32 as far as I can tell, since it only has
16 GPRs encoded using 4 bits. I fail to see how this small encoding detail is
relevant to RISC anyway. Maybe it just means that you need at least 32 GPRs?

Wikipedia has a much broader (and IMO more reasonable) definition of RISC:

>Various suggestions have been made regarding a precise definition of RISC,
but the general concept is that such a computer has a small set of simple and
general instructions, rather than a large set of complex and specialized
instructions.

By this definition an instruction such as "Floating-point Javascript Convert
to Signed fixed-point, rounding toward Zero" is very much un-risc-y.

------
pdw
PowerPC assembly is going to blow this guy's mind.

~~~
bopbop
Do you have an online reference for that?

I tried this IBM one but it has a terrible clickthrough to attempt to avoid
GDPR:

[https://www.ibm.com/developerworks/library/l-ppc/index.html](https://www.ibm.com/developerworks/library/l-ppc/index.html)

And, as seems to pretty much always be the case, the Wikibooks page looks
promising but then appears to be empty:

[https://en.wikibooks.org/wiki/PowerPC_Assembly/Instructions](https://en.wikibooks.org/wiki/PowerPC_Assembly/Instructions)

This one seemed pretty good:

[https://www.cs.uaf.edu/2011/fall/cs301/lecture/11_21_PowerPC...](https://www.cs.uaf.edu/2011/fall/cs301/lecture/11_21_PowerPC.html)

~~~
dasmoth
If you're looking for a reasonably readable summary, there was a fair amount
on Raymond Chen's blog recently. Part one (of about 15 IIRC) here:

[https://blogs.msdn.microsoft.com/oldnewthing/20180806-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20180806-00/?p=99425)

------
codeulike
Would I be right in thinking 'Assembly is Too High Level' is a kind of title or
catchphrase for a series of blog posts, and that the actual article is just an
analysis of how those instructions work?

~~~
XlogicX
The main (actual) theme of the series is focusing on non-1-to-1 mappings, for
any reason (useful or not). It's a thing that fascinates me. The phrase "$x is
too high level" is mostly satirical, a phrase I've used for more than just
assembly language ($x = [asm, regex, scapy, inflate/deflate, zip, elf,
burritos, etc...]).

------
Senderman
I've gotten lost in the 'assembly is too high level' series on this blog
recently and it was very enjoyable. Also learned some things I didn't know.

