
ARM immediate value encoding - JoshTriplett
http://alisdair.mcdiarmid.org/arm-immediate-value-encoding/
======
joosters
_ARM, like other RISC architectures MIPS and PowerPC, has a fixed instruction
size of 32 bits. This is a good design decision, and it makes instruction
decode and pipeline management much easier than with the variable instruction
size of x86 or 680x0._

Actually, it turned out not to be that good a decision. The generated code is
far less compact than x86, and this has costs which outweigh the advantage of
simpler instruction decode. CPU speeds increased far faster than memory, so
the costs of storing and fetching more instruction data grew larger and
larger.

As a result, ARM experimented with other more compact encoding schemes, e.g.
Thumb.

~~~
hga
Indeed. The rise and fall of RISC designs coincides with a period where CPU
and memory speeds were much better matched, and there was an _extreme_ premium
on the number of gates consumed by your design, to fit it all in a single
economically manufacturable (i.e. small) silicon die.

I say "fall" because ARM had an additional advantage: the market they were
targeting couldn't afford ceramic packages, only plastic, and they didn't
have the simulation capability to confirm that their 2nd design would hit its
power dissipation target (they were in fact using chips from their 1st
design). So they were _very_ conservative and hit their target with a very
large margin to spare, maybe a factor of 10?

So with that extra advantage, and a good business model (better than any
competitor at the critical times? Certainly MIPS), they got more and more
design wins where power was a consideration. In 2001 at Lucent I worked on a
power-constrained monster board that provisioned something like 300 modem
lines. It had specialized ADI chips to do the heavy lifting, controlled by a
bunch of ARM chips, plus one housekeeping MIPS chip that was standard for
these boards. And ARM pretty much owns the part of the mobile market that
isn't tied to x86 by an existing software base and doesn't need more
horsepower than ARM currently offers.

~~~
danellis
I remember Steve Furber told us in CS that the first time he tested the
original ARM they got back from the fab, they forgot to connect the power
lines. It worked anyway, just from the power on the I/O lines.

~~~
hga
Hah! No need to meter the power used to know they'd achieved their goal by a
large margin.

I found the interview with him that my comments were based on; here are the
two critical paragraphs
([http://queue.acm.org/detail.cfm?id=1716385](http://queue.acm.org/detail.cfm?id=1716385)):

 _SF The ARM was conceived as a processor for a tethered desktop computer,
where ultimate low power was not a requirement. We wanted to keep it low cost,
however, and at that time, keeping the chip low cost meant ensuring it would
go in low-cost packaging, which meant plastic. In order to use plastic
packaging, we had to keep the power dissipation below a watt—that was a hard
limit. Anything above a watt would make the plastic packaging unsuitable, and
the package would cost more than the chip itself.

We didn't have particularly good or dependable power-analysis tools in those
days; they were all a bit approximate. We applied Victorian engineering
margins, and in designing to ensure it came out under a watt, we missed, and
it came out under a tenth of a watt—really low power._

------
hrydgard
These are the classic 32-bit ARM immediates.

Modern 32-bit ARM also has movw and movt, which load the bottom and top 16
bits of a register. Combining the two, you can load any 32-bit value in two
instructions. Classic 8-bit+rotation immediates are still used for arithmetic
immediates though.
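
To make the two schemes concrete, here is a minimal Python sketch (an illustration, not code from the article). It assumes the classic data-processing form, an 8-bit value rotated right by an even amount held in a 4-bit field. The names `encode_arm_imm` and `movw_movt_halves` are made up for this example.

```python
def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def encode_arm_imm(value):
    """Return (imm8, rot4) if value fits the classic form, else None.

    Assumption: the hardware computes imm8 ROR (2 * rot4), so we undo
    that by rotating left and checking whether 8 bits are enough.
    """
    value &= 0xFFFFFFFF
    for rot4 in range(16):
        imm8 = rotl32(value, 2 * rot4)
        if imm8 < 256:
            return imm8, rot4
    return None


def movw_movt_halves(value):
    """Split a 32-bit value into the (movw, movt) 16-bit halves."""
    value &= 0xFFFFFFFF
    return value & 0xFFFF, value >> 16


print(encode_arm_imm(0xFF000000))            # (255, 4)
print(encode_arm_imm(0x104))                 # (65, 15)
print(encode_arm_imm(0x102))                 # None: would need an odd rotation
print([hex(h) for h in movw_movt_halves(0xDEADBEEF)])
```

The 0x102 case shows why the even-rotation rule matters: the set bits span only 8 positions, but no even rotation lines them up with the low byte.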

ARM64 is different. For arithmetic, you get a 12-bit value with an optional
left-shift by 12. For bitwise instructions you get an arbitrary run of 1's
at any position you want (including rotated), which turns out to be very
useful. For loading values, just like in modern ARM-32, you load 16 bits
at a time, here using movz (or movn) and movk to reach the higher parts of a
register. You have to string together four of those to load a fully general
64-bit value, but that's pretty rare in practice.
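
A rough Python sketch of the two ARM64 cases, using the AArch64 mnemonics movz and movk for the 16-bit loads. The function names are invented for illustration, and the move sequence is deliberately naive: a real assembler or compiler would also try movn and bitmask immediates before settling on this.

```python
def encode_add_imm(value):
    """Return (imm12, shift) for an ADD/SUB immediate, else None."""
    if 0 <= value < (1 << 12):
        return value, 0
    if value & 0xFFF == 0 and 0 < value < (1 << 24):
        return value >> 12, 12
    return None


def wide_move_sequence(value):
    """Naive movz/movk sequence as (mnemonic, imm16, shift) tuples.

    Zero 16-bit chunks are skipped, since movz already clears the
    rest of the register.
    """
    value &= (1 << 64) - 1
    chunks = [((value >> s) & 0xFFFF, s) for s in (0, 16, 32, 48)]
    nonzero = [(c, s) for c, s in chunks if c != 0]
    if not nonzero:
        return [("movz", 0, 0)]
    first, rest = nonzero[0], nonzero[1:]
    return [("movz", *first)] + [("movk", c, s) for c, s in rest]


print(encode_add_imm(0x1000))                        # (1, 12)
print(encode_add_imm(0x1001))                        # None
print(len(wide_move_sequence(0x123456789ABCDEF0)))   # 4
print(wide_move_sequence(0x10000))                   # [('movz', 1, 16)]
```

Only values with all four 16-bit chunks non-zero need the full four instructions, which matches the point that the worst case is rare.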

~~~
ant6n
In aarch64, the bitwise operation immediates are a bit more complicated. From
the documentation:

 _Is the bitmask immediate. Such an immediate is a 32-bit or 64-bit pattern
viewed as a vector of identical elements of size e = 2, 4, 8, 16, 32, or 64
bits. Each element contains the same sub-pattern: a single run of 1 to e-1
non-zero bits, rotated by 0 to e-1 bits. This mechanism can generate 5,334
unique 64-bit patterns (as 2,667 pairs of pattern and their bitwise inverse).
Because the all-zeros and all-ones values cannot be described in this way, the
assembler generates an error message._

[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/AND_log_imm.html)
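
The rule in that quote is easy to check by brute force. The sketch below (plain Python, names invented for illustration) enumerates every element size, run length, and rotation, replicates the element across 64 bits, and counts the distinct results. It models only the set of values, not the actual N/immr/imms field encoding.

```python
MASK64 = (1 << 64) - 1


def bitmask_immediates():
    """Enumerate every 64-bit pattern the quoted rule describes."""
    patterns = set()
    for e in (2, 4, 8, 16, 32, 64):
        elem_mask = (1 << e) - 1
        for ones in range(1, e):           # run of 1 to e-1 set bits
            run = (1 << ones) - 1
            for rot in range(e):           # rotated right by 0 to e-1 bits
                elem = ((run >> rot) | (run << (e - rot))) & elem_mask
                value = 0
                for i in range(64 // e):   # replicate across 64 bits
                    value |= elem << (i * e)
                patterns.add(value)
    return patterns


PATTERNS = bitmask_immediates()


def is_bitmask_immediate(value):
    """True if value is a valid 64-bit logical immediate under this rule."""
    return (value & MASK64) in PATTERNS


print(len(PATTERNS))                               # 5334
print(is_bitmask_immediate(0x00FF00FF00FF00FF))    # True
print(is_bitmask_immediate(0))                     # False
print(is_bitmask_immediate(MASK64))                # False
```

The count agrees with the 5,334 in the documentation: each element size e contributes e * (e - 1) patterns, and 2 + 12 + 56 + 240 + 992 + 4032 = 5334.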

I feel like I have a vague memory of a single byte being repeated with some
specific pattern for some instruction, but I can't find it anymore.

------
kabdib
If you want to see something truly wacky and inspired, look at how the
Transputer loads constants. There's an instruction that does this, IIRC, four
bits at a time, with rules for what happens to previously loaded bits in terms
of shifting, sign extending and so forth. Basically you build up a constant 4
bits at a time. Most of the time you want things like 1, 2, 4, -1, 0 and so
forth, so it made sense.

Of course, being a stack machine, this is easier (no register to address, just
a top-of-stack value).
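
For the curious, here is a small Python model of the scheme as best I can reconstruct it, so treat the details as assumptions rather than a reference. It assumes every instruction is a 4-bit function plus a 4-bit operand nibble, that the nibble is ORed into an operand register, that pfix shifts that register left by 4, that nfix complements it and then shifts, and that ldc pushes the result and clears the register. The word size of 32 bits and the helper names `encode` and `run` are mine.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit word


def encode(op, value):
    """Return the (function, nibble) sequence that applies op to value."""
    if 0 <= value < 16:
        return [(op, value)]
    if value >= 16:
        return encode("pfix", value >> 4) + [(op, value & 0xF)]
    # Negative values: prefix with the complement so nfix flips it back.
    return encode("nfix", (~value) >> 4) + [(op, value & 0xF)]


def run(program):
    """Simulate the operand register; return what ldc leaves on top."""
    oreg = areg = 0
    for op, nibble in program:
        oreg |= nibble
        if op == "pfix":
            oreg = (oreg << 4) & MASK
        elif op == "nfix":
            oreg = (~oreg << 4) & MASK
        elif op == "ldc":
            areg, oreg = oreg, 0
    return areg


print(encode("ldc", 1))               # [('ldc', 1)]
print(encode("ldc", -1))              # [('nfix', 0), ('ldc', 15)]
print(len(encode("ldc", 0x12345)))    # 5
print(hex(run(encode("ldc", -257))))  # 0xfffffeff
```

Under these assumptions the common small constants cost one byte and -1 costs two, which is the trade-off described above.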

I don't know why the Transputer didn't take off. Probably Occam (which was
interesting, but weird). I know of at least one TV set top box that used the
T800, but don't know of any other design wins the chip had.

~~~
PhantomGremlin
_I don't know why the Transputer didn't take off._

I think for two reasons.

1) "too weird". This eventually dooms all of the goofy architectures like
Transputer, IA-64, Transmeta, STI Cell, etc. Nobody wants to deal with
something radically different unless there are compelling long-term reasons to
do so. Granted, that's a little bit of a chicken-and-egg argument. Without
adoption there won't be any long-term existence of an architecture.

2) Support tools and infrastructure. Intel and ARM have an incredible variety
of support software. Any new architecture is at a disadvantage unless and
until all that supporting stuff is written and works well. This means
compilers, IDEs, development hardware, etc. Normally the tools for a new
architecture are laughably primitive compared to existing environments. Which
puts early adopters at a serious disadvantage.

We're in an x86-64 / ARM duopoly now. Nothing else is really relevant. Do you
see anything that could change that in the near future? I sure don't.

~~~
ant6n
We could get a monopoly.

Maybe OpenRISC/RISC-V will rise. Who'da thunk in 1990 that Linux would rise?

------
pmalynin
I'd love to see an ISA that uses something like a modified Huffman coding.
That way jump instructions could have 30-bit immediates.

~~~
phs2501
SPARC. The CALL instruction takes a 30-bit immediate.
[http://www.cs.rochester.edu/~scott/456/local_only/sparcv7.pd...](http://www.cs.rochester.edu/~scott/456/local_only/sparcv7.pdf)
Also the Mill talks seem to indicate that their (variable-length) instruction
encoding is machine-generated to be as efficient as possible.
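
As a quick illustration of the SPARC case, here is a Python sketch of decoding CALL. It assumes the V7/V8 format: the top two bits are 01, the remaining 30 bits are a signed word displacement, and the target is PC + 4 * disp30. The function name is made up.

```python
def decode_sparc_call(word, pc):
    """Return the CALL target address, or None if word is not a CALL."""
    if ((word >> 30) & 0x3) != 0b01:
        return None
    disp30 = word & 0x3FFFFFFF
    if disp30 & (1 << 29):        # sign-extend the 30-bit field
        disp30 -= 1 << 30
    return (pc + 4 * disp30) & 0xFFFFFFFF


print(hex(decode_sparc_call(0x40000002, 0x1000)))   # 0x1008
print(hex(decode_sparc_call(0x7FFFFFFF, 0x1000)))   # 0xffc
print(decode_sparc_call(0x01000000, 0x1000))        # None
```

Since instructions are word-aligned, 30 bits of word displacement reach the whole 32-bit address space, so it is really a very short opcode (2 bits) buying a very long immediate.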

