
ARM1 Gate-level Simulation - walterbell
https://www.theregister.co.uk/2015/11/28/arm1_visualized/
======
dasmoth
The actual simulation is at
[http://visual6502.org/sim/varm/armgl.html](http://visual6502.org/sim/varm/armgl.html)

~~~
pjc50
A (extremely terse, partial) description can be found at
[http://www.righto.com/2016/02/reverse-engineering-arm1-proce...](http://www.righto.com/2016/02/reverse-engineering-arm1-processors.html)

~~~
kens
I wrote several articles about the ARM1 when this simulator was released in
2015. A better article to start with is:
[http://www.righto.com/2015/12/reverse-engineering-arm1-ances...](http://www.righto.com/2015/12/reverse-engineering-arm1-ancestor-of.html)

Dave Mugridge also wrote some articles about the ARM1, focusing more on the
ALU and registers:
[https://daveshacks.blogspot.com/2015/12/inside-alu-of-armv1-...](https://daveshacks.blogspot.com/2015/12/inside-alu-of-armv1-first-arm.html)

------
krylon
Within my lifetime, we went from CPUs that were simple enough we can now
simulate them at the logic-gate level in ____ing Javascript to CPUs complex
and powerful enough to emulate yesterday's mainstream CPUs at the ____ing
logic-gate level in ____ing Javascript. I would not even be _that_ surprised
if the result outperformed the original hardware.

------
Aardwolf
Nice! One thing though:

For
[http://visual6502.org/sim/varm/armgl.html](http://visual6502.org/sim/varm/armgl.html),
it would be much nicer if dragging panned the view rather than "3D rotating"
it. Panning with wasd is too slow and not compatible with some keyboard
layouts.

And of course, zooming around the mouse cursor rather than around the center
of the screen would also make it easier to zoom towards the part you want.

The 3D rotation is gimmicky but not actually useful to see the gates, and the
current UI just doesn't let me zoom to gates I want without spending too much
effort fighting the slow panning and the zooming target.

Thanks!

------
bogomipz
I have a question. The article states the following:

>"One very nice thing about the 32-bit instruction set is its pervasive
conditional execution, which helps one avoid branching over code. For example,
this sequence of instructions resets the register r0 to 0 if its value is
equal to or less than zero, or forces its value to 1 if its value is greater
than zero:

    CMP   r0, #0   ; if (r0 <= 0)
    MOVLE r0, #0   ;   r0 = 0;
    MOVGT r0, #1   ; else r0 = 1

Without the conditional moves (MOVLE and MOVGT) after the compare (CMP), you'd
have to branch after the compare, which is wasteful."

How are those two conditional moves after the CMP operation more
efficient than branching? Aren't they kind of branches themselves? What would
the alternative "branching" sequence look like then?

~~~
monocasa
It'd look something like

    
    
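        cmp  r0,#0      @ set flags from r0 - 0
        bgt  1f         @ r0 > 0: jump forward to local label 1
        mov  r0,#0      @ r0 <= 0: r0 = 0
        b    2f         @ skip over the "else" arm
      1:
        mov  r0,#1      @ r0 > 0: r0 = 1
      2: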
    

The big deal is the conditional branch (the bgt). If the processor gets it
wrong it's a pipeline flush. And best case you still have extra instructions
for the branches. The conditional mov example is a fixed cost of a single
"wasted" cycle, which matches the best case of the branching example (branch
correctly predicted to mov r0,#1 and fall through). The worst case for the
branching version is probably around 15 cycles, depending on the uArch, but
it is still 1 cycle for the conditional move.

All of that being said, the branching version tends to be nicer for OoO cores
since there aren't data dependencies on the flag registers any more, which is
why you see RISC ISAs designed for OoO cores removing conditional execution
for most instructions (AArch64 and RISC-V stand out here).
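
To make the "aren't they kind of branches themselves?" question concrete, here is a rough Python model (not ARM code) of the conditional-move sequence. The function names are invented for illustration. The point it tries to show is that every instruction is always issued in order and merely gated by the flags that CMP produced, so the program counter never changes direction. The LE and GT tests follow the standard ARM condition-code definitions; everything else is a simplification.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Model CMP a, b on 32-bit values: compute a - b, return (N, Z, C, V)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = bool(result >> 31)                     # negative: top bit of result
    z = result == 0                            # zero
    c = a >= b                                 # carry set means "no borrow"
    v = bool(((a ^ b) & (a ^ result)) >> 31)   # signed overflow
    return n, z, c, v

def cond_le(flags):
    n, z, _, v = flags
    return z or (n != v)          # LE: Z set, or N differs from V

def cond_gt(flags):
    n, z, _, v = flags
    return (not z) and (n == v)   # GT: Z clear and N equals V

def clamp_conditional(r0):
    """CMP r0,#0 / MOVLE r0,#0 / MOVGT r0,#1 as straight-line code."""
    flags = cmp_flags(r0, 0)      # CMP r0, #0
    # Both MOVs are always fetched; the flags decide whether each one
    # writes r0. A plain MOV does not change the flags, so MOVGT still
    # sees the result of the original compare.
    if cond_le(flags):
        r0 = 0                    # MOVLE r0, #0
    if cond_gt(flags):
        r0 = 1                    # MOVGT r0, #1
    return r0
```

Exactly one of the two conditions holds for any input, so the result matches the branching version above without any jump.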

~~~
fanf2
In the ARM2 era (probably the same for ARM1?) a basic ALU instruction such as
MOV took 1 cycle, and a branch took 4 (if taken) or 1 (if not). (There were
extra DRAM page cycles every 4 words, too.)

So for a simple if/else, it was usually both less code and faster to use a
straight line of conditional instructions. In more complicated cases, if the
programmer was feeling clever, it was possible to update the status flags to
get three-way (or more!) conditionals in straight-line branchless code. Fun!
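
As a back-of-the-envelope check of those figures, the two versions of the r0 example from earlier in the thread can be tallied in a few lines of Python. This is only a sketch: it uses the cycle counts quoted above, ignores the extra DRAM page cycles, and assumes that a conditional MOV whose condition fails still costs one cycle. The function names are hypothetical.

```python
# Cycle costs as quoted in the comment above (ARM2 era).
ALU_CYCLES = 1               # CMP, MOV; assumed also for a skipped conditional MOV
BRANCH_TAKEN_CYCLES = 4
BRANCH_NOT_TAKEN_CYCLES = 1

def branching_cycles(r0_is_positive):
    """cmp / bgt 1f / mov r0,#0 / b 2f / 1: mov r0,#1 / 2:"""
    cycles = ALU_CYCLES                     # cmp r0,#0
    if r0_is_positive:
        cycles += BRANCH_TAKEN_CYCLES       # bgt 1f is taken
        cycles += ALU_CYCLES                # mov r0,#1
    else:
        cycles += BRANCH_NOT_TAKEN_CYCLES   # bgt 1f falls through
        cycles += ALU_CYCLES                # mov r0,#0
        cycles += BRANCH_TAKEN_CYCLES       # b 2f is always taken
    return cycles

def conditional_cycles():
    """cmp / movle / movgt: three one-cycle instructions on every path."""
    return 3 * ALU_CYCLES

print(branching_cycles(True), branching_cycles(False), conditional_cycles())
```

Under these assumptions the branching version costs 6 or 7 cycles depending on the path, while the conditional version costs 3 either way, using three instructions instead of five.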

------
bogomipz
The article states:

>"The ARM2 had pretty much the same instruction set as the ARM1, although
featured new multiplication and (later) atomic swap instructions."

Does this mean that the ARM1 didn't support any atomic operations or were they
using something else besides "compare and swap"?

~~~
jecel
The ARM1 did not have any atomic operation. You only need those if you have
more than one processor. It also lacked the multiply and multiply-accumulate
instructions, as stated above. These took multiple cycles, which is not very
RISC-like. That is also true of the load multiple and store multiple
instructions of the ARM2 (I don't remember if the ARM1 had them). The ARM2
also added the coprocessor interface.

~~~
jecel
Oops - in the analysis of the PLA2 in the ARM1 there are both the load/store
multiple instructions and the coprocessor stuff. In fact, together they take
up about half of the logic. So I was remembering it wrong, then.
[http://daveshacks.blogspot.com/2016/01/inside-armv1-instruct...](http://daveshacks.blogspot.com/2016/01/inside-armv1-instruction-decoding-and.html)

------
all2
Does anyone else find it slightly entertaining that this is an article from a
news outlet titled "The Register"?

~~~
krylon
I never thought about it, but now that you mention it, it _is_ a great name
for an IT news site. ;-)

