
How the Z80's registers are implemented - zdw
http://www.righto.com/2014/10/how-z80s-registers-are-implemented-down.html
======
VLM
If you think swapping via register renaming is creative, you'd really like
barrel shifters.

[http://en.wikipedia.org/wiki/Barrel_shifter](http://en.wikipedia.org/wiki/Barrel_shifter)

The easiest way to shift bits isn't "adder-like" with carry propagation across
data lines (sorta), although that could be done. The easiest way is just to
phase-distort the bus (sorta), so what you called bit X is now bit X-4 for all
bits on the bus, with a 2:1 mux selecting which; then cascade rollers in
powers of two (you don't need a roller for every integer distance if you can
cascade them: 16, 8, 4, 2, 1 positions, only activating some rolls). If you
can rotate, you can shift by adding one more stage at the end that does
bit-oriented ops to clear the MSBs or LSBs. Also, if your roller is capable of
rolling across all bits on the bus, you don't need two rollers, one in each
direction. Latency does add up, of course, from all those cascaded ops.

Also I saw in the notes the Rodnay Zaks z80 book being referenced. I learned
assembly from that book, back when it was new. I enjoyed that book. I also
looked at the PDF scan of someone's beat up copy... remember when textbooks
were only $10.95 each?

~~~
kator
I still have my original copy of Zaks' book in my collection. I obtained it in
1981 from a friend. I did so much z80 assembly on the TRS-80 that I could
"see" instructions in a hex dump before asking zbug to disassemble them.

The z80 was quite the chip for its time and I enjoy reading these articles as
they look back into the deeper inner workings of the chip.

One could master the z80, I'm not sure many modern processors could be
mastered at that level by a single programmer. They seem to have gotten so
complicated that they're beyond comprehension in a single person's mind. I
could be wrong but that's the general feeling I get. Sure you can be really
good at them and maybe an expert in many applications of a modern complex
processor but I doubt many people have a singular understanding of everything
each chip has to offer.

Or maybe I'm just getting old.. :-)

~~~
userbinator
The 8086, which shares the Z80's 8080 ancestry, is not that much more
complicated (even the opcode structure is similar, being octal-based), and the
same goes for the 80186. Things started becoming a bit more complex with the
'286 and its protected mode, and then the '386+ with 32-bit and its various
protected-mode enhancements really made it harder for a single person to
master fully, but I think it was still mostly possible; the really hairy stuff
started with 64-bit mode and all the extensions (e.g. virtualisation) that
have been made to it since.

On the non-x86 side, modern ARM SoCs like the ones used in smartphones are not
all that much simpler despite a less complex CPU core - they're still
thousands of pages of documentation in total.

------
sudowhodoido
A little comment regarding the Z80 vs 6502 register count: I always consider
the 6502 to have 256 registers (zero page) and an accumulator and two index
registers. Never a shortage if you use it like that...

~~~
pdq
I would think of the zero page as a rudimentary L1 cache, because access to
the zero page still requires a memory cycle, whereas registers do not.

~~~
TheLoneWolfling
On a related note: I wish that more CPUs had an explicit cache. So data has to
be explicitly loaded into cache, etc.

Modern CPUs are effectively NUMA. Don't treat memory as uniform random access
any more, because it isn't.

Biggest problem with this is that not all CPUs have the same amount of cache.
But you can get around this by treating the cache as the low area of RAM, with
instructions to get the amount of cache available. Especially if cache is also
paged.

Other issue with this is context switches, but this is conceptually no
different than paging RAM to disk when required.

~~~
pdq
I'm not too familiar with many other architectures, but MIPS has dcache
"fill", "flush", and "lock" operations, so the user can do an early fill (a
prefetch), or even lock data in the cache so it won't be evicted.

I haven't seen many people actually use these ops, because it's actually
pretty hard to do better than the built-in cache allocation policies for most
applications, especially if you take into account that your app is going to
get swapped out consistently by the operating system task switches.

~~~
TheLoneWolfling
There is a distinction between _allowing_ said operations and being _designed_
for them. It is possible even on x86, although difficult, and it requires
privileged operations. (A user-mode program can request that something be
prefetched or flushed, and can do non-temporal loads (and stores?), but to get
"true" scratchpad memory you have to play with the MTRRs, and even then the
processor doesn't support hardware paging of cache the way it does for, say,
RAM.)

> I haven't seen many people actually use these ops, because it's actually
> pretty hard to do better than the built-in cache allocation policies for
> most applications, especially if you take into account that your app is
> going to get swapped out consistently by the operating system task switches.

And again, this is largely because the cache is invisible to the OS. There's
no way to tell the processor "this is the stuff that was cached the last time
this process had control; when you can, reload it" - you can't tell which
parts of the cache are "owned" by which process in anything like an efficient
manner, and even if you could, the moment you start executing a context switch
you've overwritten random bits of cache.

It's like if the processor was set up to directly talk to the hard drive to do
paging on demand, to the point that the OS wasn't even aware of it. In theory
it's a good idea, but the more you look at it the more flaws emerge.

~~~
kps
For anyone interested in experimenting along these lines, the Intel Quark
(486-based SoC) has on-die plain SRAM instead of L2 cache.

------
rwmj
Interesting that the flags are stored like the other registers. I had always
assumed that the flag bits would be "spread out" across parts of the ALU, and
only combined with the accumulator into a virtual register when you read the
AF register. In fact they are stored in the register file and copied to and
from the ALU on every(?) operation.

------
Animats
It's cool seeing register renaming in an early microprocessor. Doubling up all
the registers takes a lot of silicon, and is unusual for that era.

~~~
userbinator
I don't know if the term "register renaming" was even around at the time since
the other instructions (e.g. MOVs) don't "rename" registers, but they
certainly had the right idea - "why move the data around when you can just
change the register selection bits?" It's also how I expected the exchange
instructions to work when I first saw the instruction set and the timings, so
it's probably a rather natural and obvious way of doing it.

I did it the same way when I designed a Z80-compatible CPU in a (graphical)
logic simulator, although I used a dual-ported register file.

~~~
agoetz
Register renaming has been around since the '60s, in the form of Tomasulo's
algorithm.

------
agumonkey
The renaming trick makes me feel like a CPU is just a fixed-size activation
frame/scope under reduce/fold.

