
Z80 arithmetic: also surprisingly terrible (2018) - rutenspitz
http://cowlark.com/2018-03-18-z80-arithmetic/index.html
======
flohofwoe
I don't think the 8-bit CPU instruction sets were designed with code-
generation for high-level languages in mind, but to make it easier to write
assembly by hand.

And each 8-bit CPU had its specialties, and it wasn't uncommon that computers
were designed around those specialties.

For instance when you look at the video memory layout of Z80 computers like
the ZX Spectrum or CPC, those often have a weird non-linear arrangement, which
only makes sense with the special 16-bit register pairs on the Z80.

E.g. when the 16-bit register pair HL is used as a video memory address, the
memory layout was often such that H and L could basically be used as X and Y
coordinates. E.g. incrementing L gets you to the next character on the same
line, and incrementing H to the first pixel line of the character line below.

The KC85/4 (East German home computer) even had a 90 degree rotated memory
layout of 320x256 pixels, 8 horizontal pixels grouped in a byte, so the video
memory was a matrix of 40 columns by 256 lines.

Put the column (0..39) into H, and the line (0..255) into L, and you have the
complete video memory address (or rather offset, H must be 0x80 + column, because
video memory started at 0x8000). Increment or decrement H to move to the next
or previous column, and L to get to the next or previous line.

edit: messed up the number of columns, it's 40, not 80 :)
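
A minimal sketch of that addressing scheme, in Python so it can be run anywhere. Only the "H = 0x80 + column, L = line" split comes from the description above; the function name `kc85_4_address` and the bounds checks are illustrative assumptions.

```python
VIDEO_BASE_HIGH = 0x80  # video memory starts at 0x8000, so H starts at 0x80


def kc85_4_address(column, line):
    """Return the 16-bit address HL for a byte column (0..39) and pixel line (0..255)."""
    if not 0 <= column < 40:
        raise ValueError("column must be in 0..39")
    if not 0 <= line < 256:
        raise ValueError("line must be in 0..255")
    h = VIDEO_BASE_HIGH + column  # high byte selects the column
    l = line                      # low byte selects the pixel line
    return (h << 8) | l


# Moving one line down is "INC L"; moving one column right is "INC H".
print(hex(kc85_4_address(0, 0)))
print(hex(kc85_4_address(1, 0)))
print(hex(kc85_4_address(0, 1)))
```

The point of the layout is that neither step needs 16-bit arithmetic: each coordinate lives in its own 8-bit half of HL.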

~~~
mtkd
There was little enough memory to get an asm program doing anything material
using HiSoft DevPac assembler let alone using a high-level lang.

The instruction set was very limited (in comparison to i386 etc.) so the
learning curve was not steep and writing in assembler was, in practice, no
more time consuming than using C now - once you were in the flow.

However, due to the resource constraints you not only had to DRY everything
out into functions as you would with C today, but often had to manually mutate
parts of a function's machine code by poking in new values before calling it,
changing its behaviour without wasting memory on branches etc.

For large writes people would often move the stack pointer around as PUSH BC
etc. were faster than LD at writing an address and inc(dec?) the pointer. I
seem to remember IX/IY being avoided as much as poss as they were quite
costly.
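
A rough model of the stack-pointer trick, as a Python sketch rather than real Z80 code. PUSH decrements SP (by 2, writing the high byte first), so the fill runs downward through memory. The T-state figures are the standard documented ones (PUSH BC = 11, LD (HL),r = 7, INC HL = 6); the function names are hypothetical and loop overhead is ignored on both sides.

```python
PUSH_T = 11     # T-states for PUSH BC (writes two bytes)
LD_HL_R_T = 7   # T-states for LD (HL),r (writes one byte)
INC_HL_T = 6    # T-states for INC HL


def push_fill(mem, sp, value16, count):
    """Simulate `count` PUSH BC instructions; return the final SP and T-states used."""
    hi, lo = value16 >> 8, value16 & 0xFF
    t_states = 0
    for _ in range(count):
        sp = (sp - 1) & 0xFFFF
        mem[sp] = hi          # high byte goes to SP-1
        sp = (sp - 1) & 0xFFFF
        mem[sp] = lo          # low byte goes to SP-2
        t_states += PUSH_T
    return sp, t_states


def ld_fill_tstates(count):
    """T-states to write `count` 16-bit values with LD (HL),r / INC HL pairs."""
    return count * 2 * (LD_HL_R_T + INC_HL_T)


mem = bytearray(0x10000)
sp, t_push = push_fill(mem, 0x4010, 0xAA55, 8)
print(hex(sp), t_push, ld_fill_tstates(8))
```

In a real program interrupts have to be disabled (or tolerated) while SP points into the data being written, since an interrupt would push its return address there.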

------
garganzol
When you write Z80 assembly you mostly use registers. Nobody writes it like
shown in the article! I spent a lot of my time writing Z80 assembly manually.
The code in the article is somewhat artificial, inefficient and unnatural to my
eye.

EDIT: The code in the article's addendum is OK. The author finally caught up to
Z80 style after many trials and errors. This is what idiomatic Z80 assembly
looks like.

~~~
Sharlin
Well, it _is_ artificial in the sense that it's meant to be code emitted by
the code generator of a compiler for a high-level language with very different
idioms. And the architectures of these early microcomputers obviously don't
exactly prioritize being easy compiler targets.

~~~
userbinator
Here's a sample of compiled Z80 code (the bootloader for one of these:
[https://en.wikipedia.org/wiki/S1_MP3_player](https://en.wikipedia.org/wiki/S1_MP3_player)
) generated by a C compiler --- of course, a commercial one and not a cheap
one at that (IAR Embedded Workbench for Z80.) You can see that it has
absolutely no problem using nearly all the registers:

[https://pastebin.com/raw/f3919adf](https://pastebin.com/raw/f3919adf)

------
Arnt
I must be an assembly programmer, because my immediate reaction to this was:
WHY ARE THEY WRITING TO MEMORY ALL THE TIME! The code just looks
uncomfortable. Like a word-by-word translation from a foreign language. Like
forth written by someone who tried to keep the stack empty and all the data in
variables.

~~~
jacquesm
Because the Z80 doesn't really have a whole lot of registers that you could
use for general storage. Even the 6502 (and the 6809) would try to keep the
cycle count down by storing stuff in the zero page (or direct page on the
'09).

Z80 assembly tends to be very easy to read because you don't have to keep a
mental map of what lives in which register: A really is the accumulator, and
it is the only register on which you can do anything more complex than inc or
dec.

Contrast

[https://www.masswerk.at/6502/6502_instruction_set.html](https://www.masswerk.at/6502/6502_instruction_set.html)

with

[http://clrhome.org/table/](http://clrhome.org/table/)

And see how much more effective the 6502 set is when it comes to empowering
the various registers. The 6502 does not need 'shift' codes (slow) either.

I've programmed both, and even though they both have their charm I would
prefer to code the 6502 for the same problem (and I'd much prefer to use the
6809 over either).

~~~
Arnt
I wrote a lot of Z80, very little 6502. They felt very different to me. Code
written by one couldn't be ported to the other, but had to be rewritten.

Didn't have much trouble storing things in B/C/D/E/BC/DE, at least I don't
remember it as a problem. The innermost loops had to get top priority when
deciding what each register was used for, that's all.

~~~
jonsen
_Code written by one couldn't be ported to the other, but had to be
rewritten._

That’s not my experience. I once ported a program from the 6800 (the mother of
the 6502) to the Z80. It was very straightforward. But to my surprise the
version on the much “fancier” Z80 turned out to be both larger and slower,
even though the Z80 ran at a slightly faster clock speed.

~~~
floofy222
I don't think the OP literally meant that porting code was impossible, but
rather, that it was very hard to do so and have the result run efficiently.

~~~
Arnt
(OP speaking for himself:) When I ported Z80 code to the 6502 literally, the
result didn't use the zero page to its full effect, because the zero page was
so much bigger than the Z80's extra registers. When I ported 6502 to Z80
literally (I mean code that used the zero page well), too much of the zero-
page work had to be replaced with memory work and not enough with the nice
fast registers.

------
mrob
>(Incidentally, the 6502 can do this in 10 bytes and 14 cycles. The Z80 is
terrifyingly slow.)

It's misleading to call the Z80 "terrifyingly slow" based on cycle counts
without mentioning that it clocked higher than the 6502. E.g. the Apple II ran
at just over 1MHz, while the ZX Spectrum ran at 3.5MHz (although with wait
states for accessing the memory area shared by the graphics hardware).

~~~
tonyedgecombe
That's a little unfair, the BBC Micro ran at 2 MHz and was released before the
Spectrum.

~~~
masklinn
It's not wrong though, the 6502 _was_ clocked much slower than equivalent
contemporaries.

However IIRC most contemporary architectures were closer to the Z80, so the
6502 was considered impressive because it was competitive despite a much lower
clock rate.

~~~
jonsen
1 MHz clock on the 6800 family of processors, including the 6502, is
equivalent to 2 MHz clock on the Z80. The 6800 architecture used a so called
two-phase clock. Each phase, positive and negative, of the clock cycle was
used to do work.

~~~
timbit42
I would suggest it is between 2 and 3 times, but it depends on what kind of
code you are running.

------
jgrahamc
A good reference for Z80 programming is Rodnay Zaks' Programming the Z80. On
page 94 at the start of the chapter Basic Programming Techniques there's a
section on 8-bit addition. The program is as follows:

    
    
        LD   A,(ADR1)        ; LOAD OP1 INTO A
        LD   HL,ADR2         ; LOAD ADDRESS OF OP2 INTO HL
        ADD  A,(HL)          ; ADD OP2 TO OP1
        LD   (ADR3),A        ; SAVE RESULT RES AT ADR3
    

this is the idiomatic Z80 way. That book goes on to show how to do 16-bit and
32-bit arithmetic the Z80 way.

[http://www.z80.info/zip/zaks_book.pdf](http://www.z80.info/zip/zaks_book.pdf)

~~~
stevekemp
Rodnay Zaks is a name I remember very well from my Spectrum-days. I have a
small collection of Z80-books which I've carried around, across countries, for
the past 30 years.

I should go dig out an emulator soon! (I usually have a game or two of "Chaos:
The Battle of Wizards" every six months or so.)

------
twtw
I would have appreciated some historical context for the z80 in this post.
8,500 transistors (equiv to ~2800 nand gates in depletion-load nmos logic * )
and not really designed to be a compiler target. I'd say the z80 is pretty
impressive - but certainly with quirks.

I would have enjoyed that kind of framing much more than "lol the z80 sux!"

* Not saying the z80 was implemented with all nand, just providing the figure as a reference.

~~~
FullyFunctional
While that's 100% true and the Z-80's design has a legacy explanation, freed
of the legacy requirement there's no technical reason why Zilog couldn't have
created a much better compiler target.

------
Doctor_Fegg
Like any 8-bit CPU, the Z80 had plenty of coding optimisations you soon
learned. First one that came to mind reading this: instead of ld bc,4; ldir,
it’s faster just to do ldi; ldi; ldi; ldi.

There’s also a set of “alternate registers” you could swap in with exx, which
sometimes enabled faster arithmetic without hitting memory.

~~~
jgrahamc
Yep. The trade off is that takes 3 more bytes but it's a great example of the
thinking that went into writing 'tight' code for these processors. You're
doing a loop unroll for speed and taking up more space, depending on what evil
trick you're up to (e.g. hiding code inside a 128 byte unused spot in the
BDOS) one might be better than the other.
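
The trade-off in this subthread can be tallied with a small Python sketch. The sizes and T-state counts are the standard documented Z80 figures (LD BC,nn: 3 bytes, 10 T-states; LDI: 2 bytes, 16 T-states; LDIR: 2 bytes, 21 T-states per repeating iteration and 16 for the final one); the function names are hypothetical.

```python
def ldir_cost(n):
    """(bytes, T-states) for `ld bc,n; ldir` copying n bytes, n >= 1."""
    size = 3 + 2                       # LD BC,nn + LDIR
    t_states = 10 + 21 * (n - 1) + 16  # setup + repeating iterations + last one
    return size, t_states


def unrolled_ldi_cost(n):
    """(bytes, T-states) for n consecutive LDI instructions."""
    return 2 * n, 16 * n


print(ldir_cost(4))          # (5, 89)
print(unrolled_ldi_cost(4))  # (8, 64)
```

For a 4-byte copy that is 3 extra bytes in exchange for 25 fewer T-states, which matches the "3 more bytes" mentioned above.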

------
maire
This article brought back so many memories!

My first job out of college was writing z80 assembly language at Cromemco.
Later we ported everything to the 68000. We didn't write anything in a higher
level language because it was too slow. In fact the entire CDOS and Cromix OSs
were written by a single person. He originally wrote everything in C, but
when he ran it, it was so slow. He then rewrote everything directly in Z80
assembly language and kept the C code as comments. The raw C code was the only
commentary in the code.

I wrote the graphics drivers for screen and printer and a wysiwyg word
processor. There were no floating point processors. All math was in the
registers (as stated in the article). You can still render a lot of graphics by
converting your renderer to additions and multiplication by 2 (register shift
left and right). I was happy to find years later that code that I derived to
render circles and arcs using only 1 bit step and multiplication by 2 was also
derived by someone else and published in graphics books. You live within
limitations when that is your only option.
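
The circle technique described sounds like it belongs to the family of the midpoint (Bresenham-style) circle algorithm; that identification is an assumption, and the Python sketch below is not the original Cromemco code. It generates one octant using only additions, subtractions, unit steps and shifts by one bit.

```python
def circle_octant(radius):
    """Return the points of one octant (from (r, 0) up to the 45-degree line)
    of a circle centred on the origin, using only add/subtract and << 1."""
    x, y = radius, 0
    err = 1 - radius            # decision variable
    points = []
    while x >= y:
        points.append((x, y))
        y += 1                  # always step one pixel in y
        if err < 0:
            err += (y << 1) + 1
        else:
            x -= 1              # sometimes step one pixel in x
            err += (y << 1) - (x << 1) + 1
    return points


print(circle_octant(5))  # [(5, 0), (5, 1), (5, 2), (4, 3)]
```

The other seven octants follow by mirroring and swapping x and y, so the inner loop never needs a multiply, divide or square root.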

------
Steve44
I learned Z80 through the Sinclair stable but did play around with the 6502
and found I much preferred the Z80 way of working.

One notable thing here is they are writing a code generator and not actually
programming in assembly. As a result their modules need to be more general
purpose than if directly programming.

~~~
_Codemonkeyism
Same experience here. Came from a C64 (6502) to a CPC (Z80) in the 80s and the
Z80 felt much more powerful and expressive with its instruction set and
registers - also the instruction set of the Z80 felt more logical and planned.

~~~
beagle3
Indeed it felt that way, but in practice the 6502’s indirect addressing gives
you 128 16-bit registers compared to the z80’s 5+3
(hl,de,bc,ix,iy,hl’,de’,bc’) which is a winner for code generation and macro
programming.

~~~
_Codemonkeyism
No longer an expert in Z80 - and perhaps I never was as much of my assembler
career was Amiga 68k - but didn't the Z80 also have indirect addressing?

~~~
beagle3
No[0]. To read a byte through a pointer at IX, you have to:

    
    
        LD L,(IX+0)
        LD H,(IX+1)
        LD A,(HL)
    

On the 6502, you can do that in one instruction[1] if your X or Y registers
are zero (and more often than not, you can use the indexed-indirect or
indirect-indexed to save even more instructions):

    
    
         LDA ($40,X) ; if X == 0, and $40 is your pointer.
    

[0] [https://8bitnotes.com/2017/05/z80-addressing-modes/](https://8bitnotes.com/2017/05/z80-addressing-modes/)

[1]
[http://www.obelisk.me.uk/6502/addressing.html](http://www.obelisk.me.uk/6502/addressing.html)
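
What the two sequences above compute can be modelled over a flat 64 KiB memory. This is a Python sketch for illustration, with hypothetical function names; it models only the memory reads, not flags or timing.

```python
def z80_read_via_ix(mem, ix):
    """Model LD L,(IX+0) / LD H,(IX+1) / LD A,(HL)."""
    l = mem[ix & 0xFFFF]          # LD L,(IX+0): low byte of the pointer
    h = mem[(ix + 1) & 0xFFFF]    # LD H,(IX+1): high byte of the pointer
    return mem[(h << 8) | l]      # LD A,(HL)


def m6502_lda_indexed_indirect(mem, zp, x):
    """Model LDA (zp,X): the pointer is fetched from the zero page at zp+X."""
    ptr = (zp + x) & 0xFF         # zero-page address wraps at 8 bits
    lo = mem[ptr]
    hi = mem[(ptr + 1) & 0xFF]
    return mem[(hi << 8) | lo]


mem = bytearray(0x10000)
mem[0x40], mem[0x41] = 0x34, 0x12   # little-endian pointer to 0x1234
mem[0x1234] = 0x99
print(z80_read_via_ix(mem, 0x0040), m6502_lda_indexed_indirect(mem, 0x40, 0))
```

Both return the same byte; the difference is that the 6502 does the double dereference in one instruction, with any zero-page pair acting as the pointer.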

~~~
Steve44
From memory the IX and IY instructions took a lot of clock cycles and I
avoided them unless there was a really good reason to use them.

I've just had a quick look at an instruction cycle table and it seems that
without indexing they took 4 cycles more than HL then with an index that
increased to 12 more.

Ref from search returning
[http://www.z80.info/z80time.txt](http://www.z80.info/z80time.txt)

------
jussij
FWIW I learnt to program Z80 assembler using the MicroBee personal computer:

[https://en.wikipedia.org/wiki/MicroBee](https://en.wikipedia.org/wiki/MicroBee)

That was during school years 10, 11 and 12, prior to going on to study
engineering.

While studying engineering, I then got to work with 6502 assembler and while I
have no doubt that earlier Z80 experience helped greatly, I still remember
thinking, writing assembler for the Z80 seemed to be so much easier than
coding for the 6502.

------
toolslive
I also seem to remember (I programmed this one in the late 80s, back when my
memory was good) that overflowing the 16 bit register did not set the proper
flags.

Still, compared to the 6502, this one had lots of comfort.

------
howard941
> And only A can be directly written to or from memory

I think the author might have meant to write that only A can be indirectly
written to or from memory, but even that isn't correct, as demonstrated by the
code bits that retrieve and store from and to RAM @HL (and IY/IX+blah). The
8080 was able to directly read and write HL. The Z80 could do it for IX, IY,
and IIRC BC and DE too.

------
azhenley
I once implemented a [partial] Z80 emulator. It is a fantastic learning
project, especially if you haven't had to touch a lot of assembly in the past.
Even back when I did this, there were a plethora of resources on how to
emulate the Z80.

------
Accacin
This is a nice read for someone who has just started learning GB ASM. I
know the processor in the Game Boy isn't exactly a Z80 (or an 8080) but it's
quite similar.

~~~
vanderZwan
I would recommend _Z80 Assembler In 28 Days_ instead

[http://tutorials.eeems.ca/](http://tutorials.eeems.ca/)

------
unwiredben
Font fail: I kept reading 3op as 30p and was wondering what some of this
meant. 3-op would be clearer.

------
raverbashing
I'd say performance was of secondary importance; ease of programming in
assembly and silicon real estate were more important factors.

------
timonoko
Re-inventing the wheel, son? Z80 had CP/M Turbo Pascal with floating point
arithmetics. According to Byte-magazine, it left burn marks on the table,
because it was "lightning fast"

~~~
peteri
The Z80 version only supported recursion if it was specifically enabled, so I
suspect it tended to use fixed locations for variables. Z80 support for a
traditional stack frame for recursion isn't very good (although IX,IY with
offsets could work).

~~~
jonsen
I’d guess that it used the old-time trick of saving the return address by
modifying the code of the called subroutine. A recursive call would then
overwrite the primary caller's return address.

