Z80 arithmetic: also surprisingly terrible (2018) (cowlark.com)
89 points by rutenspitz 34 days ago | 67 comments

I don't think the 8-bit CPU instruction sets were designed with code-generation for high-level languages in mind, but to make it easier to write assembly by hand.

And each 8-bit CPU had its specialties, and it wasn't uncommon that computers were designed around those specialties.

For instance when you look at the video memory layout of Z80 computers like the ZX Spectrum or CPC, those often have a weird non-linear arrangement, which only makes sense with the special 16-bit register pairs on the Z80.

E.g. when the 16-bit register pair HL is used as a video memory address, the memory layout was often such that H and L could basically be used as X and Y coordinates. E.g. incrementing L gets you to the next character on the same line, and incrementing H to the first pixel line of the character line below.

The KC85/4 (East German home computer) even had a 90 degree rotated memory layout of 320x256 pixels, 8 horizontal pixels grouped in a byte, so the video memory was a matrix of 40 columns by 256 lines.

Put the column (0..39) into H, and the line (0..255) into L, and you have the complete video memory address (or rather offset: H must be 0x80 + column, because video memory started at 0x8000). Increment or decrement H to move to the next or previous column, and L to get to the next or previous line.

edit: messed up the number of columns, it's 40, not 80 :)
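For anyone who wants to play with the KC85/4 layout described above, here is a minimal Python sketch of the address calculation. The function and constant names are made up for illustration; only the 40x256 layout and the 0x8000 base come from the comment.

```python
# Sketch of the KC85/4 addressing described above (names are illustrative).
VIDEO_BASE_HIGH = 0x80   # video memory starts at 0x8000, so H = 0x80 + column
COLUMNS = 40             # 320 pixels / 8 pixels per byte
LINES = 256


def kc85_4_address(column: int, line: int) -> int:
    """Return the address of the byte holding the 8 pixels at (column, line)."""
    if not (0 <= column < COLUMNS and 0 <= line < LINES):
        raise ValueError("coordinate outside the 40x256 byte matrix")
    h = VIDEO_BASE_HIGH + column   # high byte (H) selects the column
    l = line                       # low byte (L) selects the pixel line
    return (h << 8) | l


def split_hl(address: int) -> tuple[int, int]:
    """Split a 16-bit address into its (H, L) register halves."""
    return address >> 8, address & 0xFF
```

Moving one column to the right changes the address by 0x100 (an INC H), and moving one line down changes it by 1 (an INC L).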

There was barely enough memory to get an asm program doing anything material with the HiSoft DevPac assembler, let alone with a high-level language.

The instruction set was very limited (in comparison to i386 etc.) so the learning curve was not steep and writing in assembler was, in practice, no more time consuming than using C now - once you were in the flow.

However, due to the resource constraints you not only had to DRY everything out into functions as you would with C today, but often had to manually mutate parts of a function's machine code by poking in new values before calling it, changing its behaviour without wasting memory on branches etc.

For large writes people would often move the stack pointer around, as PUSH BC etc. were faster than LD at writing to an address and decrementing the pointer. I seem to remember IX/IY being avoided as much as possible as they were quite costly.

When you write Z80 assembly you mostly use registers. Nobody writes it like shown in the article! I spent a lot of my time writing Z80 assembly manually. The code in the article is somewhat artificial, inefficient and unnatural to my eye.

EDIT: The code in the article's addendum is OK. The author finally caught up to Z80 style after many trials and errors. This is what idiomatic Z80 assembly looks like.

Well, it is artificial in the sense that it's meant to be code emitted by the code generator of a compiler for a high-level language with very different idioms. And the architectures of these early microcomputers obviously don't exactly prioritize being easy compiler targets.

Here's a sample of compiled Z80 code (the bootloader for one of these: https://en.wikipedia.org/wiki/S1_MP3_player ) generated by a C compiler --- of course, a commercial one and not a cheap one at that (IAR Embedded Workbench for Z80.) You can see that it has absolutely no problem using nearly all the registers:


I must be an assembly programmer, because my immediate reaction to this was: WHY ARE THEY WRITING TO MEMORY ALL THE TIME! The code just looks uncomfortable. Like a word-by-word translation from a foreign language. Like Forth written by someone who tried to keep the stack empty and all the data in variables.

Because the Z80 doesn't really have a whole lot of registers that you could use for general storage. Even the 6502 (and the 6809) would try to keep the cycle count down by storing stuff in the zero page (or direct page on the '09).

Z80 assembly tends to be very easy to read because you don't have to keep a mental map of what lives in which register: A really is the accumulator, and it is the only register on which you can do anything more complex than inc or dec.





And see how much more effective the 6502 set is when it comes to empowering the various registers. The 6502 does not need 'shift' codes (slow) either.

I've programmed both, and even though they both have their charm I would prefer to code the 6502 for the same problem (and I'd much prefer to use the 6809 over either).

I wrote a lot of Z80, very little 6502. They felt very different to me. Code written by one couldn't be ported to the other, but had to be rewritten.

Didn't have much trouble storing things in B/C/D/E/BC/DE, at least I don't remember it as a problem. The innermost loops had to get top priority when deciding what each register was used for, that's all.

> Code written by one couldn't be ported to the other, but had to be rewritten.

That’s not my experience. I once ported a program from the 6800 (the mother of the 6502) to the Z80. It was very straightforward. But to my surprise the version on the much “fancier” Z80 turned out to be both larger and slower, even though the Z80 ran at a slightly higher clock speed.

I don't think the OP literally meant that porting code was impossible, but rather, that it was very hard to do so and have the result run efficiently.

(OP speaking for himself:) When I ported Z80 code to the 6502 literally, the result didn't use the zero page to its full effect, because the zero page was so much bigger than the Z80's extra registers. When I ported 6502 to Z80 literally (I mean code that used the zero page well), too much of the zero-page work had to be replaced with memory work and not enough with the nice fast registers.

Same here, wrote 6502 (C64) and Z80 (CPC) code and preferred the (double-)registers of the Z80 to work with compared to the limited registers of the 6502.

That's not a completely fair comparison. BCDE+HL+IXIY offered more space than the 6502's registers, but less space than its zero page, so the hierarchies differed. In both cases "A, something, main memory", but the somethings differed and it mattered.

You're right, though if I remember correctly the zero page was only slightly faster than memory access.

I'm also influenced a lot by my personal circumstances: although the Z80 generally took more cycles than the 6502, I moved from a 1 MHz 6502 (C64) to a 4 MHz Z80 (CPC), so in general the Z80 felt faster.

Maybe it actually was. Code like (Java) for(A a : b) a.c = false; certainly could be written very nicely on the Z80. By keeping a in IX and laying out the data structure as a struct where all of the bools are packed into one byte, the body of that loop could be just one (slow) instruction like RES 3,(IX+4). But to get that speed you had to be aware of those possibilities and design the data structure. On the 6502 you'd lay out the data structure differently, playing to that CPU's strengths.
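To illustrate the packed-flags layout, here is a rough Python model of what that loop does to memory. The record size of 8 bytes is purely an assumption for the sketch; the offset 4 and bit 3 mirror the RES 3,(IX+4) example, and the function name is made up.

```python
RECORD_SIZE = 8     # hypothetical size of one record in bytes
FLAGS_OFFSET = 4    # the "+4" in RES 3,(IX+4)
FLAG_C_BIT = 3      # the "3" in RES 3,(IX+4)


def clear_flag_in_all(memory: bytearray, base: int, count: int) -> None:
    """Clear bit 3 of the flags byte in `count` consecutive records."""
    ix = base                               # plays the role of the IX register
    for _ in range(count):
        # Equivalent of RES 3,(IX+4): reset one bit, leave the other flags alone.
        memory[ix + FLAGS_OFFSET] &= ~(1 << FLAG_C_BIT) & 0xFF
        ix += RECORD_SIZE                   # advance IX to the next record
```

The point of the layout is that the per-element work is a single read-modify-write on one byte, with no separate load, mask and store.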

Interesting, I have no idea as I was writing assembler only, no code generation (except some assembler macros ;-)

+1 for the 6809. I've worked with both the Z80 and 6809, and while both were long enough ago that I don't remember the details, the 6809 really impressed me. I wonder if there's ever been another 8-bit processor that surpassed it.

I remember the 6809: it was really clean and capable, and I think it supported relocatable code. Very much unlike the funky instruction sets of any of the other 8-bit machines, and not the horror show that was x86.

I much preferred the 68000 to the 8086. The regularity of the instruction set made writing a disassembler trivial. Coupled with the single-step bit you could write a very capable monitor, which I did.

Yes, the 6809 could jump and branch relative to the current location, program counter + offset. If all jumps were of that kind, a program could be located anywhere in memory.

For those who miss Arnt's point, see what real-life Z80 code looked like here:


That's the disassembly of the whole ZX Spectrum ROM, which has the built-in BASIC interpreter with floating point support.

What you did for really complex arithmetic was to implement a stack machine. PUSH HL, POP BC etc. were quite cheap operations.

They were also a way to empty a long section of memory faster than LDDR: swap out the stack pointer to the starting address, set HL to zero, then PUSH HL inside an optionally unrolled loop.
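A back-of-the-envelope comparison of the two approaches, as a Python sketch. The T-state figures are the standard documented Z80 timings (LDDR 21 per repeated byte and 16 for the last, PUSH HL 11, DJNZ 13 taken and 8 on fall-through); setup costs such as saving SP and loading registers are deliberately ignored, and the function names are made up.

```python
LDDR_REPEAT = 21       # T-states per byte while BC != 0
LDDR_LAST = 16         # T-states for the final byte
PUSH_HL = 11           # T-states, writes two bytes
DJNZ_TAKEN = 13        # T-states when the loop repeats
DJNZ_FALLTHROUGH = 8   # T-states on the final pass


def lddr_tstates(n_bytes: int) -> int:
    """T-states for one LDDR that moves n_bytes (setup excluded)."""
    return LDDR_REPEAT * (n_bytes - 1) + LDDR_LAST


def push_loop_tstates(n_bytes: int, pushes_per_iteration: int) -> int:
    """T-states for a DJNZ loop of PUSH HL instructions (setup excluded)."""
    if n_bytes % (2 * pushes_per_iteration) != 0:
        raise ValueError("n_bytes must be a multiple of 2 * pushes_per_iteration")
    pairs = n_bytes // 2
    iterations = pairs // pushes_per_iteration
    if not 1 <= iterations <= 256:
        raise ValueError("DJNZ uses the 8-bit B register: 1..256 iterations")
    return pairs * PUSH_HL + (iterations - 1) * DJNZ_TAKEN + DJNZ_FALLTHROUGH


print(lddr_tstates(256))            # 5371
print(push_loop_tstates(256, 1))    # 3067
print(push_loop_tstates(256, 8))    # 1611
```

Even without unrolling, the PUSH loop comes out well ahead for 256 bytes, and unrolling eight pushes per iteration roughly halves the cost again.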

As others pointed out, you had to, because of lack of registers.

Also, the article explicitly mentions it does more loads and stores than necessary (“In real life, values from one expression will remain in registers for the next, and so won't need to be reloaded; the examples are all deliberately choosing the worst possible case.”)

Finally, writing to memory wasn’t as bad in those days as it is today (or rather: using registers wasn’t as fast as it is today). Writing to a fixed address, for example, only took twice as long as a register-to-register move (4 cycles vs 2 cycles on a 6502, if I googled that correctly)

You are correct, direct memory access on those machines was really cheap compared with today and even earlier. Modern-day memory bus speeds haven't kept up with processor speeds (this is what sank RISC machines). Old core memory is slow. Drum memory is hella slow.

The other thing is there is a trade off between number of registers and instruction size. With 8 bit machines you see that for instance where only certain addressing modes can be used with certain registers. You don't have enough instruction space to encode for every addressing mode for all the registers you have.
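The Z80's own register-to-register load shows the squeeze nicely: the one-byte opcode is 01 ddd sss, so there are only three bits each for destination and source, which gives eight operand codes (seven registers plus (HL)). A small Python sketch of that encoding; the function name is made up for illustration:

```python
# 3-bit operand codes used by the Z80 "LD r,r'" opcode family.
REG_CODES = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "(HL)": 6, "A": 7}


def encode_ld(dest: str, src: str) -> int:
    """Return the one-byte opcode for LD dest,src (pattern 01 ddd sss)."""
    if dest == "(HL)" and src == "(HL)":
        # That bit pattern (0x76) is used for HALT instead.
        raise ValueError("LD (HL),(HL) does not exist; 0x76 is HALT")
    return 0b01000000 | (REG_CODES[dest] << 3) | REG_CODES[src]


print(hex(encode_ld("A", "B")))      # 0x78
print(hex(encode_ld("A", "(HL)")))   # 0x7e
```

Anything beyond those eight codes (IX, IY, the alternate set) needs a prefix byte or a separate instruction, which is where the extra bytes and cycles come from.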

No idea about the Z80, but in uni we had to write assembler for a C167 which was a fun architecture and registers basically didn't exist. They were just mappings to memory in a given location. Granted, a few things only worked on registers and not on memory directly. I think something like addressing individual bits, but just reading and writing a value was effectively the same on memory and on a register name.

I had no qualms at all about using memory as variables when convenient. I didn't have to use the stack at all in any of the assignments, as 16 registers and a handful of DBs sprinkled through the code for more locations to write to were enough; the disassembler in the debugger didn't like code interspersed with data, though.

> registers basically didn't exist. They were just mappings to memory in a given location

This is how a lot of microcontroller architectures work, and indeed there are C compilers for them too.

>(Incidentally, the 6502 can do this in 10 bytes and 14 cycles. The Z80 is terrifyingly slow.)

It's misleading to call the Z80 "terrifyingly slow" based on cycle counts without mentioning that it clocked higher than the 6502. E.g. the Apple II ran at just over 1MHz, while the ZX Spectrum ran at 3.5MHz (although with wait states for accessing the memory area shared by the graphics hardware).
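To put numbers on that, a tiny Python sketch that converts cycle counts into wall-clock time. The 14 cycles and 3.5MHz are from this thread; treating the Apple II as exactly 1.0MHz is a simplification, wait states are ignored, and the function names are made up.

```python
def microseconds(cycles: float, clock_mhz: float) -> float:
    """Wall-clock time for a given cycle count at a given clock rate."""
    return cycles / clock_mhz


def break_even_cycles(cycles: float, clock_mhz: float, other_clock_mhz: float) -> float:
    """Cycles the other CPU may spend and still finish in the same time."""
    return cycles * other_clock_mhz / clock_mhz


print(microseconds(14, 1.0))             # 14.0
print(break_even_cycles(14, 1.0, 3.5))   # 49.0
```

So a 3.5MHz Z80 routine only loses on wall-clock time to the quoted 6502 version if it needs more than 49 T-states.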

Yep. The Amstrad CPC was an effective 3.3MHz. Most 8-bit machines of the early 80s had reasonably comparable CPU performance: the real speed differentiators were the graphics hardware (Spectrum fast because you had so few colour bytes to write, CPC slow - until you learned how to scroll with the CRTC) and, if you used it, the efficiency of the BASIC/firmware.

There was a 16MHz Z80: Amstrad briefly built a machine on it in the 1990s (the PcW16).

TI still sells 15MHz Z80s in their TI-84 graphing calculators.

And still charges what they cost in the 80s

I'm kind of wondering why there isn't an open hardware calculator yet. Given the mark-up most of these brands have, it might actually be viable.

The ACT and SAT require certified calculators

Well, I imagine we could get citizen funding for that.

EDIT: Oh, the certificate is with the product? Well, there might be some smaller players who currently can't compete with TI who could handle the production side.

Hardware-wise it should definitely be easy enough to create a cheaper calculator just as capable of meeting the requirements, no?

Why do people still buy the TI-84? I used a TI NSpire when I was in school and it cost about the same

That's a little unfair, the BBC Micro ran at 2 MHz and was released before the Spectrum.

It's not wrong though, the 6502 was clocked much slower than its contemporaries.

However IIRC most contemporary architectures were closer to the Z80, so the 6502 was considered impressive because it was competitive despite a much lower clock rate.

A 1 MHz clock on the 6800 family of processors, including the 6502, is equivalent to a 2 MHz clock on the Z80. The 6800 architecture used a so-called two-phase clock. Each phase, positive and negative, of the clock cycle was used to do work.

I would suggest it is between 2 and 3 times, but it depends on what kind of code you are running.

A 2 MHz 6502 is approximately equivalent to a 6 MHz Z80, give or take 1 MHz depending on what you're doing.

A good reference for Z80 programming is Rodnay Zaks' Programming the Z80. On page 94 at the start of the chapter Basic Programming Techniques there's a section on 8-bit addition. The program is as follows:

    LD   A,(ADR1)        ; LOAD OP1 INTO A
    LD   HL,ADR2         ; LOAD ADDRESS OF OP2 INTO HL
    ADD  A,(HL)          ; ADD OP2 TO OP1
    LD   (ADR3),A        ; SAVE RESULT RES AT ADR3

This is the idiomatic Z80 way. That book goes on to show how to do 16-bit and 32-bit arithmetic the Z80 way.


Rodnay Zaks is a name I remember very well from my Spectrum-days. I have a small collection of Z80-books which I've carried around, across countries, for the past 30 years.

I should go dig out an emulator soon! (I usually have a game or two of "Chaos: The Battle of Wizards" every six months or so.)

I would have appreciated some historical context for the z80 in this post. 8,500 transistors (equiv to ~2800 nand gates in depletion-load nmos logic * ) and not really designed to be a compiler target. I'd say the z80 is pretty impressive - but certainly with quirks.

I would have enjoyed that kind of framing much more than "lol the z80 sux!"

* Not saying the z80 was implemented with all nand, just providing the figure as a reference.

While that's 100% true and the Z80's design has a legacy explanation, freed of the legacy requirement there's no technical reason why Zilog couldn't have created a much better compiler target.

Like any 8-bit CPU, the Z80 had plenty of coding optimisations you soon learned. First one that came to mind reading this: instead of ld bc,4; ldir, it’s faster just to do ldi; ldi; ldi; ldi.

There’s also a set of “alternate registers” you could swap in with exx, which sometimes enabled faster arithmetic without hitting memory.

Yep. The trade-off is that it takes 3 more bytes, but it's a great example of the thinking that went into writing 'tight' code for these processors. You're doing a loop unroll for speed and taking up more space; depending on what evil trick you're up to (e.g. hiding code inside a 128 byte unused spot in the BDOS), one might be better than the other.
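The size/speed trade-off for that particular trick can be tallied in a few lines of Python. The figures are the standard Z80 ones (LD BC,nn is 3 bytes and 10 T-states, LDIR is 2 bytes and 21 T-states per repeated byte plus 16 for the last, LDI is 2 bytes and 16 T-states); the function names are made up.

```python
def ldir_cost(n_bytes: int) -> tuple[int, int]:
    """(code size in bytes, T-states) for: ld bc,n ; ldir"""
    size = 3 + 2
    tstates = 10 + 21 * (n_bytes - 1) + 16
    return size, tstates


def unrolled_ldi_cost(n_bytes: int) -> tuple[int, int]:
    """(code size in bytes, T-states) for n_bytes consecutive ldi instructions."""
    return 2 * n_bytes, 16 * n_bytes


print(ldir_cost(4))           # (5, 89)
print(unrolled_ldi_cost(4))   # (8, 64)
```

For a 4-byte copy that is 3 extra bytes of code in exchange for 25 fewer T-states.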

I learned Z80 through the Sinclair stable but did play around with the 6502 and found I much preferred the Z80 way of working.

One notable thing here is they are writing a code generator and not actually programming in assembly. As a result their modules need to be more general purpose than if directly programming.

Same experience here. Came from a C64 (6502) to a CPC (Z80) in the 80s and the Z80 felt much more powerful and expressive with its instruction set and registers - also the instruction set of the Z80 felt more logical and planned.

Indeed it felt that way, but in practice the 6502’s indirect addressing gives you 128 16-bit registers compared to the z80’s 5+3 (hl,de,bc,ix,iy,hl’,de’,bc’) which is a winner for code generation and macro programming.

No longer an expert in Z80 - and perhaps I never was as much of my assembler career was Amiga 68k - but didn't the Z80 also have indirect addressing?

No[0]. To read a byte through a pointer at IX, you have to:

    LD L,(IX+0)
    LD H,(IX+1)
    LD A,(HL)
On the 6502, you can do that in one instruction[1] if your X or Y registers are zero (and more often than not, you can use the indexed-indirect or indirect-indexed to save even more instructions):

     LDA ($40,X) ; if X == 0, and $40 is your pointer.
[0] https://8bitnotes.com/2017/05/z80-addressing-modes/

[1] http://www.obelisk.me.uk/6502/addressing.html
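For readers who don't have either instruction set in their head, here is a small Python model of the two sequences above, with memory as a flat 64K byte array. It only models the data flow, not timing or flags, and the function names are invented for the sketch.

```python
def z80_read_via_ix(memory: bytearray, ix: int) -> int:
    """Model of: LD L,(IX+0) ; LD H,(IX+1) ; LD A,(HL)"""
    l = memory[ix]               # low byte of the pointer
    h = memory[ix + 1]           # high byte of the pointer
    hl = (h << 8) | l
    return memory[hl]            # A = byte the pointer refers to


def m6502_lda_indexed_indirect(memory: bytearray, zp_operand: int, x: int) -> int:
    """Model of: LDA ($nn,X), where the pointer lives in the zero page."""
    ptr = (zp_operand + x) & 0xFF        # zero-page address wraps at 0xFF
    lo = memory[ptr]
    hi = memory[(ptr + 1) & 0xFF]
    return memory[(hi << 8) | lo]        # A = byte the pointer refers to


memory = bytearray(65536)
memory[0x40], memory[0x41] = 0x34, 0x12    # pointer to 0x1234, little-endian
memory[0x1234] = 0x99
print(hex(z80_read_via_ix(memory, 0x40)))                  # 0x99
print(hex(m6502_lda_indexed_indirect(memory, 0x40, 0)))    # 0x99
```

Both end up with the same byte in A; the difference is that the 6502 does the pointer fetch inside one addressing mode while the Z80 needs HL as a staging register.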

From memory the IX and IY instructions took a lot of clock cycles and I avoided them unless there was a really good reason to use them.

I've just had a quick look at an instruction cycle table and it seems that without indexing they took 4 cycles more than HL then with an index that increased to 12 more.

Ref from search returning http://www.z80.info/z80time.txt

FWIW I learnt to program Z80 assembler using the MicroBee personal computer:


That was during school years 10, 11 and 12, prior to going on to study engineering.

While studying engineering, I then got to work with 6502 assembler, and while I have no doubt that the earlier Z80 experience helped greatly, I still remember thinking that writing assembler for the Z80 seemed to be so much easier than coding for the 6502.

This article brought back so many memories!

My first job out of college was writing Z80 assembly language at Cromemco. Later we ported everything to the 68000. We didn't write anything in a higher level language because it was too slow. In fact the entire CDOS and Cromix OSs were written by a single person. He originally wrote everything in C, but when he ran it, it was so slow. He then rewrote everything directly in Z80 assembly language and kept the C code as comments. Raw C code was the only commentary in the code.

I wrote the graphics drivers for screen and printer and a WYSIWYG word processor. There were no floating point processors. All math was in the registers (as stated in the article). You can still render a lot of graphics by converting your renderer to additions and multiplication by 2 (register shift left and right). I was happy to find years later that code that I derived to render circles and arcs using only 1 bit step and multiplication by 2 was also derived by someone else and published in graphics books. You live within limitations when that is your only option.
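The textbook version of that idea is the midpoint circle algorithm. The Python sketch below is that well-known algorithm, not the Cromemco code (which isn't shown here); it uses only additions, subtractions, comparisons and a shift left by one bit.

```python
def circle_points(radius: int) -> set[tuple[int, int]]:
    """Integer points of a circle around the origin, midpoint-circle style."""
    x, y = radius, 0
    err = 1 - radius                 # decision variable, integer only
    points = set()
    while x >= y:
        # One computed point gives eight by symmetry.
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            points.add((px, py))
        y += 1                       # always step one pixel in y
        if err < 0:
            err += (y << 1) + 1      # 2*y + 1 via a single shift
        else:
            x -= 1                   # step inwards in x as well
            err += ((y - x) << 1) + 1
    return points
```

Each iteration moves by exactly one pixel and updates the error term with a doubled coordinate, so nothing here needs multiply, divide or floating point.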

I also seem to remember (I programmed this one in the late 80s, back when my memory was good) that overflowing the 16 bit register did not set the proper flags.

Still, compared to the 6502, this one had lots of comfort.

> And only A can be directly written to or from memory

I think the author might have meant to write that only A can be indirectly written to or from memory, but even that isn't correct, as demonstrated by the code bits that retrieve and store from and to RAM @HL (and IY/IX+blah). The 8080 was able to directly read and write HL. The Z80 could do it for IX, IY, and IIRC BC and DE too.

I once implemented a [partial] Z80 emulator. It is a fantastic learning project, especially if you haven't had to touch a lot of assembly in the past. Even back when I did this, there were a plethora of resources on how to emulate the Z80.

This is a nice read for someone who has just started learning GB ASM. I know the processor in the Game Boy isn't exactly a Z80 (or an 8080), but it's quite similar.

I would recommend Z80 Assembler In 28 Days instead


Font fail: I kept reading 3op as 30p and was wondering what some of this meant. 3-op would be clearer.

I'd say performance was of secondary importance; making it easy to program in assembly and silicon real estate were more important factors.

Re-inventing the wheel, son? Z80 had CP/M Turbo Pascal with floating point arithmetic. According to Byte magazine, it left burn marks on the table, because it was "lightning fast"

The compiler was lightning fast. No need for floating point arithmetic to compile a program.

The Z80 version only supported recursion if it was specifically enabled, so I suspect it tended to use fixed locations for variables. Z80 support for a traditional stack frame for recursion isn't very good (although IX, IY with offsets could work).

I’d guess that it used the old-time trick of saving the return address by modifying the code of the called subroutine. A recursive call would then overwrite the primary caller's return address.

"Stack Frames"? I think they just pushed the current values of the pertinent variables onto the CPU stack upon entering a recursive function. Does Pascal even have lexical/dynamic binding issues? I do not think so.

Cowgol doesn't have recursion at all anyway
