
On choosing the Z80 over the 6502 (2014) - networked
http://www.luke.maurits.id.au/blog/post/on-choosing-the-z80-over-the-6502.html
======
dwarman
I spent several years of my life in the late 70's and early 80's living and
dreaming in Z80, for my work at MICOM using it for hard real time data
communications (specifically, Statistical Multiplexors). A critical component
of achieving HRT is interrupt response time. The Z80 has two complete sets of
hardware registers and a one byte instruction to toggle which set is active.
So with the basic 2.5MHz clock, the time from raising the interrupt to
execution of the first instruction of the handler was 2 usecs, with zero
additional context saving to do. Add to that hardware vectored interrupts and
it was golden - you landed directly in the handler for the specific
peripheral. No need to poll status registers.

Not only was this way ahead of its time, this kind of performance feature is
still apparently unknown, even in GHz SOC systems and modern DSPs. I've seen
some really arcane and baroque interrupt status trees in my time since. All
take significant code to parse and dispatch. One really egregious offender
was the i960 CPU, sold by Intel specifically into the data communications
market in the mid 90's - a necessarily HRT application domain. Even at a 50 MHz
clock it still took 6 - 10 usecs to get started doing actual work. I was very
surprised, but the hardware choice at that time was not open to me and I had
to live with it.
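
Taking the figures in this comment at face value, the cycle budgets work out
as follows (a rough sketch; the helper function is mine, not from any
datasheet):

```python
def cycles_for(latency_us, clock_mhz):
    """Clock cycles consumed by a given latency at a given clock rate."""
    return latency_us * clock_mhz

# Z80: a 2 usec response at 2.5 MHz is only a handful of cycles.
z80 = cycles_for(2, 2.5)         # 5 cycles

# i960: 6-10 usecs at 50 MHz is roughly two orders of magnitude more.
i960_best = cycles_for(6, 50)    # 300 cycles
i960_worst = cycles_for(10, 50)  # 500 cycles
```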

~~~
nibnib
I think the market has splintered a little bit, low-latency "real-time" stuff
tends to be targeted more by simpler devices, whereas the GHz SOCs and DSPs
are more often used for heavy number-crunching or "less real-time"
applications where they may be running a full OS. Some of the delay is
probably due to hardware support for things like scheduling, multi-threading
etc.

As for current tech that can match the Z80, I know the MSP430 can have an
interrupt latency of about 10 cycles. This can be below 1us with the kind of
clock speeds many of the parts will run at.

Apparently (according to the manual) an ARM Cortex M3 can enter an interrupt
in 12 cycles. In wall-clock terms this can be much faster, as those parts can
run above 100MHz.
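
As a sanity check on those figures (the MSP430 clock speed here is my
assumption, not from the comment):

```python
def irq_latency_us(cycles, clock_hz):
    """Interrupt entry latency in microseconds."""
    return cycles / clock_hz * 1e6

# MSP430: 10 cycles at an assumed 16 MHz clock -> well under 1 us.
msp430 = irq_latency_us(10, 16e6)  # 0.625 us

# Cortex M3: 12 cycles at 100 MHz.
m3 = irq_latency_us(12, 100e6)     # 0.12 us
```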

------
ChuckMcM
The Z80 was always considered the more "commercial" processor, and businesses
ran CP/M and various business applications like dBase on it. CP/M as an OS was
more "OS like" than anything available for the 6502 at the time. I believe
CP/M was inspired by RT-11, an operating system from DEC (a "real" computer
company), and commands like PIP were ported over with almost identical syntax.

In contrast there were a number of "proprietary" vendors with 6502 systems -
Apple and its OS, SWTP and its OS, Sphere and its OS, Commodore and its OS -
all different, all with minimal market share in business. Contrast that with
CP/M, which ran on Altair machines, IMSAI, Morrow, Polywell, Sol-20s,
Heathkits, etc. Even the Cromemco, once my BIOS had gotten reasonable
distribution.

The I/O address space provided another feature on Z80 (and 8080) machines:
"shadow ROM". Basically you booted from some "massive 8K" ROM, and then you
could write out a byte which would swap it out of the memory space for a very
simple 1/2K jump table/switcher. That let you use most of the address space
for user programs. Later, when MP/M came out, it let you do wild things with
"swapping" processes.

Perhaps the saddest part of the Z80 story is that the Z800 (which only made it
to market in the more limited Z280 form) was stillborn. It had all sorts of
great ideas that folks wanted to use, but the momentum of the 8086/8088 was
unbeatable by that time. Given today's chip geometries it would be interesting
to build a Z800 now, but for the same reason that it is interesting to build
the difference engine: to prove that it would have worked and had some nice
features that were ahead of its time.

~~~
cmrdporcupine
Take a look at the eZ80 from Zilog: 50MHz (apparently equivalent to a 200MHz
80s-era Z80), with 256KB of onboard flash, the ability to address 16MB
(24 bits) of RAM, and plenty of onboard peripherals and GPIO.

Too bad it's only available in surface mount - not really hobbyist friendly.

~~~
Sanddancer
I've found that surface mount is more bark than bite, to be honest. Adapter
boards that'll turn one into a PTH device don't cost terribly much, and I've
done tons of surface mount chips using the toaster oven technique. Beyond
that, you can get a low-end SMD oven off eBay for not a terrible amount of
money if/when you want to do any sort of volume work. Hell, with my crappy
fine motor skills, I've nuked more stuff with the soldering iron than I have
with SMD, so things have definitely changed on that front.

~~~
tonyarkles
There are two techniques that I find work very well for SMD ICs on
hand-assembled boards:

- Hot air rework wand. I got mine from China for $100. It works awesome both
for removing parts and for melting paste.

- Flood & wick. You start by tacking down two corners of the chip, and then
drag the tip with some solder along the full edge. You'll probably get some
bridging between pins. Sometimes, when the moon is aligned just right, you
get a perfect joint. For the times when you don't, the next step is to take a
bit of fresh wick and clean it up.

Between these two tools (cheap Chinese reflow wand $100, and a nice Hakko iron
$100), I've probably hand-assembled 75-100 prototypes and I don't recall ever
destroying one. My next trick is going to be trying to do some BGA parts...
new experiences!

~~~
userbinator
Hot air is definitely the right way to do it for SMD/BGA, and I find it much
easier than through-hole because you don't have to solder each pin separately
-- just aim the gun at the part and wait for the solder to melt. The surface
tension means the part will actually self-align into place over the pads if
you get it close enough.

BGA is not that much more difficult; look online for videos of the phone
repair shops in China which can swap BGA chips in literally minutes.

------
userbinator
Another advantage of the Z80 if you ever want to assemble (or disassemble)
programs manually is that its instruction encoding is very consistently
arranged in an octal, 2-3-3 pattern:

[http://www.z80.info/decoding.htm](http://www.z80.info/decoding.htm)

This design was likely influenced by the 8080/8085, and x86 followed the same
organisation too:

[http://reocities.com/SiliconValley/heights/7052/opcode.txt](http://reocities.com/SiliconValley/heights/7052/opcode.txt)

Due to how it decodes, the 6502 has a 3-3-2 pattern, which doesn't look quite
as structured:

[http://www.oxyron.de/html/opcodes02.html](http://www.oxyron.de/html/opcodes02.html)

[http://www.llx.com/~nparker/a2/opcodes.html](http://www.llx.com/~nparker/a2/opcodes.html)

[http://www.pagetable.com/?p=39](http://www.pagetable.com/?p=39)
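
Both splits are easy to play with in code. A small sketch (the x/y/z field
names follow the z80.info page linked above; the a/b/c split follows the
llx.com page):

```python
def z80_fields(opcode):
    """Split a Z80 opcode byte into its 2-3-3 'x y z' fields."""
    return (opcode >> 6) & 0b11, (opcode >> 3) & 0b111, opcode & 0b111

def m6502_fields(opcode):
    """Split a 6502 opcode byte into its 3-3-2 'a b c' fields."""
    return (opcode >> 5) & 0b111, (opcode >> 2) & 0b111, opcode & 0b11

# On the Z80, x=1 is the LD r,r' block, with y = destination and
# z = source (B=0 C=1 D=2 E=3 H=4 L=5 (HL)=6 A=7).
print(z80_fields(0x78))    # LD A,B -> (1, 7, 0)

# On the 6502, 'a' and 'c' together select the operation and 'b' the
# addressing mode; LDA absolute is a=5, c=1 with b=3 (absolute).
print(m6502_fields(0xAD))  # LDA abs -> (5, 3, 1)
```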

~~~
pvg
Not just likely - for certain: the Z80 was designed to be binary compatible
with the 8080. Although I'm not sure it's that big a practical advantage; you
start recognizing the instructions fairly quickly after staring at enough
dumps.

------
melted
The Z80 has beautiful assembly. Super intuitive and user friendly. I had a
Spectrum growing up, and 13-year-old me had no issues writing complex programs
in Z80 assembly. In retrospect, it's kind of incredible that those programs
worked, seeing as I had less than 48K to work with and didn't know shit about
programming.

------
Theodores
In the home computing sector, the 6502 occupied the high end in the UK, with
the BBC Micro as the flagship. Meanwhile the Z80 was popularised by Sinclair
with the ZX Spectrum (which built on the hugely popular ZX81).

Personally, as a newbie to programming (as everyone was at the time, to
varying degrees), I preferred the simplicity of the 6502 because it only had
three registers. The extra registers on the Z80 came with programming concepts
that were beyond me at the time. I always knew the Z80 was better!!!

Despite knowing the Z80 to be better, it was the underdog at the time in the
UK home computing sector: more accomplished, but used in the cheaper
Sinclair-style machines, with the 6502 in the more expensive machines.

Why were the 6502 machines more expensive? The extra chips that went with the
6502, such as the 6522 I/O chip and commonly found parts like the AY-3-8910
sound chip, provided useful features and handled them very well. Those support
chips cost money. Meanwhile, in Z80 land, with the likes of Clive Sinclair, a
ULA of some sort would be put together instead. This chip would work with the
CPU to do all of those extra functions badly.

On the ZX81 the CPU did everything, including the screen. In 'fast mode' you
would have no screen. Sound on the Spectrum was the same story: rather than
have some neat electronics do the sound properly, the CPU would stop
everything it was doing to make some 'beep'.

So, in 2015, which to go for? If you are not going to be making millions of
these boards there is no point saving money on the support logic. Therefore
the Z80 - the better CPU - with a proper set of support chips is what to go
for.

~~~
cmrdporcupine
The 6502 has ridiculously good interrupt responsiveness and very tight cycle
timing. And it was cheap. It was the best choice for gaming/graphics type
systems because of this.

------
cmrdporcupine
It's neat to imagine an alternate history where Z80 CP/M machines 'won' the
80s PC wars, and Gary Kildall and DRI became the rich and deservedly powerful
software house of the 80s and 90s instead of Gates/Microsoft, with ascendancy
for GEM and a multitasking, multiuser CP/M instead of Windows and MS-DOS.

~~~
rasz_pl
To be fair, Kildall believed he deserved $240 for every copy of the OS, and
was angry at Gates selling DOS at $40. He didn't see the big picture.

~~~
cmrdporcupine
Do you have a source for that?

~~~
rasz_pl
wiki, books, interviews of IBM employees

~~~
cmrdporcupine
I ask because there's been a cottage industry of rumours around this stuff,
including the (not supported) claim that Kildall was "busy flying his plane"
instead of meeting with IBM, etc.

~~~
rasz_pl
AFAIR the Computer History Museum has an interview with one of the IBMers who
was present at that meeting.

------
avifreedman
6809 for the win over both, by far!

~~~
jejones3141
Amen to that. Alas, I'm not aware of it still being made, though there are
FPGA implementations. (And look up "CoCo on a Chip"; it's a project to
recreate and eventually enhance the Tandy Color Computer on an FPGA. The
fellow doing it has started work to recreate the CoCo 3's GIME chip, which is
its MMU and graphics hardware.)

If only Hitachi had publicized the capabilities of the 6309...

~~~
Sanddancer
Rochester Electronics got the license to start making the 6809 again a few
years ago, and they're apparently _not_ available now, _though should be
shortly_.

[http://www.embedded.com/electronics-news/4437243/Rochester-b...](http://www.embedded.com/electronics-news/4437243/Rochester-brings-Freescale-68K--Intel-80C186-88-MCUs-back-to-life)

[Link to irrelevant part removed, thanks tacos]

~~~
tacos
Doesn't seem to be in production yet. The part numbers you reference are
SOT23-3 voltage regulators.

------
pflanze
I thought that the page used for the stack on the C64, or at least the C128
(which I had, i.e. the MOS 8502, which is almost identical to the 6502), could
be reconfigured. I remember some code that used stack pushes to achieve faster
filling of memory, and that would of course only have been useful if the
location of the stack could be changed. But I just can't find anything on the
web that confirms this: all resources say the stack was fixed at page 01. I
think that was just the default, not really fixed; as I recall it, the zero
page location was _really_ fixed, but the stack wasn't.

------
to3m
As a longtime 6502 programmer I was impressed by the Z80 after reading the
data sheet - but actually trying to program it left me nonplussed. The main
omissions, as I recall, were: no immediate addressing for the 16-bit
arithmetic instructions; immediate instructions generally being more
expensive; the (IX+n)/(IY+n) addressing modes being heinously slow; and no
indirect-with-register-offset addressing mode. These all seemed to eat away
at the Z80's apparent advantages.

As a somewhat representative example - suppose you have a bunch of objects
that you want to work on. On the Z80 you'd probably adopt a modern-style
struct system, whereby each object is represented by a block of memory. Offset
0 is X coordinate, offset 1 is Y coordinate, offset 2 is flags, blah de blah.
Suppose each object is 8 bytes, and you want to set a flag for each object
that's on screen.

    
    
        LD DE,(MAX_X<<8)|MIN_X ; 10 10
        LD IY,8                ; 14 24
        LD IX,OBJECT0          ; 14 38
        LD B,NUMOBJECTS        ; 7  45
    
        .LOOP
        LD A,(IX+0)            ; 19 19
        CP D                   ; 4  23
        JP C,NOTVISIBLE        ; 10 33
        CP E                   ; 4  37
        JP NC,NOTVISIBLE       ; 10 47
     
        .VISIBLE
        LD A,(IX+2)            ; 19 19
        OR VISFLAG             ; 7  26
        LD (IX+2),A            ; 19 45
        ADD IX,IY              ; 15 60
        DJNZ LOOP              ; 13 73
        JP DONE                ; 10
    
        .NOTVISIBLE
        LD A,(IX+2)            ; 19 19
        AND ~VISFLAG           ; 7  26
        LD (IX+2),A            ; 19 45
        ADD IX,IY              ; 15 60
        DJNZ LOOP              ; 13 73
        .DONE
    

(For readers born after 1985 :) - the numbers in the comments are cycle
counts: the instruction's count, then the cumulative total for that block.)

So, for N objects: (45 + N * 120). Plus 10 if the last one was visible.

For the 6502 you'd probably adopt a striped layout, so rather than having each
object as a block of data, you'd have an X table, a Y table, a flags table,
and so on. So the same again:

    
    
        LDX NUMOBJECTS         ; 3  3
        .LOOP
        LDA XS-1,X             ; 4  4
        CMP MINX               ; 3  7
        BCC NOTVISIBLE         ; 3  10
        CMP MAXX               ; 3  13
        BCS NOTVISIBLE         ; 3  16
    
        .VISIBLE
        LDA FLAGS-1,X          ; 4  4
        ORA #VISFLAG           ; 2  6
        STA FLAGS-1,X          ; 5  11
        DEX                    ; 2  13
        BNE LOOP               ; 3  16
        BEQ DONE               ; 3
    
        .NOTVISIBLE
        LDA FLAGS-1,X          ; 4  4
        AND #~VISFLAG          ; 2  6
        STA FLAGS-1,X          ; 5  11
        DEX                    ; 2  13
        BNE LOOP               ; 3  16
        .DONE
    

So, for N objects: (3 + N * 32). Plus 3 if the last one was visible. (I've
been a bit scrappy with the cycle counts here, by counting each branch as
taken, even though that makes the totals invalid. This is done to favour the
Z80, which doesn't appear to execute untaken branches any quicker.)

That's pretty much the 4:1 improvement that's commonly claimed. 1MHz 6502 will
beat 3.5MHz Z80; 2MHz 6502 will beat 4MHz Z80.

This might seem like a synthetic benchmark, designed to make the 6502 look
better, but this sort of thing crops up fairly often. (I noticed it initially
after noting how crappy-looking the code was for a couple of Z80 games I was
disassembling; I only realised why after trying to rewrite the snippets in
question myself!)

One thing to note in particular here is that in the Z80 case you're two
registers down - because IY has been used for the array stride (no immediate
16-bit addition...), and DE has been used for the min/max constants (immediate
instructions are more expensive).

And this is all very well as the code is presented here, but in practice, in
many cases you'll probably need to use DE in your code. You can work around
this by using EXX and having your constants in the shadow register bank, but
you're losing 8 T-states per iteration (EXX = 4 T-states). For many purposes
you'd be better off using self-modifying code - but now you lose 3 T-states
per addition (immediate instructions are more expensive).

So a bit of a disappointment for me. Initial excitement at seeing just how
much crazy stuff the Z80 can do rapidly turned into disillusionment as I
realised just how long it takes to do anything. It's like the 68000 in this
respect.

Good luck beating LDIR with a 6502 at quarter the clock rate, though...

Z80 code is also generally much easier to follow...

Z80 has a 16-bit stack pointer too...
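
On the LDIR point, a rough back-of-the-envelope comparison (the 21/16 T-state
figures are the standard Z80 LDIR timing; the 14-cycle 6502 loop is my
assumed LDA abs,X / STA abs,X / DEX / BNE, valid for blocks up to 256 bytes):

```python
def ldir_tstates(n):
    """Z80 LDIR: 21 T-states per byte moved, 16 for the final one."""
    return 21 * (n - 1) + 16

def m6502_copy_cycles(n):
    """Simple 6502 copy loop: LDA abs,X (4) + STA abs,X (5) + DEX (2) + BNE (3)."""
    return 14 * n

# Copying 256 bytes, 4 MHz Z80 vs 1 MHz 6502:
z80_us = ldir_tstates(256) / 4        # ~1343 us
m6502_us = m6502_copy_cycles(256) / 1  # 3584 us -> LDIR wins handily
```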

~~~
hyperpallium
Obligatory HN pedantry: in this specific simplified problem, I think you could
use stripes for the Z80, with x in (HL) and flags in (DE), and use literals
for max/min (as in the 6502 version) instead of D/E? But it would still be
much slower than the 6502, and you run out of registers if you need more than
two fields per object...

I'd forgotten (IX+n) instructions were so expensive! It's a shame, because
your example is their purpose.

I guess the Z80 is a higher-level, micro-coded language: easier but slower.
I also always found the orthogonal mnemonic style of the Z80 much clearer,
which I think the 6502 could have used; e.g. "LDA $F453,X" could be
"LD A,($F453+X)".

PS some references I found: 6502 addressing modes:
[http://www.dwheeler.com/6502/oneelkruns/asm1step.html](http://www.dwheeler.com/6502/oneelkruns/asm1step.html)

    
    
      LDA $F453,X  where X contains 3      

Absolute Indexed Addressing - Load the accumulator with the contents of
address $F453 + 3 = $F456.

Z80 timings:
[http://map.grauw.nl/resources/z80instr.php](http://map.grauw.nl/resources/z80instr.php)

~~~
to3m
Hmm... in this case, you might be right. You'd then need to advance 2
registers by 8 each time, and you'd be paying an extra 3 T-states per compare
for the fetch of the immediate operand, but you'd be saving 4+ cycles in
various places from never needing a DD prefix...

(Actually, MIN_X and MAX_X are supposed to be values you fetch from memory, as
you can see from the 6502 code. I have no idea why I wrote what I did for the
Z80 version. You could use self-modifying code to put constants in the right
place, though - which, thinking about it, is actually what I should have done
for the 6502 version.)

------
yuhong
It is sad that they then screwed up the Z80 by giving it only a 7-bit
(128-row) refresh counter.

------
squeakynick
Didn't "The Terminator" use 6502?

Case closed :)

~~~
vardump
Terminator? Bah.

Bender used... err _will_ use a 6502.

[http://spectrum.ieee.org/semiconductors/processors/the-truth...](http://spectrum.ieee.org/semiconductors/processors/the-truth-about-benders-brain)

------
GPGPU
I like the 6502 instruction set better. But it does stink to have to chop up
your limited address space for I/O.

