The Complete Spectrum ROM Disassembly (speccy.xyz)
108 points by MindGods on July 7, 2020 | 61 comments



This is awesome. I still have a copy of the original book from 1983 on my bookshelf! This looks like a faithful reproduction with some added features to make it easier to use.

Looking forward to browsing some of the other code linked to from the site - e.g. Manic Miner which was a particular favourite.

A couple of points about the Spectrum ROM.

- Spectrum BASIC was very slow especially compared to BBC Basic. I think that this was partly due to the constraints of having to fit everything into 16k vs 32k for the BBC - so code was optimised for size rather than speed.

- It included a stack based VM for floating point operations which I think was again mainly to reduce the code size (one byte for each operation rather than three bytes for a subroutine call).
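To illustrate the size argument in the second point, here is a minimal sketch of a byte-coded stack calculator. The opcode values and function names are invented for this example and are not the ROM's real calculator codes. The size comparison also ignores the cost of entering and leaving the interpreter.

```python
from __future__ import annotations

# Hypothetical one-byte opcodes (assumption: NOT the real Spectrum ROM codes).
OP_ADD, OP_SUB, OP_MUL, OP_END = 0x01, 0x02, 0x03, 0xFF


def run_calculator(program: bytes, stack: list[float]) -> list[float]:
    """Interpret a string of one-byte operations against a value stack."""
    for op in program:
        if op == OP_END:
            break
        b = stack.pop()  # top of stack is the right-hand operand
        a = stack.pop()
        if op == OP_ADD:
            stack.append(a + b)
        elif op == OP_SUB:
            stack.append(a - b)
        elif op == OP_MUL:
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return stack


def size_comparison(num_operations: int) -> tuple[int, int]:
    """Bytes for one-byte opcodes vs. three-byte Z80 CALL instructions."""
    return num_operations * 1, num_operations * 3


# 4 * (2 + 3): two operations plus a terminator, three bytes in total.
result = run_calculator(bytes([OP_ADD, OP_MUL, OP_END]), [4.0, 2.0, 3.0])
```

With ten operations the byte-coded form would take 10 bytes against 30 for a chain of CALLs, which is the trade of speed for space described above.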

It would seem appropriate to give credit to the main author of the code, Steve Vickers [1], who went on to co-found Jupiter Cantab, makers of the Jupiter Ace.

[1] https://web.archive.org/web/20110516082258/http://www.sincus...

edit: added Steve Vickers credit.


The Spectrum used the CPU and the ULA for everything, which made the machine cheaper. The BBC, meanwhile, had dedicated hardware. For example, a simple beep would tie up CPU cycles on the Spectrum, whereas the BBC Micro could define a sound envelope for something more than a beep and hand that off to the SN76489 sound chip.

The Z80 had plenty of features that the BBC micro's 6502 did not have, so the code should have been quicker with fewer CPU cycles used and fewer lines of code. But the lack of supporting hardware negated this gain.

BBC BASIC was also a more complete implementation of the language, so code written in BASIC was more efficient simply because you didn't need your own BASIC workarounds. Being able to drop into assembly language also helped; the Spectrum lacked this option to truly turbocharge your code. The BBC used 16k for BASIC in ROM and had 16k for the OS. This enabled ROMs to be swapped out so you could run BCPL or another ROM, e.g. Fortran.

The Spectrum had BASIC and the OS in 16k with 48k available for RAM. The BBC could use more than 32k of RAM if installed, as per the later Master system that used bank switching.

I am not familiar with the code but I would be surprised if the BBC Basic ROM was anything less than a work of genius concise code. Every byte counted in those days regardless of what home computer you were programming for.


BBC Basic was a work of genius (it was written by Sophie Wilson of ARM fame) but not necessarily of concise code. I remember looking at some of the assembly in the 1980s and it was clear that code size was secondary to speed - there was some significant loop unrolling for example.

Absolutely agree on the hardware point as Sinclair's real talent was squeezing as much out of basic hardware as possible even at the expense of performance (e.g. CPU driving the screen in ZX80/81).

As a commercial decision limiting the ROM to 16k was a very good move in the end for Sinclair. It allowed full colour with 42k of spare RAM whereas the BBC only had c24k after allowing for screen memory, so in the end the Spectrum could support much larger and richer games than the BBC and (later) the Electron.


I recently was curious about the relative performance of the classic CPUs, so I made this table to compare their memcpy performance:

https://github.com/jhallen/joes-sandbox/tree/master/classic-...


On the Z80, LDIR is not as fast as unrolling the loop to produce a block of LDI instructions. You can jump into the middle of the block at count % blocksize to copy sizes which are not a round number. This was a fairly common game trick.

http://map.grauw.nl/articles/fast_loops.php#unrollingldir

The reason for this is a bit obscure, but it comes from a peculiarity of the Z80's implementation of LDIR. For each iteration it actually decrements PC by 2 and refetches the instruction. This is apparently so that the CPU can still service interrupts during the loop. Rewinding PC costs 5 extra cycles per repeated byte, so LDIR takes 21 cycles per byte where a plain LDI (also a two-byte "extended" instruction) takes 16.

https://retrocomputing.stackexchange.com/a/4745
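A rough cycle estimate for the two approaches, as a sketch only. It assumes the usual Z80 timings (LDIR 21 cycles per repeated byte and 16 for the last, LDI 16, JP cc 10), an unrolled block closed by a conditional jump back to its start, and it ignores the setup code that computes the entry point. The function names are mine.

```python
LDIR_REPEAT, LDIR_LAST = 21, 16  # T-states per byte copied by LDIR
LDI = 16                         # T-states per LDI
JP_CC = 10                       # conditional jump closing each pass
LDI_SIZE_BYTES = 2               # LDI is encoded as ED A0


def ldir_cycles(count: int) -> int:
    """Cycles for LDIR to copy `count` bytes (count >= 1)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1) * LDIR_REPEAT + LDIR_LAST


def unrolled_ldi_cycles(count: int, block_size: int = 16) -> int:
    """Cycles for a block of `block_size` LDIs looped until done."""
    if count < 1:
        raise ValueError("count must be at least 1")
    passes = -(-count // block_size)  # ceiling division
    return count * LDI + passes * JP_CC


def entry_offset(count: int, block_size: int = 16) -> int:
    """LDIs to skip on the first pass so exactly `count` bytes are copied."""
    return (block_size - count % block_size) % block_size


# Copying 100 bytes with a 16-LDI block: skip 12 LDIs (24 bytes of code)
# on the first pass, then run six full passes.
skip = entry_offset(100)
jump_in_bytes = skip * LDI_SIZE_BYTES
```

Under these assumptions 100 bytes cost 2095 cycles with LDIR and 1670 with the unrolled block, before counting the one-off setup.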


Ah, you are right, LDI is only 16 cycles. I will update the Z-80 entry. I like the jump-into-the-middle trick to avoid the test after each instruction.


The fastest way to memcpy by far on a Z80 is to point the stack register at your data and PUSH/POP the registers onto it. Many games used this (e.g. see 'Cobra', 1986)


I think this is true for memcpy from a constant to a target: you can use "ld rr,nn" to load an immediate constant, then push it to the destination. That is 21 cycles for two bytes, or 10.5 cycles per byte. But it's not the fastest for a generic copy from one memory area to another, unless I'm missing something:

     ld d,(hl)    7 cycles
     inc hl       6 cycles
     ld e,(hl)    7 cycles
     inc hl       6 cycles
     push de      11
18.5 cycles per byte, vs. 16 cycles per byte using ldi


Z80 has lots of registers; you POP from one stack, fill up as many registers as you can, then switch to the other stack and PUSH them all back. Use EXX etc to fill up the alternate registers too.
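A back-of-envelope estimate of that approach, as a sketch under assumptions: POP rr is 10 cycles, PUSH rr is 11, EXX is 4, and each chunk switches the stack pointer twice with LD SP,nn at 10 cycles each (one way of doing it; real code varies). Loop overhead and the need to keep interrupts off are ignored.

```python
POP_RR, PUSH_RR = 10, 11  # T-states per 16-bit register pair
EXX = 4                   # swap BC/DE/HL with the alternate set
LD_SP_NN = 10             # assumed cost of repointing SP, twice per chunk


def stack_copy_cycles_per_byte(pairs: int, exx_swaps: int = 0) -> float:
    """Estimated cycles per byte when copying `pairs` register pairs per chunk."""
    if pairs < 1:
        raise ValueError("need at least one register pair")
    cycles = pairs * (POP_RR + PUSH_RR) + 2 * LD_SP_NN + exx_swaps * EXX
    return cycles / (2 * pairs)


one_pair = stack_copy_cycles_per_byte(1)       # just DE, say
four_pairs = stack_copy_cycles_per_byte(4)     # AF, BC, DE, HL
seven_pairs = stack_copy_cycles_per_byte(7, 2) # plus BC', DE', HL' via EXX
```

With one pair the stack switching dominates and LDI's 16 cycles per byte wins; with four or more pairs per chunk the estimate drops to about 13 cycles per byte or less.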


How does that interact with the NMI on the ZX Spectrum? I'm not a ZX programmer, but I've read that the ZX has an NMI that fires and can't be reprogrammed.


You can't use it if the system has an NMI because it could push a return address on the stack at any time (unless there's a way to mask the "non-maskable", perhaps by disabling all devices that could generate NMIs). Luckily there was no NMI on the basic Spectrum. I believe it was only used by those game copying dongles where you would press a button to stop the system and snapshot memory.


I think you can speed up the 6502 version by moving it into the zero page and not using indexed mode. Of course, zero page memory is even more restricted than the 64kB you have in total, but that might be worth it in some cases.

Even if this isn’t in the zero page, it surprises me that those expensive x indirect loads and stores are cheaper than having to increase the source and destination addresses often, but that probably is because my 6502 is very rusty.

Also: I stared at that code for a while, but couldn’t figure out why that jmp inner is there. If you move the 3 instructions starting at lastpage to the end, it can be removed, can’t it? (I didn’t check, but there may be further code shuffles that do not decrease the number of instructions, but do decrease the number of branches and with it, the number of cycles spent)


I knew 6502 people would improve it :-)

>moving it into the zero page and not using indexed mode

I'm not sure I follow.. you mean restrict the memcpy to only within the zero-page?

>jmp inner..

Yeah, it doesn't matter too much, as it's not in the inner loop. You could also move the start of the code to right after the rts.


No, moving the code into the zero page. See https://retrocomputing.stackexchange.com/a/92 for what the Basic in the Commodore 64 does.

Increasing an address is faster in the zero page, and self-modifying code is faster. That combines the two.


Oh I see:

             inc $c9        5 cycles
    00c8:    lda $xxxx      4 cycles
vs.

         lda $xxxx,x    4
         inx            2
Mine is still faster...


Luckily, I said my 6502 is rusty :-). You can win a cycle increasing the high byte of pointers, but that probably isn’t worth the zero page memory in most cases.


The 6809 could move its zero page around!


And stack too IIRC.


That's correct. The reason for this is that they foresaw using the CPU for multi-tasking with each application using its own code, data, stack and zero page (scratch) data. That's also why it has relocatable code. A small multi-tasking OS called OS-9 was written and released for it:

https://en.wikipedia.org/wiki/OS-9


That's really interesting. Shows that optimising for each CPU presented a different challenge each time.

I remember hearing Steve Furber say that they realised that memory bandwidth was the biggest constraint in 8/16 bit CPUs so the ARM1/2 really focused on this and it shows (in fact I think they approached Intel with the idea of commissioning an 8086 with more bandwidth but were turned away - funny how things work out).


Yes, also no contention for video memory access on the BBC.


The Spectrum also had contention for video memory access; the memory map was:

0->16K: ROM

16K->32K: "Slower" RAM (ULA/Video has preference so may delay CPU)

32K->64K: "Faster" RAM (Same RAM, but nothing other than CPU accesses it)
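The map above can be written as a small lookup. This is only a sketch of the three ranges as listed for the 48K machine; the function name and region labels are mine.

```python
def spectrum_region(address: int) -> str:
    """Classify a 16-bit address according to the memory map listed above."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if address < 0x4000:   # 0 to 16K
        return "ROM"
    if address < 0x8000:   # 16K to 32K: ULA/video has priority here
        return "contended RAM"
    return "uncontended RAM"  # 32K to 64K: only the CPU touches it


# Time-critical routines would be placed at 0x8000 or above.
print(spectrum_region(0x8000))
```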


Yes - that's what I meant - sorry if not clear. The BBC ran at full speed all the time (except when interfacing with 1 MHz peripherals). The Spectrum's clock speed was higher but there was contention for video RAM. Also, the 6502 is traditionally regarded as having greater throughput for the same clock speed (something to do with the width of the ALU and the complexity of instruction decoding?)


From my experience (lots of Z80, much less 6502, but still some), it's mostly the indirect addressing modes. The Z80 had lousy IX/IY indexed access, and direct HL/DE/BC access. The 6502 had great indirect access through X and Y.

So when manipulating memory (which is most of the time), the Z80 often needed 3-4 instructions for what the 6502 could do in one. The Z80 had more registers and some 16-bit arithmetic.


I attended lectures from Steve Vickers on Mathematical Structures in Computer Science, which is to date the hardest course or study I have ever done. I mean, I have studied advanced logic, pure maths, electronic circuit design and those were all a walk in the park compared to this. If you want to get a taste of how difficult and abstract the topic is, have a look at some of his papers, eg: https://www.cambridge.org/core/journals/journal-of-symbolic-... or https://scholar.google.com/scholar_lookup?title=Topical+cate...

Sadly I didn't get him to sign my original copy of the ZX81 Manual :-(


Re: that first one, I have no idea what I just read, but it conjures back to mind a philosophical pondering I had once, on the intended formal-relational-algebra meaning of a record not being present in an RDBMS table.

I.e. "Does non-presence of an asserted relation, mean the relation is instead refuted (i.e. is this table a total membership-function for a set whose members are its present rows)? Or does non-presence of an assertion of a relation mean that this system just hasn't been informed of that assertion, and is in a default state of not-knowing the truth value of the relation (i.e. is this table a partial membership function, defined only on asserted relations)?"

Or, to put that another way: if you were to create an RDBMS that "correctly" reflects formal relational algebra, then should a search query looking for P(X), which you haven't asserted to be true, be "satisfied" to say -P(X); or should it be "unsatisfied" / non-halting, because whatever answer it would give could potentially be wrong in the face of later learning?
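The two readings are usually called the closed-world and open-world assumptions. A toy sketch of the difference, with an invented fact set and function names; the "unsatisfied / non-halting" case is modelled here simply as returning None.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]

# Hypothetical table contents: the only relation that has been asserted.
asserted: Set[Fact] = {("parent", "alice", "bob")}


def closed_world(fact: Fact) -> bool:
    """Absence from the table is treated as refutation."""
    return fact in asserted


def open_world(fact: Fact) -> Optional[bool]:
    """Absence from the table means the truth value is unknown (None)."""
    return True if fact in asserted else None


missing = ("parent", "bob", "carol")
print(closed_world(missing), open_world(missing))
```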

...does that question have something to do with the concept of "open" vs "closed" sublocales, or am I way off?


If it's any consolation, even after doing the course I don't know either.


I am now retired and have kept only 4 or 5 technical books. This is one of them. The elegance of this code is in the way it fulfills its role and fits in the space available.


I just looked at my bookshelf and found it standing next to "Spectrum machine language for the absolute beginner" edited by William Tang, also by Melbourne House Publishers. Good memories...


> Spectrum BASIC was very slow especially compared to BBC Basic.

Yes. The commercial games were all assembly anyway. The BASIC was just to get the idea what is possible, assembly to achieve the speed.

> a stack based VM for floating point operation

Of course, even the Intel 8087 FPU chip, and all FPU operations in x86 before the introduction of SSE2, were based on stack-based code and operations. Stack-based calculations indeed allow amazing code density.


Not quite all commercial games used assembly. I remember playing and modifying this game, which was written in BASIC:

https://en.m.wikipedia.org/wiki/Millionaire_(video_game)


I only managed to grasp Z80 on the Speccy after getting hold of that book at the local library. Unfortunately, by then I was already getting into 8086 Assembly, so it felt like a lost opportunity.

Using hexadecimal monitors wasn't that appealing to me, especially trying to find out why a checksum row wasn't correct. Much worse than finding errors in BASIC listings.


The original book presenting the disassembly was:

https://www.abebooks.co.uk/9780861611164/Complete-Spectrum-R...

by Dr Ian Logan and Dr Frank O'Hara, 1983

published by Melbourne House; here is a scan of their catalog from that time:

http://www.ourdigitalheritage.org/archive/playitagain/wp-con...

I see in the changelog of this project:

https://speccy.xyz/rom/reference/changelog.html

"20160709 The disassembly is now 'complete'.

- Added annotations from The Complete Spectrum ROM Disassembly by Dr Ian Logan and Dr Frank O'Hara"

The interesting aspect of this online version is its relation to

https://skoolkit.ca/

https://skoolkit.ca/docs/skoolkit/whatis.html


I know that Melbourne House was a (book) publisher, but mostly I think of them as a publisher of software:

https://www.filfre.net/2012/11/the-hobbit/


Ha! I couldn’t get this one in the local store in 1984, so I had to make do with [0], which was inferior although still very helpful. When I sort through my boxes in storage I suspect I’ll find it.

[0] https://www.computinghistory.org.uk/det/41998/The-Spectrum-M...

edit: I originally wrote that the author was Rodney Zacks but was mistaken -- googling found that it's by Richard Ross-Langley


Rodney Zaks wrote the definitive guide to the Z80: https://en.wikipedia.org/wiki/Programming_the_Z80


I still have that book, 30 years later. I've been referring to it recently, while using an Arduino to drive a Z80 chip (pretending to be I/O ports, and RAM).


I read this book before I learned to drive... it taught me more about computer basics (e.g. how the address and data bus work) than University did many years later!


I’ve just picked up a copy of “The ZX Spectrum ULA: How to design a microcomputer” by Chris Smith, which makes a good companion read to this, though with the focus (obviously) being on the ULA.

http://www.zxdesign.info/book/


I did something like this with a friend for the TRS80 Color Computer / Dragon 32, we found some really neat stuff in there, for instance how to enable the upper 32 K RAM. Those were interesting times in the sense that it was possible to know the function of every instruction present in the memory of your computer.


My favourite part of Spectrum basic was that everything was an expression.

Go to 23*N


Are there no copyright problems with this?


Who is going to try and enforce the copyright in 2020? BSkyB?


Good times indeed !

Off topic. Little quiz:

  RANDOMIZE USR 0
What for ?


OK, here's one (for the ZX81). Why did I choose ports 16509 and 16514 for the libvirt daemon (https://libvirt.org/remote.html)?


10 REM TAN

20 PRINT USR 16514

16514


Good one. TAN (the keyword, not the 3 letters) is hex C9 which is also RET in Z80 machine code. The address of the first byte after the keyword in the first statement in a ZX81 program is 16514. USR 16514 calls this address, which immediately returns. The PRINT statement then prints the result of the USR expression, which is the contents of the BC register on return, which I guess happens to contain the start address (since not otherwise set) as a side effect of the implementation.

16509 is the address of the start of the program in memory. Bytes 16509-16510 are the line number (2 bytes, stored most significant byte first). Bytes 16511-16512 are the length of the line. Byte 16513 is the command keyword (REM).
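The address arithmetic can be checked with a few lines. This sketch uses only the figures given above (program start 16509, a two-byte line number, a two-byte length, then the keyword byte); the constant and function names are mine.

```python
PROGRAM_START = 16509   # first byte of the first program line
LINE_NUMBER_BYTES = 2
LINE_LENGTH_BYTES = 2
KEYWORD_BYTES = 1       # the REM token

TAN_TOKEN = 0xC9        # ZX81 keyword code for TAN, as stated above
Z80_RET = 0xC9          # Z80 opcode for RET


def first_line_layout(start: int = PROGRAM_START) -> dict:
    """Addresses of each field in the first line of a ZX81 program."""
    line_number_at = start
    length_at = line_number_at + LINE_NUMBER_BYTES
    keyword_at = length_at + LINE_LENGTH_BYTES
    body_at = keyword_at + KEYWORD_BYTES  # where the TAN byte sits
    return {
        "line_number": line_number_at,
        "length": length_at,
        "keyword": keyword_at,
        "body": body_at,
    }


layout = first_line_layout()
print(layout["body"])  # the address passed to USR in line 20
```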


After thirty odd years I still have these numbers encoded into a handful of neurons somewhere.

Thank you for getting me to dig them out and great that they live on in libvirt!

ps The memory mapping reminded me that ZX81 managed to squeeze a 32 x 24 screen into a 1k total RAM. I think someone managed to write a chess program with those restrictions - but only using 8 x 8 or so of the screen.


Indeed, 1K chess by David Horne: https://en.wikipedia.org/wiki/1K_ZX_Chess

I had this chess game on tape, and I've often wondered if the two are related, but this one needed 16K: http://www.zx81stuff.org.uk/zx81/tape/ZXChessII(Black)


Not only you!


Shortest number of keypresses for a system reset?


bingo !


I remember trying to write machine code on my Speccy and it seemed that 99% of attempted runs resulted in a system reset. Always assumed that the professional games writers were using a CP/M machine or something similar with proper debugging support!


Indeed, as you can find out from magazines like RetroGaming, many were using 16 or 32 bit machines and then deploying to the Spectrum.


Ha, I thought that must be true at the time. It convinced me not to persevere. Good to hear I was right.


If I remember correctly, most developers could only afford such systems after slowly establishing themselves, and they weren't cheap in '80s money.

So persevering with what the 8-bit systems offered was a must anyway.


It would be interesting to recreate the full 1980s Spectrum development experience. Emulating a 1980s 16 bit machine on a modern PC, which runs a development system running, presumably, a Spectrum emulator!



Thanks - looks fascinating!


Bonus points. The difference from NEW being?


RAMTOP!



