This is awesome. I still have a copy of the original book from 1983 on my bookshelf! This looks like a faithful reproduction with some added features to make it easier to use.
Looking forward to browsing some of the other code linked to from the site - e.g. Manic Miner which was a particular favourite.
A couple of points about the Spectrum ROM.
- Spectrum BASIC was very slow especially compared to BBC Basic. I think that this was partly due to the constraints of having to fit everything into 16k vs 32k for the BBC - so code was optimised for size rather than speed.
- It included a stack based VM for floating point operations which I think was again mainly to reduce the code size (one byte for each operation rather than three bytes for a subroutine call).
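As a rough illustration of the size argument, here is a toy byte-coded stack machine in Python. It is not the actual ROM calculator: the opcode numbers, and the assumption of a one-byte entry plus a one-byte end marker, are made up for the sketch.

```python
# Hypothetical opcode numbers; the real ROM calculator has its own table.
END, ADD, SUB, MUL, DIV = 0x00, 0x01, 0x02, 0x03, 0x04

BINARY_OPS = {
    ADD: lambda a, b: a + b,
    SUB: lambda a, b: a - b,
    MUL: lambda a, b: a * b,
    DIV: lambda a, b: a / b,
}


def run(program, stack):
    """Interpret one-byte opcodes against a stack of floats until END."""
    for opcode in program:
        if opcode == END:
            break
        b = stack.pop()  # top of stack is the right-hand operand
        a = stack.pop()
        stack.append(BINARY_OPS[opcode](a, b))
    return stack


def size_bytes(op_count):
    """Encoded size of op_count operations under each scheme.

    Assumes one byte to enter the interpreter and one END byte for the
    bytecode, and a three-byte CALL per operation for plain subroutines.
    """
    return {"bytecode": op_count + 2, "calls": 3 * op_count}
```

With `[2.0, 3.0, 4.0]` on the stack, `run(bytes([MUL, ADD, END]), ...)` multiplies the top two values and then adds the third. Under these assumptions the bytecode starts to win on size from the second operation onwards.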
Would seem appropriate to give credit to the main author of the code, Steve Vickers [1], who went on to found Jupiter Cantab, makers of the Jupiter Ace.
The Spectrum used the CPU and the ULA for everything, which made the machine cheaper. Meanwhile the BBC had dedicated hardware. As an example, a simple beep would tie up CPU cycles on the Spectrum, whereas the BBC Micro could define a sound envelope for something more than a beep and hand that off to the SN76489 sound chip.
The Z80 had plenty of features that the BBC micro's 6502 did not have, so the code should have been quicker with fewer CPU cycles used and fewer lines of code. But the lack of supporting hardware negated this gain.
BBC BASIC was also a more complete implementation of the language, so any code written in BASIC was more efficient just because you didn't have to use your own BASIC workarounds. Being able to drop into assembly language also helped; the Spectrum lacked this option to truly turbo charge your code.
The BBC used 16k for BASIC in ROM and had 16k for the OS. This enabled ROMs to be swapped out so you could run BCPL or another ROM, e.g. Fortran.
The Spectrum had BASIC and the OS in 16k with 48k available for RAM. The BBC could use more than 32k of RAM if installed, as per the later Master system that used bank switching.
I am not familiar with the code but I would be surprised if the BBC Basic ROM was anything less than a work of genius in concise code. Every byte counted in those days regardless of what home computer you were programming for.
BBC Basic was a work of genius (it was written by Sophie Wilson of ARM fame) but not necessarily of concise code. I remember looking at some of the assembly in the 1980s and it was clear that code size was secondary to speed - there was some significant loop unrolling for example.
Absolutely agree on the hardware point as Sinclair's real talent was squeezing as much out of basic hardware as possible even at the expense of performance (e.g. CPU driving the screen in ZX80/81).
As a commercial decision limiting the ROM to 16k was a very good move in the end for Sinclair. It allowed full colour with 42k of spare RAM whereas the BBC only had c24k after allowing for screen memory, so in the end the Spectrum could support much larger and richer games than the BBC and (later) the Electron.
On the Z80, LDIR is not as fast as unrolling the loop to produce a block of LDI instructions. You can jump into the middle of the block at count % blocksize to copy sizes which are not a round number. This was a fairly common game trick.
The reason for this is a bit obscure, but it comes from a peculiarity of the Z80's implementation of LDIR. For each iteration it actually decrements PC by 2 and reloads the instruction. This is apparently so that the CPU can still service interrupts during the loop. Since LDIR is an "extended" instruction it has to reread both opcode bytes every time, and the rewind costs 5 extra T-states (21 per byte copied instead of 16). LDI is also a two-byte extended instruction, but with no rewind each one takes only 16 T-states.
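To put rough numbers on the LDIR versus unrolled LDI trade-off, here is a small Python cost model. It uses the documented Z80 timings (21 T-states for a repeating LDIR iteration, 16 for the final one, 16 for a single LDI) and ignores the jump and loop-counter overhead around the unrolled block, so treat it as an estimate. The function names are made up for the sketch.

```python
LDIR_REPEAT_T = 21  # T-states for an LDIR iteration that loops back
LDIR_FINAL_T = 16   # T-states for the last LDIR iteration (BC reaches 0)
LDI_T = 16          # T-states for one LDI


def ldir_tstates(count):
    """T-states for LDIR to copy `count` bytes (count >= 1)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return LDIR_REPEAT_T * (count - 1) + LDIR_FINAL_T


def unrolled_ldi_tstates(count):
    """T-states spent in the LDIs alone; loop and jump overhead is ignored."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return LDI_T * count


def ldis_to_skip(count, block_size):
    """How many LDIs to jump past on the first pass through the block.

    The first pass then copies count % block_size bytes (or a full block
    when count is an exact multiple), and every later pass copies block_size.
    """
    remainder = count % block_size
    return (block_size - remainder) % block_size
```

For a 256-byte copy this gives 5371 T-states for LDIR against 4096 for the LDIs, before the unrolled version's loop overhead is added back.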
The fastest way to memcpy by far on a Z80 is to point the stack register at your data and PUSH/POP the registers onto it. Many games used this (e.g. see 'Cobra', 1986)
I think this is true for memcpy from a constant to a target: you can use "ld rr,nn" to load an immediate constant and then push it to the destination, which is 21 cycles for two bytes, or 10.5 cycles per byte. But it is not the fastest for a generic copy from one memory area to another, unless I'm missing something.
Z80 has lots of registers; you POP from one stack, fill up as many registers as you can, then switch to the other stack and PUSH them all back. Use EXX etc to fill up the alternate registers too.
You can't use it if the system has an NMI because it could push a return address on the stack at any time (unless there's a way to mask the "non-maskable", perhaps by disabling all devices that could generate NMIs). Luckily there was no NMI on the basic Spectrum. I believe it was only used by those game copying dongles where you would press a button to stop the system and snapshot memory.
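The POP/PUSH copy described above can be modelled in Python to show why the destination stack pointer has to start past the end of each burst. This is a simulation sketch, not Z80 code: memory is a `bytearray`, the "registers" are a list, and the function names and the `regs` parameter are assumptions for illustration. It assumes an even length and non-overlapping source and destination.

```python
def pop16(mem, sp):
    """Model of Z80 POP: low byte at SP, high byte at SP+1, then SP += 2."""
    value = mem[sp] | (mem[sp + 1] << 8)
    return value, sp + 2


def push16(mem, sp, value):
    """Model of Z80 PUSH: SP -= 2, high byte to SP+1, low byte to SP."""
    sp -= 2
    mem[sp] = value & 0xFF
    mem[sp + 1] = value >> 8
    return sp


def stack_copy(mem, src, dst, length, regs=4):
    """Copy `length` bytes in bursts of up to `regs` 16-bit register pairs."""
    if length % 2:
        raise ValueError("length must be even")
    done = 0
    while done < length:
        words = min(regs, (length - done) // 2)
        sp = src + done                  # point the stack at the source
        burst = []
        for _ in range(words):
            value, sp = pop16(mem, sp)
            burst.append(value)
        sp = dst + done + 2 * words      # PUSH grows downward: start past the end
        for value in reversed(burst):    # last word popped is pushed first
            sp = push16(mem, sp, value)
        done += 2 * words
    return mem
```

On real hardware POP rr takes 10 T-states and PUSH rr takes 11, so the inner work is about 21 T-states per two bytes plus the cost of switching SP between bursts. Any interrupt arriving while SP points at the data would push a return address into it, which is the hazard raised above.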
I think you can speed up the 6502 version by moving it into the zero page and not using indexed mode. Of course, zero page memory is even more restricted than the 64kB you have in total, but that might be worth it in some cases.
Even if this isn’t in the zero page, it surprises me that those expensive x indirect loads and stores are cheaper than having to increment the source and destination addresses often, but that probably is because my 6502 is very rusty.
Also: I stared at that code for a while, but couldn’t figure out why that jmp immer is there. If you move the 3 instructions starting at lastpage to the end, it can be removed, can’t it? (I didn’t check, but there may be further code shuffles that do not decrease the number of instructions, but do decrease the number of branches and with it, the number of cycles spent)
Luckily, I said my 6502 is rusty :-). You can win a cycle increasing the high byte of pointers, but that probably isn’t worth the zero page memory in most cases.
That's correct. The reason for this is that they foresaw using the CPU for multi-tasking with each application using its own code, data, stack and zero page (scratch) data. That's also why it has relocatable code. A small multi-tasking OS called OS-9 was written and released for it:
That's really interesting. Shows that optimising for each CPU presented a different challenge each time.
I remember hearing Steve Furber say that they realised that memory bandwidth was the biggest constraint in 8/16 bit CPUs so the ARM1/2 really focused on this and it shows (in fact I think they approached Intel with the idea of commissioning an 8086 with more bandwidth but were turned away - funny how things work out).
Yes - that's what I meant - sorry if not clear. BBC ran at full speed all the time (except when interfacing with 1 MHz peripherals). Spectrum clock speed higher but contention for video RAM. Also traditionally the 6502 is regarded as having greater throughput for the same clock speed (something to do with the width of the ALU and the complexity of instruction decoding?)
From my experience (lots of Z80, much less 6502, but still some), it's mostly the indirect addressing modes - the Z80 had lousy IX/IY indexed access, and direct HL/DE/BC access. The 6502 had great indirect access through X and Y.
So when manipulating memory (which is .. most of the time), the Z80 often needed 3-4 instructions for what the 6502 could do in one. The Z80 had more registers and some 16-bit arithmetic.
I attended lectures from Steve Vickers on Mathematical Structures in Computer Science, which is to date the hardest course or study I have ever done. I mean, I have studied advanced logic, pure maths, electronic circuit design and those were all a walk in the park compared to this. If you want to get a taste of how difficult and abstract the topic is, have a look at some of his papers, eg: https://www.cambridge.org/core/journals/journal-of-symbolic-... or https://scholar.google.com/scholar_lookup?title=Topical+cate...
Sadly I didn't get him to sign my original copy of the ZX81 Manual :-(
Re: that first one, I have no idea what I just read, but it conjures back to mind a philosophical pondering I had once, on the intended formal-relational-algebra meaning of a record not being present in an RDBMS table.
I.e. "Does non-presence of an asserted relation, mean the relation is instead refuted (i.e. is this table a total membership-function for a set whose members are its present rows)? Or does non-presence of an assertion of a relation mean that this system just hasn't been informed of that assertion, and is in a default state of not-knowing the truth value of the relation (i.e. is this table a partial membership function, defined only on asserted relations)?"
Or, to put that another way: if you were to create an RDBMS that "correctly" reflects formal relational algebra, then should a search query looking for P(X), which you haven't asserted to be true, be "satisfied" to say -P(X); or should it be "unsatisfied" / non-halting, because whatever answer it would give could potentially be wrong in the face of later learning?
...does that question have something to do with the concept of "open" vs "closed" sublocales, or am I way off?
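The two readings of a missing row contrasted above are usually called the closed-world and open-world assumptions. A minimal Python sketch of the difference, using a made-up table of asserted `(relation, subject)` rows and hypothetical function names:

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def closed_world(rows, fact):
    """Table as a total membership function: an absent row is refuted."""
    return Truth.TRUE if fact in rows else Truth.FALSE


def open_world(rows, fact):
    """Table as a partial membership function: an absent row is not known."""
    return Truth.TRUE if fact in rows else Truth.UNKNOWN


# Hypothetical table contents, for illustration only.
rows = {("P", "a"), ("P", "b")}
```

Both functions agree on rows that are present. They differ only on `("P", "x")`, where the closed-world reading answers FALSE and the open-world reading answers UNKNOWN instead of committing to an answer that later assertions could contradict.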
I am now retired and have kept only 4 or 5 technical books. This is one of them. The elegance of this code is in the way it fulfills its role and fits in the space available.
I just looked at my bookshelf and found it standing next to "Spectrum machine language for the absolute beginner" edited by William Tang, also by Melbourne House Publishers. Good memories...
> Spectrum BASIC was very slow especially compared to BBC Basic.
Yes. The commercial games were all assembly anyway. The BASIC was just to get the idea what is possible, assembly to achieve the speed.
> a stack based VM for floating point operation
Of course, even the Intel 8087 FPU chip, and all FPU operations in x86 before the introduction of SSE2, were built on stack based code and operations. Stack based calculations indeed allow amazing code density.
I only managed to grasp Z80 on the Speccy after getting hold of that book from the local library; unfortunately by then I was already getting into 8086 Assembly, so it felt like a lost opportunity.
Using hexadecimal monitors wasn't that appealing to me, especially trying to find out why a checksum row wasn't correct. Much worse than finding errors in BASIC listings.
Ha! I couldn’t get this one in the local store in 1984, so I had to make do with [0], which was inferior although still very helpful. When I sort through my boxes in storage I suspect I’ll find it.
I have that book, even 30 years later. I've been referring to it recently, while using an Arduino to drive a Z80 chip (pretending to be I/O ports, and RAM).
I read this book before I learned to drive... it taught me more about computer basics (e.g. how the address and data bus work) than University did many years later!
I’ve just picked up a copy of “The ZX Spectrum ULA: How to design a microcomputer“ by Chris Smith, which makes a good companion read to this, though with the focus (obviously) being on the ULA.
I did something like this with a friend for the TRS80 Color Computer / Dragon 32, we found some really neat stuff in there, for instance how to enable the upper 32 K RAM. Those were interesting times in the sense that it was possible to know the function of every instruction present in the memory of your computer.
Good one. TAN (the keyword, not the 3 letters) is hex C9 which is also RET in Z80 machine code. The address of the first byte after the keyword in the first statement in a ZX81 program is 16514. USR 16514 calls this address, which immediately returns. The PRINT statement then prints the result of the USR expression, which is the contents of the BC register on return, which I guess happens to contain the start address (since not otherwise set) as a side effect of the implementation.
16509 is the address of the start of the program in memory. Bytes 16509-16510 are the line number (a 2-byte integer, stored most significant byte first, unusually for the Z80). Bytes 16511-16512 are the length of the line. Byte 16513 is the command keyword (REM).
After thirty odd years I still have these numbers encoded into a handful of neurons somewhere.
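The address arithmetic behind USR 16514 can be checked with a few lines of Python. The constant names are made up for the sketch; the values are the ones quoted in this thread.

```python
PROGRAM_START = 16509   # address of the first program line
LINE_NUMBER_BYTES = 2   # line number field
LINE_LENGTH_BYTES = 2   # line length field
KEYWORD_BYTES = 1       # the REM token itself

TAN_TOKEN = 0xC9        # ZX81 code for the TAN keyword
RET_OPCODE = 0xC9       # Z80 RET


def first_rem_body_address():
    """Address of the first byte after REM in the first program line."""
    return (PROGRAM_START + LINE_NUMBER_BYTES
            + LINE_LENGTH_BYTES + KEYWORD_BYTES)
```

The result is 16514, and since the TAN token and RET share the value 0xC9, a first line of `REM TAN` puts a one-byte "return immediately" routine at exactly that address.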
Thank you for getting me to dig them out and great that they live on in libvirt!
ps The memory mapping reminded me that the ZX81 managed to squeeze a 32 x 24 screen into 1k of total RAM. I think someone managed to write a chess program with those restrictions - but only using 8 x 8 or so of the screen.
I remember trying to write machine code on my Speccy and it seemed that 99% of attempted runs resulted in a system reset. Always assumed that the professional games writers were using a CP/M machine or something similar with proper debugging support!
It would be interesting to recreate the full 1980s Spectrum development experience. Emulating a 1980s 16 bit machine on a modern PC, which runs a development system running, presumably, a Spectrum emulator!
[1] https://web.archive.org/web/20110516082258/http://www.sincus...
edit: added Steve Vickers credit.