
Porting Sweet 16 (2004) - potomak
http://www.6502.org/source/interpreters/sweet16.htm
======
rwmj
The Sinclair ZX81 (Z80-based) could do 16 bit arithmetic fine but it had a
similar interpreted mechanism for floating point, called the Calculator Stack.

You used the single-byte Z80 instruction RST 28h followed by opcodes in the
special interpreted language, with opcode 34h meaning exit back to Z80 code.

The interpreter opcodes were quite powerful, including trigonometric functions,
logarithms, string parsing and memory operations. As the name "Calculator
Stack" implies, the operands were stored on a stack in memory (persisting
between invocations of the interpreter). A downside, however, was that it was
very slow: there were visible pauses during certain arithmetic operations.
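
To make the mechanism concrete, here is a toy Python model of a byte-coded interpreter whose operand stack persists between invocations. Only the exit opcode 34h comes from the description above; the other opcode values, the names, and the use of Python floats are assumptions made for this sketch and do not reflect the real ROM's encoding.

```python
# Toy model of a byte-coded calculator with a persistent operand stack.
# Only END_CALC = 0x34 comes from the comment above; the other opcode
# values are invented for this sketch and are not the ZX81 ROM's encoding.
END_CALC = 0x34
OP_ADD = 0x01  # hypothetical
OP_MUL = 0x02  # hypothetical
OP_DUP = 0x03  # hypothetical

calc_stack = []  # lives outside the interpreter, so it persists between calls


def run_calc(code, pc=0):
    """Interpret opcodes from code[pc:] until END_CALC; return the next pc."""
    while True:
        op = code[pc]
        pc += 1
        if op == END_CALC:
            return pc  # native code would resume at this offset
        if op == OP_ADD:
            b = calc_stack.pop()
            a = calc_stack.pop()
            calc_stack.append(a + b)
        elif op == OP_MUL:
            b = calc_stack.pop()
            a = calc_stack.pop()
            calc_stack.append(a * b)
        elif op == OP_DUP:
            calc_stack.append(calc_stack[-1])
        else:
            raise ValueError(f"unknown opcode {op:#04x}")


calc_stack.extend([2.0, 3.0])
run_calc(bytes([OP_ADD, END_CALC]))          # 2.0 + 3.0 is left on the stack
run_calc(bytes([OP_DUP, OP_MUL, END_CALC]))  # second call squares that value
```

The second call starts from whatever the first call left on the stack, which is the "persisting between invocations" behaviour described above.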

(More here:
[http://www.users.waitrose.com/~thunor/mmcoyzx81/chapter17.ht...](http://www.users.waitrose.com/~thunor/mmcoyzx81/chapter17.html)
)

------
tom_
Regarding the opcode dispatch: setting up the RTS in this way is quite
expensive, and (if you've got the room) you could be better off assembling a
little thunk somewhere in memory. 4C 00 >SET (JMP >SET*256). You'd do this on
startup.

A JMP to this thunk costs 3 cycles, and the JMP in the thunk costs 3 cycles,
so that buys you nothing compared to the RTS. And the STx to set up the low
byte takes up 3 cycles (zero page) or 4 cycles (elsewhere), which is the same
or worse than the PHA. But because the high byte is always set up, you save
the 5 cycles spent setting that up.
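
The arithmetic above can be tallied in a few lines of Python. This is a rough sketch that counts only the instructions that differ between the two approaches (the table lookup for the low byte is common to both), and it assumes the high byte is pushed with an immediate load followed by PHA. The cycle counts are the standard NMOS 6502 figures; the dictionary and function names are made up for the example.

```python
# Standard NMOS 6502 cycle counts for the instructions involved.
CYCLES = {
    "LDA #": 2,    # load immediate
    "PHA": 3,      # push accumulator
    "RTS": 6,      # return from subroutine
    "JMP abs": 3,  # absolute jump
    "STA zp": 3,   # store to zero page
    "STA abs": 4,  # store to absolute address
}


def total(*ops):
    """Sum the cycle counts of a sequence of instructions."""
    return sum(CYCLES[op] for op in ops)


# RTS dispatch: push high byte, push low byte, then RTS.
rts_dispatch = total("LDA #", "PHA", "PHA", "RTS")

# Thunk dispatch: store the low byte into the thunk's operand,
# JMP to the thunk, and let the thunk's own JMP reach the routine.
thunk_zp = total("STA zp", "JMP abs", "JMP abs")
thunk_abs = total("STA abs", "JMP abs", "JMP abs")

print(rts_dispatch, thunk_zp, thunk_abs)
```

With the thunk in zero page the difference is the 5 cycles that were spent on the high byte; with the thunk elsewhere it shrinks to 4.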

(If you're running from RAM, you don't even need the thunk.)

(Also: the opcode dispatch's EOR trick is space-efficient, but takes an extra
cycle - and one fewer byte, I won't deny - compared to doing a TAY after
fetching the byte, then a TYA:AND $F0 later. That sequence takes 6 cycles,
whereas the LSR:EOR (R15L),Y sequence takes 7 or 8.)
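
For anyone wondering why the EOR recovers the opcode at all, the following Python sketch follows the accumulator through the sequence and checks it against a plain AND $F0 for every possible instruction byte. It assumes the dispatch runs AND #$0F, ASL, TAX, LSR, EOR (R15L),Y in that order, which is my reading of the linked listing; the function names are made up.

```python
def opcode_via_eor(instr):
    """Follow the accumulator through AND #$0F, ASL, (TAX), LSR, EOR instr."""
    a = instr & 0x0F     # AND #$0F: register number
    a = (a << 1) & 0xFF  # ASL: register number * 2 (TAX copies this to X)
    a >>= 1              # LSR: back to the register number
    return a ^ instr     # EOR: low nibble cancels, high nibble remains


def opcode_via_mask(instr):
    """The TAY ... TYA : AND #$F0 alternative, modelled as a plain mask."""
    return instr & 0xF0


# Instruction bytes for which the two approaches disagree.
mismatches = [b for b in range(256) if opcode_via_eor(b) != opcode_via_mask(b)]
print(mismatches)
```

Both routes leave the opcode in the high nibble of the accumulator, so the choice between them is purely a size versus speed trade-off.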

~~~
kazinator
What do you think about using 6C, indirect jump? Instead of a little thunk,
some 16 bits of data can be set aside in a fixed location. We mutate only the
low order address and then do an indirect jump through it.

    
    
       LDA  OPTBL-2,Y
       STA  OPADDR
       JMP  (OPADDR)
    

The contents of OPADDR+1 is initialized once on entry into the interpreter. Or
perhaps statically.
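
A small Python model of that pointer, as a sketch only: the location of OPADDR and the page number are hypothetical values chosen for the example, and the "jump" is represented by returning the address the 6502 would read.

```python
mem = bytearray(0x10000)  # flat 64 KiB address space

OPADDR = 0x00F0       # hypothetical location of the 16-bit pointer
HANDLER_PAGE = 0xF7   # hypothetical high byte shared by all op routines


def init_interpreter():
    """Write the high byte once, on entry into the interpreter."""
    mem[OPADDR + 1] = HANDLER_PAGE


def dispatch_target(low_byte):
    """Model STA OPADDR followed by JMP (OPADDR)."""
    mem[OPADDR] = low_byte  # only the low byte is mutated
    # The indirect jump reads a little-endian 16-bit address.
    return mem[OPADDR] | (mem[OPADDR + 1] << 8)


init_interpreter()
targets = [hex(dispatch_target(lo)) for lo in (0x03, 0x80, 0xFE)]
print(targets)
```

Two details to watch on real hardware: RTS adds one to the address it pulls, so a table built for RTS dispatch would hold each entry address minus one and need adjusting for a JMP; and on the NMOS 6502 the pointer itself must not sit at $xxFF, because the indirect JMP then fetches its high byte from $xx00.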

Another thing would be self-modifying code (if we can forgo ROM-ming this,
which Woz couldn't): the interpreter mutates the operand of an immediate JMP
instruction to set up the address. That instruction then simply follows; there
is no need to branch to it. Same as your thunk, but placed inline.

Ah, the first machine language program I wrote was on the 6502 and used self-
modifying code to march through the graphics buffer. Indexed addressing modes
were the next chapter in the Rodnay Zaks book.

~~~
tom_
Yes, 5 cycles, good point... that's much better. My excuse for not thinking of
this very obvious improvement is that I never wrote any speed-sensitive ROM
code ;)

I doubt I ever used JMP indirect. For this sort of thing running from RAM, I'd
typically use self-modifying code and a JMP absolute, which is where the idea
of having a little thunk came from.

------
kazinator
> FOLLOWING CODE MUST BE CONTAINED ON A SINGLE PAGE!

(Page == 256-byte block.) This is because the op dispatch table stores only
the low-order byte of each routine's address; the high-order byte is fixed in
the interpreter. I think the last function could spill past the end of the
page; its starting address just has to be in the page.
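
The constraint is easy to state as a check. In this Python sketch the routine addresses and lengths are hypothetical; the point is only that the entry addresses must share a high byte, while the body of the last routine may run into the next page.

```python
def same_page_entries(entry_addresses):
    """True if every routine's entry address has the same high byte."""
    return len({addr >> 8 for addr in entry_addresses}) == 1


# Hypothetical (entry address, length in bytes) pairs for three routines.
routines = [(0xF703, 0x20), (0xF780, 0x30), (0xF7F8, 0x10)]

entries = [start for start, _ in routines]
entries_ok = same_page_entries(entries)

# The last routine starts in page $F7 but its final byte is in page $F8.
last_start, last_len = routines[-1]
last_spills = (last_start + last_len - 1) >> 8 != last_start >> 8

print(entries_ok, last_spills)
```

Here the dispatch still works even though the last routine spills over, because only the entry addresses are ever combined with the fixed high byte.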

------
1over137
From the title and URL, I mistakenly thought this was about the Sweet16 Apple
IIgs emulator (for macOS) getting an update. That would have been nice,
because it's a 32-bit-only app and so won't work past macOS 10.14 Mojave. :(

