
High Performance Z80 Emulation - ingve
https://floooh.github.io/2017/12/10/z80-emu-evolution.html
======
danaliv
I've gone deep in the hole on Z80 emulation more than a few times as a result
of my all-time favorite video game (Phantasy Star on the Sega Master System).
I wrote some debugger emulators just to step through the game's code and
analyze it.

If you've never tried to write an emulator, I highly recommend it. In my
opinion it's an exercise on par with writing a compiler, both for how much
one can learn about the machines we work with and for developing a richer
appreciation for the craft of software.

------
mikepavone
> So currently I don’t see how JIT-ing could work with reasonable programming
> effort and performance for emulating entire 8-bit computer systems.

I don't know about the reasonable effort part, but I was able to get decent
performance with this approach in BlastEm. It takes about 33.5s to run zexdoc
on my laptop, which is faster than even the "instruction granularity" version
in the article (~37.5s) despite running on a slower machine (2.2GHz base clock
vs 2.8GHz base clock). It can pause both at the beginning of an instruction
and at any non-instruction fetch memory access. Unfortunately, it can't
currently accommodate wait-states on instruction fetch.

Fixing that is on my to-do list and I have a decent idea of how to do it
without killing performance. I'm also flirting with a rewrite that will
enable better performance generally. My current approach is pretty naive
(doesn't even do redundant flag calculation elimination), but my handling of
self-modifying code makes it hard to safely improve that without some fairly
major changes.

------
billforsternz
In the very early days of general purpose 16 bit computing there was a lot of
interest in emulating the Z80 and 8080 since the huge amount of legacy 8 bit
code was still very important. Unfortunately the emulations were inherently
around 10 times slower than native code since around 10 overhead instructions
(a jump table, save and restore flags, and a loop) were needed for each
emulated instruction.

I made my own emulator with no jump table or loop, and so no need to save and
restore flags. It comprised just three instructions.

    
    
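      lodsb        ; AL = next emulated opcode byte, SI advances
      mov ah,al    ; copy the opcode into AH, so AX = opcode * 0x0101
      jmp ax       ; jump to that opcode's handler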
    

These instructions were appended to the emulation code for each of the 256
opcodes, and those handlers were distributed sparsely through 64K of memory
at 0x0000, 0x0101, 0x0202, etc. (no such thing as a free lunch - spending 64K of RAM to
get some speed seemed like a daring decision at the time). So my emulator was
inherently only 4 times slower than native, and faster PC hardware made up the
difference much earlier for me than for people stuck with 10 times slower
emulators.
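
To make the address layout concrete, here is a minimal Python sketch of the
same dispatch idea. It is only an illustration, not the original 8086 code:
the names handler_offset, run and handlers are hypothetical, and a dict
lookup stands in for the real jump.

```python
# Illustrative model of the handler layout described above (not the original
# emulator). The handler for opcode N lives at offset N * 0x0101, which is
# what you get when the opcode byte is copied into both halves of AX.

def handler_offset(opcode: int) -> int:
    """Return the 16-bit jump target for an 8-bit opcode."""
    assert 0 <= opcode <= 0xFF
    return (opcode << 8) | opcode  # AH = AL = opcode


def run(program: bytes, handlers: dict) -> list:
    """Fetch-and-dispatch loop standing in for lodsb / mov ah,al / jmp ax."""
    trace = []
    pc = 0
    while pc < len(program):
        opcode = program[pc]             # lodsb: fetch the next byte ...
        pc += 1                          # ... and advance the pointer
        target = handler_offset(opcode)  # mov ah,al
        trace.append(handlers[target](opcode))  # jmp ax (modelled as a call)
    return trace


# Hypothetical handlers: each one just reports which opcode it was called for.
handlers = {handler_offset(op): (lambda op: f"op {op:#04x}") for op in range(256)}
```

Consecutive handlers end up 0x0101 (257) bytes apart, which is why the 256
entry points span the whole 64K segment.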

Of course these days it would probably mean the difference between running the
8 bit code 1000 times faster or 2000 times faster than native (something like
that). Rather than going as fast as possible, the challenge now is accurately
reproducing the original timing, as discussed in the article.

I'll quietly admit that the 8 bit software I was using extensively at the time
ran happily on an 8080, and I never did all the extra work required to get Z80
emulation.

------
Zardoz84
I did something similar on the Trillek virtual computer, but without the trick
of using a 64-bit int to store the bus state. Instead I used a somewhat
higher-level approximation, as the CPU is far simpler: there is no I/O address
space and there are no wait states. What I did was a tick-instruction hybrid
approach instead of a tick-instruction cycle hybrid.
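
For anyone unfamiliar with the bus-state trick mentioned above, a rough
Python sketch of the general idea follows. The bit positions and the names
(MREQ, RD, WR, tick and the helpers) are assumptions chosen for illustration,
not the article's actual layout. In C the pins value would be a single 64-bit
integer; Python ints are unbounded, so the width is not enforced here.

```python
# Assumed, illustrative pin layout packed into one integer:
ADDR_MASK = 0xFFFF              # bits 0..15: address bus
DATA_SHIFT = 16                 # bits 16..23: data bus
DATA_MASK = 0xFF << DATA_SHIFT
MREQ = 1 << 24                  # memory request (hypothetical bit position)
RD = 1 << 25                    # read strobe (hypothetical bit position)
WR = 1 << 26                    # write strobe (hypothetical bit position)


def set_addr(pins: int, addr: int) -> int:
    """Replace the address bits, leaving all other pins untouched."""
    return (pins & ~ADDR_MASK) | (addr & 0xFFFF)


def get_addr(pins: int) -> int:
    return pins & ADDR_MASK


def set_data(pins: int, data: int) -> int:
    """Replace the data bits, leaving all other pins untouched."""
    return (pins & ~DATA_MASK) | ((data & 0xFF) << DATA_SHIFT)


def get_data(pins: int) -> int:
    return (pins & DATA_MASK) >> DATA_SHIFT


def tick(pins: int, memory: bytearray) -> int:
    """System-side callback: inspect the pins and service a memory access."""
    if pins & MREQ:
        addr = get_addr(pins)
        if pins & RD:
            pins = set_data(pins, memory[addr])  # memory drives the data bus
        elif pins & WR:
            memory[addr] = get_data(pins)        # CPU drives the data bus
    return pins
```

The CPU side would set the address and control bits, call tick, and read the
data bits back out of the returned value.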

------
apple4ever
That's so awesome. I love the Z80 because it's so simple yet also complex.

Like others, my dream is writing a Z80 emulator, even though others exist,
just to do it. I'm going to write mine in Objective-C (my favorite language)
to provide an interface to emulators (like a TI-8X one).

Reading this just gets me excited.

------
magnat
Would it be possible to emulate whole Z80 or CPC platform at a
transistor/circuit level (as in Visual 6502 [1]) in real time on a modern PC
with a GPGPU?

[1] http://www.visual6502.org/JSSim/

~~~
speps
If I recall the Visual 6502 talk correctly, it's just visual, so basically
each simulation (it's not emulation) step can be a GPU compute dispatch.

------
rogual
I'm sorry but I found the constant comma splices genuinely distracting.

~~~
kagebe
A common mistake for us Germans. Especially if one hasn't worked in an
English-"speaking" environment (US/UK/..., international company or publishing
in English academic proceedings/journals) and/or learned how to write
idiomatic English. Abusing and over-using relative clauses is a favorite
speaking and writing technique in these parts. ;)

~~~
rogual
It's common among native speakers too. You might even say it's stopped being
'wrong' and started to become just 'informal'. I just find it really jarring,
unless the clauses are really short.

I hope my comment didn't come across as anything other than the constructive
criticism it was intended as.

~~~
flohofwoe
No worries! The blog post came out much longer than I intended. I was writing
it in one sitting and when I was finally done I didn't feel like going over
and editing it. If I had more time, I would have written a shorter blog post
(and sentences) ;)

