Regarding the 6502 instruction set linked in the article, please mind that this list doesn't cover any of the "illegal opcodes", which matters for the VCS, AKA Atari 2600, as these were/are often used to optimize the instruction count of game kernels. (Author of that page here.)
Not sure if this approach would work for CPUs with bigger instruction sets or more complex instructions, e.g. the Game Boy CPU's "DAA" instruction [1] is notorious for being tricky to implement when writing a GB emulator.
I guess tackling the tedious/easy to implement ones with this code generation method would be helpful, leaving the more complicated ones to be tackled manually though!
I'm using a code generation approach both for 6502 and Z80, not as extreme and elegant as demonstrated here though.
Instead of a pure data description I have Python scripts which generate C source code. The 6502 is perfect for code generation because its instructions are very uniform, and the "interesting" part of each instruction is the addressing mode, which always runs the same sequence of operations ahead of the actual instruction-specific "payload". The Z80 instruction set has many more special cases, but it can be decoded "algorithmically" as well, see here:
My Z80 emulator basically implements this "recipe" in Python, and generates a huge "unrolled" switch-case statement with one case-branch per instruction (ok, not quite: the CB prefix instruction range is still decoded algorithmically to reduce the resulting binary code size a bit).
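To make the "algorithmic decoding" idea concrete, here is a minimal sketch in Python. It assumes the commonly documented split of a Z80 opcode byte into the bit fields x (bits 7-6), y (bits 5-3) and z (bits 2-0), and only covers the two regular quadrants (`LD r,r'` and the 8-bit ALU group). The names `decode` and `gen_cases` are made up for illustration, and the emitted C case bodies are left empty; this is not the actual generator described above.

```python
# Register and ALU tables indexed by the 3-bit fields of the opcode.
R = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]
ALU = ["ADD A,", "ADC A,", "SUB", "SBC A,", "AND", "XOR", "OR", "CP"]

def decode(opcode):
    """Return a mnemonic for opcodes 0x40..0xBF, or None otherwise."""
    x = opcode >> 6          # bits 7-6: instruction quadrant
    y = (opcode >> 3) & 7    # bits 5-3: destination register / ALU op
    z = opcode & 7           # bits 2-0: source register
    if x == 1:
        # LD (HL),(HL) does not exist; that slot is HALT.
        if y == 6 and z == 6:
            return "HALT"
        return f"LD {R[y]},{R[z]}"
    if x == 2:
        return f"{ALU[y]} {R[z]}"
    return None              # other quadrants are not handled in this sketch

def gen_cases():
    """Emit one (empty) C case-branch per opcode in the two regular quadrants."""
    return [f"case 0x{op:02X}: /* {decode(op)} */ break;"
            for op in range(0x40, 0xC0)]

if __name__ == "__main__":
    print("\n".join(gen_cases()[:4]))
```

A real generator would fill each case body with the machine cycles for that instruction instead of just a comment.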
Complex instructions like DAA are still essentially hand-written C functions though; the code generation mainly helps with the "mundane" parts of an instruction, like opcode fetch and regular memory load/store machine cycles.
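For a feel of why DAA tends to stay hand-written, here is a rough sketch of the Game Boy flavour of the instruction, written in Python for brevity rather than C. It follows the commonly cited description of the behaviour (adjust by 0x06 and/or 0x60 depending on the N, H and C flags, then clear H); the function name `daa` and the tuple it returns are assumptions for illustration, and anything like this should be checked against test ROMs before being trusted. The Z80's DAA sets its flags differently and is not covered here.

```python
def daa(a, n, h, c):
    """Decimal-adjust accumulator `a` after a BCD add (n=False) or subtract (n=True).

    a: 8-bit accumulator value; n, h, c: subtract, half-carry and carry flags.
    Returns (new_a, z_flag, c_flag). N is left unchanged and H is cleared
    by the caller in this sketch.
    """
    if not n:
        # After an addition: fix up the high digit first, then the low digit.
        if c or a > 0x99:
            a += 0x60
            c = True
        if h or (a & 0x0F) > 0x09:
            a += 0x06
    else:
        # After a subtraction: only the incoming flags decide the correction.
        if c:
            a -= 0x60
        if h:
            a -= 0x06
    a &= 0xFF
    return a, a == 0, c

if __name__ == "__main__":
    # 0x15 + 0x27 = 0x3C in binary; DAA turns it into BCD 42.
    print(daa(0x3C, n=False, h=False, c=False))
```

The flag-dependent branching is exactly the kind of logic that does not fall out of a uniform instruction table.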
One nice side effect of using code generation is that it is very easy to create variations of the emulator. For instance I created a cycle-stepped version of my 6502 emulator (versus the previous instruction-stepped version) with surprisingly few changes to the code-generation script.
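As a toy illustration of the "addressing mode prologue plus payload" idea for the 6502, here is a minimal Python sketch that emits a C switch statement. The C-side names (`read`, `set_nz`, `addr`, `pc`, `a`) and the Python names (`ADDR_MODES`, `PAYLOADS`, `OPCODES`, `gen_switch`) are hypothetical, and only a handful of opcodes are listed; it is not the actual script described above.

```python
# C statements that compute the effective address for each addressing mode.
ADDR_MODES = {
    "imm": ["addr = pc++;"],
    "zp":  ["addr = read(pc++);"],
    "abs": ["addr = read(pc++);", "addr |= read(pc++) << 8;"],
}

# Instruction-specific "payload" that runs once the address is known.
PAYLOADS = {
    "LDA": ["a = read(addr);", "set_nz(a);"],
    "ORA": ["a |= read(addr);", "set_nz(a);"],
}

# opcode -> (mnemonic, addressing mode); a small subset of the 6502 table.
OPCODES = {
    0xA9: ("LDA", "imm"), 0xA5: ("LDA", "zp"), 0xAD: ("LDA", "abs"),
    0x09: ("ORA", "imm"), 0x05: ("ORA", "zp"), 0x0D: ("ORA", "abs"),
}

def gen_switch():
    """Return C source for a switch with one case-branch per listed opcode."""
    out = ["switch (opcode) {"]
    for op, (mnem, mode) in sorted(OPCODES.items()):
        out.append(f"    case 0x{op:02X}: /* {mnem} {mode} */")
        # Shared addressing-mode prologue first, then the payload.
        for stmt in ADDR_MODES[mode] + PAYLOADS[mnem]:
            out.append(f"        {stmt}")
        out.append("        break;")
    out.append("}")
    return "\n".join(out)

if __name__ == "__main__":
    print(gen_switch())
```

Because every case is assembled from the same tables, a variation of the emulator would mostly mean changing how the statements are emitted (for instance, one statement per cycle) rather than touching each instruction by hand.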
PPS: an interesting approach (which I haven't tried) for 6502 emulation would be to use the 6502's decode ROM (aka PLA) as the "base-data" for code generation.
I personally consider the approach very solid from an engineering perspective; however, it's definitely a hell of a lot of work to implement. With more complex ISAs, e.g. the Sharp LR35902 (the Game Boy Classic's), it becomes a very demanding exercise in finding the right level of abstraction: lowering the level too much will make the generator intricate, while raising it too much will make the generator too difficult to recycle for other architectures.
I did something very similar, going as far as automatically generating the test suites (note that this is a WIP):
The metadata itself is generated by another generator (!); the required code is then added manually, alongside the autogenerated code.
I think the 6502 is a very good target for this type of work, as (AFAIK) it's very common in commercial contexts, so the documentation is more rigorous. Since the GBC CPU was used just for a now-obsolete console, the information sources are more or less hacked together.
This really should be referred to as the Sharp SM83, as original Sharp documentation has now been found for the microcontroller that was used in the Gameboy's SoC.
Curious, for NES emulation, the most accurate emulators also simulate minor timing inaccuracies to allow for certain hacks to take place (if memory serves).
How is this handled in such code-generation projects?