It would be a little adventure to try to figure out how it was conceived: although it looks like something a superoptimiser would produce, it was around before superoptimisation, and I think it was devised by a human, likely someone connected to the demoscene.
This was a well-known trick for 8080s and Z80s as far back as the 1970s. Back then there were a lot of assembly programmers, and tricks like this were stock in trade.
Reminds me of optimizing microcode back in uni... We went from the reference implementation of 134 microinstructions down to 61, and it was far more cycle-efficient as well (progressive decoding and a Huffman-encoded micro representation given sample programs). Since it was a contest with two extra-credit goals, people were pissed that our team captured both. It took about 30 hours to implement.
Microinstructions are like a CPU's firmware: they implement each external-facing macroinstruction as a sequence of simpler internal steps that coordinate the CPU's internal state.
http://dflund.se/~john_e/gems/gem003a.html