I got very excited when I saw the description of the video "Conversion of game into C++ with cicoparser and IDA disassembler". I thought "neat, a new decompiler".
But then I understood what CicoParser is doing: it translates machine instructions into C statements, e.g. when your binary contains an instruction like "mov 123,sp", the output will be a C source file with a statement "memory16(_ds,123)=_sp;". On the GitHub page, they say it is not a CPU emulator, but I would rather say it is a CPU emulator with AOT compilation of the binary.
If it were a CPU emulator, it would update all the flags every time it performs an ALU operation (I have seen this approach in one source-to-source compiler). Actually, there is not much you can do: if the instruction stores SP into DS:123, it is converted into a simple assignment *((WORD*)&memory[ds*16+123]) = sp. All ALU operations are calculated directly using the target instruction set, and the flags register is updated only when necessary. The memory is not emulated either; the code accesses the memory buffer directly (in the video there are just extra range checks, and even the *16 operation can be optimized away by replacing ds/es with memory pointers). The only thing that is emulated is the EGA adapter.
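To make the translation concrete, here is a minimal sketch of what such generated code could look like. It is not taken from the cicoparser sources: the Mem16 proxy, the two translated_* functions and the buffer size are assumptions for illustration, and only the statement memory16(_ds,123)=_sp; comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Flat buffer large enough for any seg*16+off address (assumed layout).
static uint8_t memory[0x110000];
static uint16_t _ds = 0, _sp = 0;

// Hypothetical proxy so that "memory16(seg, off) = value" compiles
// without unaligned pointer casts. Words are stored little-endian.
class Mem16 {
    uint8_t* p_;
public:
    explicit Mem16(uint8_t* p) : p_(p) {}
    Mem16& operator=(uint16_t v) {
        p_[0] = uint8_t(v & 0xFF);
        p_[1] = uint8_t(v >> 8);
        return *this;
    }
    operator uint16_t() const { return uint16_t(p_[0] | (p_[1] << 8)); }
};

inline Mem16 memory16(uint16_t seg, uint16_t off) {
    return Mem16(&memory[(uint32_t(seg) << 4) + off]);
}

// One translated instruction: store SP at DS:123.
inline void translated_store_sp() {
    memory16(_ds, 123) = _sp;
}

// "add ax, bx" followed by "jc": only the carry is needed afterwards,
// so only the carry is computed (no full flags register update).
inline bool translated_add_carry(uint16_t& ax, uint16_t bx) {
    uint32_t wide = uint32_t(ax) + bx;
    ax = uint16_t(wide);
    return wide > 0xFFFF;
}

// Small usage example for the two translated snippets.
inline void usageExample() {
    _ds = 0x1000;
    _sp = 0xBEEF;
    translated_store_sp();
    assert(uint16_t(memory16(_ds, 123)) == 0xBEEF);

    uint16_t ax = 0xFFFF;
    bool carry = translated_add_carry(ax, 1);
    assert(ax == 0 && carry);
}
```

The point of the sketch is that each instruction becomes ordinary host code operating on a plain byte buffer; there is no fetch/decode loop at run time. The sketch ignores the 64 KiB offset wrap-around of a real 8086.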
Thanks for the explanation. Very nice project. I guess self-modifying code does not work with this technique, does it? (I don't know much about DOS games and how common self-modifying code was on PCs).
Concerning access to video memory: I saw that you handle these accesses "manually" in some cases. I am wondering whether you could avoid that by using virtual memory. You could mark the pages as invalid and when an instruction tries to access them, you catch it and replace the memory[...] access instruction by a call to memoryVideoGet. The JIT-Compiler of the Amiga emulator uses a similar technique for indirect accesses to hardware registers.
Good point. In the set of games I was porting (10 in total, released up to 1991), I found only one that used this nasty technique. And it rewrote only a single byte of code (something like turning a nop instruction into a return). So only a very simple case so far. Of course, using cicoparser doesn't mean that you get working code without any manual work. You will always need to fix some issues by hand.
Virtual memory does not solve anything in this case. Writing to EGA video RAM means that you want to display some pixels. But the write operation goes through some extra logic which decides what to do with the byte being written (extra rotation, masking...), and by reading the same address you are not guaranteed to get the same value back. The EGA control registers govern this process, and you simply need to emulate this behaviour somehow.
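A heavily simplified sketch of why a plain memory buffer is not enough. It models only write mode 0 with the map mask and bit mask, plus read mode 0 with the read map select; rotation, set/reset and the logical operations are left out. The EgaSketch type and its member names are made up for this example and are not the cicoparser implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified EGA planar memory: four bit planes behind one CPU address.
struct EgaSketch {
    static constexpr std::size_t kPlaneSize = 0x10000;  // 64 KiB per plane
    std::array<std::array<uint8_t, kPlaneSize>, 4> plane{};
    std::array<uint8_t, 4> latch{};  // filled by every CPU read
    uint8_t mapMask = 0x0F;          // which planes a write may touch
    uint8_t bitMask = 0xFF;          // which bits come from the CPU byte
    uint8_t readMap = 0;             // which plane a read returns

    // Read mode 0: load all four latches, return the selected plane.
    uint8_t read(uint16_t off) {
        for (int p = 0; p < 4; ++p) latch[p] = plane[p][off];
        return latch[readMap & 3];
    }

    // Write mode 0 (simplified): masked bits come from the CPU byte,
    // the remaining bits from the latches; disabled planes are skipped.
    void write(uint16_t off, uint8_t value) {
        for (int p = 0; p < 4; ++p) {
            if (!(mapMask & (1 << p))) continue;
            plane[p][off] = uint8_t((value & bitMask) | (latch[p] & ~bitMask));
        }
    }
};

// Usage: the byte written is not the byte read back.
inline void egaUsageExample() {
    static EgaSketch ega;  // static: the planes are too big for small stacks
    ega.mapMask = 0x02;    // enable plane 1 only
    ega.write(0, 0xFF);
    ega.readMap = 0;
    assert(ega.read(0) == 0x00);  // plane 0 was never written
    ega.readMap = 1;
    assert(ega.read(0) == 0xFF);
}
```

Because the result of a write depends on register state and on the latches loaded by the previous read, trapping the access with page protection only tells you that an access happened; the logic above still has to run for each one.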
It probably depends on your definition of a decompiler. For me, a decompiler reverses, to some extent, the operation of a compiler: variables instead of registers, function call arguments instead of stack pushes, etc.
Of course, you could also say that a decompiler is any tool that produces something from a binary that you can compile again. But in that case, I could claim that this here is also a decompiled program:
byte[] programbinary={ put binary of the program here };
runEmulator(programbinary);
A compiler has optimization steps. Rather than going straight from human readable C to binary, it compiles to an IR and then uses some heuristics to create binary that is more efficient for the machine to execute.
I feel like you're effectively asking for an optimization step. Decompile to an IR, and then use some heuristics to get back to C that is more efficient for humans to read.
And if a compiler without an optimizer is still a compiler, then a decompiler without an "optimizer" should still be called a decompiler.