I got very excited when I saw the description of the video "Conversion of game into C++ with cicoparser and IDA disassembler". I thought "neat, a new decompiler".
But then I understood what CicoParser is doing: it translates machine instructions into C statements, i.e. when your binary contains an instruction like "mov 123,sp", the output will be a C source file with a statement "memory16(_ds,123)=_sp;". On the GitHub page, they say it is not a CPU emulator, but I would rather say it is a CPU emulator with AOT compilation of the binary.
If it were a CPU emulator, it would update all the flags every time it performed an ALU operation (I have seen this approach in one source-to-source compiler). Actually, there is not much you can do: if the instruction stores SP into DS:123, it converts the instruction into a simple assignment *((WORD*)&memory[ds*16+123]) = sp. All ALU operations are calculated directly using the target instruction set, and the flags register is updated only when necessary. Nor is the memory emulated: it accesses the memory buffer directly (in the video there are just extra range checks, and even the *16 operation can be optimized away by replacing ds/es with memory pointers). The only thing that is emulated is the EGA adapter.
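To make the "simple assignment" concrete, here is a minimal sketch of how an accessor like memory16 could sit on top of a flat buffer. Everything in it is an assumption made for illustration: the 1 MiB buffer, the Word proxy type, the sketch namespace and the range check are not taken from cicoparser's sources. Only the statement memory16(_ds,123)=_sp; comes from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {  // hypothetical namespace, keeps the _ds/_sp names out of the global scope

// Assumed flat model of the 1 MiB real-mode address space.
std::vector<uint8_t> memory(1 << 20);

// Assumed register variables, named after the statement quoted above.
uint16_t _ds = 0, _sp = 0;

// Small proxy so that memory16(seg, off) can appear on either side of '='.
// It reads and writes two bytes in little-endian order, as the 8086 does.
struct Word {
    uint8_t* p;
    Word& operator=(uint16_t v) {
        p[0] = static_cast<uint8_t>(v & 0xFF);
        p[1] = static_cast<uint8_t>(v >> 8);
        return *this;
    }
    operator uint16_t() const {
        return static_cast<uint16_t>(p[0] | (p[1] << 8));
    }
};

// segment*16 + offset, with the kind of range check mentioned above.
inline Word memory16(uint16_t seg, uint16_t off) {
    std::size_t linear = static_cast<std::size_t>(seg) * 16 + off;
    assert(linear + 1 < memory.size());
    return Word{&memory[linear]};
}

// What the translated "store SP into DS:123" could look like:
// one plain assignment, no flags touched, no instruction decoding at run time.
inline void translated_store_sp() {
    memory16(_ds, 123) = _sp;
}

}  // namespace sketch

using namespace sketch;
```

With _ds = 0x1000 and _sp = 0xBEEF, calling translated_store_sp() writes the two bytes EF BE at linear address 0x1000*16 + 123. Replacing the segment arithmetic with a precomputed pointer is the "*16 can be optimized away" idea from the comment.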
Thanks for the explanation. Very nice project. I guess self-modifying code does not work with this technique, does it? (I don't know much about DOS games and how common self-modifying code was on PCs).
Concerning access to video memory: I saw that you treat them "manually" in some cases. I am wondering whether you could avoid that by using virtual memory. You could mark the pages as invalid and when an instruction tries to access them, you catch it and replace the memory[...] access instruction by a call to memoryVideoGet. The JIT-Compiler of the Amiga emulator uses a similar technique for indirect accesses to hardware registers.
Good point. In the set of games I was porting (10 in total, release dates up to 1991), I found only one that used this nasty technique. And it rewrote just a single byte of code (something like rewriting a nop instruction into a return). So a very simple case so far. Of course, using cicoparser doesn't mean that you get working code without any manual work. You will always need to fix some issues by hand.
Virtual memory does not solve anything in this case. Writing to EGA video RAM means that you want to display some pixels. But the write operation goes through some extra logic which decides what to do with the byte being written (extra rotation, masking...), and by reading the same address you are not guaranteed to get the same value back. EGA control registers handle this process and you simply need to emulate this behaviour somehow.
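A heavily simplified sketch of why a write followed by a read of the same address need not return the same byte. It models only three pieces of EGA state (map mask, bit mask, read map select) plus the latches, and leaves out rotation, set/reset, the logical operations and the other write modes. The type and member names are made up for this illustration and are not cicoparser's.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced model of EGA planar memory (write mode 0, read mode 0 only).
struct EgaSketch {
    std::array<std::vector<uint8_t>, 4> planes;  // four bit planes behind one address range
    std::array<uint8_t, 4> latch{};              // filled by every CPU read
    uint8_t mapMask = 0x0F;        // which planes accept writes
    uint8_t bitMask = 0xFF;        // which bits come from the CPU (1) or from the latch (0)
    uint8_t readMapSelect = 0;     // which plane a CPU read returns

    EgaSketch() {
        for (auto& p : planes) p.assign(0x10000, 0);  // 64 KiB per plane
    }

    // A read loads all four latches and returns only the selected plane.
    uint8_t read(uint16_t offset) {
        assert(readMapSelect < 4);
        for (int i = 0; i < 4; ++i) latch[i] = planes[i][offset];
        return latch[readMapSelect];
    }

    // A write goes to every enabled plane; masked-out bits are taken from the latch.
    void write(uint16_t offset, uint8_t value) {
        for (int i = 0; i < 4; ++i) {
            if (!(mapMask & (1 << i))) continue;
            planes[i][offset] =
                static_cast<uint8_t>((value & bitMask) | (latch[i] & ~bitMask));
        }
    }
};
```

For example, with mapMask = 0x02 a write of 0xFF lands only in plane 1, so a read with readMapSelect = 0 returns 0x00. A page-fault trap can tell you that the access happened, but this logic still has to be run for every access.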
Probably depends on your definition of a decompiler. For me, a decompiler reverses to some extent the operation of a compiler. Variables instead of registers, function call arguments instead of stack pushes, etc.
Of course, you could also say that a decompiler is any tool that produces something from a binary that you can compile again. But in that case, I could claim that this here is also a decompiled program:
byte[] programbinary={ put binary of the program here };
runEmulator(programbinary);
A compiler has optimization steps. Rather than going straight from human readable C to binary, it compiles to an IR and then uses some heuristics to create binary that is more efficient for the machine to execute.
I feel like you're effectively asking for an optimization step. Decompile to an IR, and then use some heuristics to get back to C that is more efficient for humans to read.
And if a compiler without an optimizer is still a compiler, then a decompiler without an "optimizer" should still be called a decompiler.
There's a similar set of tools by notaz[1] that was used to statically recompile StarCraft, Diablo, Diablo 2, and Jazz Jackrabbit to ARM Linux. You can read more about the recompilation here[2].
Should work with Linux without any problem: cicoparser was initially developed on Windows and does not use any libraries besides std, so you can build it with gcc. The host application is based on SDL2, so it should work pretty much anywhere without any extra work.
We've started reverse engineering Dune a bit for a ScummVM engine and I think it would pose a severe challenge to cicoparser.
All the areas where you currently need to manually change cicoparser, Dune does everywhere. It jumps based on flags, makes indirect jumps and calls (Dune dynamically loads in video, pcm audio and midi drivers), it jumps wildly around, into, and out of the middle of other procedures, and so on. I'd love to see you try, though.
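To illustrate why indirect jumps and calls are the hard part for static translation: the target is a 16-bit value computed at run time, so translated code needs some mapping from original code offsets to translated routines. The sketch below shows one generic way to do that with a lookup table. It is an assumption for illustration, not a description of how cicoparser handles it, and sub_0100, sub_0250 and callIndirect are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical translated routines; the names encode the original code offset.
static int lastRoutine = 0;
static void sub_0100() { lastRoutine = 0x0100; }
static void sub_0250() { lastRoutine = 0x0250; }

using Routine = void (*)();

// Table from original offset to translated routine. It can only contain
// targets that were known when the binary was translated.
static const std::map<uint16_t, Routine> kRoutines = {
    {0x0100, sub_0100},
    {0x0250, sub_0250},
};

// Stand-in for a translated "call ax": look the run-time target up in the table.
// Returns false when the target was never translated, e.g. code loaded later
// from a driver file or an entry point in the middle of another procedure.
bool callIndirect(uint16_t target) {
    auto it = kRoutines.find(target);
    if (it == kRoutines.end()) return false;
    assert(it->second != nullptr);
    it->second();
    return true;
}
```

callIndirect(0x0250) runs sub_0250, while callIndirect(0x0300) fails because nothing was translated for that offset. Dynamically loaded drivers and jumps into the middle of procedures all end up in that second case unless they are handled by hand.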
Secondly, I don't really see a great advantage of cicoparser over emulating the CPU. You've converted the disassembled code to assembler-in-C, but you haven't reverse engineered the application. If I've successfully converted an application with cicoparser, yes, I can run it, but I haven't learned much about how the original application works. It can be a good starting point for reverse engineering though.
We had an engine in ScummVM that was created with a similar process. The application was organized well enough (with clear function boundaries, etc.) that the engine could gradually be transitioned to proper C.