Is it not completely crazy that this is even possible? This sounds like wizardry...

anthk · on July 13, 2023

QEMU does something similar. And Box86 does that while keeping 3D acceleration when running x86 Linux binaries on ARM.

https://github.com/ptitSeb/box86

actionfromafar · on July 13, 2023

It also depends on how strictly you want to preserve the behaviour of the asm.

foobiekr · on July 13, 2023

Moore's law has blessed all of us with nearly infinite compute relative to systems from a decade ago.

That we've relentlessly squandered all of it, well, that's on us.

stavros · on July 13, 2023

Cries in Electron

ilyt · on July 13, 2023

It's quite a common way to get good performance out of emulated code, although usually it's done with a JIT.

But yeah, it is crazy complex

dagmx · on July 13, 2023

In addition to the examples other commenters have given, it’s also how Rosetta 2 translates x86_64 to arm64 to support Intel binaries on M1/M2 Macs.

Gordonjcp · on July 13, 2023

Well, think about it this way - suppose you have a block of C code and you compile it to object code, like:

    if (hitByMissile == True) {
      lives--;
      if (lives == 0) playing = False;
    }
    // next address is 0x8f02

it might turn into:

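    ld a, (4011h)    ; a = hitByMissile
    or a             ; ld does not touch the flags, so set Z from a
    jr z, 8f02h      ; not hit: skip the whole block
    ld a, (4020h)    ; a = lives
    dec a            ; lives--, sets Z if the result is zero
    ld (4020h), a
    jr nz, 8f02h     ; lives left: skip
    xor a            ; a = 0 (False)
    ld (4021h), a    ; playing = False
    ; next address is 8f02h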

That might decompile back to C as:

    if (var4011 != 0) {
      var4020--;
      if (var4020 == 0) {
        var4021 = 0;
      }
    }

Something like that. It's less readable because we have no variable names, and it's not identical to the source code, but it's probably close enough. The important thing is that we don't care whether we understand it or not. If the original code stores a variable at 4020h in a 16-bit address space, but it ends up somewhere wildly different in our 32-bit address space because we're recompiling for a newer machine, we don't care - we just care that the name gets used consistently.

If you then read through the disassembled source you could start to piece together what the variables are, though.
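
To make that concrete, here's a rough sketch of the recompiled block as a standalone piece of C. The names `var4011`, `var4020`, `var4021` and `missile_block` are made up for illustration, and I'm assuming those three bytes are the only state the block touches:

```c
#include <stdint.h>

/* Hypothetical names: one global per address the original code touched.
   Where the linker puts them on the new machine no longer matters. */
uint8_t var4011; /* was at 4011h: hit-by-missile flag */
uint8_t var4020; /* was at 4020h: lives */
uint8_t var4021; /* was at 4021h: playing flag */

/* The translated block: same behaviour, addresses replaced by names. */
void missile_block(void)
{
    if (var4011 != 0) {
        var4020--;
        if (var4020 == 0) {
            var4021 = 0;
        }
    }
}
```

Every other translated block that used 4020h would refer to `var4020` too, which is all the consistency we need.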

stavros · on July 13, 2023

It's not so much the variables, but that compilation must be a lossy process. There may be many ways to interpret the assembly, and the compiler might generate different asm than the one that was decompiled.

jacquesm · on July 13, 2023

Much less lossy than you probably would think if you've never dug through what your typical compiler outputs. There usually aren't that many ways to interpret the assembly and it doesn't matter whether it generates different assembly as long as it does the same thing.

The typical code generator for a compiler uses all kinds of boilerplate for common constructs (loops, function calls, data access) and once you know about these you can usually recognize them on sight.
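
As an illustration of that boilerplate, here's a plain `for` loop next to the kind of shape it often gets lowered to, written back out as C with gotos. This is a hand-made sketch of one common pattern, not the output of any particular compiler, and the function names are invented:

```c
/* What you write. */
int sum_for(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* One shape a simple code generator may produce for the same loop:
   initialise, jump to the test, then body / increment / test / jump back. */
int sum_lowered(const int *v, int n)
{
    int total = 0;
    int i = 0;
    goto test;
body:
    total += v[i];
    i++;
test:
    if (i < n) goto body;
    return total;
}
```

Once you've seen the "jump to the test at the bottom" layout a few times, you read it as a loop without thinking about it.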

stavros · on July 13, 2023

Hmm, that's interesting, thanks Jacques.

jacquesm · on July 13, 2023

np

Optimizing compilers can make this quite a bit harder, by the way: they run extra passes that do all kinds of reshuffling to get rid of instructions, to combine them, and to move things from memory into registers.

You can usually tell compilers to output assembly code. Doing that for a program that you wrote yourself is a good exercise to see how your high level code translates into lower level code. And with optimization off, all you see is the code generator's output.
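
A small function to try it on, assuming GCC or Clang (the `-S` flag stops after generating assembly). The file name and function are just examples:

```c
/* lives.c -- hypothetical file name.
   With GCC or Clang:
     cc -S -O0 lives.c   writes lives.s, close to the raw code generator output
     cc -S -O2 lives.c   same source after the optimisation passes */
int still_playing(int hit, int *lives)
{
    if (hit) {
        (*lives)--;
        if (*lives == 0) {
            return 0;   /* out of lives */
        }
    }
    return 1;
}
```

Comparing the two `.s` files side by side shows how much the optimizer rearranges even something this small.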

Gordonjcp · on July 13, 2023

Yeah this is where you start to see decompilers outputting C code that just looks like assembler written out as C. There was one I used to use years ago - can't remember the name, some odd commercial thing from one of the many commercial C products that never really took off - that would do a good job *mostly* but at some point start outputting blocks of code with lots of `register` variables in, and you knew it had gone out to lunch on that bit.
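
For anyone who hasn't seen it, this is roughly the flavour I mean, using the missile check from earlier in the thread. It's written by hand as an illustration, not taken from any real decompiler, and `mem`, `sub_block` and the label name are invented:

```c
#include <stdint.h>

uint8_t mem[0x10000];   /* stand-in for the 64K address space */

void sub_block(void)
{
    register uint8_t a;   /* the accumulator, literally */
    register int z;       /* the zero flag */

    a = mem[0x4011];
    z = (a == 0);
    if (z) goto l_8f02;
    a = mem[0x4020];
    a = (uint8_t)(a - 1);
    z = (a == 0);
    mem[0x4020] = a;
    if (!z) goto l_8f02;
    a = 0;
    mem[0x4021] = a;
l_8f02:
    return;
}
```

It compiles and does the right thing, but it tells you nothing the assembly listing didn't already.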

jacquesm · on July 14, 2023

Sourcer? If so I used to have that one, trying to remember if it did C as well or just asm. I eventually wrote my own multi-pass disassembler.