
Is it not completely crazy that this is even possible? This sounds like wizardry to me.



QEMU does something similar. And Box86 does it while keeping 3D acceleration working, translating x86 Linux binaries to run on ARM.

https://github.com/ptitSeb/box86


It also depends on how strictly you want to preserve the behaviour of the asm.
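Here is a minimal sketch of what "strictly" can mean, using Z80 `dec a` as the example. The function and enum names are made up for illustration. A loose translation keeps only the arithmetic. A strict one also recomputes every documented flag, because later code such as a `jr z` may branch on them. Even the strict version ignores the undocumented bits 3 and 5 of F, which a fully faithful emulator would also have to reproduce.

```c
#include <stdint.h>

/* Z80 flag bits in the F register (documented ones only). */
enum { FLAG_C = 0x01, FLAG_N = 0x02, FLAG_PV = 0x04,
       FLAG_H = 0x10, FLAG_Z = 0x40, FLAG_S = 0x80 };

/* Loose translation of "dec a": just the arithmetic, flags are dropped. */
static uint8_t dec_loose(uint8_t a) {
    return (uint8_t)(a - 1);
}

/* Strict translation: also rebuild every documented flag that DEC sets.
   *f holds the incoming F register and receives the new flags. */
static uint8_t dec_strict(uint8_t a, uint8_t *f) {
    uint8_t r = (uint8_t)(a - 1);
    uint8_t flags = (uint8_t)(*f & FLAG_C);    /* DEC leaves carry untouched */
    flags |= FLAG_N;                            /* N: last op was a subtraction */
    if (r & 0x80) flags |= FLAG_S;              /* S: result is negative */
    if (r == 0) flags |= FLAG_Z;                /* Z: result is zero */
    if ((a & 0x0F) == 0) flags |= FLAG_H;       /* H: borrow from bit 4 */
    if (a == 0x80) flags |= FLAG_PV;            /* P/V: signed overflow, -128 - 1 */
    *f = flags;
    return r;
}
```

A recompiler can often prove that the flags are dead because something overwrites them before anything reads them. In that case it can use the loose form, which is a large part of why translated code can run fast.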


Moore's law has blessed all of us with nearly infinite compute relative to systems from a decade ago.

Now that we've relentlessly squandered all of it, well, that's on us.


Cries in Electron


It's quite a common way to get good performance out of emulated code, although usually it's done with a JIT.

But yeah, it is crazy complex.
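A JIT translates guest code into host machine code once and then reuses the result. Emitting machine code portably is too much for a comment, so here is a minimal sketch of the same "translate once, run many times" idea one level down. A made-up instruction set, where each instruction is two bytes, is decoded once into an array of handler pointers. This is called pre-decoding, or threaded code. After that, the execution loop never looks at an opcode again. The instruction set and all names are hypothetical. This is not how QEMU, Box86 or Rosetta 2 are actually built.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest ISA: each instruction is two bytes, opcode then operand. */
enum { OP_SETB, OP_ADDA, OP_DECB, OP_JNZB, OP_HALT };

typedef struct cpu { int32_t a, b; size_t pc; int halted; } cpu;
typedef struct decoded decoded;
typedef void (*handler)(cpu *, const decoded *);
struct decoded { handler fn; int32_t arg; };   /* one pre-decoded instruction */

static void h_setb(cpu *c, const decoded *d) { c->b = d->arg; c->pc++; }
static void h_adda(cpu *c, const decoded *d) { c->a += d->arg; c->pc++; }
static void h_decb(cpu *c, const decoded *d) { (void)d; c->b--; c->pc++; }
static void h_jnzb(cpu *c, const decoded *d) {
    /* operand is the index of the target instruction */
    c->pc = (c->b != 0) ? (size_t)d->arg : c->pc + 1;
}
static void h_halt(cpu *c, const decoded *d) { (void)d; c->halted = 1; }

static const handler handlers[] = { h_setb, h_adda, h_decb, h_jnzb, h_halt };

/* "Translate" once: turn guest bytes into handler/operand pairs.
   Returns the instruction count, or 0 if an opcode is unknown. */
static size_t predecode(const uint8_t *code, size_t len, decoded *out) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        if (code[i] > OP_HALT) return 0;
        out[n].fn = handlers[code[i]];
        out[n].arg = code[i + 1];
        n++;
    }
    return n;
}

/* Run the pre-decoded program; no opcode decoding happens in this loop. */
static int32_t run(const decoded *prog, size_t n) {
    cpu c = { 0, 0, 0, 0 };
    while (!c.halted && c.pc < n)
        prog[c.pc].fn(&c, &prog[c.pc]);
    return c.a;
}
```

For example, the guest bytes `OP_SETB 5, OP_ADDA 3, OP_DECB 0, OP_JNZB 1, OP_HALT 0` decode to five entries and loop five times, which leaves 15 in A. A real JIT goes one step further. It emits host instructions for each guest block and caches them by guest address.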


In addition to the examples other commenters have given, it’s also how Rosetta 2 translates x86_64 code to arm64 so that Intel binaries run on M1/M2 Macs.


Well, think about it this way - suppose you have a block of C code and you compile it to object code, like:

    if (hitByMissile) {
      lives--;
      if (lives == 0) playing = false;
    }
    /* next address is 0x8f02 */
it might turn into Z80 assembly like:

    ld a, (4011h)
    or a            ; ld doesn't set flags, so test A for zero
    jr z, 8f02h
    ld a, (4020h)
    dec a
    ld (4020h), a
    jr nz, 8f02h    ; lives left, skip the next part
    xor a
    ld (4021h), a
    ; next address is 8f02h
That might decompile back to C as:

    if (var4011 != 0) {
      var4020--;
      if (var4020 == 0) {
        var4021 = 0;
      }
    }
Like that. It's less readable because we have no variable names, and it isn't identical to the source code, but it's probably close enough. The important thing is that we don't need to understand it. Say our original code stores a variable at 4020h in a 16-bit address space. If it ends up somewhere wildly different in our 32-bit address space because we're recompiling for a newer machine, we don't care. We only care that the name gets used consistently.
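To make that concrete, here is a minimal sketch of what a static recompiler might emit for the snippet. It assumes the guest's 16-bit memory is modelled as one flat 64 KiB array. The function name `block_hit_by_missile` is made up. Real recompilers also have to deal with flags, interrupts and self-modifying code.

```c
#include <stdint.h>

/* The guest's entire 16-bit address space as a flat 64 KiB array.
   Wherever the host puts this array, guest address 4020h is always
   mem[0x4020], so every access stays consistent. */
static uint8_t mem[0x10000];

/* Hypothetical recompiled block for the missile snippet: guest register A
   becomes a local variable, guest loads/stores become array accesses,
   and a jump to the next block (8f02h) becomes an early return. */
static void block_hit_by_missile(void) {
    uint8_t a = mem[0x4011];   /* load the "hit" flag */
    if (a == 0) return;        /* not hit: continue at 8f02h */
    a = mem[0x4020];           /* load the life counter */
    a--;                       /* decrement (wraps at 0, like the CPU) */
    mem[0x4020] = a;           /* store it back */
    if (a != 0) return;        /* lives left: continue at 8f02h */
    mem[0x4021] = 0;           /* clear the "playing" flag */
}
```

Keeping guest memory as a byte array, rather than turning each address into its own C global, has a second benefit. Code that computes addresses at run time, such as table lookups or pointer arithmetic, keeps working unchanged.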

If you then read through the decompiled source, though, you could start to piece together what the variables are.


It's not so much the variables; it's that compilation must be a lossy process. There may be many ways to interpret the assembly, and recompiling the decompiled code might produce different asm from the original.


It's much less lossy than you might think if you've never dug through what a typical compiler outputs. There usually aren't that many ways to interpret the assembly, and it doesn't matter whether the recompiled code differs as long as it does the same thing.
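"As long as it does the same thing" is something you can actually check when the state is small. This minimal sketch assumes the missile routine's whole state is three bytes: the hit flag, lives and playing. It runs a named "original" version and a goto-heavy, decompiler-style version over all 2^24 inputs and counts mismatches. Both functions are written for illustration, and the original treats any nonzero hit flag as a hit.

```c
#include <stdint.h>

typedef struct { uint8_t hit, lives, playing; } state;

/* "Original" source with meaningful names. */
static state original(state s) {
    if (s.hit) {
        s.lives--;
        if (s.lives == 0) s.playing = 0;
    }
    return s;
}

/* Decompiler-style version: same behaviour, different shape
   (a register-like local and early exits via goto). */
static state decompiled(state s) {
    uint8_t a = s.hit;
    if (a == 0) goto done;
    a = (uint8_t)(s.lives - 1);
    s.lives = a;
    if (a != 0) goto done;
    s.playing = 0;
done:
    return s;
}

/* Compare both over every (hit, lives, playing) byte triple.
   Returns the number of mismatches; 0 means behaviourally equivalent. */
static unsigned long count_mismatches(void) {
    unsigned long bad = 0;
    for (unsigned h = 0; h < 256; h++)
        for (unsigned l = 0; l < 256; l++)
            for (unsigned p = 0; p < 256; p++) {
                state s = { (uint8_t)h, (uint8_t)l, (uint8_t)p };
                state x = original(s), y = decompiled(s);
                if (x.hit != y.hit || x.lives != y.lives || x.playing != y.playing)
                    bad++;
            }
    return bad;
}
```

Changing `if (s.hit)` to `if (s.hit == 1)` makes the count nonzero. That is exactly the kind of interpretation difference the check is there to catch.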

The typical code generator for a compiler uses all kinds of boilerplate for common constructs (loops, function calls, data access) and once you know about these you can usually recognize them on sight.
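Here is a toy version of recognizing boilerplate on sight. The classic 32-bit x86 function prologue `push ebp; mov ebp, esp` assembles to the bytes 55 89 E5, the encoding GCC typically produces, or 55 8B EC, the one MSVC typically produces. This sketch scans raw bytes for either pattern, and the function name is made up. Real tools decode instructions properly, because a byte scan can match inside another instruction's operands. Optimized code built without frame pointers has no such prologue at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan 32-bit x86 machine code for "push ebp; mov ebp, esp".
   The mov has two valid encodings: 89 E5 and 8B EC.
   Stores up to max offsets of candidate function starts; returns the count. */
static size_t find_prologues(const uint8_t *code, size_t len,
                             size_t *offsets, size_t max) {
    size_t found = 0;
    for (size_t i = 0; i + 2 < len && found < max; i++) {
        if (code[i] != 0x55) continue;                         /* push ebp */
        int mov_a = code[i + 1] == 0x89 && code[i + 2] == 0xE5; /* mov ebp, esp */
        int mov_b = code[i + 1] == 0x8B && code[i + 2] == 0xEC; /* same, other form */
        if (mov_a || mov_b) offsets[found++] = i;
    }
    return found;
}
```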


Hmm, that's interesting, thanks Jaques.


np

Optimizing compilers can make this quite a bit harder, by the way. Extra passes reshuffle the code to get rid of instructions, combine them, and move values from memory into registers.

You can usually tell a compiler to output assembly code. Doing that for a program you wrote yourself is a good exercise to see how your high-level code translates into lower-level code. With optimization off, all you see is the code generator's output.
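Here is a tiny file to try this on, assuming GCC on x86. Clang also accepts `-S` and the `-O` levels, and `-masm=intel` only switches GCC's x86 output to Intel syntax. The file and function names are arbitrary. The comments describe typical output, not a guarantee.

```c
/* sum.c: compare unoptimized and optimized assembly for one function.
 *
 *   gcc -S -O0 -masm=intel sum.c -o sum-O0.s   # plain code-generator output
 *   gcc -S -O2 -masm=intel sum.c -o sum-O2.s   # after the optimization passes
 *
 * At -O0, expect total and i to be loaded from and stored back to the stack
 * frame on every iteration. At -O2 they usually stay in registers, and the
 * loop may be unrolled or vectorized. Exact output varies by compiler version.
 */
int sum(const int *values, int count) {
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}
```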


Yeah, this is where you start to see decompilers outputting C code that just looks like assembler written out as C. There was one I used to use years ago (can't remember the name, some odd commercial thing from one of the many commercial C products that never really took off) that would do a good job *mostly*, but at some point it would start outputting blocks of code full of `register` variables, and you knew it had gone out to lunch on that bit.


Sourcer? If so, I used to have that one; I'm trying to remember whether it did C as well or just asm. I eventually wrote my own multi-pass disassembler.
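Here is a minimal two-pass sketch of why "multi-pass" helps, covering a handful of Z80 opcodes. The opcode values are the standard Z80 encodings. Everything else, including the function names and the `Lxxxx` label style, is made up. The first pass finds every jump target, so the second pass can print a label before it reaches a forward jump's destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Instruction length for a small subset of Z80 opcodes; 0 = not handled. */
static size_t insn_len(uint8_t op) {
    switch (op) {
    case 0x3A: case 0x32: return 3;                       /* ld a,(nn) / ld (nn),a */
    case 0x18: case 0x20: case 0x28: return 2;            /* jr / jr nz / jr z */
    case 0x3D: case 0xAF: case 0xB7: case 0xC9: return 1; /* dec a / xor a / or a / ret */
    default: return 0;
    }
}

static int is_target(uint16_t addr, const uint16_t *targets, size_t n) {
    for (size_t k = 0; k < n; k++)
        if (targets[k] == addr) return 1;
    return 0;
}

/* jr displacements are signed and relative to the address after the jr. */
static uint16_t jr_dest(uint16_t addr, uint8_t disp) {
    return (uint16_t)(addr + 2 + (int8_t)disp);
}

/* Pass 1: walk the code and record every distinct jump target. */
static size_t collect_targets(const uint8_t *code, size_t len, uint16_t base,
                              uint16_t *targets, size_t max) {
    size_t n = 0;
    for (size_t i = 0; i < len;) {
        size_t sz = insn_len(code[i]);
        if (sz == 0 || i + sz > len) { i++; continue; }   /* treat as data */
        if (sz == 2 && n < max) {
            uint16_t dest = jr_dest((uint16_t)(base + i), code[i + 1]);
            if (!is_target(dest, targets, n)) targets[n++] = dest;
        }
        i += sz;
    }
    return n;
}

/* Pass 2: print the listing, with labels wherever pass 1 found a target. */
static void disassemble(const uint8_t *code, size_t len, uint16_t base,
                        const uint16_t *targets, size_t nt, FILE *out) {
    for (size_t i = 0; i < len;) {
        uint16_t addr = (uint16_t)(base + i);
        size_t sz = insn_len(code[i]);
        if (is_target(addr, targets, nt)) fprintf(out, "L%04X:\n", (unsigned)addr);
        if (sz == 0 || i + sz > len) {                     /* unknown byte: data */
            fprintf(out, "    db %02Xh\n", (unsigned)code[i]);
            i++;
            continue;
        }
        unsigned nn = sz == 3 ? (unsigned)(code[i + 1] | code[i + 2] << 8) : 0;
        switch (code[i]) {
        case 0x3A: fprintf(out, "    ld a, (%04Xh)\n", nn); break;
        case 0x32: fprintf(out, "    ld (%04Xh), a\n", nn); break;
        case 0x18: case 0x20: case 0x28:
            fprintf(out, "    %s L%04X\n",
                    code[i] == 0x18 ? "jr" : code[i] == 0x20 ? "jr nz," : "jr z,",
                    (unsigned)jr_dest(addr, code[i + 1]));
            break;
        case 0x3D: fputs("    dec a\n", out); break;
        case 0xAF: fputs("    xor a\n", out); break;
        case 0xB7: fputs("    or a\n", out); break;
        case 0xC9: fputs("    ret\n", out); break;
        }
        i += sz;
    }
    /* a target just past the end (e.g. the next block) still gets its label */
    uint16_t end = (uint16_t)(base + len);
    if (is_target(end, targets, nt)) fprintf(out, "L%04X:\n", (unsigned)end);
}
```

As a test case, take a 19-byte hand-assembled version of the missile routine, with an `or a` before the first `jr z`, placed at base 8EEFh. Pass 1 finds a single target, 8F02h, and both jumps come out as references to `L8F02`.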





