

Practical and Portable X86 Recompilation (2014) - evanpw
http://mp2.dk/blog/blog/2014/04/14/practical-and-portable-binary-recompilation/

======
ksherlock
Interesting... I've also seen it done for the 6502/NES [0], but self-modifying
code was a deal breaker.

[0]
[http://andrewkelley.me/post/jamulator.html](http://andrewkelley.me/post/jamulator.html)
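
To make the self-modifying code problem concrete, here is a rough sketch using
a made-up three-opcode machine (not the 6502, and not code from the linked
post). Every name in it is hypothetical. The interpreter fetches from live
memory, while the "static" version decodes the program once up front, so it
never notices when the program overwrites one of its own operands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy ISA, invented for this sketch:
//   0x00            halt
//   0x01 imm        acc += imm
//   0x02 addr val   mem[addr] = val   (may overwrite code bytes)

// Interpreter: decodes each instruction from memory at the moment it runs.
int run_interpreted(std::vector<uint8_t> mem) {
    assert(!mem.empty());
    int acc = 0;
    std::size_t pc = 0;
    while (mem.at(pc) != 0x00) {
        if (mem.at(pc) == 0x01) {
            acc += mem.at(pc + 1);
            pc += 2;
        } else if (mem.at(pc) == 0x02) {
            mem.at(mem.at(pc + 1)) = mem.at(pc + 2);
            pc += 3;
        } else {
            break;  // unknown opcode: stop
        }
    }
    return acc;
}

struct Op { uint8_t opcode; uint8_t a; uint8_t b; };

// Static translation: decode everything once, then run the decoded copy.
int run_static(std::vector<uint8_t> mem) {
    assert(!mem.empty());
    std::vector<Op> ops;
    for (std::size_t pc = 0; mem.at(pc) != 0x00;) {
        if (mem.at(pc) == 0x01) {
            ops.push_back({0x01, mem.at(pc + 1), 0});
            pc += 2;
        } else if (mem.at(pc) == 0x02) {
            ops.push_back({0x02, mem.at(pc + 1), mem.at(pc + 2)});
            pc += 3;
        } else {
            break;  // unknown opcode: stop decoding
        }
    }
    int acc = 0;
    for (const Op& op : ops) {
        if (op.opcode == 0x01) {
            acc += op.a;        // uses the operand as it was at decode time
        } else {
            mem.at(op.a) = op.b;  // memory changes, but `ops` does not
        }
    }
    return acc;
}

// Stores 10 over the immediate of the following add, then executes the add.
std::vector<uint8_t> self_modifying_program() {
    return {0x02, 4, 10, 0x01, 1, 0x00};
}
```

On this program the two functions disagree: the interpreter adds the patched
operand, the static version adds the stale one. A real static recompiler would
need some way to detect the write and fall back to interpretation or
retranslation.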

------
ggambetta
Very cool idea. Reminds me of the Emulator-Backed Remakes [1] thing I did a
few months ago. Glad to see other people thinking in similar directions :)

[1]
[http://gabrielgambetta.com/remakes.html](http://gabrielgambetta.com/remakes.html)

------
Sir_Cmpwn
Having worked a lot in assembly and C and such, I imagine it'd be a lot easier
(and probably more performant) to write something that translates x86
instructions directly into x64 instructions. This is really interesting all
the same, though, because it potentially lets you support as many targets as
your C compiler supports.

I once did something similar by converting the Minecraft server to .NET with
IKVM and hooking into its terrain generator.

------
falcolas
Sweet project, and awesome walkthrough, but this strikes me as less of a
recompilation and more of building a virtual x86 machine whose instructions
are defined at compile time.

I imagine that with a perfect optimizing compiler, this distinction would go
away, but we're nowhere near that point.

Is it not possible to abstract away some of those calls back into native C++
code (i.e. avoiding the need for cpu and memory classes)?
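
For anyone who hasn't read the article, a minimal sketch of what "a virtual
x86 machine whose instructions are defined at compile time" can look like.
The `Cpu` and `Memory` types, the function names, and the guest routine are
all hypothetical and are not taken from the article's generated code. The
sketch assumes a cdecl-style guest function where the arguments sit at
`[esp+4]` and `[esp+8]` on entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical register file holding only what this sketch needs.
struct Cpu {
    uint32_t eax = 0, ecx = 0, esp = 0;
};

// Hypothetical flat guest memory with little-endian 32-bit accessors.
struct Memory {
    std::vector<uint8_t> bytes;
    explicit Memory(std::size_t size) : bytes(size, 0) {}

    uint32_t read32(uint32_t addr) const {
        uint32_t v = 0;
        for (uint32_t i = 0; i < 4; ++i)
            v |= static_cast<uint32_t>(bytes.at(addr + i)) << (8 * i);
        return v;
    }
    void write32(uint32_t addr, uint32_t v) {
        for (uint32_t i = 0; i < 4; ++i)
            bytes.at(addr + i) = static_cast<uint8_t>(v >> (8 * i));
    }
};

// One C++ statement per guest instruction:
//   mov eax, [esp+4]
//   mov ecx, [esp+8]
//   add eax, ecx
//   ret
void translated_add(Cpu& cpu, Memory& mem) {
    cpu.eax = mem.read32(cpu.esp + 4);
    cpu.ecx = mem.read32(cpu.esp + 8);
    cpu.eax += cpu.ecx;
    cpu.esp += 4;  // ret pops the return address; control returns via C++
}

// Host-side helper: build a guest stack frame, run the routine, read eax.
uint32_t call_translated_add(uint32_t a, uint32_t b) {
    Memory mem(64);
    Cpu cpu;
    cpu.esp = 64;
    cpu.esp -= 4; mem.write32(cpu.esp, b);  // push second argument
    cpu.esp -= 4; mem.write32(cpu.esp, a);  // push first argument
    cpu.esp -= 4; mem.write32(cpu.esp, 0);  // push a dummy return address
    translated_add(cpu, mem);
    assert(cpu.esp == 56);  // only the return address was popped
    return cpu.eax;
}
```

Written natively this would just be `a + b`. Getting from the first form to
the second means proving the memory and register traffic is unobservable,
which is the part a compiler generally can't do on its own.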

------
tbirdz
It is an interesting idea/project, but can it really be considered Reverse
Engineering? It seems like it is just translating the original instructions
into a C++ form. If the goal is to build an open implementation of the
CubeWorld server, then I don't see how this helps. It seems you are still just
running the original proprietary code, albeit translated in format.

Additionally one of the goals in reverse engineering is to understand how the
algorithms actually work and get a deeper understanding beyond the "black-box"
level. This technique doesn't really provide new insight into the code, it
just transforms it from one kind of closed binary blob to another kind of
closed binary blob.

Again, I don't mean to denigrate the work you have accomplished here, but I
would hesitate to classify it as "Reverse Engineering"

~~~
drv
It's certainly a neat hack, but legally I would avoid this approach unless the
original creator is cooperative or at least no longer around. Distributing
code derived in this manner is pretty clearly copyright infringement unless
the original license allows it. I am not a lawyer, but I would argue that,
while reverse engineering to allow interoperability is acceptable, doing so by
purely copying the original is not, and performing mechanical transformations
on the code (disassembly/recompilation) is not enough to keep the resulting
code from being a derivative work.

That said, I can see how this is a technologically reasonable first step
toward a new implementation. Once this initial translation step is done,
individual functions can be swapped out for new (non-translated) versions
fairly easily by editing source code, as opposed to patching the original
binary. Later in the post, there's mention of replacing commonly-executed
functions with native versions to improve performance and allow porting to
other environments.
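
A rough sketch of that swap-out step, under the assumption that translated
routines are reached through a table keyed by guest address. The table, the
`Cpu` struct, the address 0x401000, and both routines are invented for
illustration and do not come from the article.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Cpu { uint32_t eax = 0, ecx = 0; };  // hypothetical guest registers

using GuestFn = void (*)(Cpu&);

// Mechanically translated routine: eax = ecx * 2, one statement per
// guest instruction (mov eax, ecx / add eax, eax).
void sub_401000_translated(Cpu& cpu) {
    cpu.eax = cpu.ecx;
    cpu.eax += cpu.eax;
}

// Hand-written replacement with the same observable effect on eax.
void sub_401000_native(Cpu& cpu) {
    cpu.eax = cpu.ecx * 2u;
}

// Maps guest entry points to whichever implementation is currently bound.
class FunctionTable {
  public:
    void bind(uint32_t guest_addr, GuestFn fn) { table_[guest_addr] = fn; }

    void call(uint32_t guest_addr, Cpu& cpu) const {
        auto it = table_.find(guest_addr);
        assert(it != table_.end());  // every call target must be bound
        it->second(cpu);
    }

  private:
    std::unordered_map<uint32_t, GuestFn> table_;
};

// Helper: call the routine at 0x401000 with x in ecx and return eax.
uint32_t run_double(const FunctionTable& table, uint32_t x) {
    Cpu cpu;
    cpu.ecx = x;
    table.call(0x401000u, cpu);
    return cpu.eax;
}
```

Callers only ever go through the table, so rebinding one address replaces one
function without touching the rest of the translated code.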

------
grokys
What are the legal implications of this? As far as I understand, reverse
engineering using disassembled code isn't legal. And this isn't just reading
the disassembled code - they're taking it wholesale.

~~~
delinka
You can reverse engineer to your heart's content using whatever you like,
provided you didn't "acquire" the original source code. Given the binary,
there's nothing to legally prevent you from using or creating tools to convert
machine code to assembler code and then find patterns that you can decompile
into compilable source code.

Now redistributing that code might maybe possibly get you into hot water, but
to find out you'll need to go through a trial. So you use your disassembled
source to learn the algorithm, then reimplement it in your own style. Now
you're not even in the gray with respect to the original source.

~~~
desdiv
Don't you need a Chinese wall between the person who disassembles the binary
and the reimplementor?

------
tomyws
This is fascinating!

Practical reuse of assembly is resourceful, and this approach to portability
is quite cool (take a look at this non-portable approach to fixing up and
executing an assembly dump[1]).

I wonder if the future of video game console emulation lies in recompilation,
perhaps to an intermediate representation format for LLVM (similar to
Dagger[2]).

[1]
[http://aluigi.altervista.org/mytoolz.htm#dump2func](http://aluigi.altervista.org/mytoolz.htm#dump2func)
[2] [http://dagger.repzret.org/](http://dagger.repzret.org/)

~~~
brigade
Static recompilation, no. See
[http://andrewkelley.me/post/jamulator.html](http://andrewkelley.me/post/jamulator.html)
for an attempt at that and why it isn't useful.

~~~
tbirdz
These issues, such as self-modifying code, may be serious problems when
emulating older systems like the NES, but are they still as problematic on
more modern consoles? I believe that modern consoles are already being
programmed more in high-level languages (C, C++, etc.) than in assembler.
PlayStation even has a GCC-based compiler. Since the code was initially written
in a higher-level language and then compiled, would it not be easy to
statically recompile it?

However, I am not a games programmer, and do not have any actual experience
with console development. If there is anyone on here with experience in these
areas, I would appreciate their thoughts on the matter.

~~~
brigade
Dynamic codegen in games might be less of an issue, but instead you get to
deal with emulating an MMU, among other things. And the point is that _all_ of
these issues specific to whole-system emulators mean that static recompilation
isn't faster than dynamic, so there's no point in doing contortions to get
static working. The original language barely matters when all you have is
machine code.

As for where newer consoles are easier, the big thing is that there's more
dynamic linking so there's more opportunity for high-level emulation of entire
systems. Think being able to emulate entire OpenGL function calls rather than
the raw GPU-specific register writes.
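
As a sketch of the high-level emulation idea: when the guest calls a
dynamically linked import, the emulator can look the import up by name and run
a host implementation instead of executing the guest library's code. The
import name `gfx_clear`, the `HostGraphics` struct, and the table below are
all hypothetical and do not correspond to any real console SDK or to OpenGL.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for whatever graphics state the host actually keeps.
struct HostGraphics {
    uint32_t clear_color = 0;
};

using HleHandler =
    std::function<void(HostGraphics&, const std::vector<uint32_t>&)>;

// Maps guest import names to host-side implementations.
class HleTable {
  public:
    void register_import(const std::string& name, HleHandler handler) {
        assert(handler != nullptr);  // reject empty handlers
        handlers_[name] = std::move(handler);
    }

    // Returns false when no handler exists, so the caller can fall back to
    // emulating the guest library's own code at the low level.
    bool dispatch(const std::string& name, HostGraphics& gfx,
                  const std::vector<uint32_t>& args) const {
        auto it = handlers_.find(name);
        if (it == handlers_.end()) return false;
        it->second(gfx, args);
        return true;
    }

  private:
    std::unordered_map<std::string, HleHandler> handlers_;
};

HleTable make_default_table() {
    HleTable table;
    // One whole guest call handled as a single host operation.
    table.register_import(
        "gfx_clear",
        [](HostGraphics& gfx, const std::vector<uint32_t>& args) {
            gfx.clear_color = args.at(0);
        });
    return table;
}
```

The more of a system's functionality sits behind named imports like this, the
more of it can be handled at the call level rather than the register level.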

------
hyc_symas
Reminds me of my 8086 -> 68000 recompiler I wrote in my Atari ST days.
Translating MSDOS binaries to GEMDOS was pretty straightforward back then.

------
amelius
Actually, I think it is much easier and less error-prone (and hence more
secure) to translate from one machine language to another than it is to
translate JavaScript to machine language. Hence, I don't understand why we
don't use some form of machine language instead of JavaScript on the web.

The number of different cases to tackle is certainly much smaller.

