
Statically Recompiling NES Games into Native Executables with LLVM and Go - darkf
http://andrewkelley.me/post/jamulator.html
======
tibbon
This is amazing. Also, this is the Dark Magic of programming that I don't
think I'll 100% grok in 20 years, but it's good to try!

edit: now that I think of it, I really need to keep expanding my knowledge.
I'm going to go through this post in my terminal and try to at least make the
stuff work, so I can start understanding this process. I've been trying to
learn Go and C better anyway. Thanks for providing a ground to learn more.

~~~
exch
I've looked at this sort of stuff as utter voodoo for a long time. Then I ran
into this book[1], and everything just kind of 'clicked' into place in my
head. I can't recommend this book often enough.

[1]:
[http://www.nand2tetris.org/book.php](http://www.nand2tetris.org/book.php)

In short: It gives you a hands-on approach in designing and building your own
computer and programming language, to end up writing and running your own
games on the system.

    
    
        * It starts with simple boolean algebra to explain and create logic gates (NAND, AND, OR, XOR, etc).
        * Use these to build an ALU, memory banks and eventually a full CPU.
        * Design an assembly language and assembler for this system.
        * Use the assembler to create a higher level OOP language, compiler and code base.
        * Use this language to write a rudimentary operating system.
        * Write a game to run on the OS.
    

All using very clear and simple English and a very comprehensive emulation
system (written in Java) by the authors.

Edit: For some reason, this site has started showing malware warnings in
chrome and firefox since today. Even though Google's advisory[1] makes no
mention of any actual malware being detected. I've visited this site safely
for a long time. Still, if you don't trust it, then wait until Google clears
up the issue. I've already contacted one of the authors about it.

[1]:
[http://safebrowsing.clients.google.com/safebrowsing/diagnost...](http://safebrowsing.clients.google.com/safebrowsing/diagnostic?site=http%3A%2F%2Fwww.nand2tetris.org%2Fbook.php&client=chromium&hl=en-GB)

~~~
tmzt
Might be a good candidate to port to asm.js or even just javascript, there are
platforms without Java where this could be a valuable teaching tool.

~~~
BHSPitMonkey
Or, better yet, just port the Java runtime to Javascript...

~~~
cgag
[http://int3.github.io/doppio/about.html](http://int3.github.io/doppio/about.html)

[https://github.com/int3/doppio](https://github.com/int3/doppio)

------
logic
Just a quick note about the disassembly challenge he faced (indirect
references), having gone through this before: you can get amazingly good
results by cheating a bit. That is to say, rather than assuming you actually
have to properly execute through the code path, you can get very close by
roughly tracking register assignments when making your initial pass through a
block of code. (Even better, if you can track potential ranges of values with
later calls into a given block. Some of this depends on how you've implemented
your disassembler, though.)
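
To make the register-tracking idea concrete, here is a minimal sketch in C, written against 6502 opcodes rather than SuperH. It is not taken from any real disassembler: `resolve_indirect_jump` is a hypothetical helper that models only three instructions (`LDA #imm`, `STA zp`, `JMP (ptr)`) and gives up on anything else, which is about as crude as a heuristic can get while still resolving the common "store a constant pointer, then jump through it" pattern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: walk one straight-line block of 6502 code, remember
 * constants stored to the zero page, and try to resolve a JMP (ptr).
 * Returns true and writes *target only if the pointer bytes are known. */
bool resolve_indirect_jump(const uint8_t *code, size_t len, uint16_t *target)
{
    bool a_known = false;          /* does A hold a known constant? */
    uint8_t a = 0;
    bool zp_known[256] = {false};  /* zero-page bytes with known contents */
    uint8_t zp[256] = {0};
    size_t pc = 0;

    while (pc < len) {
        uint8_t op = code[pc];
        if (op == 0xA9 && pc + 1 < len) {            /* LDA #imm */
            a = code[pc + 1];
            a_known = true;
            pc += 2;
        } else if (op == 0x85 && pc + 1 < len) {     /* STA zp */
            uint8_t addr = code[pc + 1];
            zp_known[addr] = a_known;
            zp[addr] = a;
            pc += 2;
        } else if (op == 0x6C && pc + 2 < len) {     /* JMP (ptr) */
            uint16_t ptr = (uint16_t)(code[pc + 1] | (code[pc + 2] << 8));
            /* Only pointers fully inside the zero page are tracked here. */
            if (ptr >= 0xFF || !zp_known[ptr] || !zp_known[ptr + 1])
                return false;
            *target = (uint16_t)(zp[ptr] | (zp[ptr + 1] << 8));
            return true;
        } else {
            return false;  /* unmodeled instruction: give up conservatively */
        }
    }
    return false;          /* block ended without an indirect jump */
}
```

A real tool would model many more instructions and carry value ranges across blocks; this only shows the shape of the first pass.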

I ended up doing this with a SuperH disassembler (with SH2, due to its two-
byte opcode layout, indirect addressing is the order of the day), and by doing
basic register assignment tracking and adding a few crude heuristics, I was
able to get very usable results. No, the end result won't be "pretty"; you'll
be moderately embarrassed to show it off, but it will work. :)

(Heuristics: one structure that I had to handle manually was compiler-generated
jump tables; thankfully, for my project, I'd had a bit of help from
the compiler that was used, and there were distinct signatures I could key off
of.)

If you're even remotely interested in the disassembly aspect of this, I'd
recommend learning a bit about a piece of software called IDA Pro:
[https://www.hex-rays.com/products/ida/](https://www.hex-rays.com/products/ida/).
As horrible as the UI of it is, there is simply nothing better on the market
for reverse engineering analysis.

~~~
vidarh
Second this. There are a _lot_ of "signatures" in most asm. Programmers for
6502 and derivatives might be a nasty bunch of sadists that love to do weird
stuff to save cycles, but even there there are lots and lots of common
patterns that often "happened" just because people learned from the same
sources, or because it made sense, or because conventions appeared.

I never had a NES, but I had a C64, and the 6502 code written there seemed
nasty to translate on the surface, with lots and lots of self-modification,
for example. But in the end most of the self-modification was specific looping
patterns, because the 6502 index registers can only cover 256 values. So many
loops involved writing addresses into the looping code, iterating 256 times,
increasing the most significant byte directly in the code, checking whether
you'd reached the end, and jumping back to iterate another 256 times.
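
For anyone who hasn't seen that pattern, here is roughly what such a loop computes, written as a C sketch. The function name `fill_pages` and the fill operation are made up for illustration; the point is only that `hi` plays the role of the operand byte the 6502 code patches in place, and `x` is the 8-bit index register that wraps after 255. Once a translator spots this shape, it can emit an ordinary loop over a 16-bit address.

```c
#include <stdint.h>

/* Illustrative only: fill `pages` consecutive 256-byte pages of a 64 KB
 * address space, structured the way the self-modifying 6502 loop runs.
 * `mem` must point to at least 65536 bytes. */
void fill_pages(uint8_t *mem, uint16_t base, uint8_t pages, uint8_t value)
{
    uint8_t lo = (uint8_t)(base & 0xFF);
    uint8_t hi = (uint8_t)(base >> 8);      /* the byte patched inside the code */
    uint8_t end_hi = (uint8_t)(hi + pages); /* stop once this page is reached */

    while (hi != end_hi) {
        uint8_t x = 0;                      /* index register X */
        do {
            /* STA addr,X with addr = hi:lo, wrapping at 16 bits */
            uint16_t ea = (uint16_t)((((uint16_t)hi << 8) | lo) + x);
            mem[ea] = value;
            x++;                            /* INX: wraps to 0 after 255 */
        } while (x != 0);                   /* BNE back to the store */
        hi++;                               /* INC on the operand's high byte */
    }
}
```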

Most of this "nasty" stuff is relatively well known by now and much of it is
relatively regular and easy to detect.

------
VeejayRampay
This is one of the best technical articles I've seen in a long, long time.
Congratulations. I won't go and pretend I understand what is really going on,
but the writing style is excellent and to the point, and the general flow and
formatting are a pleasure.

Props dude.

~~~
tharshan09
I agree. Really well laid out, and easy for a lay person to understand without
using too much technical jargon. I enjoyed the long code pastes rather than a
GitHub repo link.

------
comex
Somewhat off-topic, but to defend gcc against clang, here is a modern version
of gcc with the correct warning option:

    
    
        $ gcc-4.8 -std=gnu99 -Wall -o test test.c
        test.c: In function 'main':
        test.c:6:5: warning: suggest parentheses around comparison in operand of '&' [-Wparentheses]
             if (foo & 0x80 == 0x80) {
             ^
    

gcc 4.9 will have colored diagnostics, too.

Cool project, though.

~~~
nitrogen
This would be a great way, on a compiler or project that doesn't have this
warning enabled, to conceal a deliberate bug and have it appear to be
accidental.

~~~
dietrichepp
Hey, check out the Underhanded C Contest.

[http://underhanded.xcott.com](http://underhanded.xcott.com)

------
Filligree
Modern PS2 emulators - which is to say, pcsx2 - use dynamic recompilation to
execute games at useful speed. Static recompilation might not be a useful
technique, but did you consider a dynamic version? What caveats are there?

~~~
AndyKelley
About halfway through the project I realized that static recompilation is
pointless and that dynamic is the way to go. I felt like it was worthwhile to
at least get to the "able to play super mario 1" checkpoint before quitting. I
did not do any investigation into dynamic recompilation other than pondering
about it and concluding that it is more practical than static.

~~~
jmhain
Why is static pointless and why is dynamic better? If emulating a newer game
console, couldn't you get better performance by running a statically
recompiled game, since it doesn't have to do the extra work at runtime? Or
better yet, couldn't you cross-recompile a game to run it on a platform that
couldn't normally handle emulation of the target platform?

~~~
vidarh
Dynamic lets you do lots of nasty tricks, and also lets you detect and work
around various nasty tricks at runtime (worst case by falling back to
emulation). Consider that old games often would use self-modifying code, for
example. Reliably statically detecting self-modifying code can be extremely
hard even when it's not intentionally obfuscated. But at runtime it is "easy":
Write protect all the code pages, and trap page-faults, and either modify the
offending code, or fall back to emulation.
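
A hedged sketch of that idea in C follows. Note the swap: instead of real page protection and fault handlers (which are OS-specific), it routes every guest write through one function and does the check in software, which is the same logic in a more portable form. The `Guest` struct, the 256-byte page size, and all function names are assumptions made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 8                       /* 256-byte guest pages (arbitrary) */
#define NUM_PAGES  (65536 >> PAGE_SHIFT)

typedef struct {
    uint8_t mem[65536];
    bool has_translation[NUM_PAGES];  /* page holds code we already translated */
    bool needs_fallback[NUM_PAGES];   /* translation is stale: interpret instead */
} Guest;

/* Record that the code at `addr` has been translated to native code. */
void mark_translated(Guest *g, uint16_t addr)
{
    g->has_translation[addr >> PAGE_SHIFT] = true;
}

/* Every guest store goes through here; this stands in for the page-fault trap. */
void guest_write(Guest *g, uint16_t addr, uint8_t value)
{
    unsigned page = addr >> PAGE_SHIFT;
    if (g->has_translation[page]) {
        /* Code was modified after translation: drop it, fall back. */
        g->has_translation[page] = false;
        g->needs_fallback[page] = true;
    }
    g->mem[addr] = value;
}

/* Ask before running native code for `addr`. */
bool must_interpret(const Guest *g, uint16_t addr)
{
    return g->needs_fallback[addr >> PAGE_SHIFT];
}
```

A JIT would retranslate the page instead of flagging it for interpretation; the detection step is the same either way.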

In general, I think dynamic approaches are ideal when you're dealing with a
"hostile" environment where the software you're translating was written with
no expectation that it would be translated, and possibly (like with old games)
in a situation where the programmer may have tried to really maximally exploit
the hardware, because it means failure to statically determine that something
weird is going on can often be counteracted much more simply by detecting
attempts at violating your assumptions.

You can do hybrid approaches, and statically make a "best effort" and include
similar methods to trap stuff that breaks your assumptions and fall back to
JIT or emulation, but if you do that then there's a tradeoff between how much
dynamic stuff you need to be able to do before it's easier to just do
everything dynamically from the start.

The performance thing is not so easy to ascertain. JIT'ing code takes a bit of
time, but not much compared to the expected overall time the program will be
run afterwards. Static compilation can spend more time doing optimisations,
but JITs at least in theory have more information to work with: they can
detect the specific processor version and use specialised instructions or
alter instruction selection, for example, or could at least in theory even do
tricks like re-arranging data to get better cache behavior based on profiling
access patterns for the current run (I have no idea if any existing JITs
actually _do_ that).

~~~
jmhain
Thanks for the thorough response. I really appreciate it.

------
tluyben2
This is one excellent read! Thanks to the author for writing this down. Not
that I'm not interested in the NSA, but this is a welcome diversion. And
something I wanted to play with myself for a long time.

~~~
AndyKelley
Thank you. I do not write often, so this was a challenge for me. Constructive
criticism welcome.

~~~
Luc
Well you certainly rose to the occasion. I look forward to leisurely reading
this in detail.

Before I looked at the article I immediately wondered how you were going to
handle self-modifying code (running on the internal RAM of course). I guess
you didn't encounter that situation?

~~~
AndyKelley
That situation is covered in this section:
[http://andrewkelley.me/post/jamulator.html#dirty-assembly-
tr...](http://andrewkelley.me/post/jamulator.html#dirty-assembly-tricks)

Basically I embed an interpreter runtime and use it only when necessary, such
as in the case when the program jumps to RAM.
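
As a rough illustration only (this is not the jamulator source, and the names are invented), the decision at an indirect jump could look like the C sketch below. It relies on the standard NES CPU memory map: $0000-$1FFF is the 2 KB internal RAM plus its mirrors, and $8000-$FFFF is cartridge PRG-ROM for games without a mapper.

```c
#include <stdint.h>

typedef enum { RUN_COMPILED, RUN_INTERPRETER } ExecMode;

/* Hypothetical dispatcher: decide how to execute code at a jump target. */
ExecMode choose_exec_mode(uint16_t target)
{
    /* Internal RAM and mirrors: contents are only known at run time. */
    if (target < 0x2000)
        return RUN_INTERPRETER;

    /* Mapper-less PRG-ROM: fixed contents, so recompiled code can exist. */
    if (target >= 0x8000)
        return RUN_COMPILED;

    /* Registers, expansion area, cartridge RAM: be conservative. */
    return RUN_INTERPRETER;
}
```

Even for ROM targets, the compiled path only works if the disassembler actually found code at that address, so a real dispatcher would also need a lookup that can miss.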

Good point though. I should specifically mention self-modifying code.

Note that with NES games, self-modifying code is uncommon, because programs
are 32 KB ROM, and you only have 2 KB RAM. So you'd have to first copy your
subroutine from ROM to RAM, and then jump to it. And then you have that much
less RAM to work with.

However, some of the emulator test ROMs[1] people have made use this technique
to test every instruction.

[1]:
[http://wiki.nesdev.com/w/index.php/Emulator_tests](http://wiki.nesdev.com/w/index.php/Emulator_tests)

------
quux
This is interesting. If this project can output LLVM bitcode, then you could
also codegen to JavaScript with Emscripten and make a web-based version of an
NES game.

~~~
AndyKelley
That thought occurred to me as well, although I did not actually try it.

~~~
tibbon
Seems like a good followup post. I'd read that.

I mean, essentially it would make NES games run natively in the browser with
no emulator, right?

~~~
AndyKelley
It's not quite as sunshine and rainbows as that. For one, I had to embed an
interpreter runtime in the generated code to handle some dirty assembly tricks
that programmers can do. Further, the Picture Processing Unit and Audio
Processing Unit must be emulated. Even worse, there are some things that fool
the disassembler that many games do. And finally, this project never even
attempted to support mappers, which means that there are only about 8 or 9
notable games that this could even potentially support.

As is, it supports Super Mario Bros. 1 only.

~~~
boomlinde
There was a NES emulator for the Game Boy Advance that performed with a
surprising level of accuracy at full frame rate. I guess the CPU emulator was
a simple state machine, but the interesting thing in this context is how it
did the graphics and sound. Having a pretty advanced "PPU" of its own, with
the same basic types of features (sprites, tile indexing background layers),
it translated the PPU register writes (with appropriate scaling, I guess) to
native register writes. I'm not sure how things like collision interrupts were
handled, but presumably the GBA has similar functionality. For a javascript
emulator or static recompilation, similar methods could be used to set up a
bunch of shaders dealing with the basic functionality of the PPU (sprites,
tile layers etc). Certainly less accurate and probably missing a bunch of
corner cases, but it would definitely be the fastest approach to dealing with
graphics emulation.

------
Pxtl
... while it's not really useful for the NES, which is so old that emulating
it does not strain even the crudest modern processor, I'd be excited to see
this technique applied to newer consoles for lightweight mobile processors.

~~~
simias
I think it might actually work better for more modern hardware. Less
handcrafted ASM tricks, much more regular (compiler generated) machine code.
And of course no self-modifying code that would be extremely difficult to
recompile correctly.

Modern hardware (GPU, sound cards,...) is also very similar to what you find
on a PC so it would be more straightforward to port all this code. No messing
around with the framebuffer mid-scanline to create a cool effect, no quirky
special purpose hardware for very specific tasks.

~~~
drbawb
The other interesting thing about older hardware is that each cart could embed
special hardware that the NES could take advantage of. To play those games,
that extra hardware has to be emulated as well.

So far as I know, this is unheard of with current gen consoles.

The most recent example I can think of is for a handheld console: the Pokemon
Walker that was bundled with the newer Pokemon games for the Nintendo DS,
where I believe the IR hardware is embedded in the cart itself.

So in addition to worrying about rather interesting use of the stock hardware,
you also have to consider interesting use of _secondary_ hardware.

\---

The latest batch of consoles [Xbox One, PS4] look to be x86 PCs with high-
bandwidth memory; if that's the case, I'm hoping PC ports are more common, and
perhaps we'll even see a virtualization based approach to running next gen
games on standard PC hardware.

~~~
Guvante
I don't know how common add-ons were in the NES era, but they were very common
for the SNES (which was very similar to the NES power wise).

Games stopped embedding hardware when they went to discs. There is no way to
put a parallel processor into a DVD.

~~~
drbawb
Well, modern consoles _could_ still be extensible; but their hardware is
already so general purpose that there's not much point.

Best you could do w/ current gen technology is bundle a dongle w/ the game,
where the user plugs in some kind of co-processor through USB.

So far I haven't really seen anything like that -- the only USB dongles I've
seen bundled w/ games are for games like RockBand and they're just RF
receivers.

Aside from bandwidth concerns, and the poor sales of previous attempts (e.g.
Sega's whole 32X/CD add-on), there's nothing preventing a disc-based console
from having an external co-processor.

------
pilif
This is one of the best articles I've seen linked here in a long time. OP
covers so much stuff but simplifies exactly where needed so everything stays
understandable and there are no gaps (the "how to paint an owl" syndrome).

Thank you so much for writing and posting this. You made my day.

------
CountHackulus
I seem to remember someone doing this for the original Xbox and getting up to
the Halo "start game" screen. That was probably easier due to it being roughly
the same architecture. This is something else quite different and really neat.

------
kriro
I won't even pretend that I understand half of this but from a quick browse
this looks pretty interesting.

It seems very well written, too.

Filed away into my magic "ZOMG INTERESTING PROJECT IDEA" folder :D

~~~
AlexanderDhoore
Oh, man! I know that folder! I don't have one, but 20. I've switched from
bookmarks to actually writing it down on a piece of physical paper. And not
just the url. I write down a small explanation for my future self. I haven't
been bored in months :D

------
p_f
Very interesting article indeed. Some time ago I made something similar for
GameBoy games and ran into the same set of challenges (and ended up using
similar techniques). The ROM is decompiled and translated into C code, which
is then compiled and linked with runtime libraries. Jump tables and indirect
jumps often need some manual fixing. I went up to the point where I can
convert some simple games (without memory mappers) into binaries running on
iOS and X. I did not have the time to document the tools but if anyone is
interested to continue that work just let me know.

I guess one of the advantages of static recompilation is that you can port old
games to new platforms if you hold the copyright of the game itself, but
without running into issues with the manufacturer of the console
(Nintendo)--but I might be wrong. You could also conceivably improve the game
more easily during conversion (e.g., incorporate higher-resolution graphics).
Finally, you could potentially have the resulting code distributed via app
stores that do not allow general-purpose emulators.

------
shanselman
This article is a joy. What a wonderfully written and thorough explanation of
the space. I live for this stuff.

------
lucian1900
There is some research on this
[http://www.pagetable.com/docs/libcpu/26C3-libcpu.pdf](http://www.pagetable.com/docs/libcpu/26C3-libcpu.pdf)

It's a very interesting topic. It may be our best chance at preserving
software.

~~~
darkf
It's really unfortunate that libcpu didn't take off. Last I checked, they got
nowhere with no contributors, and now their site 503s. It was an interesting
project.

~~~
lucian1900
I think it's more than that. It is not yet clear that this approach can work
in the general case without emulation: the halting problem may be in the way.

~~~
AndyKelley
Correct. If you think about it, the program could prompt the user for the
address to jump to, and then the program could go there. In fact, that's
exactly what some games (accidentally) do:
[http://tasvideos.org/2341M.html](http://tasvideos.org/2341M.html)

It is impossible to solve this statically. This is why dynamic recompilation
is more practical.

~~~
BHSPitMonkey
Thanks for that link; That hack (and the accompanying video) has just become
perhaps my favorite thing, ever. Also, it led me to this similarly incredible
hack as well:

[http://www.youtube.com/watch?v=D3EvpRHL_vk](http://www.youtube.com/watch?v=D3EvpRHL_vk)

------
RegEx
Please forgive me for the bikeshedding, but I have a quick question as a C
novice: Is the equality check in the following necessary?

    
    
        if (foo & 0x80 == 0x80) {
    

I thought if you're checking a bit simply anding would be enough (since
everything else would be zeros). In other words, could we just use

    
    
        if (foo & 0x80) {
    

If so, then it seems like this would be the preferred form to avoid the
precedence issue presented in the article.
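
For reference, a small C sketch of the three forms in question. The function names are made up for this example. In C, `==` binds tighter than `&`, so the unparenthesized expression is parsed as `foo & (0x80 == 0x80)`, i.e. `foo & 1`; `as_parsed` spells that out explicitly so it compiles without the warning.

```c
#include <stdbool.h>
#include <stdint.h>

/* What `foo & 0x80 == 0x80` actually means: it tests bit 0, not bit 7. */
bool as_parsed(uint8_t foo)
{
    return (foo & (0x80 == 0x80)) != 0;
}

/* What the author meant: test bit 7, with explicit parentheses. */
bool intended(uint8_t foo)
{
    return (foo & 0x80) == 0x80;
}

/* The shorter form: equivalent to `intended` for a single-bit mask. */
bool short_form(uint8_t foo)
{
    return (foo & 0x80) != 0;
}
```

The short form and the explicit comparison only agree because the mask has exactly one bit set; with a multi-bit mask such as 0xC0, `(foo & 0xC0) == 0xC0` requires both bits while `foo & 0xC0` accepts either.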

~~~
jlgreco
I am seeing that as basically defensive coding. Similar to how you may do
this:

    
    
      switch (mode) {
          case FOO: ... break;
          case BAR: ... break;
      }
    

Do you need the very last break? Technically no, but including it so that you
don't forget it when you need it later can be a good idea.

(Also, I think explicitly using comparison operators in conditionals _might_
be the idiomatic thing to do in Go. Someone correct me here if I'm wrong about
that.)

~~~
RegEx
The example was using C, not Go.

~~~
jlgreco
Aye, though the same code later appears in Go.

------
chadseibert
I agree; this is quite amazing work! I've been meaning to do something like
this; perhaps generate a native executable or something similar.

------
grapjas
Interesting stuff, and I like the writing style.

------
patresi
I had a similar idea that I never really put into practice: doing some sort of
static recompilation, but to higher-level code, in order to make open source
versions of some NES games that other people could use to do the same.
Accuracy would not be as big a concern as in a pure emulation project.

I never got past the reading phase.

------
QEDturtles
I've been meaning to port some classic games over and utilize better input
methods for a while. It would be fun to be able to load old GB games on my
Android phone and tap the menus instead of navigating them with the DPad.
Thanks for this, I was looking for something to do with my Friday night!

------
leehro
This was fantastic, thank you.

Static recompilation seemed like an obvious solution to emulating games in
theory, but it was fascinating to see just what it would take. Also loved to
read about the clever tricks from 30 years ago and how we can or can't deal
with them.

------
saejox
Someone should recompile ps2 games to x86.

------
0xe2-0x9a-0x9b
The plans in the Conclusion section look interesting.

------
dschiptsov
What is amazing here is not the techy stuff, but productivity and clear
understanding of concepts. Of course, such shape (of mind) comes from years of
daily practice. That's why I know I will never write anything good - I didn't
spend enough time practicing. Practice leads to perfection (not reading HN).

And look, the guy is not using any IDE or proprietary tools - just a terminal
window and command line (what a horror!) tools. Looks like they are good
enough..)

All that 9-to-5 Java coders should at least commit suicide.) More seriously -
this is very clear illustration for startup founders of what a huge gap lies
between mediocre and a top performer.

Convincing a top performer(s) to work for you is the real secret of a
successful startup. Even pg (god forbid!) could be not so successful without
rtm.))

~~~
spc476
It depends on what you are used to. I started programming in the 1980s and the
first editor I used was pretty much like EDLIN
([http://en.wikipedia.org/wiki/Edlin](http://en.wikipedia.org/wiki/Edlin)).
Think of an unholy cross between the Unix commands cat and vi (line based and
modal).

And, except for code completion, there isn't anything an IDE can do that can't
be done via the command line (just not as conveniently). Then again, I don't
program in Java.

~~~
dschiptsov
vi is a small miracle of software engineering.

