

How do emulators work and how are they written? - alcuadrado
http://stackoverflow.com/questions/448673/how-do-emulators-work-and-how-are-they-written/448689#448689

======
daeken
Wow, on HN again. While I'm proud of that answer, the real bummer was that I
edited it too many times (started with a tiny answer and filled it in from
there) and it became Community Wiki. I got all of 4 or 5 upvotes before it
went CW, so got 40-50 rep out of it instead of the 5290 I would've gotten
otherwise. Glad people have found it useful, though.

~~~
cristoperb
From the end of the SO answer: "I'm glad this post has been helpful, and I'm
hoping I can get off my arse and finish up my book on the subject by the end
of the year/early next year."

Are you still working on the book?

~~~
daeken
I am, but it's been slow going. Between a new job and a social life, I have
little free time and energy to work on it. At this point, it'd be tough to get
it done this year.

------
bane
I had the great opportunity in my undergrad college compiler class to write an
entire ecosystem to understand better how some of this stuff works. We didn't
just write a compiler, we wrote an emulator for a simple, notional
architecture. The whole thing was simple enough that we could write the
compiler toolkit, the system emulator/VM and run it in a few days.

Then for extra credit we could rewrite the whole thing (or part of it) in
another language. And/or get a guaranteed A if we could make the whole shebang
self-hosting.

It was a really great way of understanding the basics of a system
architecture.

As for the emulator, it was pretty primitive, but basically it loaded the
compiled program code into memory, then would read an operation's worth of
bytes, look the operation up in some kind of lookup table (or similar) that
mapped the emulated system's instructions to local system instructions (say,
x86), and execute that operation.

Sometimes the mapping resulted in several operations on the host side for
what the emulated system might execute in one.

eg. ADD R1, R2, R3

might add the contents of registers r2 and r3, then put the result into r1. In
x86 it would be more like

    
    
       MOV CX, AX    ; r1 = r2   (assuming r1, r2, r3 live in CX, AX, BX)
       ADD CX, BX    ; r1 += r3  (leaves r2 and r3 untouched)
    

or some such.

(actually we did it a little higher level than that, but that's the idea).
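
The fetch / look up / execute loop described above can be sketched roughly as
follows. This is only an illustration: the 4-byte instruction encoding, the
opcode numbers, and the handler names are all invented here, not taken from
the class project.

```python
# Hypothetical machine: every instruction is 4 bytes, [opcode, dest, a, b].
OP_HALT, OP_LOADI, OP_ADD = 0, 1, 2

def op_loadi(regs, d, a, b):
    regs[d] = a                               # LOADI Rd, imm (b is unused)

def op_add(regs, d, a, b):
    regs[d] = (regs[a] + regs[b]) & 0xFFFF    # ADD Rd, Ra, Rb with 16-bit wrap

# Lookup table mapping emulated opcodes to host-side handlers.
HANDLERS = {OP_LOADI: op_loadi, OP_ADD: op_add}

def run(program):
    memory = bytes(program)                   # "load the program into memory"
    regs = [0] * 4
    pc = 0
    while True:
        opcode, d, a, b = memory[pc:pc + 4]   # read one operation's worth of bytes
        pc += 4
        if opcode == OP_HALT:
            return regs
        HANDLERS[opcode](regs, d, a, b)       # look up the handler and execute it

program = [
    OP_LOADI, 2, 40, 0,   # R2 = 40
    OP_LOADI, 3, 2, 0,    # R3 = 2
    OP_ADD,   1, 2, 3,    # R1 = R2 + R3
    OP_HALT,  0, 0, 0,
]
print(run(program))       # [0, 42, 40, 2]
```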

So an emulator needs to provide all of the various registers and operations
and such so that the code can execute. It likely also has to emulate the
memory architecture and translate I/O appropriately. Some older systems, for
example, could only output to a printer terminal or a simple status display;
on a modern system you have to translate calls to those output devices into
equivalent ones on your system (like the screen).
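
One common way to do that translation is to route every memory access through
a bus object and intercept the addresses that belong to devices. A minimal
sketch, assuming a made-up printer register at `0xFF00` and using a
`StringIO` as a stand-in for the host screen:

```python
import io

PRINTER_PORT = 0xFF00    # hypothetical address of the emulated printer register

class Bus:
    def __init__(self, out):
        self.ram = bytearray(0x10000)
        self.out = out                   # host-side stand-in for the printer

    def read(self, addr):
        return self.ram[addr & 0xFFFF]

    def write(self, addr, value):
        addr &= 0xFFFF
        if addr == PRINTER_PORT:
            # The emulated program thinks it is printing; we redirect to the host.
            self.out.write(chr(value & 0xFF))
        else:
            self.ram[addr] = value & 0xFF

screen = io.StringIO()
bus = Bus(screen)
for ch in b"HELLO":
    bus.write(PRINTER_PORT, ch)          # device write, never reaches RAM
bus.write(0x0010, 0x2A)                  # ordinary RAM write
print(screen.getvalue())                 # HELLO
```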

More complex systems may need to emulate several processors. The SNES and
Amiga, for example, have a handful of chips that all need to be emulated, and
much of the software assumes a particular timing in the interaction of those
components that can be very challenging to get right. Keeping track of all
that while running at reasonable performance can require fast hardware.
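
A simple (and not very accurate) way to keep several chips in step is to run
the CPU one instruction at a time and then let every other chip catch up by
the number of cycles that instruction took. The classes, cycle costs, and
clock dividers below are placeholders, not figures for any real console.

```python
class Cpu:
    """Stand-in CPU: each 'instruction' only reports how many cycles it took."""
    def __init__(self, costs):
        self.costs = list(costs)
        self.pc = 0

    def step(self):
        cost = self.costs[self.pc % len(self.costs)]
        self.pc += 1
        return cost

class Chip:
    """Stand-in video/audio chip that ticks once every `divider` master cycles."""
    def __init__(self, divider):
        self.divider = divider
        self.pending = 0
        self.ticks = 0

    def catch_up(self, cycles):
        self.pending += cycles
        while self.pending >= self.divider:
            self.pending -= self.divider
            self.ticks += 1

def run(cpu, chips, budget):
    elapsed = 0
    while elapsed < budget:
        cycles = cpu.step()              # CPU runs ahead by one instruction
        for chip in chips:
            chip.catch_up(cycles)        # everything else catches up afterwards
        elapsed += cycles
    return elapsed

cpu = Cpu([2, 3, 4])
video, audio = Chip(4), Chip(3)
elapsed = run(cpu, [video, audio], budget=18)
print(elapsed, video.ticks, audio.ticks)   # 18 4 6
```

With this scheme the other chips can lag the CPU by up to one instruction,
which is exactly the kind of slack that timing-sensitive software notices.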

These days though, not many emulators use the simple execution mechanism I
outlined above. Many of them have sophisticated code profiling and execution
caches that can significantly speed up the emulation.
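
To give a flavour of what an execution cache does: decode a straight-line
block of guest code once, keep the result keyed by its start address, and
reuse it every time execution returns there. In this sketch Python closures
stand in for the host machine code a real recompiler would emit, and the
three-instruction guest ISA is invented for the example.

```python
# Invented guest instructions, as (op, a, b) tuples:
#   ("addi", r, n)       regs[r] += n
#   ("jnz",  r, target)  jump to target if regs[r] != 0 (ends a block)
#   ("halt", 0, 0)       stop (ends a block)

def translate_block(program, start):
    """Decode from `start` up to the next branch or halt, exactly once."""
    ops = []
    pc = start
    while True:
        op, a, b = program[pc]
        pc += 1
        if op == "addi":
            ops.append((a, b))
        elif op == "jnz":
            exit_info = ("jnz", a, b, pc)    # pc is now the fall-through address
            break
        else:
            exit_info = ("halt",)
            break

    def block(regs):
        for r, n in ops:
            regs[r] += n
        if exit_info[0] == "jnz":
            _, r, target, fallthrough = exit_info
            return target if regs[r] != 0 else fallthrough
        return None                          # halt
    return block

def run(program):
    regs = [0, 0]
    cache = {}                               # start address -> translated block
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = cache[pc] = translate_block(program, pc)
            translations += 1
        pc = block(regs)
    return regs, translations

program = [
    ("addi", 0, 5),     # 0: counter = 5
    ("jnz", 0, 2),      # 1: enter the loop
    ("addi", 1, 10),    # 2: total += 10
    ("addi", 0, -1),    # 3: counter -= 1
    ("jnz", 0, 2),      # 4: loop while counter != 0
    ("halt", 0, 0),     # 5
]
regs, translations = run(program)
print(regs, translations)   # [0, 50] 3
```

The loop body runs five times but is only decoded once; that reuse is where
the speedup comes from.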

In these terms, emulating a system, a VM (like Java), or some other runtime
environment (like JavaScript and the V8 runtime, for example) isn't really all
that different, and there's huge crossover in the theoretical concepts between
these areas.

~~~
silentbicycle
_The Elements of Computing Systems_ (<http://www1.idc.ac.il/tecs/>) follows a
similar approach, building an emulated computer up from NAND gates. Highly
recommended!

~~~
bane
Great book! I really love this approach to teaching this kind of subject.
Seeing the whole stack, top to bottom, really helps tie together _so_ many
different threads that are generally left untied during a CS education.

------
Osmose
<http://www.romhacking.net/> is an amazing resource if you want to find
documents on the inner workings of older game consoles.

The SO post already links to bsnes, which does an excellent job of balancing
readability and accuracy. It's still a little hard to grok the code, but
leagues easier than tackling something like SNES9x.

I would also recommend glancing through the vNES source code; it's much
simpler than bsnes and is very easy to understand for the most part:
<http://www.thatsanderskid.com/programming/vnes/index.html>

------
Jacquass12321
This has invariably been linked before, but
<http://imrannazar.com/GameBoy-Emulation-in-JavaScript:-The-CPU> also
addresses the basics of emulator design.

------
zandorg
I find undocumented opcodes most interesting. They were literally holes in the
6502 circuitry. People used them on the C64 to get more speed out of their
assembly code. Early C64 emulators couldn't cope with those games/demos.
Luckily, they are documented in disk magazines, etc, so they've been
implemented now.
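
As an illustration, one of the better-known undocumented 6502 opcodes is LAX
zero page (`$A7`), which sits next to LDA (`$A5`) and LDX (`$A6`) and loads
both A and X. In an interpreter, supporting it is just one more entry in the
opcode table. The class below is a bare-bones sketch covering only these three
opcodes, not a complete 6502 core.

```python
class Cpu6502:
    def __init__(self):
        self.a = self.x = 0
        self.zero = self.negative = False
        self.pc = 0
        self.mem = bytearray(0x10000)

    def _set_nz(self, value):
        self.zero = value == 0
        self.negative = bool(value & 0x80)

    def _zp_operand(self):
        addr = self.mem[self.pc]             # one-byte zero-page address
        self.pc = (self.pc + 1) & 0xFFFF
        return self.mem[addr]

    def lda_zp(self):                        # $A5, documented
        self.a = self._zp_operand()
        self._set_nz(self.a)

    def ldx_zp(self):                        # $A6, documented
        self.x = self._zp_operand()
        self._set_nz(self.x)

    def lax_zp(self):                        # $A7, undocumented: loads A and X
        self.a = self.x = self._zp_operand()
        self._set_nz(self.a)

    def step(self):
        opcode = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        handler = self.OPCODES.get(opcode)
        if handler is None:
            # What an emulator without the undocumented entries runs into.
            raise NotImplementedError(f"opcode ${opcode:02X}")
        handler(self)

    OPCODES = {0xA5: lda_zp, 0xA6: ldx_zp, 0xA7: lax_zp}

cpu = Cpu6502()
cpu.mem[0x10] = 0x80
cpu.mem[0:2] = bytes([0xA7, 0x10])           # LAX $10
cpu.step()
print(hex(cpu.a), hex(cpu.x), cpu.negative)  # 0x80 0x80 True
```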

~~~
Zaak
Also, the 6502 has been reverse engineered from micrographs, so perfect
transistor-level emulation can be done.

<http://www.pagetable.com/?p=517>

~~~
T-R
byuu is also doing this for the SNES with bsnes - he's even had all of the
enhancement chips like the DSP-1~4 and CX4 decapped to achieve cycle-perfect
emulation.

<http://byuu.org/snes/donations/>

~~~
thristian
Well, the decapped chips have been photographed, but they haven't been made
into working diagrams like that 6502 one.

The important thing about the decapped chips is that any on-board ROM has been
dumped (electrically, not visually) so that they can be emulated with an
opcode interpreter, just like other emulators.

------
freedrull
Anyone know which of the listed ways of processor emulation bsnes uses?

~~~
T-R
I'd imagine bsnes is somewhere between interpretation and dynamic
recompilation.

Static recompilation would be really difficult on systems like the SNES and
GBA, which have two processor modes (8bit 6502 emulation/16bit 65816, and
16bit THUMB/32bit ARM respectively), so the width of any given instruction is
dependent on the processor state. Dynamic recompilation would be faster than
instruction-by-instruction interpretation, but full dynamic recompilation
(with subroutine caching and all) would probably get pretty difficult to time
(I could be wrong), and cycle accurate timing is a stated goal for bsnes (to
the point where byuu had the enhancement chips like the CX4 decapped).
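
To make the width problem concrete: on the 65816, `LDA #imm` (opcode `$A9`)
takes a one-byte operand when the M flag is set and a two-byte operand when it
is clear, so the same bytes split into different instructions depending on
processor state. A small sketch, handling only `$A9` and NOP (`$EA`); the
helper names are made up:

```python
LDA_IMM = 0xA9    # 65816 LDA #immediate
NOP = 0xEA

def instruction_length(opcode, m_flag):
    if opcode == LDA_IMM:
        return 2 if m_flag else 3        # operand is 8-bit or 16-bit
    if opcode == NOP:
        return 1
    raise NotImplementedError(f"opcode ${opcode:02X}")

def split(code, m_flag):
    """Cut a byte string into instructions, assuming M never changes."""
    out, pc = [], 0
    while pc < len(code):
        n = instruction_length(code[pc], m_flag)
        out.append(code[pc:pc + n].hex())
        pc += n
    return out

code = bytes([0xA9, 0xEA, 0xEA])
print(split(code, m_flag=True))     # ['a9ea', 'ea']   LDA #$EA ; NOP
print(split(code, m_flag=False))    # ['a9eaea']       LDA #$EAEA
```

A static recompiler would have to know the value of M at every address ahead
of time, which it generally cannot.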

bsnes has taken the low-level route because a decent bit of the software for
the SNES is really timing sensitive - even that black circle effect at the end
of the level in Super Mario World is done by changing values between
scanlines. For systems like N64 on the other hand, where the software is less
timing sensitive, high level emulation techniques are more common, for the
sake of speed increases.
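
A toy model of why mid-frame register changes matter: the renderer below
re-reads a hypothetical window register at the start of every scanline, so
writes made between scanlines change the shape being drawn. The resolution
and register layout are invented and have nothing to do with the real SNES
PPU.

```python
WIDTH, HEIGHT = 16, 8

def render_frame(per_line_writes):
    """per_line_writes maps scanline -> (left, right) written before that line."""
    left, right = 0, 0                   # window starts closed: all black
    rows = []
    for y in range(HEIGHT):
        if y in per_line_writes:
            left, right = per_line_writes[y]   # register changed between scanlines
        rows.append("".join("." if left <= x < right else "#"
                            for x in range(WIDTH)))
    return rows

writes = {2: (6, 10), 3: (4, 12), 4: (4, 12), 5: (6, 10), 6: (0, 0)}
frame = render_frame(writes)
print("\n".join(frame))
```

An emulator that drew the whole frame at once using only the last value
written would show a solid black screen instead of the hole.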

~~~
thristian
As best I understand it, bsnes is purely a bytecode interpreter. It does fancy
tricks with stack-swizzling to quickly switch between different parts of the
code to get the timing right, but it never ever generates instructions for the
host CPU architecture.
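
The general idea of switching between components mid-instruction can be
imitated with Python generators, which suspend and resume in place much like
cooperative threads. This only shows the scheduling pattern; it is not bsnes
code, and the cycle counts are placeholders.

```python
def cpu(log):
    """Each yield hands control back after the given number of master cycles."""
    while True:
        log.append("cpu: fetch opcode")
        yield 6
        log.append("cpu: bus write")     # second half of the same instruction
        yield 6

def ppu(log):
    while True:
        log.append("ppu: draw dot")
        yield 4

def run(master_cycles):
    log = []
    threads = [cpu(log), ppu(log)]
    clocks = [0, 0]                      # time at which each thread next runs
    while min(clocks) < master_cycles:
        i = clocks.index(min(clocks))    # resume whichever thread is furthest behind
        clocks[i] += next(threads[i])
    return log

for line in run(12):
    print(line)
```

Note that the PPU gets to run between the CPU's opcode fetch and its bus
write, which a one-instruction-at-a-time loop cannot express.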

~~~
lscharen
Actually, I think it's more accurate to say that bsnes (at least for the 65816
core) is a pipeline interpreter.

Each 65816 opcode takes between 2 and 8 cycles to execute (more for the MVN
and MVP opcodes) and there are rules about when interrupts can be asserted --
bsnes correctly emulates these processor details.

Also, I believe bsnes properly takes into account the read and write
states of the memory bus. This is also an intra-opcode level of detail.

AFAIK, some of the other CPU cores (esp. the DSP-n and Supergameboy cores) are
straightforward opcode interpreters.

------
CWIZO
Previous discussion of the same link:
<http://news.ycombinator.com/item?id=1350343>

