I also started out on the Apple ][, writing my own versions of the games I saw in the arcades (Donkey Kong, Pac-Man, Joust, etc.).
My kids are old enough to be curious about how games are actually built, and we've had fun building an adventure game in BASIC and a version of Asteroids in C. We had planned to build something this summer, so this book of yours is an ideal next step we can work through. I think they'll really get a kick out of seeing how it all goes together. (Okay, okay, I admit it. Nostalgia hit me hard and I just want an excuse to fire up my old Apple ][ again.) Regardless, I just hit "order" on Amazon.
Add VT100/ANSI terminal emulation and you can run Wordstar, Rogue, dBase 2 etc.
If you're subtracting the CPU emulator, what's the point of doing any of this?
On more modern systems, the CPU is only a small part of the equation. You might have to emulate the graphics pipeline, the audio processor, the input mechanisms, and so on. This makes writing an emulator far trickier, as those components are not nearly as straightforward to implement and are often poorly documented.
If you look at the really modern platforms (like the Switch or 3DS), CPU emulation is barely 20% of the work.
After that, I doubled down and wrote a 65816 emulator for it (that's the CPU the SNES and Apple IIgs used). Emulator writing is almost a zen-like experience.
At the moment (well, for years really) I have been on a 6502 binge, but I want to get into m68k programming as well (and the Amiga ecosystem specifically).
The best way to learn how a CPU works is to write one! I.e., write HDL code for it and test it in a hardware simulator.
I have designed a miniature application-specific processor, a compiler* and IDE for it, a disassembler, and an emulator too.
* To be more precise, I hacked together a compiler for it on top of the C#/.NET compiler.
But you can't share hardware designs (e.g., FPGA bitstreams) like you can software binaries or software source code. A hardware design will require a specific FPGA, board, and probably even a particular version of the synthesis tools if you want to build from source.
The actual processor was implemented on a Xilinx FPGA; it was about 2K lines of HDL code, implementing about 25 application-specific instructions that I had also designed.
The compiler, disassembler, IDE, and emulator I wrote in C# on .NET.
While I did not use these, there are also ASIPs: https://en.m.wikipedia.org/wiki/Application-specific_instruc...
Start with CHIP-8; it's the gateway drug to the others.
I’m gonna finish it up with Super-CHIP-48, and then move to the Game Boy. The progression feels natural and it’s fun and rewarding.
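Part of why CHIP-8 makes such a friendly first target: every instruction is a fixed two bytes, so the whole fetch/decode loop fits on one screen. A minimal sketch in C (the opcode encodings are the standard CHIP-8 ones; only a few opcodes are handled here, and the struct layout is my own):

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  mem[4096];
    uint8_t  V[16];     /* registers V0..VF */
    uint16_t pc;        /* program counter; programs load at 0x200 */
} Chip8;

void chip8_init(Chip8 *c) {
    memset(c, 0, sizeof *c);
    c->pc = 0x200;
}

/* Execute one instruction: fetch two bytes, decode by high nibble. */
void chip8_step(Chip8 *c) {
    uint16_t op = (uint16_t)((c->mem[c->pc] << 8) | c->mem[c->pc + 1]);
    c->pc += 2;
    uint8_t x = (op >> 8) & 0xF;
    switch (op >> 12) {
    case 0x6:                     /* 6XNN: VX = NN */
        c->V[x] = op & 0xFF;
        break;
    case 0x7:                     /* 7XNN: VX += NN (no carry flag) */
        c->V[x] += op & 0xFF;
        break;
    case 0x1:                     /* 1NNN: jump to NNN */
        c->pc = op & 0xFFF;
        break;
    default:
        /* ~30 more opcodes: draw, keypad, timers, ... */
        break;
    }
}
```

The rest of the system is similarly small: a 64x32 monochrome framebuffer, two countdown timers, and a 16-key pad.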
Real games on the console didn't have to worry about timing variances, so the emulator can't introduce any; that means it needs to do a whole ton of extra work to keep the various hardware bits running more or less in sync with each other.
When two devices are communicating over a port, only the operations that write to the port matter. If something is spinning waiting on a signal (assuming the repeated reads don't affect the partner), you can simulate that by sleeping until the partner writes to the port (or another internal event occurs), updating any internal counters with how many cycles would have been burned.
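That fast-forward can be sketched in a few lines: instead of interpreting the polling loop one iteration at a time, jump the device's cycle counter straight to the partner's write and credit the cycles the loop would have burned. (The struct, names, and 7-cycle loop cost below are all invented for illustration.)

```c
#include <stdint.h>

typedef struct {
    uint64_t cycles;  /* total cycles this device has "executed" */
    uint8_t  port;    /* value last written to the port by the partner */
} Device;

/* Assume the guest is spinning in a polling loop costing LOOP_CYCLES
 * per iteration. Rather than stepping each iteration, compute how many
 * whole iterations fit before the partner's write lands, and credit
 * them all at once. */
enum { LOOP_CYCLES = 7 };

void fast_forward_spin(Device *d, uint64_t partner_write_time, uint8_t written) {
    if (partner_write_time > d->cycles) {
        uint64_t wait  = partner_write_time - d->cycles;
        uint64_t iters = (wait + LOOP_CYCLES - 1) / LOOP_CYCLES; /* round up */
        d->cycles += iters * LOOP_CYCLES;
    }
    d->port = written;  /* the loop now observes the value and exits */
}
```

The rounding up matters: the device can only observe the new value at an iteration boundary, so its clock lands just past the write, not exactly on it.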
Where it gets more difficult is interrupts, which in the above paradigm could come at any time. If they come from timers, you can know before entering a "basic block" (the JITted chunk, probably larger than a classic basic block) whether one would fire during the block, and just single-step-interpret the rest of those cycles. If they're irregular... you might either checkpoint regularly and revert to a previous checkpoint when an interrupt comes in during a block, or batch up changes and only commit them if you hit the end of the block with no interrupt.
One big problem with this idea is that RAM is a device, probably not connected to just a single-core CPU! You can get around this a little by modelling RAM as separate ranges over time: first assume writes are only visible to the device that wrote them; then, if another device reads or writes the range, split the region, treat the ranges as separate devices (and treat that event as an interrupt to the first device).
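The checkpoint-and-revert idea from above is easy to prototype when the CPU state is small: snapshot it before running a block, and if an interrupt fired mid-block, restore the snapshot and fall back to single-stepping. A toy sketch, with a made-up register file and made-up example block:

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

typedef struct {
    uint32_t regs[8];
    uint32_t pc;
} CpuState;

/* Run one JITted block speculatively. `run_block` advances the state;
 * `irq_pending` reports whether an interrupt arrived while it ran.
 * On interrupt, revert to the checkpoint so an interpreter can
 * single-step to the precise interrupt boundary. */
bool run_block_checkpointed(CpuState *cpu,
                            void (*run_block)(CpuState *),
                            bool (*irq_pending)(void)) {
    CpuState checkpoint;
    memcpy(&checkpoint, cpu, sizeof checkpoint);  /* cheap: state is tiny */
    run_block(cpu);
    if (irq_pending()) {
        memcpy(cpu, &checkpoint, sizeof *cpu);    /* roll back */
        return false;  /* caller falls back to the interpreter */
    }
    return true;       /* block committed */
}

/* Example "JITted block" and interrupt sources for demonstration. */
static void example_block(CpuState *c) { c->regs[0] = 42; c->pc += 16; }
static bool no_irq(void)  { return false; }
static bool yes_irq(void) { return true;  }
```

In a real emulator the expensive part is not the register file but memory writes; that is where the batch-and-commit variant (buffering stores until the block ends cleanly) starts to look attractive.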
Man now I want to write this!
Cycle accurate: this instruction takes four cycles, so on this cycle push the address onto the multiplexed A/D bus, on the next cycle read the data, and so on.
So there's more fast-forwarding in timing-accurate vs. cycle-accurate emulation. Typically a timing-accurate emulator also doesn't implement things like wait states injected by other peripherals accessing the main bus.
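To make the cycle-accurate case concrete, here is a toy model of a read over a multiplexed address/data bus, advanced one clock cycle per call (this is a generic state machine, not any specific chip; all names are invented):

```c
#include <stdint.h>

typedef enum { T_ADDR, T_DATA, T_DONE } Phase;

typedef struct {
    Phase    phase;
    uint16_t bus;         /* the shared, multiplexed A/D lines */
    uint16_t addr_latch;  /* what the "memory" latched in the address phase */
    uint8_t  result;
    uint64_t cycle;
} BusRead;

/* One call = one clock cycle. Cycle 1: the CPU drives the address onto
 * the shared lines and the memory latches it. Cycle 2: the memory
 * drives the data back on the same lines. */
void bus_tick(BusRead *b, uint16_t addr, const uint8_t *mem) {
    b->cycle++;
    switch (b->phase) {
    case T_ADDR:
        b->bus        = addr;
        b->addr_latch = addr;
        b->phase      = T_DATA;
        break;
    case T_DATA:
        b->bus    = mem[b->addr_latch];
        b->result = (uint8_t)b->bus;
        b->phase  = T_DONE;
        break;
    case T_DONE:
        break;  /* a wait state from another peripheral would stall here */
    }
}
```

A timing-accurate emulator would replace all of this with `result = mem[addr]; cycle += 2;` and only keep the cycle count.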
In general, for these very old systems (anything 8-bit these days I'd say, and even 16-bit if you're only targeting the desktop), a recompiler is probably more trouble than it's worth.
I guess that's also because modern CPUs can easily handle very old systems, which were relatively slow, and hence software written for them had to be efficient.
However, when emulating e.g. ARM on x86 (or vice versa), I guess recompiling optimizations may be necessary to keep things efficient. Of course, emulating JITted code can be a pain because of self-modifying code. Wondering: does VirtualBox use such an approach?
It's that, but actually worse: raw power aside, hertz for hertz, modern systems are generally much more recompiler-friendly than a Game Boy or an Atari. The reason is that modern software is written using high-level abstractions: you have well-defined firmware interfaces, self-modifying code has become the exception instead of the rule, etc. That lets you make simplifying assumptions that can speed things up greatly, for instance "I don't have to worry about self-modifying code unless I encounter a call to the firmware's flush_cache method". That's better than "I need to be careful with every write to memory because it might potentially write to code instead of data or registers".
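The difference shows up directly in the recompiler's store path. For the old-school case, every single store has to be checked against pages holding translated code, so a block can be thrown away the moment the guest writes over it. A toy sketch of that bookkeeping (the page table and machine layout are invented for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

#define N_PAGES 256  /* toy machine: 64 KiB address space, 256-byte pages */

typedef struct {
    uint8_t mem[N_PAGES * 256];
    bool    page_has_code[N_PAGES]; /* set when a block is translated there */
    int     invalidations;          /* how many blocks we threw away */
} Machine;

/* Old-system store path: EVERY write checks for self-modifying code. */
void store8(Machine *m, uint16_t addr, uint8_t val) {
    m->mem[addr] = val;
    if (m->page_has_code[addr >> 8]) {
        /* a translated block lived here: discard it, re-translate later */
        m->page_has_code[addr >> 8] = false;
        m->invalidations++;
    }
}
```

On a modern firmware-mediated platform, the same invalidation logic can instead hang off the one well-defined cache-flush entry point, and the hot store path stays a plain write.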
Back in the day it wasn't rare to find timing loops that required cycle-accurate emulation, for instance because the drawing code expected the delay from an interrupt until a certain instruction executed to be precisely n cycles. I doubt many people do that on a PS4; actually, I would be surprised if you could make something like that work across all the various hardware revisions and firmware updates.
So older hardware generally requires the emulator to be a lot more strict which often means that you end up with a comparatively slow recompiler. Given the complexity of a recompiler compared to a simple interpreter it's simply not worth it anymore for these old systems.
Depends on the ARM in question and how dependent on cycle accuracy the rest of the system is. Anything more than ~100MHz you pretty much need to JIT, but a surprising number of systems don't reach that. So like a DS is generally interpreted, but a 3DS is generally JITed.
> Wondering: does VirtualBox use such an approach?
VirtualBox uses a combo approach depending on the host.
The 32-bit host path uses a very simple JIT that runs in ring 0 and runs all guest code in ring 3. The JIT hides itself using the segment registers. It's mainly there to emulate the few instructions that don't fulfill the Popek & Goldberg requirements on x86 (i.e., instructions that act differently in ring 0 vs. ring 3 but don't trap). It also plays with the page tables to know when self-modification occurs.
The 64-bit host path is way easier: pretty much just relying on hardware virtualization, and interpreting a single instruction here and there when you get a trap out of hardware virtualization into the hypervisor.
That was really awesome to work on though. I probably learned more working on that than anything else. It was the first time I really felt like I understood entirely how a computer actually works.
/* Example 2: turn on one LED on the control panel */
volatile char * LED_pointer = (volatile char *) 0x2089; //volatile: memory-mapped I/O must not be optimized away
char led = *LED_pointer; //read the current LED states
led = led | 0x40; //set LED controlled by bit 6
*LED_pointer = led; //write the result back to the register