
A new cycle-stepped 6502 CPU emulator - ingve
https://floooh.github.io/2019/12/13/cycle-stepped-6502.html
======
eatonphil
This is a great introduction to emulation in general.

While I was trying to figure out how to write my first emulator I thought it
would be simpler to start with a Game Boy Advance or an 8-bit processor like
the 6502 or the PICO-8.

After a few false starts, the easiest path turned out to be a simple x86
emulator, since I already had all the tools locally (assemblers, disassemblers,
compilers, etc.) and a working reference environment (my laptop).

Here's the walkthrough I wrote for building an x86 emulator in JavaScript that
you can use to run simple C programs compiled with GCC/Clang:

[http://notes.eatonphil.com/emulator-basics-a-stack-and-
regis...](http://notes.eatonphil.com/emulator-basics-a-stack-and-register-
machine.html)

Now that I've done this, though, I'm still trying to get into writing a GBA emulator.

~~~
jsmolka
> I thought it would be simpler to start with a Game Boy Advance or an 8-bit
> processor like the 6502

The 8-bit processor will be much more approachable. I have worked on my GBA
emulator for around one year now. Before that I tried the classic GB and its
8-bit CPU was so much easier to implement. The GBA's ARM7TDMI alone took me 3
months to complete, even with extensive testing [1].

[1]
[https://github.com/jsmolka/eggvance/tree/master/tests](https://github.com/jsmolka/eggvance/tree/master/tests)

------
cabaalis
I've been thinking of writing a 6502 emulator. My plan was to start with
cycle-stepping. I'm purposefully not looking at other emulators or code :)

I do have something I've been wondering about, especially since I haven't
written any complicated code yet for this plan. To emulate a 1 MHz clock on a
modern CPU, do you have to do things like check time deltas to "limit" the
emulator speed? (If this was addressed in the article, know that I stopped
reading when I saw code... see above purpose.)

~~~
thristian
Most (game console) emulators don't try to limit the speed of actual CPU
emulation directly, since that makes the code more complex, and makes testing
slower, etc. Instead, the only time the emulated system speed matters is when it
has to talk to humans - render a frame of video, generate audio samples, etc. So
if you have a 60Hz monitor (as most people do), and your emulated CPU takes
180,000 cycles (or whatever) to render a video frame, then your emulator just
runs at top speed until a video frame is available, waits for the "vertical
synchronisation" signal from your graphics API, draws the frame with the
graphics API, and repeats.

Because of the delay introduced by vertical sync, your emulated CPU's average
speed will be correct, even though the instantaneous speed is either "way too
fast" or "zero".
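
A minimal Rust sketch of that loop, under stated assumptions: `Machine`,
`FrameSync`, `CountingSync` and `run_frames` are hypothetical names, and
180,000 cycles per frame is only the placeholder figure from above. A real
front end would implement `wait_for_vsync` with whatever blocking buffer swap
its graphics API offers.

```rust
/// Hypothetical stand-in for an emulated machine; `step` advances one CPU cycle.
struct Machine {
    cycles: u64,
}

impl Machine {
    fn step(&mut self) {
        self.cycles += 1;
    }
}

/// Anything that can block until the host display wants the next frame.
trait FrameSync {
    fn wait_for_vsync(&mut self);
}

/// Stand-in that only counts waits instead of blocking (assumption for the sketch).
struct CountingSync {
    waits: u32,
}

impl FrameSync for CountingSync {
    fn wait_for_vsync(&mut self) {
        self.waits += 1;
    }
}

/// Runs the emulation flat out for one frame's worth of cycles, then lets the
/// host's vertical sync provide the only throttling.
fn run_frames(
    machine: &mut Machine,
    sync: &mut dyn FrameSync,
    cycles_per_frame: u64,
    frames: u32,
) {
    for _ in 0..frames {
        for _ in 0..cycles_per_frame {
            machine.step();
        }
        sync.wait_for_vsync();
    }
}
```

Note that no time deltas are checked anywhere in the inner loop; the pacing
comes entirely from the blocking call at the end of each frame.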

~~~
zeta0134
Adding on to this, in the context of a typical emulator, it's important that
the other hardware be emulated mostly in lockstep with the CPU. Each time the
6502 reads or writes memory, the hardware registers it is talking to should be
"caught up" so that things like video and audio signal timings work correctly.
There are a bunch of different approaches to this depending on how accurate
you want things to be, but the most straightforward is to just clock the
components directly, one after the other. I have a function in RusticNES that
clocks the CPU + APU, another that clocks the PPU, and since the PPU runs 3x
as fast, a global clock function that looks like:

    
    
    pub fn cycle(&mut self) {
        cycle_cpu::run_one_clock(self);
        self.master_clock += 12;
        // Three PPU clocks per every 1 CPU clock
        self.ppu.clock(&mut *self.mapper);
        self.ppu.clock(&mut *self.mapper);
        self.ppu.clock(&mut *self.mapper);
        self.apu.clock_apu(&mut *self.mapper);
    }
    

This way, all of the emulated chips remain mostly synchronized with each
other, but I don't bother to synchronize with the host until either the PPU is
ready to draw a frame, or the APU has filled its audio buffer.

This approach uses the CPU cycle as the boundary, but there are other
approaches to consider. You could run an entire opcode and then "catch up" the
rest of the hardware (maybe easier to manage CPU state), or run entire
scanlines at once. (Maybe easier to manage PPU state at first? The NES's PPU
is well understood, but very tricky to emulate correctly.)
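
For comparison, a rough sketch of the "run an entire opcode, then catch up"
variant. This is not RusticNES code: `Cpu`, `Ppu`, `Console` and
`step_instruction` are hypothetical stand-ins, and the cycle cost is passed in
by hand where a real CPU core would decode it from the opcode.

```rust
/// Hypothetical PPU stand-in that only counts its own clock ticks.
struct Ppu {
    dots: u64,
}

impl Ppu {
    fn clock(&mut self) {
        self.dots += 1;
    }
}

/// Hypothetical CPU stand-in: "executes" one instruction and reports its cost.
struct Cpu {
    cycles: u64,
}

impl Cpu {
    fn step_instruction(&mut self, cost: u64) -> u64 {
        self.cycles += cost;
        cost
    }
}

struct Console {
    cpu: Cpu,
    ppu: Ppu,
}

impl Console {
    /// The PPU runs three clocks for every CPU clock, as described above.
    const PPU_CLOCKS_PER_CPU_CYCLE: u64 = 3;

    /// Runs one whole instruction, then catches the PPU up by the cycles it took.
    fn step(&mut self, cost: u64) -> u64 {
        let cycles = self.cpu.step_instruction(cost);
        for _ in 0..cycles * Self::PPU_CLOCKS_PER_CPU_CYCLE {
            self.ppu.clock();
        }
        cycles
    }
}
```

The trade-off is that any register access in the middle of the instruction
sees a PPU that is up to one instruction behind, which is where the accuracy
loss comes from.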

------
mrec
Retro emulators seem to be increasing in popularity lately. I've seen a few
articles now covering the "how", but does anyone have pointers to something
explaining the "why"?

Is it a Zachtronics-style technical challenge? Specific nostalgia for the
software that ran on these ancient platforms? More general nostalgia for the
days when platform stacks were still thin and comprehensible [1]? Something
else?

[1] [https://ptrthomas.files.wordpress.com/2006/06/jtrac-
callstac...](https://ptrthomas.files.wordpress.com/2006/06/jtrac-
callstack1.png)

~~~
unoti
> Is it a Zachtronics-style technical challenge?

Actually, I did a lot of assembly coding on 6809 and 6502 back in the day, and
also played a lot of the Zachtronics games. It's not really the same at all!
Zachtronics games are all about challenges that are quite a bit more
ridiculous than you have in real life with small machines. In a typical
Zachtronics assembly programming game, you spend a lot of time trying to
figure out how to do pretty trivial things using a ludicrously small amount of
resources, like maybe you've got two registers to work with. In a real
assembly project, you'll have a few kilobytes to work with, or at a bare
minimum 512 bytes. The smallest system I worked with had 512 bytes of RAM and
8 KB of ROM. The Zachtronics games were fun and challenging, but they were more
like puzzles rather than real programming. Puzzling out how to get the
assigned task done with ridiculous artificial limitations.

Contrast that to an example of a real-life non-trivial assembly thing I did on
a 6502. I needed to respond to remote control commands from an infrared remote
similar to what you have with a TV. For input, I'd get interrupts connected to
an IO-pin that would fluctuate up and down when receiving IR input. The task
involved "recording" commands from a remote, then later monitoring the input
and seeing if it matched any of the pre-recorded patterns. You'd count the
amount of time between successive pulses, using a real-time clock that was
connected to some other IO pins. It's not rocket science, but it's fun and
challenging figuring out how to get all that going with a limited amount of
memory. To me that kind of thing is a lot more fun than figuring out how to do
the various challenges in Zachtronics games, which mostly center around taking
something that'd be trivial to do if you had a couple hundred bytes of memory to
work with, but they are forcing you to do it with 2 bytes of memory. So you
make little side constructs that act as memory, or other side constructs that
somehow use message queues as memory, or whatever.
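
As a loose illustration of the matching step (in Rust rather than 6502
assembly, and not the original code), the recorded commands can be treated as
lists of gaps between pulses, compared against the observed gaps with some
slack. The function names, the `u16` tick counts and the tolerance parameter
are all assumptions made for this sketch.

```rust
/// Returns true when the observed gaps have the same length as the recorded
/// ones and each gap is within `tolerance` clock ticks of its counterpart.
fn pattern_matches(recorded: &[u16], observed: &[u16], tolerance: u16) -> bool {
    recorded.len() == observed.len()
        && recorded
            .iter()
            .zip(observed)
            .all(|(&r, &o)| r.abs_diff(o) <= tolerance)
}

/// Returns the index of the first recorded command that matches, if any.
fn find_command(patterns: &[Vec<u16>], observed: &[u16], tolerance: u16) -> Option<usize> {
    patterns
        .iter()
        .position(|p| pattern_matches(p, observed, tolerance))
}
```

On the real hardware, the interesting part is fitting the recorded patterns
and the comparison loop into the few hundred bytes available.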

The appeal of retro emulators I think is that it's a system that's small and
limited enough that you have more of a chance of understanding it from end to
end. There's a feeling of comfort or coolness in knowing that you know exactly
what the system is up to on every cycle. With larger computers, the hard drive
starts blinking and God only knows what it's doing, or why. There's this
feeling on small embedded or emulated systems that _you_ are really in control
of what's happening. On larger, more complex systems, the software I write is
just one little part of the larger who-knows-what going on inside the machine.

For me another aspect of it is how much I'm getting done with so little code.
There's something very satisfying about knowing the entire binary package of
assembled code I've generated, and being able to point to any byte of that and
understand what it is.

It's the exact opposite of what's going on in my other terminal, where as I
type it's creating a new react app and downloading untold megabytes of who-
knows-what with countless dependencies and abstractions. In retro systems, if
the system is capable of doing something, it's probably because you personally
made it able to do that thing.

~~~
djmips
The Atari 2600 has 128 bytes of RAM and a crazy video system. Feels about as hard
as a Zachtronics game to me.

~~~
unoti
Zachtronics games never give you 128 bytes! You get the 2 registers in your
processor. They don’t give you RAM!

------
Swivekth18
I like how it's pretty much standard to expect emulators to have a WASM port
running in the browser at full performance at this point. Atwood's law is 12
years old but still going strong.

------
djmips
I feel like with this approach you are getting to a better place for
converting the design into an HDL and running it on an FPGA.

------
boomlinde
This is really cool. I like the approach of exposing the pins as its API.
Super flexible.
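
To make the idea concrete, here is a small Rust sketch of the system side of
such an API: the CPU's pins travel as one 64-bit mask, and the surrounding
code services memory accesses by inspecting and updating that mask. The bit
layout and the `service_memory` name are assumptions for this sketch, not
taken from the article's code.

```rust
// Assumed pin layout for this sketch only.
const ADDR_MASK: u64 = 0xFFFF; // bits 0..=15: address bus
const DATA_SHIFT: u32 = 16; // bits 16..=23: data bus
const DATA_MASK: u64 = 0xFF << DATA_SHIFT;
const RW: u64 = 1 << 24; // set = CPU reads, clear = CPU writes

/// Services one memory access described by `pins` and returns the updated pins.
fn service_memory(pins: u64, mem: &mut [u8; 0x10000]) -> u64 {
    let addr = (pins & ADDR_MASK) as usize;
    if pins & RW != 0 {
        // Read: place the addressed byte on the data bus.
        (pins & !DATA_MASK) | ((mem[addr] as u64) << DATA_SHIFT)
    } else {
        // Write: store the byte currently on the data bus.
        mem[addr] = ((pins & DATA_MASK) >> DATA_SHIFT) as u8;
        pins
    }
}
```

Because the whole interface is one integer, memory-mapped devices, bank
switching and other chips can all be wired in by editing that integer between
ticks.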

