GateBoy – A gate-level simulation of the original Game Boy hardware (github.com/aappleby)
416 points by aappleby 47 days ago | 66 comments

> Is GateBoy a perfect simulation of a Game Boy?

> Actually no, for complicated reasons. The Game Boy chip has a handful of logic gates that operate independently of the master clock and whose exact behavior depends on things like gate delay. These gates create glitches that depend heavily on the physical placement of the gates, the silicon process used to make them, and other weird things like temperature and voltage.

This is fascinating. I previously would have assumed that a gate-level simulation would necessarily be perfect by definition.

And incredibly annoying as the developer - so much work translating and debugging gates, and it's still not "correct" for some definition of correct ;)

Reminds me of hearing about how Crash Bandicoot was developed. As well as the game being something of a masterpiece in itself, they successfully debugged an issue that caused the memory card to get corrupted when the controller was touched in a certain way. I can't remember the precise reason, but they eventually discovered that running one of the chips at a higher frequency introduced subtle electrical interference with nearby components on the board. This was after much debugging of a supposed software issue in the memory card logic. I often think of that when I have a tricky bug to sort out.

“This is the only time in my entire programming life that I've debugged a problem caused by quantum mechanics”

Who knows, we may be running into this more literally in our lifetime.

I suppose all problems of this world are caused by quantum mechanics in a sense.

If you want a great and long read, Andy Gavin has a pretty extensive series about Crash Bandicoot on his blog, some entries more technical than others. I've never even played the games for more than a few minutes, but I had a great time reading through these


Don't be too annoyed. There are significant differences across models and even batches within the same generation of system. There is no completely correct definition of a system.

Some of these differences are CPU observable, so I made this ROM to identify different Game Boy models and revisions: https://github.com/mattcurrie/which.gb

Very cool!

So is this saying that a possible modern day commercial for a Game Boy might feature: "The artisanal properties of the Nintendo Game Boy make each and every one a unique playing experience!"?

I guess it's impossible to ever have it correct. The gates are just another level of abstraction. You could also simulate the silicon atoms and the electrons flowing and use that to have transistors and then gates. And so on.

This is also the reason why it is so hard to protect against side channel leakage. During my PhD I worked on a solution that would be theoretically perfect (and is formally proved so), but is not in practice because of exactly that.

You can find the paper (Formally Proved Security of Assembly Code Against Power Analysis) and presentation slides on my web page: https://pablo.rauzy.name/research.html#rauzy2014formaldpl

If you simulate with cycle accuracy you don’t need to simulate each gate. Think of the pins on your CPU and the trillions of gates behind them. If you simulate your CPU you just care what’s at the pins, not at the gates.

This assumes the gate level implementation doesn't have any side effects that are related to more than an instantaneous state. Seemingly unlikely as that is a larger class than the group of side effects mentioned in the quote above.

You are right. Gate level buys you nothing other than research value. For accuracy you simply need cycle level (which is not easy, mind you).

I read this as well. I'm definitely referencing this in the future when someone mentions how "FPGAs directly simulate the system hardware."

As I understand it asynchronous logic stopped being a common thing decades ago, so those sorts of glitches aren't possible (or less possible?) now. The hard stuff in simulation now is probably handling multiple clock domains.

Still out there, but only used in special situations. The ARM microcontroller I'm working with has an asynchronous prescaler on the real-time clock crystal input before it gets fed into a second synchronous prescaler which you can then read the output of directly.

This is all about reducing power consumption down in the nA range when the CPU is turned off but the RTC keeps ticking over.

Multiple clock domains definitely tricky though!

As a software-only guy .... multiple clock domains sounds like the same class of problem as multi threaded + parallel programming, right?

Sort of? It's almost more equivalent to networking or distributed processing - circuits in different clock domains can't just send a wire to another domain, they have to go through a synchronizer and do some handshaking and other stuff that's vaguely similar to RPCs. I'm stretching here, it's slightly beyond what I've worked with so far.

Seems like the same as in multithreaded programming, in which you can’t let threads share memory without synchronizing else you get data races and corruption.

it's more like two simple computers talking over serial, and the serial connection bitrate can't be faster than the clock of the slowest of the two computers.

Yep! The microcontroller reference manual is full of warnings to ensure clock rates are compatible by being within a certain range of each other.

Sort of. You have to be careful when signals cross clock domains because they can become asynchronous. So anytime you go from one domain to another, you have to be sure to synchronize the data. This is often accomplished with flip-flops.
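
A minimal sketch of the flip-flop approach, assuming the common two-stage design (the comment does not say which variant is meant). `TwoFlopSync` and `run_sync` are hypothetical names. A software model like this cannot show metastability itself; it only shows the extra latency the second stage adds.

```cpp
#include <cstdint>

// Hypothetical model of a two-flip-flop synchronizer. Each call to tick()
// represents one rising edge of the *destination* clock.
struct TwoFlopSync {
    bool ff1 = false;  // first stage: may go metastable in real hardware
    bool ff2 = false;  // second stage: the only value downstream logic reads

    // Sample the asynchronous input on a destination clock edge.
    constexpr bool tick(bool async_in) {
        ff2 = ff1;       // stage 2 takes stage 1's previous value
        ff1 = async_in;  // stage 1 samples the foreign-domain signal
        return ff2;
    }
};

// Helper for experiments: feed `n` samples taken from the low bits of
// `pattern` and return the outputs packed the same way (bit i = edge i).
constexpr std::uint32_t run_sync(std::uint32_t pattern, int n) {
    TwoFlopSync s{};
    std::uint32_t out = 0;
    for (int i = 0; i < n; ++i) {
        bool in = ((pattern >> i) & 1u) != 0;
        if (s.tick(in)) out |= (1u << i);
    }
    return out;
}
```

In this model the output seen after edge i is the input that was sampled at edge i-1, so downstream logic never reads the first stage directly.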

This is my absolute favorite paper on the topic:


I’ve only ever seen behavioral solutions for newer systems. At a certain point, logic level designs become unreasonable.

> These gates create glitches

What sort of glitches? Which effects could they produce on outputs?

I wish there was a concrete example.

There are examples on the github page, literally right below that sentence - have you stopped reading just before that point??

"For example, there's a glitch in the external address bus logic that causes internal bus addresses like 0xFF20 to appear on the external bus even though the logic should prevent that. Due to gate delays, not all of the inputs to gate LOXO (page 8 in Furrtek's schematics) arrive at the same time. This causes LOXO to produce a glitch pulse that in turn causes latch ALOR to make a copy of one bit of the internal bus address. ALOR then drives that bit onto the external bus (through a few more gates) where it can be seen with an oscilloscope or logic analyzer."

Btw, I'm a little confused: are these "glitches" reliably the same across all Game Boys, or can they differ from chip to chip?

They can differ from gameboy to gameboy. And from day to day (as some of them are apparently temperature dependent).
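
The kind of glitch quoted above (inputs arriving at different times, a short pulse, a latch that captures it) can be sketched with a unit-delay model. This is a hypothetical toy circuit, not the real LOXO/ALOR netlist; the names `GlitchDemo`, `latch_captures_glitch` and `ideal_latch_captures` are made up for illustration.

```cpp
// Hypothetical unit-delay model of a glitch; NOT the real LOXO/ALOR logic.
// Logically, enable = a AND (NOT a) is always 0. With a one-step inverter
// delay, a rising edge on `a` makes enable pulse high for one time step.
struct GlitchDemo {
    bool not_a = true;     // inverter output, lags `a` by one step
    bool latched = false;  // latch that copies `data` while enable is high

    // Advance one time step; returns the (possibly glitching) enable signal.
    constexpr bool step(bool a, bool data) {
        bool enable = a && not_a;  // uses the stale inverter output
        if (enable) latched = data;
        not_a = !a;                // inverter catches up after this step
        return enable;
    }
};

// Drive `a` low, then high for two steps, with data = 1 throughout.
// Returns true if the latch captured data even though enable "should" be 0.
constexpr bool latch_captures_glitch() {
    GlitchDemo g{};
    g.step(false, true);  // a = 0: enable = 0
    g.step(true, true);   // a rises: stale not_a is still 1, enable pulses
    g.step(true, true);   // inverter has settled: enable is back to 0
    return g.latched;
}

// Same stimulus with an ideal zero-delay inverter: no pulse, no capture.
constexpr bool ideal_latch_captures() {
    bool latched = false;
    const bool a_seq[3] = {false, true, true};
    for (bool a : a_seq) {
        bool not_a = !a;           // no delay
        bool enable = a && not_a;  // always false
        if (enable) latched = true;
    }
    return latched;
}
```

A simulator that evaluates gates with zero delay behaves like the second function, which is one reason a gate-level model can still miss effects that show up on real silicon.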

Latches or a ring oscillator to generate true randomness. There could also be bugs in the design that cause metastability.

This is very impressive!

The creator—Austin Appleby—also created MurmurHash, which is very useful for data structures like Bloom filters. The canonical MurmurHash implementation was released with a handy non-cryptographic hashing test suite.


Thanks for your work, Austin!

You are quite welcome. :)

What a marvelous project! The amount of effort that the author put into this project makes my jaw drop.

This project reminds me of a question I've had in my mind, though. If this world is a simulation, would that be an accurate simulation of the elementary particles and their interactions (like this project), or an approximated one (like normal Game Boy emulators)?

FurrTek deserves as much credit as I do for doing the initial exceptionally painstaking step of marking out the die traces and cells on the die shot. Not all of his guesses were correct, but they were enough to get things started.

I suspect that this world is a high-level simulation until you pull out a microscope, at which point it switches to low-level mode. ;)

I read somewhere (I'll dig for the article) that one of the conclusions of Quantum mechanics was that "things don't exist unless you look at them". And the claim was that this actually happened on the macro level.

This is one of many articles that discusses the experiment: https://www.sciencealert.com/reality-doesn-t-exist-until-we-...

Ah yes the real life LOD system

This is awesome stuff Austin!

> GateBoy, LogicBoy, and MetroBoy exist to give me a starting point for working on Metron, which is my long-term project to build a set of programming tools that can bridge between the C/C++ universe used by software and the Verilog/VHDL universe used by hardware.

I'd love more details about this. What does it mean to bridge C/C++ and Verilog/VHDL?

SystemVerilog has enough C++-like _stuff_ in it that you can write a surprising amount of code that looks almost like C++ - you can call methods on objects, pass interfaces around, lots of other things that you might not immediately recognize as a hardware language.

That said, SystemVerilog does not _run_ like C++. Conceptually every function and method is running simultaneously all the time, with clock edges keeping things synchronized.

But... You can, with an extreme amount of care, write C++ code that can be almost-trivially translated line by line into SystemVerilog and can produce the same results when executed - provided you follow a whole lot of strict rules that you have to manually enforce somehow on the C++ side.
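
One rule of that kind might look like the sketch below: every read sees the state from before the clock edge, and every write lands together at the edge. This is an assumption about what such a rule could be, not GateBoy's or Metron's actual rule set, and `Regs`, `tick` and `tick_sequential` are illustrative names.

```cpp
#include <cstdint>

// Hypothetical register state for a tiny design.
struct Regs {
    std::uint8_t a;
    std::uint8_t b;
};

// One clock edge, roughly what this SystemVerilog would do:
//   always_ff @(posedge clk) begin a <= b; b <= a + 1; end
constexpr Regs tick(Regs cur) {
    Regs next = cur;                                // start from the old state
    next.a = cur.b;                                 // reads only `cur`
    next.b = static_cast<std::uint8_t>(cur.a + 1);  // never reads `next`
    return next;                                    // commit all writes at once
}

// What ordinary sequential C++ computes instead: b sees the *new* a.
constexpr Regs tick_sequential(Regs r) {
    r.a = r.b;
    r.b = static_cast<std::uint8_t>(r.a + 1);
    return r;
}
```

Starting from a = 1, b = 5, `tick` gives a = 5, b = 2, while `tick_sequential` gives a = 5, b = 6. Code written in the first style maps onto nonblocking assignments; the second style does not.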

GateBoy enforces those rules at runtime in debug builds (see Gates.h), and I have written a very quick and dirty proof-of-concept LLVM plugin that can enforce those rules at compile time. I've also written another LLVM tool that can do the automatic translation from C++ to SystemVerilog, and I regression-tested it on a chunk of code from MetroBoy to prove that everything works as claimed. I was able to take the original C++, translate it to SystemVerilog, translate that _back_ to C++ using Verilator, run both C++ versions in lockstep, and verify that every register bit at every cycle was identical.
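
The lockstep check described above can be sketched roughly as follows. Both models here are hand-written stand-ins for a 4-bit counter with an enable input; in the workflow described, one would be the original C++ and the other the Verilator output. All names are hypothetical.

```cpp
#include <cstdint>

// Stand-in for the original model: 4-bit counter with enable.
constexpr std::uint8_t step_reference(std::uint8_t count, bool enable) {
    return enable ? static_cast<std::uint8_t>((count + 1) & 0x0F) : count;
}

// Stand-in for the round-tripped model: same behavior, written differently.
constexpr std::uint8_t step_translated(std::uint8_t count, bool enable) {
    return static_cast<std::uint8_t>((count + (enable ? 1 : 0)) & 0x0F);
}

// Run both models in lockstep for `cycles` cycles and return the first cycle
// at which their state differs, or -1 if they never diverge.
constexpr int first_mismatch(int cycles) {
    std::uint8_t ref = 0;
    std::uint8_t dut = 0;
    for (int cycle = 0; cycle < cycles; ++cycle) {
        bool enable = (cycle % 3) != 0;  // arbitrary stimulus pattern
        ref = step_reference(ref, enable);
        dut = step_translated(dut, enable);
        if (ref != dut) return cycle;    // compare every bit, every cycle
    }
    return -1;
}
```

Reporting the first mismatching cycle, rather than only a pass/fail result, makes it much easier to find where a translation went wrong.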

Eventually I'll get the LLVM tools to the point where they can validate and translate all of GateBoy's codebase, and then those tools will be released in my "Metron" repository.

> I've also written another LLVM tool that can do the automatic translation from C++ to SystemVerilog

Is this code available anywhere?

wait, the repo on Github is just the read/write checker. I'll have to go find the translator...

the Metron repo is on my Github but it's a horrendous embarrassing mess.

And it's mostly a proof-of-concept so don't expect much :)

Anyone who wants the “typescript” of verilog should check out https://en.m.wikipedia.org/wiki/Chisel_(programming_language... Makes life 100x easier

Easier than VHDL, harder than C. Chisel's "adder" example on the wikipedia page is:

  class Add extends Module {
    val io = IO(new Bundle {
      val a = Input(UInt(8.W))
      val b = Input(UInt(8.W))
      val y = Output(UInt(8.W))
    })
    io.y := io.a + io.b
  }

whereas the same in C would be

  #include <stdint.h>
  uint8_t Add(uint8_t a, uint8_t b) {
    return a + b;
  }

and while that is not directly usable in SystemVerilog, it can be mechanically translated to

  function byte add(byte a, byte b);
    return a + b;
  endfunction

There's way more to it than that, but I strongly believe there are better ways of writing hardware than the existing HDLs.

Imho, you’re barking up the wrong tree. Design with existing HDLs is a solved problem. Verification is not :)

Honestly, from the Chisel and C examples I can’t really see much of a difference. Not sure why corporate is making me choose. The io bundle in your first example is defined in place and then used. If it were previously defined (which would be a shitty Wikipedia example), it would look a lot like your C example.

Whoa! This just cements my contention: Nintendo GameBoy should be on every intro to computational architectures syllabus ;)

I really think it should be, though I'd put a Risc-V RV32I core in place of the original one.

Writing Hello World for the original GameBoy is a module in the architecture course I teach.

Amazing work. Congratulations on endurance!

That had to require a huge amount of endurance, perseverance, etc. I'd like to see a story on how he kept motivated and focused on this project.

Family tragedy and pandemic, no lie. Working on it kept me sane.

Would this ever make it into the MiSTer FPGA project as a core? There is currently a GB core included, but I'm not sure whether the VHDL is a 1:1 mapping to the actual HW gates themselves.

The cores they have are more than good enough. I do intend to get GateBoy to where it can be automatically translated to Verilog though, at which point running on a FPGA should be more straightforward.

Since logic simulation is a massively parallel task, I wonder how much faster it would be if you used the GPU to do it.

Funny to see the gates named "DEFY", and "CYKA" right below it...

You either sort by cone, getting great gate-value reuse density, but lose gobs of parallelism; or, you sort by gate-type & eat-shit on gather-scatter. It’s a devil’s bargain, either way. It’s why the majors sell $$$…$$$ custom emulation equipment.

It’s also a field long overdue for disruption.

> It’s also a field long overdue for disruption.

Won't happen though, way too niche market - outside of a single-digit number of hobbyists, the only ones who have an actual use for such equipment are companies for whom the expense is a rounding error on the balance sheet.

Doesn’t avx offer bitwise logical operations? Pack your wires into avx vectors.
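
A minimal sketch of the packing idea, using a plain 64-bit word instead of AVX intrinsics (a deliberate simplification; the same pattern widens to vector registers). Bit i of each word holds a wire's value in lane i, so one bitwise operation evaluates the same gate for 64 lanes at once. `Lanes`, `FullAdderOut` and `full_adder` are hypothetical names.

```cpp
#include <cstdint>

// 64 independent copies of one wire, one per bit.
using Lanes = std::uint64_t;

struct FullAdderOut {
    Lanes sum;
    Lanes carry;
};

// Gate-level full adder evaluated for all 64 lanes in a few instructions.
constexpr FullAdderOut full_adder(Lanes a, Lanes b, Lanes cin) {
    Lanes axb = a ^ b;                           // XOR gate
    return FullAdderOut{axb ^ cin,               // sum = a ^ b ^ cin
                        (a & b) | (axb & cin)};  // carry out
}
```

Feeding a = 0xF0, b = 0xCC, cin = 0xAA puts all eight input combinations in lanes 0 to 7 and yields sum = 0x96 and carry = 0xE8, which are the full adder's truth-table columns. This works well when many lanes share the same gate type, which is where the gather-scatter cost mentioned above comes in.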

I've heard family talk about chip design and verilog. This is the first time I feel like I've had a glimpse into what they're talking about. Great write-up.

I was under the impression the gameboy cpu was a standard z80, but it seems that's not the case? What did they add to it?


They added/changed/removed instructions to fit the system more closely, e.g. IN/OUT is now done through MMIO in the FFxx area and thus there are dedicated instructions to access that address range.
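
As a rough illustration of that address range: the dedicated instructions (listed as LDH in Pan Docs) take an 8-bit offset and access 0xFF00 plus that offset. The function name below is hypothetical.

```cpp
#include <cstdint>

// Effective address for the high-page load/store instructions: an 8-bit
// offset is combined with 0xFF00, so the result is always 0xFF00..0xFFFF.
constexpr std::uint16_t high_page_address(std::uint8_t offset) {
    return static_cast<std::uint16_t>(0xFF00u | offset);
}
```

For example, offset 0x20 selects address 0xFF20.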

The Gameboy CPU is significantly less capable than a full z80. It's missing key features like the IX and IY registers (and thus addressing modes), most of the 16-bit ALU, the second set of "shadow" registers intended for interrupt handling (instead, almost all gameboy interrupt routines start by pushing all the current registers onto the stack), and a bunch of other things I forget.

There's also some stuff it adds, such as the swap instruction (swapping the upper and lower nibbles of a byte) and the stop instruction (puts the gameboy into a low-power state until it is woken by a keypress).
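
The data path of the swap instruction is small enough to write out. This sketch covers only the resulting byte and leaves out the CPU flags; `swap_nibbles` is a hypothetical name.

```cpp
#include <cstdint>

// Exchange the upper and lower 4 bits of a byte.
constexpr std::uint8_t swap_nibbles(std::uint8_t value) {
    return static_cast<std::uint8_t>((value << 4) | (value >> 4));
}
```

So 0xAB becomes 0xBA, and applying the operation twice returns the original byte.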

They removed some instructions and added some instructions. Check the pandocs - https://gbdev.io/pandocs/

This is awesome!
