The Infinite Loop That Wasn't (mgba.io)
275 points by tambre 65 days ago | 35 comments

I love reading these kinds of reverse engineering discovery tales. I find them technically educational and enlightening overall... they help put my own struggles debugging thorny things into sharp perspective, making me even more appreciative of the level of control I have over my own code and its runtime environments.

The level of effort put into reverse engineering for emulating classic hardware of all kinds is just remarkable. The Dolphin EMU team has a long history of these kinds of blog posts, for people looking for more of this kind of writing: https://dolphin-emu.org/blog/

I've always had the attitude that if I can reproduce a bug consistently, I can debug it, and I can fix it. Now here's a nail in the coffin of that theory.

Only if the problem is purely software. I've had reproducible problems that were impossible to solve by myself. The process went like this:

- Huh, getting USB disconnects randomly, need to fix this

- Hmm, device driver seems to be fine

- Hmm, usbnet driver seems fine

- Hmm, xhci driver seems fine

- $5000 and one USB 3.0 protocol analyzer later...

- What the heck is this host controller doing?

At this point it was looking like a hardware problem, but since I didn't have the schematics or Verilog of the host controller, I could go no further (nor did I want to... I never imagined USB could be such a deep rabbit hole).

For most of the bugs I've dealt with, the problem came down to something simple that can be explained quickly (e.g. a syntactically valid typo, a config error, a simple logic error, etc.).

Seems like "Holy grail" bugs like this happen to need a thorough understanding of complex systems (which I usually like to hide under layers of abstraction).

In defense of the coffin, I'd say this is not so much about a bug, but about reverse engineering real world effects of undefined behavior (as caused by a bug in software).

If the goal of mgba is to reproduce the GBA hardware faithfully, then this is a pure software bug in mgba.

That is not the goal of mGBA (or any GBA emulator for that matter), btw. If you wanted to reproduce it faithfully, the emulator would be unusably slow.


s/faithfully/faithfully enough/ Prior to the bug being fixed this was clearly not faithfully enough for at least 2 games.

You are trying to argue about the meaning of the words being used. This is not productive, as what matters is what the author of the argument meant when they used that word.

I don't think that's the case. When talking about emulation and virtualization and such, it is easy to get terms backwards.

For example, a bug in the implementation of a virtualized device looks like a hardware bug to the client and like a software bug to the host.

To be honest, I think it's about crossing the line between emulation and simulation. In emulation, we mostly consider instruction-level behavior good enough, but here we have to venture below cycle level and into integrated hardware effects to copy the behavior.

A rule of thumb I use is that if I can reproduce a bug in approximately 0 time, I can fix it within 1 hour 95% of the time.

These test ROMs are interesting. Since they are designed to run on real hardware that lacks debugging capabilities, the user must be able to operate the test suite and check the results manually. They aren't like modern software test suites.

According to the nesdev wiki, automation consists of playing back controller input, waiting for the tests to run, taking a screen shot of the results screen, hashing it and comparing the result to the hash of the expected results screen. This method also allows testing real games to check for reproducible bugs.
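
A rough sketch of that screenshot-hash approach, in Python. This is not how any particular emulator does it: `run_frames`, `fake_emulator`, and the input bitmasks are hypothetical stand-ins, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def frame_hash(pixels: bytes) -> str:
    """Return a hex digest of a raw framebuffer dump."""
    return hashlib.sha256(pixels).hexdigest()


def run_test(run_frames, inputs, expected_hash: str) -> bool:
    """Replay recorded inputs, then compare the final frame to a known-good hash.

    `run_frames` is a hypothetical callable standing in for the emulator:
    it takes the per-frame controller states and returns the final
    framebuffer as bytes.
    """
    final_frame = run_frames(inputs)
    return frame_hash(final_frame) == expected_hash


def fake_emulator(inputs):
    """Toy stand-in for an emulator: the "screen" is just the inputs it saw."""
    return bytes(inputs)


recorded_inputs = [0x00, 0x08, 0x00, 0x01]  # hypothetical button bitmasks
golden = frame_hash(fake_emulator(recorded_inputs))  # from a known-good run

passed = run_test(fake_emulator, recorded_inputs, golden)
failed = run_test(fake_emulator, [0x00, 0x08, 0x00, 0x02], golden)
```

Storing only the hash keeps the expected results tiny, at the cost of a failure telling you nothing about which pixels differ.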


Pretty nice. I expected the test ROM to write the results to some special location in memory and have the emulator report those results.

So uh I didn't see it mentioned, but the question of why this code is in the games arises.

Is it buggy code, or something intentional like anti-emulator sneakery?

It's pretty likely that it's just buggy code that happened to work on hardware.

Definitely just working by chance in this case. This comes up a lot, unfortunately. Small budgets, tight deadlines, games not validated for correctness under software emulation for detecting these types of edge cases prior to shipping.

Another example: there are several GBA games that dereference null pointers, but it works because there is no memory protection fault hardware in the GBA. Since the BIOS is mapped to address 0 and yet locked out from reading post-boot (in a failed anti-copying mechanism), you have to emulate the open bus behavior to run these games correctly as a result.
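
To make "open bus" concrete, here is a deliberately simplified toy model in Python. The `Bus` class, the locked range, and the addresses are all made up for illustration; the rules on real GBA hardware are more intricate than "return the last value seen".

```python
class Bus:
    """Toy memory bus that remembers the last value it carried."""

    def __init__(self, size: int, locked: range):
        self.mem = bytearray(size)
        self.locked = locked   # region that cannot be read (assumption)
        self.last_value = 0    # whatever was last driven onto the bus

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value & 0xFF
        self.last_value = value & 0xFF

    def read(self, addr: int) -> int:
        if addr in self.locked or addr >= len(self.mem):
            # Nothing drives the bus here, so the stale value comes back
            # instead of a fault being raised.
            return self.last_value
        self.last_value = self.mem[addr]
        return self.last_value


bus = Bus(size=0x100, locked=range(0x00, 0x40))
bus.write(0x80, 0xAB)
null_read = bus.read(0x00)         # "null pointer" read lands in the locked region
bus.write(0x81, 0x17)
second_null_read = bus.read(0x00)  # same address, different result
```

The same "null" read returns a different value depending on what happened just before it, which is why a game can accidentally depend on it and why an emulator has to track bus state to match.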

There are also several anti-emulator routines in the GBA library, such as lying about the type of save memory (flash, EEPROM, or SRAM) that a given game actually has (by including strings in the ROM from the Nintendo SDK used to access them.)
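
For anyone wondering what that string-based detection looks like, a minimal sketch in Python. The marker prefixes are the SDK ID strings as I recall them from community documentation, so treat them as an assumption; the version suffixes and ROM contents below are invented.

```python
# SDK ID string prefixes mapped to the save type they usually imply
# (recalled from community docs; treat as an assumption).
SAVE_MARKERS = {
    b"EEPROM_V": "EEPROM",
    b"SRAM_V": "SRAM",
    b"FLASH_V": "Flash",
    b"FLASH512_V": "Flash",
    b"FLASH1M_V": "Flash (1 Mbit)",
}


def guess_save_types(rom: bytes) -> set:
    """Return every save type whose marker string appears in the ROM."""
    return {kind for marker, kind in SAVE_MARKERS.items() if marker in rom}


# Made-up ROM images: one honest, one carrying a decoy string.
honest_rom = b"\x00" * 16 + b"SRAM_V113" + b"\x00" * 16
decoy_rom = b"\x00" * 16 + b"EEPROM_V124" + b"\x00" * 8 + b"FLASH1M_V103"

honest_guess = guess_save_types(honest_rom)
decoy_guess = guess_save_types(decoy_rom)
```

With the decoy ROM the scan returns two candidates and cannot tell which is real, which is the whole point of the trick.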

Still, in the case of Hello Kitty, there's got to be a reason the developer added that loop at startup, right? They might have not understood the intricacies of how or why it worked, but presumably they were trying to do something.

> Then I noticed a difference between my test ROM and Pokémon Emerald when they hang. There was music playing in Pokémon. There was also music playing in Sonic Pinball Party. There wasn’t music playing in Hello Kitty, but this gave me an idea.

Maybe there was supposed to be music playing, this was for timing it, and it was removed, broken, or just forgotten about?

>games not validated for correctness under software emulation for detecting these types of edge cases prior to shipping.

Did they have software emulation for that hardware at the time the game was written?

For the Game Boy Advance, yes. There was both a secretive internal Nintendo GBA emulation kit, and the public emulator development community had devkit games emulated before the system even officially launched.

I understand it was less practical to do this for say, the NES or SNES. But it should have been done for the GBA. It's not just important for future hardware to be able to easily emulate GBA games, it also protects against product refreshes breaking games, something that plagued quite a few classic Game Boy games as newer models came out and fixed bugs in the older models.

Could be the original authors knew that the DMA-fetched value is on the bus and waited until a certain kind of value was visible.

Perhaps to "fix" some weird DMA related bug.

Interesting story. Goes to show you the fallacy of writing fault tolerant systems as opposed to writing noisy systems.

Had the GBA architects made the device halt on invalid memory rather than return the last thing on the bus and hope everything would be OK, this entire class of bug wouldn't exist, and the authors of that code likely would've found that bug during production.

> Had the GBA architects made the device halt on invalid memory rather than return the last thing on the bus and hope everything would be OK, this entire class of bug wouldn't exist, and the authors of that code likely would've found that bug during production.

Of course, would it matter if these bugs were found? The games ran fine on real hardware.

There’s no maintenance burden either, because updating a gba game is literally impossible.

>updating a gba game is literally impossible.

Not quite. The Pokémon Ruby/Sapphire Berry Glitch bug was fixed by connecting to games that had a patch program, such as Pokémon Emerald, the 3rd in the Ruby/Sapphire/Emerald series.


Huh, cool. Does anyone know how this worked? I know the carts themselves are mostly write-once so the bug and patch must have been related to the save portion or some hardware specific to this cartridge, right?

I think it just used an arbitrary code execution thing to change the system clock, so that the faulty code was still there, but never failed.

> because updating a gba game is literally impossible

Hey now, I'm sure it's not "impossible"; you could probably overwrite specific sectors with a laser or whatever. It probably had more to do with distribution, where the economic and environmental impact wouldn't make sense.

I’m pretty sure the carts were quite literally write-once. Lasers would just damage the cart.

Are they "bugs" from the point of view of the GBA designers or the game programmers? I was under the impression that the game programmers purposely read invalid memory this way in order to prevent emulation.

I don't think game developers care about emulation all that much.

Amazing story. Just shows you how complex hardware can truly be.

Correct. When you're coding really close to the metal you discover (and perhaps depend upon) all kinds of shenanigans that a higher level language compiler might work around.

The graphics on the Atari 2600 had to be fed line-by-line during the vertical blanking interval. That made for some really tricky assembly coding, trying to get enough done within the interval without wasting VERY limited ROM resources (i.e. you can't cheat with NOP instructions).

I'm assuming you meant "horizontal blanking interval" instead of "vertical blanking interval". But I don't think that's entirely correct either. Some games will change the registers while the beam is actively drawing so that, e.g., the left half of the line will be drawn with one set of settings and the right half drawn with another.

This reminds me of the wonderful novel "The Bug" by Ellen Ullman. A beautifully-written fictionalized account of the quest for a "Holy Grail" bug.

