Show HN: Ghidra Plays Mario (github.com/nevesnunes)
207 points by 0d0a 7 months ago | 29 comments
I've been exploring new ways of testing Ghidra processor modules. In this repo, I was able to emulate NES ROMs in Ghidra to test its 6502 specification, which resulted in finding and fixing some bugs.

Context: Ghidra is used for reverse engineering binary executables, complementing the usual disassembly view with function decompilation. Each supported architecture has a SLEIGH specification, which provides semantics for parsing and emulating instructions, not unlike the dispatch handlers you would find in interpreters written for console emulators.

Emulator devs have long had extensive test ROMs for popular consoles, but Ghidra only provides CPU emulation, so it can't run them without additional setup. What I did here is bridge the gap: by modifying a console emulator to instead delegate CPU execution to Ghidra, we can now use these same ROMs to validate Ghidra processor modules.
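A rough sketch of the delegation idea, assuming a hypothetical `CpuBackend` interface (none of these names come from the repo or from Ghidra): the console emulator keeps owning memory and hardware, and only asks a pluggable backend to execute each instruction.

```python
from typing import Callable, Protocol


class CpuBackend(Protocol):
    """Hypothetical interface: anything that can execute one instruction."""

    def step(
        self, read: Callable[[int], int], write: Callable[[int, int], None]
    ) -> None: ...


class Console:
    """Toy console: owns memory and hardware, delegates CPU execution."""

    def __init__(self, cpu: CpuBackend) -> None:
        self.cpu = cpu
        self.ram = bytearray(0x10000)

    def read(self, addr: int) -> int:
        return self.ram[addr & 0xFFFF]

    def write(self, addr: int, value: int) -> None:
        self.ram[addr & 0xFFFF] = value & 0xFF

    def run(self, steps: int) -> None:
        for _ in range(steps):
            # The backend could be the built-in core or a bridge to another emulator.
            self.cpu.step(self.read, self.write)


class IncrementingCpu:
    """Stand-in backend for the demo: bumps the byte at address 0 on each step."""

    def step(self, read, write) -> None:
        write(0x0000, read(0x0000) + 1)


console = Console(IncrementingCpu())
console.run(3)
```

Swapping `IncrementingCpu` for a backend that forwards `step` to an external emulator would leave the rest of the console untouched, which is the point of the approach.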

Previously [1], I went with a trace log diffing approach, where any hardware specific behaviour that affected CPU execution was also encoded in trace logs. However, it required writing hardware specific logic, and is still not complete. With the delegation approach, most of this effort is avoided, since it's easier to hook and delegate memory accesses.
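For comparison, the core of trace log diffing can be sketched in a few lines. This is only an illustration: the trace line format below is made up, and `first_divergence` is a hypothetical helper, not something from the linked repo.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def first_divergence(
    expected: Iterable[str], actual: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, expected_line, actual_line) for the first mismatch, or None."""
    for i, (e, a) in enumerate(zip_longest(expected, actual)):
        if e != a:
            return i, e, a
    return None


# Made-up trace lines ("PC opcode A P"), for illustration only.
reference = ["C000 A9 A:00 P:24", "C002 69 A:50 P:24", "C004 8D A:A0 P:E4"]
candidate = ["C000 A9 A:00 P:24", "C002 69 A:50 P:24", "C004 8D A:A0 P:A4"]

mismatch = first_divergence(reference, candidate)
```

Here `mismatch` points at the third line, where the two traces disagree on the status register. A trace that ends early also counts as a divergence, with `None` standing in for the missing line.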

I plan on continuing research in this space and generalizing my approaches, since it shows potential for complementing existing test coverage provided by pcodetest. If a simple architecture like 6502 had a few bugs, who knows how many are in more complex architectures! I wasn't able to find similar attempts (outside of diffing and coverage analysis from trace logs). Please let me know if I missed something, and send any suggestions for improvements.

[1]: https://github.com/nevesnunes/ghidra-tlcs900h#emulation




This is great. I’m not clear on if the bugs you are finding are in Ghidra’s processor model or in the emulator? (Though I think it’s the latter?) Also, why would Ghidra have the best (most accurate?) processor model vs some of the highest quality emulators?

One other question: when the cpu is being emulated at a 50th of its actual speed (or less!) how does replaying recorded input work? Do all games strictly use interrupts to read input or do any poll the state instead (or maybe just at certain sequences or for certain portions of the gameplay)? If the latter, did you have to adjust the key down/key up events you were replaying to avoid a slow-executing cpu missing inputs? (As you might be able to guess, I’m an embedded dev but haven’t dabbled with emulators beyond using them.)

Thanks in advance and again, awesome work!


> I’m not clear on if the bugs you are finding are in Ghidra’s processor model or in the emulator? (Though I think it’s the latter?)

The project README includes a link to a commit fixing bugs in Ghidra's processor model, here is the author's PR submitting those fixes upstream: https://github.com/NationalSecurityAgency/ghidra/pull/5740


From what I've seen, it's usually read at the vblank interrupt.

The input recording has entries in the format "<instruction_number> <buttons_bitmask>". If I press a button and it's read from the hardware register after, let's say, 0x1000 instructions have been stepped, it is stored as "0x1000 0x80", and in the Ghidra emulator script I only need to count up to 0x1000 instructions before I send that memory write to the other emulator. While the real timings are vastly different, the input will be read after roughly the same number of vblank calls. I say "roughly" because I did find a difference in the call where the input was expected to be read, but it isn't yet clear if that's a logic bug on my side. I'll have to look into it again eventually.
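A minimal sketch of that replay logic, assuming hex fields as in the "0x1000 0x80" example. `parse_recording`, `replay`, `step` and `send_input` are hypothetical names for illustration, not the actual script; whether the input is sent just before or just after the matching instruction is also an assumption.

```python
from typing import Callable, List, Tuple


def parse_recording(lines: List[str]) -> List[Tuple[int, int]]:
    """Parse "<instruction_number> <buttons_bitmask>" lines with hex fields."""
    events = []
    for line in lines:
        if not line.strip():
            continue
        count, mask = line.split()
        events.append((int(count, 16), int(mask, 16)))
    return sorted(events)


def replay(
    events: List[Tuple[int, int]],
    total_instructions: int,
    step: Callable[[], None],
    send_input: Callable[[int], None],
) -> None:
    """Step the CPU, sending each recorded input once its instruction count is reached."""
    pending = list(events)
    for executed in range(1, total_instructions + 1):
        step()
        while pending and pending[0][0] <= executed:
            send_input(pending.pop(0)[1])


# Demo with a stand-in CPU that only counts executed instructions.
state = {"executed": 0}
log = []


def step() -> None:
    state["executed"] += 1


def send_input(mask: int) -> None:
    log.append((state["executed"], mask))


replay(parse_recording(["0x1000 0x80", "0x1800 0x00"]), 0x2000, step, send_input)
```

Because the trigger is an instruction count rather than wall-clock time, the replay is independent of how slowly the CPU is being emulated.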


Thanks - that’s along the lines of what I was expecting.


Cool project! I'm very interested in accurate preservation of the behavior of these old systems (chip decapping and scanning, FPGA reimplementation, accuracy-focused emulators) and using Ghidra to reverse engineer old games, especially on the 6502 and m68k architectures. Just an enthusiastic spectator at this point, but I hope to contribute something to the field eventually.

A sidenote: the action at 0:19 in the 50x-speed demo is intriguing. I've played many hours of Super Mario Brothers and watched various tool-assisted speedruns of it, but I don't recall seeing a Goomba reverse direction like that instead of just plowing into Mario. Is that a game glitch that you intended to show off with your recorded keyboard inputs? I haven't played in a long time, so I also wouldn't be surprised to hear that such behavior is common. I didn't find an obvious reference to it in the TAS info here [0].

Edit: there is precedent for that Goomba behavior [1].

[0] https://tasvideos.org/GameResources/NES/SuperMarioBros

[1] https://www.reddit.com/r/Mario/comments/add1fx/changing_goom...


I think that goomba bumped into the squished goomba Mario had just squished. Mario was just a bit to the left, so the flat goomba's hit box stuck out to the right a bit and the other goomba hit it.


Ahh, yes indeed.


@19s?


I thought I recognized the GitHub username, you contributed to my GNOME Shell extension years ago!


Just wondering, what was the extension?



At 17 seconds into the demo video, Mario appears to run into a Goomba horizontally without being harmed, and it just changes course to head in the other direction as if it bounced off him.

Is this the correct game mechanic behavior? My recollection as a 90's kid is this worked differently: If you run into a Goomba horizontally, Mario is toast (or loses his mushroom bigness).

Am I confused? :D

Edit: Thanks to @h0l0cube and @cylon13 for explaining this in another comment. It turns out nothing is wrong. If you look closely after reading their explanation, you can see it. Certainly a subtle detail.

> I think that goomba bumped into the squished goomba Mario had just squished. Mario was just a bit to the left, so the flat goomba's hit box stuck out to the right a bit and the other goomba hit it.


I noticed that as well.


Potentially answered in another comment:

https://news.ycombinator.com/item?id=37470025


Thanks for sharing, I enjoyed your project.

I'm also interested in processor module verification. May I offer some performance suggestions:

- You don't need Ghidra to use Ghidra's p-code emulator

- Ghidra's p-code emulator is part of the decompiler, which is C++, not Java. It's located in ~/Ghidra/Features/Decompiler/src/decompile/cpp in the source tree. There are examples there as well

- So instead of communicating back and forth with Ghidra itself, hack up your emulator to also use Ghidra's p-code emulator. At every step you can save state, run your emulator and the p-code emulator, and diff the final state. If there are any differences, one (or both) of the emulators is wrong.

This will likely be too slow to play but should be much faster than your current approach. Hope this helps.
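The save/run/diff loop could look roughly like this. It's a sketch under assumptions: CPU state is modelled as a plain dict, and `step_ref` / `step_test` are hypothetical single-step functions standing in for the two emulators.

```python
from copy import deepcopy
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, int]
Stepper = Callable[[State], State]


def diff_states(a: State, b: State) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {field: (a_value, b_value)} for every field that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def lockstep(state: State, step_ref: Stepper, step_test: Stepper, max_steps: int):
    """Run both steppers from the same saved state; return the first mismatch or None."""
    for n in range(max_steps):
        ref = step_ref(deepcopy(state))
        test = step_test(deepcopy(state))
        delta = diff_states(ref, test)
        if delta:
            return n, delta
        state = ref  # continue from the reference result
    return None


def step_ref(s: State) -> State:
    """Toy reference: increments A with 8-bit wraparound."""
    s["a"] = (s["a"] + 1) & 0xFF
    s["pc"] += 1
    return s


def step_buggy(s: State) -> State:
    """Toy implementation under test: forgets to wrap A at 8 bits."""
    s["a"] = s["a"] + 1
    s["pc"] += 1
    return s


result = lockstep({"a": 0xFE, "pc": 0}, step_ref, step_buggy, 10)
```

Restarting both sides from the same saved state on every step means a single wrong instruction is reported where it happens, instead of showing up later as accumulated drift.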


Nice, I'll give it a closer look. My only concern so far is memory hooking (still needed for hardware registers), which on the Java side was called by FilteredMemoryState [1]. In memstate.cc it looks like just the simpler MemoryState is implemented [2], and there's no equivalent to MemoryAccessFilter. But it might not be that complicated to add...
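To illustrate what such a hook layer has to do, here is a small sketch in Python. `HookedMemory` is a hypothetical class and does not mirror Ghidra's Java or C++ classes; the only NES-specific detail is the controller port at 0x4016.

```python
from typing import Callable, List, Tuple

ReadHook = Callable[[int], int]
WriteHook = Callable[[int, int], None]


class HookedMemory:
    """Flat 64 KiB memory with optional per-range read/write callbacks."""

    def __init__(self) -> None:
        self.data = bytearray(0x10000)
        self.read_hooks: List[Tuple[range, ReadHook]] = []
        self.write_hooks: List[Tuple[range, WriteHook]] = []

    def on_read(self, start: int, end: int, hook: ReadHook) -> None:
        """Route reads in [start, end] to hook instead of backing memory."""
        self.read_hooks.append((range(start, end + 1), hook))

    def on_write(self, start: int, end: int, hook: WriteHook) -> None:
        """Route writes in [start, end] to hook instead of backing memory."""
        self.write_hooks.append((range(start, end + 1), hook))

    def read(self, addr: int) -> int:
        for span, hook in self.read_hooks:
            if addr in span:
                return hook(addr) & 0xFF
        return self.data[addr]

    def write(self, addr: int, value: int) -> None:
        for span, hook in self.write_hooks:
            if addr in span:
                hook(addr, value & 0xFF)
                return
        self.data[addr] = value & 0xFF


mem = HookedMemory()
writes = []
# Reads of the controller port are answered by the "hardware" side.
mem.on_read(0x4016, 0x4016, lambda addr: 0x01)
mem.on_write(0x4016, 0x4016, lambda addr, value: writes.append((addr, value)))
mem.write(0x4016, 0x01)  # goes to the hook, not to RAM
mem.write(0x0200, 0x7F)  # ordinary RAM write
```

In the delegation setup, the hook bodies are where accesses would be forwarded to the console emulator that owns the hardware registers.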

[1]: https://github.com/NationalSecurityAgency/ghidra/blob/4561e8...

[2]: https://github.com/NationalSecurityAgency/ghidra/blob/4561e8...


Excellent results. As I’m sure you’ll agree there are many stones left to overturn in researching how to play video games without direct human input. I’m looking forward to your next developments.


Thanks, but I think I'm going to disappoint you: the demo is using pre-recorded manual inputs, which are then replayed when emulating in Ghidra. The only logic involved is checking when we are at the right instruction to then send the input. I mentioned it briefly in the README but maybe I wasn't very clear, sorry!


The emulator in Ghidra is really cool. I’ve been improving my Wasm processor module to support better emulation, and I’ve made use of their comprehensive specification tests to validate the implementation.

One thing that I run into a fair bit is the tension between keeping the decompiler output sane vs. implementing every nuance of a particular instruction. Trying to emulate every quirk turns into very complex P-code, which can clutter up the decompiled output. One strategy is to use custom operations (pcodeops) plus an emulator helper, but this makes the operation totally opaque to the decompiler, so it’s not suitable for common instructions.

In general though it’s super cool to have this kind of functionality available. It will be awesome if Ghidra can someday be a powerful tool for dynamic reverse engineering, not just static reversing.


Nice to see another CTF enjoyer :) I've always thought about using Ghidra for vm challenges, but I'm still not sure if it fits the typical timeframe. Although I never used it, something like binja seems more favourable to quick and dirty scripting.

About custom pcodeops, yeah I was really tempted to use them for TLCS-900. For example, instruction `daa` adjusts the execution result of an add or subtract as binary-coded decimal, and the pcode for that is just inglorious (but I'm sure there's worse out there): https://github.com/nevesnunes/ghidra-tlcs900h/blob/5ff4eb851...

Pretty amusing how a single instruction takes more than a dozen lines in the decompilation: https://gist.github.com/nevesnunes/7417e8bec2cddfcaf8d7653c9...
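For readers who haven't met `daa`: below is a sketch of the classic 8-bit decimal adjust after an addition, in Python. It follows the usual textbook algorithm and is an assumption for illustration; it is not derived from the TLCS-900 SLEIGH spec linked above and may differ from it in flag details.

```python
from typing import Tuple


def add_with_daa(a: int, b: int) -> Tuple[int, bool]:
    """Add two packed-BCD bytes, then decimal-adjust; returns (result, carry)."""
    raw = a + b
    carry = raw > 0xFF
    half_carry = ((a & 0x0F) + (b & 0x0F)) > 0x0F
    result = raw & 0xFF

    adjust = 0
    # Low digit overflowed past 9 (or carried into the high nibble): add 6.
    if half_carry or (result & 0x0F) > 9:
        adjust |= 0x06
    # High digit overflowed past 9 (or carried out of the byte): add 0x60.
    if carry or result > 0x99:
        adjust |= 0x60
        carry = True
    return (result + adjust) & 0xFF, carry


print(hex(add_with_daa(0x15, 0x27)[0]))  # 15 + 27 = 42 -> 0x42
```

Even in a high-level language this is several branches on intermediate flags, which is why the equivalent p-code expands so much in the decompiler.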


Is it specifically a 6502 processor model? Because the NES used the 2A03 in NTSC regions:

https://www.nesdev.org/wiki/2A03


I think it's closer to the 2A03. Unless I missed something, there isn't any support implemented for binary-coded decimal mode.


The 2A03 and 2A07 contain exact gate-level copies of the 6502 with the binary-coded decimal (BCD) logic disabled.


Went to check if any of the 6502 bugs were flag-related and yep, there was one. It's a simple chip, but the overflow flag behavior in particular is often implemented incorrectly.
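For reference, a sketch of binary-mode ADC with the overflow rule written out. This is the commonly cited formulation in Python, not code from Ghidra's 6502 spec.

```python
from typing import Tuple


def adc(a: int, m: int, carry_in: int = 0) -> Tuple[int, bool, bool]:
    """6502-style binary ADC; returns (result, carry_out, overflow)."""
    total = a + m + carry_in
    result = total & 0xFF
    carry_out = total > 0xFF
    # Overflow: both operands have the same sign and the result's sign differs.
    overflow = bool(~(a ^ m) & (a ^ result) & 0x80)
    return result, carry_out, overflow


# 0x50 + 0x50: two positives give a "negative" result, so V is set but C is not.
print(adc(0x50, 0x50))
```

The usual mistake is to tie V to the carry out of bit 7. The cases `adc(0x50, 0xD0)` (carry, no overflow) and `adc(0x50, 0x50)` (overflow, no carry) show that the two flags are independent.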


Klaus Dormann's 6502 tests don't rely on a particular emulator environment. They could be used with Ghidra.

https://github.com/Klaus2m5/6502_65C02_functional_tests
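As far as I understand it, these tests report both failure and success by trapping in a jump or branch to the current address, so a harness mainly needs to detect a program counter that stops moving. A sketch, where `step` is a hypothetical function that executes one instruction and returns the next PC:

```python
from typing import Callable, Dict, Optional


def run_until_trap(
    step: Callable[[int], int], start_pc: int, max_steps: int = 1_000_000
) -> Optional[int]:
    """Step until the program counter stops changing; return that PC, or None."""
    pc = start_pc
    for _ in range(max_steps):
        next_pc = step(pc)
        if next_pc == pc:
            return pc
        pc = next_pc
    return None


# Toy "program": each address maps to the next PC; 0x0406 jumps to itself.
program: Dict[int, int] = {0x0400: 0x0403, 0x0403: 0x0406, 0x0406: 0x0406}
trapped_at = run_until_trap(lambda pc: program[pc], 0x0400)
```

The harness would then compare `trapped_at` against the success address from the assembled listing; any other address identifies the failing test.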


So glad Ghidra was released for free!


Very old ML project from a master's student's thesis which is what originally got me into CS. He taught it to play NES games other than mario and had a good breakdown of his results.

http://tom7.org/mario/


The latest on winning Atari with RL+ FWIU:

https://github.com/openai/retro:

> Gym Retro lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games. It uses various emulators that support the Libretro API, making it fairly easy to add new emulators.

.nes is listed in the supported ROM types: https://retro.readthedocs.io/en/latest/integration.html#supp...

> Integrating a Game: To integrate a game you need to define a done condition and a reward function. The done condition lets Gym Retro know when to end a game session, while the reward function provides a simple numeric goal for machine learning agents to maximize.

> To define these, you find variables from the game’s memory, such as the player’s current score and lives remaining, and use those to create the done condition and reward function. An example done condition is when the `lives` variable is equal to 0, an example reward function is the change in the `score` variable.
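The quoted example (done when `lives` is 0, reward equal to the change in `score`) can be written out as a plain function. This is only an illustration in Python with a hypothetical `reward_and_done` helper; it is not Gym Retro's own configuration format or API.

```python
from typing import Dict, Tuple


def reward_and_done(prev: Dict[str, int], curr: Dict[str, int]) -> Tuple[int, bool]:
    """Reward is the change in `score`; the episode is done when `lives` reaches 0."""
    reward = curr["score"] - prev["score"]
    done = curr["lives"] == 0
    return reward, done


# Variables as they might be read from game memory on two consecutive frames.
before = {"score": 1200, "lives": 1}
after = {"score": 1300, "lives": 0}
step_result = reward_and_done(before, after)
```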

PPO Proximal Policy Optimization and OpenAI/baselines: https://retro.readthedocs.io/en/latest/getting_started.html#...

MuZero: https://en.wikipedia.org/wiki/MuZero

MuZero-unplugged with PyTorch: https://github.com/DHDev0/Muzero-unplugged

Farama-Foundation/Gymnasium is a fork of OpenAI/gym and it has support for additional Environments like MuJoCo: https://github.com/Farama-Foundation/Gymnasium#environments

Farama-Foundation/MO-Gymnasium: "Multi-objective Gymnasium environments for reinforcement learning": https://github.com/Farama-Foundation/MO-Gymnasium


Jeez, I remember when that came out, "very old", feels like yesterday



