Using Ghidra and Python to reverse engineer Ecco the Dolphin (32bits.substack.com)
420 points by bbayles 1 day ago | 117 comments





The hash is merely a CRC32; exactly this one (table constant 0x77073096, i.e. the standard reflected polynomial 0xEDB88320; code is wrong)

https://web.mit.edu/freebsd/head/sys/libkern/crc32.c

(The decoded ints in the post are the constants in this CRC32).
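To see where the constant quoted above comes from, here is a minimal sketch that regenerates the standard CRC-32 lookup table. It assumes the usual reflected polynomial 0xEDB88320 with the 0xFFFFFFFF init and final XOR; the function names are made up for illustration and are not taken from the post or the linked file.

```python
import zlib

POLY = 0xEDB88320  # reflected form of the standard CRC-32 polynomial


def make_table(poly=POLY):
    """Build the 256-entry lookup table used by byte-at-a-time CRC-32."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when a 1 falls off.
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


def crc32(data, table):
    """Table-driven CRC-32 with the usual 0xFFFFFFFF init and final XOR."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


TABLE = make_table()
print(hex(TABLE[1]))  # 0x77073096, the constant mentioned above
```

The first entry of the table is 0 and the second is 0x77073096, which is why that number is such a recognizable fingerprint.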

Knowing it's a CRC32 and knowing the polynomial allows inverting the answers in log time instead of exponential time by exploiting the modular math of the polynomial rings.


I know some of these words!

It means that using bruteforce to find all the values would be much faster.

But OP did Brute Force?

Yes, but more slowly. According to the GP, with the knowledge provided, it can be brute-forced in O(log n) time instead of O(n^2).

Is it then still considered brute force?

Brutey force

Binwalk claims to recognize CRC polynomial tables; I wonder if it could find this one.

Also someone needs to integrate binwalk & ghidra, they synergize too much.


Thanks! I didn't clock that - should have looked at the decrypted values!

Do you have any tips on knowing how the value is a result of CRC32 and/or the polynomial/initial value used?

In this example, the "encrypted data" is XORed with the key 4 bytes at a time. The first 4 bytes in the data are the same as the key. XOR the next 4 with the key and you get the constant I posted above. Plug it into Google, find where it is often found, decode the rest of the table, see that it matches.
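A rough sketch of that decoding step, assuming little-endian 32-bit words and that the key is simply the first word of the blob. The key value and the sample data below are invented for illustration; only the table constants are real.

```python
import struct


def xor_decode(blob, key=None):
    """XOR each little-endian 32-bit word of blob with a 32-bit key.

    If key is None, assume the key equals the first word of the blob.
    """
    count = len(blob) // 4
    words = struct.unpack("<%dI" % count, blob[: count * 4])
    if key is None:
        key = words[0]
    return [w ^ key for w in words]


# Illustration only: obfuscate the first few CRC-32 table entries with a
# hypothetical key, then recover them.
KEY = 0x12345678
plain = [0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA]
blob = struct.pack("<4I", *(w ^ KEY for w in plain))

decoded = xor_decode(blob)
print([hex(w) for w in decoded])
```

Because the first CRC-32 table entry is 0, the first obfuscated word is 0 XOR key, which is the key itself. That would explain why the first 4 bytes of the data match the key.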

I've learned programmers either invent their own hashes, random number generators, and crypto, in which case I usually break them, or they reuse existing algos, in which case any code constants are searchable.

Plus I've written and reversed enough of all that I recognized the loop as a CRC polynomial remainder loop.

All CRC-n algos are trivially crackable/reversible/collidable. They're a remainder on division of polynomials (learn the math on how they work), so use the polynomial equivalent of the extended Euclidean algorithm and you get one answer. Adding multiples of the modulus to that answer then gives all the other possible answers, one at a time.

That should give you plenty to chase through.
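As a concrete taste of how reversible a CRC is, here is a sketch that appends 4 chosen bytes to any prefix so that the CRC-32 lands on an arbitrary target. Note that this swaps in the well-known table-walking trick rather than the extended-Euclid approach described above, and `forge_suffix` is a hypothetical name.

```python
import zlib

POLY = 0xEDB88320


def make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()
# Every table entry has a distinct top byte, so the top byte of the register
# tells us which entry was XORed in on the previous step.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}


def forge_suffix(prefix, target):
    """Return 4 bytes such that zlib.crc32(prefix + suffix) == target."""
    want = target ^ 0xFFFFFFFF  # internal register value we must end on

    # Backward pass: work out which table entries the last four steps used.
    indices = []
    x = want
    for _ in range(4):
        idx = TOP_BYTE_TO_INDEX[x >> 24]
        indices.append(idx)
        x = ((x ^ TABLE[idx]) << 8) & 0xFFFFFFFF
    indices.reverse()

    # Forward pass: choose each byte so the register selects that entry.
    state = zlib.crc32(prefix) ^ 0xFFFFFFFF
    suffix = bytearray()
    for idx in indices:
        suffix.append((state ^ idx) & 0xFF)
        state = TABLE[idx] ^ (state >> 8)
    return bytes(suffix)


patch = forge_suffix(b"ECCO", 0xDEADBEEF)
print(hex(zlib.crc32(b"ECCO" + patch)))
```

No search is involved: the 4 bytes fall out of two short loops, which is the practical meaning of "trivially reversible".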


Is it possible to learn this power?

Looking in the binary for the polynomial and knowing what the common ones are from experience is an easy way.

Normally, the polynomial is going to be found right next to a loop that is ingesting bytes incrementally.
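A minimal sketch of the "look for known constants" idea: scan a raw binary for the first few words of the standard CRC-32 table in either byte order. The function names are hypothetical, and this only catches an unobfuscated table; a table stored XORed with a key, as in the post, would need decoding first.

```python
import struct


def crc32_table(poly=0xEDB88320):
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


def find_crc32_table(blob, entries=8):
    """Return (offset, byte_order) pairs where the table's first words appear."""
    table = crc32_table()
    hits = []
    for order, fmt in (("little", "<"), ("big", ">")):
        needle = struct.pack(f"{fmt}{entries}I", *table[:entries])
        start = blob.find(needle)
        while start != -1:
            hits.append((start, order))
            start = blob.find(needle, start + 1)
    return hits


# Synthetic "binary": filler bytes, then a little-endian table, then more filler.
image = b"\xAA" * 40 + struct.pack("<256I", *crc32_table()) + b"\xAA" * 16
print(find_crc32_table(image))
```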


When the original Ecco came out on the Megadrive (Genesis), I spent all my hard-earned money to buy it. That game is obscenely hard. I got frustrated, so I sat down for the afternoon with a pen and paper and somehow managed to decode the password system. I teleported to the final level and completed it the next day.

Then I was wracked with guilt about spending all my money on a game I completed in two days.


> I sat down for the afternoon with a pen and paper and somehow managed to decode the password system

Would love to hear more about this, if you have any recollection :)


Philosophically, I would argue that you did not complete the game.

You skipped several levels and saw only some percentage of the intended content, gameplay, story, etc. Games in general, and Ecco the Dolphin is no exception, are very much about the journey and not just the destination. You missed out on themes & experiences like isolation, making friends with those outside of your in-group, conservation, time travel, communing with dinosaurs and, of course, space travel.

So, you really shouldn't have felt so guilty.


I wonder if this will make him feel guilty for feeling guilty

And maybe @dfxm12 will feel guilty about that - it's gonna be a tough one for everyone

There are plenty of people who would argue that getting to end credits means beating the game.

You can however say that skipping levels also skips the story, so they did not finish the story.


OP didn't say they "beat" the game. If you want to argue that, it's a different discussion.


Does that mean that taking warp zones in Mario doesn't count as beating the game?

IMO it depends, did you find the warp zones yourself or were you told about them? They're hidden. Finding them by luck doesn't feel like cheating to me, but getting outside knowledge to bypass big parts of the game kind of does.

Not in this context.

If you told me you beat Mario on NES but you didn't even play 24 out of the 32 levels, and you never beat them otherwise, I don't think I'd give you the same credit as someone who beat each level.

This is why Any% speedruns (get to end credits any way possible) are their own category.


In speedrunning circles there are categories like 100%, any% (get to the end in any way), minimum percent (get to the end doing the least possible), glitchless, no major glitches, etc.

People have different interests and finish in their own way.

If you’re really into a game you’re missing out if you don’t try to beat it in different ways.

If you’re really into one particular way you’re really kind of being a bad sport if you insist others enjoy a game in your preferred way.


What a final level, though! Having skipped a large chunk of the game, were you surprised by it?

You must be the only person in the world that beat this game, cheating or otherwise.

https://youtu.be/OGVUuVjXMTA ecco the dolphin any% speedrun world record [17:54]

which is actually faster than the 20:44 TAS! (https://tasvideos.org/228G)


>which is actually faster than the 20:44 TAS!

I had no idea what this was.

For people like me, after clicking through the link and some googling:

TAS = "Tool-assisted speedrun" or "tool-assisted superplay", and they are "generally created with the goal of creating theoretically perfect playthroughs".

That any% run was fun to skim through though. I had no idea what was happening.


Usually, speedruns aren't actually faster than the TAS, unless the speedrun used some new technique that was developed after the TAS was made. Normally the biggest difference is that TASes and regular speedruns use different methods of timing.

TASes are pretty much always measured from power on to the last button input required to trigger the credits. With normal speedruns, timing varies from game to game, but a common method is timing from selecting new game to the last hit on the final boss. So games with long openings or with interactive post-final boss sequences that have to be played before the credits start would have inflated times on the TAS.

The TAS counts about an extra 15 seconds before the game starts. The TAS reaches the point the speed run stops counting at 18:52, and continues to play out the ending. So the TAS would be measured as about 18:37 using speed run timing, so the speed run is still genuinely faster than the TAS, but less than the official numbers indicate.
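The arithmetic above, spelled out as a tiny sketch. The 15 second offset is the "about 15 seconds" estimate from the comment, not a measured value, and the helper names are made up.

```python
def to_seconds(mmss):
    """Convert an "mm:ss" string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total):
    """Convert a number of seconds back to "m:ss"."""
    return f"{total // 60}:{total % 60:02d}"


tas_official = to_seconds("20:44")     # TAS time, power-on to final input
tas_at_rta_stop = to_seconds("18:52")  # where the TAS hits the speedrun's end point
power_on_offset = 15                   # approximate time before the game starts
rta_record = to_seconds("17:54")       # any% speedrun record

tas_rta_style = tas_at_rta_stop - power_on_offset
print(to_mmss(tas_rta_style))                 # 18:37
print(tas_rta_style - rta_record)             # 43 seconds in favor of the speedrun
print(to_mmss(tas_official - rta_record))     # 2:50, the gap the official numbers suggest
```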

It seems like the speed run uses a glitch that the TAS deliberately avoided. From the TAS description:

> At the same time, a major new glitch allowing Ecco to go through nearly any wall was not used because the frequency of its use would make the run very repetitive.


Is that just because no one cares to create a TAS for the game? I don't know any way an RTA would be faster than a TAS other than no good TAS existing.

Usually the timing rules aren't the same for TAS and RTA, as TAS timing always starts at console power-on while RTA usually when 'new game' is selected from the menu.

You mean "usually" in the sense of this game in particular, or "usually" as in "most of the time regardless of the game"? Because I have never heard of a TAS being timed differently than an RTA in any other game; that doesn't even make any sense. The game is the same, so why would you time a TAS from a different point than an RTA?

Whole point of a TAS is to show what "perfect" speed run would look like.


Usually as in "most of the time regardless of the game", yes.

Seems like a sibling comment explained this already in much more detail: https://news.ycombinator.com/item?id=42082420


I’d say more like the TAS is outdated, a person can discover something new that the person doing the TAS didn’t.

Both are a decade old. Looks to be posted within weeks of each other. If this is the best TAS there is for the game, it means that no one cares about the TAS.

QQRIQ is a phonetic abbreviation of "kukuriku", which is the sound of the rooster in Hungarian and in several other languages (Polish "kukuryku", Hebrew "קוקוריקו" etc.). Makes me wonder what the process for choosing the passwords was.

Also Gyugyu might be a reference to the Hungarian movie: "The Fifth Seal"

"Just imagine you are about to die, but you will be reincarnated in to one of two people; a slave or the rich master. The slave suffers under the master. He has his tongue and an eye removed and his wife and child are killed. He goes on living knowing he is a good person, as he never committed such appalling, sadistic acts on another like his master has done. The rich master has no moral qualms about it at all. He doesn't think what he did was wrong; the slave needed to be punished. You have the choice, whether to be a poor and righteous slave or be a rich and corrupt master."

Gyugyu is the name of the slave.

https://www.imdb.com/title/tt0075467


Gyu is also the Japanese word for beef.

Maybe, but this game was developed by a Hungarian team.

The programmers for this game were in Budapest, so this is a good guess!

I didn't know that. The QQRIQ jumped out at me because of my Polish background, so I googled it and found it's international.

Popely with a Hungarian "ly" was a giveaway too

Cocorico in french, very close :)

Kikeriki in German!

Very similar to "kikiriki" in Spanish.

Wonder why some went for an o-sound and others an i-sound. To make matters worse it's kykeliky in Norwegian, so both y, e, i.

…and kukeliku in Swedish, neighbour

I wonder if Americans are getting blocked for typing their onomatopoeia.

See also: the town of Kakariko in Zelda which always has chickens

Wow I had no idea that was the origin of that name

Could it be that the rooster's cock-a-doodle-doo is something performed like clockwork in the morning, so showing frame data is also tangentially related to time and clocks? Probably not.

An interesting aside: when asked about his inspirations, Ecco's developer Ed Annunziata said, "No, I never took LSD, but I did read a lot from John C. Lilly". Lilly is known for his pioneering work in the fields of animal intelligence, ketamine psychotherapy, isolation tanks, and consciousness exploration.

The name "Ecco" is a reference to Lilly's ECCO (Earth Coincidence Control Office), a supernatural/extraterrestrial base which John posited existed on the other side of the moon to coordinate all earthly "coincidences". He was also one of the first to recognize how intelligent dolphins were and became obsessed with figuring out how to communicate with them, going as far as flooding half of his house in the Caribbean to cohabitate. This is just the tip of the iceberg. I'd highly recommend his autobiography The Center of the Cyclone if any of this is intriguing; he's a fascinating guy.


Great read!

Do you have any resources on getting started with Dreamcast game reverse engineering? I've been wanting to do some things with Skies of Arcadia, and I've been hoping there exist techniques more systematic than "see what values change between memory snapshots".


> I've been hoping there exist techniques more systematic than "see what values change between memory snapshots".

FWIW this is pretty much the standard method for locating value locations in RAM. It actually works pretty well. Some emulators have tools built in for that, like Dolphin for example. Even old game hacking tools like the Gameshark for N64 used the technique, with an on-console UI. I don't know if any Dreamcast emulators have tools for it or not.

I wrote about the technique in Dolphin here (and the followup article is also about console game hacking with Ghidra): https://www.smokingonabike.com/2021/01/17/hacking-super-monk...


Incidentally, Skies of Arcadia Legends on Dolphin is fantastic:

https://www.youtube.com/playlist?list=PLwH1xJhcXG0dBlmWL_DTu...


> Some emulators have tools built in for that, like Dolphin for example.

This was an advertised feature of some DS flashcarts back in the day, too. I can't remember if it was the R4, the DSTwo, or what...but I recall an example video for their "Make your own cheats!" feature, which involved playing something like Super Mario Bros, turning on the "Cheats Finder" feature, then grabbing a coin, and maybe doing it a few times. The manager would then figure out the value that's changing in memory (presumably the sector that stores your coin amount), create the "cheat", and then you would enable it and watch your coin value go up.


And old skool 'trainer' tools on PC. Someone more familiar than myself could give better information, but I remember trying them out in the early 2000s and remarking how they reminded me of GameShark for PC.

I've poked around a bit with that game! The main trick is to import the memory snapshot (various ways of dumping it to a file; people like Cheat Engine for this) into Ghidra.

Ghidra can analyze the SuperH processor machine code natively, so the auto analysis will turn up lots of functions.


I always wondered where to start learning reverse-engineering. Most people will say learn Assembly first. But from there on, there seems to be not much more concrete information online.

Do people just figure it out by trial and error, slowly learning the common patterns in x86 / ARM / arcade platforms?

I can't really find much discussion on details online.


It's like debugging. I'm sure you must have worked on an unfamiliar code base at some point and had to figure it out. Instead of having the source you have the binary, and using tools like Ghidra you can start to piece together the source. You'll still need to reason over it the very same way you did on that unfamiliar codebase, and this time there are no comments at all (which isn't uncommon in a lot of source-available projects, mind you).

So you're probably already half way there. Being familiar with assembly code helps of course.


I knew some from school, but stepping through a debugger with a video game that I remembered from childhood was a better education on computer engineering than anything I got in class.

Run through http://microcorruption.com or https://crackmes.one/ for closer to real-life examples.

I've taken to older games a lot more in recent years, they feel like they have a lot more soul if that makes any sense. Also sorry about your car! Not going to leave it idling in the driveway anymore, thanks for the warning.

Are you sure you don't miss modern features like mandatory network connectivity and micro transactions?

What happened to their car? I didn't see anything about that in the post or comments.

Wow, a blog that focuses on the Sega Saturn!

Not too long ago, I found a Saturn in a closet at my parents' house, along with a small handful of game CDs. I don’t have any recollection of owning one, so I’m guessing my little brother must have acquired it after I left for college. Anyway, I plugged it in and all the games worked! But other than that I have no idea what to do with it (obviously the trash is not an option).


The games are worth a fortune. I gave mine away in 2005 and now I get heart palpitations when I look them up on eBay.

Interesting. My son suggests getting a table at VCF next year and setting it up for people to play; perhaps I’ll do that and have a sign that says “make an offer”


You also have

- The Satiator, that plugs into the video card slot so you can still use original CDs.

- The Saroo, that plugs into the cartridge slot, also emulates the RAM expansion carts, and it is much cheaper (but seems to have some compatibility problems).


I'd love to see footage of the underwater soccer cheat in action.

Can we just take a moment to appreciate how incredibly odd the Ecco series is? For anyone that beat the games. You go from swimming in an ocean to flying with aliens. It's bizarre. Some people classify it as a horror game.

The name of the game probably comes from dolphins echolocation ability. Another explanation I like to entertain is that the name is a reference to John C Lilly. He was a scientist who believed in an alien organization called the Earth Coincidence Control Office or E.C.C.O. He also studied dolphin intelligence and communication. He gave dolphins LSD in an effort to communicate with them. John C Lilly is an interesting rabbit hole to go down.

Probably both

My favorite aspect is the music, I regularly listen to the OST while working: https://youtu.be/tqMuvFEKCOk

I've played a decent amount (never finished it), but I never understood why people say it's a horror game?


A friend of mine explained that there are overlapping phobias of water and deep dark spaces that this game triggers; he said it's unbearable to play the later levels.

I'd imagine anyone with claustrophobia would struggle with the last couple levels as well since they're essentially a battle against the screen crushing you.

I just checked what the final boss looks like, combined with the phobias... glad I didn't finish the game as a kid.

Thanks for the article, great read!


It really is a game with a strange mix of aesthetics. Blue skies and bright colors in the early levels; suspense and dread and supernatural stuff in the later levels.

Another Ed Annunziata game called Three Dirty Dwarves is also stylistically unique.


Nice! Interesting how similar games seem to do this, checksumming to fixed integers. Pitfall: The Lost Expedition did something very similar by converting button presses into ASCII-represented strings of the input buttons that were then CRC-ed. The approach was similar to just brute-force in Python and compare to the extracted cheat hashes.
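A hedged sketch of that brute-force approach: enumerate short button sequences, CRC each one, and compare against a set of extracted hashes. The one-letter button alphabet, the string encoding, and the use of plain `zlib.crc32` are all assumptions for illustration; the real game's encoding may differ.

```python
import itertools
import zlib

BUTTONS = "UDLRAB"  # hypothetical one-letter codes for the input buttons


def brute_force(target_hashes, max_len=6):
    """Map each matching hash to the button sequences that produce it."""
    found = {}
    for length in range(1, max_len + 1):
        for combo in itertools.product(BUTTONS, repeat=length):
            seq = "".join(combo)
            h = zlib.crc32(seq.encode("ascii"))
            if h in target_hashes:
                found.setdefault(h, []).append(seq)
    return found


# Stand-in for hashes pulled out of a binary: here we just hash two
# made-up sequences so there is something to recover.
targets = {zlib.crc32(b"UUDDLR"), zlib.crc32(b"AB")}
for h, seqs in brute_force(targets).items():
    print(hex(h), seqs)
```

With 6 buttons and sequences up to length 6 this is only about 56,000 candidates, so it finishes almost instantly; longer codes grow exponentially, which is where the inversion tricks discussed earlier in the thread start to matter.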

I even spy your CRC32 table hidden in the `decrypted_ints`. The pre-generated tables are so easily searchable. It leaves me curious why they are so often found obfuscated in an attempt to make it more difficult, compared to generating a new one with your own polynomial.


You should look into the PS2 version of this game; it seems to have the same code for level unlock. Maybe it will be easier to reverse engineer and figure out what all the codes do?

I wish there was more detail on "how" this was done as opposed to just the "what"

It's very much the "how", what were you looking for that's not explained in the blog post?

--- By analyzing a memory snapshot from the flycast emulator, I found that the buffer at 8cfffb34 holds the visible portion of the initials you type in. But if you keep typing, the characters you put in before get pushed into the buffer at 8c3abf18.

After loading the memory snapshot into Ghidra, I found that the function at 8c0334d8 reads this buffer. It performs a transformation on the buffer and then checks whether the transformed value is a list of six special ones. ---

How?


I don't know exactly how flycast works but I've done similar things with other emulators and you take an action in the game (take damage, type something), then search memory for that value. In this case the ascii code for the letter typed. Keep doing this until you've narrowed down a single block of memory that holds everything you've done
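The "search memory for that value" step might look roughly like this on a raw RAM dump. The base address 0x8C000000 is an assumption about where the dump maps into the address space, and the snapshot here is synthetic; with a real dump you would read the file into `snapshot` instead.

```python
BASE = 0x8C000000  # assumed address of the first byte of the dump


def find_all(snapshot, needle):
    """Return every offset at which needle occurs in snapshot."""
    offsets = []
    start = snapshot.find(needle)
    while start != -1:
        offsets.append(start)
        start = snapshot.find(needle, start + 1)
    return offsets


# Synthetic snapshot with the typed initials stored in two places.
snapshot = bytearray(256)
snapshot[0x34:0x37] = b"AAA"
snapshot[0x80:0x83] = b"AAA"

for offset in find_all(bytes(snapshot), b"AAA"):
    print(hex(BASE + offset))
```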

Anyone else never beat the second level? Yes, we all eventually figured out that we had to jump over the rock wall… but after that… then what?

I'm curious about the process to find that initial buffer address: does that involve entering a few different strings and searching the memory snapshot for those byte patterns?

Yeah, exactly! I took a couple memory snapshots of the name "AAA" and then threw out all of the addresses that had values that didn't match the first snapshot. Then I changed it to "BBB" and threw out all the addresses that did match.

There's a program called Cheat Engine that can make this a point and click thing; that's usually how people find GameShark-style codes.
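The AAA/BBB filtering described above can be sketched as two set filters over snapshot offsets. The snapshots below are tiny synthetic byte strings, not real emulator dumps, and the helper names are made up.

```python
def keep_unchanged(candidates, before, after):
    """Keep offsets whose byte is identical in both snapshots."""
    return {a for a in candidates if before[a] == after[a]}


def keep_changed(candidates, before, after):
    """Keep offsets whose byte differs between the two snapshots."""
    return {a for a in candidates if before[a] != after[a]}


def make_snapshot(name, counter):
    """Build a fake 64-byte memory image with a name buffer and a counter."""
    mem = bytearray(64)
    mem[10:13] = name      # the initials buffer we want to find
    mem[20] = counter      # unrelated value that changes every frame
    mem[30] = 7            # unrelated value that never changes
    return bytes(mem)


snap_a1 = make_snapshot(b"AAA", 1)
snap_a2 = make_snapshot(b"AAA", 2)
snap_b = make_snapshot(b"BBB", 3)

candidates = set(range(64))
candidates = keep_unchanged(candidates, snap_a1, snap_a2)  # name still "AAA"
candidates = keep_changed(candidates, snap_a2, snap_b)     # name now "BBB"
print(sorted(candidates))  # [10, 11, 12]
```

The first pass throws out noise that changes on its own; the second throws out everything that ignored the edit, leaving the buffer.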


The 3DS version was very cool. I plan on completing it there eventually. I wonder if those devs had source access.

I wish more people knew how to change to Ghidra's dark theme and change the font.

Maybe the author likes it this way. People can have preferences that differ from your own.

[flagged]


>Python is indeed good for small throwaway scripts

I think you just answered your own question.


Ghidra is an already existing piece of software; the Python script was written for the specific purpose. The title isn't about comparing the relative importance of programming languages to getting the answer, but about explaining the tools used and the effort involved in using them.

[flagged]


This is a fun little post about reversing a dreamcast game.

No one is being a zealot.


Interesting, as a very frequent HN reader, I do not associate this pattern with Python zealots at all. Rust, Swift, Go, more like.

I bet if you used pen and paper occasionally in the process you'd boast "I solved it using pen and paper!"

Maybe the author is more comfortable with Python than the alternatives.

By this logic anything meaningful anyone does is actually done with C, because no matter what you do there are 40 million lines of kernel code underneath...

"What is the importance of Django, it's only 10 thousand lines of Python, powered by millions of lines of interpreter written in C..."


No, the kernel does not solve the major part of the reverse engineering, the Java application Ghidra does.

But Java does not need any marketing, people just quietly use it.


What is the importance of needing to frame Python as tiny here? The reminder that programming languages are acceptable for small throwaway scripts is appreciated

I believe they were calling the function tiny, which it is, it’s 27 lines.

python is great for a lot more than small throwaway scripts.

Depends on who you ask and to what lengths you're willing to go.

"Python is used by Intel, IBM, NASA, Pixar, Netflix, Facebook, JP Morgan Chase, Spotify, and a number of other massive companies. It's one of the four main languages at Google, while Google's YouTube is largely written in Python. Same with Reddit, Pinterest, and Instagram"

"Python is used heavily in academic research, particularly in bioinformatics, biology, and mathematics. It is the standard introductory language for many university computer science programs."

https://brainstation.io/career-guides/who-uses-python-today

Misquotation alert: I'm not claiming python is perfect for everything. There are times it makes sense to use something else. Not-short-scripts isn't it.


It's utter garbage outside of a controlled environment. Youtube can use it because Youtube will have an official environment and there will be no such thing as a script that was written in one version or with one module installed that then breaks at run time.

The impressive size of the big users actually works against proving how great it is.

Use the official version inside Google or Netflix: ok.

Use in a package where the package manager ensures all dependencies and versions are met exactly: ok

Use by writing and immediately using and discarding today: ok

Write a random script and expect it to work in 6 months or on any other machine or god forbid another platform: forget it.

python is great for the author and miserable for everyone else


I assume the last time you used python was during the transition from 2 to 3?

I haven't had any problems with versions over the last 5 years. conda is a really good way of ensuring you get the same environment if you need to freeze versions.


conda is a non-standard python tooling with canned environments with packages that are years out of date, a constraint solver in its package manager that randomly seems to run forever, and it's a commercial product.

Of all the Python packaging solutions, it's the worst.

The fact that so many people use it, as a matter of course, is further evidence of the fragility and complexity of maintaining Python tooling and codebases in general. The fragility of Python packaging is how we arrived at the current status quo of needing a CI/CD setup for hello-world.py. My statically-linked Fortran executables? I could keep copying around the same binaries until people switched architecture.


hello-world.py works out of the box!

It was true long before 3 was even started. But assume as you please. Enjoy your python, I'm not taking it from you.

That's what I mean, they haven't had any massive break in backwards compatibility for a while. I did get annoyed by it before and attempted to create my own language/platform. If need be, I have a basic setup of imgui/chibi/c-ffi that I could expand into something useful for scripting that isn't likely to break, but honestly it's such a massive amount of work compared to a conda env that I don't mind sticking with Python for now. There's also micromamba, which is a lot faster for the solver.

Is this not against the Ghidra EULA?

What EULA? It looks like https://github.com/NationalSecurityAgency/ghidra says it's Apache 2.0

> Is this not against the Ghidra EULA?

He's not trying to reverse engineer a serial or key file. It's being used for private use. He's not making $$$$ at SEGA's loss. It's not going to destroy SEGA's reputation.

And finally, they are a hacker, so the dopamine hit from being curious will be a big payoff.


Even if he was, then what? A quick glance at the repo does not show any kind of End-User License, and even if it had one, what's the NSA going to do? Revoke his license? Sue him?

In the real world, no one cares, unless they're on the receiving end of a lawsuit.

citation needed


