
I've modified Ghidra in order to unlink pieces of an executable back into relocatable object files.

To keep things simple: source code files are compiled into object files, which are linked into an executable. Object files have sections (named arrays of bytes), symbols (either defined as an offset within a section or undefined) and relocations (requests to patch an offset within a section with the final address of a symbol), while executable files only have sections. The linker takes all the object files, lays out the sections in memory, applies the relocations and writes out an executable file without the symbols or relocations.
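A minimal sketch of that linking step, in Python rather than a real toolchain. It assumes one section per object file and a single, simplified relocation kind (store the symbol's absolute address as a 32-bit little-endian word, with no addend); the `ObjectFile` class, the `link` function and the file and symbol names are hypothetical and only serve the illustration.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Toy relocatable object: one section plus its symbols and relocations."""
    name: str
    section: bytearray                               # raw section bytes
    symbols: dict = field(default_factory=dict)      # defined symbol -> offset in section
    relocations: list = field(default_factory=list)  # (offset in section, symbol name)


def link(objects, base_address):
    """Lay out sections back to back, resolve symbols, apply relocations."""
    # 1. Layout: give each object's section a final address.
    section_address = {}
    address = base_address
    for obj in objects:
        section_address[obj.name] = address
        address += len(obj.section)

    # 2. Symbol resolution: final address = section address + offset.
    symbol_address = {}
    for obj in objects:
        for symbol, offset in obj.symbols.items():
            symbol_address[symbol] = section_address[obj.name] + offset

    # 3. Relocation: patch each requested word with the symbol's address.
    image = bytearray()
    for obj in objects:
        data = bytearray(obj.section)
        for offset, symbol in obj.relocations:
            struct.pack_into("<I", data, offset, symbol_address[symbol])
        image += data

    # Only bytes come out: the symbols and relocations are discarded.
    return bytes(image)


# main.o refers to "counter", which is defined in data.o.
main_o = ObjectFile("main.o", bytearray(8), {"main": 0}, [(4, "counter")])
data_o = ObjectFile("data.o", bytearray(4), {"counter": 0})
image = link([main_o, data_o], base_address=0x80010000)
print(image.hex())
```

The returned image contains the patched address but no record of where it was patched or which symbol it referred to. That lost information is what unlinking has to recreate.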

With Ghidra I can reverse-engineer an executable and recreate symbols, data types and references between symbols. Then, with my modifications I can recreate relocations with that information and, once a range of addresses has been fully processed, I can select it and export it as a relocatable ELF object file.
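As a rough illustration of recreating a relocation from a recovered reference, here is a sketch in Python for one case only: a MIPS `j`/`jal` instruction, which corresponds to the ELF relocation type `R_MIPS_26`. It is not code from the fork; the `references` and `symbols` dictionaries are hypothetical stand-ins for what a Ghidra database would provide, and the symbol name `helper` is made up.

```python
import struct

R_MIPS_26 = 4  # ELF relocation type used for MIPS j/jal targets


def unlink_jumps(code, base, references, symbols):
    """Turn resolved j/jal targets in `code` back into relocations.

    code:       bytes of the selected address range, starting at `base`
    references: {instruction address: target address} (assumed input)
    symbols:    {address: symbol name} (assumed input)
    Returns (patched bytes, [(offset, relocation type, symbol name)]).
    """
    data = bytearray(code)
    relocations = []
    for source, target in sorted(references.items()):
        offset = source - base
        (word,) = struct.unpack_from("<I", data, offset)
        if word >> 26 not in (2, 3):  # opcodes: j = 2, jal = 3
            continue
        # Sanity check: the target encoded in the instruction must match
        # the reference recovered during reverse-engineering.
        encoded = ((source + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
        if encoded != target:
            raise ValueError(f"reference mismatch at {source:#x}")
        # Clear the 26-bit target field so the in-place addend is zero,
        # then record a relocation against the target's symbol.
        struct.pack_into("<I", data, offset, word & 0xFC000000)
        relocations.append((offset, R_MIPS_26, symbols[target]))
    return bytes(data), relocations


# "jal 0x80010400" followed by a nop, located at 0x80010000.
code = struct.pack("<II", 0x0C004100, 0x00000000)
patched, relocs = unlink_jumps(
    code,
    base=0x80010000,
    references={0x80010000: 0x80010400},
    symbols={0x80010400: "helper"},
)
print(patched.hex(), relocs)
```

The patched bytes no longer depend on where `helper` ended up in the original executable, so a linker is free to place it anywhere.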

Why? This allows me to extract parts of an executable as object files and reuse them by linking them with my own source code; I don't need to fully reverse-engineer these extracted parts, I basically just have to identify every relocation that was originally in that part. I can also divide and conquer my way to decompiling an executable by splitting it into multiple object files and recreating its source code one object file at a time, like the Ship of Theseus.

So far it works with what I've tested it with and I've been meaning to write a series of articles to explain that process in detail, but writing quality technical articles with illustrations on a topic this esoteric is very hard.

  - My Ghidra fork: https://github.com/boricj/ghidra/tree/feature/elfrelocatebleobjectexporter
  - My initial prototype in Jython (has a readme): https://github.com/boricj/ghidra-unlinker

Note: this works only with 32-bit MIPS, little endian, statically-linked executables. It can be made to work with other architectures by writing a relocation synthesizer for it, but so far I only care about decompiling PlayStation 1 games.
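One reason a relocation synthesizer has to be written per architecture: on 32-bit MIPS, a full address is commonly loaded by a pair of instructions (for example `lui` followed by `addiu`), which matches the `R_MIPS_HI16`/`R_MIPS_LO16` relocation pair, so both halves have to be recognized together. The sketch below only shows the arithmetic behind such a pair; it is an illustration in Python, not code from the fork, and the function names are hypothetical.

```python
def split_hi_lo(address):
    """Split a 32-bit address into the immediates of a lui/addiu pair."""
    lo = address & 0xFFFF
    # addiu sign-extends its immediate, so the high half is rounded up
    # whenever the low half is 0x8000 or more.
    hi = ((address + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def join_hi_lo(hi, lo):
    """Recover the address that a lui/addiu pair loads."""
    signed_lo = lo - 0x10000 if lo & 0x8000 else lo
    return ((hi << 16) + signed_lo) & 0xFFFFFFFF


hi, lo = split_hi_lo(0x8001A000)
print(hex(hi), hex(lo), hex(join_hi_lo(hi, lo)))
```

Because of the sign extension, the high half alone is not enough to tell which address was meant. The low half has to be found as well before the pair can be turned back into relocations.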



Amazing. Do you have any intention of opening a merge request to get this into Ghidra? Or maybe in the way of a plugin?


I tried to upstream some of the refactorings/modifications needed to support this, but they were rejected by upstream [1]. I don't blame the Ghidra project for this decision; my modifications are fairly intrusive (modifying the relocation table after the initial load, extensive refactoring of the ELF support code...) and my workflow is essentially unproven in public.

By that I mean I have no documentation, no series of technical articles describing this process and no public, non-trivial project to demonstrate it in real life. I do have a decompilation project that uses this successfully [2], but it's currently private and nowhere near finished.

Also, I only wrote a relocation synthesizer for statically-linked, 32-bit, little endian MIPS ELF. That's a fairly obscure platform; I'd expect most people to care about mainstream instruction sets like x86_64 or ARM64.

If you can suggest a forum where people would be interested in this, I can drop a message there and answer more in-depth questions if you want. So far I've worked on this all on my own and I'm kinda out of the loop from the rest of the reverse-engineering community.

[1] https://github.com/NationalSecurityAgency/ghidra/pull/5010#i...

[2] https://news.ycombinator.com/item?id=35739949


That might make LGPL static linking more legally feasible for a lot of programs too!


Absolutely incredible.


Thanks!

If you want to take a look at the source code, here are some pointers:

  - The relocation synthesizer for MIPS: https://github.com/boricj/ghidra/blob/feature/elfrelocatebleobjectexporter/Ghidra/Processors/MIPS/src/main/java/ghidra/app/delinker/MipsElfRelocationTableSynthesizer.java
  - The Ghidra analyzer that leverages this synthesizer: https://github.com/boricj/ghidra/blob/feature/elfrelocatebleobjectexporter/Ghidra/Features/Delinker/src/main/java/ghidra/app/analyzers/RelocationTableSynthesizerAnalyzer.java
  - The classes that implement the ELF object exporter: https://github.com/boricj/ghidra/tree/feature/elfrelocatebleobjectexporter/Ghidra/Features/Base/src/main/java/ghidra/app/util/exporter/elf
  - The Ghidra exporter for ELF object files: https://github.com/boricj/ghidra/blob/feature/elfrelocatebleobjectexporter/Ghidra/Features/Base/src/main/java/ghidra/app/util/exporter/ElfRelocatableObjectExporter.java


Thank you! I’m fascinated by what must have led you to develop this knowledge.


I got inspired by various decompilation projects of old video games and decided to do one myself. I specifically chose "Tenchu: Stealth Assassins", a game for the PlayStation.

I haven't asked around, but I assumed nobody else out there had both the skills for reverse-engineering video games in general and motivation to work on this game in particular. I started reverse-engineering the game with Ghidra and quickly realized that "this game's code is kind of held together with glue and duct tape" (quoting a speedrunner of this game). It's quite the understatement: the code's a complete tangled mess.

I realized that with my current tooling and knowledge there was no way I could hope to complete this decompilation project by myself. I wanted to divide and conquer the problem into smaller, reasonably-sized pieces, but I just have one big executable and I can't just split it into pieces... or can I?

So I tried to innovate my way out of this mess. Ironically, perfecting the unlinking process and making it usable in practice has taken a long time, but it was intellectually rewarding and progress was tangible, so I did not lose motivation along the way.

As for the reverse-engineering of the game itself, my biggest achievement so far is managing to unlink the archive code from the game into a relocatable object file and writing a utility that leverages it to extract files from the game data archive. That sounds complicated, but with my tooling I just need to identify and annotate about 30 functions and global variables used in that part of the program to be able to export it, independently of the rest of the program. Then it's just a matter of writing some C glue code, compiling it to a Linux MIPS program and using QEMU user mode emulation to run the utility, without ever having rewritten that archive code in C or figuring out how it actually works.


> As for the reverse-engineering of the game itself, my biggest achievement so far is managing to unlink the archive code from the game into a relocatable object file and writing a utility that leverages it to extract files from the game data archive. That sounds complicated, but with my tooling I just need to identify and annotate about 30 functions and global variables used in that part of the program to be able to export it, independently of the rest of the program. Then it's just a matter of writing some C glue code, compiling it to a Linux MIPS program and using QEMU user mode emulation to run the utility, without ever having rewritten that archive code in C or figuring out how it actually works.

I figured you’d have to be exceptionally proud of this. I don’t find this specific yet extremely useful skill to be common among reverse engineers.

Though you’d wish it was!



