
Pokémon Crystal disassembled source code - beltex
https://github.com/kanzure/pokecrystal
======
minimaxir
Two interesting fun facts about Pokemon Crystal:

1) Pokemon Gold/Silver, the prior versions of Crystal, were programmed by
_four_ people.
([http://en.wikipedia.org/wiki/Pokémon_Gold_and_Silver#Develop...](http://en.wikipedia.org/wiki/Pokémon_Gold_and_Silver#Development_and_release))

2) Crystal is 1MB in size. Yes, one megabyte.

~~~
ginko
> Crystal is 1MB in size. Yes, one megabyte.

Is this notable because it's little or because it's a lot?

Zelda III for the SNES was 1MB. Pokemon Red and Blue were 512KB, which is also
quite a lot for a Gameboy game.

~~~
jrockway
Considering busybox (statically-linked ls, etc.) is 2MB, 1MB is pretty
impressive.

~~~
pdw
There was an interesting talk on this at Linux.conf.au a couple of years ago,
trying to figure out why modern binaries are so large.
[http://www.youtube.com/watch?v=Nbv9L-WIu0s](http://www.youtube.com/watch?v=Nbv9L-WIu0s)

> Bloat: How and Why UNIX Grew Up (and Out) - Rusty Russell, Matt Evans

> The 'ls' binary on the original release of Unix (version 6) was 4920 bytes
> long. Thirty six years later, 'ls' on Ubuntu is 105776 bytes. Is this the
> laziness of modern coders? Increasing features? Does 'cat' really now do 313
> times more stuff, or is there something else going on?

~~~
nitrogen
OT: I find it slightly disturbing to read people's Google+ conversations that
were imported to extend the YouTube thread. I hope YouTube/Google isn't really
buying Twitch, despite the return that would mean for the Twitch investors.

------
aculver
Unlike a disassembly of code that is written in a higher-level language like
C, these old NES and Gameboy games were written in the assembly language
you're seeing disassembled. Obviously there are no comments and labels, but
the actual _logic_ and the intent of the original developer is clearly
communicated.

This allows for a really special type of awesome when you're working on fan
translations and ROM hacks. You actually have the opportunity to analyze the
work of the developers you idolized when you were younger, and celebrate
their clever hacks or curse them for their spaghetti code, 15 to 25 years
later. Furthermore, you can contribute your own clever hacks to the code base.

~~~
kanzure
> Gameboy games were written in the assembly language you're seeing
> disassembled. Obviously there are no comments and labels,

These look like labels and comments to me:

[https://github.com/kanzure/pokecrystal/blob/master/engine/ti...](https://github.com/kanzure/pokecrystal/blob/master/engine/title.asm)

[https://github.com/kanzure/pokecrystal/blob/master/battle/ai...](https://github.com/kanzure/pokecrystal/blob/master/battle/ai/scoring.asm)

[https://github.com/kanzure/pokecrystal/blob/master/home/deco...](https://github.com/kanzure/pokecrystal/blob/master/home/decompress.asm)

[https://github.com/kanzure/pokecrystal/blob/master/battle/hi...](https://github.com/kanzure/pokecrystal/blob/master/battle/hidden_power.asm)

~~~
ANTSANTS
They meant "obviously the comments and labels of the original programmers are
not included."

------
Ideka
This is certainly a thing of beauty. See, for instance, the way the trainers
are defined:

[https://github.com/kanzure/pokecrystal/blob/master/trainers/...](https://github.com/kanzure/pokecrystal/blob/master/trainers/trainers.asm)

It almost looks like a high-level language.

~~~
monocasa
This is a pretty common technique in video games, generally referred to as
"data driven design". Basically, you put as many of the runtime decisions as
possible into data values that designers can tweak without having to ask an
engineer to do it each time.

~~~
ionforce
Final Fantasy for the NES seems to be designed the same way. A lot of the
spells are actually "call function A with strength N, call function B with
strength M and flag X". And with a large enough vocabulary of functions and
flags, you get fire spells, healing spells, instant death spells, protection
spells, etc.
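
To make the idea concrete, here is a minimal sketch of a spell table in that
style. The effect functions, spell names, and numbers are all invented for
illustration; none of them come from the Final Fantasy or Pokémon code.

```python
# Hypothetical effect primitives: each takes a target, a strength, and flags.
def damage(target, strength, flags):
    target["hp"] = max(0, target["hp"] - strength)

def heal(target, strength, flags):
    target["hp"] = min(target["max_hp"], target["hp"] + strength)

def set_status(target, strength, flags):
    target["status"] |= flags

EFFECTS = {"damage": damage, "heal": heal, "status": set_status}

# The spell list is pure data: (effect name, strength, flags) per step.
# A designer can add or retune spells here without touching the code above.
SPELLS = {
    "FIRE": [("damage", 20, set())],
    "CURE": [("heal", 16, set())],
    "VENOM": [("damage", 5, set()), ("status", 0, {"poison"})],
}

def cast(spell_name, target):
    """Apply every step of the named spell to the target, in order."""
    for effect_name, strength, flags in SPELLS[spell_name]:
        EFFECTS[effect_name](target, strength, flags)
    return target
```

Adding a new spell means adding one row to `SPELLS`; the engine code stays
untouched.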

------
kanzure
Repo owner here. Yes, it's a disassembly, but also it compiles back into the
original ROM if you follow the README (it's not a "dump and flee").

~~~
serf
Would you care to comment on what tools are popular for disassembling these
ROMs? I think it's a pretty neat hobby. Is there anywhere you'd suggest
reading more about it?

~~~
beltex
From looking at the docs

Assembler used:
[https://github.com/bentley/rgbds](https://github.com/bentley/rgbds)

RE tools used:
[https://github.com/kanzure/pokemon-reverse-engineering-tools](https://github.com/kanzure/pokemon-reverse-engineering-tools)

~~~
serf
totally missed that, thanks.

------
LazerBear
This made me wonder if it's possible to automatically reverse engineer a small
binary file into human readable (and understandable) source code. Assuming you
know the language and compiler used (and all of its quirks and optimizations),
and considering that human written programs aren't so random and their
patterns are most likely predictable, I think it should be possible though not
at all trivial. Are there any projects attempting this?

~~~
dmm
Labels and comments go a long way to making an assembly project readable. I
don't know how an automatic tool could interpret the human intention behind a
label.

~~~
userbinator
It's not that difficult, but I think the main obstacle is gathering and
representing the collection of knowledge in a useful form.

There was a disassembler called Sourcer that would annotate code with a set of
predetermined comments based on its knowledge of the PC hardware. For example
a sequence of instructions that enabled the interrupt controller by setting a
specific bit in its register would be identified and get the comment "enable
interrupt controller". I seem to remember IDA can do the same thing, although
it's been a while since I last used it.
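
A rough sketch of that kind of rule-based annotation, assuming the
instructions have already been decoded into normalized text. The rule table
is a tiny illustrative stand-in, not Sourcer's or IDA's actual database.

```python
# Each rule pairs a sequence of normalized instructions with a canned comment.
RULES = [
    (("mov al, 20h", "out 20h, al"),
     "send end-of-interrupt to the master 8259 PIC"),
    (("cli",), "disable maskable interrupts"),
    (("sti",), "enable maskable interrupts"),
]

def annotate(instructions):
    """Return (instruction, comment) pairs.

    When a rule matches, its comment is attached to the first instruction
    of the matched sequence; all other lines get an empty comment.
    """
    out = []
    i = 0
    while i < len(instructions):
        for pattern, comment in RULES:
            n = len(pattern)
            if tuple(instructions[i:i + n]) == pattern:
                out.append((instructions[i], comment))
                out.extend((ins, "") for ins in instructions[i + 1:i + n])
                i += n
                break
        else:
            # No rule matched at this position.
            out.append((instructions[i], ""))
            i += 1
    return out
```

The matching itself is trivial; the hard part, as noted above, is collecting
enough hardware knowledge to fill the rule table.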

------
niedbalski
This is fairly nonlegal, right? ...

~~~
Vespasian
Too sad that so much work is likely to vanish within a few days. But they
probably knew what was coming.

~~~
ANTSANTS
Many famous Nintendo games have been publicly disassembled before, and
Nintendo is either unaware or turns a blind eye. The only different thing
about this disassembly is that it is hosted on GitHub, where it is much more
likely to be seen by people that aren't already interested in ROM hacking and
retroprogramming.

~~~
kristofferR
Thankfully Github doesn't do preemptive DMCA removals, so it looks like it's
safe for the time being.

------
Sir_Cmpwn
Relevant:
[https://bitbucket.org/iimarckus/pokered](https://bitbucket.org/iimarckus/pokered)

~~~
sanqui
That link is, alas, outdated. pokered is now located here:
[http://github.com/iimarckus/pokered](http://github.com/iimarckus/pokered)

~~~
Sir_Cmpwn
Thanks! I didn't know.

------
Rolpa
Well, it looks like I can finally catch'em all (after fifteen years!)

~~~
VikingCoder

        printf("You caught them all!\n");
    

I'm a winner!

------
serf
any clue what language the original development took place in?

~~~
kanzure
Originally the game was written in a variant of asm resembling z80.

~~~
serf
I knew that the Gameboys were Zilog machines, but I never really put any
thought into game development. I imagined they used a higher level language
during actual development and ran the code through something that would result
in the assembly for use in whatever. Pretty cool stuff.

~~~
ANTSANTS
General purpose high level languages[1] didn't become practical in commercial
game development[2] for even what we would consider to be "non-critical" code
until Doom, the PlayStation era for consoles, and the Game Boy Advance era for
handhelds. Pretty much everything produced for the systems before them was
done completely in assembly, and the practice remained relatively common for a
while afterward.

This is partially because C compilers of the time weren't that great,
partially because "everything is a critical section" when you're pushing the
limits of extremely limited hardware, and partially because C's virtual
machine model does not fit the average 8-bit CPU architecture at all, making
it pretty much impossible to this day to compile regular C code to (say) 6502
machine code that is anywhere close to optimal.[3]

[1] Ok, many games had custom bytecode interpreters, like SCUMM or the typical
RPG's textbox language, but that's not quite the same thing. They could have
used FORTH more (were there any games that used FORTH?), but I guess even in
the days of HP calculators, people hated RPN.

[2] I guess I'm ignoring the many games done in BASIC for the 8-bit micros,
but there's a clear difference in quality between these games and the average
NES action game.

[3] There is a C compiler for various 8-bit architectures called cc65, but it
chokes on anything resembling modern C. To get anywhere close to the
performance of even naively written assembly, you have to hold its hand with
machine-specific annotations (like "stick this variable in the zero page," or
"make every variable static because the machine has no stack-relative address
modes") to the point that it's easier to just write in assembly from the
start.

~~~
armada651
Actually Pokémon Crystal itself, the rom we were discussing, has an extensive
higher-level scripting language built-in for events, messages and movement. It
still looks very much like assembly, but it is built up of Pokémon commands
instead of machine code.

For example, this is the script to choose your starter pokemon:
[https://github.com/kanzure/pokecrystal/blob/master/maps/Elms...](https://github.com/kanzure/pokecrystal/blob/master/maps/ElmsLab.asm#L171)
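
As a rough illustration of how such an event script can be run by a small
interpreter loop, here is a sketch with made-up commands. The opcodes, flag
name, and script below are hypothetical and are not pokecrystal's real
command set.

```python
# Hypothetical opcodes for a tiny event-script interpreter.
OP_TEXT, OP_GIVE, OP_JUMP_IF_FLAG, OP_SET_FLAG, OP_END = range(5)

def run_script(script, state):
    """Execute a flat list of opcodes and operands against the game state."""
    pc = 0
    while True:
        op = script[pc]
        if op == OP_TEXT:
            state["log"].append(script[pc + 1])
            pc += 2
        elif op == OP_GIVE:
            state["party"].append(script[pc + 1])
            pc += 2
        elif op == OP_SET_FLAG:
            state["flags"].add(script[pc + 1])
            pc += 2
        elif op == OP_JUMP_IF_FLAG:
            flag, target = script[pc + 1], script[pc + 2]
            pc = target if flag in state["flags"] else pc + 3
        elif op == OP_END:
            return state
        else:
            raise ValueError(f"unknown opcode {op!r} at index {pc}")

# Give a starter once; on later visits, jump straight to the end (index 9).
STARTER_SCRIPT = [
    OP_JUMP_IF_FLAG, "got_starter", 9,
    OP_TEXT, "Take this one!",
    OP_GIVE, "STARTER",
    OP_SET_FLAG, "got_starter",
    OP_END,
]
```

Running `run_script(STARTER_SCRIPT, state)` twice on the same state hands out
the starter only the first time, because the flag check skips the rest of the
script on the second run.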

------
pdw
Wow, it looks like half a dozen people have been working on this for two
years...

------
jevinskie
I flagged this story because I don't like to see blatant copyright violations
on HN. Instead, I would have loved to see a tool that automatically
disassembles a user-provided ROM to the equivalent of this repo.

~~~
scrollaway
> Instead, I would have loved to see a tool that automatically disassembles a
> user-provided ROM to the equivalent of this repo.

Well ain't you fancy.

Look at the commit log. Please. This stuff can't be "automatically
disassembled" into the great looking codebase it's currently presented as.

Whatever though, I guess asking some people to think for more time than it
takes them to click a link is a bit much these days.

~~~
jevinskie
I did see the commit log. I believe the project could instead check in symbol
names, data structures, and comments that tag addresses in the ROM. You would
also want to group those symbols into logical translation units. The copyright
status on this repo may still be murky (are the comments and symbol names
derived works?) but much better than the current situation where the
copyrighted ROM can be rebuilt from the repo.

Edit: I believe you can do something similar with IDA database dumps. The
database doesn't contain the original executable image but instead contains a
log of the IDA commands applied to the imported image. Another user can import
their executable image and "replay" the IDA commands in the database.
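
A minimal sketch of what I mean, assuming a made-up annotation format: the
repo would ship only offsets, lengths, labels, and comments, and the listing
is produced from a ROM the user supplies. The field names and sample entries
are hypothetical and describe no real ROM.

```python
# Hypothetical annotation records: no ROM bytes are stored here.
ANNOTATIONS = [
    {"offset": 0x00, "length": 4, "label": "Header",
     "comment": "hypothetical header bytes"},
    {"offset": 0x04, "length": 2, "label": "Entry",
     "comment": "hypothetical entry point"},
]

def render(rom, annotations):
    """Combine user-supplied ROM bytes with annotations into a listing."""
    lines = []
    for a in sorted(annotations, key=lambda a: a["offset"]):
        chunk = rom[a["offset"]:a["offset"] + a["length"]]
        if len(chunk) != a["length"]:
            raise ValueError(f"ROM too short for label {a['label']}")
        data = ", ".join(f"${b:02x}" for b in chunk)
        lines.append(f"{a['label']}: ; {a['comment']}")
        lines.append(f"\tdb {data}")
    return "\n".join(lines)
```

This only emits raw `db` data lines; a real tool would also need the
annotations to say which ranges are code so they could be disassembled.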

~~~
ripter
Have you written a program that can do this?

These people put a lot of hard work into something people find interesting.
I'm glad this is on hacker news because I'm interested in how these games were
written and developed.

~~~
scrollaway
> Have you written a program that can do this?

Heavens no, it's much easier to complain and criticize other people's work.

I had a feeling "being useful" is something people like OP don't actually do,
and checking his online presence only proved me right. This is making me
abnormally furious.

