How to program an NES game in C (nesdoug.com)
381 points by muterad_murilax on Aug 13, 2017 | 69 comments

IMHO a 6502 is too limited to be effectively programmed in C; even this part of the article gives all the limitations: https://nesdoug.com/2015/11/15/2-how-cc65-works/

With this important note: "clean unaltered C code will compile into very slow code that takes up too much of the limited memory space."

In other words, "C" written for such a CPU will be in a vastly different style from more "normal" C, so that it might be better to just use Asm.

Then again, 8051s, PICs, and other extremely constrained MCUs have been targeted by "C" (subsets), so it's definitely possible if a bit awkward. Personally, I think something like an 8080 would be the minimum to do relatively comfortable C programming in, with a Z80 being far more preferable.

Does cc65 turn structs into parallel arrays?

If you have something like

    struct Monster {
      unsigned char hitPoints;
      unsigned char damage;
      unsigned char shieldLevel;
      char* name;
    };

    static struct Monster s_monsters[] = {
      { 5,   1, 0, "orc", },
      { 50, 10, 5, "dragon", },
      { 10,  3, 1, "goblin", },
    };

that's a no-no on 6502. You generally need to store those in parallel byte arrays, including the string pointer's upper and lower bytes, each in separate arrays. That was usually done with assembler macros or compile-time tools, but it was important not to have to generate addresses that have to be manually assembled on the 6502. Similarly, there's no multiply instruction, so math like sizeof(Monster) * index is expensive.
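For what it's worth, the parallel-array layout can be written by hand in plain C. The following is only a sketch: the array and function names are made up for illustration, and the name pointers are kept as ordinary pointers, since splitting them into separate low-byte and high-byte tables is normally left to the assembler or a build tool.

```c
/* Hypothetical monster table stored as parallel byte arrays
   (one array per field) instead of an array of structs.
   All names here are illustrative, not taken from cc65 or the article. */
enum { MONSTER_COUNT = 3 };

static const unsigned char monster_hit_points[MONSTER_COUNT]   = {  5, 50, 10 };
static const unsigned char monster_damage[MONSTER_COUNT]       = {  1, 10,  3 };
static const unsigned char monster_shield_level[MONSTER_COUNT] = {  0,  5,  1 };

/* Kept as whole pointers; a 6502 build would usually split these into
   low-byte and high-byte tables with assembler macros or a tool. */
static const char *const monster_name[MONSTER_COUNT] = {
    "orc", "dragon", "goblin"
};

/* An 8-bit index selects the same monster in every table, so no
   sizeof(struct) * index multiplication is needed. */
static const char *monster_get_name(unsigned char index)
{
    return monster_name[index];
}

/* Sum one field across all monsters by walking a single byte array. */
static unsigned int monster_total_hit_points(void)
{
    unsigned int total = 0;
    unsigned char i;
    for (i = 0; i < MONSTER_COUNT; ++i) {
        total += monster_hit_points[i];
    }
    return total;
}
```

Each field lookup is a single indexed load from a fixed table, which is the access pattern the 6502's 8-bit index registers handle well.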

Didn't John Carmack use structures rather than parallel arrays even in Apple II assembly? Source: one of his old .plan files, reproduced verbatim in _Masters of Doom_

Edit: Found it online here: https://github.com/ESWAT/john-carmack-plan-archive/blob/mast...

I wrote a few lines about 6502 vs Z80 a couple of years ago: https://news.ycombinator.com/item?id=10766264 (parent of linked-to comment may also be of interest) - the discussion there was about relative efficiency of the two CPUs, but the addressing mode limitations that make one approach a bit inefficient on the 6502 are the same limitations that make it a bit inconvenient.

I wonder what Carmack had in mind.


P.S. In my comment there, I didn't mention the possibility of having the arrays interleaved. Then it looks like a struct in memory (byte 0 is X and byte 1 is DX, say), and you add the field offset to the base of the array. It's not uncommon for assemblers to have some syntax to make it easy to specify the layout of each struct, to simplify future changes, e.g.:

    X DS.B 1
    DX DS.B 1

    LDA #0
The disadvantages of this are that it reduces the number of accessible items (the X register is only 8 bits...); it makes moving from item to item more difficult (all you can do with the X register is increment or decrement it, but now you need to add arbitrary values to it); and it means there are invalid item indexes (e.g., in the example above, index 1 is not valid).

And the advantages... well, there aren't any, really, I don't think? Everything has to be done 1 byte at a time, so it makes no difference where the bytes are relative to one another... the code ends up the same anyway. (If anything, because you can do INX/DEX to get from item to item, it's actually slightly neater to have a separate table for each byte.) So you might as well do that.

cc65 has a whole page on how to deal with the fact that it's a single-pass compiler with hardly any optimization [1]. It's almost more like a macro assembler than an actual compiler.


Then again, I managed to compile the code for the crypto hash Blake2s with it, producing code that validated all of the (few) test vectors I threw at it, so whatever expectations one might have on "actual compilers", for me, cc65 does fulfil a lot of them.

I don't think it does, and I don't think it's possible to make a nearly standards compliant C compiler that does. A struct is a contiguous piece of memory, which makes things like memcpy and malloc generally possible.

I feel like handling ROM banking would be the most challenging part of coding in C. You have to swap portions of your program in and out constantly (unless you manage to fit it all in a single bank). It seems hard to abstract that efficiently from a high level language.

I heard there were C compilers for the Game Boy as well, but as far as I know they were never widely used.

Yeah, C generally assumes a 'linear' memory space, so memory banks don't really fit into it extremely well.

That said, with a little work you could likely coax the linker into doing a lot of the work for you. You would have to be a little careful though, as code in different banks could not effectively call each other directly, even though the compiler would likely not yell at you. But you could just organize your code so that the code for each bank is in its own section, and then tell the linker to link each of those sections at the address that it will be switched in at (likely the same address for most of them). It might just barf at you, but a linker specifically for systems that do memory banking should have a way to express this.

My impression is that the banking was more often used for graphics rather than logic.

It's used for both. Take for instance Super Mario Bros 3: different banks hold different data for the various enemies, and the same banks also contain the subroutines of logic for those enemies.

That's why, even if you hacked so that Goomba's death behavior should point to the same as that of a Drybones, it would fail as the subroutine for that behavior isn't in the same bank as the Goomba.
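To make the failure mode concrete, here is a small host-side simulation in C, not real NES code. Everything in it is hypothetical: `current_bank` stands in for a mapper's bank register, the function table stands in for code placed at the same address in different banks, and the return codes are arbitrary. It only illustrates why a call resolves to whatever bank is mapped in, and how a trampoline in fixed memory can work around that.

```c
/* Host-side simulation of banked code; nothing here touches hardware.
   All names and return codes are made up for illustration. */
typedef int (*bank_fn)(void);

enum { BANK_COUNT = 2, SLOTS_PER_BANK = 2 };
enum { SLOT_UPDATE = 0, SLOT_DIE = 1 };

/* Stands in for the mapper's bank-select register. */
static unsigned char current_bank = 0;

static int goomba_update(void)   { return 10; }
static int goomba_die(void)      { return 11; }
static int drybones_update(void) { return 20; }
static int drybones_die(void)    { return 21; }

/* Same slot number = same address in the switchable window,
   but different code depending on which bank is mapped in. */
static const bank_fn bank_table[BANK_COUNT][SLOTS_PER_BANK] = {
    { goomba_update,   goomba_die   },  /* bank 0 */
    { drybones_update, drybones_die },  /* bank 1 */
};

/* A plain call only ever sees the currently mapped bank. */
static int near_call(unsigned char slot)
{
    return bank_table[current_bank][slot]();
}

/* Trampoline, assumed to live in a fixed (never swapped) region:
   map the target bank, call, then restore the caller's bank. */
static int far_call(unsigned char bank, unsigned char slot)
{
    unsigned char saved = current_bank;
    int result;
    current_bank = bank;
    result = near_call(slot);
    current_bank = saved;
    return result;
}
```

With bank 0 mapped in, `near_call(SLOT_DIE)` can only ever reach the Goomba routine; reaching the Dry Bones routine takes `far_call(1, SLOT_DIE)`, which costs extra cycles for the switch and the restore.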

This approach can be used for code and resources. For simpler games it is easier to just not bank the code, but for more resource intensive games you can end up banking code and resources.

It's just linker magic and tagging sections on everything.

They had the same features up until at least Xbox 360; they're just called 'overlays' on non execute from ROM systems. It doesn't really take an esoteric compiler or environment.

I believe that cc65 has a good linker that handles this once it is set up for a platform/memory configuration. I don't know how efficient it is, but I bet that manually organizing code and data to minimize cross-page access will be more efficient.

On old (16-bit) x86 you used "far" pointers for cross-segment access. These were larger, and I assume slower, since they encoded both the within-segment address and the segment number.
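As a rough illustration of why far pointers were bigger: in 8086 real mode a far pointer carries a 16-bit segment plus a 16-bit offset, and the CPU forms the physical address as segment * 16 + offset. The struct and function names below are invented for this sketch and are not from any particular compiler.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit x86 real-mode far pointer:
   twice the size of a near pointer, which is offset-only. */
struct far_ptr {
    uint16_t offset;   /* address within the segment */
    uint16_t segment;  /* segment value, in units of 16 bytes */
};

static struct far_ptr make_far(uint16_t segment, uint16_t offset)
{
    struct far_ptr p;
    p.offset = offset;
    p.segment = segment;
    return p;
}

/* Real-mode address translation: segment shifted left by 4, plus offset. */
static uint32_t far_to_linear(struct far_ptr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}
```

For example, `far_to_linear(make_far(0xB800, 0x0000))` gives 0xB8000, the start of colour text-mode video memory on a PC.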

C programmers used to handle banks of EMS RAM, on the PC. Ditto with program overlays. I'd assume that NES ROM banks could be handled similarly.

Actually it was mostly Assembly code being called from C and Pascal, or via language extensions like virtual registers.

I've been reading the code for Ultima Underworld for a while. A lot of that is written in C, compiled in an early Borland C++ compiler. The C lib covers a few things; it's got some functions for manipulating EMS, and supports some fancy automatic overlay handling. So certainly the functions in the C library are written in assembly, wrapping calls to int67 and such.

And admittedly, it can be hard to see where the hand-coded assembly ends and the compiled C code begins. There are occasional stylistic clues, but I'm generally spending more time figuring out what something does and why it's written than I am which language each piece was coded in. Most of the file and memory handling seems to be done through calls to the C library in that game.

Thanks, TIL about memory banking!

> IMHO a 6502 is too limited to be effectively programmed in C

I've had some thoughts on this before and I think it boils down to the requirements. Beyond the obvious (the speed and memory requirements for your application):

If a 256 byte separate data stack is large enough, and particularly if you can put that all in zero page you can save many cycles. Then you can use the X index register as a stack pointer, benefiting from shorter and faster zero page indexed addressing for stack-relative loading and storing and using values on the stack directly for indirection using the indirect addressing modes without first copying stuff to zero page.

That's not to say that Z80 isn't a much better target for a C compiler. Among 16-bit 6800-likes, 6809 doesn't seem half bad, though.

From that page, "use ++g instead of g++ (it’s faster)"

I've seen people say this repeatedly when talking about "embedded" environments, but never understood why - won't it end up compiling to the same thing either way?

Depends on the compiler.

'a++' (postfix) and '++a' (prefix) have slightly different meanings in terms of return values. Postfix will return the value of 'a' before the increment, while prefix will return 'a' after the increment operation.

In a simple compiler, you would store the previous value of 'a' in the case of postfix in case you need to assign it. This results in a superfluous store instruction in most cases.

Of course, a smarter compiler could see the lack of assignment and throw away the store instruction during an optimization pass.
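Spelled out as ordinary functions, the difference looks roughly like this. It's only a sketch of the semantics, with made-up function names, not what cc65 or any other compiler literally emits:

```c
/* Models the expression g++ : yields the OLD value, so a copy is needed. */
static int post_increment(int *g)
{
    int old = *g;   /* the temporary a naive compiler always keeps */
    *g = *g + 1;
    return old;
}

/* Models the expression ++g : yields the NEW value, no copy needed. */
static int pre_increment(int *g)
{
    *g = *g + 1;
    return *g;
}
```

When the result is discarded, both leave the variable in the same state; the only thing that differs is whether the compiler bothers to notice that `old` is never used.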

I’ve used crappy compilers where there’s essentially no optimisation, and moderately complex expressions produce absurd code, so if you want reasonable output, you end up writing the assembly you want in the notation of the higher-level language, like:

    /* a = (b * 2) + 1 */

    a = b;    /* mov a, b */
    a <<= 1;  /* shl a, 1 */
    a |= 1;   /* or a, 1 */
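A quick way to convince yourself the step-by-step form is equivalent, assuming `a` and `b` are `unsigned char` (the original snippet doesn't say): after the shift the low bit is always zero, so OR-ing in 1 is the same as adding 1. The function names here are just for illustration.

```c
/* The expression as one would normally write it. */
static unsigned char expr_form(unsigned char b)
{
    return (unsigned char)((b * 2) + 1);
}

/* The same computation written one machine-like step at a time. */
static unsigned char step_form(unsigned char b)
{
    unsigned char a = b;  /* mov a, b */
    a <<= 1;              /* shl a, 1 : low bit is now 0 */
    a |= 1;               /* or a, 1  : same as adding 1 here */
    return a;
}
```

Both forms wrap around modulo 256 in the same way, so they agree for every possible 8-bit input.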

TI used to have an Assembler that was exactly like that, don't recall for which processor though.

In a moderately sophisticated compiler, yes they should be the same thing.

The difference in the two is that the latter has to produce the original value, presumably by keeping it in a temporary variable. A compiler which looks at all the code in a function when optimising, rather than just blindly emitting code for each individual operation, will see that the value is not needed, and delete the temporary.

I guess cc65 is a bit more basic. Compilers for more unusual embedded environments are often a bit more basic, hence the general advice for the embedded context.

I think you could probably fix the compiler for this trivial case in the time it takes you to write down the advice though.

I'd argue that regardless, you should write what you mean

If you mean "increment g" then write "++g"

If you mean "temp = g; increment g; return temp" then write "g++"

It doesn't really matter that the compiler may optimize out the "temp". If you're writing "g++" you're indicating you want the value before it's been incremented.

If you're not using the value you can't “want it”. There is only one sane way to interpret the following two expression statements (in C), and it is that they are equal.


Well, no, you can do stupid code like:

if (g++) {
    // do something
}

Which obviously will do something if g was non-zero before incrementing it by one. It's a really poor way of writing code, but I have seen it done.

They said 'if you're not using the value', and you've presented an example where they are using the value, so your criticism doesn't apply in the case they're talking about.

First of all, I don't agree that code like that is stupid regardless of circumstances. Any programmer worth his salt understands that condition perfectly well without having to think about it.

However, it's an altogether different situation to the one I wrote about, given that the value of the incrementing expression is specifically used in that case.

I think in C the temporary will be optimized away.

I keep meaning to test this in C++. I presume that it would for fundamental types, but wouldn't be able to for objects, because the side effects of that operation could be in another module. I use ++i for C++ iterators for that reason, and the habit tends to spill over to C.

> IMHO a 6502 is too limited to be effectively programmed in C; even this part of the article gives all the limitations: https://nesdoug.com/2015/11/15/2-how-cc65-works/

In my not so humble opinion, almost all of those are limitations of a compiler, not of the CPU. Then yes, on an 8-bit CPU, you'd better avoid larger words unless you really have to, or it will be slow.

Even an 8080 would be problematic, given near and far segments and the memory tricks we used to do to fit everything into the available memory.

Then it also depends on the IO and video architecture. The PC ones were a bit of a pain.

An 8080 has a linear 64K address space, like the 8085 and the Z80. Perhaps you were thinking of 8086?

In any case, 64K address space is not the real limiter with a 6502 --- it's other things, like the tiny number of registers, relative shortage of 16-bit addressing modes, and fixed(!) 256-byte stack which make it difficult to program in C. Even the 6800, with its 16-bit index and stack pointer registers, would be easier.

Yeah, I thought 8086 also had segments, did not bother to cross-check it.

Oops, another typo, I meant 8080.

I mixed it up with 8086.

Also AVR is nice for C programming; it even has GCC support.

Forth, then?

6502 FIG Forth kernel: https://pastie.se/3ec4854e

How did they actually make NES games? By that I mean, what types of computers were they using for creating NES games? Other 6502-based computers? Could they run the NES games on there? Or did they have to burn to a cart to test things every time?

How did they design graphics? Was it basically graph paper, which then they translated into sprites by hand?

Not a NES game, but HAL Laboratory used a Twin Famicom-based system to develop Kirby's Dreamland (GB) using just a trackball as an input device: http://sourcegaming.info/2017/04/19/kirbys-development-secre...

Here you can see some developer pictures. The computers are HP 64000. The light pen/sprite program looks very interesting.


Here's a neat video, where Miyamoto describes some of the processes. You can see near the beginning that the sprites were hand drawn on graph paper and translated onto the computer. https://www.youtube.com/watch?v=DLoRd6_a1CI

Alllllll that tooling had to be built in-house. Sprite editors, level editors, debug and test tools. Doing it well requires multiple people working across multiple games.

I guess smaller studios were left having to roll their own stuff, which set them at an even further disadvantage to the Capcoms and Konamis of the world.

IIRC an NES dev kit was mostly just the hardware and a manual, not even an assembler

> By that I mean, what types of computers were they using for creating NES games? Other 6502-based computers?

At the place I worked in the early '80s [1], which did some games for Mattel for a couple 6502-based systems (Atari 2600 and Commodore VIC-20), we used small PDP-11s running some DEC operating system (I don't remember which one), using in-house written 6502 cross development tools. Each developer had a PDP-11.

Same setup for developing for game systems we worked with that used processors other than the 6502, such as the GI 1610 that was used in the Mattel Intellivision and the next generation successor to the Intellivision we were designing for Mattel.

We're talking old school here--as in you started your work day by toggling in a short boot loader via the front panel switches to get your computer going.

[1] https://en.wikipedia.org/wiki/APh_Technological_Consulting

As I recall, Nintendo (eventually) had an official NES devkit which had a special RAM cartridge and was linked to a PC running MS-DOS with an assembler and other programs to make graphics and sound, but there were NES devkits from third parties as well.

I might be wrong but I seem to remember the Sharp X68000 being a preferred platform in the early days?


That has a release date years after the NES and seems to have been similar to and used as a development platform for 68k-based arcades. It's completely different hardware, much more powerful than a NES but too slow to sensibly emulate one.

A few years ago, some colleagues and I wrote a NES "demake" of Splatoon, in C, over the course of ~2 days. https://github.com/SplatooD/splatood

I definitely think our ability to do interesting things quickly (in terms of runtime) was hampered by the use of C over assembly, but it did allow us to get a functioning game done in a very short period of time.

Wanting to make games for the NES over 30 years ago was the reason I became interested in programming in the first place. If the author is lurking, thank you for this!

Yeah, making an NES game is on my bucket list. I'm very new at (complex) programming, but I think I might just do it in assembly. If you're going to run a marathon, why not actually do a triathlon?

In reality most of the complexity lies in how "close to the metal" it is, not so much in the language or the programming techniques per se. Basically, to program for an old console with limited hardware like the NES, you have to understand how the hardware works at a relatively low level to do something with it. There's no "drawSprite" function that you call and be done with; rather, you have to understand what the video processor can do, how you feed data to it, how you compose a sprite in a layer, etc.

So you learn how the different pieces of the hardware work, how you instruct them to do things, what their limitations are and so on. And once that clicks, it's not particularly more complex than writing modern code. Like programming today, it's less about typing and more about thinking about how all the dots connect together towards doing what you want to do. You do end up with a lot more code that does a lot less, since there are no abstractions and you do everything by hand. But otherwise it's something that anyone with aptitude for programming can pick up.

Might be easier (but just as much fun) to whet your palate by creating a pico-8 game first: https://www.lexaloffle.com/pico-8.php

I'd beg to differ. Interpreted Lua on a 4GHz PC with 4GB of RAM is not like compiled C on an 8-bit micro. AVR dev or Arduino could arguably be closer.

You should do a little more research into pico-8. Although it's true that they aren't the same, pico-8 has its own set of artificial, arbitrary execution cost and memory limitations that make it a lot different than if you were using Love2D or LuaJIT embedded in an SDL2 app. Many things that should be trivial on a PC with effectively infinite resources are hard or impossible to do at frame rate on Pico 8.

The fantasy console that Pico 8 presents is still way more powerful in most ways than an NES though.

remember that pico-8 is a very limited fantasy console and only has:

32KB cartridges


a 1MB RAM limitation for the lua VM

I'd start with https://en.m.wikipedia.org/wiki/CHIP-8 first. It won't take too long, and you'll be much better prepared for nes.

I've built some pretty user-friendly development tools targeting Chip8:


Your marathon/triathlon comparison, as someone who has done them, makes little sense.

You can also use Python: https://gutomaia.net/pyNES/

I'm going to reboot megaman. Again.


> All NES assembers are command line programs. What does that mean? It has no graphic user interface.

Who the hell is this written for?

I find a lot of these homebrew and ROM hacking documents are written for an enthusiast that might not necessarily have a background in software development.

At first I found it a little disorienting because the technical level of the writing would fluctuate, but at the same time I can appreciate it being accessible for someone who might not know much about programming.

There's also an assumption with most HN users (who have a heavy web dev or unix background) that commandline is required in order to get any sort of development done. This isn't really the case if you have more of a game development background, where people are often using Windows and GUIs to get most work done. Someone coming to this tutorial that is used to Unity and Visual Studio could be technical, have solid programming abilities, but rarely, if ever touch the command line. I've worked with a lot of people like this, so it's not a theoretical thing.

Agreed! Visual Studio has such a prominent place in game development, likely for its ubiquity and ease-of-debugging.

I used to dabble a lot in homebrew programming for the PSP (PlayStation Portable), and many people would be shocked how much of the entire toolchain was made by people with little if any formal programming education - yet it all worked and produced valid executables. I suspect a lot of it was done by other 15-year-old teenagers experimenting with low level programming for the first time.
