A Great Old-Timey Game-Programming Hack (2013)

dx211 · on Nov 3, 2015

Way back in the day (mid high-school), I wrote a Tetris game in Pascal and Z80 assembly for the TRS-80. At some point I realized that since what was displayed on the screen was just a representation of an area of memory, I could keep track of the "cup" and all of the pieces that had already fallen into it directly in video memory instead of variables. I also optimized the code for drawing the "cup" so all three sides of it would get rendered in one shot in a single loop. The first time I ran it, a bug caused half of the bottom not to get drawn, and the first Tetris piece dropped through the hole and sliced straight into whatever vital memory came after the video area, immediately crashing the computer.
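A rough sketch of that idea in Python, not the original Pascal/Z80 code: the screen buffer is the only copy of the game state, and the collision test reads it directly. The screen size, character codes, and function names here are all assumptions made for illustration.

```python
WIDTH, HEIGHT = 12, 8                    # hypothetical screen size in character cells
EMPTY, WALL, BLOCK = 0x20, 0x23, 0x4F    # assumed character codes: space, '#', 'O'

def draw_cup(screen):
    """Draw the left wall, right wall, and floor of the cup in a single loop."""
    for i in range(max(WIDTH, HEIGHT)):
        if i < HEIGHT:
            screen[i * WIDTH] = WALL                  # left wall
            screen[i * WIDTH + WIDTH - 1] = WALL      # right wall
        if i < WIDTH:
            screen[(HEIGHT - 1) * WIDTH + i] = WALL   # floor

def can_move_down(screen, x, y):
    """Collision test reads the screen itself; there is no separate board array."""
    below = (y + 1) * WIDTH + x
    # The bounds check is a safety net for this sketch only. Without it, a
    # gap in the floor lets the piece keep falling past the end of the buffer.
    return below < len(screen) and screen[below] == EMPTY

def drop(screen, x):
    """Drop a one-cell piece in column x and return the row where it lands."""
    y = 0
    while can_move_down(screen, x, y):
        y += 1
    screen[y * WIDTH + x] = BLOCK
    return y

screen = bytearray([EMPTY] * (WIDTH * HEIGHT))
draw_cup(screen)
landing_row = drop(screen, 5)   # comes to rest on the row just above the floor
```

On the real machine there was nothing like the `below < len(screen)` check, which is why a half-drawn floor let the piece write into whatever followed video memory.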

PeterisP · on Nov 3, 2015

Reminds me of the concept of 'compiled bitmaps', where instead of having a function that reads an arbitrary image and writes it to the screen buffer (like the one in this article), you'd have a function that writes a single specific hardcoded image to the screen buffer. That could be done in fewer cycles than if you needed to read the data from somewhere.

The machine code for such functions can be generated from any image in a straightforward way, with all kinds of useful optimizations (e.g. simply skipping transparent pixels instead of checking transparency every single time), but it comes with a space tradeoff, as the code is noticeably larger than the source image.
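As a loose illustration of the generation step, here is a sketch that emits Python source instead of machine code. The `compile_sprite` name, the choice of 0 as the transparent value, and the flat `screen`/`stride` layout are assumptions for the example; a real compiled bitmap would be a run of store instructions with immediate operands.

```python
TRANSPARENT = 0  # assumed: pixel value 0 means "leave the background alone"

def compile_sprite(pixels, name="draw_sprite"):
    """Return Python source for a function that draws exactly this sprite."""
    lines = [
        f"def {name}(screen, stride, x, y):",
        "    base = y * stride + x",
    ]
    for row, row_pixels in enumerate(pixels):
        for col, value in enumerate(row_pixels):
            if value != TRANSPARENT:
                # Transparency is decided once, here, at generation time.
                # The generated function contains no test for it at all.
                lines.append(
                    f"    screen[base + {row} * stride + {col}] = {value}"
                )
    return "\n".join(lines) + "\n"

sprite = [
    [0, 7, 0],
    [7, 7, 7],
    [0, 7, 0],
]

source = compile_sprite(sprite)
namespace = {}
exec(source, namespace)            # "assemble" the generated code
draw_sprite = namespace["draw_sprite"]

screen = bytearray(8 * 8)          # 8x8 screen, one byte per pixel
draw_sprite(screen, 8, 2, 1)       # draw with the top-left corner at (2, 1)
```

The generated source has one line per opaque pixel, which is the space tradeoff mentioned above: the code grows with the image, but the drawing routine has no loop, no data reads, and no transparency branch.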

T-hawk · on Nov 3, 2015

So the core of this trick is to point the stack pointer into video memory, because pull/push instructions can deliver faster throughput than traditional load/stores.

The Atari 2600 has hardware designed to take advantage of such a trick. It has some 1-bit graphical objects that are turned on or off by writing to a certain register. The hot bit of that register is located at bit 1.

Why that, why not the more obvious low or high bit? Because bit 1 corresponds to the location of the 6502's zero flag in the flags register. So you can correctly turn the object on or off with literally just a compare on its coordinate then a push-flags instruction. You'll end up pushing a flags byte with bit 1 set or clear, which enables or disables the object. No branches or conditionals, and the code is even time-invariant, a helpful property in the tightly cycle-counted world of this machine.

Finally, the video chip registers are laid out with several such objects in succession, so you can continue pushing into several of them without resetting the stack pointer. (This machine has no interrupts, so hijacking the stack pointer into video memory is perfectly safe as long as you never issue a call or other push-pop instruction.)
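A small Python model of the compare-then-push idea may make the mechanics easier to follow. It is a sketch under assumptions, not 2600 code: the `Machine` class and function names are invented, memory is a flat 256-byte array, and the starting address 0x1F is used on the understanding that the three enable registers sit at 0x1D through 0x1F.

```python
ZERO_FLAG = 0x02     # bit 1 of the 6502 status register
ENABLE_BIT = 0x02    # bit 1 of an object-enable register

def compare_flags(a, operand):
    """Model the status byte that a push-flags would store after a compare."""
    diff = (a - operand) & 0xFF
    status = 0x30                 # bits 4 and 5 are set in the pushed byte
    if a >= operand:
        status |= 0x01            # carry
    if diff == 0:
        status |= ZERO_FLAG       # equal: this is the bit the hardware looks at
    if diff & 0x80:
        status |= 0x80            # negative
    return status

class Machine:
    """Hypothetical stand-in: a stack pointer aimed at the register block."""
    def __init__(self, sp):
        self.memory = bytearray(256)
        self.sp = sp

    def push(self, value):
        self.memory[self.sp] = value
        self.sp = (self.sp - 1) & 0xFF   # pushes walk downward through memory

def enable_objects(scanline, object_rows, first_register=0x1F):
    """For each object: compare its row with the scanline, then push flags."""
    machine = Machine(first_register)
    for row in object_rows:
        machine.push(compare_flags(scanline, row))
    return machine.memory

registers = enable_objects(40, [40, 12, 40])
first_on = bool(registers[0x1F] & ENABLE_BIT)    # row matches the scanline
second_on = bool(registers[0x1E] & ENABLE_BIT)   # row does not match
```

Every object takes the same two steps whether it ends up enabled or not, which is where the constant timing comes from.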

The machine also aligns some of its read registers the same way. The joystick button is mapped to bit 7 of its input register, so you can read and then immediately branch on the sign flag without a compare instruction.

dang · on Nov 3, 2015

Discussed at the time: https://news.ycombinator.com/item?id=6913467.

kilport · on Nov 3, 2015

I never click on articles with titles like "classic game programming hack", because it's invariably another discovery of that Quake inverse square root. The old chestnut that hypnotizes anyone who has never read HAKMEM or "Hacker's Delight".

Happy to be wrong for once ;-)

iconjack · on Nov 3, 2015

And now we know why the Windows bitmap format stores the scanlines from bottom to top.

paraknight · on Nov 4, 2015

Terry Davis of TempleOS fame commented on this!

guard-of-terra · on Nov 3, 2015

Were they the first people who were making a platformer and had to redraw the screen that way? If not, why not start with whatever the other guys did?

PeterisP · on Nov 4, 2015

1. The specific approaches required were different for each particular CPU, and that time was very heterogeneous, with lots of very different machines in use. For example, the article involves two people; the developed optimization is intended for one of them but would not even work on the author's own slightly older CPU.

2. Information exchange was much less efficient than now. They literally would have no way of knowing what the other guys did unless each new developer was ready to reverse engineer their code themselves, which requires a significant investment of time that most likely exceeds the time they would be investing in the actual platformer. Nowadays a single person can reverse engineer the technique and everyone else can just use it, but back then it simply wouldn't be published and you were on your own.

The tools available were crude; disassembly or easy finding of the 'hot' inner loops just wasn't available, and simply attaching a debugger would often result in the system not working, as the game code would often rely on specific cycle timing, interrupts not happening (e.g. the article mentions that any interrupt would corrupt their data because of how they abuse the stack pointer), etc. Reverse engineering could and did happen, but it wasn't that feasible for two boys in a basement at the time.

Furthermore, 'the pros' at the time would deploy their code in firmware (arcade machines or cartridges), where it was even less accessible to hobbyists. The only real way of spreading such knowledge was for some of those early professionals to publish books that, alongside basic techniques, also listed the optimization tips and tricks they knew.

blt · on Nov 3, 2015

Because there was no Stack Overflow in the 80s, and game programmers usually enjoy the challenge of making stuff faster.

guard-of-terra · on Nov 3, 2015

You can just take apart other games already out there.

Finding the loop that takes 80% of the CPU should be simple enough, and it can't be big.

You might spend weeks making stuff faster and arrive at an inferior or wrong solution. Better to ship it weeks earlier and enjoy lying on a beach in the sun.