Hacker News new | past | comments | ask | show | jobs | submit login
A Great Old-Timey Game-Programming Hack (moertel.com)
571 points by acqq on Dec 16, 2013 | hide | past | web | favorite | 143 comments

This reminded me of a game programming hack I did back in highschool. I had just started a school course on Pascal and decided to code a small game of snake, just for fun. I knew very little about actual programming, I was a real novice. The game was very simple, it was running in a windows console (cmd) without any graphics, the actual assets were ASCII art. The grid of the game was represented with asterisks and the snake was dots with a smiley face (one of those weird ASCII symbols nobody knows why it's there). Every game update I would redraw the whole grid, snake and the comma that was used to output the food.

The problem was that this was terribly slow, it flickered like crazy and it was unplayable. I was very sad because my game was working but unplayable for anybody so I tried to engineer a way to make it stop flickering. The solution came when I found out about a couple of functions in pascal that let you clear a specific character in the console at a specific X,Y coordinate and write another character that that coordinate. What I ended up doing was keep track of all the changes in the game for each frame (snake movements, food position) and just re-draw only the portions of screen that had changed.

This was great, no more flickering and the game was playable. (Nobody really played it because nobody cared but I was really proud of it).

Found out years later that this approach is pretty much what Carmack did in his old games: Adaptive Tile Refresh[1]


If you're curious why the smiley faces and such were in MS-DOS extended ASCII:

Bill Gates: "... We were also fascinated by dedicated word processors from Wang, because we believed that general-purpose machines could do that just as well. That's why, when it came time to design the keyboard for the IBM PC, we put the funny Wang character set into the machine—you know, smiley faces and boxes and triangles and stuff. We were thinking we'd like to do a clone of Wang word-processing software someday."[1]

[1] https://en.wikipedia.org/wiki/Code_page_437

That's really interesting! Thanks!

I made a similar game for a TI calculator where the issue I was having was a lack of memory for the game state. Eventually I found that you could read the graphics memory, so I stopped saving any of the game state at all: I'd tell if there was an obstacle by just reading the graphics memory and seeing if the according pixel was already black. Now system dialogs like the calculator's "reminders" could draw their dialogues to the screen and become obstacles themselves. Now I couldn't tell food from obstacles, so I just got rid of the food idea and now you grow infinitely, and the randomly dropping obstacles XOR their targets to potentially free up walls to pass through.

Like it sounds, the game went through a lot of changes like this in structure through the course of development, where the nature of the game had entirely changed. I had a number of friends with my model of calculator, which was not the more-common TI-86 at my school, so I was the only, err, game in town. Because it was in-effect several games over its lifetime, and different people wanted different versions of it, I learnt a lot about maintainability and version management. "You have the six-month-old version of this with food, and it has a bug that I fixed five months ago in a different game. And the fix doesn't apply here". Or "I solved the slowness you're seeing in this case by removing 3 instructions from inside of this loop here, but that cut the key response time in half, is that alright?"

It was a major learning experience for me. My calculator is long-dead and I don't have a version of the game anymore, but I bet that I could still remember how to make changes to it if I saw the code again.

I remember writing a naive recursive bruteforce Sudoku solver in TI-BASIC on a TI-82-stats in high school. The "recursive" part was non-trivial, because, though a sub could call itself, there were no local variables, only global variables shared between all the subs. Fortunately, there were global lists that you could use, so I ended up using a global list as the stack, along with a global variable to maintain the nesting depth between the recursive instances and serve as some sort of pointer to the current stack position. It worked; it was so slow it took about one second per call, so you could really see it explore the tree, but it usually managed to solve Sudoku grids in a few hours.

I also recall writing minesweeper (this time on a TI voyage 200), also in TI-basic. The main difficulty, of course, was that when you hit an empty square you have to recursively uncover all adjacent empty squares. So I wrote this recursively, but on this TI there were local variables to recursive calls which also meant that the calculator would sometimes run out of memory when doing its recursive calls (at vaguely unpredictable intervals). Fortunately, I discovered that this could be caught with the rudimentary exception-handling mechanism provided, so I would use this to explore not everything but a small radius around the selected position. The puzzling thing, however, was that the radius was uneven, and got smaller and smaller (as if some memory was being leaked by the interpreter), and more surprisingly, in the graphics view that I used, whenever the exception handling would trigger the TI would display a small warning "differentiating an equation may yield a false equation". To this day I have no idea why.

It all sounds silly now but I learnt a lot typing naive code for a slow, buggy, and limited interpreted, on a cramped calculator keyboard, during boring classes. If smartphones had existed back then, I wonder what it would have changed.

What you describe is also similar to a bitmapped graphics technique called dirty rectangles.

It's also a specific instance of a more general technique of incremental computation, which is, if you look at it from the right perspective, a more specific instance of integration: rather than computing a function directly, you're computing the differences and then integrating. So you could almost say that the method dates back to the 18th century: http://en.wikipedia.org/wiki/Euler%27s_method.

It's definitely one of those tricks that comes in handy now and again :).

Another way to avoid the flickering was to wait for the screen refresh interrupt before writing to the video buffer, but it did slow things down if you weren't careful.

The flickering most likely was the result of the most naïve way of redrawing the screen:

This redraws twice, by first clearing and then overwriting again. The flickering should go away completely if you actually redraw the screen, e.g.

    gotoxy(1, 1);
At least in the things I wrote in Turbo Pascal back in the day this was enough to fix performance problems from drawing too. gotoxy() calls cost time too and simply writing a large bunch of characters in one go is often fast enough. Of course, writing directly to video memory is by far the fastest approach.

The last time I had fun with this was with FreePascal in my first semester with a friend. We wrote a text-mode UI library for one assignment (it wasn't strictly necessary, but it was a fun learning experience); it had a message loop that handled focus and dispatched input to controls, full-screen "windows", sub-screen dialogs, and buttons, text input boxes, combo boxes, and tabular view. We used gotoxy() and write() calls throughout, which had the benefit that we could test it on my friend's old VT-100 terminal as well (it was slow as hell, though, but you could literally see where to optimize draw calls, similar to using X remotely over a slow network where you can watch it draw each menu half a dozen times), but planned on moving to a more optimized strategy. Alas, we never really had the time for a v2 rewrite so the project just went to die. And by now it's probably of no use to anyone anyway (but it taught me a lot about how windowing systems work or at least can work).

I've done similar game, had same problem, then I found out that you could write directly to the text buffer memory in pascal - the address was B800:0001 if I remember correctly? Every even byte was ASCI code, and odd bytes were color numbers. It was fast enough that I could redraw the whole screen each frame no problem (and I could do nice effects like changing the colors of the screen without changing the content).

acqq on Dec 16, 2013 [flagged]

I'm sorry, but believing you did in the highschool what Carmack managed to achieve on the PC in his early games is delusional.

What Carmack achieved (by using the hardware features of the graphic cards of that day) is properly described in the link you give: smooth side scrolling on the PC. The kind of scrolling seen on the consoles of that day, which had a hardware support, or in the famous http://en.wikipedia.org/wiki/Super_Mario_Bros (a copy of which is the unreleased "Dangerous Dave in Copyright Infringement" made by Carmack). That has nothing to do with drawing only the changed characters with simple print at the given screen coordinates, what you did.

To achieve that, it used exactly hardware feature of changing the origin of the area in memory that is going to be picked up by the graphic chip to be presented on the screen. That changing was made in one pixel increments. Without it, no smooth scrolling. Second, he redraw the tiles, he didn't print the characters.

So no, that is absolutely not "pretty much what Carmack did in his old games." What you did at the end was done by absolutely every game at these times (using the commands of cursor movements to position the characters then printing them, in Turbo Pascal using gotoXY from the console library), what you started with (reprinting the whole screen just because) was simply as "pessimized" approach as it can be made at all. You didn't even optimize anything, you did what everybody did, you just started with the "pessimization." I'm old enough to remember. There is also at least somebody here who can provide a link to any computer magazine which printed such games in a few lines of code (at that time sources were made small enough to be printed in the magazines).

For the reference, at least the terminals in 1973 already had "Direct Cursor Addressing" -- the technology behind the gotoxy you used:


(It was already standardized anyway!) That terminal had 1440 characters, all directly addressable, and the maximum communication speed was 2400 bps which gives 300 characters per second or almost 5 seconds to push the whole screen to it. If you moved for example 10 characters, each with a 5 byte sequence, you needed only 50 bytes, so you can change their position 6 times per second. Everybody at that time knew that stuff. It was slow enough to see.

I'm not sure where all the bitterness comes from but obviously I'm not comparing an amateurish attempt at optimizing a terminal-based snake game written in Pascal to the genius of Carmack, those are two entirely different things.

And yes, I did that while I was in highschool, which is still years after Carmack did his deed. It was around uh... 2005?

I thought I'd just share an interesting "hack" I did when I was a teenager which, to me, felt like a real genius trick to smooth gameplay. Back then my code was literally a lot of messy spaghetti goto hell, so you can understand how much of a novice I was.

I recommend re-reading what I said, I never claimed I "pretty much" did what Carmack did, just that the idea behind incremental upgrades was "pretty much" similar to Carmack's genius. No more, no less.

Your words "this approach is pretty much what Carmack did in his old games."

Let me rephrase in less words, it is absolutely not pretty much what Carmack did. You just "rediscovered" cursor positioning feature introduced eons ago on the character-based terminals. Carmack did something else and with the bit-mapped screens using the specifics of the graphics adapters of that time. The tracking you try to refer to he did on the scrollable pixel addresses, you just did simple character prints for $deity sake!

For the explanation of my opinion, see my previous post.

From wikipedia[1]: "The technique is named for its other aspect, the tracking of moved graphical elements in order to minimize the amount of redrawing required in every frame."

That's all I did, really. Not pretending I did anything more complex than that, I didn't even have access to a framebuffer or anything... Let's say the "principle" was the same, the delivery far from it.


So according to your logic, as you've read in your school book and then wrote in your notebook a^2 + b^2 = c^2 that is "the same principle" what Einstein did with discovering E = m c ^2 since it has ^2 and the letters. Impressive.

Your school book is the equivalent to the Turbo Pascal on-line help (or TP book if you used TP 3). The similarity of the formulas is the similarity of your implementation and Carmack's "being first." Having your post the top of HN discussion of one real hack makes me deeply sad as it obviously reflects there are some readers who support your view of "the same principle." Or just don't understand, in my view.

You seriously should to leave the computer, come back tomorrow, and then reread these posts. You are trying to stomp on someone's fond memory of a simple problem that they solved in a way that was novel to them. They clearly don't think that they are deserving of a Turing award, they were just young and felt some pride.

Not really arguments. Otherwise, I also do here "the same principle" as Platon and Aristotle as I discuss anything with you. And I'll have the fond memories of this. Don't stomp on them.

My claim is that anybody who thinks moving the text cursor is "the same principle" as Carmack's one pixel at the time smooth scrolling doesn't have elementary understanding of both. And I explained the difference.

I have nothing against morgawr writing about his discovering cursor positioning. I have against supporting him equating this with something completely different. If a five year old goes out of the go cart and says "ma I can ride the plane" nobody cares. If he's 20 something and claims the same after the same "feat" on the pilot forum, some pilot can try to correct him, even if it's the lost time. Even if he also knows that there's the famous "don't argue with .... People might not know the difference."

I still think it was worth writing these comments.

Not really. Dirty rects are dirty rects! And anybody familiar with coding for crappy systems that sported hardware scrolling would use the same technique, because your options are limited when the screen moves due to base address changing and yet you want in-memory sprites (i.e., bits copied into screen memory by software or hardware, rather than images drawn automatically at a particular position during the scan) to move independently while drawing nicely on top of the background. And for full-screen scrolling games it is usual for at least the player sprite to stay centred. BBC Micro examples from the 1980s would include JCB Digger, Uridium and Firetrack. I bet Atari ST and Amiga games did the same thing too; the principle extends obviously to multi buffering.

The cunning part of the Carmack games was presumably widening the screen - I don't think I ever saw this done on the BBC. But as a schoolboy in 1989/1990 I'd experimented with reducing the BBC's screen height by one character row (by tweaking R6) to avoid flicker when scrolling vertically. Same principle. It's a fairly obvious thing to do, once you've figured out what's going on.

I think one can come up with better evidence for John Carmack's uniquely amazing skills.

My own understanding was that it was just that particular configuration of the EGA hardware that hadn't been tried before Carmack's implementation.

This is someone who found out from Romero exactly what the Keen games did: http://www.vintage-computer.com/vcforum/showthread.php?30619...

You discuss Carmack's pixel scrolling and miss the fact that this guy here who claims he did "the same" just had to draw one character of the snake's head at the new position, overwriting the previous head with the "body" character. And the tail character with the "empty" character unless the snake eats something.

What in the world has that to do with Carmack's algorithm of updating Mario after one pixel hardware scroll (after which he has to redraw the "unmoving" part fully) and adding more background once he spends the hardware moves?

Same principle, as I see it, whether it's writing 1 byte, calling gotoxy/putch, or copying a whole bunch of bytes. Dirty rects! The insight is the same: you draw the things that have changed, and leave alone the things that stay the same.

Christ, you sound like such an insufferable prick

Is there anything that I wrote that you can show it is not true? I think I've given very specific arguments.

""" Walter: Am I wrong? The Dude: No you're not wrong. Walter: Am I wrong? The Dude: You're not wrong Walter. You're just an asshole. """

I know we try to discourage random quotes and so on here, but that one was pretty much perfect.

yeah, except he didn't pull a gun on someone, he pointed out that the example was overly flattering for the author. (which it was)

Honestly, do you understand the technical aspects discussed here? Can you contribute on that level to the discussion?

Nobody said you were technically wrong. They said you are being an asshole. Do you understand the interpersonal dynamics here? Can you comment on that level?

I'm really sad I don't have your e-mail or the possibility to send you personal messages to demonstrate you how hurtful it is when you are being named the names. But I will not fail to your level to do it here in public. And I still don't see that you understood at all the topic which I discussed. Stay classy my friend, I won't answer you here anything more no matter what you write. The same goes to the other name namers.

Think about all the personal and professional feedback you've received over your entire life. Have you ever been told you're too curt, abrasive, or aggressive? Has anyone told you that what you say is fine, but how you say it alienates or frustrates people? Do you feel like people avoid interacting with you or generally bring up issues (especially personal issues) with other people well before they bring them up with you? Do you feel like you're constantly being misunderstood or that people are taking what you're saying the wrong way? Do you often find yourself rationalizing the impact your words have by telling yourself other people are too sensitive, too stupid, or too deluded to hear what you're saying?

Because right now, that's what's happening. And unless this is a particularly off day, I'd be astonished if you haven't had this reflected to you before by friends, family, and/or co-workers.

You're 100% right on the technical merits. Everybody here understands what you're saying and agrees with you — nobody cares. It seems like you're going out of your way to humiliate the original commenter for no good reason, e.g., by calling him delusional, even after he admitted that he understood what he did in high school in no way rose to the level of what Carmack did.

The OC was describing the sensation vindication and pride he felt when he figured something out on his own and later learned that a more advanced version of same idea already had a name. This is a feeling most programmers can relate to, I think. I only learned what a "trie" was, for example, years after I implemented a shitty version on my own.

Whatever your intentions are, you have to understand the impact your words have. And in this thread, at least, it's not pretty.

The worst thing about this exchange is that you seem to honestly, actually not understand why people are having problems with you. You don't need to respond to this, but if you really totally don't understand why people are reacting badly to you here, it would probably be good for you to try reading some basic interpersonal self-help stuff like "How to Win Friends and Influence People", or to ask someone you know and trust whether they think you have an excessively abrasive/confrontational style.

Or maybe you're just having a bad day, in which case, hope it improves, chill, try not to get caught up in arguments when you're already pissed off (I know I do that and come off as a jerk sometimes).

I haven't, and won't, 'name you names'. You could have given a nice technical explanation of what Carmack did without having to call the person 'delusional', saying they started with 'pessimisation' then further going on to belittle them by saying you'll 'explain with less words'. Even after they go further and say their programming was a goto spaghetti hell made in 2005 as he was learning to program, you keep trying to brow beat him with your decades of knowledge and experience.

Why are you angry at him? I thought the main point of the fine article wasn't just about the technical details, but also the fact that they wouldn't give up. Would you rather that he had seen his flickering game and just say 'Aw screw it, programming is crappy!' and walk away?

Or do you want a signed and printed statement of your 'absolute' rightness, a full retraction from him never to compare himself with Carmack again and pg to come into this thread and strike down his comment from the top and relegate it to the bottom of the page until he spends years learning what people knew 30 years ago?

Most of the people here understood what you were discussing. My first computer was a Sinclair ZX81, and I remember well those magazines you brought up.

But you were really rude the way you were discussing it, and quite frankly, you were calling him names, unless you think that delusional is a nice way to refer to someone.

And yes, my email is in my profile if you do want to email me privately and call me names.

No you definitely are being an asshole, it is you who needs to stay classy, maybe get a fresh nights sleep wake up with some nice coffee and review your statements.

Expletives say the most about those who use them, not much about anything else.

Go to bed dude. No one is on your side here.

When the entire world thinks you are the asshole, it is worth stopping and considering the possibility that they may be right.

Nothing you said is factually wrong but you're being obnoxiously pedantic and rude. OP was merely commenting on the similarities between his and John Carmack's approach to solving the problem and is in no way deluding himself by comparing the two in terms of complexity. It's absolutely possible to write a naive solution to a small problem and still take a similar approach to a more complex one

Of course it's not exactly the same. But the approach was similar. They each believed that more could be done to bring their creations closer to perfection if they understood the underlying technologies better. Commenter took that to one level, and Carmack took that to the extreme. But both relied on tech abstractions that effectively had a multiple levels of proficiency.

>The challenge wasn't overwhelming complexity, as it is today. The challenge was cramming your ideas into machines so slow, so limited that most ideas didn't fit.

I like this line right here. It does seem like we've piled on abstraction after abstraction in these days. Sure this does make things easier, but I think things have gotten so complex that it's much harder to have a complete mental model of what your code is actually doing than in the simpler machines of the past.

There's totally a parallel between this and the video game music of Mario, Zelda and Final Fantasy- they could only play 2-3 notes at once, so they had to work with those constraints. The music couldn't be too complex, so every note had to go really, really far. The result? Extremely memorable tunes with catchy melodies. It was a matter of necessity.

There's a lot to be said about creativity coming from being forced to deal with constraints. It's counterintuitive, but we get most juiced about solving problems when they're tough. It's hardest to start writing when it's blank piece of paper.

Here's one story on it: http://plumecopy.com/bach-picasso-on-the-creative-process/

When people think of "complexity" in regards to music, they often think of instrumentation, texture, production quality. But some of the most complex music ever written - the Well-Tempered Klavier - was written in mostly 2-4 voices for an instrument with no dynamics. I think the 2-3 note limit really forced chiptune composers to focus on compositional complexity, thus making that music stand out even today.

This is the reason why I love chiptunes. Just the right amount of complexity, in that you can still focus on the individual waveforms/voices.

The Mega Man games had amazing music too, but I think Metroid still stands out as one of the best examples of fantastic 8 bit music. Haunting!

Honestly I think a lot of this is nostalgia - I grew up with the c64, never had a NES. I enjoy chiptunes but none of the NES stuff really does much for me. I think you have to have had this stuff embedded into your brain by repeated play. I've listened to NES music and haven't had an urge to listen to it enough to learn and hum it, but I could totally hum the eerie opening theme to Parallax, the driving pseudo-Jarre of The Last V8, the electro-rock of Oxxonian, and more.

I'm not entirely sure it's all nostalgia; the c64's sound chip was a bit more flexible than the NES's, and there was a real culture of musicians on the c64, so I may have just gotten used to a higher level of musicianship as a base.


And I'd say today there's a tendency to overabstract and overdesign stuff.

In today's world of "everything fits", some modern developers don't try to "bottom align" their code, but rather "top align". No, a hello world app doesn't need an XML config file.

Of course, not being constrained by machine limitation anymore (unless you're doing video/simulations/games etc) is a blessing

I'm more a fan of 'seat of your pants' programming but to be fair, if abstractions are good they are meant to be a solid bedrock upon which to build a higher level mental model. For example, assembly is one such bedrock - you don't need to understand microcode or transistors to get the most out of it. C is arguably another, Java arguably another (understanding the JVM is nice but not essential).

It's only due to the deficiencies within certain abstractions that it frequently becomes necessary to subordinate and go lower down the chain.

But the whole idea of abstraction is that you don't have to have a mental model of the whole code — just the chunk you're working on right now.

If and only if the abstraction isn't leaky. Unfortunately, non-leaky abstraction layers are exceedingly rare, to the point that I now believe most software engineers don't even aim at trying to avoid leakage in their designs.

If every abstraction were perfect, sure. Of course, they never are, and you will always, at some point, need to understand what your code is actually doing in order to solve a problem.

This is why I love microcontroller programming (but not Arduino, which loses the magic to "high level"). I wrote some PIC24 code where I needed some very fast deterministic interrupt handler[1], so I dropped down to assembly and cycle-counted my way to success. It was nothing nearly as fancy as the article describes. It was really just the ability to accurately count cycles.

[1] I was basically sharing buffers between SPI send/recv interrupts and the code that used them, so I needed a way to lock buffers. I later replaced the code with much simpler, much slower synchronous code.. :-/

I thought exactly the same thing. Read the whole article in my lunch break after getting po'ed at some stupidly complex PHP code written in a Java idiom spread across 5+ files.

Loved the story! Mostly because I lived this stuff, and I'm 25 years old :p. In my Electronic Engineering graduation we had three professors crazy about assembly and slow PCs (in fact, FPGAs and microcontrollers). I remember the nights I spent awake trying to make a Viterbi Encoder/Decoder fit into a tiny FPGA, cramming a complex temperature controller (while reading sensors, commanding motors, and handling the input/output) in an 8051, or programming a 128khz sound recorder in assembly on an (old as hell) ARM, while communicating to a PC, showing info on a LCD and doing all the filtering digitally (the only analog stuff we were allowed to use were an anti-aliasing filter and the input/output conforming circuits). Ah, the crazy filters we devised to use all the old ARM's juice.

I lost myself there, but my main point is: in electronics (embedded systems, mainly) all this beautiful joy of crazy optimizations is still alive :D

You had some really outstanding teachers. Congratulations.

This is a really neat article. One thing: the author falls victim to a common, unfortunate mistake in calculating the percentage gains: ...120 cycles. That’s a 30-percent speed-up. and then ...98 cycles. Compared to the original code, that’s 60 percent faster.

The right way to calculate this figure is (t1 - t0)/t0, rather than the author's formula which seems to be (t1 - t0)/t1. For instance: (157 - 98)/98 = 60%, but the actual amount is (157 - 98)/157 = a 38% speed up. A heuristic: 60% of 157 will be much more than 60 (since 60% of 100 = 60), which means a 60% speed up would reduce the speed to below 97 cycles.

It gets even more misleading the more efficient it gets: Adding up the cycles, the total was just 1689. I had shaved almost 1200 cycles off of my friend’s code. That’s a 70 percent speed-up! The author has 1200/1689 = 71%, but the correct numbers yield 1200/(1689+1200) = 42%.

Not that I don't think these are significant gains, but it's just misleading to label them like this. If you've removed less than half the cycles, there's no way you've seen a 70% speed up.

Thanks for your feedback. (Author here.)

By my calculations (which I'll machine-check now with Maxima), the 30% and 70% speed-up claims are sound. First, let's fire up Maxima:

    $ maxima
    Maxima 5.30.0 http://maxima.sourceforge.net
    using Lisp SBCL 1.1.8-2.fc19
    Distributed under the GNU Public License. See the file COPYING.
    Dedicated to the memory of William Schelter.
    The function bug_report() provides bug reporting information.
Now let's calculate how much faster we made the row-copy code by unrolling its loop:

    (%i1) orig_loop_speed:     (1*row) / (157*cycle)$
    (%i2) unrolled_loop_speed: (1*row) / (120*cycle)$
    (%i3) unrolling_speedup:   unrolled_loop_speed / orig_loop_speed,  numer;
    (%o3)                          1.308333333333333
In other words, the unrolled-loop speed is 1.3 times the original-loop speed. That's 30% faster, right?

Likewise, how much faster did shaving those 1200 cycles make the tile-copy code?

    (%i4) copy2_speed:        (1*tile) / (2893*cycle)$
    (%i5) copy3_speed:        (1*tile) / (1689*cycle)$
    (%i6) copy3_vs_2_speedup: copy3_speed / copy2_speed,  numer;
    (%o6)                          1.712847838957963
I'm willing to believe that I've screwed up here, but I can't see where. Can you show me?

Thanks for your help!

Your only "mistake" was to switch to a different measure when giving the percentages. Your calculations are actually fine, because you explicitly label them "speedup".

However, this can confuse a non-thorough reader because your previous measures are not about "speed"; they are about "cost" (cycles). The parent just did not realize the switch, and thought that you were claiming to have reduced "cost" (cycles) by 30%/70%, which wouldn't be true.

Thanks for this insight! I never would have seen it had you not told me. (I'm an engineering guy and switching between time-per-unit and units-per-time is so automatic for me that I didn't realize it wasn't automatic for everybody.)

I've updated the story to explain the calculation the first time it happens.

Thanks again!

As a fellow Maxima user, thanks for showing me "expr, numer". I've always used "float(expr)" for that, but I like yours better.

I'm having trouble with your post. You said the right way to calculate it is (t1 - t0)/t0, but said later that the actual amount is (157 - 98)/157, which is (t1 - t0)/t1.

Also I believe the author is referring to speed improvements, whereas you are referring to time improvements. Let's say I take 200 operations to perform a task, then I optimize code to perform the same task in 100 operations. The program runs at 20 operations per second regardless of how many operations there are.

The task which took 10 seconds before will now take 5 seconds, or a 50% time reduction, which is how you calculate it. However, the program is now running twice as fast, because in 20 seconds, the program can run twice, which is how the author calculated it.

Both of you are correct, just depending on what your context is.

We did this in our Sinclair Spectrum games to blit the backbuffer to the display memory. Interrupts were not a problem because if they occured during the PUSH (display memory), the corruption would be overwritten immediately when the blit continued, and if they occured during the POP, the backbuffer was going to be overwritten in its entirety the next frame.

However, we had to leave some space at the edge of backbuffer memory, because if there's an interrupt right at the beginning of the blit, the interrupt handler's stack frame could overflow outside of the backbuffer and corrupt other memory. That one was fun to find. [Edit]: I seem to have missed the second footnote where he already describes this issue.

This is not unlike how 'fast' screen updates are done on the Apple IIGS. The fastest memory operations on the 6502 and 65816 involve the stack, so one ends up mapping the stack to the top of framebuffer RAM and pushing a lot of values onto it in an unrolled loop. The unrolled loop is itself rewritten by other code to provide the data for the next update.

Apple developer support themselves described this idea in Technote #70, http://www.1000bit.it/support/manuali/apple/technotes/iigs/t...

That's even faster than the push-pop sequence that this describes.

Reminds me fondly of the time I was writing assembler for the ARM2/3... it had such a nice instruction set that made hand-writing assembler pleasant.

It had a "barrel shifter" that gave you free shifts of powers of two, so you could calculate screen byte offsets quickly:

  // offset = x + y * 320
  ADD R0, R1, R2, LSL #8
  ADD R0, R0, R2, LSL #5
  // = 2 cycles
It also had bulk loads and stores that made reading/writing RAM cheaper. The trick there was to spill as many registers as you possibly could, so that you could transfer as many words as possible per bulk load/store.

  LDMIA R10!, {R0-R9}
  STMIA R11!, {R0-R9}
  // Transfers 40 bytes from memory pointed to by R10 to memory pointed to by R11,
  // And updates both pointers to the new addresses,
  // And only takes (3+10)*2 = 26 cycles to do the lot.
Happy days...

Here's how that would look in the instruction set we use here ($100M commercial product)

    ; ARM spells it
    ADD R0, R1, R2, LSL #8
    ; We spell it
    CLX R0           ; 1 cycles
    ADX R1           ; 1 
    ADX R2           ; 1
    LDX N, 8         ; 3
    RSX              ; N (const 8 in this case)
Try optimizing THAT!

If you ever feel like reliving a bit of your past the Game Boy Advance uses the ARMv3 instruction set. With an easily available flash cart you have a cool little portable system to develop on.

Nit: the GBA uses an ARM7TDMI that IIRC implements ARMv4T, which isn't a strict superset of ARMv3 (I think that's the first ARM that dropped the old crazy addressing). Still very fun to mess with.

Sorry about that! The ARM naming scheme confuses the hell out of me. I always mix up the processor generations and instruction sets.

ARM was my start with assembler too. But you appear to a have forgotten there are 5 more registers you can use for that blit ;-)

I can't remember how tight I made it... but I would think you need a couple spare: one for the number of iterations, and another for the screen stride. But yes, spilling registers like SP and LR is a useful trick if you need it.

Great story, thanks for this; it's a refreshing change from bitcoin and VC chatter.

I have been searching for at least 10 years for the term "involution": the set of functions where f(f(x)) = x. Now i have it. Thank you.

I don't recall which of the Atari cart game did this (might've been Combat) rather than using space for storing sound effects the game would refer to its own code in memory for a random noise sound effect.

So true that back in the day much of a game programmers mental effort was spent on how to make big ideas fit inside small memory, anemic color palettes, and slow processors.

Having just spent a good chunk of my weekend reliving my Commodore 64 assembly coding days, this was an excellent way to top it off!

I'm almost a bit disappointed I missed out on those days. Having to code like that probably would've taught me several useful lessons.

Well, you can still relive some of this by writing code to target an emulator or, more modern, an ARM chipset.

Also, the quality of tools nowadays is so much better than what you would've used back then. So much so, in fact, that these old machines could even become pretty useful as a learning tool since they're (relatively) simple and allow much easier testing and debugging since they can be used effectively as some kind of VM.

Arcade video game programmers of that age have told me warstories of themselves. BitBlits? That stuff is still handled by the BIOS / OS. The real arcade programmers would code at the level of scan-lines manually. (IIRC, Pacman was programmed at this level).

Every 30th of a second, the screen would have to be refreshed. Arcade programmers would perfectly tweak the loops of their assembly programs such that the screen refresh would happen at the right timing. As the CRT scanline would enter "blanks", they would use the borrowed time to process heavier elements of the game. (ie: AIs in Pacman). The heaviest processing would occur on a full-VSync, because you are given more time... as the CRT laser recalibrates from the bottom right corner to the top left corner.

Of course, other games would control the laser perfectly. Asteroids IIRC had extremely sharp graphics because the entire program was not written with "scanlines" as a concept, but instead manually drew every line on the screen by manipulating the CRT laser manually.

Good times... good times...

Actually Pac-Man was part of the second generation of videogame hardware where everything was "tile" based. The hardware generator was given a small RAM map of the playfield and then it would pull shapes from ROM and put them on-screen in a grid pattern. Less reliance on scanline timing and more logic could be done asychronously from the beam. The Nintendo NES would follow this design architecture later.

The Pac-Man Dossier is a great reference for this kind of stuff:


Thanks for the correction. I must be thinking of games older than Pacman then...

IIRC, Kangaroo used the old scanlines system, but its release date is after Pacman. I guess everyone was still trying to figure out how to do scanlines correctly at that time... so it must be different from arcade-game to arcade game.

Every manufacturer brewed their own hardware in-house. Remember this was cutting edge stuff in the early 80s. Things like RAM-based frame buffers were also insanely expensive, so you got tricks like this instead. Some went tile-based, others used custom ASICs or PLCCs to accelerate graphics.

Exactly. This article brought back memories. Actually I realized, why not just put this stuff old time onto Github. There is no real use if only to remind people how different it was back then when you wanted to make a game... Some people will relate.

Here https://github.com/gorhill/rayoid

So much time spent on counting CPU cycles... Atari ST's pixels layout was really tricky.

There were better games out there, I will always remember Llamatron, my favortie (I wonder if the sources are available somewhere.)

The Atari "vector display" games actually had a second processor that did the work of driving the electron beam. The primary game processor would compute and send a list of coordinates to the vector generator, and it would do the hard work of moving the beam around.

This lead to some neat hacks, like the guys that swapped the vector generator stuff for a laser projector:


I'll join the chorus reminiscing about hacking for game performance.

In my case, it was on a Mac on a PowerPC CPU. It's a far cry from the limited resources of early personal computers, but this was at a time when 3D was hitting big time - the Playstation had just come out - and I was trying to get performance and effects like a GPU could provide. A hobbyist could get decent rasterization effects from a home-grown 3D engine, but I was working as far forward as I could. All that unrolled code, careful memory access, fixed-point math... I spent a lot of time hand-tuning stuff. It wasn't until I dug into a book on PowerPC architecture that I found some instructions that could perform an approximation of the math quickly, and suddenly I was seeing these beautiful, real-time, true-color, texture-mapped, shaded, transparent triangles floating across the screen at 30fps.

It was about that time that the first 3DFX boards started coming out for Macs, though, and that was the end of that era.

Used to do a similar thing with old Archimedes games (the first computer to use an ARM chip, in 1988). The original ARM had 16 x 32 bit registers, and a single assembler command could write some or all of them to memory in one go. In practice you could use about 12 of the registers for graphics data (the others being program counters and stack pointers etc). Each pixel was 2 bytes, so with 12 registers you could do 1 row of 24 pixels - all in one instruction. Fetch some new data into the registers and write them again 24 times and you had a 24x24 sprite drawn very fast. To really use this technique you had to draw at word boundaries, thus the movement had to be 4 pixels per frame. But you could do a good full-screen scroll with this at around 12-15 fps (Archimedes could also do double-buffered screen memory so you draw one while displaying the other) and still plenty of time to do all the other work for each frame.

Another possible way to get around interrupts overwriting your screen, would be to turn them off and update the audio after every line or two.

Or do most of the game logic in the interrupt handler. Many C64 games would do this, as it allowed precise timing of when updates would happen so as to for example not need double buffering at all - hanging everything off the raster interrupt meant you could trivially ensure you updated the screen when the graphics chip was not processing that part of the screen.

It can be that then the points where the new audio information comes wouldn't be properly spaced to produce pleasant sound. The solution described in the article is really beautiful: you let the interrupts happen, just make sure that the as soon as the interrupt returns the task continues but "covers" all the tracks that the interrupt happened, even if there was no separate stack area at the time the interrupt happened. Really cool.

Depending on the DAC you need so many interrupts "every line or two" is already too slow. Not sure if this was the case on the Tandy, though.

So it was Tandy Color Computer 3 (CoCo 3). Thanks royjacobs.

Wikipedia has one more interesting detail about CoCo 3:


"Previous versions of the CoCo ROM had been licensed from Microsoft, but by this time Microsoft was not interested in extending the code further.[citation needed] Instead, Microware provided extensions to Extended Color BASIC to support the new display modes. In order to not violate the spirit of the licensing agreement between Microsoft and Tandy, Microsoft's unmodified BASIC software was loaded in the CoCo 3's ROM. Upon startup, the ROM is copied to RAM and then patched by Microware's code."

Or do what the original Macintosh did, have the sound buffer consist of a block of memory at the end of each line in the framebuffer. So each scanline would give the audio hardware new data to process.

Aren't the interrupts for DMA, buffer empty and other similar signals? If you do it manually I think performance suffers.

Thanks for sharing this. This is what Eugene Jarvis did to make Defender fast. It was a common tool in the toolbox for any clever game programmer for the 6809. I think it is awesome that Tom & buddy to experience the pleasure of its rediscovery.

So, question: what sits in memory below the bottom of the framebuffer? It seems like if a sound interrupt occurs while drawing the lowest-address tile, you might corrupt something below there.

Edit: Oh! Just got to footnote 2. Thanks, author!

I still remember a hack that I figured out on the Commodore 128 to speed up the 80 column display. I'm not aware of any program that actually made use of it (probably because the C128 and its 80 column display did not have a big enough user base to make it worthwhile to develop programs that needed speedy output).

The C128 had two separate video chips/ports, a C64 compatible chip showing a 40x25 character (320x200 pixel) display, and the "VDC"[1] showing 80x25 characters (640x200, or with interlacing, 640x400 or more), which was output on a separate connector. The VDC had a hideous way to change the display: it had its own video RAM, which the CPU couldn't access directly, instead the video chip had two internal registers (low and high byte) to store the address you wanted to access, and another register to read or write the value at that address. But that wasn't enough, the CPU couldn't access those VDC registers directly either, there was a second indirection on top: the CPU could only access two 2nd-level registers, one in which to store the number of the 'real' register you wanted to access, then you had to poll until the VDC would indicate that it's ready to receive the new value, and you would save the new value for the hidden register in the other 2nd level register. (There's assemply on [1] describing that 2nd level.) Those two registers were the only way of interaction between the CPU and the 80 column display.

[1] http://en.wikipedia.org/wiki/MOS_Technology_VDC

This was extremely slow. Not only because of the amount of instructions, but the VDC would often be slow to issue the readyness flag, thus the CPU would be wasting cycles in a tight loop waiting for the OK.

Now my discovery was that the VDC didn't always react slowly, it had times when the readyness bit would be set on the next CPU cycle. Unsurprisingly, the quick reaction times were during the vertical blanking period (when the ray would travel to the top of the screen, and nothing was displayed). During that time, there wasn't even a need to poll for the VDC's readyness, you could simply feed values to the 2nd level interface as fast as the CPU would allow, without any verification. Thus if you would do your updates to the screen during the vertical screen blank, you would achieve a lot more (more than a magnitude faster, IIRC), and the "impossibly slow" video would actually come into a speed range that might have made it interesting for some kinds of video games. Still too slow to do any real-time hires graphics, and the VDC didn't have any sprites, but it had powerful character based features and quite much internal RAM, plus blitting capabilities, so with enough creativity you might have been able to get away by changing the bitmaps representing selected characters to imitate sprites. And you could run the CPU in its 2 Mhz mode all the time (unlike when using the 40 column video, where you would have to turn it down to 1 Mhz to not interfere with the video chip accessing RAM in parallel, at least during that chip's non-screenblank periods.) My code probably looked something like:

        lda #$12       ; VDC Address High Byte register
        sta $d600      ; write to control register
	lda #$10       ; address hi byte
        sta $d601      ; store
        ldx #$13       ; VDC Address Low Byte register
        ldy #$00       ; address lo byte
        stx $d600      ; select address low byte register   4
        sty $d601      ; update address low byte            4
        lda #$1f       ; VDC Data Register                  3 ?
        sta $d600                                           4
        lda base,y     ; load value from CPU RAM            4 ?
        sta $d601      ; store in VDC RAM                   4
        iny                                                 2
        bne loop       ; or do some loop unrolling          3 ?
(28 cycles per byte, at 2 Mhz, => about 300-400 bytes per frame. Although the C128 could remap the zero page, too (to any page?), and definitely relocate the stack to any page, thus there are a couple ways to optimize this. (Hm, was there also a mode that had the VDC auto-increment the address pointer? Thus pushing data to $d601 repeatedly would be all that was needed? I can't remember.))

How would you time your screen updates to the vertical blanking period? There was no way for the VDC to deliver interrupts. It did however have a register that returned the vertical ray position. Also, the C128 had a separate IC holding timers. Thus IIRC I wrote code to reprogram the timer on every frame with updated timing calculations, so that I got an interrupt right when the VDC would enter the vertical blanking area.

As I said, I'm not aware of any production level program that used this; perhaps some did, but at least the behaviour was not documented in the manuals I had.

The VDC felt even more like a waste after I discovered this. The only use I had for it was using some text editor. I wasn't up to writing big programs at the time, either.

PS. sorry if that was a bit long.

This is the kind of genius I come to this site for. Thank you.

There should be a website where this kind of articles are collected!

I do love the lesson that is implicit here. At least for me. The game was basically playable and doing what it was supposed to do before these interesting hacks were done.

Another interesting tidbit that should be obvious, but I miss a lot. The format of the graphics was fixed and not necessarily on the table for things that can be changed to make the code work. All too often it seems I let what I'm wanting to accomplish affect how I plan on storing the data I'm operating on.

I love this post.

Today most of game programmers just ask for a bigger GPU.

If a programmer back then doing crazy assembler stuff to get barely acceptable performance were given the option to work on today's hardware and write code that is portable across chips and operating systems, provided they never do any clever hacks - do you really think they'd refuse?

That comparison as stated is a bit too extreme. However, I once had an Amiga programming wizard refuse a job because he didn't want to write C/C++ code for Windows PCs. He went on to do his magic on PSX and GBAs. In more recent times, there's plenty of hardcore programmers who have had a blast as PS3 SPU specialists.

Every time I see people complaining about Java, .NET, Go, and many other languages being slower than C or C++, I smile while remembering the same arguments being wrestled against Pascal dialects and C from Assembly developers back in those days.

And similar fun probably keeps going on the latest consoles, considering they have massive GPU sharing memory with the CPU, the devices even support atomic operations between GPU and CPU.

Incidentally that reminded me. How does one get to these positions? As the API's are closed there is no real way for an outsider to learn the ropes.

> How does one get to these positions?

- Deep understanding of low-level programming issues: assembly, architectures, caches, DMAs

- Experience doing advanced stuff in available platforms like OpenGL, DirectX, Cuda/OpenCL, etc.

- Fluency in Maths is not necessary for all low-level development, but will certainly be for physics and graphics work.

Check out places like http://directtovideo.wordpress.com/, http://www.iquilezles.org/www/index.htm and http://fgiesen.wordpress.com/. If you can show you even understand what they are talking about, you have a foot in the door. The rest is hand-on experience. :)

I meant more that I haven't really seen any open positions anywhere about positions like that, so I've thought it's more about who you know than what you know. I might be wrong though. Regarding skillset I pretty much have all of those. I have mostly thought that particular set of skills is not _that_ much needed (compared to <nodejs/C#/java/whatnot>)

Thanks about those links. While I'm well acquainted with iquilezles the other two were new for me.

The first link was especially interesting as I'm quite interesting in OpenCL raytracing, so that will save me a lot of research time in the near future, especially about the data structure selection. Interestingly with OpenCL 2.0 and devices that have shared memory anything that is fast to build on CPU and slow/open area of research on GPU can be pretty much ignored as the bottleneck between the two devices do not exist anymore.

I believe they found these "clever hacks" a funny part of their work.

I myself, for example, find these clever hacks very rewarding, maybe even more than finishing the stuff I'm working into.

I think you mean 'fun', not 'funny'. Fun means enjoyable and rewarding, while funny means humorous. (Not criticizing you, but if I am right, English is not your first language, and I'd appreciate the help if I were writing in Spanish.)

You are right =), I meant fun, not funny. (Spanish is my first language and I appreciate very much your clarification)

Did you guys see the top comment? TempleOS:


Some features:

64-bit ring-0-only single-address-map (identity) multitasking kernel

HolyC programming lanaguage interpreter

Praise God for binds using timer based random number generators

Create comics, hymns, poems as offerings to the Oracle

Terry is also a (hellbanned) regular on HN. [1] He is believed to have schizophrenia, and most of his comments come across as a bit ... "out there". But occasionally his skill at systems programming shines through, which is fairly sad considering how crippled by his mental illness he appears to be.

[1] https://news.ycombinator.com/user?id=losethos

What computer and game could this be? Looking at Wikipedia reveals that the Motorola 6809 was not used for many computers, and not any that I recognize as being very popular.

Oh .. just a few video arcade games you might have heard of .. Defender, Joust .. Juno First .. and so on .. so, actually quite a few games machines use the 6809 cpu ..

Those are not computers.

Sir, I beg to differ. I personally know of two individuals who are regularly reprogramming their 6809-based arcade boards, freely, with new software.

So, I urge you to reconsider your opinion; I believe it may be wholly incorrect.

The article says this was done on "a 1980’s-era PC" with a 6809 processor. teddyh was clearly wondering which PC (Tandy CoCo 3, it turns out) and which game, not asking for a random list of arcade games that happened to use the 6809. I'm not sure why he's getting downvotes when you're the one being obtuse.

By that metric, an abacus is a computer. This is not a workable definition for the discussion at hand.

Sorry, but wut? We're talking about systems that utilize the 6809 processor. This CPU is a general-purpose processor, used in quite many video game machines. What does an abacus have to do with anything?

I notice you called them “games machines” and not “home computers” or even “computers”. It seem to me that you knew very well what I was talking about, but choose to be obtuse.

I grew up in an era when in fact the most potent computing power I had access to, as a hacker/programmer, required 20c.

Alas, I realize not everyone thinks the way I do, but I consider among the worlds most functional 'computing experiences' occurred during the Arcade era, where, indeed, one could wonder at the .. sheer .. computing power .. being used to render sprites. With 256 colors.

So, at the risk of perhaps moving the goal-posts, let me just say that while the 6809 might've been a 'games console cpu', at the time, it was also used in a few .. relatively interesting .. micro's.

Anyone returning to that era, in their own way, contributes what they can. In my case, I consider the still-functioning Joust and Defender machines in my neighbourhood, at the very least, accessible through a repository ..

Not general-purpose computers, no, but how were they not computers?


"The 6809E was featured in the TRS-80 Color Computer (CoCo), the Acorn System 2, 3 and 4 computers (as an optional alternative to their standard 6502), the Fujitsu FM-7, the Welsh-made Dragon 32/64 home computers,"

I know and have/had almost all of them. Maybe more popular in Europe?

The Dragon was basically a CoCo clone. Don't recognise those Acorn computers (assuming this is the UK Acorn, those were all named "BBC Micro Computer" and had model names like "A", "B", "Master" etc), I'm pretty sure the Acorn used 6502, but it did have the option to interface to other processors (e.g. Z80) so 6809 would not surprise me.

The Acorn System computers are rack mounted computers that pre-date the Acorn Atom - http://fjkraan.home.xs4all.nl/comp/atom/history/index.html

Aside from the video titles mentioned below, Williams used the 6809 for every pinball machine built from 1990 to 1999. Probably a hundred thousand machines or so.

And I can also confirm that they also used the PSHS/PULU hack to do some very fast memory copies and blanks.

royjacobs recognized that first here in his comment to forktheif's top: it was Tandy Color Computer 3. The article author answered that question in the comments.

I used to spend my afternoons waypoint'ing all the computer shops I could access with a single days bus pass. This was a regular jaunt in the late 70's, early 80's, and .. at the time .. was a great way to prove ones chops. When you only had 20 minutes to type something in, it better be funky. (Coz the sales guy/gal is either going to let you save it to their "DEMO" floppy, or kick you out of the shop.)

Until you come back and decide to buy something, and suddenly .. it mattered which CPU you chose. Z80, 68xx, the rumoured 68k, and .. so on. It was a bewildering time.

(Thankfully, the still-functioning, embedded Video Game Arcades were able to provide some relief. I knew a few arcade owners, lets just say.. ;)

Anyway .. there was a Tandy shop, and I had a CoCo section in my (long-lost) notebook. There were some funky things, no? But still .. in the Amstrad invasion, the Atari megalith, the Commodore death spiral.. Tandy kind of lost the plot. Strangely so. The CoCo were somehow 'tainted' in my mind, back then, because of the TRS80 connection. Things didn't have to be so grey.

Nevertheless, if the connection with the Arcades had been made, I'm sure the CoCo could have kicked some serious ass. What heady times!

I think he means "Personal Computer" and it could be a Tandy or a Dragon, both Motorola 6809 based.

The Tandy CoCo was fairly popular in its day; The Commodore and Apple II were more popular, but before the IBM PC became the dominant platform there were lots of different microcomputers that competed, and required porting of games.

Great story! I grew up with this type of programming.

Brought back nice memories.

>> each tile was 28 by 28 pixels.

Why not a power of 2 like 16 or 32?

Having worked on similar applications, I can tell you that it was likely a graphical choice. It's difficult to balance legibility with screen space. 32x32 probably put too few tiles on the screen, and 16x16 probably didn't let him get enough detail in.

Indeed. In practice it makes very little difference how wide the sprites are, provided they are some exact number of bytes. You end up multiplying the Y coordinate by the row stride but working from column to column will be controlled by some kind of counter so any value will be fine.

I'm confused about something. After they've implemented their final solution that lets tiles become corrupted before they're overwritten, what happens to the sound? The sound is now being written to the screen, where it will be promptly overwritten by the copy tiles routine. Wouldn't that cause audio corruption? Or did playback of the sound complete before the interrupt returned?

Calling the interrupt routine to play a sound sample used the stack, hence wrote register contents (and return addresses upon subroutine calls within the handlers) onto the tile target. No problem, as while the interrupt routine was running, the normal program was stopped. After returning from the interrupt, the stack pointer would be reset to the original value, and the normal program would go on overwriting the locations that had just been used for the interrupt handler stack. Playing the audio would not be disturbed at all. (This assumes that the stack wraps around at the end. Actually, there was probably still image corruption happening, if the irq was triggered at the very end of the stack range. Except perhaps the stack size was 16bit?)

Edit: yes, the stack pointers were 16 bit (http://en.wikipedia.org/wiki/Motorola_6809).

You have a quick mind to spot that edge case! See footnote 2 of the article for how we handled it.

The interrupt routine would only be writing the next byte of audio to the DAC which would be at a fixed address, whats causing the corruption is that the CPU stores the state of the CPU when the interrupt occurs onto the stack; this isn't where the audio is being played from

    void execute_interrupt(void * isr) {
        // push the registers of the program running when
        // the interrupt happens onto the stack that's currently
        // in use. this is something the CPU looks after usually
        // (I think).
        // This meant in the article that the contents of the
        // registers were being pushed to the buffer holding
        // the current image, overwriting whatever data was
        // above the stack pointer. The fix was to swap which
        // register was the source and which the dest, so that
        // the ISR would be pushing to the buffer that was about
        // to be written to.

         // do whatever's needed to get the CPU into a consistant
         // state for the ISR

        // call the interrupt service routine for whatever interrupt
        // has occured
        // just decrements the system stack pointer, leaving
        // whatever was above the point corrupted in the
        // original version using the two stack regs

The problem isn't that sound is written to the screen, but that triggering an interrupt pushes to the stack (which is pointing at the screen).

Their fix of drawing in the opposite order works because a push/pop of the stack will then affect the upcoming graphics bytes instead of the previous ones.

At that rate, sound glitches would be in the sub-millisecond range and impossible to hear, specially with 8-bit audio.

Sounds similar to the scrolling 'hack' John Carmack used on Commander Keen.

I love this kind of stuff. It's the kind of article that today makes me very interested in embedded linux and systems that supposedly don't have enough resources to do things that we've been doing for decades.

Brilliant blog post!

Good read.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact