
Coding tricks of game developers - damian2000
http://www.dodgycoder.net/2012/02/coding-tricks-of-game-developers.html
======
toonse
For a launch product of a certain console I had a nasty bug report from QA
that took 20+ hours to reproduce. Finally (with 24 hours left to go before
console launch) I tracked it down to some audio drivers in the firmware that
were erroneously writing 1 random byte "somewhere" at random times, where the
"somewhere" was always in executable code space. Luckily, I figured out that
on any given run of the game that "somewhere" was always the same place.
1st party said sorry, can't fix it in time as we don't know what's causing
it! So I shipped that game with stub code at the very start of main that
immediately saved off the 1 byte from the freshly loaded executable at the
address I knew would be overwritten for that particular version of the exe.
Code then ran each frame, after audio had run, to restore that byte to what
it should be just in case it had been stomped that frame. Good times! We hit
launch.
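
A minimal sketch of the idea (the address and names are made up - the real
address came out of debugging that exact build of the exe):

    
    
        #include <cstdint>
    
        /* Hypothetical: the one code-space address this build's audio
           driver was observed to stomp. */
        static std::uint8_t *const kStompAddr =
            reinterpret_cast<std::uint8_t *>(0x80012345);
        static std::uint8_t g_goodByte;
    
        void save_good_byte()      /* stub at the very start of main() */
        {
            g_goodByte = *kStompAddr;  /* exe is still pristine here */
        }
    
        void restore_good_byte()   /* every frame, after audio has run */
        {
            *kStompAddr = g_goodByte;  /* undo the stomp, if it happened */
        }
    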

To this day I still feel very very dirty about this hack, but it was needed to
achieve the objectives and harmed no-one :)

~~~
dedward
I wouldn't call it dirty - that's elegant. I'd call it a clean, fast way to
ship a product despite someone else's unfixable screwup.

The dirty code is the stuff that was stomping yours in the first place.

~~~
toonse
True enough :) We did what we had to do given the circumstances.

------
psykotic
There are many neat tricks for branchless algorithms. Once you get a feel for
what makes them work, they're easy to invent on your own. Two representative
examples:

    
    
        // branchless octree traversal
        i   = x <= t->mx;
        i <<= 1;
        i  |= y <= t->my;
        i <<= 1;
        i  |= z <= t->mz;
        t   = t->child[i];
    
        // unrolled branchless binary search of a 64-element array
        if (p[32] <= x) p += 32;
        if (p[16] <= x) p += 16;
        if (p[ 8] <= x) p +=  8;
        if (p[ 4] <= x) p +=  4;
        if (p[ 2] <= x) p +=  2;
        if (p[ 1] <= x) p +=  1;
        return p;
    

(In the binary search, you'd implement the if statements with conditional
moves or predicated adds, depending on what your platform offers.)

Both of these examples replace branching with data-dependent array indexing,
which is a recurring pattern.

Branchless algorithms are often a performance win on consoles and embedded
systems. But they really shine when you're writing code for a wide-SIMD
architecture with gather-scatter support, where divergent memory accesses are
generally much less costly than divergent branches, GPUs being the most
widespread example.

~~~
pandaman
The second one has branches, perhaps you meant p += (p[32]<=x)*32; etc?

~~~
psykotic
I thought this caveat made it clear:

> (In the binary search, you'd implement the if statements with conditional
> moves or predicated adds, depending on what your platform offers.)

It's usually a bad idea to use arithmetical trickery in an effort to get the
compiler to generate conditional moves or predicated instructions. If you want
to make sure the compiler does the right thing, use macros that are
conditionally expanded to the appropriate intrinsics on each
compiler/platform.
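
For illustration, a rough sketch of what such a macro might look like
(everything here is hypothetical - the real expansion depends on your
compiler/platform):

    
    
        /* Hypothetical COND_ADD(p, c, n): p += n when c holds, branchless. */
        #if defined(__SOME_CONSOLE_COMPILER__)                 /* made-up platform test */
        #  define COND_ADD(p, c, n) ((p) = __predadd((p), (n), (c)))  /* made-up intrinsic */
        #else
        #  define COND_ADD(p, c, n) ((p) += (c) ? (n) : 0)     /* hope for a cmov */
        #endif
    
        /* each step of the binary search then reads: */
        COND_ADD(p, p[32] <= x, 32);
    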

Since I didn't want to get into that, I used if-statements with the above
caveat. Make sense?

~~~
pandaman
I admit I overlooked the caveat while posting the response, yet even with
the caveat this does not make much sense, as any code can be made branchless
when you use predicated instructions (a bit less so with conditional moves).
This is what GPUs often do (many architectures don't have branch
instructions, so all branches are always executed with predication).

~~~
psykotic
> I admit I overlooked the caveat while posting the response, yet even with
> the caveat this does not make much sense, as any code can be made
> branchless when you use predicated instructions (a bit less so with
> conditional moves).

Now you're just being intentionally obtuse. When I write if (x) y += z and say
that it should be implemented with conditional moves or predicated adds per
your platform, there really isn't much room for confusion.

> This is what GPUs often do (many architectures don't have branch
> instructions, so all branches are always executed with predication).

Actually, all modern desktop-class GPU architectures have branch instructions.
They only revert to predication (which in NVIDIA's case is implicit, i.e. not
controlled by microcode-visible mask registers, though they of course also
have CMOV-like instructions) when the different SIMD lanes take divergent
branches. That's been the case since the G80/Tesla microarchitecture, which
the GeForce 8800 from 2006 was the first product to use. Mostly-coherent
dynamic branching is heavily used in modern shader code, to say nothing of
GPGPU code, and is almost free. Incoherent branching is the big issue.
Replacing that with incoherent memory accesses using branchless code can be a
huge win, even though incoherent memory access is far from free.

~~~
pandaman
> Now you're just being intentionally obtuse. When I write if (x) y += z and
> say that it should be implemented with conditional moves or predicated adds
> per your platform, there really isn't much room for confusion.

Maybe I am obtuse, though not intentionally. If you wanted to say that ifs
can be replaced with predicated instructions, why such a convoluted example
(which is a nice piece of code, by the way)?

> Actually, all modern desktop-class GPU architectures have branch
> instructions.

Indeed, and as you say, they are not always executed, because a single SIMD
core executes threads in lock-step and it's only possible to branch when all
the threads yield the same condition. Besides, there are still GPUs without
branches, e.g. PS3 fragment shaders.

~~~
psykotic
> If you wanted to say that ifs can be replaced with predicated instructions,
> why such a convoluted example (which is a nice piece of code, by the way)?

That's not what I wanted to say; it was a side note to a piece of code.
Anyway, sorry if it was confusing. It was a spur-of-the-moment comment on
Hacker News, not a carefully wrought essay.

> Besides, there are still GPUs without branches, e.g. PS3 fragment shaders.

The RSX is ancient at this point, a turd in a box. It was a turd even at the
time when, in the eleventh hour of the PS3 project, Sony begged NVIDIA to give
them a GPU after their fantasy of Cell-based rasterization had proved
ludicrous. It's so bad that you have to offload a ton of work on the SPUs to
get acceptable graphics performance.

------
malkia
I worked on the port of Metal Gear Solid from PlayStation 1 to PC. Released
in 2000 by Microsoft.

On the PSX every triangle you draw uses integer screen coordinates. This
creates a lot of "shaking", which was okay for the regular PS1 user, but
when such a game is ported to the PC it looks even worse.

What I did was keep a global array "float numbers[65536]" that stored, for
every major GTE function (matrix rotation, scaling, vector multiply, etc.),
a better-precision number. For example, if a coordinate projected to screen
as 123.434, it would write numbers[123] = 123.434 (can't remember whether I
used float or fixed point).

So later, when triangles were drawn, if I had to draw a triangle from an X
or Y coordinate of 123, I would reuse the number 123.434.

Now this is not accurate, but good enough - after all, not many things end
up on screen at coordinate 123, and most likely it would've been the same
calculated coordinate anyway, just for different tris... in fact it might've
helped certain things stick together...

I dunno - but the shaking effect was gone :)
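
Roughly like this - a minimal sketch of the idea with made-up function
names:

    
    
        /* One better-precision value per integer screen coordinate,
           written by the reimplemented GTE functions and read back at
           draw time. */
        static float numbers[65536];
    
        void remember_precise(int coord, float precise)  /* in each GTE op */
        {
            numbers[coord] = precise;    /* e.g. numbers[123] = 123.434f */
        }
    
        float recall_precise(int coord)  /* when emitting a vertex */
        {
            return numbers[coord];       /* 123 -> 123.434f, no shaking */
        }
    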

There was a lot to learn from Hideo's team too - we had an artist who
doubled all the textures (eyes, clothes, etc.) - Hideo specifically forbade
the better-looking eyes. He said that they had deliberately blurred all the
eyes, because they could not put eye animation in the cut scenes - this way,
because the eyes are blurred, you can't really see where they are looking -
hence preventing the Uncanny Valley.

Another trick from Hideo's team was storing gameplay-related info in one of
the pointer bits. On the PSX, if one particular bit of a pointer was set, it
pointed to the same memory but with uncached access. So they had pointers to
C4 bombs where the bit was off if the bomb was planted on the ground, and on
if it was planted on a wall.
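
A minimal sketch of the trick - the 0x20000000 bit is the real KSEG0/KSEG1
difference on the PSX's MIPS CPU, but the names and types here are made up:

    
    
        #include <cstdint>
    
        /* On the PSX, 0x80000000 addresses are the cached view (KSEG0) and
           0xA0000000 the uncached view (KSEG1) of the same RAM, so this bit
           of a pointer is effectively free storage: both views still work,
           one just bypasses the cache. */
        const std::uintptr_t kOnWallBit = 0x20000000u;
    
        struct C4Bomb;  /* hypothetical game type */
    
        C4Bomb *plant_on_wall(C4Bomb *b)
        {
            return reinterpret_cast<C4Bomb *>(
                reinterpret_cast<std::uintptr_t>(b) | kOnWallBit);
        }
    
        bool is_on_wall(const C4Bomb *b)
        {
            return (reinterpret_cast<std::uintptr_t>(b) & kOnWallBit) != 0;
        }
    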

They also had a nice solution for T-junctions (they happen a lot with
geometry). Rather than drawing an extra polygon to fill the gap, they
stretched the polygons a little so they overlapped by a pixel or two.

And many other tricks, but my memory is failing me :)

~~~
LogicHoleFlaw
I played that port _so much_ back when I was in college. The VR missions
too. Great stuff.

~~~
malkia
Thanks!

------
potatolicious
The "rainy day memory pool" is interesting, I've run into something similar in
the web-programming context.

I used to work for a company that had a horrific hardware requisition policy.
If your team needed a server, it had to go through a lengthy and annoying
approvals process - and even then, it took months before Infrastructure would
actually provide said servers.

In other words, when a project gets handed down from above to launch in, say,
3 months, there's no way in hell you can get the servers requisitioned,
approved, and installed in that time. It became standard practice for each
team to _slightly_ over-request server capacity with each project and throw
the excess hosts into a rainy day pool, immediately available and
repurposeable as required.

New servers would still get requested for these projects, but since they
took so long to approve, odds are they'd go right into the pool whenever
they actually arrived, which sometimes took up to a year.

Of course, it was horrifyingly inefficient. Just on my team alone I think we
had easily 50 boxes sitting around doing nothing (and powered on to boot)
waiting to pick up the slack of a horrendously broken bureaucracy.

------
feralchimp
> Game developers often experience a horrific "crunch" (also known as a "death
> march"), which happens in the last few months of a project

That word again: _months_. Is that common in game dev?

~~~
Periodic
There was a notable case in 2004 with EA:

[http://en.wikipedia.org/wiki/Electronic_Arts#Treatment_of_em...](http://en.wikipedia.org/wiki/Electronic_Arts#Treatment_of_employees)

It was basically standard in the industry for everyone to work weekends and
nights for a month before release. It has perhaps calmed down, and more
people are hourly so at least they get paid for it, but it's still expected
that you will put in significant weekend and evening hours before a
milestone.

One issue is that a lot of games don't seem to develop well under more
Agile practices. Games are very large projects and it can be hard to build
them incrementally, so they always end up with the standard software-
management problems of budget overruns and deadline slips.

At least, this is what I'm told by my wife and friends in the industry. I
still believe that their requirement of overtime is due to poor management,
pressure by execs, poor specs, and inflated goals.

~~~
psykotic
> I still believe that their requirement of overtime is due to poor
> management, pressure by execs, poor specs, and inflated goals.

It's only a requirement because game developers have tended to put up with it.
Especially in the earlier days, game development attracted people who were
extremely passionate about making games. This meant they were willing to put
up with a lot of shit. You don't have to ascribe much malice or incompetence
to companies to explain crunch time under those conditions--it would be more
surprising if it hadn't happened. Fortunately, things have changed for the
better, although there's still a long way to go.

I'm fine with crunch time so long as (1) it is surgically applied, (2)
employees know what they're getting into, and (3) employees are rewarded for
going above and beyond the call of duty.

------
Dove
_Because the camera was using velocity and acceleration and was collidable, I
derived it from our PhysicalObject class, which had those characteristics. It
also had another characteristic: PhysicalObjects could take damage. The air
strikes did enough damage in a large enough radius that they were quite
literally "killing" the camera._

 _This_ is what's wrong with inheritance.

~~~
SamReidHughes
Um, no? If their camera wasn't a PhysicalObject, they'd have had to write
custom code for handling the camera case for anything that handles
PhysicalObjects. All this means is that perhaps a PhysicalObject shouldn't be
the thing that takes damage.

~~~
Jare
The reason is that inheritance is too coarse-grained and promotes bundling
lots of unrelated concepts together. So it's likely that with a finer-
grained system, they would have separated physics and damage into different
classes. They might have still made the mistake of not separating them, or
of copy-pasting the damage object into the camera anyway, so the point and
punchline of the story still stands.

~~~
fruchtose
_They might have still made the mistake of not separating them, or of copy-
pasting the damage object into the camera anyway, so the point and punchline
of the story still stands._

The possibility that a programmer could make a mistake in the implementation
of a paradigm is not an argument against that paradigm. The first part of your
comment is very real and very valid, but copy-pasting is a potential issue
regardless of your programming language.

~~~
Dove
I am convinced that this is not just a programmer's mistake. This is an
illustration of the problem with the paradigm itself. Inheritance makes you
_want_ to glue unrelated things together.

Inheritance views the world as fundamentally hierarchical; you can divide
things up into branches of a tree. A node on the tree either has all of a set
of functions or none of them. Any code you put into Reptile will _never_ be
needed by a Mammal, and will _never_ be inappropriate for an object that needs
other things in Reptile.

But of course, that's unrealistic. Where would you put swim()? Where would you
put fly()? Code reuse can't always be decomposed into a tree. And in an
inheritance paradigm, you get ugly workarounds in that case -- cut and pasted
code, nasty multiple inheritance hacks, or util classes. All bad things.

What you want are things coupled, not by convenient proximity in current
function, but by logical relationship. You want a class that has swimming-
related code _and nothing else_, that any swimmer can use without
preconceptions about whether it's a Reptile or a Mammal. You want, in short,
Traits.
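
A rough C++ sketch of what I mean (made-up classes; C++ spells this with
small mixin bases, but each one is a single-purpose bundle of behavior
rather than a branch of a taxonomy, which is the spirit of traits):

    
    
        // Each "trait" holds one bundle of behavior and nothing else.
        struct Swimmer {
            void swim() { /* swimming-related code only */ }
        };
    
        struct Flyer {
            void fly() { /* flying-related code only */ }
        };
    
        // Classes mix in exactly the traits they need; no Reptile/Mammal
        // taxonomy gets to decide for them.
        struct Crocodile : Swimmer {};          // a swimming reptile
        struct Bat       : Flyer {};            // a flying mammal
        struct Duck      : Swimmer, Flyer {};   // both, no tree required
    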

~~~
zoul
Inheritance is not a design methodology, it’s a tool. As a tool it does not
“view the world”, unless it’s meant in the sense that when you only have a
hammer, every problem looks like a nail to you. When you only have inheritance
in your toolbox, things are going to end badly indeed. But that does not mean
that inheritance is a bad tool.

------
Morendil
I can confirm the first trick, reserving a block of memory - I learned this in
the early 90s coding on the Mac as the "rainy day fund" memory allocation.

I'm amused to find that this has been written up as a pattern, the Memory
Overdraft Pattern: <http://www.charlesweir.com/papers/Mempres8AsHtml.html>

~~~
damian2000
Another method occurred to me that might sort of work on the PC would be to
finish a game, then delay the release for 1 year or so, during which time (due
to Moore's law), everyone's gfx cards & processors have gotten faster and
hence the average user's game playability would be improved.

~~~
zoul
I think the bigger titles already account for hardware advancements and aim
for beefy, top-of-the-line machines that are going to be common in the
several years it takes to develop the title or engine.

------
lucb1e
The first one can't actually be true. It might make the rest of the team all
cheerful and happy when, 2 days before the release, you call them together
and show them what you did, but it doesn't actually help a bit.

It seemed in this first point that nobody really gave memory usage a lot of
thought ('write the solution first, optimize second'), and thus they were way
over the limit. Let's say the goal was 120MB and they were at 160. Then they
started compressing and optimizing everything, and after a while they got very
close; 121.5MB. So the experienced programmer removes the 2MB allocation and
saves the day.

If the experienced programmer hadn't done this, I don't think (seeing how
little they cared about memory usage before) the memory usage would have
been any higher. They might have been at 158.5MB before optimization as
well, and gotten under the limit with the same optimizations they had to do
anyway.

So as far as I can see, it seems the only value of doing this is the
psychological value. Might still be worth something, though!

~~~
maximilianburke
Sure it can - it happens all the time, in fact. Many console development
kits have extra memory available for development purposes that isn't
available on retail units. This is hugely beneficial during development
because it means you can actually run a build that has asserts enabled, or
you can use special memory allocators that pad allocations for debugging,
etc. I have seen, and been involved with, the mad scrambles to bring memory
footprint down as a project edges toward completion.
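
A minimal sketch of such a padding allocator (guard size and fill value are
made up, and a real one would also track sizes so the guards can be checked
on free):

    
    
        #include <cstdlib>
        #include <cstring>
    
        /* Hypothetical debug allocator: surround each allocation with
           guard bytes that can be checked for overruns later. Only
           affordable thanks to the dev kit's extra memory. */
        const std::size_t   kGuard = 16;    /* made-up pad size */
        const unsigned char kFill  = 0xFD;  /* made-up fill value */
    
        void *debug_alloc(std::size_t n)
        {
            unsigned char *p =
                static_cast<unsigned char *>(std::malloc(n + 2 * kGuard));
            if (!p) return 0;
            std::memset(p, kFill, kGuard);               /* front guard */
            std::memset(p + kGuard + n, kFill, kGuard);  /* back guard  */
            return p + kGuard;                           /* caller's block */
        }
    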

It's not as fatal to blow your memory budgets on PC as it is on a console,
but if you're trying to hit a certain memory footprint so that the game is
playable on the min-spec machines defined by your publisher, then it could
very well be an issue.

------
Lewton
As someone who's read the Gamasutra article before, I found this really
annoying to read because the new content was mixed in with the content from
the Gamasutra article, without anything to distinguish what was new and what
wasn't.

Why mix the stories together like that?

~~~
Destroyer661
I haven't read the gamasutra article and found it nice to have all of them on
one page. I can't believe you're really being this pedantic.

~~~
Lewton
I did not have an issue with having them all on one page. I had an issue
with the Gamasutra ones not being grouped together. Instead they've been
mixed; i.e., nos. 1, 4 and 6 are from the Gamasutra article, while 2, 3 and
5 were new.

------
seagaia
I think #15 - attacking a bug directly instead of going about it with
patches (although possibly there could be a fine line between "direct" and
using patches) - can be especially relevant for anyone, even outside of
games.

The CRC one, #16, was cute.

~~~
Natsu
I read that and was surprised that the solution wasn't changing it to use MD5
or some other hash, rather than a CRC.

I guess they didn't have time?

~~~
avar

> I read that [..] I guess they didn't have time

It says in the original post that the reason they didn't change the hashing
algorithm was that they didn't have time.

------
davidsiems
I think the 'best' trick I've seen is using pointer tagging on an object's
virtual function table pointer to squeeze in an extra flag during garbage
collection.

Adding another variable was thrashing the cache, so instead the GC would tag
the VFT pointer (making it unusable, obviously) and then untag it before GC
ended, fixing the object.
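
Presumably something along these lines - a hypothetical sketch that leans
on the implementation detail that the vtable pointer is the object's first
word:

    
    
        #include <cstdint>
    
        /* Treat the object's first word as its vtable pointer; that layout
           is implementation-defined, which is exactly why this is a hack.
           Vtables are aligned, so the low bit of the pointer is free. */
        inline std::uintptr_t &vtable_word(void *obj)
        {
            return *static_cast<std::uintptr_t *>(obj);
        }
    
        void gc_mark(void *obj)    { vtable_word(obj) |= 1u; }  /* object now unusable */
        bool gc_marked(void *obj)  { return (vtable_word(obj) & 1u) != 0; }
        void gc_unmark(void *obj)  { vtable_word(obj) &= ~std::uintptr_t(1); }  /* fixed */
    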

I wasn't sure if I should be horrified or applaud when I found out about this.

~~~
Ogre
I can't recall the exact details, but just days or maybe a week before gold
master of one title I worked on, there was a case where an object's virtual
table was getting munged somehow. I do remember I managed to figure out
exactly what was happening. But the amount and type of work it would take to
fix would have likely delayed shipping.

So I wrote some code that constructed a new object of the same type on the
stack, then

    
    
      /* Copy just the object header (everything before the first data
         member, i.e. the vtable pointer) from the fresh stack object. */
      memcpy(&realObject, &stackObject,
             (char *)&stackObject.m_firstDataMember - (char *)&stackObject);
    

It worked. I'm not proud of it, but I am amused by it. I'm sure it wasn't
guaranteed to work by any standard, but in practice it worked fine for us.

------
dyselon
Just a warning to programmers implementing the first technique: time your
release of that memory carefully, and coordinate with the other
programmers.

Your designers and artists may know about the technique and grudgingly
accept it, BUT having multiple different programmers independently announce
they've "found" some memory AFTER I spent all day cutting everything from
my level instead of fixing bugs...

------
dfan
_We cut megabyte after megabyte, and after a few days of frantic activity, we
reached a point where we felt there was nothing else we could do. Unless we
cut some major content, there was no way we could free up any more memory.
Exhausted, we evaluated our current memory usage. We were still 1.5 MB over
the memory limit!_

This made me laugh, as the first PC game I worked on had system requirements
of 570K of RAM and 2M of hard drive space. (That was in 1992, and he's talking
about a "late-90's" title, so things had already changed a lot.)

~~~
ido
Interesting - wasn't extended/expanded memory already common by '92?

~~~
dfan
Yes, but we didn't require it.

(To be honest, I just found the system requirements by looking them up online
now; I don't remember them myself, and my personal copy of the game is at
work. I'll come back here and correct it if I was wrong.)

------
stephan83
One of these tricks is that you should fix 'the root of the problem' while a
lot of the other tricks are hacks to ship the game on time. I'm a bit
confused.

~~~
Retric
His patches comment was basically _early on we realized the collision
detection code was horribly broken so we just started patching every edge case
we could find._ That's practically an endless treadmill. On the other hand
when there is a vary specific problem really late in the production cycle that
has little do do with the rest of the game then you can just patch that
specific problem and ship the code. When an audio driver you have no control
over corrupts a single bit in your EXE that is the root problem not the
symptom.

------
bowyakka
The first one reminds me of a story I was told by my embedded engineer
friends. They had recently been hired to maintain a video hard-disk storage
application, and had found it to be horrifyingly slow. Looking into the
code, they found that the core sort was little better than bubble sort (and
the data was on disk, not in memory), so it was quickly rewritten as a
merge sort.

Oddly, people then started to complain that the application was _too fast_.
After many arguments (most of them with management, who thought that
because it was too fast it could not be working properly), they came upon a
solution.

The solution in all its glory was a loop at the end of the sort, with a sleep
in it.
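
Presumably something like this - a purely hypothetical reconstruction:

    
    
        #include <unistd.h>
    
        /* Burn time after the sort so the result doesn't look
           "suspiciously" fast; shrinking kDelaySeconds is how you ship a
           release that's "33% faster". */
        const int kDelaySeconds = 30;  /* made-up figure */
    
        void governor()
        {
            for (int i = 0; i < kDelaySeconds; ++i)
                sleep(1);
        }
    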

The best thing was that when they were feeling lazy, they could put out a
release that was, for example, 33% faster - just by reducing the sleep
time.

