Hacker News
Why I Write Games in C (yes, C) (jonathanwhiting.com)
647 points by WoodenChair 27 days ago | 544 comments

> The stop-the-world garbage collection is a big pain for games, stopping the world is something you can't really afford to do.

I love this opinion from games programmers because they never qualify it or talk about what their latency budgets are and what they do in lieu of a garbage collector. They just hand-wave and say "GC can't work". The reality is you still have to free resources, so it's not like the garbage collector is doing work that doesn't need to be done. What latency budgets are you working with? How often do you do work to free resources? What are the latency requirements there? Even at 144 fps, you have roughly 7 ms per frame. If you have a garbage collector that runs in 200 µs, you could run a GC on every single frame and spend less than 3% of your frame budget on the GC pause. I'm -not- suggesting that running a GC on every frame is a good idea or that it should be done, but what I find so deeply frustrating is that the argument that GC can't work in a game engine is never qualified.
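Working the comment's arithmetic through under its own assumptions (a steady 144 fps target and a hypothetical collector pause of 200 µs on every frame):

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{144} \approx 6.94\ \text{ms},
\qquad
\frac{t_{\text{GC}}}{t_{\text{frame}}} = \frac{0.2\ \text{ms}}{6.94\ \text{ms}} \approx 0.029 \approx 2.9\%
```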

Edit: wow, for once the replies are actually good. Very pleased with this discussion.

Disclaimer: I'm not a game developer, but I've worked on a lot of projects with tight frame-time requirements during my time on the TVUI team at Netflix. I also have no experience with Go, so I can't comment on the specifics of its garbage collector vs. V8's.

I don't think it's necessarily that it "can't" work as much as it takes away a critical element of control from the game developers and the times you find yourself "at the mercy" of the garbage collector is pretty damn frustrating.

The problem we ran into with garbage collection was that it was generally non-deterministic both in terms of when it would happen and how long it would take. We actually added an API hook in our JS to manually trigger GC (something you can do when you ship a custom JS runtime) so we could take at least the "when it happens" out of the picture.

That said, there were often fairly large variances in how long it would take and, while frame time budgets may seem to accommodate things, if you end up in a situation where one of your "heavy computation" frames coincides with a GC that runs long, you're going to get a nasty frame time spike.
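For comparison, here is a hypothetical C analogue of taking back both the "when" and the "how long": instead of freeing objects the moment they die, queue them and free at most a fixed number per call, at a point the engine picks. This is not what the commenter's team built (they added a hook to trigger the JS GC); `DeferQueue`, `defer_free`, `defer_drain`, and the capacity are illustrative names and values, and capping the number of frees bounds the work only as well as each individual `free()` call is bounded.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical deferred-release queue: objects are queued during the frame
 * and freed later, at a point the engine chooses, with a cap per drain. */
#define DEFER_CAPACITY 1024

typedef struct {
    void  *items[DEFER_CAPACITY];
    size_t head;   /* index of the oldest queued pointer */
    size_t count;  /* number of queued pointers */
} DeferQueue;

/* Queue a pointer for later release. Returns 0 on success, -1 if full. */
int defer_free(DeferQueue *q, void *p) {
    if (q->count == DEFER_CAPACITY) return -1;
    q->items[(q->head + q->count) % DEFER_CAPACITY] = p;
    q->count++;
    return 0;
}

/* Free at most max_frees queued pointers, oldest first, and return how many
 * were freed. Bounding the count bounds the work done in this one call. */
size_t defer_drain(DeferQueue *q, size_t max_frees) {
    size_t n = 0;
    while (q->count > 0 && n < max_frees) {
        free(q->items[q->head]);
        q->head = (q->head + 1) % DEFER_CAPACITY;
        q->count--;
        n++;
    }
    return n;
}
```

A frame loop would call `defer_drain` once per frame with whatever cap fits the remaining slack; `defer_free` returning -1 would suggest the cap is too low for the rate at which objects die.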

We struggled enough that we discussed exposing more direct memory management through a custom JS API so we could more directly control things. We ultimately abandoned that idea for various reasons (though I left over 6 years ago so I have no idea how things are today).

> it was generally non-deterministic

This is basically it. People who have never worked on actual real-time systems just never seem to get that in those environments determinism often matters more than raw performance. I don't know about "soft" real-time (e.g. games, or audio/video) but in "hard" real-time (e.g. avionics, industrial control) it's pretty routine to do things like disable caches and take a huge performance hit for the sake of determinism. If you can run 10% faster but miss deadlines 0.1% more often, that's a fail. It's too easy for tyros to say pauses don't matter. In many environments they do.

It's not just pauses either. Non-deterministic memory usage is a big deal too.

I have actually worked on a system where malloc() was forbidden. In fact it always returned null. Buffers were all statically allocated and stack usage was kept to a minimum (it was only a few kB anyway).
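A minimal sketch of the statically allocated style described here, assuming a fixed-size receive buffer (the names and sizes are hypothetical, not from the commenter's system). All storage is reserved at build time, so it has a fixed, known address in the memory map, and nothing ever calls malloc().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: 16 slots of up to 32 bytes, fixed at build time. */
#define MSG_SIZE  32u
#define MSG_SLOTS 16u

/* Statically allocated storage: no heap, known size, fixed address. */
static uint8_t  rx_storage[MSG_SLOTS][MSG_SIZE];
static uint32_t rx_len[MSG_SLOTS];
static uint32_t rx_head;   /* index of the oldest message */
static uint32_t rx_count;  /* number of queued messages */

/* Copy a message into the next free slot; returns 0, or -1 if full or too long. */
int rx_push(const uint8_t *msg, uint32_t len) {
    if (rx_count == MSG_SLOTS || len > MSG_SIZE) return -1;
    uint32_t tail = (rx_head + rx_count) % MSG_SLOTS;
    memcpy(rx_storage[tail], msg, len);
    rx_len[tail] = len;
    rx_count++;
    return 0;
}

/* Copy the oldest message into out; returns its length, or -1 if empty. */
int rx_pop(uint8_t out[MSG_SIZE]) {
    if (rx_count == 0) return -1;
    uint32_t len = rx_len[rx_head];
    memcpy(out, rx_storage[rx_head], len);
    rx_head = (rx_head + 1) % MSG_SLOTS;
    rx_count--;
    return (int)len;
}
```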

The software was shipped with a memory map file so you know exactly what each memory address is used for. A lot of test procedures involved reading and writing at specific memory locations.

It was for avionics BTW. As you may have guessed, it was certified code with hard real time constraints. Exceeding the cycle time is equivalent to a crash, causing the watchdog to trigger a reset.

Malloc was forbidden in a AAA title I worked on. Interestingly enough, we also had two (!!!) embedded garbage-collected scripting languages that could only allocate in small (32 MB and 16 MB) arenas and would assert if they over-allocated.

In modern C libraries, malloc is backed by anonymous mmap’ed pages under an increasingly large set of conditions.

Sounds very similar to NASA's C programming guidelines. During the initialization period, each module would allocate its fixed memory size. Every loop had an upper bound on its iteration count to prevent infinite loops.
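A sketch of the "every loop has an upper bound" rule as the comment describes it (not code from the NASA guidelines; `MAX_NODES` and `find_node` are hypothetical). Even if a linked list gets corrupted into a cycle, the search terminates after a known maximum number of iterations and reports the overrun instead of hanging.

```c
#include <stddef.h>

/* Hypothetical upper bound on list length, fixed at design time. */
#define MAX_NODES 64

typedef struct Node {
    int          id;
    struct Node *next;
} Node;

/* Find a node by id. The loop runs at most MAX_NODES times, even if the
 * list is corrupted into a cycle; *overrun is set if the bound was hit. */
const Node *find_node(const Node *head, int id, int *overrun) {
    *overrun = 0;
    const Node *n = head;
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (n == NULL) return NULL;   /* reached the end: not found */
        if (n->id == id) return n;
        n = n->next;
    }
    *overrun = (n != NULL);           /* bound hit before the end of the list */
    return NULL;
}
```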

I think they posted the guidelines and it was a wonderful read about how they developed real time systems.

Do you happen to have a link to the guidelines?

I was wondering why I couldn't open this PDF in my Android PDF viewer: this link gives me Cloudflare's captcha. It uses Google's reCAPTCHA (storefronts, bicycles, traffic lights). What a terrible practice. People, stop overusing Cloudflare!

Why even define a malloc() in this environment if it always returns NULL?

Because even if you don't use it directly, you might use some code that uses it. And even then, it might not ever be called given the way you are reusing the code, so...a dynamic check is the best way to ensure it never actually gets used.

> Because even if you don't use it directly, you might use some code that uses it.

It seems extremely unlikely that any general purpose code you might adopt, which happens to invoke malloc() at all, would be fit for purpose in such a restricted environment without substantial modification; in which case you would just remove the malloc() calls as well.

> And even then, it might not ever be called given the way you are reusing the code, so...a dynamic check is the best way to ensure it never actually gets used.

In such a restricted environment, it is unlikely you just have unknown deadcode in your project. "Oh, those parts that call malloc()? They're probably not live code and we'll find out via a crash at runtime." That's like the opposite of what you want in a hard realtime system.

So, no — a static, compile/link time check is a strictly superior way to ensure it never gets used.
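One compile-time option on GCC and Clang is `#pragma GCC poison`, which turns any later use of the listed identifiers in the translation unit into a hard error. It is a per-file, compiler-specific mechanism rather than a link-time check, and it cannot see calls made inside precompiled third-party code; `scratch` and `get_scratch` are hypothetical stand-ins for whatever fixed storage the code uses instead.

```c
/* Headers must be included before the pragma, or their declarations
 * of the poisoned names would themselves trigger errors. */
#include <stddef.h>
#include <stdint.h>

/* From here on, any use of these identifiers in this file fails to compile. */
#pragma GCC poison malloc calloc realloc free

/* Storage comes from a static buffer instead of the heap. */
static uint8_t scratch[256];

uint8_t *get_scratch(size_t *len) {
    *len = sizeof scratch;
    return scratch;
}
```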

Spoken like someone who never had a 3rd party binary blob as a critical dependency.

If you need your system to never dynamically allocate memory, an opaque 3rd party binary blob which might still do that doesn't seem like a valid inclusion.

Again, we're talking about a very restricted hard realtime environment. How do you trust, let alone qualify, a 3rd party binary blob that you know calls malloc(), which is a direct violation of your own design requirements?

Not OP, but the most common reason I've seen is that this was an additional restriction they imposed for their project. The standard library includes a perfectly working malloc. You override it so that you can't end up calling it accidentally, either explicitly (e.g. your brain farts and you do end up malloc()-ing something) or implicitly (e.g. by calling a function that ends up calling malloc down the line). The latter happens surprisingly easily and often. Not all libraries have the kind of documentation that glibc has, nor can you always see the code.

Some things are okay to allocate at some point in time. So you can disable malloc later when you no longer want it to provide memory.
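A sketch of that "allocate during startup, then turn allocation off" idea, assuming a static pool and a lock flag (`init_alloc`, `init_alloc_lock`, and the pool size are hypothetical). After the lock, every request returns NULL, the same fail-fast behaviour as the always-NULL malloc quoted above, but scoped to the steady-state phase.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical startup allocator: hands out memory from a static pool while
 * the system initializes, then refuses every request once locked. */
#define INIT_POOL_SIZE 4096u

static _Alignas(max_align_t) uint8_t init_pool[INIT_POOL_SIZE];
static size_t init_used;
static int    init_locked;

void *init_alloc(size_t size) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (init_used + align - 1) & ~(align - 1);  /* round up */
    if (init_locked || start > INIT_POOL_SIZE || size > INIT_POOL_SIZE - start)
        return NULL;  /* locked or out of pool: fail immediately */
    init_used = start + size;
    return &init_pool[start];
}

/* Call once initialization is done; later init_alloc() calls return NULL. */
void init_alloc_lock(void) { init_locked = 1; }
```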

That's plausible, but not how OP defined it:

> a system where malloc() was forbidden. In fact it always returned null.

So you crash immediately if someone tries to use it or a library gets added that does?

I've never experienced the malloc() thing, but I do throw exceptions and fail fast under conditions like these so they're caught in testing.

I think what he's asking to do is to make it a linker failure by not adding it to the standard lib at all. Compile fails are nicer than runtime fails.

Like sibling commenter says, a compile/link error is even better and faster than finding out at runtime with a crash.

I’d be interested in knowing more if you care to write about this. I’m currently costing/scoping out what’s required to write software at that level, but I don’t really have any industry insight into what other people do.

Static memory is something we already do, but we’re very interested in whether industry actually adopts redundant code paths and monitors, and to what extent it uses watchdogs (how many cycles can be missed?), etc.

I suggest you take a look at the books by Michael Pont and the offerings from his company, SafeTTy Systems. His and his company's work deals with hard real-time systems and all the associated standards.

Wow - disabling the caches is an extreme measure. I get it though. After doing that, you can (probably) do cycle counting on routines again. Just like the 80s or earlier.

There's probably enough jitter in the memory system and in instruction parallelism that accurate cycle counting will still be challenging.

Also, you probably want some padding so that newer versions of the CPU can be used without too much worry. It's possible for cycle counts of some routines to increase, depending on how new chips implement things under the hood.

[says a guy who was counting cycles, in the 1980s :-)]

I don't. Maybe we're thinking about different kind of caches, but if these are transparent, no-performance-impact caches, then why wouldn't you prove the system works well with caches off (guarantee deadlines are met), then enable caches for opportunistic power gains?

> if these are transparent, no-performance-impact caches

If there were no performance impact, there would be no point. I'm not just being snarky; there's an important point here. Caches exist to have a performance impact. In many domains it's OK to think about caching as a normal case, and to consider cache hit ratio during designs. When you say "no performance impact" you mean no negative performance impact, and that might be technically true (or it might not), but...

But that's not how a hard real-time system is designed. In that world, uncached has to be treated as the normal case. Zero cache hit ratio, all the time. That's what you have to design against, even counting cycles and so on if you need to. If you're designing and testing a system to do X work in Y time every time under worst-case assumptions, then any positive impact of caches doesn't buy you anything. Does completing a task before deadline because of caching allow you to do more work? No, because it's generally considered preferable to keep those idle cycles as a buffer against unforeseen conditions (or future system changes) than try to use them. Anything optional should have been passed off to the non-realtime parts of the larger system anyway. There should be nothing to fill that space. If that means the system was overbuilt, so be it.

The only thing caches can do in such a system is mask the rare cases where a code path really is over budget, so it slips through testing and then misses a deadline in a live system where the data-dependent cache behavior is less favorable. Oops. That's a good way for your product and your company to lose trust in that market. Once you're designing for predictability in the worst case, it's actually safer for the worst case to be the every-time case.

It's really different than how most people in other domains think about performance, I know, but within context it actually makes a lot of sense. I for one am glad that computing's not all the same, that there are little pockets of exotic or arcane practice like this, and kudos to all those brave enough to work in them.

While you might test your hard real-time requirements with caches disabled, there's still reason to run the code with caches afterwards.

E.g. an error path or input scenario that wasn't covered during testing might go over budget without the cache, but with the cache it might still finish in time and avoid a crash.

Another could be power consumption, latency optimization, or improved accuracy. E.g. some signal analysis doesn't work at all if the real-time code can't keep up with the required Nyquist rate, but faster performance raises the maximum frequency that can be handled, improving accuracy.

You could be right on some of those. That didn't seem to be the prevailing attitude when I worked in that area, but as I said that was a long time ago - and it was in only a few specific sub-domains as well.

Forgot to mention: cache misses can be more expensive than uncached accesses, so testing with caches off and then turning them on in production can be a disaster if you hit a cache-busting access pattern. Always run what you tested.

Because you lose deterministic behavior, and there are cases where that is non-negotiable, regardless of performance cost.

Who will provide a guarantee that the caches are truly transparent and will not trigger any new bugs?

Essentially, you would need to prove the statement "if a system works well with caches off, then it works well with caches on" to the satisfaction of whatever authority is giving you such stringent requirements.

You'd have to very solidly prove that, in the worst case, a cache only ever makes execution time equal to or shorter than on a processor without the cache, and never makes anything slower.

Even that is not enough. The cache may make everything faster, but it could lead to higher contention on a different physical resource slowing things down there. The cache cannot be guaranteed to prevent that.

""soft" real-time (e.g. games, or audio/video) "

Hah. Who sez that audio and video products have 'soft' real time? Go on now.

I didn't make up the terminology. It was already in common use at least thirty years ago when I was actively working in that domain. To simplify, it's basically about whether a missed deadline is considered fatal or recoverable. That difference leads to very different design choices. Perhaps some kinds of video software are hard real-time by that definition, but for sure a lot of it isn't. I'd apologize, but it was never meant to be pejorative in the first place and being on one side of the line is cause for neither pride nor shame. They're just different kinds of systems.

What percentage of the market is A/V built to actual hard real-time standards, and not expected to run on devices that can't provide it (so no PCs with normal OSes, no smartphones)? For the vast majority, soft real-time is fine, since an occasional deadline-miss results in minor inconvenience, not property damage, injury or death.

I assume some dedicated devices are more or less hard real time, due to running way simpler software stacks on dedicated hardware.

I take it that scarcely anyone here has written software for video switchers, routers, DVEs, linear editors, audio mixers, glue products, master control systems, character generators, etc. etc. Missing a RT schedule rarely results in death, but you'd think so given the attitude from the customer. That's a silly definition for it.

There's a whole world out there of hard real time, the world is not simply made up of streaming video and cell phones.

The cool thing on HN is you can get down voted for simply making that observation. It's a sign of the times I'm afraid.

I actually have written software for video routers and character generators. We didn't consider them hard real time, though I wouldn't claim that such was standard industry usage.

For example, if you're doing a take, you have to complete it during the blanking interval, but usually the hardware guarantees that. In the software, you want your take to happen in one particular vertical blanking interval (and yes, it really is a frame-accurate industry). But if you miss, you're only going to miss by one. We didn't (so far as I know) specify guarantees to the customer ("If you get your command to the router X ms before the vertical interval, your take will happen in that vertical"), so we could always claim that the customer didn't get the command to the router in time. Again, so far as I know - there may have been guarantees given to the customer, but I didn't know about them.

But that was 20 years ago, back in the NTSC 525 days.

Nice name, by the way. Do you know of any video cards that will do a true Porter & Duff composite these days? I recall looking (again, 20 years ago) at off-the-shelf video cards, and while they could do an alpha composite, it wasn't right (and therefore wasn't useful to us).

I currently work on software controlling the hardware like video routers, and this is definitely my experience. It’s all very much soft real-time.

In terms of customers and how much they care, the North American market seems to care less than Europe.

I work on an open-source music sequencer (https://ossia.io), and just two days ago I exchanged a fair number of emails with someone who wanted to know the best settings for his machine to avoid any clicks during the show (clicks are the audio symptom of a "missed deadline"). I've met some users who did not care, but the overwhelming majority do, even about a single click in an hour-long concert.

If it's running in a consumer OS (not a RT one) and it counts on having enough CPU available to avoid missing the deadline, that's exactly what soft-realtime is.

Compare your “not a single click in an hour [for quality reasons]” to “not a single missed deadline in the 30-year life expectancy of a plane, across a fleet of a few thousand planes [for safety reasons]”. That's the difference in requirements between hard and soft RT.

I did some soft real-time work (video decoding) and I have a friend working on hard real-time (avionics), and we clearly didn't work in the same world.

Yeah. To me, hard real time is when you count cycles (or have a tool that does it for you) to guarantee that you meet your timing requirements. We never did that.

You just agreed with the OC.

RT video/audio failing never results in death, whereas failures in "avionics, industrial control" absolutely can and do. That seems to be where OC was drawing the line.

That seems to be a common distinction, although GP has a point: the production side is more demanding than the playback side formed by random consumer gear (and would at least suffer financial damage if problems occurred too often), and some production gear, especially low-level and synchronization-related gear, is built to hard standards. But often soft is enough, as long as it's reliable enough on average.

>For the vast majority, soft real-time is fine, since an occasional deadline-miss results in minor inconvenience, not property damage, injury or death.

A "minor inconvenience" like a recording session going wrong, a live show with stuttering audio, skipped frames in a live TV show, and so on?

Most professional recording studios are using consumer computer hardware that can't do hard realtime with software that doesn't support hard realtime.

People like deadmau5, Daft Punk, Lady Gaga all perform with Ableton Live and a laptop or desktop behind their rig. If it were anything more than a minor inconvenience, these people wouldn't use this.

Audio dropouts are very unlikely; a proper setup will basically never have them. But even if you have one audio dropout in your life, you're not dead, your audience isn't dead, a fire doesn't start, a medical device doesn't fail to pump, and so on.

And yes, you can badly configure a system, but the point is you can't configure these to be 100% guaranteed, and 99.99% is perfectly fine.

Edit: Sometimes people call these "firm" real-time systems, meaning the deadline must be met for the system to operate properly, but missing it doesn't result in something serious like death. (E.g. in a video game you can display frames slower than real time and it kind of works but feels laggy; however, you can't also slow down the audio processing because you'll get a lowered pitch, so you have to drop the audio.)

As long as the individual event happens seldom enough few of these actually are a big problem. Soft real-time being allowed to blow deadlines doesn't mean it can't be expected to have a very high rate of success (at least that's the definitions I've learned), and clearly a sufficiently low rate of failure is tolerated. There's a vast difference between "there's an audio stutter every day/week/month/..." and "noticeably stuttering audio". The production side is obviously a lot more sensitive about this than playback, but will still run parts e.g. on relatively normal desktop systems because the failure rate is low enough.

The production side usually renders the final audio mix off-line, so no real-time requirements there for getting optimum sound quality. I'd say the occasional rare pop or stutter is worse to have during a live performance than when mixing and producing music.

There are probably 500+ successful GC-based games on the Steam store, and another 100,000 to a million hobby games doing just fine with GC.

I started game programming on the Atari 800, Apple 2, TRS-80. Wrote NES games with 2k of ram. I wrote games in C throughout the 90s including games on 3DO and PS1 and at the arcade.

I was a GC hater forever, and I'm not saying you can ignore it, but the fact that Unity runs C# with GC and that so many quality, popular shipping games use it is really proof that GC is not the demon that people make it out to be.

Some games made with GC include Cuphead, Kerbal Space Program, Just Shapes & Beats, Subnautica, Ghost of a Tale, Beat Saber, Hollow Knight, Cities: Skylines, Broforce, Ori and the Blind Forest

Kerbal Space Program is a lot of fun, but for me it freezes for half a second every 10 seconds... due to garbage collection. Drives me crazy.

Even on a fast PC, Kerbal Space Program audio is choppy because of GC pauses.

It's successful in spite of that, but that doesn't make it any better.

But maybe it had to do with them being able to use a higher-level library, not worry about GC, and focus on other things.

And a very important one: Minecraft, at least the Java version. It doesn't matter until you are running a couple hundred mods with lots of blocks and textures, when the max allocated memory gets saturated and it stutters like hell.

I don't think the argument is "you can't ship a successful game with GC".

You might spend more time fighting the GC than benefitting from it. And that seems to be the experience for large games - simpler ones might not care.

Unity offers a lot more than just a language, and developers have to choose whether they're willing to put up with GC to get the rest of what Unity offers.

Minecraft is the only Java based popular game I can think of. And damn did I love Subnautica.

The recent roguelite hit Slay the Spire (over a million copies sold on Steam) is also made in Java.

> and the times you find yourself "at the mercy" of the garbage collector is pretty damn frustrating.

You're still at the mercy of the malloc implementation. I've seen some fairly nasty behaviour involving memory leaks and weird pauses on free coming from a really hostile allocation pattern causing fragmentation in jemalloc's internal data.

Which is why you almost never use the standard malloc for your piecemeal allocations. A fair number of codebases I've seen allocate their big memory pools at startup, and then have custom allocators which provide memory for (often little-'o') objects out of that pool. You really aren't continually asking the OS for memory on the heap.

In fact, doing that is often a really bad idea in general because of the extreme importance of cache effects. In a high-performance game engine, you need to have a fine degree of control over where your game objects get placed, because you need to ensure your iterations are blazingly fast.
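One common way to exercise that control is a structure-of-arrays layout, where each field of many objects sits in its own contiguous array so an update loop streams linearly through memory. This is an assumption about what such an engine might do, not a description of the commenter's codebases; `Particles` and `particles_update` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical particle storage laid out as structure-of-arrays: each field
 * is a contiguous array, so the update loop walks memory sequentially. */
#define MAX_PARTICLES 1024

typedef struct {
    float  x[MAX_PARTICLES],  y[MAX_PARTICLES];
    float  vx[MAX_PARTICLES], vy[MAX_PARTICLES];
    size_t count;
} Particles;

/* Integrate positions; touches only the four arrays it needs. */
void particles_update(Particles *p, float dt) {
    for (size_t i = 0; i < p->count; i++) {
        p->x[i] += p->vx[i] * dt;
        p->y[i] += p->vy[i] * dt;
    }
}
```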

Doesn’t this just change semantics? Whatever custom handlers you wrote for manipulating that big chunk of memory are now the garbage collector. You’re just asking for finer grained control than what the native garbage collection implementation supports, but you are not omitting garbage collection.

Ostensibly you could do the exact same thing in e.g. Python if you wanted, by disabling the collector with the gc module and writing custom allocation and cleanup in e.g. Cython. It's probably similar in many other managed-environment languages.

I mean, nobody is suggesting they leave the garbage around and not clean up after themselves.

But instead what you can do is to reuse the "slots" you are handing out from your allocator's memory arena for allocations of some specific type/kind/size/lifetime. If you are controlling how that arena is managed, you will find yourself coming across many opportunities to avoid doing things a general purpose GC/allocator would choose to do in favor of the needs dictated by your specific use case.
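A minimal sketch of slot reuse for one type, assuming a fixed-capacity pool whose free slots are chained through an index stored in the slot itself (the `Enemy` type and the `pool_*` names are hypothetical). Allocation and release are O(1) and never touch the general-purpose heap.

```c
#include <stddef.h>

/* Hypothetical fixed-capacity pool for one object type. */
#define POOL_CAPACITY 128

typedef struct {
    float x, y;
    int   hp;
} Enemy;

/* A slot holds either a live object or the index of the next free slot. */
typedef union {
    Enemy obj;
    int   next_free;  /* -1 terminates the free list */
} Slot;

typedef struct {
    Slot slots[POOL_CAPACITY];
    int  free_head;
} EnemyPool;

void pool_init(EnemyPool *p) {
    for (int i = 0; i < POOL_CAPACITY - 1; i++) p->slots[i].next_free = i + 1;
    p->slots[POOL_CAPACITY - 1].next_free = -1;
    p->free_head = 0;
}

Enemy *pool_alloc(EnemyPool *p) {
    if (p->free_head < 0) return NULL;  /* pool exhausted */
    Slot *s = &p->slots[p->free_head];
    p->free_head = s->next_free;
    return &s->obj;
}

void pool_free(EnemyPool *p, Enemy *e) {
    Slot *s = (Slot *)e;  /* a pointer to a union member converts to the union */
    s->next_free = p->free_head;
    p->free_head = (int)(s - p->slots);
}
```

Freed slots are handed out again last-in, first-out, so recently released memory, which may still be warm in cache, is reused first.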

For instance you can choose to draw the frame and throw away all the resources you used to draw that frame in one go.

The semantics matter. A lot of game engines use a mark-and-release per-frame allocation buffer. It is temporary throwaway data for that frame's computation. It does not get tracked or freed piecemeal - it gets blown away.

Garbage collection emulates the intent of this method with generational collection strategies, but it has to use a heuristic to do so. And you can optimize your code to behave very similarly within a GC, but the UI to the strategy is full of workarounds. It is more invasive to your code than applying an actual manual allocator.

> A lot of game engines use a mark-and-release per-frame allocation buffer.

I've heard of this concept but a search for "mark-and-release per-frame allocation buffer" returned this thread. Is there something else I could search?

It’s just a variation of arena allocation. You allocate everything for the current frame in an arena. When the frame is complete, you free the entire arena, without needing any heap walking.

A generational GC achieves a similar end result, but has to heuristically discover the generations, whereas an arena allocator achieves the same result deterministically and without extra heap walking.

Linear or stack allocator are other common terms. Just a memory arena where an allocation is just a pointer bump and you free the whole buffer at once by returning the pointer to the start of the arena.

Getting rid of this buffer costs literally nothing. No free is needed on the individual objects. You just forget there was anything there and use the same buffer for the next frame, vs. waiting for a GC to detect thousands of unused objects in that buffer and discard them, meanwhile creating a new batch of thousands of objects and having to figure out where to put those.
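A minimal sketch of such a linear (bump) allocator with the mark-and-release operations mentioned earlier in the thread (`FrameArena` and the `arena_*` names are hypothetical). Allocation bumps an offset, `arena_release` rolls back to a saved mark, and `arena_reset` discards a whole frame's allocations by zeroing the offset, with no per-object work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame linear allocator over a caller-supplied buffer. */
typedef struct {
    uint8_t *base;
    size_t   cap;
    size_t   top;  /* offset of the first free byte */
} FrameArena;

void arena_init(FrameArena *a, void *buf, size_t cap) {
    a->base = buf;
    a->cap = cap;
    a->top = 0;
}

/* align must be a power of two. Returns NULL when the arena is full. */
void *arena_alloc(FrameArena *a, size_t size, size_t align) {
    uintptr_t cur = (uintptr_t)(a->base + a->top);
    size_t pad = (size_t)((align - (cur & (align - 1))) & (align - 1));
    if (pad > a->cap - a->top || size > a->cap - a->top - pad) return NULL;
    a->top += pad;
    void *p = a->base + a->top;
    a->top += size;
    return p;
}

/* Mark-and-release: remember the current top, later roll back to it. */
size_t arena_mark(const FrameArena *a) { return a->top; }
void arena_release(FrameArena *a, size_t mark) { a->top = mark; }

/* End of frame: everything allocated this frame is discarded at once. */
void arena_reset(FrameArena *a) { a->top = 0; }
```

Nothing runs destructors on reset, so anything allocated this way has to be plain data that can simply be forgotten.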

You can do many things in many languages. You may realize in the process that doing useful things is made harder when your use case is not a common concern in the language.

C's free() gives memory back to the operating system(1), whereas, as a performance optimization, many GC'd languages don't give memory back after they run a garbage collection (see https://stackoverflow.com/questions/324499/java-still-uses-s...). Every Python program is using a "custom allocator," only it is built into the Python runtime. You may argue that this is a dishonest use of the term custom allocator, but custom is difficult to define (it could be defined as any allocator used in only one project, but that definition has multiple problems). The way I see it, there are allocators that free to the OS and those that don't or usually don't (hereafter referred to as custom).

In C, a custom allocator conceivably could be built into, say, a game engine. You might call ge_free(ptr), which would signal to the custom allocator that the chunk of memory is available, and ge_malloc() would use the first biggest chunk of internally allocated memory, calling normal malloc() if necessary. Custom allocators in C are a bit more than just semantics, and affect performance (for allocation-heavy code). Furthermore, they are distinct from GC, as they can work with allocate/free semantics, rather than allocate/forget (standard GC) semantics.

Yes, one could technically change any GC'd language to use a custom allocator written by oneself. But Python can't use allocate/free semantics (so don't expect any speedup). Python code never attempts manual memory management (e.g. 3rd party functions allocate on the heap all the time without calling free()), because that is how Python is supposed to work. To use manual memory management semantics in Python, you would need to rewrite every Python method with a string or any user-defined type in it to properly free.

(1) malloc implementations generally allocate a page at a time and give the page back to the OS when all objects in the page are gone. ptr = malloc(1); malloc(1); free(ptr); doesn't give the single allocated page back to the OS.

Python is a bad example for talking about GC, because it uses a different garbage collector than most languages. It is also the primary reason why getting rid of the GIL while retaining performance is so hard. Python uses reference counting, and as soon as an object's reference count drops to 0 the object is immediately freed, so in a way it is more predictable. It also has a traditional GC, and I guess that's the one mentioned above that you can disable. It exists because reference counting won't free memory if there is a cycle (e.g. object A references B and B references A; both have a reference count of 1 even though nothing is using them), so that's where the traditional GC steps in.
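A toy C illustration of the cycle problem (not CPython's implementation; `RcNode` and the `rc_*` functions are hypothetical). Each node holds at most one strong reference, and once two nodes point at each other, dropping every outside reference still leaves both counts at 1, so neither is ever freed.

```c
#include <stdlib.h>

/* Hypothetical reference-counted node with one outgoing strong reference. */
typedef struct RcNode {
    int            refcount;
    struct RcNode *ref;  /* may be NULL */
} RcNode;

RcNode *rc_new(void) {
    RcNode *n = calloc(1, sizeof *n);
    if (n) n->refcount = 1;  /* the creator holds one reference */
    return n;
}

void rc_retain(RcNode *n) { n->refcount++; }

/* Drop a reference; at zero, free the node and release what it points to. */
void rc_release(RcNode *n) {
    while (n && --n->refcount == 0) {
        RcNode *next = n->ref;
        free(n);
        n = next;
    }
}

/* Make a point to b: take a strong reference to b, drop the old target. */
void rc_link(RcNode *a, RcNode *b) {
    RcNode *old = a->ref;
    rc_retain(b);
    a->ref = b;
    rc_release(old);
}
```

With a simple chain, releasing the head cascades down the chain; with a cycle, the counts never reach zero, which is the leak a separate cycle collector has to find.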

Freeing memory to the OS causes TLB cache stalls in all other threads in the process.

If the program runs for any length of time, it will probably need the same memory again, so freeing it is a pessimization.

Standard C library free() implementations very, very rarely free memory back to the OS.

It's not a performance optimisation not to give space back. GCs could easily give space back after a GC if they know a range (bigger than a page) is empty, it's just that they rarely know it is empty unless they GC everything, and even then there is likely to be a few bytes used. Hence the various experiments with generational GC, to try to deal with fragmentation.

Many C/C++ allocators don't release to the OS often or ever.

That's true, and it's why the alternative to GC is generally not "malloc and free" or "RAII" but "custom allocators."

Games are very friendly to that approach- with a bit of thought you can use arenas and object pools to cover 99% of what you need, and cut out all of the failure modes of a general purpose GC or malloc implementation.

Interestingly, it's fully possible to disable the automatic garbage collection in Go to achieve this.

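Disable the garbage collector: debug.SetGCPercent(-1) (from the runtime/debug package), or run the program with GOGC=off.

Trigger garbage collection: runtime.GC()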

It is also possible to allocate a large block of memory and then manage it yourself.

Due to the low throughput of Go's GC (which trades a lot of it for short pause durations), you risk running out of memory if you have a lot of allocations and don't run the GC often enough.

For a computer game, if you start out by allocating a large block of memory, then manage it yourself, I don't see how this would be a problem.

You're not using the GC at all then. Why use Go (and praise its GC) in that case?

For the joy of getting the round peg through the square hole of course.

Go has many advantages over C that are not related to GC.

In a context where you don't allocate memory, you lose a lot of those (for instance, you almost cannot use interfaces, because indirect calls cause parameters to those calls to be judged escaping and unconditionally allocated on the heap).

Go is a good language for web backend and other network services, but it's not a C replacement.

If you allocate a large block of memory manually at the start of the program, then trigger the GC manually when it suits you, won't you get the best of both worlds?

Go also has many disadvantages, compared to plain C.

Can't think of any. Do you have an example?

You can't call native libraries without going through cgo. So unless you're willing to go without audio, text rendering, and access to the graphics APIs, you'll need cgo, which is really slow due to Go's runtime. For game dev, that's a no go (pun intended).

Additionally, the Go compiler doesn't try very hard to optimize your code, which makes it several times slower on a CPU-bound task. That's for a good reason: for Go's use case, compile time takes priority over performance.

Saying that there are no drawbacks in Go is just irrational fandom…

You are only talking about the Go compiler from Google.

GCC also supports Go (gccgo) and can call native libraries just like from C.

I'm not saying there are no drawbacks in Go, just that I can't think of any advantages of C over Go.

Go was pushed as a C replacement, but very few C programmers switched to it; it seems to have won over some Python, Ruby, Java, etc. programmers instead.

Nonetheless, Go has many advantages over C that are not related to GC.

So does Python or Ruby, that doesn't mean it is a C replacement.

> It is also possible to allocate a large block of memory and then manage it yourself.

At which point you're mostly just writing C in Go.

Actually you're not.

I would very much prefer a stripped down version of Go used for these situations rather than throwing more C at it. The main benefits of using Go are not the garbage collection; it's the tooling, the readability (and thus maintainability) of the code base, and the large number of folks who are versatile in using it.

Readability is subjective.

Large user base? C is number 2. Go isn't even in the top 10.[1]

Tooling? C has decades of being one of the most commonly used languages, and a general culture of building on existing tools instead of jumping onto the latest hotness every few months. As a result, C has a very mature tool set.

[1] https://www.tiobe.com/tiobe-index/

Unfortunately the excellent standard library is a major benefit of Go, and it uses the GC, so if you set GOGC=off you're left to write your own standard library.

I would also like to see a stripped-down version of Go that disables most heap allocations, but I have no idea what it would look like.

Are you saying that there are more go developers than c developers? Is there a user survey that shows such things? I'm curious what the ratio is.

I'd be willing to wager that C programmers would be more comfortable working with a Golang codebase than Golang programmers would be working with a C codebase.

There may be more "C programmers" by number but a Golang codebase is going to be more accessible to a wider pool of applicants.

In my experience it takes a few days for a moderate programmer to come up to speed on Go, whereas it takes several months for C. You need to hire C programmers for a C position, you can hire any programmers for a Go position.

If they don't already know C though, how well will they cope with manual memory management?

How do people learn C without knowing about manual memory management? They learn about it as they learn the language. This can be done in any language that allows for manual memory management (and most have much better safeguards and documentation than C, which has a million ways to shoot yourself in the foot)

It will be a learning curve, but a much, much smaller one than learning C.

But the entire point of this line of questioning is that there are more programmers who already know C.

You’re writing in a much improved C. Strong type system (including closures/interfaces/arrays/slices/maps), sane build tooling (including dead simple cross compilation), no null-terminated strings, solid standard library, portability, top notch parallelism/concurrency implementation, memory safety (with far fewer caveats, anyway), etc. Go has its own issues and C is still better for many things, but “Go with manually-triggered GC” is still far better than C for 99.9% of use cases.

Go’s compiler is not at all optimized for generating fast floating point instructions like AVX, and it's very cumbersome to add any kind of intrinsics. This might not matter for light games, but it's an issue when you want to simply switch to wide floating point operations to optimize some math.

Yeah, C compilers optimize much more than Go compilers. Performance is C’s most noteworthy advantage over Go.

GCC can compile both C and Go. I searched for benchmarks but found none for GCC 9 that compares the performance of C and Go. Do you have any sources on this?

I don’t have a source, but it’s common knowledge in the Go community. Not sure how GCC works, but it definitely produces slower binaries than gc (the standard Go compiler). There are probably some benchmarks where this is not the case, but the general rule is that gccgo is slower. gc purposefully doesn’t optimize as aggressively in order to keep compile times low.

Personally I would love a --release mode that had longer compile times in exchange for C-like performance, but I use Python by day (about 3 orders of magnitude slower than C) so I’d be happy to have speeds that were half as fast as C. :)

Which compiler? The one from Google, GCC (gccgo) or TinyGo?

Does Go really let you use closures, arrays, slices and maps when you disable the garbage collector? If so, does that just leak memory?

Yes, the idea is that you must invoke the GC when you’re not in a critical section. Alternatively you can just avoid allocations using arenas or similar. (You can use arrays and slices without the GC).

To make sure I understand, is this an accurate expansion of your comment?

Yes it would leak, to avoid leaking you could invoke the GC when you’re not in a critical section. Alternatively, if you don't use maps and instead structure all your data into arrays, slices and structs, you can just avoid allocations using arenas or similar. (You can use arrays and slices without the GC, but maps require it).

Yes, that is correct. Anything that allocates on the heap requires GC or it will leak memory. Go doesn’t have formal semantics about what allocates on the heap and what allocates on the stack, but it’s more or less intuitive and the toolchain can tell you where your allocations are so you can optimize them away. If you’re putting effort into minimizing allocations, you can probably even leave the GC on and the pause times will likely be well under 1ms.
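As a rough illustration of checking where allocations happen: returning a small struct by value normally stays on the stack, while storing a pointer in a package-level variable forces a heap allocation. `go build -gcflags=-m` prints the compiler's escape decisions, and `testing.AllocsPerRun` counts heap allocations at run time. The `Vec2` type and both functions are hypothetical, and exact escape decisions can vary between compiler versions.

```go
package main

import (
	"fmt"
	"testing"
)

// Vec2 is a hypothetical small value type used for illustration.
type Vec2 struct{ X, Y float64 }

var (
	sinkVal Vec2  // holds a copy; no heap allocation needed
	sinkPtr *Vec2 // holds a pointer; the pointee must live on the heap
)

// addByValue returns its result by value, so it can stay on the stack.
func addByValue(a, b Vec2) Vec2 {
	return Vec2{a.X + b.X, a.Y + b.Y}
}

// addToHeap stores a pointer in a global, so v escapes to the heap.
func addToHeap(a, b Vec2) {
	v := Vec2{a.X + b.X, a.Y + b.Y}
	sinkPtr = &v
}

func main() {
	a, b := Vec2{1, 2}, Vec2{3, 4}
	stack := testing.AllocsPerRun(1000, func() { sinkVal = addByValue(a, b) })
	heap := testing.AllocsPerRun(1000, func() { addToHeap(a, b) })
	fmt.Printf("by value: %v allocs/op, via pointer: %v allocs/op\n", stack, heap)
}
```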

And don't forget that, using Go correctly, you'd end up doing mostly stack pushes and pops.

With JS the trick is to avoid creating new objects and instead have a pool of objects that are always referenced.

Definitely! Object pools are common in game dev too from what I know. We used them extensively to reduce the amount of GC we needed.
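Translated to Go for consistency with the rest of the thread, a minimal object pool might preallocate a fixed array of objects and recycle them through a free list of indices, so steady-state gameplay never asks the allocator (or the GC) for anything. `Bullet`, `Pool`, and the capacity are hypothetical.

```go
package main

import "fmt"

// Bullet is a hypothetical pooled game object.
type Bullet struct {
	X, Y, VX, VY float32
	Alive        bool
}

// Pool hands out *Bullet from a fixed, preallocated array and recycles
// released slots through a free list of indices.
type Pool struct {
	items []Bullet
	free  []int // indices of unused slots, used as a stack
}

func NewPool(capacity int) *Pool {
	p := &Pool{items: make([]Bullet, capacity), free: make([]int, 0, capacity)}
	for i := capacity - 1; i >= 0; i-- {
		p.free = append(p.free, i) // so slot 0 is handed out first
	}
	return p
}

// Get returns a reset object and its slot index, or ok=false when exhausted.
func (p *Pool) Get() (*Bullet, int, bool) {
	n := len(p.free)
	if n == 0 {
		return nil, -1, false
	}
	idx := p.free[n-1]
	p.free = p.free[:n-1]
	p.items[idx] = Bullet{Alive: true}
	return &p.items[idx], idx, true
}

// Put returns slot idx to the free list. Each slot must be put back at most
// once per Get; the memory itself is never freed.
func (p *Pool) Put(idx int) {
	p.items[idx].Alive = false
	p.free = append(p.free, idx) // stays within the original capacity
}

func main() {
	pool := NewPool(2)
	b, first, _ := pool.Get()
	b.X, b.VX = 10, 3
	pool.Get()
	_, _, ok := pool.Get()
	fmt.Println("third Get succeeded:", ok)
	pool.Put(first)
	_, reused, ok := pool.Get()
	fmt.Println("reused slot:", reused == first, ok)
}
```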

Speedrunners thank you for reusing objects! I'm certain that decisions like this are what lead to interesting teleportation techniques and item duplications. Games wouldn't be the same without these fun Easter eggs!

Once I wrote a very small vector library in JS for this very reason: almost all JS vector libraries out there tend to dynamically create new vectors for every vector binary operation, this makes JS GC go nuts. It's also prohibitively expensive to dynamically instantiate typedarray based vectors on the fly, even though they are generally faster to operate on... most people focus on fixing the latter in order to be able to use typedarrays by creating vector pools (often as part of the same library), but this creates a not-insignificant overhead.

Instead, my miniature library obviated pools by simply having binary operators operate directly on one of the vector objects passed to it. If more than one vector was required for the operation internally, they would be "statically allocated" by defining them in the function definition's context (in some variants I would also return one of these internal vectors, which was only safe to use until a subsequent call of the same operator!).

The result this had on the calling code looked quite out of place for JS, because you would effectively end up doing a bunch of static memory allocation by assigning a bunch of persistent vectors for each function in its definition context, and then you would often need to explicitly reinitialize the vectors if they were expected to be zero.

... it was, however, super fast and always smooth. I wish it was possible to turn the GC off in cases like this when you know it's not necessary. It was more of a toy as a library, but I did write some small production simulations with it. I'm not sure how well the method would extend to comprehensive vector and matrix libraries; I think the main problem is that most users would not be willing to use a vector library this way, because they want to focus on the math and not have to think about memory.

You wouldn't have this still up somewhere like GH, would you? I'm currently writing a toy ECS implementation and have somewhat similar needs, and I've been trying to build up a reference library of sorts covering novel ways of dealing with these kinds of JS issues.

No, sorry, this is from more than 5 years ago and it was never FOSS, and I only implemented rudimentary operators anyway. You could easily adapt existing libraries or write your own; the above concept is more valuable than any specific implementation details. The core concept is: never implicitly generate objects, e.g. operate directly on parameter objects, reuse them for the return value, or return persistent internal objects (potentially dangerous, since they will continue to be referenced and used internally).
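A minimal Go rendering of that concept, assuming a hypothetical `Vec3` type: operations write into the receiver instead of returning a new value, and `Midpoint` returns a package-level scratch vector that is only valid until the next call. In Go, small structs passed by value usually don't allocate anyway, so this mostly shows the API shape rather than a necessary optimization.

```go
package main

import (
	"fmt"
	"math"
)

// Vec3 is a hypothetical 3-component vector.
type Vec3 struct{ X, Y, Z float64 }

// Add writes v + o into v and returns v so calls can be chained.
func (v *Vec3) Add(o *Vec3) *Vec3 {
	v.X += o.X
	v.Y += o.Y
	v.Z += o.Z
	return v
}

// Scale multiplies v in place by s.
func (v *Vec3) Scale(s float64) *Vec3 {
	v.X *= s
	v.Y *= s
	v.Z *= s
	return v
}

// Length returns the Euclidean length of v.
func (v *Vec3) Length() float64 {
	return math.Sqrt(v.X*v.X + v.Y*v.Y + v.Z*v.Z)
}

// scratch is a persistent temporary owned by Midpoint. Its contents are only
// valid until the next call, and Midpoint is not safe for concurrent use.
var scratch Vec3

// Midpoint returns a pointer to the shared scratch vector holding (a+b)/2.
func Midpoint(a, b *Vec3) *Vec3 {
	scratch = *a
	return scratch.Add(b).Scale(0.5)
}

func main() {
	pos := Vec3{1, 2, 3}
	vel := Vec3{0.5, 0, -1}
	pos.Add(&vel) // pos is updated in place; nothing new is created
	m := Midpoint(&Vec3{0, 0, 0}, &Vec3{2, 4, 6})
	fmt.Println(pos, *m, m.Length())
}
```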

All of these ideas require more care when using the library though.

Thank you though, what you've written here is very useful; you're describing things I'm immediately recognising in what I'm doing. As I say, it's just a small toy (and I think one that would definitely be easier in a different language, but anyway...). At the minute I'm actually at the point where I'm preallocating and persisting a set of internal objects, and at a very, very small scale it's OK, but each exploration in structure starts to become unmanageable pretty quickly.

Three.js implements their math this way.


You can see most operations act on the Vector and there are some shared temporary variables that have been preallocated. If you look through some of the other parts you can see closures used to capture pre-allocated temporaries per function as well.

Ah brilliant, that's definitely useful. Thank you for the pointer

This kind of place-oriented programming can make the actual algorithm very hard to follow. I really hope that JS gets custom value types in the future.

But then that damages performance because your objects are always globally visible rather than being able to be optimised away.

Go’s GC is low latency and it allows you to explicitly trigger a collection as well as prevent the collector from running for a time. I would wager that the developer time/frustration spent taming the GC would be more than made up for by the dramatic improvement in ergonomics everywhere else. Of course, the game dev libraries for Go would have to catch up before the comparison is valid.

Why is this downvoted? is it factually wrong? (I don't know Go so I'm asking)

It’s factually correct. My other post got downvoted for pointing out that not every GC is optimized for throughput. Seems like I’m touching on some cherished beliefs, but not sure exactly which ones.

I know this subject quite well and I will later publish a detailed article.

The real run-time cost of memory management done well in a modern game engine written without OOP features is extremely low.

We usually use a few very simple specialized memory allocators, you'd probably be surprised by how simple memory management can be.

The trick is to not use the same allocator when the lifetime is different.

Some resources are allocated once and basically never freed.

Some resources are allocated per level and freed all at once at the end.

Some resources are allocated during a frame and freed all at once when a new frame starts.

And lastly, a few resources are allocated and freed randomly, and here the cost of fragmentation is manageable because we're talking about a few small chunks (like network packets).
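A minimal sketch of the "one allocator per lifetime" idea, assuming three hypothetical byte arenas (permanent, per-level, per-frame) carved out of buffers allocated once at startup. Freeing a whole lifetime is just resetting an offset; the rare random-lifetime allocations mentioned last would go through a separate general-purpose allocator, which is not shown. Arena sizes are placeholders.

```go
package main

import "fmt"

// Arena is a bump allocator over a buffer allocated once up front.
type Arena struct {
	buf []byte
	off int
}

func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

// Alloc returns n zeroed bytes, or nil if the arena is full.
func (a *Arena) Alloc(n int) []byte {
	if n < 0 || a.off+n > len(a.buf) {
		return nil
	}
	// Cap the slice so an append can't spill into the next allocation.
	b := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	for i := range b {
		b[i] = 0
	}
	return b
}

// Reset frees everything at once; earlier slices must not be used afterwards.
func (a *Arena) Reset() { a.off = 0 }

// Lifetime selects which arena an allocation goes to.
type Lifetime int

const (
	Permanent Lifetime = iota // never freed
	PerLevel                  // freed when a level unloads
	PerFrame                  // freed when the next frame starts
)

// Memory groups one arena per lifetime.
type Memory struct{ arenas [3]*Arena }

func NewMemory() *Memory {
	return &Memory{arenas: [3]*Arena{NewArena(1 << 20), NewArena(8 << 20), NewArena(1 << 20)}}
}

func (m *Memory) Alloc(n int, l Lifetime) []byte {
	return m.arenas[l].Alloc(n)
}

func (m *Memory) BeginFrame() {
	m.arenas[PerFrame].Reset()
}

func (m *Memory) UnloadLevel() {
	m.arenas[PerLevel].Reset()
	m.arenas[PerFrame].Reset()
}

func main() {
	mem := NewMemory()
	_ = mem.Alloc(4096, Permanent)  // e.g. config loaded at startup
	_ = mem.Alloc(64<<10, PerLevel) // e.g. level geometry
	for frame := 0; frame < 3; frame++ {
		mem.BeginFrame()
		_ = mem.Alloc(512, PerFrame) // e.g. this frame's draw list
	}
	fmt.Println("frame arena in use:", mem.arenas[PerFrame].off, "bytes")
}
```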

+1. We have a large Rust code base, and we forbid Vec and the other collections.

Instead, we have different types of global arenas, bump allocators, etc. that you can use. These all pre-allocate memory once at start up, and... that's it.

When you have well-defined allocation patterns, allocating a new "object" is just `last += 1;`, and once you are done you deallocate thousands of objects by just doing `last -= size();`.
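The pattern described here can be sketched in Go as a typed stack arena (the comment's own code base is Rust; this translation and all names are hypothetical): allocation bumps an index into a preallocated slice, and releasing a whole batch rolls the index back to a saved mark in constant time.

```go
package main

import "fmt"

// Particle is a hypothetical object type.
type Particle struct{ X, Y, Life float32 }

// Stack is a fixed-capacity arena: Alloc is "last += 1", Release is "last -= n".
type Stack[T any] struct {
	items []T
	last  int
}

func NewStack[T any](capacity int) *Stack[T] {
	return &Stack[T]{items: make([]T, capacity)}
}

// Alloc returns a pointer to the next zeroed slot, or nil when full.
func (s *Stack[T]) Alloc() *T {
	if s.last == len(s.items) {
		return nil
	}
	var zero T
	s.items[s.last] = zero
	s.last++
	return &s.items[s.last-1]
}

// Mark records the current top so later allocations can be released together.
func (s *Stack[T]) Mark() int { return s.last }

// Release frees everything allocated since mark, regardless of how many objects.
func (s *Stack[T]) Release(mark int) { s.last = mark }

// Len reports how many objects are live.
func (s *Stack[T]) Len() int { return s.last }

func main() {
	particles := NewStack[Particle](10000)
	m := particles.Mark()
	for i := 0; i < 1000; i++ {
		p := particles.Alloc()
		p.Life = 1
	}
	fmt.Println("live:", particles.Len())
	particles.Release(m) // "frees" all 1000 at once
	fmt.Println("live after release:", particles.Len())
}
```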

That's ~0.3 nanoseconds per allocation, and 0.x nanoseconds to "free" a lot of memory.

For comparison, using jemalloc instead puts you at 15-25 ns per allocation and per deallocation, with "spikes" that go up to 200ns depending on size and alignment requirements. So we are talking about a 100-1000x improvement here, and very often the improvement is larger because these custom allocators are more predictable, smaller, etc. than a general purpose malloc, so you get better branch prediction, fewer I-cache misses, etc.

Do you use any publicly available crate for those allocators? Would love to take a look. I'm currently trying to write a library for no-std which requires something like that. I currently have written a pool of bump allocators. For each transaction you grab an allocator from the pool, allocate as many objects from it as necessary, and then everything gets freed back to the pool. However, it's a bit hacky right now, so I'm wondering whether there is already something better out there.

> Do you use any publicly available crate for those allocators?

Not really, our bump allocator is ~50 LOC, it just allocates a `Box<[u8]>` with a fixed size on initialization, and stores the index of the currently used memory, and that's it.

We then have a `BumpVec<T>` type that uses this allocator (`ptr`, `len`, `cap`). This type has a fixed-capacity, it cannot be moved or cloned, etc. so it ends up being much simpler than `Vec`.

Seems like lifeguard could solve this: https://github.com/zslayton/lifeguard

What arena types do you use?

If you need to store pointers and want to conserve a bit of memory, perhaps my compact_arena crate can help you.

Is this code base open source? Or do you know of open source rust code like it?

Exactly this. Writing a basic allocator can give you a significant bang for your buck. Hard-coding an application specific arena allocator is trivial. The harder part is being diligent enough to avoid use after free/dangling pointers.
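One common way to catch use-after-free in hand-rolled pools is a generational handle. This is a swapped-in technique that the comment doesn't mention, and all names below are hypothetical. Each slot carries a counter that is bumped when it is freed, and a handle is only honored if its generation still matches, so a stale reference yields nil instead of silently aliasing a newer object.

```go
package main

import "fmt"

// Handle refers to a slot plus the generation it was issued for.
type Handle struct{ Index, Gen uint32 }

type slot[T any] struct {
	value T
	gen   uint32
	used  bool
}

// Table is a fixed-capacity pool whose handles detect stale references.
type Table[T any] struct {
	slots []slot[T]
	free  []uint32
}

func NewTable[T any](capacity int) *Table[T] {
	t := &Table[T]{slots: make([]slot[T], capacity), free: make([]uint32, 0, capacity)}
	for i := capacity - 1; i >= 0; i-- {
		t.free = append(t.free, uint32(i))
	}
	return t
}

// Insert stores v and returns a handle, or ok=false when the table is full.
func (t *Table[T]) Insert(v T) (Handle, bool) {
	if len(t.free) == 0 {
		return Handle{}, false
	}
	i := t.free[len(t.free)-1]
	t.free = t.free[:len(t.free)-1]
	s := &t.slots[i]
	s.value, s.used = v, true
	return Handle{Index: i, Gen: s.gen}, true
}

// Get returns a pointer to the value, or nil if the handle is stale.
func (t *Table[T]) Get(h Handle) *T {
	s := &t.slots[h.Index]
	if !s.used || s.gen != h.Gen {
		return nil
	}
	return &s.value
}

// Remove frees the slot and bumps its generation, invalidating old handles.
func (t *Table[T]) Remove(h Handle) bool {
	if t.Get(h) == nil {
		return false
	}
	s := &t.slots[h.Index]
	var zero T
	s.value, s.used = zero, false
	s.gen++
	t.free = append(t.free, h.Index)
	return true
}

func main() {
	enemies := NewTable[string](4)
	h, _ := enemies.Insert("goblin")
	enemies.Remove(h)
	h2, _ := enemies.Insert("orc") // reuses the same slot
	fmt.Println(enemies.Get(h) == nil, *enemies.Get(h2), h.Index == h2.Index)
}
```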

I'm a huge Java nerd. I love me some G1/Shenandoah/ZGC/Zing goodness. But once you're writing a program to the point where you're tuning memory latency (as in many games anyway), baking in your application's generational hypothesis is pretty easy. Even in Java services you'll often want to pool objects that have odd lifetimes.

This is exactly what's continually drawing me to Zig for this sort of thing. Any function that requires memory allocation has to either create a new allocator for itself or explicitly accept an existing one as an argument, which seems to make it a lot easier/clearer to manage multiple allocators for this exact use case.

This sounds very interesting. Please do write an article about the topic! It seems like you would introduce specific allocators for different games, maybe.

I buy this - specialized memory allocators, after all I've used mbufs. So which languages would be good or recommended for coercing types when using memory allocators?

C++ or Rust

> a modern game engine written without OOP features

Is there a particular codebase you are thinking of here?

It would be interesting to see what the impact of adding hints about allocation lifetime to malloc would be. I have to suspect someone has already written that paper and I just don't know the terminology to look for it.

I'm reminded of an ancient 8031 C compiler. It didn't support reentrancy, which seems like a bad thing until you realize that the program could be analyzed as an acyclic graph. The compiler then statically allocated and packed variables based on the call tree. Programs used tiny amounts of memory.

Given the problems people have with Rust and the borrow checker, I would guess that people will not accurately predict how long an allocation will live (it's a similar issue with profiling---programmers usually guess wrong about where the time is being spent).

+1 for that article, especially the part about OOP downsides at scale and batch memory allocation. Love to see more!

as an example point, the Go garbage collector clears heaps of 18gb in sub-millisecond latencies. If I'm understanding the problem at hand (maybe I'm not!), given an engine running at a target framerate of 144 frames per second, you're working with a latency budget of about 7ms per frame. Do you always use all 7ms, or do you sometimes sleep or spin until the next frame to await user input?

We can also look at it from the other direction: if your engine is adjusting its framerate dynamically based on the time it takes to process each frame, and you can do the entire work for a frame in 10ms, does that give you a target of 100 fps? If you tack on another half millisecond to run a GC pause, would your target framerate just be 95 fps?
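The arithmetic in these two paragraphs, worked through as a small Go helper (nothing here is engine-specific): the per-frame budget is 1000/fps milliseconds, and the achievable rate is 1000 divided by the frame time in milliseconds. The 200us pause is the figure from the top of the thread.

```go
package main

import "fmt"

// budgetMs returns the time available per frame at the given frame rate.
func budgetMs(fps float64) float64 { return 1000 / fps }

// fpsFor returns the frame rate achievable if each frame takes ms milliseconds.
func fpsFor(ms float64) float64 { return 1000 / ms }

func main() {
	// About 6.94 ms per frame at 144 fps.
	fmt.Printf("144 fps budget: %.2f ms\n", budgetMs(144))
	// A 200us pause is about 2.9% of that budget.
	fmt.Printf("200us GC share: %.1f%%\n", 0.2/budgetMs(144)*100)
	// 10 ms frames give 100 fps; adding a 0.5 ms pause gives about 95 fps.
	fmt.Printf("10 ms frames: %.1f fps\n", fpsFor(10))
	fmt.Printf("10 ms + 0.5 ms GC: %.1f fps\n", fpsFor(10.5))
}
```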

And what do you do when the set of assets to be displayed isn't deterministic? E.g., an open world game with no loading times, or a game with user-defined assets?

There is no general answer to this question. Frame latency, timing and synchronization is a difficult subject.

Some games are double or triple buffered.

Rendering is not always running at the same frequency as the game update.

The game update is sometimes fixed, but not always.

I've had a very awful experience with GC in the past. On Android, the game code was entirely C/C++ with a bit of Java to talk to the system APIs, and I had to use the camera stream. At the time (2010) Android was still a hot mess full of badly over-engineered code.

The Camera API was simply triggering a garbage collect about every 3-4 frames, it locked the camera stream for 100ms (not nanoseconds, milliseconds!) The Android port of this game was cancelled because of that, it was simply impossible to disable the "feature".

You didn't have a bad experience with GC in the past, you had a bad experience with a single GC implementation, one which was almost certainly optimized for throughput and not latency and in a language that pushes you toward GC pressure by default. :)

This is an example.

I never worked with Unity myself but I worked with people using Unity as their game engine, they all had problems with stuttering caused by the GC at some point.

You can try to search Unity forums about this subject, you'll find hundreds or thousands of topics.

What really bothers me with GC is that it solves a pain I never felt, and creates a lot of problems that are usually more difficult to solve.

This is a typical case of the cure being (much) worse than the illness.

What is Unity’s STW time? Is it optimized for latency? If not, you’re using the wrong collector. The pain point it solves is time spent debugging memory issues and generally a slower pace of development, but of course if you’re using a collector optimized for throughput, you’ll have a bad time. Further, a low-latency GC will pay off more for people who haven’t been using C or C++ for 10 years than those who have.

It is very nice to say that, in theory, a GC could work very well for performance demanding games. But until someone builds that GC, in an environment where everything else is suitable for games also, it is academic. We can't actually build a game with a theoretical ecosystem.

Of course, this is a theoretical conversation. My point is that you don’t refute the claim, “low latency GCs are suitable for game dev” with “I used a high latency GC and had a bad time”. Nor do you conclude that GCs inherently introduce latency issues based on a bad experience with a high latency GC. There is neither theory nor experiment that supports the claim that GCs are inherently unsuitable for game dev.

"Clearing an 18GB heap" that's full of 100MB objects that are 99% dead is different from clearing an 18GB heap of 1KB objects that are 30% dead (not co-allocated, but randomly distributed across the whole heap).

this is exactly the kind of dismissive hand-waving that is frustrating. The 18gb heap example is from data collected on production servers at Twitter. Go servers routinely juggle tens of thousands of connections, tens of thousands of simultaneous user sessions, or hundreds of thousands of concurrently running call stacks. We're essentially never talking about 100mb objects since the vast majority of what Go applications are doing is handling huge numbers of small, short-lived requests.


I'm not a game developer, just a programming language enthusiast with a probably above-average understanding of how difficult this problem is.

Can you point out where in the post they expand on my point? The only thing I see is this:

> Again we kept knocking off these O(heap size) stop the world processes. We are talking about an 18Gbyte heap here.

which is exactly my point - even if you remove all O(heap size) locks, depending on the exact algorithm it might still be O(number of objects) or O(number of live objects) - e.g. arena allocators are O(1) (instant), generational copying collectors are O(live objects), while mark-and-sweep GCs (including Go's if I understand correctly after skimming over your link) are O(dead objects) (the sweeping part). Go's GC seems to push most of that out of the Stop-The-World pause, offloading it to the mutator threads instead... Also, "server with short-lived requests" is AFAIK a fairly good use case for a GC - most objects are very short-lived, so it would be mostly garbage with a simple live object graph...

Still, a commendable effort. Could probably be applied to games as well, though likely different specific optimisations would be required for their particular usecase. I think communication would be better if you expanded on this (or at least included the link) in your original post.

There are plenty of known GC algorithms with arbitrarily bounded pause times. They do limit the speed at which the mutator threads can allocate, but per-frame allocations can be done the same way they are done in a non-GC'd environment (just allocate a pool and return everything to it at the end of each frame), and any allocations that are less frequent will likely be sufficiently rare for such algorithms.

I know people who have written games in common-lisp despite no implementations having anything near a low-latency GC. They do so by not using the allocator at all during the middle of levels (or using it with the GC disabled, then manually GCing during level-load times).

> Go garbage collector clears heaps of 18gb in sub-millisecond latencies

At the expense of a lot of work from the GC. In production we had 30% of the usage coming from the GC alone…

But for game development the problem isn't going to be the GC anyway: cgo is just not suited to this kind of task (and you cannot avoid cgo here).

Best comment here. I have spent some time evaluating Go for making games. Considering the fact that for 95% of games it doesn't really matter if the language has a GC it could have been a very nice option.

Currently CGO just sucks. It adds a lot of overhead. The next problem is target platforms. I don't have access to console SDKs but compiling for these targets with Go should be a major concern.

Go could be a nice language for making games, but in the state it is in, the only thing it's good for is hobby desktop games.

> as an example point, the Go garbage collector clears heaps of 18gb in sub-millisecond latencies.

The trick here is to double the heap size, which would be completely unacceptable for a game that's making use of what the hardware offers.

Go's GC has latencies that would be good enough for a great many games, but its throughput might become problematic, and in general, as a language it is problematic for the kinds of games that would worry about GC pauses, since FFI with C libraries is very expensive. If that could be solved, Go might be very appealing for many, many games.

I worked as a games programmer for 8 years in C/C++, and spent an accumulated 2 years just doing optimisation, during the time of 6th and 7th generation consoles. Freeing resources in a deterministic manner is important for the following reasons:

FRAME RATE: having a GC collect at random frames makes for jerky rendering

SPEED: Object pools allow reuse of objects without allocing/deallocing, and can be a cache-aligned array-of-structs. Structs-of-arrays can be used for batch processing large volumes of primitives. https://en.wikipedia.org/wiki/AoS_and_SoA

RELIABILITY: This is probably applicable to the embedded realm too, but if you can't rely on virtual memory (because the console's OS/CPU doesn't support it, or once again you don't want the speed impact) then you need to be sure that allocations always succeed from a fixed pool of memory. Pre-allocated object pools, memory arenas, ring buffers etc. are a few ways to ensure this.

There's probably a lot more, but those are the reasons that jump out at me.
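To make the AoS/SoA distinction concrete, here is a small Go sketch with hypothetical particle data. The array-of-structs keeps each particle's fields together, while the struct-of-arrays keeps each field contiguous, so a batch update streams through only the data it touches. Both are allocated once up front, in line with the fixed-pool point above.

```go
package main

import "fmt"

// ParticleAoS keeps all fields of one particle together (array of structs).
type ParticleAoS struct {
	X, Y, VX, VY float32
	Color        uint32
}

// ParticlesSoA keeps each field in its own contiguous slice (struct of arrays).
type ParticlesSoA struct {
	X, Y, VX, VY []float32
	Color        []uint32
}

func NewParticlesSoA(n int) *ParticlesSoA {
	return &ParticlesSoA{
		X: make([]float32, n), Y: make([]float32, n),
		VX: make([]float32, n), VY: make([]float32, n),
		Color: make([]uint32, n),
	}
}

// integrateAoS walks whole structs, so Color is pulled into cache even though it is unused.
func integrateAoS(ps []ParticleAoS, dt float32) {
	for i := range ps {
		ps[i].X += ps[i].VX * dt
		ps[i].Y += ps[i].VY * dt
	}
}

// integrateSoA touches only the four position/velocity arrays.
func integrateSoA(p *ParticlesSoA, dt float32) {
	for i := range p.X {
		p.X[i] += p.VX[i] * dt
		p.Y[i] += p.VY[i] * dt
	}
}

func main() {
	const n = 4
	aos := make([]ParticleAoS, n)
	soa := NewParticlesSoA(n)
	for i := 0; i < n; i++ {
		aos[i].VX, soa.VX[i] = 1, 1
	}
	integrateAoS(aos, 0.5)
	integrateSoA(soa, 0.5)
	fmt.Println(aos[0].X, soa.X[0]) // both 0.5
}
```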

you can turn off the gc in Go and run it manually. you can also write cache-aligned arrays of structs in Go if you want to. you can allocate a slab and pull from it if you want to. the existence of a GC doesn't preclude these possibilities.

The existence of a GC, even when it can be turned off, does preclude a great many other possibilities, in practice. One issue which is nearly universal, and extra bad in Go, is the extra cost of FFI with C libs, which is necessary in games to talk to OpenGL, or SDL2, or similar.

If you aren't going to use the GC, then you open up a lot of other performance opportunities by just using a language that didn't have one in the first place.

>you can turn off the gc in Go and run it manually

And if you run the GC manually, you really don't know how long it will take - read: determinism.

> you can also write cache-aligned arrays of structs in Go if you want to

Wasn't this thread about why people don't use GC, not about go? I don't remember.

If you're using an object pool, you're dodging garbage collection, as you don't need to deallocate from that pool, you could just maintain a free-list.

> you can allocate a slab and pull from it if you want to. the existence of a GC doesn't preclude these possibilities

To take it further, you could just allocate one large chunk of memory from a garbage collected allocator and use a custom allocator - you can do this with any language. But you're not using the GC then.

Why pick a language that has a feature you need to immediately turn off? Some people probably want to, but ... why?

Many programmers are very emotional about their language of choice, especially if it is their only one.

I guess you use every feature (insert any language here) offers for every program you write with it?

The answer to your question is probably: because they like the language, are productive in it, know the libraries and the feature can be turned off so it's an option.

If I consistently don't use all the features, I pretty quickly start looking at simpler languages.

> If you have a garbage collector that runs in 200us

The problem is GCs for popular languages are nowhere near this good. People will claim their GC runs in 200us, but it's misleading.

For example, they'll say they have a 200us "stop the world" time, but then individual threads can still be blocked for 10ms+. Or they'll quote an average or median GC time, when what matters is the 99.9th percentile time. If you run GC at every 120 Hz frame, then you hit the 99.9th percentile time roughly every eight seconds (7,200 frames per minute, so about 7 times a minute).

Finally, even if your GC runs in parallel and doesn't block your game threads it still takes an unpredictable amount of CPU time and memory bandwidth while it's running, and can have other costs like write barriers.

Benchmarking a full sweep with 0 objects to free in Julia:

  julia> @benchmark GC.gc()
    memory estimate:  0 bytes
    allocs estimate:  0
    minimum time:     64.959 ms (100.00% GC)
    median time:      66.848 ms (100.00% GC)
    mean time:        67.062 ms (100.00% GC)
    maximum time:     73.149 ms (100.00% GC)
    samples:          75
    evals/sample:     1

Julia's not a language normally used for real time programs (and it is common to work around the GC / avoid allocating), but it is the language I'm most familiar with.

Julia's GC is generational; relatively few sweeps will be full. But seeing that 65ms -- more than 300 times slower than 200us -- makes me wonder.

Test was on an i9 7900X.

Yep. (You know this, but) just as another data point, an incremental pass takes more like 75 microseconds, and a 'lightly' allocating program probably won't trigger a full sweep (no guarantees though).

these aren't theoretical numbers, they're the numbers that people are hitting in production. See this thread wrt gc pause times on a production service at twitter https://twitter.com/brianhatfield/status/804355831080751104 also referenced here, which talks at length about gc pause time distributions and pause times at the 99.99th percentile https://blog.golang.org/ismmkeynote

> they'll say they have a 200us "stop the world" time, but then individual threads can still be blocked for 10ms+

As far as I know this is still true of the Go GC. Write barriers are also there and impact performance vs. a fixed size arena allocator that games often use that has basically zero cost.

It's also important to consider how often the GC runs. GC that runs in 200us but does so ten times within a frame deadline might as well be a GC that runs in 2ms. Then there are issues of contention, cache thrashing, GC-associated data structure overhead, etc. The impact of GC is a lot more than how long one pass takes, and focusing on that ignores many other reasons why many kinds of systems might avoid GC.

>The reality is you still have to free resources, so it's not like the garbage collector is doing work that doesn't need to be done.

Does it? In most games I expect resource management to be fairly straightforward, allocation and freeing of resources will mostly be tied to game events which already require explicit code. If you already have code for "this enemy is outside the area and disappears", is it really that much work to add "oh and by the way you might also free the associated resources while you're at it"? I don't need a GC thread checking at random intervals "hey let's iterate over an arbitrary portion of memory for an arbitrary amount of time to see if stuff needs dropping yo!".

I realize that I'm quite biased though because I'm fairly staunchly in the "garbage collection is a bad idea that causes more problems than it solves" camp. It's a fairly extremist point of view and probably not the most pragmatic stance.

One place where it might not be quite as trivial would be, for instance, resource caching in the graphics pipeline: figuring out when you don't need a certain texture or model anymore. But that often involves APIs such as OpenGL, which won't be directly handled by the GC anyway, so you'll still need explicit code to manage that.
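A minimal sketch of that explicit bookkeeping, assuming hypothetical `load`/`unload` callbacks that would wrap the real graphics API calls (for example creating and deleting an OpenGL texture). The cache reference-counts textures by path and releases the GPU resource when the last user lets go, and no GC is involved in deciding when.

```go
package main

import "fmt"

// TextureID stands in for a GPU handle (e.g. an OpenGL texture name).
type TextureID uint32

type entry struct {
	id   TextureID
	refs int
}

// TextureCache reference-counts textures by path.
type TextureCache struct {
	entries map[string]*entry
	load    func(path string) TextureID // hypothetical loader
	unload  func(id TextureID)          // hypothetical GPU release
}

func NewTextureCache(load func(string) TextureID, unload func(TextureID)) *TextureCache {
	return &TextureCache{entries: map[string]*entry{}, load: load, unload: unload}
}

// Acquire loads the texture on first use and bumps its reference count.
func (c *TextureCache) Acquire(path string) TextureID {
	e, ok := c.entries[path]
	if !ok {
		e = &entry{id: c.load(path)}
		c.entries[path] = e
	}
	e.refs++
	return e.id
}

// Release drops one reference and unloads the texture when none remain.
func (c *TextureCache) Release(path string) {
	e, ok := c.entries[path]
	if !ok {
		return
	}
	e.refs--
	if e.refs == 0 {
		c.unload(e.id)
		delete(c.entries, path)
	}
}

func main() {
	next := TextureID(1)
	cache := NewTextureCache(
		func(path string) TextureID { id := next; next++; return id },
		func(id TextureID) { fmt.Println("unloading texture", id) },
	)
	a := cache.Acquire("goblin.png") // first enemy spawns: load
	cache.Acquire("goblin.png")      // second enemy reuses it
	cache.Release("goblin.png")      // one enemy leaves the area
	cache.Release("goblin.png")      // last one leaves: texture unloaded here
	fmt.Println("first handle was", a)
}
```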

That being said, I'd still choose (a reasonable subset of) C++ over C for game programming, if only to have access to ergonomic generic containers and RAII.

I write quite a lot of C. When I do I often miss generics, type inference, an alternative to antiquated header files and a few other things. I never miss garbage collection though (because it's a bad idea that causes more problems than it solves).

It's NOT "a bad idea that causes more problems than it solves", but still may be a bad idea for the (games, real-time, etc) industry.

GC means not only that memory management is simpler, it means that it goes away for most of the programs out there.

Most programmers I know aren't writing code that run Twitter-like servers or avionics, nor are they programming the next Doom game. These people are writing apps and doing back end coding for some big company where real-time isn't an issue and the focus is on getting code out fast, with good average quality and "cheap" labor.

In this case, having a high-level language/runtime that doesn't require a programmer who can reason about allocations is key.

I can't even begin to describe the kinds of codebases I have seen. Two years ago I was working on a C/C++ legacy system that was thousands of lines of code, and almost every file had memory leaks (that cppcheck could find by itself, mind you). Some of them were caused by delivery deadlines, most of them were caused by unskilled employees.

(oh, in case you're wondering: all those leaks were solved by having ten times the hardware and a scheduled restart of the servers)

> In most game I expect resource management to be fairly straightforward, allocation and freeing of resources will mostly be tied to game events which already require explicit code.

Even if you are perfect, you will still fragment over time. At some point, you must move a dynamically allocated resource and incur all the complexity that goes with that--effectively ad hoc garbage collection.

There are only two ways around this:

1) You have the ability to do something like a "Loading" screen where you can simply blow away all the previous resources and re-create new ones.

2) Statically allocate all memory up front and never change it.

I think your (because it's a bad idea that causes more problems than it solves) needs qualification via "in some cases". It certainly is handy and well used in other applications and speeds up development time as well as prevents various developer bugs.

Unity uses/used the standard Boehm garbage collector [1] for over a decade, and it has been notorious for causing GC lag in games produced with that engine, noticeable as occasional sudden drops in framerate while the GC does a sweep at a layer of abstraction higher than Unity game developers can control directly.

People went to extreme measures to avoid allocating memory in their games: manually pooling every in-game object & particle, not using string comparisons in C#, etc https://danielilett.com/2019-08-05-unity-tips-1-garbage-coll...

Unity itself finally has a new system they're previewing to average out the GC spikes over time, so a game, say, never drops below 60fps: https://blogs.unity3d.com/2018/11/26/feature-preview-increme...

As well, there is a new way of writing C# code for Unity called ECS that will avoid producing GC sweeps https://docs.unity3d.com/Packages/com.unity.entities@0.1/man...

[1] https://github.com/ivmai/bdwgc

That Unity was using that ancient collector was the main complaint. Switching to an incremental GC is long overdue.

As for pooling objects, you'd go to those "extreme measures" as a matter of course in any other language as well. You wouldn't want to alloc and free every frame no matter the language.

Yes, the Boehm collector is the root of the problem for Unity's runtime. It's just the verbatim code from that repository, in fact. But in other languages, such as C with standard libraries, you can still do common things like comparing strings and calling printf() without inadvertently triggering a dreaded GC sweep. Not so in Unity's C#.

And allocating memory is fine during runtime when you are in control of the allocation and the cleanup, whereas in Unity the sweeps are fully out of your control, expensive, and will just sometimes happen in the middle of the action.

You can just set the GC mode to disabled and run it manually.


That API is new and was just added to 2018.3 in December, around the same time they started previewing the incremental garbage collector and promoting ECS.

From Unity:

> Being able to control the garbage collector (GC) in some way has been requested by many customers. Until now there has been no way to avoid CPU spikes when the GC decided to run. This API allows you disable the GC and avoid the CPU spikes, but at the cost of managing memory more carefully yourself.

It only took them 14 years and much hand wringing from both players and developers to address :)

A similar thing is going on with their nested scene hierarchy troubles, also releasing in 2018.3 with their overhaul to the prefab system, to sort of support what they call "prefabs" having different "prefabs" in their hierarchy without completely obliterating the children. What they have now is not ideal, but they're working on it.

Prior to that, if you made, say, a wheel prefab and a car prefab, as soon as you put the wheels into your car prefab, they lost all relation to being a wheel, such that if you updated the wheel prefab later, the car would still just have whatever you had put into the car hierarchy originally, which naturally has been the source of endless headaches and hacky workarounds for many developers.

All true. However, even before you could disable the GC, you could ignore it as long as you weren't allocating and hit framerate. Unity isn't magic but the way people talk it's like Unity games don't exist.

But things like this would happen, even from acclaimed developers, giving the engine a bad reputation among players:

> The frame-rate difficulties found in version 1.01 are further compounded by an issue common with many Unity titles - stuttering and hitching. In Firewatch, this is often caused by the auto-save feature, which can be disabled, but there are plenty of other instances where it pops up on its own while drawing in new assets. When combined with the inconsistent frame-rate, the game can start to feel rather jerky at times.

That studio is part of Valve now!

> Games built in Unity have a long history of suffering from performance issues. Unstable frame-rates, loading issues, hitching, and more plague a huge range of titles. Console games are most often impacted but PC games can often suffer as well. Games such as Galak-Z, Roundabout, The Adventures of Pip, and more operate with an inherent stutter that results in scrolling motion that feels less fluid than it should. In other cases, games such as Grow Home, Oddworld: New 'n' Tasty on PS4, and The Last Tinker operate at highly variable levels of performance that can impact playability. It's reached a point where Unity games which do run well on consoles, such as Ori and the Blind Forest or Counter Spy, are a rare breed.



Hopefully as the engine continues to improve dramatically, this kind of thing will be left in the past

If you continue reading that article, they go on to say Unity is not to blame...

There's a lot of reasons shipping Unity games is hard but the GC and C# are not among them or at least much lower than, say, dealing with how many aspects of the engine will start to run terribly as soon as an artist checks a random checkbox.

> In its current iteration, console developers familiar with the tech tell us that the engine struggles with proper threading, which is very important on a multi-core platform like PlayStation 4.

> This refers to the engine's ability to exploit multiple streams of instructions simultaneously. Given the relative lack of power in each of the PS4's CPU cores, this is crucial to obtaining smooth performance. We understand that there are also issues with garbage collection, which is responsible for moving data into and out of memory - something that can also lead to stuttering. When your game concept starts to increase in complexity the things Unity handles automatically may not be sufficient when resources are limited.

Almost a decade ago I was writing a desktop toolkit in Lua 5.1 with GDK/cairo bindings. Let’s say it was for fun, because it never saw the light of day in the business it was planned for. But it had animations and geometry dynamics (all software, no hardware acceleration). While the GC seemed fast and the data/widget count was small, it suddenly froze every dozen seconds for a substantial amount of time. The first thing I tried was to trigger a [partial] collection after every event, but what happened (I believe) was that the incremental^ steps still accumulated into one big stop. I also went through all the critical paths and made allocations static-y. It got better, but was never resolved completely. I didn’t investigate it further, and my hardware was far from bleeding edge, so I draw no big conclusion. But since then I think twice before doing anything frametime-critical with a GC. In my book, each specific implementation needs at least a handful of wild tests before it can be considered an option.

As others have probably already mentioned, the worst side of GC is that it is unpredictable and hard to force into a pattern that matches your specific workload. With manual/RAII memory management you can make pauses predictable, avoid accumulating collection debt, and fit the work in before the “retrace” perfectly. Also, simply relying on automatic (stack) storage in time-critical paths is usually not an option in GC environments.

^ if any, I can’t recall if 5.1 actually implemented true incremental gc back then

>worst side of gc is that it is unpredictable

This is simply not the case. It's still just code after all. The problem was you were fighting the GC but that's just the symptom. The clear problem was leaking something every frame. With all the tooling these days it's pretty easy to see exactly what is getting allocated and garbage collected so you know where to focus your efforts.

I started with C actually, but good ui is hard, and I decided to use Lua for a reason of not writing it all in C and hand-optimized style. I was forced to write ugly non-allocating code anyway, which defeated that idea completely. To focus your efforts, there must be a reason for these efforts in the first place.

But you must understand that lots of games are able to make this work. Call of Duty, for example, uses Lua for its UI scripting and is able to hit framerate. WoW uses Lua as well. It's not impossible or even impractical. There's plenty of reason to use a higher level language even if your core loops are hand optimized.

>> The reality is you still have to free resources,

Not exactly. Here is how the early PC 3D games I worked on did that: they would have a fixed-size data buffer initialized for each particular thing you needed a lot of, such as physics info, polygons, and path data, in sort of a ring buffer. A game object would have a pointer to each segment of that data it used. If you removed a game object, you would mark the segment it pointed to as unused. When a new object was created, a manager would return a pointer to an unused ("dirty") segment of the buffer, which the new object would overwrite with its data. Memory was initialized at load and remained constant.

One problem with doing things like that is that you would have a fixed pool. So there were like 256 possible projectiles in Battlezone (1998) in the world at any time, and if something fired a 257th, an old one just ceased to exist. Particle systems worked that way as well.
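A minimal C sketch of that fixed-pool behavior, assuming a single projectile type; `Projectile`, `ProjectilePool`, `projectile_spawn` and `MAX_PROJECTILES` are hypothetical names, not Battlezone's actual code. The pool is a fixed array used as a ring, so once all 256 slots have been handed out, the next spawn recycles the slot that was spawned longest ago, whether or not it is still flying.

```c
#include <stddef.h>
#include <stdbool.h>

#define MAX_PROJECTILES 256  /* hypothetical fixed cap, chosen at compile time */

typedef struct {
    float x, y, z;
    float vx, vy, vz;
    bool  active;
} Projectile;

typedef struct {
    Projectile slots[MAX_PROJECTILES]; /* allocated once, never grows */
    size_t     next;                   /* ring cursor: next slot to hand out */
} ProjectilePool;

/* Hand out the next slot in ring order. That slot is always the one spawned
   longest ago; if it is still active, the old projectile just ceases to exist. */
Projectile *projectile_spawn(ProjectilePool *pool) {
    Projectile *p = &pool->slots[pool->next];
    pool->next = (pool->next + 1) % MAX_PROJECTILES;
    *p = (Projectile){0};
    p->active = true;
    return p;
}

/* "Freeing" only marks the slot unused; no memory is returned anywhere. */
void projectile_kill(Projectile *p) {
    p->active = false;
}
```

The simplification here is that the cursor always advances; a manager that first looked for an already-unused slot would evict less eagerly, at the cost of a search.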

What was good about that was that you could perform certain calculations relatively fast because all the data was the same size and inline, so it was easy to optimize. I worked on a recent game in C# and the path finding was actually kind of slow even though the processor the game ran on was probably like 100 times (or more) faster. I understand there are ways to get C# code to create and search through a big data structure as fast as the programmers had to do it in C in the 90's. However it would probably involve creating your own structures rather than using standard libraries, so no one did it like that.

> So there were like 256 possible projectiles in Battlezone(1998)

Just have to say, loved that game to bits.

I think so; the thing I know for sure is that there were 1,024 total physics objects (tanks and pilots) active in the world at any time. So if you built a bunch of APCs and launched them all at a target at the same time, at some point you wouldn't be able to spawn soldiers. No one seemed to mind, because in those days the bar was lower.

Not a game developer, but I used to write UI addons for World of Warcraft. WoW allows you to customize your UI heavily with these Lua plugins ("addons") and Lua is garbage collected. It's a reasonable incremental GC so it shouldn't be too bad in theory.

But in practice it can be horrible. You end up writing all kinds of weird code just to avoid allocations in certain situations. And yeah, GC pauses due to having lots of addons are definitely very noticeable for players.

Also leads to fun witch hunts on addons using "too much" memory, people consider a few megabytes a lot because they confuse high memory usage with high allocation rate... Our stuff started out as "lightweight" but it grew over the years. We are probably at over 5 MB of memory with Deadly Boss Mods (many WoW players can attest that it's certainly not considered lightweight nowadays, but I did try to keep it low back then). But I think we still do a reasonable job at avoiding allocations and recycling objects...

The point is: I had to spend a lot of time thinking about memory allocations and avoiding them in a system that promised to handle all that stuff for me. Frame drops are very noticeable. But there are languages with better GCs than Lua out there...

Somewhat related: I recently wrote about garbage collectors and latency in network drivers in high-level languages: https://github.com/ixy-languages/ixy-languages Discussed here: https://news.ycombinator.com/item?id=20945819

That's a scenario where short GC pauses matter even more, as the hardware usually only buffers a few milliseconds' worth of data at high speeds.

I’ve shipped C++ games and Unity games. I have never spent more time managing memory than in C#. You have to jump through twisted, painful hoops to avoid the GC.

Memory management is ultimately far simpler and easier in C++ than in C#.

I do gamedev in Unity and avoiding garbage creation feels easy enough to me for the most part.

It could be worse. Once upon a time a common "trick" was to avoid using foreach loops because they generated garbage. LINQ also generated garbage. I'm not sure if the .net 4.5 upgrade fixed that one.

Ultimately you wind up doing the exact same thing in C# that you'd do in C++. Aggressive pooling, pre-allocation, etc. A handful of Unity constructs generate garbage and can't be avoided. I believe enable/disabling a Unity animator component, which you'd do when pooling, is one such example.

It's all just a little extra painful because you have to spend all this time avoiding something the language is built around. Trying to hit 0 GC is annoying.

It also means, somewhat ironically, that GC makes the cost of allocation significantly higher than a C++ allocation. In C++ you're going to pay a small price the moment you allocate. But you know what? That's fine! Allocate some memory, free some memory. It ain't free, but it is affordable.

In Unity generating garbage means at some random point in the future you are going to hitch and, most likely, miss a frame. And that's if you're lucky! If you're unlucky you may miss two frames.

yeah the Go garbage collector is a LOT more sophisticated than Lua. I might be playing WoW Classic right now ...

> I love this opinion from games programmers because they never qualify it and talk about what their latency budgets are and what they do in lieu of a garbage collector.

Bro, game devs talk about this non-stop. There are probably 1000 GDC talks about memory management.

Game devs don't spell the fine details because they are generally talking to other game devs and there is assumed knowledge. Everyone in games knows about memory management and frame budgets.

> If you have a garbage collector that runs in 200us, you could run a GC on every single frame and use less than 3% of your frame budget on the GC pause.

And if pigs could fly the world would be different. ;)

Go has a super fast GC now. It had a trash GC (teehee) for many many years. But Go is not used for game dev client front-ends. If C# / Unity had an ultra fast GC that used only 3% of a frame budget that would be interesting. But Unity has a trash GC that can easily take 10+ milliseconds. (Their incremental GC is still experimental.) It's a problem that literally every Unity dev has to spend considerable resources to manage.

For 50+ years GC has been a poor choice for game dev. Maybe at some point that will change! The status quo is that GC sucks for games. The onus is on GC to prove that it doesn't suck.

Unity is frustrating because the regular C# CLR (and Mono) have a much better generational GC with fast Gen 0 and Gen 1 collections. Unity's crappy GC is always slow.

For games generally you want zero allocations or frees during gameplay. For a GC'd language you want to avoid the collector running during gameplay.

Having done both they are remarkably similar. Lots of pooling and pre-allocation. Most GC based languages are a bit harder both because they tend not to give you a chunk of memory you can do whatever with and require a lot of knowledge about which parts will allocate behind your back. It's also harder to achieve because you typically get no say when the collector will run.

There are loads of commercial games that pay no attention to this though written in both kinds of language.

Here are some common alternatives I've seen and used:

1) Don't allocate or free resources during gameplay. Push it to load time instead. This works for things like assets that don't change much.

2) Use an arena that you reset regularly. This works well for scratch data in things like job systems or renderers.

3) Pick a fixed maximum number of entities to handle, based on hardware capacity, and use an object pool. This works well for things that actually do get created and destroyed during gameplay, on a user-visible timescale.

Together, these get you really far without any latency budget going toward freeing resources. And there is always something else you could put that budget toward instead, which is why any amount of GC is often seen as a waste.
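Item 3 might look something like the following in C: a sketch of a fixed-capacity pool with an index-based free list, so acquiring and releasing are O(1) and never touch malloc/free. `Entity`, `EntityPool`, `pool_acquire`, `pool_release` and `MAX_ENTITIES` are hypothetical, and the sketch does not guard against releasing the same index twice.

```c
#include <stdint.h>

#define MAX_ENTITIES  1024        /* hypothetical cap, picked from a hardware budget */
#define INVALID_INDEX UINT32_MAX

typedef struct {
    float x, y;
    int   hp;
} Entity;

typedef struct {
    Entity   items[MAX_ENTITIES];
    uint32_t next_free[MAX_ENTITIES]; /* free list threaded through indices */
    uint32_t free_head;
    uint32_t live_count;
} EntityPool;

/* Chain every slot into the free list: 0 -> 1 -> ... -> MAX_ENTITIES-1. */
void pool_init(EntityPool *pool) {
    for (uint32_t i = 0; i < MAX_ENTITIES; i++)
        pool->next_free[i] = (i + 1 < MAX_ENTITIES) ? i + 1 : INVALID_INDEX;
    pool->free_head = 0;
    pool->live_count = 0;
}

/* O(1): pop the head of the free list. Returns INVALID_INDEX when the pool
   is full instead of growing. */
uint32_t pool_acquire(EntityPool *pool) {
    uint32_t idx = pool->free_head;
    if (idx == INVALID_INDEX) return INVALID_INDEX;
    pool->free_head = pool->next_free[idx];
    pool->items[idx] = (Entity){0};
    pool->live_count++;
    return idx;
}

/* O(1): push the slot back onto the free list; nothing is freed. */
void pool_release(EntityPool *pool, uint32_t idx) {
    pool->next_free[idx] = pool->free_head;
    pool->free_head = idx;
    pool->live_count--;
}
```

Returning `INVALID_INDEX` when full, rather than growing, is the point: the cap is a design decision made up front, and the caller chooses whether to drop the spawn or recycle something.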

I used to work on realtime graphics code which was used in flight simulators for the FAA and USAF, then moved into the games industry and led the design of a couple of game engines, to give you an idea where I'm coming from. The military grade flight sims actually had contract requirements like, "you may never miss a single frame in a week of usage", etc, so solving this problem was critical.

When working on code such as game engines, it's not simply a matter of how long something takes to do, but also when it happens and how predictable it is. If I knew that GC takes 1ms, I could schedule it just after I call swapBuffers and before I begin processing data for the next frame, which isn't a latency critical portion.

The problem is that GC is unpredictable because the mark and sweep can take a long time, and this will prevent you from meeting your 60fps requirement.

In practice, we hardly ever do any kind of GC for games because we don't really need to. We use block memory allocators like Doug Lea's malloc (dlmalloc) in fixed memory pools if we must, but generally, you keep separate regions for things like object descriptors, textures, vertex arrays, etc. There's a ton of data to manage which can't be interleaved together in a generic allocator, so once you've gone that deep, there's really no point in using system malloc.

Malloc itself isn't a problem either, it's quick. It's adding virtual space via sbrk() and friends which can pause you, so we don't. On consoles, we have a fixed memory footprint, and on a bigger system, we pre-allocate buffers which we will never grow, and that's our entire footprint. We then chunk it up into sections for different purposes, short lived objects, long lived objects, etc. Frequently, we never even deallocate objects one by one. A common tactic is to have a per-frame scratch buffer, and once you're done rendering a frame, you simply consider that memory free - boom, you've just freed thousands of things without doing anything.
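A per-frame scratch buffer is usually a bump (linear) allocator over memory reserved at startup. The sketch below assumes that; `FrameArena` and its functions are hypothetical names, and `align` is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* A fixed region carved out of the up-front footprint; it never grows. */
typedef struct {
    unsigned char *base;
    size_t         capacity;
    size_t         used;
} FrameArena;

void arena_init(FrameArena *a, void *memory, size_t capacity) {
    a->base = memory;
    a->capacity = capacity;
    a->used = 0;
}

/* Bump allocation: round the current position up to `align` (a power of two),
   then advance. Returns NULL if this frame's budget is exhausted. */
void *arena_alloc(FrameArena *a, size_t size, size_t align) {
    uintptr_t cur = (uintptr_t)(a->base + a->used);
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->capacity || size > a->capacity - offset) return NULL;
    a->used = offset + size;
    return a->base + offset;
}

/* End of frame: everything allocated this frame is released at once. */
void arena_reset(FrameArena *a) {
    a->used = 0;
}
```

Calling `arena_reset` once per frame is what frees the thousands of allocations at once; nothing is tracked individually, so anything that must outlive the frame has to live in a different region.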

There are many things you can do instead of generic GC which are far better for games.

I disagree with the author of the original article about C++. C++ is as complex as you want to make it; you don't have to use all the rope it hands you to hang yourself. However, having the std library with data structures, smart pointers, and the ability to do RAII is invaluable to me. Smart usage of std::shared_ptr gets you automatic (reference-counted) GC. What you don't have is cycle detection, so you take care never to have cycles by using std::weak_ptr where necessary, and you get all the behavior of automatic GC without stopping the world.

> C++ is as complex as you want to make it

I second that. I always find the attitude of “C++ is bad so I’m going to stick to C” really bizarre. You can use C++ as a better C.

- use type inference and references instead of pointers. Writing C style code with these features makes it more readable.

- Don’t like OO programming? Stick to structs with all members public. It’s going to be a lot better than doing the same thing in C, and you won’t need void* casts.

- use exceptions instead of return code for errors, so that your code is not peppered with if statements at every line you call. With an exception you’ll get additional info about crashes in dumps.

I wrote for an embedded system where the C++ standard library was not available, in a previous life. I ended up writing my code in C++ and “re-inventing” a couple of useful classes like std::string and std::vector. For the most part my code was very C like...

> use exceptions instead of return code for errors

I happen to have the exact opposite opinion. Exception handling tends to feel too "magical" (read: non-deterministic, hard to behaviorally predict, etc.) relative to just returning an error code.

Exception handling doesn't work very well. It requires exception frames on the stack, and those go missing sometimes, leading to leaked exceptions and unexpected crashes. For example, if you pass a C++ callback function into a C callback, say in libcurl, the C library doesn't have exception handler frames in the stack, so exceptions are lost. If your C++ code throws from within that C callback, any catch above the C language boundary won't work. This is just one of countless gotchas. Don't use exceptions.

Don't pass functions to libcurl that throw.

Otherwise, exceptions make code simpler, cleaner, and more reliable. They will never "go missing" unless you do something to make them go missing. Code that does those things is bad code. Don't write bad code. Do use exceptions.

Well, of course, that's how you fix such a bug, however, when using third party libraries and other people's code, you don't necessarily know that this is happening until you have missing exceptions.

In my opinion, exceptions also complicate code by taking error handling out of the scope of the code you are writing: you can't know whether anything you call throws, so you may not catch, and some higher-level code will catch the exception after it has lost the context on how to handle it. If exceptions work well in a language, like Python or Java, by all means use them, but in C++ they've caused too many problems for me to continue doing so. Even Google bans them in their C++ style guide.

If you don't know what functions you are passing to third-party libraries, you have a much bigger problem than any language can help you with.

Google's proscription on exceptions is purely historical. Once they had a lot of exception-unsafe code, it was too late to change. Now they spend 15-20% of their CPU cycles working around not being able to use RAII.

When you write modern software, most of your executable ends up being third party code for commonly done things, it's both wonderful since you can move fast, and a curse, because you inherit others' bugs.

My example with libcurl was very simple, but in real systems, things are less clean. You may be using a network tool that hides libcurl from you, and your code is being passed into a callback without you even knowing it until it causes a problem. Other places where C++ exceptions fail would be when you dlopen() a module, or one of your binary C++ dependencies is compiled without exception handlers. The code will compile just fine, but exceptions will be lost, there's not so much as a linker warning.

Google uses RAII just fine; you just have to be careful with constructors/destructors since they can't return an error. There's no way it burns 15-20% CPU not using RAII; where are you getting these numbers? I used to work there, in C++ primarily, so I'm well familiar with their code base.

In other words, you can't do RAII.

Instead you call a constructor that sets a dumb default state, and then another, "init()" function, that sets the state to something else, and returns a status result. But first it has to check if it is already init'd, and maybe deinit() first, if so, or maybe fail. Then you check the result of init(), and handle whatever that is, and return some failure. Otherwise, you do some work, and then call deinit(), which has to check if it is init'd, and then clean up.
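For illustration, the two-phase lifecycle described above written out in plain C, where it is the only option anyway. `TextureCache` and its functions are hypothetical; the point is the shape: a trivial construct step, a fallible `init()` that returns a status and checks for double initialization, and a `deinit()` that must check state before cleaning up.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef enum { STATUS_OK = 0, STATUS_ALREADY_INIT, STATUS_NO_MEMORY } Status;

/* Hypothetical resource owner. "Construction" only sets a dumb default
   state; all fallible work happens in texture_cache_init(). */
typedef struct {
    bool   initialized;
    void  *storage;
    size_t capacity;
} TextureCache;

void texture_cache_construct(TextureCache *tc) {
    tc->initialized = false;
    tc->storage = NULL;
    tc->capacity = 0;
}

/* Fallible setup; the caller must check the returned status. */
Status texture_cache_init(TextureCache *tc, size_t capacity) {
    if (tc->initialized) return STATUS_ALREADY_INIT; /* or deinit first */
    tc->storage = malloc(capacity);
    if (tc->storage == NULL) return STATUS_NO_MEMORY;
    tc->capacity = capacity;
    tc->initialized = true;
    return STATUS_OK;
}

/* Teardown has to re-check the state before cleaning up. */
void texture_cache_deinit(TextureCache *tc) {
    if (!tc->initialized) return;
    free(tc->storage);
    texture_cache_construct(tc); /* back to the dumb default state */
}
```

Every one of those state checks is something a constructor/destructor pair with exceptions would make unnecessary, which is the trade-off being argued about here.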

I knew someone else who worked at Google. 15-20% was his estimate. Bjarne ran experiments and found overheads approaching that.

> Don't pass functions to libcurl that throw.

Unless you know the details of how every function (and every function that function calls, and so on) handles errors, the easiest way to not pass functions to libcurl that throw is to not write those functions in a programming language with exceptions :)

That is easy only if you are writing nothing but callbacks. Every single other thing gets harder.

> Exception handling tends to feel

Do you program a lot with your feelings?


> The reality is you still have to free resources, so it's not like the garbage collector is doing work that doesn't need to be done.

As others have pointed out, this isn't true for many video games. People write things so they're allocated upfront as much as possible and reuse the memory. A lot of old games had fixed addresses for everything as they used all available memory (this allowed for things like Gameshark cheats).

But there's a secondary issue. A GC imposes costs for the memory you never free. It doesn't know the data will never be freed, so it'll periodically check to see if the data is still reachable. Some applications have work arounds for this like using array offsets instead of pointers/references (fewer pointers to follow).

The last time I encountered GC issues in gamedev, we were seeing 100ms GC pauses (that's 6 dropped frames at 60fps!) every 30 seconds or so. We had poor insight into why, and what we could glean of "why" was that it was a widespread problem with how our designers were prototyping stuff in JS (more specifically, they were writing it as regular JS, instead of jumping through hoops to try and relieve GC pressure through pooling objects) with no clear single cause to blame.

Other issues I've seen include nondeterministic GC sometimes failing to garbage collect one level before we loaded another, OOMing the game. Have you ever forced a GC three times in a row to try and ensure you're not doubling your memory use on a memory constrained platform? I have. (This was exacerbated by cycles between the GCed objects and traditionally refcounted objects - you'd GC, which would run finalizers that would decref, which in turn would unroot some GCed objects allowing them to be collected, which in turn run more finalizers, which would decref more objects, ...)

The last time I encountered a similar (de)alloc spike in a non-GC gamedev system was much longer ago, and it was a particular gameplay system freeing a ton of graphics resources in a single frame, when doing something akin to a scene transition or reskinning of the world - which was easily identified with standard profiling tools, and easily fixed in under a day's work by simply amortizing the cost of freeing the resources over a few frames by delaying deallocs with a queue. More commonly there were memory leaks from refcounted pointer cycles, but those were also generally pretty easily identified and fixed with standard memory leak detection tools.
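The amortization fix described above can be as small as a bounded queue drained a fixed amount per frame. This is a sketch under assumptions: `FreeQueue`, `FREES_PER_FRAME` and the other names are hypothetical, and `free()` stands in for whatever release call the graphics API actually needs.

```c
#include <stddef.h>
#include <stdlib.h>

#define FREE_QUEUE_CAP  4096
#define FREES_PER_FRAME 64   /* hypothetical tuning knob: max releases per frame */

/* Gameplay code enqueues releases instead of performing them immediately;
   the frame loop drains a bounded number each frame. */
typedef struct {
    void  *pending[FREE_QUEUE_CAP];
    size_t head, count;
} FreeQueue;

/* Returns 0 if the queue is full so the caller can fall back to freeing now. */
int free_queue_push(FreeQueue *q, void *ptr) {
    if (q->count == FREE_QUEUE_CAP) return 0;
    q->pending[(q->head + q->count) % FREE_QUEUE_CAP] = ptr;
    q->count++;
    return 1;
}

/* Call once per frame: releases at most FREES_PER_FRAME items, so a
   scene transition's worth of frees is spread across several frames. */
size_t free_queue_drain(FreeQueue *q) {
    size_t n = q->count < FREES_PER_FRAME ? q->count : FREES_PER_FRAME;
    for (size_t i = 0; i < n; i++) {
        free(q->pending[q->head]);
        q->head = (q->head + 1) % FREE_QUEUE_CAP;
    }
    q->count -= n;
    return n;
}
```

With 150 pending releases, three frames of draining would release 64, 64 and then 22, instead of all 150 landing in one frame.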

The problem isn't GC per se. It's that most/all off-the-shelf language-intrinsic GCs are opaque, unhookable, uncustomizable, unreplaceable black boxes which everything is shoved into. malloc/free? Easily replaced, often hookable. C++'s new/delete? Overloadable, easily replaced, and you have the tools to invoke ctors/dtors yourself from your own allocators. Localized GC for your lock-free containers à la crossbeam_epoch? Sure, perfectly fine. Graphics resource managers often end up becoming something similar to GCed systems on steroids, carefully managing when resources are created and freed to avoid latency spikes or collecting things about to be used again soon.

But the GCs built into actual languages? Almost always a pain in the ass.

> The problem isn't GC per se.

In other words, the problem is GC, full stop.

Thing is, I've helped ship my share of GCs in games, and they're not always a problem.

UI solutions often/usually use some kind of GCed language for scripting. But the scale and scope of what UIs are dealing with are small enough that we don't see 100ms GC spikes.

Our tools and editors leverage GCed languages a lot - python, C#, lua, you name it - and as long as their logic isn't spilling into the core per-frame per-entity loops, the result is usually tolerable if not outright perfectly fine. We can afford enough RAM for our dev machines that some excess uncollected data isn't a problem.

And with the right devs you can ship a Unity title without GC related hitching. https://docs.unity3d.com/Manual/UnderstandingAutomaticMemory... references GC pauses in the 5-7ms range for heap sizes in the 200KB-1MB range for the iPhone 3 [1]. That's monstrously expensive - 1/3rd of my frame budget in one go at 60fps, when I frequently go after spikes as small as 1ms for optimization when they cause me to miss vsync - but possibly manageable, especially if the game is more GPU-bound than CPU-bound. It certainly helps that Unity actually has some decent tools for figuring out what's going on with your GC, and that C# has value types you can use to reduce GC pressure for bulk data much more easily.

[1] Okay, these numbers are pretty clearly well out of date if we're talking about the iPhone 3, so take those numbers with a giant grain of salt, but at the same time they sound waaay more accurate than the <1ms for 18GB numbers I'm hearing elsewhere in the thread, based on more recent personal experience.

Right, well-contained GC in non-critical threads is not a problem. Once you get it trying to scan all of memory, your caches get polluted, and only global measurements can tell you what the performance impact really is.

GC enthusiasts always lie about their performance impact, almost always unwittingly, because their measurements only tell them about time actually spent by the garbage collector, and not about its impact on the rest of the system. But their inability to measure actual impact should not inspire confidence in their numbers.

One of the key other things that surprisingly hasn't been mentioned is it's super important to control memory layout.

GC pause times are relevant but not critical. What's critical is that I can lay out my objects to maximize locality, prefetching, and SIMD compatibility. Look up Array of Structs vs Struct of Arrays for discussions on aspects of this.
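To make the AoS vs. SoA distinction concrete, a small C sketch with hypothetical particle types: both versions integrate positions, but the SoA loops read only the contiguous position and velocity arrays, while the AoS loop pulls every particle's color and age fields through the cache as well.

```c
#include <stddef.h>

#define N_PARTICLES 1024

/* Array of Structs: fields are interleaved per particle, so a loop that only
   needs positions and velocities still drags color/age bytes through cache. */
typedef struct {
    float x, y, z;
    float vx, vy, vz;
    float r, g, b, a;
    float age;
} ParticleAoS;

/* Struct of Arrays: each field is its own contiguous array. */
typedef struct {
    float x[N_PARTICLES], y[N_PARTICLES], z[N_PARTICLES];
    float vx[N_PARTICLES], vy[N_PARTICLES], vz[N_PARTICLES];
    float r[N_PARTICLES], g[N_PARTICLES], b[N_PARTICLES], a[N_PARTICLES];
    float age[N_PARTICLES];
} ParticlesSoA;

void integrate_aos(ParticleAoS *p, size_t n, float dt) {
    for (size_t i = 0; i < n; i++) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}

/* Each loop streams through two dense float arrays: a simple shape that
   compilers can typically auto-vectorize. */
void integrate_soa(ParticlesSoA *p, size_t n, float dt) {
    for (size_t i = 0; i < n; i++) p->x[i] += p->vx[i] * dt;
    for (size_t i = 0; i < n; i++) p->y[i] += p->vy[i] * dt;
    for (size_t i = 0; i < n; i++) p->z[i] += p->vz[i] * dt;
}
```

Whether the SoA version is actually faster depends on the access pattern and the compiler; the layout control itself is what most GC'd runtimes make hard to get.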

This is not strictly incompatible with a GC in theory, however it's common that in GC'd languages this is either very difficult or borderline impossible. The JVM's lack of value types, for example, makes it pretty much game over. Combining a GC with good cache locality and SoA layout is possible, but it doesn't really exist, either. Unity's HPC# is probably the closest, but they banned GC allocations to do it.

You somewhat explained exactly why a garbage collector can't work. Just because you need to free up memory, doesn't mean a garbage collector is the only solution. There's a big difference over having control of when to release the memory, or having no control of that. For instance, if you know that there are no user inputs or any kind of interactive things going on, you can afford to take a couple millisecond hit freeing memory.

By game programmers you probably mean AAA console game programmers. I was once one, and I know that those games have to account for every byte of memory. Garbage collection has not only cpu overhead which can be unpredictable, it also requires memory to potentially sit around unused waiting for the next cycle. Your question is what is our budget for gc in terms of cpu work and memory overhead, and the answer is zero for both.

If you cannot be certain how long garbage collection will run, it is almost certainly a bad idea to have it in a game.

Most times it may run in 200 microseconds, but the one time it takes 20ms the user suffers from unacceptable stutter.

I can see how people would want to bypass the problem entirely by being a bit more careful up front.

> The reality is you still have to free resources, so it's not like the garbage collector is doing work that doesn't need to be done.

The garbage collector also needs to track resources, which is an additional cost over just freeing them. You have little control over how the memory is allocated, which is an additional cost over a design that intelligently uses different allocation strategies for different types of resources. Then, even if you can control when the garbage collector is invoked, you have little control over what actually gets freed. What good are those 200µs if the stuff eating up memory isn't actually getting freed fast enough?

Maybe people often overestimate their performance needs. A garbage collector may be fast enough for more purposes than anticipated. Even then, managing memory intelligently may seem like a small price to pay compared to the prospect of eventually fighting a garbage collector to get out of a memory bottleneck.
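Here is one example of picking an allocation strategy for a particular kind of resource. A common choice for data that only lives for one frame is a linear (bump) allocator. This is a minimal sketch with made-up names (`FrameArena`, `arena_alloc`, `arena_reset`). Allocating is a pointer bump, and freeing a whole frame's worth of data is a single assignment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame linear ("bump") allocator for short-lived data. */
struct FrameArena {
    unsigned char *base;
    size_t capacity;
    size_t used;
};

void arena_init(struct FrameArena *a, void *buffer, size_t capacity) {
    a->base = buffer;
    a->capacity = capacity;
    a->used = 0;
}

/* Bump-allocate `size` bytes with a power-of-two alignment.
   Returns NULL when this frame's budget is exhausted. */
void *arena_alloc(struct FrameArena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)(a->base + a->used);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->capacity || size > a->capacity - offset) return NULL;
    a->used = offset + size;
    return a->base + offset;
}

/* Everything allocated this frame is released at once: no per-object
   bookkeeping and nothing to trace. */
void arena_reset(struct FrameArena *a) {
    a->used = 0;
}
```

The backing buffer would typically be reserved once at startup, so nothing here touches the system heap during a frame.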

I just want to say that no, you don’t need to free the resources. It's very possible to use a fixed amount of heap memory during the whole lifecycle of your game. In fact, I think that’s what people should aim for in 90% of cases.

You can do the same thing in GC'd languages, though.

Here's the easy question: can you do manual GC in a GC'd environment? Yes, of course you can.

Here's the harder question: how hard is it to do manual GC in a GC'd environment vs. a non-GC'd environment?

"Environment" is the key word here. Because if you're writing in a GC'd environment, there's a good chance that existing code - third party, first party, wherever it comes from - assumes that it can heap allocate objects and discard them without too much thought.

So for small scale optimizations where you own it all, it can work out fine. But if that optimization needs to bust through an abstraction layer, all of a sudden the accounting structure of the whole program has changed, and the optimization has turned into a major surgical operation.

This was almost exactly my experience when I wrote a network simulator in Go and tried to improve the performance. The gc pauses reached tens of seconds haha, I made a lot of garbage.

I'm not completely sure this is true, but I think doing things in an "OO" style, where, for example, every kind of event was its own type satisfying the Event interface, means that different events can't share an array, and each one may hold a pointer to some heap-allocated memory. So you really can't optimise this away without ripping up the entire program.

Rather than do so, I ended up running my Sim on a server with >100 cores, it was single threaded but they would all spin up and chomp the GC, a beautiful sight.
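For comparison, the usual C way to let different event kinds share one contiguous array is a tagged union. This sketch is only an illustration: it is C rather than the commenter's Go, and the event kinds and fields are hypothetical. Every `struct Event` has the same size, so the queue is a flat array with no per-event heap allocation, and dispatch is a `switch` on the tag:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds for a network simulator. */
enum EventKind { EV_PACKET_SENT, EV_PACKET_RECEIVED, EV_TIMEOUT };

/* Every event has the same size, so events of different kinds can sit side
   by side in one flat array. */
struct Event {
    enum EventKind kind;
    double time;
    union {
        struct { int src, dst, bytes; } packet; /* sent / received */
        struct { int timer_id; } timeout;
    } as;
};

#define MAX_EVENTS 4096

struct EventQueue {
    struct Event events[MAX_EVENTS];
    size_t count;
};

void push_event(struct EventQueue *q, struct Event e) {
    assert(q->count < MAX_EVENTS);
    q->events[q->count++] = e;
}

/* Dispatch with a switch on the tag instead of interface method calls. */
long total_bytes_received(const struct EventQueue *q) {
    long total = 0;
    for (size_t i = 0; i < q->count; i++) {
        const struct Event *e = &q->events[i];
        switch (e->kind) {
        case EV_PACKET_RECEIVED:
            total += e->as.packet.bytes;
            break;
        case EV_PACKET_SENT:
        case EV_TIMEOUT:
            break;
        }
    }
    return total;
}
```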

Another factor is just the general lack of transparency or knowability of where and how objects occupy memory in these languages.

If memory management is likely to be a concern it is absolutely much easier in an environment where it is prioritized than one where it is ignored.

In which case the garbage collector offers no advantage; it's only a trap you have to remember to step over.

It serves the huge purpose of doing 100% of the tracking work. What's easier? Manually tracking all of your allocations and frees, as well as worrying about double frees and pointer ownership, or calling GC.Collect() during a load screen?

What's easier? Knowing that you can use all your memory again after you've explicitly freed the resources you used, or knowing that you can use all your memory again after a call to GC.Collect() does god-knows-what?

I think the garbage collector is easier over all, obviously.

> The reality is you still have to free resources

Err not so fast there. It's quite common in C to allocate an array of objects (not pointers) and reuse them as they expire. Memset is enough to reinitialize them.

And the main point: lots of game dev is "C wrapped in C++". Game devs tend to rewrite everything for performance and predictability, so relying on the STL is usually a no-go.
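Here is a minimal sketch of the reuse-in-place pattern described above, using a hypothetical `Bullet` type and a fixed array of 128 objects. Spawning scans for an expired slot and reinitializes it with `memset`. This relies on all-bits-zero being a valid initial state, which holds for `bool` and, on IEEE 754 targets, for `float`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_BULLETS 128

/* Hypothetical bullet type; the array holds objects, not pointers. */
struct Bullet {
    bool alive;
    float x, y, vx, vy;
    float ttl; /* seconds left before the bullet expires */
};

static struct Bullet bullets[MAX_BULLETS];

/* Reuse the first expired slot. memset zeroes every field, which is a
   valid "fresh" state here (alive == false, floats 0.0f on IEEE 754). */
struct Bullet *spawn_bullet(float x, float y, float vx, float vy) {
    for (size_t i = 0; i < MAX_BULLETS; i++) {
        if (!bullets[i].alive) {
            struct Bullet *b = &bullets[i];
            memset(b, 0, sizeof *b);
            b->alive = true;
            b->x = x; b->y = y; b->vx = vx; b->vy = vy;
            b->ttl = 2.0f;
            return b;
        }
    }
    return NULL; /* array full: caller decides (drop it, recycle oldest, ...) */
}

void update_bullets(float dt) {
    assert(dt >= 0.0f);
    for (size_t i = 0; i < MAX_BULLETS; i++) {
        struct Bullet *b = &bullets[i];
        if (!b->alive) continue;
        b->x += b->vx * dt;
        b->y += b->vy * dt;
        b->ttl -= dt;
        if (b->ttl <= 0.0f) b->alive = false; /* expired: slot is reusable */
    }
}
```

A linear scan is fine for small arrays. Larger pools usually keep a free list instead of scanning.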

I had the very same thought. It is unsubstantiated. But if you consider how a game actually uses memory, it is really not suited to your run-of-the-mill GC: most assets are allocated once per level/scene/whatever and have a pretty well-defined lifetime. What's left happily lives on the stack frame. There may be some room for reference counting of some structures (think multiplayer et al., where the engine does not have full control of the flow). But in general, a GC can provide very little here, at quite some cost.

In particular I know that Go's GC is optimized for very low latency (rather than throughput), and is not a stop the world GC. So I'm wondering why it doesn't work for games? Are the pauses still too high for games, or is he missing something? Benchmarks or concrete results in Go would be great here.

Edit: specifics from https://blog.golang.org/ismmkeynote - Go hugely improved GC latency from 300ms before version 1.5, down to 30ms, then again down to well under 1ms, usually 100-200us for the stop-the-world pause. So I guess it is stop-the-world, but the pauses seem ridiculously small to me. They guarantee less than 500 microseconds "or report a bug", which seems more than fast enough for game framerates (about 16.7ms per frame at 60Hz). Am I missing something?
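For scale, take the 500 microsecond figure at face value and assume the whole pause lands inside a single frame. The worst-case share of the frame budget then works out to:

```latex
\frac{0.5\,\text{ms}}{(1000/60)\,\text{ms}} = \frac{0.5 \times 60}{1000} = 3\%,
\qquad
\frac{0.5\,\text{ms}}{(1000/144)\,\text{ms}} = \frac{0.5 \times 144}{1000} = 7.2\%
```

This counts only the stop-the-world pause, not the concurrent GC work running alongside the game.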

The thing with games compared to regular services is that they start with the bar of rigor a bit higher than normal in all respects.

GCs really let you trivially not care about a LOT of things... Until you need to care. Things like where and when you allocate, the memory complexity of a function call, etc. With games you need to care about all that stuff so much sooner.

Once you're taking the time to count/track allocations anyway, you might as well just do it manually. It just codifies something you're thinking about anyway.

Short stop the world is well and good, but some of the performance impact is pushed to the mutator via write barriers. That can be another can of worms.
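To show what "pushed to the mutator" means, here is a textbook Dijkstra-style insertion barrier for an incremental tri-color collector, written as a hypothetical C sketch. It is not Go's actual barrier, and `Obj`, `write_field`, and the gray stack are made up. Every pointer store in game code has to go through `write_field`, so the collector's bookkeeping becomes a branch, and sometimes a push, on ordinary writes:

```c
#include <assert.h>
#include <stddef.h>

/* Tri-color marking states for a hypothetical incremental collector. */
enum Color { WHITE, GRAY, BLACK };

struct Obj {
    enum Color color;
    struct Obj *field; /* a single pointer field, for brevity */
};

#define GRAY_STACK_MAX 1024
static struct Obj *gray_stack[GRAY_STACK_MAX];
static size_t gray_count;
static int marking_in_progress;

/* Mark a white object gray and queue it for scanning. */
static void shade(struct Obj *o) {
    if (o != NULL && o->color == WHITE) {
        o->color = GRAY;
        assert(gray_count < GRAY_STACK_MAX);
        gray_stack[gray_count++] = o;
    }
}

/* Insertion barrier: while marking runs, the new target of every store is
   shaded, so a black object never ends up pointing at an unvisited white
   one. This extra work on each store is cost moved out of the GC pause and
   into normal game code. */
void write_field(struct Obj *holder, struct Obj *value) {
    if (marking_in_progress) shade(value);
    holder->field = value;
}
```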

maybe this is an ignorant question, but why do people always cite GC latencies in seconds? is there an implied reference machine where the latency is 300ms? I would expect it to vary a lot in the wild based on cpu frequency and cache/memory latency. is there some reason why this doesn't matter as much as I think it does?

How else is it going to be cited?

People understand seconds, and any other measurement would require specifying a lot of computer-specific stuff, and if you're going to do that, then you might as well fully specify the workload, too, to answer those questions before they come.

It isn't meant to be a precise answer, it's meant to put the GC performance broadly in context.

People don’t write games in GC languages, so GC language developers don’t optimize for games, so people don’t write games in GC languages. It’s just a vicious cycle. It never breaks because for big projects there is too much financial risk.

Also, because everybody who tries optimizing a GC language for games fails, badly. And, GC language developers know that if they promote their language for games, any game developers (temporarily) taken in will hate them forever, and bad-mouth their language.

Languages not used for anything anyone cares much about don't attract much bad press. There are reasons why games are written the way they are. It is not masochism.

> I love this opinion from games programmers because they never qualify it and talk about what their latency budgets are and what they do in lieu of a garbage collector.


You don't garbage collect. A AAA game I worked on had pretty much every object type preallocated at boot. Things ran until you hit a loading point, at which point you wrote new object data over the existing objects. This isn't technically garbage collection, but it does the same job. Ever wonder why some games have shit load times? Because that's when the equivalent of GC is happening. Except, like everything else in games, it's coded by the engine team instead of the language implementer, so it's hooked in and looks different. The game knows when it can afford to go to shit, and all of the optimizations have usually been done pre-launch (if it's performant).

Basically, think about how you can either write a function recursively and leverage the stack that is built into the programming language, or write the exact same algorithm as a for loop with a stack object and manage the flow of the recursion yourself. They're essentially the same thing; however, one is in complete control of the application writer and the other is in control of the language. If you use the for loop, you can actually say "I'm going to run out of memory before I hit the stack limit" and do something smart; that's much harder to do with the language's built-in recursion. For instance, you might approach the stack limit in the for loop and just return the current answer.

Essentially, this analogy applies to the difference between garbage collection in many games and garbage collection in garbage-collected languages.
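Here is a small C sketch of the for-loop side of that analogy, using a hypothetical binary `Node` type and a 64-entry stack. Because the program owns the stack, it can notice the limit before hitting it and return the partial answer instead of overflowing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct Node {
    int value;
    struct Node *left, *right;
};

#define STACK_CAP 64

/* Depth-first sum of a binary tree using an explicit, fixed-size stack
   instead of recursion. If pushing another node would exceed STACK_CAP, it
   stops, sets *truncated, and returns the partial sum so far. */
long sum_tree(const struct Node *root, bool *truncated) {
    assert(truncated != NULL);
    const struct Node *stack[STACK_CAP];
    size_t top = 0;
    long sum = 0;
    *truncated = false;
    if (root != NULL) stack[top++] = root;
    while (top > 0) {
        const struct Node *n = stack[--top];
        sum += n->value;
        const struct Node *kids[2] = { n->left, n->right };
        for (int k = 0; k < 2; k++) {
            if (kids[k] == NULL) continue;
            if (top == STACK_CAP) {   /* about to overflow: bail out early */
                *truncated = true;
                return sum;
            }
            stack[top++] = kids[k];
        }
    }
    return sum;
}
```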

The problem with GC is that releasing memory is not deterministic. With a tracing GC, you don't know exactly when a pause will happen, and you don't know how long it will take.

Static lifetime management, by contrast, gives the same benefits to memory errors that static typing does to type-mismatch errors: you know, at compile time, how long an object will live and when it will be released. And armed with this knowledge, you can then more effectively profile and optimize.

There are, however, much better ways to go about it than C provides. RAII in C++ and Rust and even ARC in the iOS runtimes allow you to get most of the benefits of automatic memory management while still providing strict, deterministic lifetime guarantees.

> The reality is you still have to free resources, so it's not like the garbage collector is doing work that doesn't need to be done.

This is quite blatantly false. The reality is that every single time your GC pass traverses an object but doesn't free it, it's wasting time doing work that didn't need to be done.

> 200us

Is the claim that 200us would be the worst case time for the GC to complete its work?

Is this worst case measurement one that the language itself guarantees across all platforms targeted by the game?

If you haven't measured the worst case times, and the system you are using wasn't designed by measuring worst case times, then we're not yet speaking the same language.

You can choose when and how and if to free resources rather than having it be rather opaque with the GC, that is the key.

Though people who make games sometimes overstate the importance of this. Minecraft is arguably the most successful game of all time despite GC pauses. A competitive shooter like CS:GO, however, can't have that.

You're right, this is a good discussion! Unity introduced an incremental GC that splits the work over several frames, so an entire collection is amortized instead of landing in one frame, which makes it close to "free" from any single frame's point of view.


I have mad respect for everyone using low-level C / Assembly / Rust in gamedev. But I have a hard time recommending anything other than C# / Unity for 14- to 15-year-old secondary school students just starting their journey. The wealth of resources, tutorials, and community online is astounding. And as an entry point into VR / AR it's difficult to top ;)

"The reality is you still have to free resources, so it's not like the garbage collector is doing work that doesn't need to be done."

It's common to use stack and pool memory allocators, where this becomes no longer expensive work.
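Here is a minimal sketch of the pool side, assuming fixed-size objects. The `Slot`/`Pool` names and the entity fields are hypothetical. Free slots are threaded into an intrusive free list, so allocating and releasing are each a couple of pointer writes, and there is nothing for a collector to trace:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 256

/* A slot is either a link in the free list or a live object, never both. */
union Slot {
    union Slot *next_free;                    /* valid while the slot is free */
    struct { float x, y, z; int id; } entity; /* valid while in use */
};

struct Pool {
    union Slot slots[POOL_SLOTS];
    union Slot *free_list;
};

/* Thread every slot onto the free list once, up front. */
void pool_init(struct Pool *p) {
    for (size_t i = 0; i + 1 < POOL_SLOTS; i++)
        p->slots[i].next_free = &p->slots[i + 1];
    p->slots[POOL_SLOTS - 1].next_free = NULL;
    p->free_list = &p->slots[0];
}

/* O(1): pop the head of the free list (NULL when the pool is exhausted). */
union Slot *pool_alloc(struct Pool *p) {
    union Slot *s = p->free_list;
    if (s != NULL) p->free_list = s->next_free;
    return s;
}

/* O(1): push the slot back onto the free list. */
void pool_free(struct Pool *p, union Slot *s) {
    assert(s >= p->slots && s < p->slots + POOL_SLOTS);
    s->next_free = p->free_list;
    p->free_list = s;
}
```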

The two most popular engines (Unity and Unreal) both have a GC for “front end” gameplay code. The trade-offs are real: manual resource management would be incredibly complex for large modern games (think open world, multiplayer), but GC spikes are a common issue in both engines.

The places where you generally don’t want a GC are with low level engine code, like the renderer - code that might execute hundreds or thousands of times a frame, where things like data locality become paramount. GC does you zero favors there.

"What they do instead of GC" - with some languages there is very little need for GC. Sometimes it's just 0, sometimes it's only certain limited kinds of allocations, and those can have scheduled releasing/cleaning/GC-like activity.

See also: https://stackoverflow.com/q/147130/1593077

You can just use a GC that doesn't collect until you tell it to, call it during load screens, and preallocate all your object pools. It's not that bad.

The trick is using libraries and techniques that only stack-allocate or use your pools. These are techniques you'd almost certainly use without a GC'd language anyway, but somehow people consider switching to C before trying these techniques in a higher-level language.

I’m not a game programmer but with RAII for example (I realize this is not a C idiom), it really becomes a non-issue if you have things scoped properly. For everything that doesn’t fit into this box you probably wouldn’t be relying on a GC anyway, the GC may do the final memory reap but scope control is still often manual in memory-managed languages.

That's not entirely true. If you only rely on RAII you may end up with millions of shared pointers, each of which needs to be managed independently and having its own overhead. They may also be independently allocated, which means they will be fragmented in memory. Not so simple.

I think it’s safe to say for the general case that if you have millions of widely-shared smart pointers you are not following RAII principles. At that point you are left with poorly-implemented reference counting, with a ton of synchronization overhead.

Which RAII principle would you say would be broken in this scenario?

If the lifetime of your objects is not bound to any scope, I’d say you’re well out of the realm of any real benefit that RAII would provide over GC. This is a continual problem that arises with a shared ownership model. The specific principle this violates is encapsulation: there is still encapsulation at the level of the smart pointer, but not with its consumers.

Indeed, if you have a noticeable number of shared pointers, you are Doing It Wrong. A few here and there at top level are harmless, but if they have leaked into public interfaces or low-level infrastructure, you might better pitch the whole codebase and start over.

Java has low-pause GCs such as Shenandoah and ZGC, and the Zing JVM from Azul Systems has its own pauseless collector, C4 (I believe Azul got there first). What most people don't realize is that Shenandoah, for example, can slow down the allocation rate to keep GC within its time budget.

I think people prefer C and C++ because they can control the memory layout of the app, and especially in console games they can do much more low-level optimisation than in a higher-level language. Besides, I'm guessing the pain in C++ doesn't come from memory allocation; you probably follow a well-defined ruleset for freeing objects and know you will be OK.

> what I find so deeply frustrating is that the argument that GC can't work in a game engine is never qualified

The program having a say in how the GC behaves would be one requirement IMO.

> you could run a GC on every single frame and use less than 3% of your frame budget on the GC pause

Could I? In what languages?

not sure about other managed runtimes but in Go you can disable the automated GC scheduling and schedule it yourself. The GC runs in sub-millisecond time. https://blog.golang.org/ismmkeynote

the point isn't "Go is good enough", because I don't know if it is because the requirements are not stated particularly clearly. If the GC took zero time, duh, it would work and people could use it, it would save them some error cases and make their programming lives easier. If the GC took 3 seconds, it doesn't work. There exists some latency value under which a GC is fast enough that there's no sane argument for -not- using it. What is that value? That's the question at hand.

The question is control; not just total latency, but when exactly it occurs.

Well, most game engines tend to have a scripting language of some sort and that's usually GC'd. The reason that works ok is usually if you're embedding a language you can control how much time is spent on GC (and in the language itself). Doing an entire game from scratch in a memory managed language, it's not so much that GC is slow, it's that it lacks adequate control. (I don't think it's a coincidence that a lot of the popular runtimes -- python for instance -- tend to be reference counted with GC mostly reserved for cleaning up cycles.) If you're concerned with performance, it's not so much that GC _can't_ be fast, it's that you really don't want to give up control on that sort of thing because GC spikes are really annoying to fix.

> latency budgets

The problem with gc is that it can blow out a frame's budget, and perhaps extend for more than one frame, taking away precious nanosecs, dropping a 60fps down to 15.

> what they do in lieu of a garbage collector.

Use reference counting. If in the rare case you need circular references then you use a weak pointer, or reclassify your structure so that it's not circular.

That said, where and how you manage the heap can be critical. I recall a few places where I've used the stack itself as a temporary (small) heap, which all gets cleaned up when you exit the function, taking up almost zero clocks to do so.
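Here is a rough C sketch of reference counting with a non-owning back pointer, which stands in for a weak pointer in plain C. `RcNode`, `rc_retain`, `rc_release`, and the `live_nodes` counter are hypothetical. The parent holds a counted reference to its child. The child's `parent` field holds none, and it is cleared when the parent is freed so it never dangles:

```c
#include <assert.h>
#include <stdlib.h>

struct RcNode {
    int refcount;
    struct RcNode *child;  /* owning: holds a counted reference */
    struct RcNode *parent; /* non-owning ("weak"): no reference held */
};

static int live_nodes; /* for illustration: how many nodes are allocated */

struct RcNode *rc_new(void) {
    struct RcNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->refcount = 1;
    live_nodes++;
    return n;
}

void rc_retain(struct RcNode *n) { n->refcount++; }

/* Iterative so a long chain does not recurse. One call can still free a
   whole chain of children (a release cascade). */
void rc_release(struct RcNode *n) {
    while (n != NULL && --n->refcount == 0) {
        struct RcNode *next = n->child;
        if (next != NULL) next->parent = NULL; /* clear the weak back pointer */
        free(n);
        live_nodes--;
        n = next;
    }
}

void rc_set_child(struct RcNode *parent, struct RcNode *child) {
    assert(parent != NULL && child != NULL);
    rc_retain(child);
    struct RcNode *old = parent->child;
    parent->child = child;
    child->parent = parent; /* weak back pointer: deliberately not retained */
    if (old != NULL) {
        if (old != child) old->parent = NULL;
        rc_release(old);
    }
}
```

Because the back pointer is never counted, parent and child cannot keep each other alive. The trade-off is that a single `rc_release` may do an unpredictable amount of freeing.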

Reference counting is a form of gc though. It even comes with some nondeterminism in the form of release cascades. Usually performs worse than heap analyzers too, due to hitting atomic counters a lot. The lack of heap overhead and simplicity of implementation are the strong points.

As a gamedev, I would LOVE to give the GC 1ms per frame if it were hard-bounded and allocations were guaranteed not to stall.

Go is honestly probably not a great language for game development for other reasons but the GC could likely clear that goalpost. https://blog.golang.org/ismmkeynote

Just a quick note, in such contexts, what is meant actually is a non-deterministic GC.

Could you maybe phrase this in a less snarky and condescending tone? It's a great comment if you take out only the first two sentences.

Indeed, and also what about the effect of multiple cores? You could have 7 cores working on game logic and one doing GC.

> Indeed, and also what about the effect of multiple cores? You could have 7 cores working on game logic and one doing GC.

Note that those independent cores all share common resources. L3 cache & memory bandwidth are both shared, and a GC workload slams both of those resources pretty heavily. So it's still going to have an impact even though it's running on its own core.

That sounds like a massive synchronization nightmare and performance overhead that'll probably not be worth it.

Not sure that CPU affinity is trivially solvable to avoid mutex contention here, since game entities can often mutate one another when they interact. How do you determine which entities are computed by which cores to minimize the cost of synchronizing the work between the cores?

One strategy might be to correlate CPU affinity with spatial proximity. That is: two entities that are close together are more likely to mutate one another than two entities far apart, so the two close together should be on the same CPU/core/thread and the other should be on a different thread (probably with other entities closer to it).
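Here is a minimal C sketch of that idea, assuming a 2D world split into equal vertical strips, one per worker. The function name and parameters are hypothetical. Nearby entities map to the same worker, but entities near a strip boundary can still interact across threads, so those interactions still need some synchronization:

```c
#include <assert.h>

/* Map an entity's x coordinate to a worker index by cutting the world into
   num_workers vertical strips of equal width. Positions outside the world
   bounds are clamped into the first or last strip. */
int worker_for_position(float x, float world_min_x, float world_max_x, int num_workers) {
    assert(num_workers > 0 && world_max_x > world_min_x);
    float t = (x - world_min_x) / (world_max_x - world_min_x); /* 0..1 inside the world */
    if (t <= 0.0f) return 0;
    if (t >= 1.0f) return num_workers - 1;
    int w = (int)(t * (float)num_workers); /* truncation == floor since t > 0 */
    return w < num_workers ? w : num_workers - 1;
}
```

A real engine would more likely use a 2D grid and rebalance regions as entities cluster. This only shows the locality idea.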

Even Unreal Engine has garbage collection for UObjects. I love C but I don't think it's necessarily the best language for games programming anymore, I believe C++ is more suited since games fit into the OO paradigm so much better. I am only a hobbyist though so my opinion probably doesn't matter.

EDIT: Unreal Engine is primarily written in C++, and has garbage collection. I'm not salty about the downvote, but I would like to understand why what I have said is apparently incorrect.

You're not wrong about GC becoming more common in games, as you say Unreal has a GC and so does Unity.

Unreal Engine's GC, at least, is pretty custom. A lot of effort has been put into letting developers control the worst-case run time. It has a system that allows incremental sweeps (and multi-threaded parallel sweeping) and will pause its sweeps as it approaches its allotted time limit.

That said, in my experience with Unreal most large projects choose to turn off the per-frame GC checks and manually control it. For example one project I've seen only invoked the GC when the player dies, a level transition happens or once every 10 minutes (as a fallback).
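The general idea of a sweep that stops at its budget and resumes next frame can be sketched in C. This is an illustration, not Unreal's implementation. The `Heap` layout is hypothetical, and the sketch counts slots instead of measuring elapsed time so that its behavior is deterministic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SLOTS 1024

/* Hypothetical heap: a flat array of slots whose mark bits were already set
   by an earlier mark phase. */
struct Heap {
    bool in_use[HEAP_SLOTS];
    bool marked[HEAP_SLOTS];
    size_t count;        /* slots [0, count) belong to the heap */
    size_t sweep_cursor; /* where the next slice resumes */
};

/* Sweep at most `budget` slots, starting where the previous slice stopped.
   Unmarked live slots are reclaimed and *freed is incremented for each.
   Returns true once the sweep reaches the end of the heap. */
bool sweep_slice(struct Heap *h, size_t budget, size_t *freed) {
    assert(h->count <= HEAP_SLOTS && h->sweep_cursor <= h->count);
    size_t end = h->sweep_cursor + budget;
    if (end > h->count) end = h->count;
    for (size_t i = h->sweep_cursor; i < end; i++) {
        if (h->in_use[i] && !h->marked[i]) {
            h->in_use[i] = false; /* unreachable: reclaim the slot */
            (*freed)++;
        }
        h->marked[i] = false;     /* reset the mark for the next cycle */
    }
    h->sweep_cursor = end;
    if (end == h->count) {
        h->sweep_cursor = 0;      /* finished: next cycle starts from the top */
        return true;
    }
    return false;
}
```

The engine would call `sweep_slice` each frame with whatever budget fits. A sweep that would take several milliseconds in one go is then paid for in small pieces.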
