I love this opinion from games programmers because they never qualify it or talk about what their latency budgets are and what they do in lieu of a garbage collector. They just hand wave and say "GC can't work". The reality is you still have to free resources, so it's not like the garbage collector is doing work that doesn't need to be done. What latency budgets are you working with? How often do you do work to free resources? What are the latency requirements there? Even at 144 fps, that's about 7ms per frame. If you have a garbage collector that runs in 200us, you could run a GC on every single frame and use less than 3% of your frame budget on the GC pause. I'm -not- suggesting that running a GC on every frame is a good idea or that it should be done, but what I find so deeply frustrating is that the argument that GC can't work in a game engine is never qualified.
edit: wow for once the replies are actually good, very pleased with this discussion.
I don't think it's necessarily that it "can't" work as much as it takes away a critical element of control from the game developers and the times you find yourself "at the mercy" of the garbage collector is pretty damn frustrating.
The problem we ran into with garbage collection was that it was generally non-deterministic both in terms of when it would happen and how long it would take. We actually added an API hook in our JS to manually trigger GC (something you can do when you ship a custom JS runtime) so we could take at least the "when it happens" out of the picture.
That said, there were often fairly large variances in how long it would take and, while frame time budgets may seem to accommodate things, if you end up in a situation where one of your "heavy computation" frames coincides with a GC that runs long, you're going to get a nasty frame time spike.
We struggled enough that we discussed exposing more direct memory management through a custom JS API so we could more directly control things. We ultimately abandoned that idea for various reasons (though I left over 6 years ago so I have no idea how things are today).
This is basically it. People who have never worked on actual real-time systems just never seem to get that in those environments determinism often matters more than raw performance. I don't know about "soft" real-time (e.g. games, or audio/video) but in "hard" real-time (e.g. avionics, industrial control) it's pretty routine to do things like disable caches and take a huge performance hit for the sake of determinism. If you can run 10% faster but miss deadlines 0.1% more often, that's a fail. It's too easy for tyros to say pauses don't matter. In many environments they do.
The software was shipped with a memory map file so you know exactly what each memory address is used for. A lot of test procedures involved reading and writing at specific memory locations.
It was for avionics BTW. As you may have guessed, it was certified code with hard real time constraints. Exceeding the cycle time is equivalent to a crash, causing the watchdog to trigger a reset.
I think they posted the guidelines and it was a wonderful read about how they developed real time systems.
It seems extremely unlikely that any general purpose code you might adopt, which happens to invoke malloc() at all, would be fit for purpose in such a restricted environment without substantial modification; in which case you would just remove the malloc() calls as well.
> And even then, it might not ever be called given the way you are reusing the code, so...a dynamic check is the best way to ensure it never actually gets used.
In such a restricted environment, it is unlikely you just have unknown deadcode in your project. "Oh, those parts that call malloc()? They're probably not live code and we'll find out via a crash at runtime." That's like the opposite of what you want in a hard realtime system.
So, no — a static, compile/link time check is a strictly superior way to ensure it never gets used.
> a system where malloc() was forbidden. In fact it always returned null.
Never experienced the malloc() thing but do throw exceptions and fail fast under conditions like these so they're caught in testing.
Static memory is something we already do, but we're pretty interested in whether industry actually adopts redundant code paths, monitors, watchdogs (and to what extent - how many cycles can be missed?), etc.
Also, you probably want some padding so that newer versions of the CPU can be used without too much worry. It's possible for cycle counts of some routines to increase, depending on how new chips implement things under the hood.
[says a guy who was counting cycles, in the 1980s :-)]
If there were no performance impact, there would be no point. I'm not just being snarky; there's an important point here. Caches exist to have a performance impact. In many domains it's OK to think about caching as a normal case, and to consider cache hit ratio during designs. When you say "no performance impact" you mean no negative performance impact, and that might be technically true (or it might not), but...
But that's not how a hard real-time system is designed. In that world, uncached has to be treated as the normal case. Zero cache hit ratio, all the time. That's what you have to design against, even counting cycles and so on if you need to. If you're designing and testing a system to do X work in Y time every time under worst-case assumptions, then any positive impact of caches doesn't buy you anything. Does completing a task before deadline because of caching allow you to do more work? No, because it's generally considered preferable to keep those idle cycles as a buffer against unforeseen conditions (or future system changes) than try to use them. Anything optional should have been passed off to the non-realtime parts of the larger system anyway. There should be nothing to fill that space. If that means the system was overbuilt, so be it.
The only thing caches can do in such a system is mask the rare cases where a code path really is over budget, so it slips through testing and then misses a deadline in a live system where the data-dependent cache behavior is less favorable. Oops. That's a good way for your product and your company to lose trust in that market. Once you're designing for predictability in the worst case, it's actually safer for the worst case to be the every-time case.
It's really different than how most people in other domains think about performance, I know, but within context it actually makes a lot of sense. I for one am glad that computing's not all the same, that there are little pockets of exotic or arcane practice like this, and kudos to all those brave enough to work in them.
E.g. errors that didn't match a branch or input scenario during testing, which would go over budget without cache but which cache might keep under budget, preventing a crash.
Another could be power consumption, latency optimization, or improvement of accuracy. E.g. some signal analysis doesn't work at all if the real-time code is above some required Nyquist threshold, but faster performance improves the maximum frequency that can be handled, improving accuracy.
Essentially, you would need to prove the statement "if a system works well with caches off, then it works well with caches on" to the satisfaction of whatever authority is giving you such stringent requirements.
Hah. Who sez that audio and video products have 'soft' real time? Go on now.
I assume some dedicated devices are more or less hard real time, due to running way simpler software stacks on dedicated hardware.
There's a whole world out there of hard real time, the world is not simply made up of streaming video and cell phones.
The cool thing on HN is you can get down voted for simply making that observation. It's a sign of the times I'm afraid.
For example, if you're doing a take, you have to complete it during the blanking interval, but usually the hardware guarantees that. In the software, you want your take to happen in one particular vertical blanking interval (and yes, it really is a frame-accurate industry). But if you miss, you're only going to miss by one. We didn't (so far as I know) specify guarantees to the customer ("If you get your command to the router X ms before the vertical interval, your take will happen in that vertical"), so we could always claim that the customer didn't get the command to the router in time. Again, so far as I know - there may have been guarantees given to the customer, but I didn't know about them.
But that was 20 years ago, back in the NTSC 525 days.
Nice name, by the way. Do you know of any video cards that will do a true Porter & Duff composite these days? I recall looking (again, 20 years ago) at off-the-shelf video cards, and while they could do an alpha composite, it wasn't right (and therefore wasn't useful to us).
In terms of customers and how much they care, the North American market seems to care less than Europe.
Compare your “not a single click in an hour [for quality reasons]” to “not a single missed deadline in 30 years of the life expectancy of a plane, across a fleet of a few thousand planes [for safety reasons]”. That's the difference in requirements between hard and soft RT.
I did some soft real-time (video decoding) and I have a friend working on hard real-time (avionics), and we clearly didn't work in the same world.
RT video/audio failing never results in death. Whereas failures in "avionics, industrial control" absolutely can / do. That seems to be where OC was drawing the line.
A "minor inconvenience" like a recording session going wrong, a live show with stuttering audio, skipped frames in a live TV show, and so on?
People like deadmau5, Daft Punk, Lady Gaga all perform with Ableton Live and a laptop or desktop behind their rig. If it were anything more than a minor inconvenience, these people wouldn't use this.
It's very unlikely to have audio drop outs, a proper setup will basically never have them. But still if you have one audio dropout in your life, you're not dead, your audience isn't dead, a fire doesn't start, a medical device doesn't fail to pump, and so on.
And yes you can badly configure any system, but the point is you can't configure these to be 100% guaranteed; 99.99% is perfectly fine.
Edit: Sometimes people call these "firm" real-time systems: the deadline must be met for the system to operate correctly, but failure to meet deadlines doesn't result in something serious like death. (E.g. in a video game you can display frames slower than realtime and it kind of works but feels laggy; however, you can't also slow down the audio processing, because you'll get a lowered pitch, so you have to drop the audio.)
I started game programming on the Atari 800, Apple 2, TRS-80. Wrote NES games with 2k of ram. I wrote games in C throughout the 90s including games on 3DO and PS1 and at the arcade.
I was a GC hater forever, and I'm not saying you can ignore it, but the fact that Unity runs on C# with a GC, and that so many quality, popular shipping games exist using it, is really proof that GC is not the demon that people make it out to be.
Some games made with GC include Cuphead, Kerbal Space Program, Just Shapes & Beats, Subnautica, Ghost of a Tale, Beat Saber, Hollow Knight, Cities: Skylines, Broforce, Ori and the Blind Forest
It's successful in spite of that, but that doesn't make it any better.
You might spend more time fighting the GC than benefitting from it. And that seems to be the experience for large games - simpler ones might not care.
Unity offers a lot more than just a language, and developers have to choose: are they willing to put up with GC to get the rest of what Unity offers?
You're still at the mercy of the malloc implementation. I've seen some fairly nasty behaviour involving memory leaks and weird pauses on free coming from a really hostile allocation pattern causing fragmentation in jemalloc's internal data.
In fact, doing that is often a really bad idea in general because of the extreme importance of cache effects. In a high-performance game engine, you need to have a fine degree of control over where your game objects get placed, because you need to ensure your iterations are blazingly fast.
Ostensibly you could do the exact same thing in e.g. Python if you wanted, by disabling the collector with the gc module (`gc.disable()`) and just writing custom allocation and cleanup in e.g. Cython. Probably similar in many different managed-environment languages.
But instead what you can do is to reuse the "slots" you are handing out from your allocator's memory arena for allocations of some specific type/kind/size/lifetime. If you are controlling how that arena is managed, you will find yourself coming across many opportunities to avoid doing things a general purpose GC/allocator would choose to do in favor of the needs dictated by your specific use case.
For instance you can choose to draw the frame and throw away all the resources you used to draw that frame in one go.
Garbage collection emulates the intent of this method with generational collection strategies, but it has to use a heuristic to do so. And you can optimize your code to behave very similarly within a GC, but the interface you get to that strategy is full of workarounds. It is more invasive to your code than applying an actual manual allocator.
I've heard of this concept but a search for "mark-and-release per-frame allocation buffer" returned this thread. Is there something else I could search?
A generational GC achieves a similar end result, but has to heuristically discover the generations, whereas an arena allocator achieves the same result deterministically and without extra heap walking.
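Try searching for "linear allocator", "frame allocator", or "scratch arena" - those are the usual names. A minimal sketch in C of the whole mechanism (names are mine, not from any particular engine): allocation is a pointer bump, and releasing back to a saved mark frees everything after that mark in O(1).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A linear / frame allocator. */
typedef struct {
    uint8_t *base;
    size_t   cap;
    size_t   used;
} Arena;

static void *arena_alloc(Arena *a, size_t size) {
    size = (size + 15) & ~(size_t)15;          /* keep 16-byte alignment */
    if (a->used + size > a->cap) return NULL;  /* arena exhausted */
    void *p = a->base + a->used;
    a->used += size;
    return p;
}

int main(void) {
    Arena frame = { malloc(1 << 20), 1 << 20, 0 }; /* 1 MiB scratch */
    assert(frame.base);

    for (int i = 0; i < 3; i++) {       /* stand-in for the game loop */
        size_t mark = frame.used;       /* mark */
        float *verts = arena_alloc(&frame, 4096 * sizeof *verts);
        (void)verts;                    /* ... build the frame ... */
        frame.used = mark;              /* release: thousands of
                                           "objects" freed at once */
    }
    free(frame.base);
}
```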
(1) malloc implementations generally allocate a page at a time and give the page back to the OS when all objects in the page are gone. `ptr = malloc(1); malloc(1); free(ptr);` doesn't give the single allocated page back to the OS.
If the program runs for any length of time, it will probably need the same memory again, so freeing it is a pessimization.
Standard C library free() implementations very, very rarely free memory back to the OS.
Many C/C++ allocators don't release to the OS often or ever.
Games are very friendly to that approach: with a bit of thought you can use arenas and object pools to cover 99% of what you need, and cut out all of the failure modes of a general purpose GC or malloc implementation.
Disable the garbage collector: in Go, that's `debug.SetGCPercent(-1)` from the `runtime/debug` package.
Go is a good language for web backend and other network services, but it's not a C replacement.
Additionally, the Go compiler isn't trying really hard to optimize your code, which makes it several times slower on a CPU-bound task. That's for a good reason: for Go's use case, compile time is a priority over performance.
Saying that there are no drawbacks in Go is just irrational fandom…
GCC also supports Go (gccgo) and can call native libraries just like from C.
I'm not saying there are no drawbacks in Go, just that I can't think of any advantages of C over Go.
At which point you're mostly just writing C in Go.
I would very much prefer a stripped down version of Go used for these situations rather than throwing more C at it. The main benefits of using Go are not the garbage collection; they're the tooling, the readability (and thus maintainability) of the code base, and the large number of folks who are versatile in using it.
Large user base? C is number 2. Go isn't even in the top 10.
Tooling? C has decades of being one of the most commonly used languages, and a general culture of building on existing tools instead of jumping onto the latest hotness every few months. As a result, C has a very mature tool set.
I would also like to see a stripped-down version of Go that disables most heap allocations, but I have no idea what it would look like.
There may be more "C programmers" by number but a Golang codebase is going to be more accessible to a wider pool of applicants.
Personally I would love a `--release` mode that had longer compile times in exchange for C-like performance, but I use Python by day (about 3 orders of magnitude slower than C) so I'd be happy with speeds that were half as fast as C. :)
Yes it would leak, to avoid leaking you could invoke the GC when you’re not in a critical section. Alternatively, if you don't use maps and instead structure all your data into arrays, slices and structs, you can just avoid allocations using arenas or similar. (You can use arrays and slices without the GC, but maps require it).
Instead, my miniature library obviated pools by simply having binary operators operate directly on one of the vector objects passed to them; if more than one vector was required for the operation internally, they would be "statically allocated" by defining them in the function definition's context (some variants would also return one of these internal vectors - which was only safe to use until a subsequent call of the same operator!).
The result this had on the calling code looked quite out of place for JS, because you would effectively end up doing a bunch of static memory allocation by assigning a bunch of persistent vectors for each function in its definition context, and then you would often need to explicitly reinitialize the vectors if they were expected to be zero.
... it was however super fast and always smooth - I wish it were possible to turn the GC off in cases like this when you know it's not necessary. It was more of a toy as a library, but I did write some small production simulations with it. I'm not sure how well the method would extend to comprehensive vector and matrix libraries; I think the main problem is that most users would not be willing to use a vector library this way, because they want to focus on the math and not have to think about memory.
All of these ideas require more care when using the library though.
You can see most operations act on the Vector and there are some shared temporary variables that have been preallocated. If you look through some of the other parts you can see closures used to capture pre-allocated temporaries per function as well.
The real run-time cost of memory management done well in a modern game engine written without OOP features is extremely low.
We usually use a few very simple specialized memory allocators, you'd probably be surprised by how simple memory management can be.
The trick is to not use the same allocator when the lifetime is different.
Some resources are allocated once and basically never freed.
Some resources are allocated per level and freed all at once at the end.
Some resources are allocated during a frame and freed all at once when a new frame starts.
And lastly, a few resources are allocated and freed randomly, and here the cost of fragmentation is manageable because we're talking about a few small chunks (like network packets)
Instead, we have different types of global arenas, bump allocators, etc. that you can use. These all pre-allocate memory once at start up, and... that's it.
When you have well defined allocation patterns, allocating a new "object" is just a `last += 1;`, and once you are done you deallocate thousands of objects by just doing `last -= size();`.
That's ~0.3 nanoseconds per allocation, and sub-nanosecond time to "free" a lot of memory.
For comparison, using jemalloc instead puts you at 15-25 ns per allocation and per deallocation, with "spikes" that go up to 200ns depending on size and alignment requirements. So we are talking here a 100-1000x improvement, and very often the improvement is larger because these custom allocators are more predictable, smaller, etc. than a general purpose malloc, so you get better branch prediction, less I-cache misses, etc.
Not really, our bump allocator is ~50 LOC, it just allocates a `Box<[u8]>` with a fixed size on initialization, and stores the index of the currently used memory, and that's it.
We then have a `BumpVec<T>` type that uses this allocator (`ptr`, `len`, `cap`). This type has a fixed-capacity, it cannot be moved or cloned, etc. so it ends up being much simpler than `Vec`.
If you need to store pointers and want to conserve a bit of memory, perhaps my compact_arena crate can help you.
I'm a huge Java nerd. I love me some G1/Shenandoah/ZGC/Zing goodness. But once you're writing a program to the point that you're tuning memory latency, as you are in many games anyway, baking in your application's generational hypothesis is pretty easy. Even in Java services you'll often want to pool objects that have odd lifetimes.
Is there a particular codebase you are thinking of here?
We can also look at it from the other direction: if your engine is adjusting its framerate dynamically based on the time it takes to process each frame, and you can do the entire work for a frame in 10ms, does that give you a target of 100 fps? If you tack on another half millisecond to run a GC pause, would your target framerate just be 95 fps?
And what do you do when the set of assets to be displayed isn't deterministic? E.g., an open world game with no loading times, or a game with user-defined assets?
Some games are double or triple buffered.
Rendering is not always running at the same frequency as the game update.
The game update is sometimes fixed, but not always.
I've had a very awful experience with GC in the past, on Android: the game code was full C/C++ with a bit of Java to talk to the system APIs, and I had to use the camera stream.
At the time (2010) Android was still a hot mess full of badly over engineered code.
The Camera API was simply triggering a garbage collect about every 3-4 frames, which locked the camera stream for 100ms (not nanoseconds, milliseconds!).
The Android port of this game was cancelled because of that, it was simply impossible to disable the "feature".
I never worked with Unity myself but I worked with people using Unity as their game engine, they all had problems with stuttering caused by the GC at some point.
You can try to search Unity forums about this subject, you'll find hundreds or thousands of topics.
What really bothers me with GC is that it solves a pain I never felt, and creates a lot of problems that are usually more difficult to solve.
This is a typical case of the cure being (much) worse than the illness.
Can you point out in the post where they expand on my point? The only thing I see is this:
> Again we kept knocking off these O(heap size) stop the world processes. We are talking about an 18Gbyte heap here.
which is exactly my point - even if you remove all O(heap size) locks, depending on the exact algorithm it might still be O(number of objects) or O(number of live objects) - e.g. arena allocators are O(1) (instant), generational copying collectors are O(live objects), while mark-and-sweep GCs (including Go's if I understand correctly after skimming over your link) are O(dead objects) (the sweeping part). Go's GC seems to push most of that out of Stop-The-World pause, instead it offloads it to mutator threads instead... Also, "server with short-lived requests" is AFAIK a fairly good usecase for a GC - most objects are very short-lived, so it would be mostly garbage with simple live object graph...
Still, a commendable effort. Could probably be applied to games as well, though likely different specific optimisations would be required for their particular usecase. I think communication would be better if you expanded on this (or at least included the link) in your original post.
I know people who have written games in common-lisp despite no implementations having anything near a low-latency GC. They do so by not using the allocator at all during the middle of levels (or using it with the GC disabled, then manually GCing during level-load times).
At the expense of a lot of work from the GC. In production we had 30% of the usage coming from the GC alone…
But for game development the problem isn't going to be the GC anyway: cgo is just not adapted to this kind of task (and you cannot avoid cgo here).
Currently CGO just sucks. It adds a lot of overhead. The next problem is target platforms. I don't have access to console SDKs but compiling for these targets with Go should be a major concern.
Go could be a nice language for making games, but in the state it is in, the only thing it is good for is hobby desktop games.
The trick here is to double the heap size, which would be completely unacceptable for a game that's making use of what the hardware offers.
FRAME RATE: having a GC collect at random frames makes for jerky rendering
SPEED: Object pools allow reuse of objects without allocing/deallocing, and can be a cache-aligned array-of-structs. Structs-of-arrays can be used for batch processing large volumes of primitives.
RELIABILITY: This is probably applicable to the embedded realm too, but if you can't rely on virtual memory (because the console's OS/CPU doesn't support it, or once again you don't want the speed impact) then you need to be sure that allocations always succeed from a fixed pool of memory. Pre-allocated object pools, memory arenas, ring buffers etc. are a few ways to ensure this.
There's probably a lot more, but those are the reasons that jump out at me.
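To make the RELIABILITY point concrete, here's a hedged sketch of a fixed pool with an intrusive free list in C (illustrative, not from any real engine): allocation either succeeds from pre-allocated storage or fails immediately and predictably, and no malloc ever runs during gameplay.

```c
#include <stdio.h>

#define MAX_PARTICLES 256

/* Each slot doubles as a free-list link while unused (an intrusive
 * free list), so the pool needs no extra bookkeeping memory. */
typedef union Particle {
    union Particle *next_free;            /* valid while slot is free */
    struct { float x, y, vx, vy; } live;  /* valid while slot is live */
} Particle;

static Particle pool[MAX_PARTICLES];
static Particle *free_head;

static void pool_init(void) {
    for (int i = 0; i < MAX_PARTICLES - 1; i++)
        pool[i].next_free = &pool[i + 1];
    pool[MAX_PARTICLES - 1].next_free = NULL;
    free_head = &pool[0];
}

static Particle *pool_alloc(void) {
    if (!free_head) return NULL;   /* pool exhausted: fail predictably */
    Particle *p = free_head;
    free_head = p->next_free;
    return p;
}

static void pool_free(Particle *p) {
    p->next_free = free_head;
    free_head = p;
}

int main(void) {
    pool_init();
    Particle *p = pool_alloc();
    p->live.x = 1.0f;
    pool_free(p);
    printf("allocated and recycled a slot, no malloc in sight\n");
}
```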
If you aren't going to use the GC, then you open up a lot of other performance opportunities by just using a language that didn't have one in the first place.
And if you run the GC manually, you really don't know how long it will take - read: determinism.
> you can also write cache-aligned arrays of structs in Go if you want to
Wasn't this thread about why people don't use GC, not about go? I don't remember.
If you're using an object pool, you're dodging garbage collection, as you don't need to deallocate from that pool, you could just maintain a free-list.
> you can allocate a slab and pull from it if you want to. the existence of a GC doesn't preclude these possibilities
To take it further, you could just allocate one large chunk of memory from a garbage collected allocator and use a custom allocator - you can do this with any language. But you're not using the GC then.
The answer to your question is probably: because they like the language, are productive in it, know the libraries and the feature can be turned off so it's an option.
The problem is GCs for popular languages are nowhere near this good. People will claim their GC runs in 200us, but it's misleading.
For example, they'll say they have a 200us "stop the world" time, but then individual threads can still be blocked for 10ms+. Or they'll quote an average or median GC time, when what matters is the 99.9th percentile time. If you run GC at every 120 Hz frame then you hit the 99.9th percentile time every minute.
Finally, even if your GC runs in parallel and doesn't block your game threads it still takes an unpredictable amount of CPU time and memory bandwidth while it's running, and can have other costs like write barriers.
julia> @benchmark GC.gc()
memory estimate: 0 bytes
allocs estimate: 0
minimum time: 64.959 ms (100.00% GC)
median time: 66.848 ms (100.00% GC)
mean time: 67.062 ms (100.00% GC)
maximum time: 73.149 ms (100.00% GC)
Julia's GC is generational; relatively few sweeps will be full. But seeing that 65ms -- more than 300 times slower than 200us -- makes me wonder.
Test was on an i9 7900X.
As far as I know this is still true of the Go GC. Write barriers are also there and impact performance vs. a fixed size arena allocator that games often use that has basically zero cost.
Does it? In most games I expect resource management to be fairly straightforward: allocation and freeing of resources will mostly be tied to game events which already require explicit code. If you already have code for "this enemy is outside the area and disappears", is it really that much work to add "oh and by the way you might also free the associated resources while you're at it"? I don't need a GC thread checking at random intervals "hey let's iterate over an arbitrary portion of memory for an arbitrary amount of time to see if stuff needs dropping yo!".
I realize that I'm quite biased though because I'm fairly staunchly in the "garbage collection is a bad idea that causes more problems than it solves" camp. It's a fairly extremist point of view and probably not the most pragmatic stance.
One place where it might not be quite as trivial would be for instance resource caching in the graphic pipeline. Figuring out when you don't need a certain texture or model anymore. But that often involves APIs such as OpenGL which won't be directly handled by the GC anyway, so you'll still need explicit code to manage that.
That being said I'd still choose (a reasonable subset of) C++ over C for game programming, if only to have access to ergonomic generic containers and RAII.
I write quite a lot of C. When I do I often miss generics, type inference, an alternative to antiquated header files and a few other things. I never miss garbage collection though (because it's a bad idea that causes more problems than it solves).
GC means not only that memory management is simpler, it means that it goes away for most of the programs out there.
Most programmers I know aren't writing code that run Twitter-like servers or avionics, nor are they programming the next Doom game. These people are writing apps and doing back end coding for some big company where real-time isn't an issue and the focus is on getting code out fast, with good average quality and "cheap" labor.
In this case, having a high-level language/runtime that doesn't require a programmer who can reason about allocations is key.
I can't even begin to describe the kind of codebases that I have seen. Two years ago I was working on a C/C++ legacy system that was thousands of lines of code and almost every file had memory leaks (that cppcheck could find by itself, mind you). Some of them were caused by delivery deadlines, most of them were caused by unskilled employees.
(oh, in case you're wondering: all those leaks were "solved" by having ten times the hardware and a scheduled restart of the servers)
Even if you are perfect, you will still fragment over time. At some point, you must move a dynamically allocated resource and incur all the complexity that goes with that--effectively ad hoc garbage collection.
There are only two ways around this:
1) You have the ability to do something like a "Loading" screen where you can simply blow away all the previous resources and re-create new ones.
2) Statically allocate all memory up front and never change it.
People went to extreme measures to avoid allocating memory in their games: manually pooling every in-game object & particle, not using string comparisons in C#, etc https://danielilett.com/2019-08-05-unity-tips-1-garbage-coll...
Unity itself finally has a new system they're previewing to average out the GC spikes over time, so a game, say, never drops below 60fps: https://blogs.unity3d.com/2018/11/26/feature-preview-increme...
As well, there is a new way of writing C# code for Unity called ECS that will avoid producing GC sweeps https://docs.unity3d.com/Packages/... (the ECS manual)
As for pooling objects, you'd go to those "extreme measures" as a matter of course in any other language as well. You wouldn't want to alloc and free every frame no matter the language.
And allocating memory is fine during runtime when you are in control of the allocation and the cleanup, whereas in Unity, the sweeps are fully out of your control, expensive, and will just sometimes happen in the middle of the action
> Being able to control the garbage collector (GC) in some way has been requested by many customers. Until now there has been no way to avoid CPU spikes when the GC decided to run. This API allows you disable the GC and avoid the CPU spikes, but at the cost of managing memory more carefully yourself.
It only took them 14 years and much hand wringing from both players and developers to address :)
A similar thing is going on with their nested scene hierarchy troubles, also releasing in 2018.3 with their overhaul to the prefab system, to sort of support what they call "prefabs" having different "prefabs" in their hierarchy without completely obliterating the children. What they have now is not ideal, but they're working on it.
Prior to that, if you made, say, a wheel prefab and a car prefab, as soon as you put the wheels into your car prefab, they lost all relation to their being a wheel, such that if you updated the wheel prefab later, the car would still just have whatever you had put into the car hierarchy originally, which naturally has been the source of endless headaches and hacky workarounds for many developers.
> The frame-rate difficulties found in version 1.01 are further compounded by an issue common with many Unity titles - stuttering and hitching. In Firewatch, this is often caused by the auto-save feature, which can be disabled, but there are plenty of other instances where it pops up on its own while drawing in new assets. When combined with the inconsistent frame-rate, the game can start to feel rather jerky at times.
That studio is part of Valve now!
> Games built in Unity have a long history of suffering from performance issues. Unstable frame-rates, loading issues, hitching, and more plague a huge range of titles. Console games are most often impacted but PC games can often suffer as well. Games such as Galak-Z, Roundabout, The Adventures of Pip, and more operate with an inherent stutter that results in scrolling motion that feels less fluid than it should. In other cases, games such as Grow Home, Oddworld: New 'n' Tasty on PS4, and The Last Tinker operate at highly variable levels of performance that can impact playability. It's reached a point where Unity games which do run well on consoles, such as Ori and the Blind Forest or Counter Spy, are a rare breed.
Hopefully as the engine continues to improve dramatically, this kind of thing will be left in the past
There's a lot of reasons shipping Unity games is hard but the GC and C# are not among them or at least much lower than, say, dealing with how many aspects of the engine will start to run terribly as soon as an artist checks a random checkbox.
> This refers to the engine's ability to exploit multiple streams of instructions simultaneously. Given the relative lack of power in each of the PS4's CPU cores, this is crucial to obtaining smooth performance. We understand that there are also issues with garbage collection, which is responsible for moving data into and out of memory - something that can also lead to stuttering. When your game concept starts to increase in complexity the things Unity handles automatically may not be sufficient when resources are limited.
As others have probably already mentioned, the worst side of GC is that it is unpredictable and hard to force in a way that matches your specific pattern. With manual/RAII memory management you can make pauses predictable and non-accumulating, with no collection debt, and fit them in before the "retrace" perfectly. Also, simply relying on automatic storage in time-critical paths is usually a "cannot" in GC environments.
^ if any, I can’t recall if 5.1 actually implemented true incremental gc back then
This is simply not the case. It's still just code after all. The problem was you were fighting the GC, but that's just the symptom. The clear problem was leaking something every frame. With all the tooling these days it's pretty easy to see exactly what is getting allocated and garbage collected, so you know where to focus your efforts.
Not exactly. Here is how the early PC 3D games I worked on did that: They would have a fixed-size data buffer initialized for each particular thing you needed a lot of, such as physics info, polygons, path data, in sort of a ring buffer. A game object would have a pointer to each segment of that data it used. If you removed a game object you would mark the segment the game object pointed to as unused. When a new object was created, a manager would just return a pointer to a dirty segment from the buffer, which the new object would overwrite with its data. Memory was initialized at load and remained constant.
One problem with doing things like that is that you would have a fixed pool. So there were like 256 possible projectiles in Battlezone (1998) in the world at any time, and if something fired a 257th, an old one just ceased to exist. Particle systems worked that way as well.
What was good about that was that you could perform certain calculations relatively fast because all the data was the same size and inline, so it was easy to optimize. I worked on a recent game in C# and the path finding was actually kind of slow even though the processor the game ran on was probably like 100 times (or more) faster. I understand there are ways to get C# code to create and search through a big data structure as fast as the programmers had to do it in C in the 90's. However it would probably involve creating your own structures rather than using standard libraries, so no one did it like that.
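For illustration, that fixed-pool-with-overwrite scheme can be sketched in a few lines of C (purely illustrative - I have no idea what the actual Battlezone code looked like):

```c
#include <string.h>

#define MAX_PROJECTILES 256

typedef struct {
    float x, y, z, vx, vy, vz;
    int   alive;
} Projectile;

static Projectile projectiles[MAX_PROJECTILES]; /* fixed at load time */
static int next_slot;                           /* ring cursor */

/* Firing the 257th projectile silently recycles the oldest slot;
 * memory use is constant no matter what happens in the game. */
static Projectile *spawn_projectile(void) {
    Projectile *p = &projectiles[next_slot];
    next_slot = (next_slot + 1) % MAX_PROJECTILES;
    memset(p, 0, sizeof *p);
    p->alive = 1;
    return p;
}

int main(void) {
    for (int i = 0; i < 300; i++)    /* fire 300 shots into 256 slots */
        spawn_projectile()->vx = 1.0f;
}
```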
Just have to say, loved that game to bits.
But in practice it can be horrible. You end up writing all kinds of weird code just to avoid allocations in certain situations. And yeah, GC pauses due to having lots of addons are definitely very noticeable for players.
Also leads to fun witch hunts on addons using "too much" memory, people consider a few megabytes a lot because they confuse high memory usage with high allocation rate... Our stuff started out as "lightweight" but it grew over the years. We are probably at over 5 MB of memory with Deadly Boss Mods (many WoW players can attest that it's certainly not considered lightweight nowadays, but I did try to keep it low back then).
But I think we still do a reasonable job at avoiding allocations and recycling objects...
The point is: I had to spend a lot of time thinking about memory allocations and avoiding them in a system that promised to handle all that stuff for me. Frame drops are very noticeable.
But there are languages with better GCs than Lua out there...
Somewhat related: I recently wrote about garbage collectors and latency in network drivers in high-level languages: https://github.com/ixy-languages/ixy-languages Discussed here: https://news.ycombinator.com/item?id=20945819
That's a scenario where short GC pauses matter even more, as the hardware usually only buffers a few milliseconds worth of data at high speeds.
Memory management is ultimately far simpler and easier in C++ than in C#.
Ultimately you wind up doing the exact same thing in C# that you'd do in C++. Aggressive pooling, pre-allocation, etc. A handful of Unity constructs generate garbage and can't be avoided. I believe enable/disabling a Unity animator component, which you'd do when pooling, is one such example.
It's all just a little extra painful because you have to spend all this time avoiding something the language is built around. Trying to hit 0 GC is annoying.
It also means, somewhat ironically, that GC makes the cost of allocation significantly higher than a C++ allocation. In C++ you're going to pay a small price the moment you allocate. But you know what? That's fine! Allocate some memory, free some memory. It ain't free, but it is affordable.
In Unity generating garbage means at some random point in the future you are going to hitch and, most likely, miss a frame. And that's if you're lucky! If you're unlucky you may miss two frames.
Bro, game devs talk about this non-stop. There are probably 1000 GDC talks about memory management.
Game devs don't spell out the fine details because they are generally talking to other game devs and there is assumed knowledge. Everyone in games knows about memory management and frame budgets.
> If you have a garbage collector that runs in 200us, you could run a GC on every single frame and use less than 3% of your frame budget on the GC pause.
And if pigs could fly the world would be different. ;)
Go has a super fast GC now. It had a trash GC (teehee) for many many years. But Go is not used for game dev client front-ends. If C# / Unity had an ultra fast GC that used only 3% of a frame budget that would be interesting. But Unity has a trash GC that can easily take 10+ milliseconds. (Their incremental GC is still experimental.) It's a problem that literally every Unity dev has to spend considerable resources to manage.
For 50+ years GC has been a poor choice for game dev. Maybe at some point that will change! The status quo is that GC sucks for games. The onus is on GC to prove that it doesn't suck.
Having done both they are remarkably similar. Lots of pooling and pre-allocation. Most GC based languages are a bit harder both because they tend not to give you a chunk of memory you can do whatever with and require a lot of knowledge about which parts will allocate behind your back. It's also harder to achieve because you typically get no say when the collector will run.
There are loads of commercial games that pay no attention to this though written in both kinds of language.
1) Don't allocate or free resources during gameplay. Push it to load time instead. This works for things like assets that don't change much.
2) Use an arena that you reset regularly. This works well for scratch data in things like job systems or renderers.
3) Pick a fixed maximum number of entities to handle, based on hardware capacity, and use an object pool. This works well for things that actually do get created and destroyed during gameplay, on a user-visible timescale.
Together, these get you really far without any latency budget going toward freeing resources. And there is always something else you could put that budget toward instead, which is why any amount of GC is often seen as a waste.
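A minimal sketch of how those three strategies compose in a frame loop (C, all names illustrative, not from any engine): everything is claimed up front, and the loop itself never calls malloc or free.

```c
#include <stdlib.h>

/* (2) per-frame scratch arena: reset is O(1) */
typedef struct { unsigned char *base; size_t cap, used; } Arena;

static void *arena_alloc(Arena *a, size_t n) {
    if (a->used + n > a->cap) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* (3) fixed entity cap, sized for the hardware */
enum { MAX_ENTITIES = 4096 };
typedef struct { float x, y; int alive; } Entity;

int main(void) {
    /* (1) everything allocated up front, before gameplay starts */
    Arena scratch = { malloc(1 << 20), 1 << 20, 0 };
    Entity *entities = calloc(MAX_ENTITIES, sizeof *entities);

    for (int frame = 0; frame < 3; frame++) {   /* stand-in game loop */
        scratch.used = 0;                       /* (2) reset per frame */
        float *tmp = arena_alloc(&scratch, 1024 * sizeof *tmp);
        (void)tmp;                              /* render temporaries  */
        entities[frame % MAX_ENTITIES].alive = 1; /* (3) reuse slots   */
    }

    free(scratch.base);
    free(entities);
}
```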
When working on code such as game engines, it's not simply a matter of how long something takes to do, but also when it happens and how predictable it is. If I knew that GC takes 1ms, I could schedule it just after I call swapBuffers and before I begin processing data for the next frame, which isn't a latency critical portion.
The problem is that GC is unpredictable because the mark and sweep can take a long time, and this will prevent you from meeting your 60fps requirement.
In practice, we hardly ever do any kind of GC for games because we don't really need to. We use block memory allocators like Doug Lea's malloc (dlmalloc) in fixed memory pools if we must, but generally, you keep separate regions for things like object descriptors, textures, vertex arrays, etc. There's a ton of data to manage which can't be interleaved together in a generic allocator, so once you've gone that deep, there's really no point in using system malloc.
Malloc itself isn't a problem either, it's quick. It's adding virtual space via sbrk() and friends which can pause you, so we don't. On consoles, we have a fixed memory footprint, and on a bigger system, we pre-allocate buffers which we will never grow, and that's our entire footprint. We then chunk it up into sections for different purposes, short lived objects, long lived objects, etc. Frequently, we never even deallocate objects one by one. A common tactic is to have a per-frame scratch buffer, and once you're done rendering a frame, you simply consider that memory free - boom, you've just freed thousands of things without doing anything.
There are many things you can do instead of generic GC which are far better for games.
I disagree with the author of the original article about C++. C++ is as complex as you want to make it; you don't have to use all the rope it hands you to hang yourself. However, having the std library with data structures, smart pointers, the ability to do RAII, is invaluable to me. Smart usage of std::shared_ptr gets you automatic memory reclamation; what you don't have is cycle detection, so you take care never to have cycles by using weak pointers where necessary, and you get all the behavior of auto-GC without stop-the-world.
I second that. I always find the attitude of “C++ is bad so I’m going to stick to C” really bizarre. You can use C++ as a better C.
- Use type inference and references instead of pointers. Writing C-style code with these features makes it more readable.
- Don't like OO programming? Stick to structs with all members public. It's going to be a lot better than doing the same thing in C, and you shall not have void* casts.
- Use exceptions instead of return codes for errors, so that your code is not peppered with if statements at every line you call. With an exception you'll get additional info about crashes in dumps.
I wrote for an embedded system where the C++ standard library was not available, in a previous life. I ended up writing my code in C++ and “re-inventing” a couple of useful classes like std::string and std::vector. For the most part my code was very C like...
I happen to have the exact opposite opinion. Exception handling tends to feel too "magical" (read: non-deterministic, hard to behaviorally predict, etc.) relative to just returning an error code.
Otherwise, exceptions make code simpler, cleaner, and more reliable. They will never "go missing" unless you do something to make them go missing. Code that does those things is bad code. Don't write bad code. Do use exceptions.
In my opinion, exceptions also complicate code by taking error handling out of the scope of the code you are writing: since you can't know whether anything you call throws, you may not catch where you should, and some higher-level code will catch the exception after it has lost the context needed to handle it. If exceptions work well in a language - Python, Java - by all means use them, but in C++ they've caused too many problems for me to continue doing so. Even Google bans them in their C++ style guide.
Google's proscription on exceptions is purely historical. Once they had a lot of exception-unsafe code, it was too late to change. Now they spend 15-20% of their CPU cycles working around not being able to use RAII.
My example with libcurl was very simple, but in real systems, things are less clean. You may be using a network tool that hides libcurl from you, and your code is being passed into a callback without you even knowing it until it causes a problem. Other places where C++ exceptions fail would be when you dlopen() a module, or one of your binary C++ dependencies is compiled without exception handlers. The code will compile just fine, but exceptions will be lost, there's not so much as a linker warning.
Google uses RAII just fine, it's just you have to be careful with constructors/destructors since they can't return an error. There's no way it burns 15-20% CPU not using RAII - where are you getting these numbers? I used to work there, in C++ primarily, so I'm well familiar with their code base.
Instead you call a constructor that sets a dumb default state, and then another, "init()" function, that sets the state to something else, and returns a status result. But first it has to check if it is already init'd, and maybe deinit() first, if so, or maybe fail. Then you check the result of init(), and handle whatever that is, and return some failure. Otherwise, you do some work, and then call deinit(), which has to check if it is init'd, and then clean up.
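For anyone who hasn't lived it, a sketch of that two-phase init/deinit dance (C-style; names invented for illustration):

```c
#include <stdlib.h>

typedef struct {
    int   initialized;
    char *buffer;
} Codec;

/* Constructor-equivalent: can only set a dumb default state. */
static void codec_construct(Codec *c) {
    c->initialized = 0;
    c->buffer = NULL;
}

/* The real work, which can fail, lives in a separate init(). */
static int codec_init(Codec *c, size_t bufsize) {
    if (c->initialized) return -1;   /* or deinit-and-retry... */
    c->buffer = malloc(bufsize);
    if (!c->buffer) return -1;
    c->initialized = 1;
    return 0;
}

static void codec_deinit(Codec *c) {
    if (!c->initialized) return;     /* every call re-checks state */
    free(c->buffer);
    c->buffer = NULL;
    c->initialized = 0;
}

int main(void) {
    Codec c;
    codec_construct(&c);
    if (codec_init(&c, 4096) != 0) return 1; /* every caller must check */
    /* ... do some work ... */
    codec_deinit(&c);
}
```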
I knew someone else who worked at Google. 15-20% was his estimate. Bjarne ran experiments and found overheads approaching that.
Unless you know the details of how every function (and every function that function calls, and so on) handles errors, the easiest way to not pass functions to libcurl that throw is to not write those functions in a programming language with exceptions :)
do you program a lot with your feelings?
As others have pointed out, this isn't true for many video games. People write things so they're allocated upfront as much as possible and reuse the memory. A lot of old games had fixed addresses for everything as they used all available memory (this allowed for things like Gameshark cheats).
But there's a secondary issue. A GC imposes costs for the memory you never free. It doesn't know the data will never be freed, so it'll periodically check to see if the data is still reachable. Some applications have workarounds for this, like using array offsets instead of pointers/references (fewer pointers to follow).
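The offsets-instead-of-pointers workaround looks roughly like this (sketched in C for brevity; in a GC'd language the point is that an integer index is invisible to the collector, while a reference is not):

```c
#include <stdint.h>

/* Linking nodes by index instead of pointer: a tracing GC would have
 * no references to chase here, and the whole table is one flat block. */
enum { NO_NODE = -1, MAX_NODES = 1024 };

typedef struct {
    float   value;
    int32_t next;   /* index into nodes[], not a pointer */
} Node;

static Node nodes[MAX_NODES];

static float sum_list(int32_t head) {
    float total = 0.0f;
    for (int32_t i = head; i != NO_NODE; i = nodes[i].next)
        total += nodes[i].value;
    return total;
}

int main(void) {
    nodes[0] = (Node){ 1.0f, 1 };
    nodes[1] = (Node){ 2.0f, NO_NODE };
    return (int)sum_list(0);  /* 3 */
}
```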
Other issues I've seen include nondeterministic GC sometimes failing to garbage collect one level before we loaded another, OOMing the game. Have you ever forced a GC three times in a row to try and ensure you're not doubling your memory use on a memory constrained platform? I have. (This was exacerbated by cycles between the GCed objects and traditionally refcounted objects - you'd GC, which would run finalizers that would decref, which in turn would unroot some GCed objects allowing them to be collected, which in turn run more finalizers, which would decref more objects, ...)
The last time I encountered a similar (de)alloc spike in a non-GC gamedev system was much longer ago, and it was a particular gameplay system freeing a ton of graphics resources in a single frame, when doing something akin to a scene transition or reskinning of the world - which was easily identified with standard profiling tools, and easily fixed in under a day's work by simply amortizing the cost of freeing the resources over a few frames by delaying deallocs with a queue. More commonly there were memory leaks from refcounted pointer cycles, but those also generally pretty easily identified and fixed with standard memory leak detection tools.
The problem isn't GC per se. It's that most/all off-the-shelf language-intrinsic GCs are opaque, unhookable, uncustomizable, unreplaceable black boxes which everything is shoved into. malloc/free? Easily replaced, often hookable. C++'s new/delete? Overloadable, easily replaced, you have the tools to invoke ctors/dtors yourself from your own allocators. Localized GC for your lock-free containers ala crossbeam_epoch? Sure, perfectly fine. Graphics resource managers often end up becoming something similar to GCed systems on steroids, carefully managing when resources are created and freed to avoid latency spikes or collecting things about to be used again soon.
But the GCs built into actual languages? Almost always a pain in the ass.
In other words, the problem is GC, full stop.
UI solutions often/usually use some kind of GCed language for scripting. But the scale and scope of what UIs are dealing with are small enough that we don't see 100ms GC spikes.
Our tools and editors leverage GCed languages a lot - python, C#, lua, you name it - and as long as their logic isn't spilling into the core per-frame per-entity loops, the result is usually tolerable if not outright perfectly fine. We can afford enough RAM for our dev machines that some excess uncollected data isn't a problem.
And with the right devs you can ship a Unity title without GC related hitching. https://docs.unity3d.com/Manual/UnderstandingAutomaticMemory... references GC pauses in the 5-7ms range for heap sizes in the 200KB-1MB range for the iPhone 3 . That's monstrously expensive - 1/3rd of my frame budget in one go at 60fps, when I frequently go after spikes as small as 1ms for optimization when they cause me to miss vsync - but possibly manageable, especially if the game is more GPU-bound than CPU-bound. It certainly helps that Unity actually has some decent tools for figuring out what's going on with your GC, and that C# has value types you can use to reduce GC pressure for bulk data much more easily.
 Okay, these numbers are pretty clearly well out of date if we're talking about the iPhone 3, so take those numbers with a giant grain of salt, but at the same time they sound waaay more accurate than the <1ms for 18GB numbers I'm hearing elsewhere in the thread, based on more recent personal experience.
GC enthusiasts always lie about their performance impact, almost always unwittingly, because their measurements only tell them about time actually spent by the garbage collector, and not about its impact on the rest of the system. But their inability to measure actual impact should not inspire confidence in their numbers.
GC pause times are relevant but not critical. What's critical is that I can lay out my objects to maximize locality, prefetching, and SIMD compatibility. Look up Array of Structs vs Struct of Arrays for discussions on aspects of this.
This is not strictly incompatible with a GC in theory, however it's common that in GC'd languages this is either very difficult or borderline impossible. The JVM's lack of value types, for example, makes it pretty much game over. Combining a GC with good cache locality and SoA layout is possible, but it doesn't really exist, either. Unity's HPC# is probably the closest, but they banned GC allocations to do it.
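For anyone who hasn't seen the comparison, a minimal illustration of the two layouts (C; the particle fields are invented for the example):

```c
enum { N = 4096 };

/* Array of Structs: each particle's fields are adjacent, so a loop
 * that updates only positions drags velocities and colors through
 * the cache along with them. */
struct ParticleAoS { float x, y, vx, vy; unsigned color; };
struct ParticleAoS aos[N];

/* Struct of Arrays: each field is contiguous, so a loop over x/vx
 * touches only the bytes it needs and vectorizes trivially. */
struct ParticlesSoA {
    float x[N], y[N];
    float vx[N], vy[N];
    unsigned color[N];
} soa;

void update_positions_soa(float dt) {
    for (int i = 0; i < N; i++) {   /* SIMD-friendly: unit-stride loads */
        soa.x[i] += soa.vx[i] * dt;
        soa.y[i] += soa.vy[i] * dt;
    }
}

int main(void) {
    soa.vx[0] = 1.0f;
    update_positions_soa(1.0f / 60.0f);
    return 0;
}
```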
Most times it may run in 200 micro-seconds, but the one time it takes 20ms the user suffers from unacceptable stutter.
I can see how people would want to bypass the problem entirely by being a bit more careful up front.
The garbage collector also needs to track resources, which is an additional cost over just freeing them. You have little control over how the memory is allocated, which is an additional cost over a design that intelligently uses different allocation strategies for different types of resources. Then, even if you can control when the garbage collector is invoked, you have little control over what actually gets freed. What good are those 200µs if the stuff eating up memory isn't actually getting freed fast enough?
Maybe people often overestimate their performance needs. A garbage collector may be fast enough for more purposes than anticipated. Even then, managing memory intelligently may seem like a small price to pay compared to the prospect of eventually fighting a garbage collector to get out of a memory bottleneck.
Here's the harder question: how hard is it to do manual GC in a GC'd environment vs. a non-GC'd environment?
"Environment" is the key word here. Because if you're writing in a GC'd environment, there's a good chance that existing code - third party, first party, wherever it comes from - assumes that it can heap allocate objects and discard them without too much thought.
So for small scale optimizations where you own it all, it can work out fine. But if that optimization needs to bust through an abstraction layer, all of a sudden the accounting structure of the whole program has changed, and the optimization has turned into a major surgical operation.
I'm not completely sure if this is true, but I think that doing things in an "OO" style where, for example, every different event is its own type satisfying an Event interface basically means that different events can't occupy an array together, and each one may hold a pointer to some heap-allocated memory, so you really can't optimise this away without ripping up the entire program.
Rather than do so, I ended up running my Sim on a server with >100 cores, it was single threaded but they would all spin up and chomp the GC, a beautiful sight.
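The flat alternative being gestured at is a tagged union, so heterogeneous events can sit contiguously in one array with no per-event heap object. A hedged C sketch (event kinds invented for illustration):

```c
/* Events as a tagged union: every event has the same size, so they
 * can live contiguously in one array with no per-event allocation. */
typedef enum { EV_SPAWN, EV_DAMAGE, EV_MOVE } EventKind;

typedef struct {
    EventKind kind;
    union {
        struct { int entity; float x, y; }   spawn;
        struct { int entity; int amount; }   damage;
        struct { int entity; float dx, dy; } move;
    } as;
} Event;

static Event queue[1024];  /* one allocation, reused every frame */
static int   queue_len;

static void push_damage(int entity, int amount) {
    Event *e = &queue[queue_len++];
    e->kind = EV_DAMAGE;
    e->as.damage.entity = entity;
    e->as.damage.amount = amount;
}

int main(void) {
    push_damage(42, 7);
    return queue[0].as.damage.amount;  /* 7 */
}
```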
Another factor is just the general lack of transparency or knowability of where and how objects occupy memory in these languages.
If memory management is likely to be a concern it is absolutely much easier in an environment where it is prioritized than one where it is ignored.
Err not so fast there. It's quite common in C to allocate an array of objects (not pointers) and reuse them as they expire. Memset is enough to reinitialize them.
And the main point: lots of game dev is "C wrapped in C++"; game devs tend to rewrite everything for performance and predictability, and relying on the STL is usually a nope.
Edit: specifics from https://blog.golang.org/ismmkeynote - Go hugely improved GC latency from 300ms before version 1.5, down to 30ms, then again down to well under 1ms, usually 100-200us, for the stop-the-world pause. So I guess it is stop-the-world, but the pauses seem ridiculously small to me. They guarantee less than 500 microseconds "or report a bug", which seems more than fast enough for game framerates (16ms for 60Hz frames). Am I missing something?
GCs really let you trivially not care about a LOT of things... Until you need to care. Things like where and when you allocate, the memory complexity of a function call, etc. With games you need to care about all that stuff so much sooner.
Once you're taking the time to count/track allocations anyway, you might as well just do it manually. It just codifies something you're thinking about anyway.
People understand seconds, and any other measurement would require specifying a lot of computer-specific stuff, and if you're going to do that, then you might as well fully specify the workload, too, to answer those questions before they come.
It isn't meant to be a precise answer, it's meant to put the GC performance broadly in context.
Languages not used for anything anyone cares much about don't attract much bad press. There are reasons why games are written the way they are. It is not masochism.
Basically, think about how you can either write a function recursively and leverage the call stack that is built into the programming language, or write the exact same algorithm as a loop with an explicit stack object and manipulate the flow of the recursion yourself. They're essentially the exact same thing, however one is in complete control of the application writer and the other is in control of the language. If you use the loop, you can actually say "I'm going to run out of memory" before you hit the stack limit and do something smart about it; for instance, you might notice you're approaching the stack limit in the loop and just return the current answer. It's much harder to do this in a programming language with built-in recursion.
Essentially, this analogy captures the difference between memory management in many games and garbage collection in garbage-collected languages.
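A sketch of the explicit-stack version in C (illustrative), including the "do something smart near the limit" escape hatch that built-in recursion doesn't give you:

```c
#include <stdio.h>

#define STACK_CAP 1024

typedef struct Tree { int value; struct Tree *left, *right; } Tree;

/* Iterative traversal with an explicit, fixed-size stack: unlike
 * language-level recursion, we can see the limit coming and bail
 * out gracefully with a partial answer instead of crashing. */
static long sum_tree(Tree *root) {
    Tree *stack[STACK_CAP];
    int top = 0;
    long total = 0;

    if (root) stack[top++] = root;
    while (top > 0) {
        Tree *node = stack[--top];
        total += node->value;
        if (top + 2 > STACK_CAP)   /* approaching the limit... */
            return total;          /* ...return the partial answer */
        if (node->left)  stack[top++] = node->left;
        if (node->right) stack[top++] = node->right;
    }
    return total;
}

int main(void) {
    Tree leaf = { 2, NULL, NULL }, root = { 1, &leaf, NULL };
    printf("%ld\n", sum_tree(&root)); /* prints 3 */
}
```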
Static lifetime management, by contrast, gives the same benefits to memory errors that static typing does to type-mismatch errors: you know, at compile time, how long an object will live and when it will be released. And armed with this knowledge, you can then more effectively profile and optimize.
There are, however, much better ways to go about it than C provides. RAII in C++ and Rust and even ARC in the iOS runtimes allow you to get most of the benefits of automatic memory management while still providing strict, deterministic lifetime guarantees.
This is quite blatantly false. The reality is that every single time your GC pass traverses an object but doesn't free it, it's wasting time doing work that didn't need to be done.
Is the claim that 200us would be the worst case time for the GC to complete its work?
Is this worst case measurement one that the language itself guarantees across all platforms targeted by the game?
If you haven't measured the worst case times, and the system you are using wasn't designed by measuring worst case times, then we're not yet speaking the same language.
Though people who make games sometimes overstate the importance of this. Minecraft is arguably the most successful game of all time despite GC pauses. A competitive shooter like CS:GO, however, can't have that.
I have mad respect for all using low level C / Assembly / Rust in gamedev. But I have a hard time recommending anything other than C# / Unity for secondary school 14-15 year olds just starting their journey. The wealth of resources, tutorials and community online is astounding. And as an entrypoint into VR / AR it's difficult to top ;)
It's common to use stack and pool memory allocators, where this becomes no longer expensive work.
The places where you generally don’t want a GC are with low level engine code, like the renderer - code that might execute hundreds or thousands of times a frame, where things like data locality become paramount. GC does you zero favors there.
The trick is using libraries and techniques that only stack-allocate or use your pools. These are techniques you'd almost certainly use in a language without a GC anyway, but somehow people consider using C before using these techniques in a higher-level language.
I think people prefer C and C++ because they can control the memory layout of the app, and especially in console games they can do much more low-level optimisation than in a higher-level language. Besides, I'm guessing the pain in C++ doesn't come from memory allocation; you probably follow a well defined ruleset for freeing objects and you know you will be OK.
The program having a say in how the GC behaves would be one requirement IMO.
> you could run a GC on every single frame and use less than 3% of your frame budget on the GC pause
Could I? In what languages?
The point isn't "Go is good enough" - I don't know whether it is, because the requirements are never stated particularly clearly. If the GC took zero time, duh, it would work and people could use it; it would save them some error cases and make their programming lives easier. If the GC took 3 seconds, it doesn't work. There exists some latency value under which a GC is fast enough that there's no sane argument for -not- using it. What is that value? That's the question at hand.
The problem with GC is that it can blow out a frame's budget, and perhaps extend over more than one frame, taking away precious milliseconds and dropping 60fps down to 15.
> what they do in lieu of a garbage collector.
Use reference counting. If in the rare case you need circular references then you use a weak pointer, or reclassify your structure so that it's not circular.
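A minimal sketch of what that looks like in C (illustrative; real engines typically wrap this in macros or smart pointers):

```c
#include <stdlib.h>

typedef struct {
    int refcount;
    /* ... texture data, etc. ... */
} Resource;

static Resource *resource_create(void) {
    Resource *r = calloc(1, sizeof *r);
    r->refcount = 1;          /* creator holds the first reference */
    return r;
}

static Resource *resource_retain(Resource *r) {
    r->refcount++;
    return r;
}

/* Freeing is deterministic: it happens at the exact release() call
 * that drops the count to zero, never at a GC-chosen moment. */
static void resource_release(Resource *r) {
    if (--r->refcount == 0)
        free(r);
}

int main(void) {
    Resource *r = resource_create();
    Resource *alias = resource_retain(r);
    resource_release(alias);
    resource_release(r);   /* count hits zero: freed here, predictably */
}
```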
That said, where and how you manage the heap can be critical. I recall a few places where I've used the stack itself as a temporary (small) heap, which all gets cleaned up when you exit the function, taking up almost zero clocks to do so.
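The stack-as-a-small-heap trick, sketched in C (sizes and names invented for illustration):

```c
#include <string.h>

/* A local array used as a tiny temporary heap: every "allocation" is
 * a bump of `used`, and everything is reclaimed essentially for free
 * when the function returns. */
static float average(const float *samples, int n) {
    float scratch[1024];              /* the "heap" lives on the stack */
    int used = 0;

    if (n <= 0 || used + n > 1024) return 0.0f;  /* scratch too small */
    float *work = &scratch[used];     /* "allocate" n floats */
    used += n;

    memcpy(work, samples, (size_t)n * sizeof *work);
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += work[i];
    return sum / (float)n;
}

int main(void) {
    float s[3] = { 1.0f, 2.0f, 3.0f };
    return (int)average(s, 3);  /* 2 */
}
```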
Note that those independent cores all share common resources. L3 cache & memory bandwidth are both shared, and a GC workload slams both of those resources pretty heavily. So it's still going to have an impact even though it's running on its own core.
EDIT: Unreal Engine is primarily written in C++, and has garbage collection. I'm not salty about the downvote, but I would like to understand why what I have said is apparently incorrect.
Unreal Engine's GC at least is pretty custom. A lot of effort has been put into it to allow developers to control the worst-case run time. It has a system that allows incremental sweeps (and multi-threaded parallel sweeping) and will pause its sweeps as it approaches its allotted time limits.
That said, in my experience with Unreal most large projects choose to turn off the per-frame GC checks and manually control it. For example one project I've seen only invoked the GC when the player dies, a level transition happens or once every 10 minutes (as a fallback).