I write C++ for video games, and garbage collection is often shunned in performance-critical systems because it's nondeterministic and GC pauses can become too long. I think garbage collection could work in games, but it's usually implemented with other use cases in mind (e.g. servers, desktop applications).
A programmer writing managed C# in an engine like Unity will often spend a lot of time ensuring that the code does not allocate every frame, since steady additions to the heap will eventually trigger a collection pause.
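A minimal Unity-flavored sketch of that discipline (the component and the specific query are made up for illustration; the point is the pattern of preallocating buffers and preferring the NonAlloc APIs):

    using System.Collections.Generic;
    using UnityEngine;

    public class NearbyColliderScanner : MonoBehaviour
    {
        // Allocated once and reused every frame, so steady-state frames add nothing to the GC heap.
        private readonly Collider[] hits = new Collider[64];
        private readonly List<Transform> nearby = new List<Transform>(64);

        void Update()
        {
            nearby.Clear(); // reuse the existing list instead of "new List<Transform>()" each frame

            // Non-allocating physics query writes into the preallocated array.
            int count = Physics.OverlapSphereNonAlloc(transform.position, 10f, hits);
            for (int i = 0; i < count; i++)
            {
                nearby.Add(hits[i].transform);
            }
        }
    }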
That said, every game and its requirements are different, and some game developers might not mind that as much. A C++ engine programmer on a Nintendo Switch is in a very different situation from a hobbyist writing JavaScript or a backend programmer working on a mobile game's servers.
Just as virtual calls or using the OS memory allocator used to be shunned by early C++ adopters in the game industry.
I still remember when writing games in Basic/Pascal/C was seen the way Unity is nowadays: real games had to be written in Assembly.
As you say, every game and its requirements are different; many 8/16-bit games were perfectly doable in C, Pascal, or Basic, and eventually the community moved on, just as happened with C vs C++ a couple of years later.
I see the use of GC-enabled languages the same way, and C# belongs to the group of languages that also offer other means of memory allocation; not everything needs to live on the GC heap.
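For instance, a small standalone sketch of what "not everything on the GC heap" can look like in C# (value types plus stackalloc; the Particle struct is just an example):

    using System;

    // A value type: stored inline in its container, never as an individual GC object.
    public struct Particle
    {
        public float X, Y, VelX, VelY;
    }

    public static class Demo
    {
        public static void Main()
        {
            // Stack allocation: this scratch buffer never touches the GC heap.
            Span<Particle> scratch = stackalloc Particle[128];

            for (int i = 0; i < scratch.Length; i++)
            {
                scratch[i].X = i;
                scratch[i].VelX = 1.0f;
            }

            Console.WriteLine(scratch[10].X);
        }
    }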
I think you're absolutely right here. The reason people often dismiss garbage collection in game programming is the pauses, but if the pauses aren't noticeable, that reason goes away. Hardware performance gains over time can help make that happen, much as they did for virtual calls and default allocators. The writing was on the wall once games like Minecraft became huge hits.
Garbage collection is not so much considered a negative thing as a thing that's inappropriate for the embedded domain. The problem is that garbage collection entails system code periodically scanning the set of memory allocations to identify stuff that's now garbage and can be recycled. Embedded devs worry about the scheduling of that code, how long it could take to run in the worst case, and whether it will spoil their real-time guarantees. There are various mitigation strategies, but for good or ill, many individuals and organisations apply a simple "no, we're not going to use GC, ever" policy.
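To make that "scan and recycle" step concrete, here is a toy mark-and-sweep pass in C# (the Obj class, heap list, and root list are invented purely for illustration; real collectors are far more sophisticated, but this is the work that has to be scheduled somewhere):

    using System.Collections.Generic;

    class Obj
    {
        public bool Marked;
        public List<Obj> References = new List<Obj>();
    }

    static class ToyCollector
    {
        // Every allocation the "runtime" knows about.
        static readonly List<Obj> Heap = new List<Obj>();
        // Objects reachable directly from stacks and globals.
        static readonly List<Obj> Roots = new List<Obj>();

        static void Mark(Obj o)
        {
            if (o.Marked) return;
            o.Marked = true;
            foreach (var child in o.References) Mark(child);
        }

        static void Collect()
        {
            foreach (var o in Heap) o.Marked = false;   // reset marks
            foreach (var r in Roots) Mark(r);           // walk everything reachable
            Heap.RemoveAll(o => !o.Marked);             // recycle everything that wasn't reached
        }
    }

The worst-case running time of Collect grows with the size of the live heap, which is exactly the scheduling worry described above.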
Thank you guys for the response, super appreciate it.
I guess I can understand, from an abstract perspective, that you can manually tune and optimize performance to a higher degree if you control memory allocation yourself.
And for a lot of purposes where performance is imperative, like games or embedded devices, it can make or break the software's ability to function properly.
But my question then is: if languages like Crystal, Nim, or D (or any other GC language with similar speed) can operate at or near the performance of C, why exactly do you need manual memory management?
And if you do need it, I assume many languages that cater to this audience provide some sort of symbolic annotation that allows you to manually control the GC where you feel you need it, aye?
I think you are correct in your basic assertion that no one wants manual memory management for its own sake. What they really want is sufficient performance for their use case. The benchmarks you usually see are throughput oriented, and on small heaps. If you have tight latency budgets and/or huge heaps, the performance is not close.
Optional manual memory management sounds great, but I'm skeptical it would work well in practice. The reason is that if the language default is GC, libraries won't be designed for manual memory management, meaning it will be hard for your manual code to interact with data structures created by non-manual parts.
"Near C" performance is often not good enough, and usually misleading. You can write poorly performing applications in C, and certain benchmarks may favor or disfavor certain elements of a language. Generally they're created to be "similarly written" in all benchmarked languages, which may seem like the fairest comparison at face value. But what that means is that they are often naively written in one or more of the languages. Expertly written, hand-tailored-to-the-problem-domain C code is almost always going to outperform other languages by a significant margin, especially languages without manual memory management. You can do things in C like use arena allocators to significantly reduce memory performance overhead - things which require low-level control and a non-naive understanding of the problem domain. Garbage collectors can be quite performant, but they aren't capable of this kind of insight. Code that is written in C similarly to a garbage collected language will be similarly naive (another malloc call for each and every allocated thing, versus allocating out of an arena, for instance).
As I said, mitigation strategies exist, including manual control of GC etc. It's not true that using GC is universally impossible in embedded / real-time situations. It is true that it can cause performance and non-determinism issues (which are potentially solvable), and it's also true that some developers avoid GC so they don't have to deal with those potential issues. They would prefer to deal with the issues associated with manual memory management.
Who's to say who's right and who's wrong? Ultimately life (and the subset of life that is software development) is a massively complex strategy and tactics game with a myriad of possible playing strategies and no agreed perfect solution.
> if languages like ... can operate either at/near the performance of C
That depends entirely on how you define and measure performance. If total throughput is your metric, then it's no problem - for example, Go is perfectly acceptable for web services.
Predictability of latency, however, is absolutely _not_ on par with C code. For example, 3D rendering with a GC can easily result in perceptible stuttering if care isn't taken to minimize allocations and manually trigger the GC at appropriate times.
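"Manually trigger the GC at appropriate times" typically looks something like the following in C#. GC.Collect, GC.WaitForPendingFinalizers, and GCLatencyMode are real .NET APIs; the level-transition hook around them is a made-up example of a moment when a pause is invisible to the player:

    using System;
    using System.Runtime;

    static class LevelLoader
    {
        // Hypothetical hook called while a loading screen is up.
        static void OnLevelTransition()
        {
            // Collect now, while nothing latency-sensitive is running.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            // Then ask the runtime to keep pauses short during gameplay.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
        }
    }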
> some sort of symbolic annotation that allow you to manually control GC
It's not that simple. D tried to sell this at one point, but it just doesn't work for large multithreaded programs, and things aren't single-threaded these days. Manually controlling a global GC means manually balancing, for example, one block of threads that perform lots of allocations and deallocations (and will starve if the GC doesn't run regularly) against soft real-time networking code and hard real-time VR rendering code. And (for example) you certainly don't want your rendering loop pausing to scan the _entire heap_ (likely multiple gigabytes) on each frame! Alternatively, in the case of Go (and depending on your particular workload), you might not appreciate the concurrent GC constantly trashing the caches.
Custom allocators and non-atomic reference counting are fantastic though.
Several companies have been selling Java, Oberon, and now Go runtimes targeted at bare-metal deployment in embedded scenarios.
Some of them are more than 20 years old, so apparently they have at least one or two customers keeping them alive.
The hate toward GC feels like the hate toward high-level languages on 8- and 16-bit platforms back in the day, when anyone doing "serious" stuff would naturally only consider Assembly a viable option.
Being able to use a GC in some embedded cases (where the constraints on memory use or latency aren't too hard) doesn't mean you're able to use a GC in every embedded case.
I work in telecoms, just above the FPGA/DSP layer, where even a 1 ms pause would be a big issue.
Agreed; however, there is a big difference between stating that it doesn't work at all and accepting that there are plenty of use cases where a soft real-time GC tailored for embedded development is perfectly fine and actually does improve productivity.
Since you mention telecommunications, I would consider network switches running Erlang a use case of embedded development.
Other examples would be the Gemalto M2M routers for message processing, or some of the NSN base station reporting platforms.
So while it doesn't fit your scenario, it does fit other ones; this is what some in the anti-GC camp need to realise.
Because garbage collection, and in particular tracing garbage collection, adds significant overhead in both CPU cycles and memory. This overhead is also very unpredictable and depends heavily on memory allocation and object lifecycle patterns. Simple GCs can pause the program for a very long time, proportional to the size of the used memory; this may be several tens of seconds for large heaps, which is quite unacceptable. There are ways to mitigate these long pauses with incremental or concurrent GC, but they increase the complexity of the runtime system and carry even more average overhead, and although in the average case they may perform acceptably, they tend to have very complex failure modes. In addition, a tracing GC typically needs some extra memory "room" to operate, so programs using GC tend to use much more memory than they really need.
There is also a common misbelief that a compacting GC makes heap allocations faster than malloc. While technically true (the allocation itself is simple and fast, because it is only a pointer bump), a problem occurs immediately afterwards: this new heap memory hasn't been touched since the last GC and is very likely not in cache. Therefore you get a cache miss immediately after the allocation (managed runtimes initialize memory on allocation for safety). Because of that, even allocating plenty of short-lived objects, which is the best case for GC, is not actually faster than a pair of malloc+free calls.
There are also other overheads:
* Managed runtimes typically use the heap for most allocations and make stack allocation harder or impossible in some cases; e.g. it is much harder to write Java code with no heap allocations than C code.
* To facilitate GC, objects need an additional word or two of memory, e.g. for mark flags or reference counts. This makes cache locality worse and increases memory consumption.
* During heap scanning, a lot of memory bandwidth is used. Even if the GC does that concurrently and doesn't pause the app, the process has a significant impact on performance.
* Tracing GC prevents rarely used parts of the heap from being swapped out.
At least for my use case in embedded systems, performance is not necessarily worse with GC, and nondeterminism is not a showstopper either. Those problems are avoidable by proactively minimizing allocations in the hot paths or by arranging 'critical sections' that disable GC temporarily. The deal-breaker is the memory footprint.
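In .NET, for example, that "critical section" pattern looks roughly like the sketch below. TryStartNoGCRegion and EndNoGCRegion are the real APIs; the work being wrapped and the 16 MB budget are placeholders:

    using System;

    static class CriticalSection
    {
        static void RunWithoutGC()
        {
            // Reserve enough heap up front so no collection is needed inside the region.
            if (GC.TryStartNoGCRegion(16 * 1024 * 1024))
            {
                try
                {
                    DoLatencyCriticalWork();   // placeholder for the hot path
                }
                finally
                {
                    // Collections may resume after this point. Note: this throws if the
                    // region already ended because allocations exceeded the budget.
                    GC.EndNoGCRegion();
                }
            }
            else
            {
                DoLatencyCriticalWork();       // fall back if the budget couldn't be reserved
            }
        }

        static void DoLatencyCriticalWork() { /* allocation-light work goes here */ }
    }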
Why is garbage collection considered a negative thing?
I have no experience programming in low-level languages, but I do follow and try new/obscure languages for fun.
Zig, Nim, and V were the first few I found and tried, but I learned about Odin and Scopes recently and found them both interesting.