Generics can make your Go code slower (planetscale.com)
449 points by tanoku on March 30, 2022 | 408 comments



I'd argue that golang is inherently not a systems language, with its mandatory GC managed memory. I think it's a poor choice for anything performance or memory sensitive, especially a database. I know people would disagree (hence all the DBs written in golang these days, and Java before it), but I think C/C++/Rust/D are all superior for that kind of application.

All of which is to say, I don't think it matters. Use the right tool for the job - if you care about generic overhead, golang is not the right thing to use in the first place.


I have a slightly contrary opinion. Systems software is a very large umbrella, and much under that umbrella is not encumbered by a garbage collector whatsoever. (To add insult to injury, the term's definition isn't even broadly agreed upon, similar to the definition of a "high-level language".) Yes, there are some systems applications where a GC can be a hindrance in practice, but these days I'm not even sure it's a majority of systems software.

I think what's more important for the systems programmer is (1) the ability to inspect the low-level behavior of functions, for example through their disassembly; (2) being reasonably confident about how code will compile; and (3) having some dials and levers to control aspects of compiled code and memory usage. All of these things can be, and are, present not only in some garbage-collected languages, but even in garbage-collected languages with a dynamic type system!

Yes, there are environments so spartan and so precision-oriented that even a language's built-in allocator (e.g., malloc) cannot be used, in which case using a GC'd language is going to be an unwinnable fight for control. But if you only need precise management of memory in a way that isn't pervasive across all of your allocation patterns, then reaching for a language like C feels like throwing the baby out with the bath water. It's very rarely "all or nothing" in a modern, garbage-collected language.


The article is from a database company, so I'll assume that approximates the scope. My scope for the GC discussion would include other parts that could be considered similar software: cluster-control plane (Kubernetes), other databases, and possibly the first level of API services to implement a service like an internal users/profiles or auth endpoints.

The tricky thing is GC works most of the time, but if you are working at scale you really can't predict user behavior, and so all of those GC-tuning parameters that were set six months ago no longer work properly. A good portion of production outages are likely related to cascading failures due to too long GC pauses, and a good portion of developer time is spent testing and tuning GC parameters. It is easier to remove and/or just not allow GC languages at these levels in the first place.

On the other hand IMO GC-languages at the frontend level are OK since you'd just need to scale horizontally.


> A good portion of production outages are likely related to cascading failures due to too long GC pauses, and a good portion of developer time is spent testing and tuning GC parameters

After 14 years in JVM dev in areas where latency and reliability are business critical, I disagree.

Yes, excessive GC stop the world pauses can cause latency spikes, and excessive GC time is bad, and yes, when a new GC algorithm is released that you think might offer improvements, you test it thoroughly to determine if it's better or worse for your workload.

But a "good portion" of outages and developer time?

Nope. Most outages occur for the same old boring reasons - someone smashed the DB with an update that hits a pathological case and deadlocks processes using the same table, a DC caught fire, someone committed code with a very bad logical bug, someone considered a guru heard that gRPC was cool and used it without adequate code review and didn't understand that gRPC's load balancing defaults to pick first, etc. etc.

The outages caused by GC were very very few.

Outages caused by screw-ups or by a lack of understanding of the subtleties of a piece of tech are as common here as they are in every other field of development.

Then there's the question of what outages GCed languages _don't_ suffer.

I've never had to debug corrupted memory, or how a use after free bug let people exfiltrate data.


> I've never had to debug corrupted memory

You're lucky! When OpenJDK was still closed-source Hotspot from Sun, we chased bugs that Sun confirmed were defects in how Hotspot handled memory (and this was on an ECC'd system, of course), although these days I can't recall anything remotely similar.

> or how a use after free bug let people exfiltrate data.

Technically you're just outsourcing it :)


Yeah, have only ever hit one or two JVM bugs in very rare circumstances - which we usually fixed by upgrading.

> Technically you're just outsourcing it :)

Haha, very true. Luckily, to developers who are far better at that stuff than the average bear.

The recent log4j rigmarole is a great example of what I was describing in JVM dev though - no complicated memory issues involved, definitely not GC related, just developers making decisions using technologies that had very subtle footguns they didn't understand (the capacity to load arbitrary code via LDAP was, AFAIK, very poorly known, if not forgotten, until Log4Shell).


> You're lucky! When OpenJDK was still closed-source Hotspot from Sun, we chased bugs that Sun confirmed were defects in how Hotspot handled memory (and this was on an ECC'd system, of course), although these days I can't recall anything remotely similar.

I mean sure. I remember having similar issues with early (< 2.3) Python builds as well. But in the last decade of my career, only a handful of outages were caused by Java GC issues. Most of them happened for a myriad of other architectural reasons.


> After 14 years in JVM dev in areas where latency and reliability are business critical

What sort of industry/use cases are we talking here? There is business critical and there is mission critical, and if your experience is in network applications, as your next paragraph seems to imply, then no offence, but you have never worked with critical systems where a nondeterministic GC pause can send billions worth of metal into the sun or kill people.


Um, how did you derive from this conversation that the "outages" in question were about space missions failing?

Curious, and a tad confused.


The parent of this thread is about systems languages and how GC languages are rarely the right tool.

Then came your comment that, in your experience, the GC doesn't really matter in critical environments, which I personally find not to be true at all; I am interested in which domain your comment is based on.

So please answer the original question.


Go doesn’t offer a bunch of GC tuning parameters. Really only one parameter, so your concerns about complex GC tuning here seem targeted at some other language like Java.

This is a drawback in some cases, since one size never truly fits all, but it dramatically simplifies things for most applications, and the Go GC has been tuned for many years to work well in most places where Go is commonly used. The developers of Go continue to fix shortcomings that are identified.
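For concreteness, here is a minimal sketch (mine, not the commenter's) of that single knob - GOGC, which sets how far the heap may grow over the live data before the next collection. It can be set via the environment (GOGC=200 ./server) or at runtime:

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Equivalent to running with GOGC=200: allow the heap to grow
        // 200% over the live data between collections, trading memory
        // for fewer GC cycles.
        old := debug.SetGCPercent(200)
        fmt.Println("previous GOGC value:", old)
    }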

Go’s GC prioritizes very short STWs and predictable latency, instead of total GC throughput, and Go makes GC throughput more manageable by stack allocating as much as it can to reduce GC pressure.

Generally speaking, Go is also known for using very little memory compared to Java.


Java _needs_ lots of GC tuning parameters because you have practically no way of tuning how your memory is used and organized in Java code. In Go you can actually do that. You can decide how data structures are nested, and you can take pointers to the inside of a block of memory. You could build, e.g., a secondary allocator that hands out objects from a contiguous block of memory.

Java doesn't allow those things, and thus it must instead give you lots of levers to pull on to tune the GC.
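To illustrate the kind of control being described, here is a minimal, hypothetical sketch of such a secondary allocator in Go - handing out pointers into one contiguous slice, so the GC sees a handful of large allocations instead of many small ones:

    package main

    import "fmt"

    type Point struct{ X, Y int }

    // PointArena carves Points out of one contiguous backing slice.
    type PointArena struct {
        buf  []Point
        next int
    }

    func NewPointArena(capacity int) *PointArena {
        return &PointArena{buf: make([]Point, capacity)}
    }

    // Alloc hands out a pointer into the backing slice. Interior
    // pointers like this are legal and fully GC-safe in Go.
    func (a *PointArena) Alloc() *Point {
        if a.next == len(a.buf) {
            // Start a fresh block; the old one stays alive as long
            // as any previously returned pointer is reachable.
            a.buf = make([]Point, len(a.buf))
            a.next = 0
        }
        p := &a.buf[a.next]
        a.next++
        return p
    }

    func main() {
        arena := NewPointArena(1024)
        p := arena.Alloc()
        p.X, p.Y = 3, 4
        fmt.Println(*p)
    }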

It is just a different strategy of achieving the same thing:

https://itnext.io/go-does-not-need-a-java-style-gc-ac99b8d26...


That's the Go party line but not really true.

Counter-example: The Go GC is tuned for HTTP servers at latency sensitive companies like Google. It therefore prioritizes latency over throughput to an astonishing degree, which means it is extremely bad at batch jobs - like compilers.

What language is the Go compiler written in? Go.

This isn't fixable by simply writing the code differently. What you're talking about is in the limit equivalent to not using a GCd language at all, and you can do that with Java too via the Unsafe allocators. But it's not a great idea to do that too much, because then you may as well just bite the bullet and write C++.

Java doesn't actually need lots of GC tuning parameters. Actually most of the time you can ignore them, because the defaults balance latency and throughput for something reasonable for the vast majority of companies that aren't selling ad clicks. But, if you want, you can tell the JVM more about your app to get better results like whether it's latency or throughput sensitive. The parameters are there mostly to help people with unusual or obscure workloads where Go simply gives up and says "if you have this problem, Go is not for you".


> it is extremely bad at batch jobs - like compilers.

> What language is the Go compiler written in? Go.

I do not see what you are trying to say.

The Go compiler is plenty fast in my experience, especially compared to, say, `javac`. The startup time of `javac` (and most Java programs) is atrocious.


You can't compare throughput of two totally different programs to decide whether an individual subsystem is faster. The Go compiler is "fast" because it doesn't do much optimization, and because the language is designed to be simple to compile at the cost of worse ergonomics. Why can't it do much optimization? Partly due to their marketing pitch around fast compilers and partly because Go code runs slowly, because it's using a GC designed to minimize pause times at the cost of having lots of them.

The algorithmic tradeoffs here are really well known and there are throughput comparisons of similar programs written in similar styles that show Go falling well behind. As it should, given the choices made in its implementation, in particular the "only one GC knob" choice.


Yes, my comments were targeted at Java and Scala. Java has paid the bills for me for many years. I'd use Java for just about anything except high-load infrastructure systems. And if you're in, or want to be in, that situation, then why risk finding out two years later that a GC-enabled app is suboptimal?

I'd guess you'd have no choice if, in order to hire developers, you had to choose a language that people found fun to use.


Is Go's GC not copying/generational? I think "stack allocation" doesn't really make sense in a generational GC, as everything sort of gets stack allocated. Of course, compile-time lifetime hints might still be useful somehow.


> Is go's GC not copying/generational?

Nope, Go does not use a copying or generational GC. Go uses a concurrent mark and sweep GC.

Even then, generational GCs are not as cheap as stack allocation.


Stack's just the youngest generation.

I can see the difference being that you have to scan a generation but the entire stack can be freed at once, but it still seems like an overly specific term. The general term elsewhere for multiple allocations you can free at the same time is "arenas".


From a conceptual point of view, I agree, but... in practice, stacks are incredibly cheap.

The entire set of local variables can be allocated with a single bump of the stack pointer upon entry into a function, and they can all be freed with another bump of the stack pointer upon exit. With heap allocations, even with the simplest bump allocator, you still have to allocate once per object, which can easily be an order of magnitude more work than what you have to do with an equivalent number of stack allocated objects. Your program doesn't magically know where those objects are, so it still also has to pay to stack allocate the pointers that keep track of the heap objects. Then you have the additional pointer-chasing slowing things down and the decreased effectiveness of the local CPU caches due to the additional level of indirection. A contiguous stack frame is a lot more likely to stay entirely within the CPU cache than a bunch of values scattered around the heap.

Beyond that, and beyond the additional scanning you already mentioned, in the real world the heap is shared between all threads, which means there will be some amount of contention whenever you have to interact with the heap, although this is amortized some with TLABs (thread-local allocation buffers). You also have to consider the pointer rewriting that a generational GC will have to perform for the survivors of each generation, and that will not be tied strictly to function frames. The GC will run whenever it feels like it, so you may pay the cost of pointer rewriting for objects that are only used until the end of this function, just due to the coincidence of when the GC started working. I think (but could be wrong/outdated) that generational GCs almost all require both read and write barriers that perform some synchronization any time you interact with a heap allocated value, and this slows the program down even more compared to stack objects. (I believe that non-copying GCs don't need as many barriers, and that the barriers are only active during certain GC phases, which is beneficial to Go programs, but again, stack objects don't need any barriers at all, ever.)

GCs are really cool, but stack allocated values are always better when they can be used. There's a reason that C# makes developer-defined value types available; they learned from the easily visible problems that Java has wrestled with caused by not allowing developers to define their own value types. Go took it a step further and got rid of developer-defined reference types altogether, so everything is a value type (arguably with the exception of the syntax sugar for the built-in map, slice, channel, and function pointer types), and even values allocated behind a pointer will still be stack allocated if escape analysis proves it won't cause problems.
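As a small illustration of that last point, here is a sketch (mine); running `go build -gcflags=-m` on it reports which values escape to the heap (exact diagnostic text varies by Go version):

    package main

    type Vec struct{ X, Y, Z float64 }

    // The result is returned by value and copied into the caller's
    // frame: no heap allocation.
    func add(a, b Vec) Vec {
        return Vec{a.X + b.X, a.Y + b.Y, a.Z + b.Z}
    }

    var sink *Vec

    // Taking a pointer alone doesn't force a heap allocation:
    // v does not escape, so it stays on the stack.
    func local() float64 {
        v := &Vec{1, 2, 3}
        return v.X
    }

    // Storing the pointer in a global makes it escape, so this
    // Vec is moved to the heap.
    func escapes() {
        v := &Vec{1, 2, 3}
        sink = v
    }

    func main() {
        _ = add(Vec{1, 0, 0}, Vec{0, 1, 0})
        _ = local()
        escapes()
    }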


> you still have to allocate once per object

Why can’t you bump once to allocate enough space for multiple objects?


You can’t do it that way because each object has to have its own lifetime.

If you allocate them all as a single allocation, then the entire allocation would be required to live as long as the longest lived object. This would be horribly inefficient because you couldn’t collect garbage effectively at all. Memory usage would grow by a lot, as all local variables are continuously leaked for arbitrarily long periods of time whenever you return a single one, or store a single one in an array, or anything that could extend the life of any local variable beyond the current function.

If you return a stack variable, it gets copied into the stack frame of the caller, which is what allows the stack frame to be deallocated as a whole. That’s not how heap allocations work, and adding a ton of complexity to heap allocations to avoid using the stack just seems like an idea fraught with problems.

If you know at compile time that they all should be deallocated at the end of the function… the compiler should just use the stack. That’s what it is for. (The one exception is objects that are too large to comfortably fit on the stack without causing a stack overflow.)


> If you allocate them all as a single allocation, then the entire allocation would be required to live as long as the longest lived object.

No, you can copy objects out if they live longer than others. That's how a generational GC works.


I edited my comment before you posted yours. Making heap allocations super complicated just to avoid the stack is a confusing idea.

Generational GCs surely do not do a single bump allocate for all local variables. How could the GC possibly know where each object starts and ends if it did it as a single allocation? Instead, it treats them all as individual allocations within an allocation buffer, which means bumping for each one separately. Yes, they will then get copied out if they survive long enough, but that’s not the same thing as avoiding the 10+ instructions per allocation.

It’s entirely possible I’m wrong when it comes to Truffle, but at a minimum it seems like you would need arenas for each size class, and then you’d have to bump each arena by the number of local variables of that size class. The stack can do better than that.


When you allocate objects individually it looks like this:

    object_a = tlab
    tlab += 8
    check tlab limit
    object_b = tlab
    tlab += 8
    check tlab limit
When you allocate as a single allocation it looks like this:

    object_a = tlab
    object_b = tlab + 8
    tlab += 16
    check tlab limit
What about that do you see as impossible?

> How could the GC possibly know where each object starts and ends if it did it as a single allocation?

As above.


In your example, the objects are all the same size. That would certainly be easy.

If you have three local objects that are 8 bytes, 16 bytes, and 32 bytes… if you do a single 48 byte allocation on the TLAB, how can the GC possibly know that there are three distinct objects, when it comes time to collect the garbage? I can think of a few ways to kind of make it work in a single buffer, but they would all require more than the 48 bytes that the objects themselves need. Separate TLAB arenas per size class seem like the best approach, but it would still require three allocations because each object is a different size.

I understand you’re some researcher related to Truffle… this is just the first I’m hearing of multiple object allocation being done in a single block with GC expected to do something useful.


> If you have three local objects that are 8 bytes, 16 bytes, and 32 bytes… if you do a single 48 byte allocation on the TLAB, how can the GC possibly know that there are three distinct objects, when it comes time to collect the garbage?

Because the objects are self-describing - they have a class which tells you their size.

    object_a = tlab
    object_a.class = ClassA
    object_b = tlab + 8
    object_b.class = ClassB
    object_c = tlab + 16
    object_c.class = ClassC
    tlab += 48
    check tlab limit


Ok, so it’s not as simple as bumping the TLAB pointer by 48. Which was my point. You see how that’s multiple times as expensive as stack allocating that many variables? Even something as simple as assigning the class to each object still costs something per object. The stack doesn’t need self-describing values because the compiler knows ahead of time exactly what every chunk means. Then the garbage collector has to scan each object’s self-description… which is way more expensive than stack deallocation, by definition.

You’re extremely knowledgeable on all this, so I’m sure that nothing I’m saying is surprising to you. I don’t understand why you seem to be arguing that heap allocating everything is a good thing. It is certainly more expensive than stack allocation, even if it is impressively optimized. Heap allocating as little as necessary is still beneficial.


Every stack allocation scheme I've seen creates a fully reified object though, with a class pointer as normal.

You may be confusing this with scalar replacement of aggregates, which is a separate concept.

https://chrisseaton.com/truffleruby/seeing-escape-analysis/


Go does not put class pointers on stack variables. Neither does Rust or C. The objects are the objects on the stack. No additional metadata is needed.

The only time Go has anything like a class pointer for any object on the stack is in the case of something cast to an interface, because interface objects carry metadata around with them.

These days, Go doesn’t even stack allocate all non-escaping local variables… sometimes they will exist only within registers! Even better than the stack.
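A small sketch (mine) of that interface caveat: a bare struct value carries no header at all, while converting it to an interface pairs the data with a type pointer (and may force a heap allocation):

    package main

    import "fmt"

    type Pair struct{ A, B int }

    func (p Pair) String() string { return fmt.Sprintf("(%d,%d)", p.A, p.B) }

    func main() {
        p := Pair{1, 2} // bare value: just two ints, no metadata

        // Interface value: a (type pointer, data pointer) header is
        // attached, and the Pair's data may be copied to the heap.
        var s fmt.Stringer = p
        fmt.Println(s.String())
    }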


> These days, Go doesn’t even stack allocate all non-escaping local variables… sometimes they will exist only within registers!

Did you read the article I linked? That's what it says - and this isn't stack allocation, it's SRA. Even 'registers' is overly constrained - they're dataflow edges.


Go isn't just doing SRA, as far as I understood it from your article, though it is certainly doing that too. Go will happily allocate objects on the stack with their full in-memory representation, which does not include any kind of class pointer.

Here is a contrived example that creates a 424 byte struct: https://godbolt.org/z/5cKh7xTzq

As can be seen in the disassembly, the object created in "Process" does not leave the stack until it is copied to the heap in "ProcessOuter" because ProcessOuter is sending the value to a global variable. The on-stack representation is the full representation of that object, as you can also see by the disassembly in ProcessOuter simply copying that directly to the heap. (The way the escape analysis plays into it, the copying to the heap happens on the first line of ProcessOuter, which can be confusing, but it is only being done there because the value is known to escape to the heap later in the function on the second line. It would happily remain on the stack indefinitely if not for that.)

It's cool that Graal does SRA, but Go will actually let you do a lot of work entirely on the stack (SRA'd into registers when possible), even crossing function boundaries. In your SRA blog post example, when the Vector is returned from the function, it has to be heap allocated into a "real" object at that point. Go doesn't have to do that heap allocation, and won’t have to collect that garbage later.

Most of the time, objects are much smaller than this contrived example, so they will often SRA into registers and avoid the stack entirely… and this applies across function boundaries too, from what I’ve seen, but I haven’t put as much effort into verifying this.


I was addressing your statement that seemed to say class metadata is included in every language when dealing with stack variables. I trusted your earlier statements that this was definitely the case with Java/Truffle. Misunderstandings on my part are entirely possible.

Sorry I haven’t had time to read your article. It’s on my todo list for later.


> at a minimum it seems like you would need arenas for each size class

But TLABs are heterogeneous by size. Objects of all sizes go in one linear allocation space. So allocating two objects next to each other is the same as allocating them both at the same time.


That doesn’t answer how the GC knows where the boundaries of each object are.


No, outside of special cases it is impossible to know at compile time how many heap allocations a function will have.

The stack has the requirement that its size must be known at compile time for each function. In oversimplified terms, its size is going to be the sum of size_of over all the syntactically declared local variables.

So for example you cannot grow the stack with a long loop, because the same variable is reused over and over in all the iterations.

You can instead grow the heap as much as you want with a simple `while(true){malloc(...)}`.


> No, outside of special cases it is impossible to know at compile time how many heap allocations a function will have.

Huh?

    def foo:
      return [new Object, new Object]
That'll always bump the TLAB three times. Why not bump it one time for all objects?

> The stack has the requirement that its size must be known at compile time for each function.

Also huh?

For example a local array that has a dynamic size.


> The stack has the requirement that its size must be known at compile time for each function.

Not really. You can bump your stack pointer at any time. Even C has alloca and VLAs. In lots of languages it's dangerous and not done because you can stack overflow with horrible results, and some (but not all) performance is lost because you need to offset some loads relative to another variable value, but you can do it.

What the stack really requires is that any values on it go full nasal demon after the function returns, so you'd better be absolutely certain the value can't escape - and detecting that is hard.


Not quite - stack is constantly being reused and thus always hot in the cache. Young gen is never hot because it's much larger and the frontier is constantly moving forwards, instead of forwards and backwards.


> The tricky thing is GC works most of the time, but if you are working at scale you really can't predict user behavior, and so all of those GC-tuning parameters that were set six months ago no longer work properly. A good portion of production outages are likely related to cascading failures due to too long GC pauses, and a good portion of developer time is spent testing and tuning GC parameters. It is easier to remove and/or just not allow GC languages at these levels in the first place.

Getting rid of the GC doesn't absolve you of the problem, it just means that rather than tuning GC parameters, you've encoded usage assumptions in thousands of places scattered throughout your code base.


> A good portion of production outages are likely related to cascading failures due to too long GC pauses, and a good portion of developer time is spent testing and tuning GC parameters.

Can’t really accept that without some kind of quantitative evidence.


No worries. It is not meant to be quantitative. For a few years of my career that has been my experience. For this type of software, if I'm making the decision on what technology to use, it won't be any GC-based language. I'd rather not rely on promises that GC works great, or is very tunable.

One could argue that I could just tune my services from time to time. But I'd just reduce the surface area for problems by not relying upon it at all -- both a technical and a business decision.


If you need to fight the GC to prevent crashes or whatever, then you have a system design issue, not a tooling/language/ecosystem issue. There are exceptions to this, but they're rare and not worth mentioning in a broad discussion like this.

Sadly very few people take interest in learning how to design systems properly.

Instead they find comfort in tools that allow them to over-engineer the problems away. Like falling into zealotry on things like FP, zero-overhead abstractions, "design patterns", containerization, manual memory management, etc, etc. These are all nice things when properly applied in context but they're not a substitute for making good system design decisions.

Good system design starts with understanding what computers are good at and what they suck at. That's a lot more difficult than it sounds because today's abstractions try to hide what computers suck at.

Example: Computers suck at networking. We have _a lot_ of complex layers to help make it feel somewhat reliable. But as a fundamental concept, it sucks. The day you network two computers together is the day you've opened yourself up to a world of hurt (think race conditions) - so, like, don't do it if you don't absolutely have to.


It's because system design is a lot less theoretically clean than something like FP, zero-cost abstractions, GC-less coding, containerization, etc, and forces programmers to confront essential complexity head-on. Lots of engineers think that theoretically complex/messy/hacky solutions are, by their nature, lesser solutions. Networking is actually a great example.

Real life networking is really complicated and there are tons of edge cases. Connections dropping due to dropped ACKs, repeated packets, misconfigured MTU limits causing dropped packets, latency on overloaded middleboxes resulting in jitter, NAT tables getting overloaded, the list goes on. However most programmers try to view all of these things with a "clean" abstraction and most TCP abstractions let you pretend like you just get an incoming stream of bytes. In web frameworks we abstract that even further and let the "web framework" handle the underlying complexities of HTTP.

Lots of programmers see a complicated system like a network and think that a system which has so many varied failure modes is in fact a badly designed system and are just looking for that one-true-abstraction to simplify the system. You see this a lot especially with strongly-typed FP people who view FP as the clean theoretical framework which captures any potential failure in a myriad of layered types. At the end of the day though systems like IP networks have an amount of essential complexity in them and shoving them into monad transformer soup just pushes the complexity elsewhere in the stack. The real world is messy, as much as programmers want to think it's not.


> The real world is messy, as much as programmers want to think it's not.

You hit the nail on the head with the whole comment and that line in particular.

I'll add that one of the most effective ways to deal with some of the messiness/complexity is simply to avoid it. Doing that is easier said than done these days because complexity is often introduced through a dependency. Or perhaps the benefits of adopting some popular architecture (eg: containerization) is hiding the complexity within.

> It's because system design is a lot less theoretically clean

Yea this is a major problem. It's sort of a dark art.


> Computers suck at networking. We have _a lot_ of complex layers to help make it feel somewhat reliable.

I've got bad news, pal: your SSD has a triple-core ARM processor and is connected to the CPU through a bus, which is basically a network, complete with error correction and the exact same failure modes as your connection to the New York Stock Exchange. Even the connection between your CPU and its memory can produce errors; it's turtles all the way down.


Computer systems are imperfect. No one is claiming otherwise. What matters more is the probability of failure, rates of failure in the real world, P95 latencies, how complex it is to mitigate common real world failures, etc, etc, etc.

"Turtles all the way down" is an appeal to purity. It's exactly the kind of fallacious thinking that leads to bad system design.


The difference with distributed (networked) systems is that they are expected to keep working even in the presence of partial (maybe Byzantine) failures.

Communication between actors is not itself the problem; unreliable communication between unreliable actors is.

If any of my CPU, RAM, or motherboard has a significant failure, my laptop is just dead; each component can assume that all the others mostly work, and simply fail if they don't.


Come now. Nobody can sever my connection to the CPU with a pair of hedge clippers in the backyard.


>Computers suck at networking. ... The day you network two computers together is the day you've opened yourself up to a world of hurt.

This is actually a pretty insightful comment, and something I haven't thought about in a number of years, since networking disparate machines together to create a system is now so second nature to any modern software that we don't think twice about the massive amount of complexity we've suddenly introduced.

Maybe the mainframe concept wasn't such a bad idea, where you just build a massive box that runs everything together, so you never get HTTP timeouts or failed connections to your DB since they're always on.


> I'd rather not rely on promises that GC works great, or is very tunable.

I'm always puzzled by statements like these. What else do you want to rely on? The best answer I can think of is "The promise that my own code will work better", but even then: I don't trust my own code, my past self has let me down too many times. The promise that code from my colleagues will do better than GC? God forbid.

It's not like not having a GC means that you're reducing the surface area. You're not. What you're doing is taking on the responsibility of the GC and gambling on the fact that you'll do the things it does better.

The only thing that I can think of that manually memory managed languages offer vs GC languages is the fact that you can "fix locally". But then again, you're fixing problems created by yourself or your colleagues.


It's impossible to spend much time tuning Go's GC parameters, as the runtime intentionally provides almost nothing to tune (essentially just GOGC).

Go's GC is optimized for latency, it doesn't see the same kind of 1% peak latency issues you get in languages with a long tail of high latency pauses.

Also consider API design - Java APIs (both in the standard and third-party libs) tend to be on the verbose side and build complex structures out of many nested objects. Most Go applications will have less nesting depth, so it's an inherently easier GC problem.

System designs that rely on allocating a huge amount of memory to a single process exist in a weird space - big enough that perf is really important, but small enough that single-process is still a viable design. Building massive monoliths that allocate hundreds of GBs at peak load just doesn't seem "in vogue" anymore.

If you are building a distributed system keeping any individual processes peak allocation to a reasonable size is almost automatic.


You tune Go’s GC by rewriting your code. It’s like turning a knob but slower and riskier.


You tune GC in Go by profiling allocations, CPU, and memory usage. Profiling shows you where the problems are, and Go has some surprisingly nice profiling tools built in.

Unlike turning a knob, which has wide reaching and unpredictable effects that may cause problems to just move around from one part of your application to another, you can address the actual problems with near-surgical precision in Go. You can even add tests to the code to ensure that you're meeting the expected number of allocations along a certain code path if you need to guarantee against regressions... but the GC is so rarely the problem in Go compared to Java, it's just not something to worry about 99% of the time.
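For example, here is a minimal sketch of such an allocation-count test using the standard library's testing.AllocsPerRun (buildKey is a hypothetical hot-path function that reuses a caller-provided buffer):

    package hotpath

    import "testing"

    // buildKey appends into a caller-provided buffer so the steady
    // state performs zero heap allocations.
    func buildKey(buf []byte, id uint64) []byte {
        return append(buf[:0], byte(id), byte(id>>8))
    }

    func TestBuildKeyAllocs(t *testing.T) {
        buf := make([]byte, 0, 16)
        allocs := testing.AllocsPerRun(1000, func() {
            buf = buildKey(buf, 42)
        })
        if allocs != 0 {
            t.Fatalf("buildKey allocated %v times per run; want 0", allocs)
        }
    }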

If knobs had a "fix the problem" setting, they would already be set to that value. Instead, every value is a trade off, and since you have hundreds of knobs, you're playing an impossible optimization game with hundreds of parameters to try to find the set of parameter values that make your entire application perform the way you want it to. You might as well have a meta-tuner that just randomly turns the knobs to collect data on all the possible combinations of settings... and just hope that your next code change doesn't throw all that hard work out the window. Go gives you the tools to tune different parts of your code to behave in ways that are optimal for them.

It's worth pointing out that languages like Rust and C++ also require you to tune allocations and deallocations... this is not strictly a GC problem. In those languages, like in Go, you have to address the actual problems instead of spinning knobs and hoping the problem goes away.

The one time I have actually run up against Go's GC when writing code that was trying to push the absolute limits of what could be done on a fleet of rather resource constrained cloud instances, I wished I was writing Rust for this particular problem... I definitely wasn't wishing I could be spinning Java's GC knobs. But, I was still able to optimize things to work in Go the way I needed them to even in that case, even if the level of control isn't as granular as Rust would have provided.


I think I've fiddled with the GC for less than a week in my eight years of experience, including some systems stuff - maybe this is true at FANG scale, but not for me!


As many have replied, the available levers for "GC tuning" in Go are almost non-existent. However, what we do have influence over is "GC pressure", which is a very important metric we can move in the right direction if the application requires it.


You really haven't given any supporting information for your argument other than a vague feeling that GC is somehow bad. In fact you just pointed out many counterexamples to your own argument, so I'm not sure what to take away.

I've seen this sentiment a lot, and I never see specifics. "GC is bad for a systems language" is a tribalist, firmly-held belief that is unsupported by hard data.

On the other hand, huge, memory-intensive and garbage-collected systems have been deployed in vast numbers by thousands of different companies for decades, long before Go, within acceptable latency bounds. And shoddy, poorly performing systems have been written in C/C++ and failed spectacularly for all kinds of reasons.


>I've seen this sentiment a lot, and I never see specifics. "GC is bad for a systems language" is a tribalist, firmly-held belief that is unsupported by hard data.

I would argue it's not (very) hard data that we need in this case. My opinion is that the resource usage of infrastructure code should be as low as possible so that most resources are available to run applications.

The economic viability of application development is very much determined by developer productivity. Many applications aren't even run that often if you think of in-house business software for instance. So application development is where we have to spend our resource budget.

Systems/infrastructure code on the other hand is subject to very different economics. It runs all the time. The ratio of development time to runtime is incredibly small. We should optimise the heck out of infrastructure code to drive down resource usage whenever possible.

GC has significant memory and CPU overhead. I don't want to spend double digit resource percentages on GC for software that could be written differently without being uneconomical.


> Systems/infrastructure code on the other hand is subject to very different economics. It runs all the time. The ratio of development time to runtime is incredibly small. We should optimise the heck out of infrastructure code to drive down resource usage whenever possible.

I will assume by "infrastructure code" you mean things like kernels and network stacks.

Unfortunately there are several intertwined issues here.

First, we pay in completely different ways for writing this software in manually-managed languages. Security vulnerabilities. Bugs. Development time. Slow evolution. I don't agree with the tradeoffs we have made. This software is important and needs to be memory-safe. Maybe Rust will deliver, who knows. But we currently have a lot of latent memory management bugs here that have consistently accounted for 2/3 to 3/4 of critical CVEs over several decades. That's a real problem. We aren't getting this right.

Second, infrastructure code does not consume a lot of memory. Infrastructure code mostly manages memory and buffers. The actual heap footprint of the Linux kernel is pretty small; it mostly indexes and manages memory, buffers, devices, packets, etc. That is where optimization should go; manage the most resources with the lowest overhead just in terms of data structures.

> GC has significant memory and CPU overhead. I don't want to spend double digit resource percentages on GC for software that could be written differently without being uneconomical.

Let's posit 20% CPU for all the things that a GC does. And let's posit 2X for enough heap room to keep the GC running concurrently well enough that it doesn't incur a lot of mutator pauses.

If all that infrastructure is taking 10% of CPU and 10% of memory, we are talking adding 2% CPU and 10% memory.

ABSOLUTE BARGAIN in my book!

The funny thing is, people made these same arguments back in the heyday of Moore's law, when we were getting 2X CPU performance every 18 months. 2% CPU back then was a matter of weeks of Moore's law. Now? Maybe a couple of months. We consistently choose to spend the dividends of hardware performance improvements on... more performance? And nothing on safety or programmability? I seriously think we chose poorly here due to some significant confusion in priorities and real costs.


>I will assume by "infrastructure code" you mean things like kernels and network stacks.

That, and things like database systems, libraries that are used in a lot of other software or language runtimes for higher level languages.

>The actual heap footprint of the Linux kernel is pretty small

And what would that footprint be if the kernel was written in Java or Go? What would the performance of all those device drivers be?

You can of course write memory efficient code in GC languages by manually managing a bunch of buffers. But I have seen and written quite a bit of that sort of code. It's horribly unsafe and horribly unproductive to write. It's far worse than any C++ code I have ever seen. It's the only choice left when you have boxed yourself into a corner with a language that is unsuitable for the task.

>First, we pay in completely different ways for writing this software in manually-managed languages. Security vulnerabilities. Bugs. Development time

This is not a bad argument, but I think there has always been a very wide range of safety features in non-GC languages. C was never the only language choice. We had the Pascal family of languages. We had Ada. We got "modern" C++, and now we have Rust.

If safety was ever good enough reason to use GC languages for systems/infrastructure, that time is now over.


> You can of course write memory efficient code in GC languages by manually managing a bunch of buffers. But I have seen and written quite a bit of that sort of code. It's horribly unsafe and horribly unproductive to write.

Go uses buffers pretty idiomatically and they don't seem unsafe or unproductive. Maybe I'm not following your meaning?

> If safety was ever good enough reason to use GC languages for systems/infrastructure, that time is now over.

I don't know that I want GC for OS kernels and device drivers and so on, but typically people arguing against GC are assuming lots of garbage and long pause times; however, Go demonstrates that we can have low-latency GC and relatively easy control over how much garbage we generate and where that garbage is generated. It's also not hard to conceive of a language inspired by Go that is more aggressively optimized (for example, fully monomorphized generics, per this article or with a more sophisticated garbage collector).
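As one minimal sketch of that kind of control (mine, not from the article): reusing buffers through the standard library's sync.Pool confines garbage generation to a cold startup path instead of every request:

    package main

    import (
        "bytes"
        "fmt"
        "sync"
    )

    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    func handle(msg string) string {
        buf := bufPool.Get().(*bytes.Buffer)
        defer bufPool.Put(buf)
        buf.Reset() // reuse the existing backing array; no allocation in steady state
        buf.WriteString("processed: ")
        buf.WriteString(msg)
        return buf.String()
    }

    func main() {
        fmt.Println(handle("hello"))
    }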

I think the more compelling reason to avoid a GC for kernel level code is that it implies that the lowest level code depends on a fairly complex piece of software, and that feels wrong (but that's also a weak criticism and I could probably be convinced otherwise).


> It's also not hard to conceive of a language inspired by Go that is more aggressively optimized (for example, fully monomorphized generics, per this article or with a more sophisticated garbage collector).

Standard ML as implemented by MLton uses full monomorphization and a host of advanced functional optimizations. That code can be blazingly fast. MLton does take a long time to compile, though.

I've been working on a systems language that is GC'd and also does full monomorphization--Virgil. I am fairly miserly with memory allocations in my style and the compiler manages to compile itself (50KLOC) at full optimization in 300ms and ~200MB of memory, not performing a single GC. My GC is really dumb and does a Cheney-style semispace copy, so it has horrible pauses. Even so, GC is invisible so the algorithm could be swapped out at any time.

For an OS kernel, I think a GC would have to be pretty sophisticated (concurrent, highly-parallel, on-the-fly), but I think this is a problem I would love to be working on, rather than debugging the 19000th use-after-free bug.

Go's GC is very sophisticated, with very low pause times. It trades memory for those low pause times and can suffer from fragmentation because it doesn't compact memory. Concurrent copying is still a hard problem.

Again, a problem I'd rather we had than the world melting down because we chose performance rather than security.


My argument is about the economics of software development more than about any of the large number of interesting technical details we could debate for a very long time.

There are higher level features that cause higher resource consumption. GC is clearly one such feature. No one denies that. So how do we decide where it makes more sense to use these features and where does it make less sense?

What I'm saying is that we should let ourselves be guided by the ratio development_time / running_time. The smaller this ratio, the less sense it makes to use such "resource hogging" features and the more sense it makes to use every opportunity for optimisation.

This is not only true for infrastructure/systems software. This is just one case where that ratio is very small. Another case would be application software that is used by a very large number of people, such as web browsers.


I understand your argument; it's been made for decades: put in a lot of effort to save those resources. But it isn't about effort. We put in a lot of effort and still got crap - even worse, crap security. We put effort into the wrong things!

We've majorly screwed up our priorities. Correctness should be so much higher up the priority list, probably #1, TBH. When it is a high priority, we should be willing to sacrifice performance to actually get it. The correct question is not if we should sacrifice performance, but how much. We didn't even get that right.

But look, I know. Security doesn't sell systems, never has--benchmarks do. The competitive benchmarking marketplace is partly responsible. And there, there's been so much FUD on the subject that I feel we've all been hoodwinked and conned into putting performance at the top to all of our detriment. That was just dumb on our (collective) part.

Let me put it another way. Go back to 1980. Suppose I offered you two choices. Choice A, you get a 1000x improvement in computer performance and memory capacity, but your system software is a major pain in the ass to write and full of security vulnerabilities, to the point where the world suffers hundreds of billions of dollars of lost GDP due to software vulnerabilities. Choice B, you get an 800x improvement in computer performance, a 500x improvement in memory capacity, and two thirds of that GDP loss just doesn't happen. Also, writing that software isn't nearly as much of a pain in the ass.

Which did we choose? Yeah. That's where the disagreement lies.


> This software is important and needs to be memory-safe.

Let's say the GC is a binary... where is this binary? Probably somewhere in the filesystem? So now you need a filesystem written in a language that doesn't depend on the GC, just like the binary loader, scheduler, bootloader, which by the way, can boot from remote, so now you also need a network stack that doesn't use GC.

You might need to show something on the screen before the GC starts up, so you will need video drivers, bus drivers, etc. written in a language that doesn't use a GC.

And in which language is the GC written?

You get the idea.


My argument against GC (which applies similarly to JIT-based runtimes) is that the problems caused by GC pauses have non-local causes. If a piece of code ran slowly because of a GC pause, the cause of the pause is in some sense _the entire rest of the system_. You can't fix the problem with a localized change.

Programs in un-managed languages can be slow too, and excessive use of malloc() is a frequent culprit. But the difference is that if I have a piece of code that is slow because it is calling malloc() too much, I can often (or at least some of the time) just remove the malloc() calls from that function. I don't have to boil the ocean and significantly reduce the rate at which my entire program allocates memory.

I think another factor that gets ignored is how much you care about tail latency. I think GC is usually fine for servers and other situations where you are targeting a good P99 or P99.9 latency number. And indeed, this is where JVM, Go, node.js, and other GCed runtimes dominate.

But, there are situations, like games, where a bad P99.9 frame time means dropping a frame every 15 seconds (at 60fps). If you've got one frame skip every 10 seconds because of garbage collection pauses and you want to get to one frame skip every minute, that is _not_ an easy problem to fix.

(Yes, I am aware that many commercial game engines have garbage collectors).


I don't want to try to bring up an exception that disproves your rule, but what about something like BEAM, where it has per-process (process = lightweight thread) heaps and GC.


I don't know anything about BEAM, but I don't think single-threading of any form really addresses the underlying problem. If you go to allocate something, and the system decides a GC is necessary in order to satisfy your allocation, then the GC has to run before your allocation returns.


You can't share objects across threads (called "processes") in BEAM, so it's very different. The GC only ever needs to pause one call stack at a time to do a full GC cycle. Memory shared across processes is generally manually managed, typically more akin to a database than an object heap.


>If you go to allocate something, and the system decides a GC is necessary in order to satisfy your allocation, then the GC has to run before your allocation returns.

Manual memory management on the heap does not solve this problem at all, it is actually worse at it than most traditional GCs. There is no magical system that allocates an object without executing code.


You're absolutely correct. If you could divide your program into multiple garbage collector heaps and then choose which garbage collector strategy (or even no garbage collector) to use then even Java could be used for soft real time applications. The problem is "stop the world including my time critical thread". If you only stop the non critical threads and let the critical threads keep running there is no issue with GCs.


Right. In my experience just taking a few steps (like pre-allocating buffers or arrays) decrease GC pressure enough where GC runs don't actually affect performance enough to matter (as long as you're looking at ~0.5-1 ms P99 response times). But there's always the strident group who says GCs are bad and never offer any circumstance where that could be true.


Indeed. What really kills is extremely high allocation rates and extremely high garbage production. I've seen internal numbers from $megacorp that show that trashy C++ programs (high allocation + deallocation rates) look pretty much the same to CPUs as trashy Java programs, but are far worse in terms of memory fragmentation. Trashy C++ programs can end up spending 20+% of their execution time in malloc/free. That's a fleetwide number I've seen on clusters > 1M cores.

I will admit that the programming culture is different for many GC'd languages' communities, sometimes encouraging a very trashy programming style, which contributes to the perception that GC itself is the problem, but based on my experience in designing and building GC'd systems, I don't blame GC itself.


> I will admit that the programming culture is different for many GC'd languages' communities, sometimes encouraging a very trashy programming style, which contributes to the perception that GC itself is the problem

For some languages (I’m looking at you, Java), there’s not much of a way to program that doesn't generate a bunch of garbage, because only primitives are treated as value types, and for Objects, heap allocations can only be avoided if escape analysis can prove the object doesn’t outlast its stack frame (which isn’t reliable in practice.) (Edit: or maybe it doesn’t happen at all. Apparently escape analysis isn’t used to put objects on the stack even if they are known to not escape: https://www.beyondjava.net/escape-analysis-java)

I honestly can’t imagine much of a way to program in Java that doesn’t result in tremendous GC pressure. You could technically allocate big static buffers and use a completely different paradigm where every function is static and takes data at a defined offsets to said buffers, but… nobody really does this and the result wouldn’t look anything like Java.

Sometimes it’s appropriate to blame the language.


> I’m looking at you, Java

> Sometimes it’s appropriate to blame the language.

Oh, I know, I was just being vague to be diplomatic. Java being generally trashy has been one of the major motivators for me to do Virgil. In Java, you can't even parse an integer without allocating memory.


Yeah not all languages are created equal. Working around Java's GC can be a huge pain (luckily the generational GC makes it okay performance-wise to generate metric tons of garbage). Luckily Go doesn't have this problem and it's a lot easier to avoid allocations in Go.


It is indeed a huge problem with Java that it often makes it difficult to avoid generating garbage. However, one can still reduce it a lot by trying hard - even if that means reimplementing selected parts of the standard libraries.

But the job of avoiding garbage is much easier in Go :)


You can run high-volume business and mid-level infrastructure (messaging) systems day-in day-out in Java, and struggle to see more than 0.1% CPU needed for garbage collection.

The light-bulb idea here is about reliability, security and productivity. Managing memory manually a la C/ C++ is a proven disaster in all three respects.

Maybe ten or fifteen years ago GC was costly, but these days it's highly efficient and quite comparable with programmatic allocation.


Java made a huge mistake of not being like Oberon, Modula-3, Eiffel, Mesa/Cedar,....

However, that doesn't mean all GC languages are equal, or should be measured by Java's bad decisions at birth.


Virtually invariably, "GC is bad" assumes (1) lots of garbage (2) long pause times. Go has idiomatic value types (so it generates much less garbage) and a low-latency garbage collector. People who argue against GC are almost always arguing against some Java GC circa 2005.


What do you consider a long pause time?

In userland I consider anything above maybe 1 or 2 milliseconds to be a long pause time. The standards only get higher when it's something like a kernel.

A kernel with even a 0.2ms pause time can be unacceptable when working with ultra low-latency audio for example.


The criticisms I've heard typically reference pause times in the tens or hundreds of milliseconds. Agreed that different domains and applications have different requirements for pause times. I would be very interested to see histograms of pause times for different GCs, but I'm pretty sure the specific results would vary a lot depending on the specific corpus of applications under benchmark. If your 99%-ile pause time is tens of microseconds, is that good enough for video games? Audio kernels?


This is the no true Scotsman argument. I mean, no true modern GC. And it's bullshit. Let's be topical and pick on Go, since that's the language in the title:

https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...

30% of CPU spent on GC, individual GC pauses already in the milliseconds, despite a tiny half-gig heap in 2019. For gamedev, a single millisecond in the wrong place can be enough to miss vsync and have unacceptable framerate stutter. In the right place, it's "merely" 10% of your entire frame budget, for VR-friendly, nausea-avoiding framerates near 100fps. Or perhaps 90% if "single digit milliseconds" might include 9ms.

Meanwhile, the last professional project I worked on had 100ms pauses every 30 seconds because we were experimenting with duktape, which is still seeing active commits. Closer to a 32GB heap for that project, but most of that was textures. Explicit allocation would at least show where the problematic garbage/churn was in any profiler, but garbage collection meant a single opaque codepath for all garbage deallocation... without even the benefit of explicit static types to narrow down the problem.


From your link (which I remember reading at the time):

> So by simply reducing GC frequency, we saw close to a ~99% drop in mark assist work, which translated to a ~45% improvement in 99th percentile API latency at peak traffic.

Did you look at the actual article? (Because it doesn't support your point). They added a 10GB memory ballast to keep the GC pacer from collecting too much. That is just a bad heuristic in the GC, and should have a tuning knob. I'd argue a tuning knob isn't so bad, compared to rewriting your entire application to manually malloc/free everything, which would likely result in oodles of bugs.
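For reference, the technique from the post boils down to something like this sketch (mine, not Twitch's exact code): one large, never-written allocation that raises the pacer's heap goal so collections run less often. (Go 1.19's runtime/debug.SetMemoryLimit later made this hack largely unnecessary.)

    package main

    import "runtime"

    func main() {
        // 10 GiB of mostly-virtual memory: the pages are never
        // written, so little physical RAM is actually consumed,
        // but the GC pacer now targets a much larger heap.
        ballast := make([]byte, 10<<30)

        // ... start the server and do the real work here ...

        runtime.KeepAlive(ballast) // keep the ballast live for the process lifetime
    }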

Also:

> And it's bullshit.

Please, we can keep the temperature on the conversation down a bit by just keeping to facts and leaving out a few of these words.


> Did you look at the actual article? (Because it doesn't support your point).

I did and it does for the point I intended to derive from said article:

>> However, the GC pause times before and after the change were not significantly different. Furthermore, our pause times were on the order of single digit milliseconds, not the 100s of milliseconds improvement we saw at peak load.

They were able to improve times via tuning. Individual GC pause times were still in the milliseconds. Totally acceptable for twitch's API servers (and in fact drowned out by the several hundred millisecond response times), but those numbers mean you'd want to avoid doing anything at all in a gamedev render thread that could potentially trigger a GC pause, because said GC pause will trigger a vsync miss.

> I'd argue a tuning knob isn't so bad, compared to rewriting your entire application to manually malloc/free everything, which would likely result in oodles of bugs.

Memory debuggers and RAII tools have ways to tackle this.

I've also spent my fair share of time tackling oodles of bugs from object pooling, meant to workaround performance pitfalls in GCed languages, made worse by the fact that said languages treated manual memory allocation as a second class citizen at best, providing inadequate tooling for tackling the problem vs languages that treat it as a first class option.


> but those numbers mean you'd want to avoid doing anything at all in a gamedev render thread that could potentially trigger a GC pause, because said GC pause will trigger a vsync miss.

You might want to take a look at this:

https://queue.acm.org/detail.cfm?id=2977741


I have, it's a decent read - although somewhat incoherent. E.g. they tout the benefits of GCing when idle, then trash the idea of controlling GC:

> Sin two: explicit garbage-collection invocation. JavaScript does not have a Java-style System.gc() API, but some developers would like to have that. Their motivation is proactively to invoke garbage collection during a non-time-critical phase in order to avoid it later when timing is critical. [...]

So, no explicitly GCing when a game knows it's idle. Gah. The worst part is these are entirely fine points... and somewhat coherent in the context of webapps and webpages. But then one attempts to embed v8 - as one does - and suddenly you, the developer, are the one who might be attempting to time GCs correctly. At least then you have access to the appropriate native APIs:

* https://v8docs.nodesource.com/node-7.10/d5/dda/classv8_1_1_i...

* https://v8docs.nodesource.com/node-7.10/d5/dda/classv8_1_1_i...

A project I worked on had a few points where it had to explicitly call GC multiple times back to back. Intertwined references from C++ -> Squirrel[1] -> C++ -> Squirrel meant the first GC would finalize some C++ objects, which would unroot some Squirrel objects, which would allow some more C++ objects to be finalized - but only one layer at a time per GC pass.

Without the multiple explicit GC calls between unrooting one level and loading the next, the game had a tendency to "randomly"[2] ~double its typical memory budget (thanks to uncollected dead objects and the corresponding textures they were keeping alive), crashing OOM in the process - the kind of thing that would fail console certification processes and ruin marketing plans.

[1]: http://squirrel-lang.org/

[2]: quite sensitive to the timing of "natural" allocation-triggered GCs, and what objects might've created what reference cycles etc.
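
The onion-layer effect is easy to reproduce in pure Go, by the way, since a finalizer pins everything the dying object references until a later GC cycle. A toy sketch (not the actual Squirrel setup, obviously):

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    type layer struct {
        name string
        next *layer // pins the next layer until our own finalizer has run
    }

    func main() {
        c := &layer{name: "C"}
        b := &layer{name: "B", next: c}
        a := &layer{name: "A", next: b}
        for _, l := range []*layer{a, b, c} {
            runtime.SetFinalizer(l, func(x *layer) { fmt.Println("finalized", x.name) })
        }
        a, b, c = nil, nil, nil

        // Each pass can only finalize the outermost unreachable layer; the
        // next one is still referenced until that finalizer has run. Hence
        // the repeated explicit collections, as in the game above.
        for i := 0; i < 5; i++ {
            runtime.GC()
            time.Sleep(10 * time.Millisecond) // let the finalizer goroutine run
        }
    }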


> So, no explicitly GCing when a game knows it's idle.

I mean, that is literally what the idle time scheduler in Chrome does. It has a system-wide view of idleness, which includes all phases of rendering and whatever else concurrent work is going on.

> Intertwined references from C++ -> Squirrel[1] -> C++ -> Squirrel meant the first GC would finalize some C++ objects, which would unroot some Squirrel objects, which would allow some more C++ objects fo be finalized - but only one layer at a time per GC pass.

This is a nasty problem and it happens a lot interfacing two heaps, one GC'd and one not. The solution isn't less GC, it's more. That's why Chrome has GC of C++ (Oilpan) and is working towards a unified heap (this may already be done). You put the blame on the wrong component here.


> I mean, that is literally what the idle time scheduler in Chrome does. It has a system-wide view of idleness, which includes all phases of rendering and whatever else concurrent work is going on.

Roughly, but it has poor insight into a game's definition of "idle". Probably fine for most web games, but prioritizing, say, "non-idle" game-driven prefetching and background decompression over GC can be the wrong tradeoff.

> This is a nasty problem and it happens a lot interfacing two heaps, one GC'd and one not. The solution isn't less GC, it's more. That's why Chrome has GC of C++ (Oilpan) and is working towards a unified heap (this may already be done). You put the blame on the wrong component here.

Two non-GCed heaps don't have this problem, nor do two "GC"ed heaps if we use an expansive definition of GC that includes refcounting-only systems - it only arises when using multiple heaps, when at least one of them is the deferred scanny finalizer-laden GC style. While you're correct that "more GC" is a solution, it's not the only solution, it has its drawbacks, and holding GC blameless here is disingenuous when it's the only common element. That these GCs compose poorly with other heaps is a drawback of said GCs.

If I try mixing Squirrel and C#, I'll run the risk of running into the same issues, despite both being GC based. I'm sure you'll agree that attempting to coax them into using the same heap is a nontrivial endeavor. I've been in the position of having two different JavaScript engines (one for "gameplay", one for UI) in the same project - but they were fortunately used for separate enough things to not create these kinds of onion layers of intertwined garbage. While silly from a clean-room technical standpoint, it's the kind of thing that can arise when different organizational silos end up interfacing - a preexisting reality that has been inflicted on me more than once.


> Please, we can keep the temperature on the conversation down a bit by just keeping to facts and leaving out a few of these words.

Sure. Let's avoid some of these words too:

> unsupported, tribalist, firmly-held belief that is unsupported by hard data.

Asking for examples is fine and great, but painting broad strokes of the dissenting camp before they have a chance to respond does nothing to help keep things cool.


I don't think you know what "no true Scotsman" means--I'm not asserting that Go's GC is the "true GC" but that it is one permutation of "GC" and it defies the conventional criticisms levied at GC. As such, it's inadequate to refute GC in general on the basis of long pauses and lots of garbage; you must refute each GC (or at least each type/class of GC) individually. Also, you can see how cherry-picking pathological, worst-case examples doesn't inform us about the normative case, right?


>> And it's bullshit.

> cherry picks worst-case examples and represents them as normative

Neither of my examples are anywhere near worst-case. All texture data bypassed the GC entirely, for example, contributing to neither live object count nor GC pressure. I'm taking numbers from a modern GC with value types, one you yourself said should be fine, and pointing out that, hey, it's actually pretty not OK for anything that might touch the render loop in modern game development, even if it's not being used as the primary language GC.


> I don't think you know what "no true Scotsman" means--I'm not asserting that Go's GC is the "true GC"

At no point in invoking https://en.wikipedia.org/wiki/No_true_Scotsman does one bother to define what a true Scotsman is, only what it is not, by way of handwaving away any example of problems with a category by implying the category excludes them. It's exactly what you've done when you state "People who argue against GC are almost always arguing against" some ancient, nonmodern, unoptimized GC.

Modern GCs have perf issues in some categories too.

> As such, it's inadequate to refute GC in general on the basis of long pauses and lots of garbage, you must refute each GC (or at least each type/class of GC) individually.

I do not intend to refute the value of GCs in general. I will happily use GCs in some cases.

I intend to refute your overbroad generalization of the anti-GC camp, for which specific examples are sufficient.

> Also, you can see how cherry-picking pathological, worst-case examples doesn't inform us about the normative case, right?

My examples are neither pathological nor worst case. They need not be normative - but for what it's worth, they do exemplify the normative case of my own experiences in game development across multiple projects with different teams at different studios, when language-level GCs were used for general purposes, despite being bypassed for bulk data.

It's also exactly what titzer was complaining was missing upthread:

> I've seen this sentiment a lot, and I never see specifics. "GC is bad for systems language" is an unsupported, tribalist, firmly-held belief that is unsupported by hard data.


> At no point in invoking https://en.wikipedia.org/wiki/No_true_Scotsman does one bother to define what a true scotsman is, only what it is not by way of handwaving away any example of problems with a category by implying the category excludes them. It's exactly what you've done when you state "People who argue against GC are almost always arguing against" some ancient, nonmodern, unoptimized GC.

Yes, that is how a NTS works, which is why you should be able to see that my argument isn't one. If I had said the common criticism of GCs ("they have long pause times") is invalid because the only True GC is a low-latency GC, then I would have made an NTS argument. But instead I argued that the common criticism of GCs doesn't refute GCs categorically as GC critics believe, it only refutes high-latency GCs. I'm not arguing that True GCs are modern, optimized, etc, only that some GCs are modern and optimized and the conventional anti-GC arguments/assumptions ("long pause times") don't always apply.


Not to detract from your general point, but I believe that this specific situation was addressed in Go 1.18's GC pacer rework: https://github.com/golang/proposal/blob/master/design/44167-....


GC is bad if the problem domain you are working on requires you to think about the behaviour of the GC all the time. GC behaviour can be subtle, may change from version to version and some behaviours may not even be clearly documented. If one has to think about it all the time, it is likely better just to use a tool where memory management is more explicit. However, I think for many examples of "systems software" (Kubernetes for example), GC is not an issue at all but for others it is an illogical first choice (though it often can be made to work).


Even in performance-critical software, you're not thinking about GC "all the time" but only in certain hot paths. Also, value types and some allocation semantics (which Go technically lacks, but the stack analyzer is intuitive and easily profiled so it has the same effect as semantics) make the cognitive GC burden much lower.
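
On "easily profiled": the compiler will tell you exactly which values escape, which is about as close to allocation semantics as you can get without them. A small illustration (the names are made up):

    package main

    type big struct{ buf [4096]byte }

    // Returned by value: stays on the caller's stack.
    func stays() big { return big{} }

    // The pointer outlives the call, so escape analysis moves it to the heap.
    func escapes() *big { return &big{} }

    func main() {
        _ = stays()
        _ = escapes()
    }

Building with go build -gcflags=-m prints every escape decision, so you can verify a hot path allocates nothing rather than guessing.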


I've said this before, and I'll say it again. People lump Go in with C/C++/Rust because they all (can) produce static binaries. I don't need to install Go's runtime like I install Java/NodeJS/Python runtimes. Honestly, I think it speaks so much to Go's accomplishments that it performs so well people intuitively categorize it with the systems languages rather than other managed languages.


Managed languages like Oberon, D, Modula-3, System C#…


Large programs require memory management.

Are you writing an application where Go's garbage collector will perform poorly relative to rolling your own memory management?

Maybe: those applications exist. But maybe not, and it shouldn't be presumed.

I'm more open to the argument from definition, which might be what you mean by 'inherently': there isn't an RFC we can point to for interpreting what a systems language is, and it could be useful to have a consensus that manual memory management is a necessary property of anything we call a systems language.

No such consensus exists, and arguing that Go is a poor choice for $thing isn't a great way to establish whether it is or is not a systems language.

Go users certainly seem to think so, and it's not a molehill I wish to die upon.


this has been argued ad nauseam a decade ago and it boils down to your definition of 'systems'. at google scale, a system is a mesh of networked programs, not a kernel or low-level bit-banging tool.


By that definition, Java is a systems language as well.

I think Go makes a better trade-off than Java, but I struggle to come up with decent examples of projects one could write in Go and not in Java. Most of the “systems” problems that Java is unsuitable for, also apply to Go.


Go compiles to static binaries. Java needs the JVM. That already is a HUGE difference in "picking the right tool".

Also, the JVM is way more heavy and resource intensive than a typical Go program. Go is great for cli tools, servers, and the usual "microservices" stuff, whatever it means to you.


Java can be AOT-compiled just like Go; the only difference is that until recently it wasn't a free-beer option to do so.


>until recently

Actually, for a long time (almost 20 years, I think), the gcc subsystem gcj let you do AOT compilation of Java to ELF binaries. [1] I think they had to be dynamically linked, but only to a few shared objects (unless you pulled in a lot via native dependencies, but that's kind of "on you").

I don't recall any restrictions on how to use generated code, anymore than gcc-generated object code. So, I don't think the FSF/copyleft on the compiler itself nullifies a free beer classification. :-) gcj may not have done a great job of tracking Java language changes. So, there might be a "doesn't count as 'real Java'" semantic issue.

[1] https://manpages.ubuntu.com/manpages/trusty/man1/gcj.1.html


The interesting thing I always heard in the Java world is that AOT was actually a drawback, as the virtual machine allowed for just-in-time optimizations according to hotspots. If I remember correctly, the word itself was even used as a trademark (Sun's HotSpot JVM).

I was always a bit skeptical, but given I was never much into Java I just assumed my skepticism was out of ignorance. Now, with what I know about profile-guided compilation, I can see it happening: a JIT language should have a performance advantage, especially if the optimal code paths change dynamically according to workload. Not even profile-guided compilation can easily handle that, unless I am ignorant of more than I thought.


I've heard the exact opposite. The supposed performance benefits of JIT compared to AOT (profile-guided optimization, run-time uarch-targeted optimization) never really materialized. There's been a lot of research since the late '90s into program transformation and it turned out that actually the most effective optimizations are architecture-independent and too expensive to be performed over and over again at startup or when the program is running. At the same time, deciding when it's worthwhile to reoptimize based on new profiling data turned out to be a much more difficult problem than expected.

So the end result is that while both AOT (GCC, LLVM) and JIT (JVM, CLR) toolchains have been making gradual progress in performance, the JIT toolchains never caught up with the AOT ones as was expected in the '90s.


Good luck with inlining and devirtualization across DLLs with AOT.

JIT caches with PGO get most of AOT benefits, that is why after the short stint with AOT on Android, Google decided to invest in JIT caches instead.

The best toolchains can do both, so it is never a matter of either AOT or JIT.

GCC and clang aren't investing in JIT features just for fun.


What's with the snappy tone?

>Good luck with inlining and devirtualization across DLLs with AOT.

An AOT compiler/linker is unable to inline calls across DLL boundaries because DLLs present a black-box interface. A JIT compiler would run into the exact same problem when presented with a DLL whose interface it doesn't understand or is incompatible with. If you really want a call inlined the solution is to link the caller and function statically (whether the native code generation happens at compile- or run-time), not to depend on unknown capabilities of the run-time.

>The best toolchains can do both, so it is never a matter of either AOT or JIT.

You're refuting a false dichotomy no one raised.


I am sarcastic by nature, that is why.

JIT compilers for Java and .NET do inline across shared libraries and devirtualization, including native calls.

When people discuss A vs B there is always a dichotomy.


Theory and practice can diverge and it's easy to over-conclude based on either with such complex systems. For example, I have seen gcc PGO make the very same training case used to measure the profile run more slowly. One might think that impossible naively, but maybe it sounds more plausible if I put it differently - "steering the many code generation heuristics with the profile failed in practice in that case". As with almost everything in computer systems, "it all depends...."


Sun was ideologically against AOT, whereas all the other commercial vendors always had some form of either AOT or JIT caches.

In fact, the JIT caches in OpenJDK come from Oracle/BEA's JRockit, while IBM's OpenJ9 AOT compilation is from the WebSphere Real Time JVM implementation.


I seldom mention it, because gcj was abandoned in 2009 when most contributors moved into the newly released OpenJDK, eventually removed from GCC tree, and it never was as foolproof as the commercial versions.


The problem is that a lot of software in the Java world explicitly relies on it not being compiled AOT. Any standard tomcat or jboss/wildfly application for example.


Go is strictly less useful than Java because it has strictly less power. This is true for general-purpose programming (though somewhat remedied by the introduction of generics), and it's doubly true for "systems" applications:

No access to raw threads. No ability to allocate/utilize off-heap memory (without CGo and nonsense, at least). Low throughput compared to Java's JIT (unsuitable for CPU-intensive tasks).

The only thing I can think of in its favor is lower memory usage by default, but this is mostly just a JVM misconception; you can totally tune it for low memory usage (in constrained environments) or high memory efficiency - especially if using off-heap structures.

On a stdlib level Java mostly wins but Go has some highlights, it has an absolutely rock solid and well built HTTP and TLS/X.509/ASN1 stack for instance, also more batteries vs Java.

Overall I think if the requirement is "goes fast" I will always choose Java.

I may pick Go if the brief calls for something like a lightweight network proxy that should be I/O bound rather than CPU bound and everything I need is in stdlib and I don't need any fancy collections etc.


I think you're mistaken on nearly every count. :)

First of all, Go and Java exist at roughly the same performance tier. It will be less work to make Java beat Go for some applications and vice versa for other applications. Moreover, typical Go programs use quite a lot less memory than typical Java programs (i.e., there's more than one kind of performance).

Secondly, Go can make syscalls directly, so it absolutely can use raw threads and off-heap memory. These are virtually never useful for the "systems" domain (as defined above).

Thirdly, I think Go's stdlib is better if only because it isn't riddled with inheritance. It also has a standard testing library that works with the standard tooling.

Lastly, I think you're ignoring other pertinent factors like maintainability (does a new dev have to learn a new framework, style, conventions, etc to start contributing?), learning curve (how long does it take to onboard someone who is unfamiliar with the language? Are there plugins for their text editor or are they going to have to learn an IDE?), tooling (do you need a DSL just to define the dependencies? do you need a DSL just to spit out a static binary? do you need a CI pipeline to publish source code or documentation packages?), runtime (do you need a GC tuning wizard to calibrate your runtime? does it "just work" in all environments?), etc.


I disagree.

Go is definitely not as fast as Java for throughput. It's gotten pretty good for latency sensitive workloads but it's simply left in the dust for straight throughput, especially if you are hammering the GC.

Sure it can make syscalls directly but if you are going to talk about a maintainability nightmare I can't think of anything worse than trying to manipulate threads directly in Go. I had to do this in a previous Go project where thread pinning was important and even that sucked.

That is just taste. Objectively collections and many other aspects of the Java stdlib completely destroy Go, I pointed out the good bits already.

Again, taste. Java has a slightly steeper and longer learning curve but that is a cost you pay once and is amortized over all the code that engineer will contribute over their tenure.

Using an IDE (especially if everyone is using the same one) is actually a productivity improvement, not an impairment but again - taste. Some people just don't like IDEs or don't like that you need to use a specific one to get the most out of a specific tech stack.

Build systems in Java by and large fall into only 3 camps, Maven, Gradle and a very small (but loud/dedicated) Bazel camp. Contrast that to Go which is almost always a huge pile of horrible Makefiles, CMake, Bazel or some other crazy homebrewed bash build system.

You don't escape CI because you used Go; if you think you did, then you are probably doing Go wrong.

Java runtime trades simplicity for ability to be tuned, again taste. I personally prefer it.

So no, I don't think I am mistaken. I think you just prefer Go over Java for subjective reasons. Which is completely OK but doesn't invalidate anything I said.


> Build systems in Java by and large fall into only 3 camps, Maven, Gradle and a very small (but loud/dedicated) Bazel camp. Contrast that to Go which is almost always a huge pile of horrible Makefiles, CMake, Bazel or some other crazy homebrewed bash build system.

Well, Go does not need a 400+ page book to be understood, the way Maven does.


It needs one to understand Go modules and how they change across language versions.


Go modules are easy to use, and compiling is even simpler: go build .


If you say so,

"HERO: On the Chaos When PATH Meets Modules"

https://cs.nju.edu.cn/changxu/1_publications/21/ICSE21_02.pd...


I've been writing Go since it basically came out. Every single one of your comments is anecdotal, so here we go: I've never, ever, encountered an issue compiling/building Go code using modules. You also said in your first argument that you get no access to raw threads, then a comment later said you do. It seems like you're just advocating for people to use Java because it's what you're comfortable with.


I am comfortable with plenty of stuff, which makes me not the target audience for Go.

"The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt."

The threads comment wasn't from me.


Pretty sure it was a different commenter who claimed no access to raw threads, but yes, the parent user can be a bit abrasive on the subject of programming languages.


I think your use of “subjective” is avoiding discussing things that are harder to prove but matter a great deal.


> Go is definitely not as fast as Java for throughput. It's gotten pretty good for latency sensitive workloads but it's simply left in the dust for straight throughput, especially if you are hammering the GC.

Java is better for GC throughput, but your claim was about compute throughput in general. Moreover, Go doesn't lean nearly as hard on GC as Java does in the first place (idiomatic value types, less boxing, etc), so GC throughput doesn't imply overall throughput.

> Sure it can make syscalls directly but if you are going to talk about a maintainability nightmare I can't think of anything worse than trying to manipulate threads directly in Go. I had to do this in a previous Go project where thread pinning was important and even that sucked.

Thread pinning is a very rare requirement, typically you only need it when you're calling some poorly-written C library. If this is your requirement, then Go's solution will be less maintainable, but for everyone else the absence of the foot-gun is the more maintainable solution (i.e., as opposed to an ecosystem of intermingled OS threads and goroutines).

> That is just taste. Objectively collections and many other aspects of the Java stdlib completely destroy Go, I pointed out the good bits already.

Agreed that it's taste. Agreed that Java has more collections than Go, but I think it's a good thing that Go pushes people toward slices and hashmaps because those are the right tool for the job 90% of the time. I think there's some broader point here about how Java doesn't do a good job of encouraging people away from misfeatures (e.g., inheritance, raw threads, off-heap memory, etc).

> Again, taste. Java has a slightly steeper and longer learning curve but that is a cost you pay once and is amortized over all the code that engineer will contribute over their tenure.

Java has a significantly steeper/longer curve--it's not only the language that you must learn, but also the stdlib, runtime, tools, etc and these are typically considerably more complicated than Go. Moreover, it's a cost an engineer pays once, but it's a cost an organization pays over and over (either because they have to train people in Java or narrow their hiring pool).

> Build systems in Java by and large fall into only 3 camps, Maven, Gradle and a very small (but loud/dedicated) Bazel camp. Contrast that to Go which is almost always a huge pile of horrible Makefiles, CMake, Bazel or some other crazy homebrewed bash build system.

Go has one build system, `go build`. Some people will wrap those in Makefiles (typically very lightweight makefiles e.g., they just call `go build` with a few flags). A minuscule number of projects use Bazel--for all intents and purposes, Bazel is not part of the Go ecosystem. I haven't seen any "crazy homebrewed bash build system" either, I suspect this falls into the "for all intents and purposes not part of the Go ecosystem" category as well. I've been writing Go regularly since 2012.

> You don't escape CI because you used Go, if you think you did then you are probably doing Go wrong.

I claimed the CI burden is lighter for Go than Java, not that it goes away entirely.

> Java runtime trades simplicity for ability to be tuned, again taste. I personally prefer it.

I think it's difficult to accurately quantify, but I don't think it's a matter of taste. Specifically, I would wager that Go's defaults + knobs are less work than Java for something like 99% of applications.

> So no, I don't think I am mistaken. I think you just prefer Go over Java for subjective reasons. Which is completely OK but doesn't invalidate anything I said.

I agree that some questions are subjective, but I think on many objective questions you are mistaken (e.g., performance, build tool ecosystem, etc).


> On a stdlib level Java mostly wins

This isn't even true compared to other comparable platforms like .NET, let alone Go which has hands down the most useful and well constructed standard library in existence (yes, even better than Python).


Yeah I don't buy that.

Especially not when things like this exist: https://pkg.go.dev/container

And things like this don't: https://docs.oracle.com/en/java/javase/17/docs/api/java.base...

As I mentioned Go does have a great HTTP and TLS stack but that doesn't do enough to put it on the same level.


Go didn't have any container types in the standard library because it didn't have generics, and slices plus hash maps solve 90% of what you need. I think an avalanche of container libraries is coming to Go. I already wrote my own deque and have seen a few others; a sketch of how little code that takes is below.
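
For the curious, a generic deque in Go 1.18+ really is only a screenful. A minimal, unoptimized sketch (a serious version would use a ring buffer instead of a bare slice):

    // Deque is a minimal double-ended queue. PushFront/PopFront are O(n)
    // here; a ring buffer would make all four operations amortized O(1).
    type Deque[T any] struct{ items []T }

    func (d *Deque[T]) PushBack(v T)  { d.items = append(d.items, v) }
    func (d *Deque[T]) PushFront(v T) { d.items = append([]T{v}, d.items...) }

    func (d *Deque[T]) PopBack() (T, bool) {
        var zero T
        if len(d.items) == 0 {
            return zero, false
        }
        v := d.items[len(d.items)-1]
        d.items = d.items[:len(d.items)-1]
        return v, true
    }

    func (d *Deque[T]) PopFront() (T, bool) {
        var zero T
        if len(d.items) == 0 {
            return zero, false
        }
        v := d.items[0]
        d.items = d.items[1:]
        return v, true
    }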


This is an argument from edge case capabilities that completely ignores maintenance costs + development time. Seems very naive to me.


Totally agree. If the argument is strictly more power is always better, then C++ would always win. Why doesn't it? Exactly what you reference, dev time and maintenance.

Go was designed for simplicity. Of course it's not the fastest or most feature-rich. Its strong suit is that I can pop open any Go codebase and understand what's going on fairly quickly. I went from not knowing any Go to working with it effectively in a large codebase in a couple weeks. Not the case with Java. Not the case with most languages.


That wasn't the argument though, you are attacking a strawman. The argument was much more nuanced if you bothered to read it.

Essentially it boils down to this. If I am writing -systems- software and I'm going to choose between Go or Java then the list of things I pointed out are the main differentiating features along with raw throughput which matters for things like databases which need to be able to do very fast index/bitmap/etc operations.

Go is great for being simple and easy to get going. However that is completely worthless in systems software that requires years of background knowledge to meaningfully contribute to. The startup cost of learning a new codebase (or entirely new programming language) pales in comparison to the requisite background knowledge.


> Go is strictly less useful than Java because it has strictly less power.

Literally sentence one, so calling my argument straw-man is dishonest.

> Essentially it boils down to this. If I am writing -systems- software and I'm going to choose between Go or Java then the list of things I pointed out are the main differentiating features along with raw throughput which matters for things like databases which need to be able to do very fast index/bitmap/etc operations.

All true. In my experience though, the long tail of maintenance and bug fixes tend to result in decreasing performance over time, as well as a slowing of new feature support.

All of that being said, these are all fairly pointless metrics when we can just look at the DBs being adopted and why people are adopting them. Plenty of projects use Go because of Go's strengths, so saying "that is completely worthless in systems software" is verifiably false. It's not worthless in any software, worth less maybe, but not worthless.


It's not. If you are building a database or other "systems" software these are very relevant capabilities.

Also development time of Java may be slightly longer in the early stages but I generally find refactoring of Java projects and shuffling of timelines etc is a ton easier than Go. So I think Java wins out over a longer period of time even if it starts off a bit slower.

It's far from naive. I have written a shitton of Go code (also a shitton of Java if that wasn't already apparent).


You may not personally be naive, but I was talking about your analysis, not you.

>Also development time of Java may be slightly longer in the early stages but I generally find refactoring of Java projects and shuffling of timelines etc is a ton easier than Go. So I think Java wins out over a longer period of time even if it starts off a bit slower.

I think this topic is far too large to be answered in this brief sentence. I also think it deserves a higher allocation of your words than what you spared for Java's capabilities :)

But yes, I see now that you are interested purely in performance in your argument and definition of systems software, in which case what you're saying may be true.


Go has user-defined value types, which Java does not yet. It makes a huge difference in memory density for typical data structures. This makes Go more suitable for low-overhead web services and CLI tools running in a few MBs, where Java needs at least a few hundred MBs.
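
A concrete way to see the density difference (the numbers assume a typical 64-bit platform; unsafe.Sizeof will confirm):

    package main

    import (
        "fmt"
        "unsafe"
    )

    type point struct{ x, y float64 }

    func main() {
        dense := make([]point, 1000)   // one contiguous allocation, 16 bytes per element
        sparse := make([]*point, 1000) // 8 bytes per slot; each element, once filled,
                                       // is a separate heap object the GC must trace
        fmt.Println(unsafe.Sizeof(dense[0]), unsafe.Sizeof(sparse[0])) // 16 8
    }

Until something like Project Valhalla ships, an array of small Java objects necessarily has the second layout.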


> Go has user defined value types which Java does not yet.

C# has this. A lot of people overlook C# in this area, probably because until recently, it was not cross-platform.


The games world has understood C# to be a reasonably capable language for a while. But that world often feels disjoint with the enterprise software world where Go has thrived, eating into Java's niche.


In my enterprise world, it is all about .NET and Java.

Unless one is doing DevOps-related tasks involving some form of Docker or Kubernetes, it isn't even something people bother with.

And even then, most automation tasks are written in a mix of Python and PowerShell code.


> The games world has understood C# to be a reasonably capable language for a while.

Only in Unity; the other major engine in the room still uses C++ for performance reasons.


> By that definition, Java is a systems language as well.

I would fully agree that Java is a systems language.

However, the definition of "systems language" has been a contentious issue for a very long time, and that debate seems unlikely to be resolved in this thread. I don't think the term itself is very useful, so it's probably better if everyone focuses on discussing things that actually matter to the applications they're trying to develop instead of arguing about nebulous classifications.


People do huge amounts of systems programming in Java, including in systems that are incredibly performance-sensitive.


I don't think it's useful to frame "fitness for a given domain" as a binary, but yes, Java is often used successfully for this domain (although personally I think Go is an even better fit for a variety of reasons).


> mandatory GC managed memory

It can be a right old bugger - I've been tweaking gron's memory usage down as a side project (e.g. 1M lines of my sample file into original gron uses 6G maxrss/33G peak, new tweaked uses 83M maxrss/80M peak) and there's a couple of pathological cases where the code seems to spend more time GCing than parsing, even with `runtime.GC()` forced every N lines. In C, I'd know where my memory was and what it was doing but even with things like pprof, I'm mostly in the dark with Go.
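
(For anyone poking at the same kind of problem: the coarse numbers are at least cheap to log around a parse phase. A minimal sketch, with made-up labels:)

    package main

    import (
        "fmt"
        "runtime"
    )

    // logGC prints the coarse GC counters; bracketing a parsing phase with
    // it shows how much heap grew, how many cycles ran, and what fraction
    // of CPU time the collector has eaten since startup.
    func logGC(label string) {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        fmt.Printf("%s: heap=%dMB cycles=%d gc-cpu=%.1f%%\n",
            label, m.HeapAlloc>>20, m.NumGC, m.GCCPUFraction*100)
    }

    func main() {
        logGC("start")
        // ... parse N lines ...
        logGC("after parse")
    }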


> I'd argue that golang is inherently not a systems language

First you'd have to establish what "systems" means. That, you'll find, is all over the place. Some people see systems as low-level components like the kernel, others the userland that allows the user to operate the computer (the set of Unix utilities, for example); you're suggesting databases and things like that.

The middle one, the small command line utilities that allow you to perform focused functions, is a reasonably decent fit for Go. This is one of the places it has really found a niche.

What's certain is that the Go team comes from a very different world to a lot of us. The definitions they use, across the board, are not congruent with what you'll often find elsewhere. Systems is one example that has drawn attention, but it doesn't end there. For example, what Go calls casting is the opposite of what some other languages call casting.


What is it that Go supposedly calls casting? The term (or its variations) does not show up in the language specification.

People sometimes use it for type conversions but that's in line with usage elsewhere, no?


Are you talking about type assertions? It's right there in the Go Tour. But that's a different thing than a type conversion.


I agree. golang is not really a systems programming language. It's more like Java, a language for applications.

It does have one niche: it includes most if not everything you need to run a network-based service (or microservice), e.g. HTTP, even HTTPS and DNS are baked in. You no longer need to install OpenSSL on Windows, for example; in golang one binary will include all of those (with CGO disabled, too).

I do system programming in c and c++, maybe rust later when I have time to grasp that, there is no way for using Go there.

For network-related applications, Go thus far is my favorite, nothing beats it: one binary has it all, and it can't be easier to upgrade in the field too.
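
To illustrate the batteries-included point, here is a complete HTTPS server using nothing but the standard library (the cert paths are placeholders):

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "hello from the stdlib")
        })
        // TLS and the X.509 handling all come from the standard library;
        // no OpenSSL or other system dependency is involved.
        log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
    }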


i agree with you that garbage collected languages are bad for systems programming but it's not because garbage collection is inherently bad, it's because gc doesn't handle freeing resources other than memory. for better or worse i've spent most of my professional career writing databases in java and i will not start a database product in java or any other garbage collected language again. getting the error handling and resource cleanup right is way harder in java or go than c++ or rust because raii is the only sane way to do it.


Rust itself will most likely get some form of support for local, "pluggable" garbage collection in the near future. It's needed for managing general graphs with possible cycles, which might come up even in "systems programming" scenarios where performance is a focus - especially as the need for auto-managing large, complex systems increases.


`Rc` and `Arc` have weak references which have worked just fine for me to break cycles in a graph. Not saying my use case is the most complex, but I haven't noticed this as a problem yet. YMMV


If your data is "mostly a tree but has the occasional back reference I can easily identify as a back reference", that works great and I've used it in other languages.

But in the limit, as your data becomes maximally "graphy" and there isn't much you can do about it, this ceases to be a viable option. You have to be able to carve a "canonical tree" out of your graph for this to work and that's not always possible. (Practically so, I mean. Mathematically you can always define a tree, by simple fiat if nothing else, but that doesn't help if the only available definitions are not themselves practical.)


Fair point - so far I've been lucky enough to avoid such use cases


But the whole point of weak references is that they don't get shared ownership (i.e. extend the lifetime) of their referenced values. That's not doing GC. They're fine when there's always some other (strong) reference that will ensure whatever lifetime is needed, but that's not the general case that GC is intended for.


Weak pointers don't create a strong reference, correct, and that is exactly what is needed to break a cycle. Since it is a cycle, there is some other "owning" strong reference out there. Every use case I've seen generally has an obvious strong and weak reference (usually parent vs child). I'm sure there are trickier corner cases, but that is the typical case IMO.

For everything else, Rust has no need for a tracing GC, as it has "compile time" GC via static lifetime analysis, which is much better IMO and often avoids heap allocation altogether.


I'd say Golang is a systems language if you consider the system to be interconnected services, sort of like a C for the HTTP and Kubernetes age. It has better performance than typical scripting languages, but it's probably not meant to write truly low-level code. I'd argue that GC doesn't matter on a typical networked service scale.


I've used a couple of DBs written in Go before and they were great. Loved using InfluxDB it was robust and performant.


Ultimately, Vitess is good so any argument that concludes that it must be bad is not using evidence.


I would say Go is a systems programming language. A systems programming language is for creating services used by actual end user applications. That is pretty much what Go is being used for. Who is writing editors or drawing applications in Go? Nobody.

Go does contain many of the things of interest to systems programmers such as pointers and the ability to specify memory layout of data structures. You can make your own secondary allocators. In short it gives you far more fine grained control over how memory is used than something like Java or Python.

https://erik-engheim.medium.com/is-go-a-systems-programming-...
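
On specifying memory layout: field order in a Go struct is the in-memory layout, padding included, and it's directly inspectable. A small sketch (sizes assume a 64-bit platform):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // Same fields, different order: padding makes one 50% bigger.
    type padded struct {
        a bool  // 1 byte + 7 bytes padding
        b int64 // 8 bytes
        c bool  // 1 byte + 7 bytes padding
    }

    type packed struct {
        b int64 // 8 bytes
        a bool  // 1 byte
        c bool  // 1 byte + 6 bytes padding
    }

    func main() {
        fmt.Println(unsafe.Sizeof(padded{}), unsafe.Sizeof(packed{})) // 24 16
    }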


I'm not saying this is wrong, because I think what "systems programming" is subjective.

But wouldn't this classify Java as a systems language? Java is used to build DBs and I believe AWS's infrastructure is mostly java. Plus, Java definitely has pointers.


Kubernetes was originally written in Java; it was ported to Go due to some new team members being Go advocates.


Let’s start at the beginning. What is a «systems language» and for what is it typically used?


Colloquial language is the worst. When I hear people say systems language I imagine that they dropped the "operating" before "systems language", meaning a language that can be used to build an operating system with commonly expected performance characteristics and low resource intensity.


A systems language is a language used (typically) to write systems :P

Jokes aside, this is kind of a fundamental problem with the term, and many terms around classifying programs. Also worth noting - "program" is a term that is a lot looser than people who typically live above the kernel tend to think.


I would argue that the area the term might apply to has widened over the years, so there is more room for languages that don't do things that require more explicit management of resources like memory.

Also, given the number of times per week that my CPU fan spins up because of some daemon that misbehaves on macOS, I would also argue that whatever makes system programmers produce higher quality code would be better than having people who struggle with C/C++/Objective-C write daemons that blow up all the time.

After ~5 years of Go, I think it is an important systems language simply because it is easier to write reasonably performant software in with a lower chance of silly mistakes that makes it blow up.

After ~35+ years of programming I also think that the notion of "fast" when used about languages is a bit silly because that isn't where most performance actually comes from. Not in actual real life. Most performance tends to come from picking the right idea and then implementing it efficiently and without too many silly mistakes. It doesn't help you if you use a language like C or C++ when you end up doing more work due to poor algorithm or design choices, or you fail to exploit the parallelism your computer is capable of. And you will make poorer choices if much of your cognitive capacity is tied up in just keeping the car on the road.

As we speak I'm working on a system where we just moved a bunch of higher order functionality out of firmware and into a server just because producing an acceptable quality implementation of that functionality in the firmware takes about 4-5 times as long. I am pretty sure a high quality implementation just wasn't going to happen regardless of how much time you sunk into it. That's the kind of real-world problems where better choices have actual impact.

We kind of have to relate to reality. The reality in which my mac keeps having daemons blow up because whoever wrote it didn't manage to do a good job. It doesn't help that C/C++/Objective-C is a systems language when people write software that doesn't work in it. And that happens. With annoying regularity.

Sure an expert C/C++ programmer can probably do better than an average or even reasonably good Go programmer. But I've been at this for 35+ years and in my experience, the C/C++ programmers that can consistently write correct, understandable and performant software are much rarer than people tend to believe. It doesn't help to have more performance potential when most of that potential is outside the reach of most practitioners.

People tell themselves all manner of stories about which is better, A or B, but the only useful metric is what quality software comes out the other end of the pipeline. It isn't always what you wish it to be.

(I have hopes that Rust might replace C/C++ on embedded platforms. But for a lot of systems work, Go is just fine by me. Whatever enables people to write software that works better)


Something that is used to write an OS, low level programming, low memory footprint, high performance/optimized.


Please accept that that's your opinion. https://en.wikipedia.org/wiki/System_software


When you say OS, do you mean kernel or kernel and the daemons?


Both.

Also everything that has low memory, like embedded systems.

Anything that is real time cannot have a GC, because GCs are unpredictable. So everything in aircraft, in space, anything with time-sensitive machinery...


Anything that can't have a GC probably can't use a regular memory allocator either (since they aren't necessarily easy to predict). So you would be left with static allocation and stack variables.



> with its mandatory GC managed memory

is that factual, in the general case?

it seems there exists a category of Go programs for which escape analysis entirely obviates heap allocations, in which case if there is any garbage collection it originates in the statically linked runtime.


Honest question, what is golang a good choice for? It seems to inhabit the nether realm between high level productivity and low level performance, not being good at either.


> It seems to inhabit the nether realm between high level productivity

This is where you're wrong.

What would you consider "high level productivity" then? Java? Ruby? Ruby on Rails?


Yes, among others. Not golang though. I've used it and it feels lower level than those and less productive


We have been using golang for a long time without generics and their overhead. Why is it that, now they're added, we stop caring about performance? Is this the same HN that complains about memory bloat in every JavaScript thread?


Hold the phone. Where did the leap from “golang is not a systems language” to “poor choice for anything performance or memory sensitive” come from?

That is a huge leap you are making there that I don’t think is exactly justified.


> performance or memory sensitive

Depends on what you're measuring as performance.

Server request throughput? Being middleware API server to relay data between a frontend and a backend?


Typical gatekeeping. I like Go, because it lets me get stuff done. You could say the same about JavaScript, but I think Go is better because of the type system. C, C++ and Rust are faster in many cases, but man are they awful to work with.

C and C++ don't really have package management to speak of, it's basically "figure it out yourself". I tried Rust a couple of times, but the Result/Option paradigm basically forces you into this deeply nested code style that I hate.


Nobody's saying you can't use Go or must use C/C++/Rust. If Go works for you, that's great.

The issue is about positioning of Go as a language. It's confusing due to being (formerly) marketed as a "systems programming language" that is typically a domain of C/C++/Rust, but technically Go fits closer to capabilities of Java or TypeScript.


Is writing a compiler, linker, kernel emulation layer, TCP/IP stack or a GPU debugger, systems programming?


If you dig deep enough into the Turing Tarpit, you can write these in JavaScript too (in fact, some of these have already been written).

It's merely a disagreement about what the label "systems programming" should mean, rather than about capabilities and performance of the Go language. Objectively and undeniably Go relies on garbage collection, and C/C++/Rust don't. Google's implementation of Go prefers fast compilation speed over maximizing run-time efficiency, and C/C++/Rust implementations aim for zero-(runtime)cost abstractions.

I hope it's clear there's no argument about the substantial technical differences between these languages. They have clearly different trade-offs, regardless of what name you think is appropriate to use for that difference.



I'm saying forks aren't for eating soup, and you're showing me soups eaten with a fork.

In case of languages' general capabilities/applicability, I don't think singular counterexamples are sufficient, because they prove "can", rather than "should". Turing Tarpit means you can push languages way beyond their appropriate use-cases.

There are also weird cases like Java smart cards and LISP machines, but I don't think it's useful to use these exceptions to classify Java and LISP as "assembly" or "machine" languages. Go does very well in the network services niche, but you can also write such network services in Bash (determined-enough people have written Bash HTTP servers). You can write small shell utilities in Go too. Does this mean Go and Bash are the same type of language?

The lines are blurry, but you have to draw a line somewhere, otherwise every language is a shell scripting systems programming assembly machine language.


I've found Go to be much simpler than Rust, especially syntax-wise. However, in Rust you can use the ? operator, which propagates errors. In Go you have to check err != nil.


> Rust you can use the ? operator

That doesn't work with all types:

https://stackoverflow.com/a/65085003


It works for any type that implements the `std::error::Error` trait, which is something you can easily implement for your own types. If you want your errors to be integers for some reason, you can wrap that type in a "newtype" wrapper and implement `Error` for that.

The Stack Overflow answer you linked seems to be claiming that it's simply easier to return strings, but I wouldn't say this is a restriction imposed by the language.


> easily implement for your own types

have you ever actually done that? I have, it's not easy. Please don't try to hand-wave away the negatives of the Rust type system.


> have you ever actually done that? I have, its not easy.

Yes. I do it frequently. "#[derive(Error, Debug)]": https://github.com/dtolnay/thiserror#example

Much easier than implementing the error interface in go.

Rust is powerful enough to allow macros to remove annoying boiler-plate, and so most people using rust will grab one of the error-handling crates that are de-facto standard and remove the minor pain you're talking about.

In go, it's not really possible to do this because the language doesn't provide such macros (i.e. the old third-party github.com/pkg/errors wanted you to implement 'Cause', but couldn't provide sugar like 'thiserror' does for it, because go is simply less powerful).

I've found implementing errors in go to be much more error-prone and painful than in rust, and that's not to mention every function returning untyped errors, meaning I have no clue what callers should check for and handle new errors I add.


> Much easier than implementing the error interface in go.

is this a joke? You have to import a third-party package just to implement an error interface? Here is a Go example, no imports:

    type errorString string

    func (e errorString) Error() string {
       return string(e)
    }


It was not a joke.

Let's look at a common example: you want to return two different types of errors and have the caller distinguish between them. Let me show it to you in rust and go.

Rust:

    #[derive(Error, Debug)]
    pub enum MyErrors {
       #[error("NotFound: {0}")]
       NotFound(String),
       #[error("Internal error")]
       Internal(#[source] anyhow::Error),
    }
The equivalent Go would be something like:

    type NotFoundErr struct {
        msg string
    }

    func (err NotFoundErr) Error() string {
        return "NotFound: " + err.msg
    }

    func (err NotFoundErr) Is(target error) bool {
        if target == nil {
            return false
        }
        // All NotFoundErrs are considered the same, regardless of msg
        _, ok := target.(NotFoundErr)
        return ok
    }

    type InternalErr struct {
        wrapped error
    }
    
    func (err InternalErr) Error() string {
        return fmt.Sprintf("Internal error: %s", err.wrapped)
    }

    func (err InternalErr) Unwrap() error {
        return err.wrapped
    }


I don't think you realize how ridiculous this comment is. You're comparing 10 lines of Go with 200 of Rust:

https://github.com/dtolnay/thiserror/blob/master/src/lib.rs


What's the difference between importing some hundreds of lines from thiserror in rust vs importing the "errors" package in go?

If the difference is just "It's okay to use stdlib code, but not dependencies", then go's a great language by that metric.

I don't think that's what matters at all. What matters is the abstraction that the programmer deals with.

In go, the abstraction I deal with, as a programmer, is what I showed above.

In rust, the abstraction I deal with is also what I showed above. One of those things is simpler.

Further, the abstraction in go is leakier. My function returns a 'error' interface even if it can only be those two concrete types, so the caller of my function has to step into my function, reason through it to figure out all the error variants, and then check them with 'errors.Is' or such, and changes to what variants are returned aren't statically checked.

In rust, I can look at the type-signature, write a 'match' statement, and never have to manually figure out all the variants returned, since the compiler just knows.

My point here is that what matters is the abstraction that programmers interact with. Third party libraries are bad when they give a leaky abstraction that causes the programmer pain. Standard libraries likewise.

In this case, the abstraction available in rust is cleaner and requires less code for me, the user of the package, so I don't think the lines of code used to build that abstraction matter.

Why do you see this as something that matters?
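
To make the comparison fair, here is roughly what the caller side of the Go version above looks like; callDB is a hypothetical function returning the plain error interface:

    import "errors"

    // callDB is a stand-in for any function returning the untyped `error`.
    func callDB() error { return InternalErr{wrapped: errors.New("boom")} }

    func handle() {
        switch err := callDB(); {
        case err == nil:
            // ok
        case errors.Is(err, NotFoundErr{}):
            // handle the missing row
        case errors.As(err, new(InternalErr)):
            // log/report the internal failure
        default:
            // a variant we didn't know this function could return; nothing
            // forced us to handle it, unlike a Rust match over an enum
        }
    }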


> What's the difference between importing some hundred's of lines from thiserror in rust vs importing the "error" package in go?

Again, you're comparing apples and oranges. It seems you didn't see my previous example, here it is again:

    type errorString string

    func (e errorString) Error() string {
       return string(e)
    }


I addressed that with the start of my comment: "Let's look at a common example: you want to return two different types of errors and have the caller distinguish between them"

Yes, your example implements the error interface, but it's not realistic. There is real go code that does that, but that's the reason I have to do string-matching type error handling in go, and frankly it's an argument against the language that its error handling is such a mess that some people see that as reasonable.

Having code that does

    return errorString("You have to do string matching on me internal error")
is something you can do, sure, but it's not good code. Idiomatically in go, it would also be "errors.New", not this errorString type you made.

In rust, that would also be just as easy since it would be:

    bail!("oh no internal error")
using anyhow, which is the de-facto standard way to do crappy string-style erroring, like the above.

But that's not what I want to talk about since that sort of error handling isn't interesting in either language, and isn't where they differ.


You never gave a Rust example without using the 200-line Rust package.

You didn't do so because implementing an error interface in Rust is a painful and extremely verbose process. It's not in Go, as I demonstrated.

> bail!("oh no internal error")

another Rust package. Are you unable to just write Rust without importing something?


As I said above:

> In this case, the abstraction available in rust is cleaner and requires less code for me, the user of the package, so I don't think the lines of code used to build that abstraction matter.

> Why do you see this as something that matters?

It's true that the abstraction in rust has more code underlying it, but why does that matter?

If you de-facto never have to implement the rust error trait by hand due to excellent library support, then it's a moot point, isn't it?

Anyway, my examples above did implement the error trait, simply by deriving it.

> Are you unable to just write Rust without importing something?

Rust does have less in the stdlib, yes. If you want a json serializer in go, that's the stdlib. In rust, it's the serde package.

Rust intentionally has a smaller standard library. I don't think there's anything wrong with that, and I personally prefer it since it allows for more innovation on things like http / json / error interfaces / etc.

I don't know why you're phrasing it like it's a bad thing.


That's a good outlook. Don't worry about any problem, just import something and it's fixed! The more imports the better! Hopefully we can do away with all code and just import everything.


Well, before Go 1.13 came out, every decent production codebase in Go had to import Dave Cheney's pkg/errors[1] unless you totally gave up the idea of having useful error traces.

Go 1.13 has incorporated most of what this package does into the standard library. It really doesn't matter much whether the code is in a de-facto standard package or folded into the standard library IMHO, but Rust could do the same.

[1]: https://github.com/pkg/errors
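
For context, a minimal sketch of the 1.13-style wrapping that subsumed most pkg/errors usage (the names are made up; note that, unlike pkg/errors' Wrap, %w does not capture stack traces):

    package main

    import (
        "errors"
        "fmt"
    )

    var ErrNotFound = errors.New("not found")

    func load() error {
        // %w (added in Go 1.13) wraps the error while keeping the
        // chain inspectable with errors.Is/As/Unwrap.
        return fmt.Errorf("loading config: %w", ErrNotFound)
    }

    func main() {
        err := load()
        fmt.Println(err)                         // loading config: not found
        fmt.Println(errors.Is(err, ErrNotFound)) // true
    }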


Rust did partially do this; one of the previous generation of packages here, failure, had most of its ideas eventually rolled into std::error::Error, which enabled this next generation of packages (anyhow, thiserror, miette, eyre... the only one I'm not sure about is snafu, which did exist before, but I'm not sure if it got more improvements due to these changes or not).


> Don't worry about any problem, just import something and it's fixed! The more imports the better!

We're talking about error boilerplate here.

Just as I do not write assembly by hand, but instead let the compiler generate that, just as I don't write http clients by hand, but let someone else do that, yes I would also rather not write error boilerplate by hand.

I would love it if I didn't have to write as much code to solve problems, but I recognize that the problems I'm solving are mostly specific enough that they require bespoke code specialized to the problem space.

However, error handling? Yeah, I don't care to hand-write that.

Said less glibly, I will happily import stuff when it makes my life easier, and write it myself when that is better.


Don't you think that's a little disingenuous?

You're absolutely right about implementing the `Error` trait yourself in Rust: it's a pain! And as with many things, the Rust team felt it was better to leave error experimentation to an external library, as the community can iterate more quickly and settle on a favorite. At the moment, that favorite is `thiserror` (for defining errors in a library) and `anyhow` (for working with errors in application code).

That's been a pretty common pattern in Rust - leave some less clear parts of the standard library out for external crates to implement and experiment on. So you'll see almost all big Rust projects use one of those error handling libraries. You _could_ implement it with just the standard library, but you're just giving yourself more work for little to no gain.

Here's an example without using external libraries:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

Not particularly good or bad (and I didn't even replicate all the functionality that `thiserror` gives you).

So yeah, Go and Rust have different philosophies for their standard library, and you can argue about which is better, but I don't think one can be proved objectively better than the other.


The Rust code is doing much more than what the Go code is doing, it's making more incorrect cases impossible. I really don't understand the complaint about imports. Anything that's not built into the standard library is not valid somehow? I'm struggling to imagine what would make someone think this.


You did notice the file you linked to only has 7 non-comment/documentation lines, right?

Now, to be fair, there are 50 or so lines of code in other files, but I would still love to see a useful Go package which is that small.


I do it frequently. It is indeed easy.


Except that automatically printing something when returning it from main doesn't work with all types with the `?` operator. And frankly, 'handling errors by auto-print and exit' is a bit of a code smell anyway; it's not much better than just calling .unwrap() on everything in main.


What, specifically, do you mean when you say Rust is "awful to work with"? With C and C++ I agree, but I've had a drastically better development experience in Rust than Go.


You should probably read the rest of the comment...


Are Result and Option really the only thing? Because nesting scopes based on Err/None is rarely the right choice, just like nesting a scope based on `if err == nil` isn't typically something you want to do in Go, or `if errno == 0` in C.

  - You can panic trivially with `.unwrap()`
  - You can propagate the condition up trivially with `?`
    - `?` doesn't work with all types, but it does work with Option, and it does work with the vast majority of error types - making your custom error work with it is very easy (if you're whipping up an application or prototype and want your error handling very simple, `anyhow::Error` works great here)
  - You can convert None to an Err condition trivially with `.ok_or()?`
  - In cases where it makes sense, you can trivially use a default value with `.unwrap_or_default()`

And all of these require a _lot_ less code than `if err != nil { return nil, err }`.

And all of these allow you to use the Ok/Some value directly in a function call, or in a method chain, while still enabling the compiler to force you to handle the Err/None case

The common theme here being "trivial" :) Result/Option are a big piece of that better developer experience.


However, in typical Rust codebases this systematic unwrapping causes a lot of unhandled panics.

Reading it just feels wrong.


git clone?

I mean, do we need bespoke package management tooling for everything now?

Seems like an outdated systems-admin meme that violates KISS, explodes dependency chains, risks security, etc. IT feels infected by the sunk-cost fallacy.

It's electron state in machines. The less altogether, the better.


> C and C++ dont really have package management to speak of

I hear this complaint often, but I consider it a feature of C. You end up with far fewer third-party dependencies, and the libraries you do end up using have been battle-tested for decades. I much prefer that to having to install hundreds of packages just to check if a number is even, like in JS.


I think you're contradicting yourself. You end up with fewer third party dependencies in C because C developers end up rewriting from scratch what they would otherwise import, and these rewrites have much less battle-testing than popular libraries in other languages. Moreover, they also have more opportunity for failure since C is so much less safe than other languages. Even in those few libraries which have received "decades of battle-testing" we still see critical vulnerabilities emerge. Lastly, you're implying a dichotomy between C and JS in a thread about Go, which doesn't have the same dependency sprawl as JS.


Hmm yes, why stop there? Why have functions? Just reimplement business logic all over your codebase. That way each block of code has everything you need to know. Sure, functions have been adopted by every other language/ecosystem and are universally known to be useful despite a few downsides, but you could say the same about package management, and that hasn't deterred you yet.


You're arguing against a straw man. I could just as easily say "why not make every line of code its own function?"

C has libraries, and my comment made it clear that they are useful.

Argue against my actual point:

By not having a bespoke package manager, and instead relying on the system package manager, you end up with higher-quality dependencies and dramatically less bloat in C than in other language ecosystems. It is all the benefits and none of the drawbacks of npm-like ecosystems.


I don't agree with your assessment that libraries in C are higher quality. Additionally, I have yet to see a system package manager that enables developers to install dependencies at a specific version solely for a project without a lot of headaches. All the venv stuff in Python is necessary because Python dependencies are otherwise installed system-wide. The idea that the C/C++ ecosystem is better off because it doesn't have its own package manager is a bizarre idea at best.


I'm one of the few humans on the planet that installs his dependencies inside the project folder and uses PYTHONPATH.


> You end up with much less third party dependencies,

https://wiki.alopex.li/LetsBeRealAboutDependencies


This just proves my point. A complex, fully featured GUI application written in C needs 122 libs total.

In Rust, just installing SQLx for sqlite bindings requires 75 crates. And sqlite is one very small part of what rviz does. An rviz equivalent written in rust would require an order of magnitude more dependencies.


Ah, that is why all major OSes end up having some form of POSIX support to keep those C applications going.


You need to have some OS API; what is wrong with POSIX?

And what does POSIX have to do with package management?


POSIX is UNIX rebranded as a C runtime for OSes that aren't UNIX.


I don't even know how to parse that. POSIX is a standard which was based on the Unix OSes of the time. POSIX is an API, not a runtime. Only Unix systems today implement POSIX (macOS, Linux).


Yes, it is a standard, just like the ISO C one, and traditionally they go together, just as you never use IP on its own.

I suggest you have a look at RTOSes, mainframes, BeOS, and even NT's history regarding POSIX support.


My big personal nit is poor async support; e.g. async disk IO is recent in Linux, and AFAIK all the Unices implement POSIX aio as a threadpool anyway. Not being able to wait on "either this mutex/semaphore has been signaled, or this IO operation has completed" is also occasionally very annoying...


Sure, so can Modula-2 or Ada. The point is, there are degrees of separation in our biases, especially when it comes to C-sugared languages.


Next to Python it is a systems language by comparison, and that is what matters ultimately.


I agree.

There are just not enough statically typed languages that don't use a GC.


Go is a system (no s) language IMO.


The GC in Go is not mandatory.


In the language? In the implementation?


Both. It can be paused or disabled at runtime.


But then presumably if you disable it you have to have no heap allocation at any time in your program, or manually manage buffers. I feel like that's not worth it.


Sure. But it's not mandatory. And if it's about optimization, pausing the GC for critical regions can be useful, or just pre-allocating memory.
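
For the curious, a minimal sketch of what pausing the collector around a critical region can look like (criticalRegion is a stand-in):

    package main

    import (
        "runtime"
        "runtime/debug"
    )

    func criticalRegion() { /* latency-sensitive work */ }

    func main() {
        // Disable the collector entirely (equivalent to GOGC=off);
        // SetGCPercent returns the previous setting.
        old := debug.SetGCPercent(-1)

        criticalRegion() // no GC pauses in here; the heap only grows

        // Restore the previous setting and optionally collect once.
        debug.SetGCPercent(old)
        runtime.GC()
    }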


There are very fast DBs written in Go, so this comment is irrelevant. What is the equivalent of https://github.com/VictoriaMetrics/VictoriaMetrics in another language?


There's highly tuned Java software too, like Lucene; do you call Java a systems language?

All in all I think the semantics debate is irrelevant. No one is going to use Go for an OS just because someone on the internet calls it a systems language.


F-Secure did, https://www.f-secure.com/en/consulting/foundry/usb-armory

Unless we now start the semantic debate about whether a unikernel is an OS.


Gorilla, which many of VictoriaMetrics' ideas are based on, is in C++.

Druid is Java and very fast, but not like-for-like, as it's an event database, not a time-series database. Pinot is in the same vein.

Most of the very big and very fast databases you have used indirectly through web services like Netflix (Cassandra), etc. are written in Java.


Go does have some form of monomorphization implemented in Go 1.18; it is just behind a feature flag (a compiler flag).

Look at the assembly difference between these two examples:

1. https://godbolt.org/z/7r84jd7Ya (without monomorphization)

2. https://godbolt.org/z/5Ecr133dz (with monomorphization)

If you don't want to use godbolt, run the command `go tool compile '-d=unified=1' -p . -S main.go`

I guess that the flag is not documented because the Go team has not committed itself to either implementation.


FWIW you can have two different compilers (and outputs) for the same input: https://godbolt.org/z/bb1oG9TbP in the compiler pane just click "add new" and "clone compiler" (you can actually drag that button to immediately open the pane in the right layout instead of having your layout move to vertical thirds and needing to move the pane afterwards).

Learned that watching one of Matt's cppcon talks (A+, would watch again). As you'd expect, this is useful for comparing different versions of a compiler, different compilers entirely, or different optimisation settings.

But wait, there's more! Using the top left Add dropdown, you can get a diff view between compilation outputs: https://godbolt.org/z/s3WxhEsKE (I maximised it because a diff view getting only a third of the layout is a bit narrow).


thanks!


I feel like a half-idiot, but I'm unable to tell how exactly "unified=1" implements monomorphization. I don't see the extra indirections the OP writes about.


This is a really long and informative article, but I would propose a change to the title here, since "Generics can make your Go code slower" seems like the expected outcome, where the conclusion of the article leans more towards "Generics don't always make your code slower", as well as enumerating some good ways to use generics and some anti-patterns.


Is it the expected outcome? I was under the initial impression that the author also noted:

> Overall, this may have been a bit of a disappointment to those who expected to use Generics as a powerful option to optimize Go code, as it is done in other systems languages.

where the implementation would smartly inline code and have performance no worse than doing so manually. I quite appreciated the call to attention that there's a nonobvious embedded footgun.

(As a side note, this design choice is quite interesting, and I appreciate the author diving into their breakdown and thoughts on it!)


In C++, generics (templates) are zero-cost abstractions.

So no, generics do not de facto make code slower.


That's only 99% of the story. :) Having too many specializations of a C++ template can lead to code bloat, which can degrade cache locality, which can degrade performance.


You're definitely right. While it's not a particularly common problem, it does exist; one thing I'd really like to see enter the compiler world is an optimization step to use vtable dispatch (or something akin to Rust's enum_dispatch, since all concrete types should be knowable at compile time) in these cases.

I expect it would require a fair amount of tuning to become useful, but could be based on something analogous to the function inliner's cost model, along with number of calls per type. Could possibly be most useful as a PGO type step, where real-world call frequency with each concrete type is considered.


enum dispatch in Rust is one of my favorite tricks. Most of the time you have a limited number of implementations, and enum dispatch is often more performant and even less limiting (than say trait objects)


I'm a huge fan. It's very little work to use, as long as all variants can be known to the author. As long as you aren't in a situation where uncommon variants drastically inflate the size of your common variants, it's a performance win, often a big one, compared to a boxed trait object.

Even when you have to box a variant to avoid inflating the size of the whole enum, that's still an improvement over a `dyn Trait` - it involves half as much pointer chasing

It'd be cool to see this added as a compiler optimization - even for cases where the author of an interface can't possibly know all variants (e.g. you have a `pub fn` that accepts a `&dyn MyTrait`), the compiler can


In my experience, code bloat from templates is overblown.

Inlining happens with or without template classes.


That's fair. I guess if you need the functionality in your program, you need the functionality: the codegen approach doesn't matter that much. And like pjmlp said, LTO can make a difference too. Thanks for your thoughts, these kinds of exchanges make me smarter. :)


It's still zero cost compared to what you would have done without them - copy and paste the code.

That's what zero cost abstraction means - it doesn't mean that whatever you're writing has no cost, it means the abstraction has no extra costs compared to what you would have to do manually without it.


Depends if LTO is used.


There are no truly zero-cost abstractions in all situations. In the general case they make things faster, but I've personally made C++ code faster by un-templating code to relieve I$ pressure, and also to allow the compiler to make smarter optimizations when it has less code to deal with. The optimizer passes effectively have a finite window they can look at because of the complexity class of many optimizer algorithms.


C++ can suffer from negative performance from template bloat in two ways:

Templated symbol names are gigantic. This can impact program link and load times significantly in addition to the inflated binary size.

Duplication of identical code for every type: for example, the methods of std::vector<int> and std::vector<unsigned int> should compile to the same instructions. There are linker flags that allow some deduplication, but those have their own drawbacks; another trick is to actively use void pointers for the code paths that do not need to know the type, allowing them to be reused behind a type-safe, template-based API.


> There are linker flags that allow some deduplication but those have their own drawbacks

As long as you use --icf=safe I don't see any drawback, and most of the time it results in almost identical reductions to --icf=all since not many real programs compare addresses of functions.


I think that requires separate function sections, which themselves may cause bloat and data duplication.


I, along with everyone in the embedded space, have been using separate function sections forever for --gc-sections and I would be very surprised if they really cause any bloat and duplication at runtime. Do you mean bloat for intermediate files?


It may be limited to intermediate files, I assumed that the downside would be bigger since it is not a default and the description mentioned that some things may not be merged as well.


At runtime, maybe (although that's also not 100% true) - but I've seen a big project go from being compiled in 10 minutes in our CI to hours, due to the introduction of large features heavily relying on templates. The fix was installing a k8s cluster to run the Jenkins build jobs distributed on bare-metal nodes; this wasn't exactly zero-cost.


I think his point was that they definitely won't make it faster (more abstraction means more indirection), so the expectation from most (myself included) would be that using them incurs a performance penalty, maybe not directly via their implementation, but via their use in broader terms.


Using templates in C++ can make code faster, though. Because you can write the same routine with more abstraction and less indirection.

I've used C++ templates effectively as a code generator to layer multiple levels of abstractions into completely customized code throughout the abstraction.


> Using templates in C++ can make code faster, though. Because you can write the same routine with more abstraction and less indirection

If we are talking about the same code using generics vs. not using generics, one would expect similar or worse performance depending on implementation details. Think adding two 'int's vs. adding two 'T's: depending on the implementation of generics, you're adding indirection, or not.
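
A minimal Go sketch of that comparison (the number constraint is invented for the example):

    package main

    import "fmt"

    // Concrete version: the compiler statically knows everything.
    func addInts(a, b int) int { return a + b }

    // Generic version: depending on how generics are implemented, this
    // either monomorphizes to the same code as addInts or goes through
    // some form of runtime indirection.
    type number interface {
        ~int | ~int64 | ~float64
    }

    func add[T number](a, b T) T { return a + b }

    func main() {
        fmt.Println(addInts(1, 2))
        fmt.Println(add(1, 2))     // T inferred as int
        fmt.Println(add(1.5, 2.5)) // T inferred as float64
    }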

If we are talking about leveraging generics to write different code that is more efficient, code that is perhaps infeasible without generics, then yes, totally get what you are saying. I, and I think parent, were referencing the former however, which is maybe not the most helpful way of comparing things :)

> I've used C++ templates effectively as a code generator to layer multiple levels of abstractions into completely customized code throughout the abstraction.

Yeah, I've done the same to inline matrix operations for lidar data processing. Templates are pretty neat since they are completely expanded at compile time. I've yet to look into the details of Golang's generics as far as implementation details go, but since Go has had code generation built in for a while, and it creates static binaries, I imagine it is a very similar system.

EDIT: After reading the part of the post that goes into detail on Go's implementation of generics, it is very similar, but differs when there is indirection on the input types.


We don't know of a way to implement generic types without (vtable dispatch + boxing) cost AND without monomorphization cost. Some languages do the former, some the latter, some a combination of the two.

Monomorphization:
- code bloat
- slow compiles
- debug builds may be slow (especially C++)

Dynamic dispatch & boxing (usually both are needed):
- not zero cost

Pick your poison.


"Zero-cost" in that context refers to runtime performance. It always refers to runtime performance.

And code bloat, as I've said elsewhere, is vastly overblown as a problem. Another commenter pointed out that link-time optimization removes most of the bloat. The rest is customized code that's optimized per-instantiation.

Slow compiles are an issue with C++ templates. They're literally a Turing-complete code-generation language of their own, and they can perform complex calculations at compile time, so yes, they tend to make compiles take longer when you're using them extensively. But the point I was making was about runtime performance. That's why C++ compilers often perform incremental compilation, which can limit the development time cost.

Debug builds can simply be slow in C++ with or without templates. C++ templates really don't affect debug build runtime performance in any material fashion; writing the code out customized for each given type should have identical performance to the template-generated version of the code, unless there's some obscure corner case I'm not considering.


> Slow compiles are an issue with C++ templates.

As far as I know, Rust has the same problem, although to a lesser extent. Monomorphization works well with judicious use; the C++ STL is not written like that, as it depends on countless layers of inlining to work well. Rust libraries aren't much better in this regard.

LTO removes some code bloat, but LTO itself takes more time. Until the ThinLTO summary pass (or the equivalent pass in GCC's WHOPR) at least, the middle-end and early IR optimizations still have to happen, and Go wants to avoid that. I think that's a fine design choice. In Go's design, they have decided virtual calls aren't a cost they care about anyway; pre-1.18 Go heavily used interfaces, and that's not going to change.

> writing the code out customized for each given type should have identical performance to the template-generated version of the code

In theory yeah, but templates tend to generate more instantiations than strictly what you'd write by hand.

Also, obscure corner cases exist, but none big enough to matter, thanks to the numerous man-years spent on GCC and LLVM. https://travisdowns.github.io/blog/2020/01/20/zero.html


Zero-cost in runtime performance, but you pay binary size for it. It's a trade-off between the two...


Interestingly the original title and your proposed title imply, to me, the opposite of what I think they imply to you. This suggestion is really unclear.


For me Go has replaced Node as my preferred backend language. The reasons are the power of static binaries, the confidence that the code I write today can still run ten years from now, and the performance.

The difference in the code I’m working with is being able to handle 250 req/s in node versus 50,000 req/s in Go without me doing any performance optimizations.

From my understanding, Go was written with developer ergonomics first, with performance a lower priority. Generics undoubtedly make it a lot easier to write and maintain complex code. That may come at a performance cost, but for the work I do, even if it cuts the req/s in half, I can always throw more servers at the problem.

Now if I was writing a database or something where performance is paramount I can understand where this can be a concern, it just isn’t for me.

I’d be very curious what orgs like CockroachDB and even K8s think about generics at the scale they’re using them.


Director of engineering for CockroachDB SQL here.

One of the major pain points we have with Go is the lack of language support for monomorphization. We rely on a hand-built monomorphizing code generator [0] to compile CockroachDB's vectorized SQL engine [1]. Vectorized SQL is about producing efficient, type and operator specific code for each SQL operator. As such, we rely on true monomorphization to produce a performant SQL query engine.

I have a hope that, eventually, Go Generics will be performant enough to support this use case. As the author points out, there is nothing in the spec that prevents this potential future! That future is not yet here, but that's okay.

There are probably some less performance-sensitive use cases within CockroachDB's code base that could benefit from generics, but we haven't spent time looking into it yet.

[0]: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql...

[1]: https://www.cockroachlabs.com/blog/how-we-built-a-vectorized...


Underrated post. Fast generics in go in under 30 lines? Thank you.

Feels like this approach could be leveraged to get default parameters / optional parameters in Golang too! The Go AST / token lib seems ridiculously flexible.

Huge CockroachDB fan btw. Thanks for the revolution in databases!

PS: Showcase of the awesome Go AST / token lib if you're a Python fan: http://igo.herokuapp.com/


Go was created with simplicity of feature set in mind, which does not automatically translate into developer ergonomics. Rather, it offers a least common denominator of language features, so that most devs, who previously only handled other languages like Java and similar, can handle it. This way Google aimed at attracting those devs; they would not have to learn much to make the switch.

True developer ergonomics, as far as a programming language itself goes, stems from language features which make goals easy to accomplish in a small amount of code, in a readable way, using well-crafted concepts of the language. Having to go to great lengths because your language does not support certain features (like generics in Go for a long time) is not developer ergonomics.

There is the aspect of tooling for a language, of course, but that does not necessarily have to do with programming language design. The same goes for the standard library.


Rob Pike quantifies your sentiment as "Orthogonal Features"[0][1], which isn't necessarily equivalent to "simplicity of feature set". But I do understand what you meant.

I think in this context tho, developer ergonomics can mean different things to different people.

It's easy to see how "Orthogonal Features" can be interpreted as developer ergonomics, as it's explicitly limiting potential (not all) anti-patterns and produces fairly idiomatic code across the ecosystem. I'm able to go to almost any GitHub repo that contains Go code and easily determine what's going on, what the flow is, etc. Certainly ergonomic in that context.

[0]: https://go.dev/talks/2010/ExpressivenessOfGo-2010.pdf

[1]: https://www.informit.com/articles/article.aspx?p=1623555


250 vs. 50,000 req/s seems like too big of a difference to me. Sure, Go is faster than Node, but Node is no slouch either; you might want to dig deeper into why you only got 250 req/s with Node.


That could mostly be due to multithreading. That comes free with go but requires a different model in node.


50000/250=200, that is a lot of cores!


That calculation assumes that Go and Node have the same performance and the only gain is from parallelism. But I agree... the performance difference seems strange. On a 1-core basis I'd expect something like a 2-5x gain.


Identical algorithms (e.g. quicksort) run ~40x faster in Go than Ruby in a single thread due to more efficient CPU usage, even when you avoid the kinds of steps that would allocate objects in Ruby. Multiply by 64 cores and it’s easy to observe a 1000x improvement with very little difference in the code.


Doesn't sound unrealistic if you have a mixed load of IO and raw processing.


> The difference in the code I’m working with is being able to handle 250 req/s in node versus 50,000 req/s in Go without me doing any performance optimizations.

Your Node code should be in the 2k req/s range trivially, with many frameworks comfortably offering 5k+.

It is never going to be as fast as go, but it will handle most cases.


How can you make these claims without information about how his application handles requests? Not everything is a trivial database read/write op.


esbuild is 100x faster than JS build tools, so in general a 100x speedup sounds about right.


Sucrase, written in TypeScript, claims to be 2x faster than esbuild.

I'll leave you to your own benchmarks.


Sucrase does this by removing some things you might expect to be table stakes, like detecting invalid parse trees and complete JS syntax. Pragmatically, esbuild can also start faster and parallelize, which means shorter wall-clock builds.


Sure, but esbuild also drops features found in the compared tools to gain a performance edge. The language may bring some marginal difference, but it's clear that the algorithm is the most significant piece.


Really well-written article. I liked that the author tried to keep the language simple around a fair amount of complex topics.

Although the article paints the Go solution for generics somewhat negatively, it actually made me more positive about the Go solution.

I don't want generic code to be pushed everywhere in Go. I like Go to stay simple, and it seems the choices the Go authors have made will discourage overuse of generics. With interfaces you already avoid code duplication, so why push generics? It is just a complication.

Now you can keep generics to the areas where Go didn't use to work so well.

Personally I quite like that Go is trying to find a niche somewhere between languages such as Python and C/C++. You get better performance than Python, but they are not seeking zero overhead at any cost like C++, which dramatically increases complexity.

Given the huge number of projects implemented with Java, C#, Python, Node, etc., there must be more than enough cases where Go has perfectly good performance. In the more extreme cases I suspect C++ and Rust are the better options.

Or if you do number crunching and more scientific stuff then Julia will actually outperform Go, despite being dynamically typed. Julia is a bit opposite of Go. Julia has generics (parameterized types) for performance rather than type safety.

In Julia you can create functions taking interface types and still get inlining and max performance. Just throwing it out there, as many people seem to think that to achieve max performance you always need a complex statically typed language like C++/D/Rust. No, you don't. There are also very high-speed dynamic languages (well, only Julia I guess at the moment; possibly LuaJIT and Terra).


I expect we're going to see most generic Go code happening at the lower levels of the stack: container libraries, utility/algo functions, and probably in some contexts around databases/ORMs. Outside of these contexts, and because most usage will be able to simply leverage type deduction, I'd guess most app code will look pretty similar to what we've seen before.


I'm excited about generics that give you a tradeoff between monomorphization and "everything is a pointer". The "everything is a pointer" approach, like Haskell's, is incredibly inefficient wrt execution time and memory usage; the "monomorphize everything" approach can explode your code size surprisingly fast.

I wouldn't be surprised if we get some control over monomorphization down the line, but if Go started with the monomorphization approach, it would be impossible to back out of it because it would cause performance regressions. Starting with the shape stenciling approach means that introducing monomorphization later can give you a performance improvement.

I'm not trying to predict whether we'll get monomorphization at some future point in Go, but I'm just saying that at least the door is open.


> "monomorphize everything" approach can explode your code size surprisingly fast.

It can in the naive implementation. Early C++ was famous for code bloat and (apparently) hasn't shaken that outdated impression.

In practice, monomorphization of templates hasn't been a serious issue in C++ for a long time. The compiler and linker technologies have advanced significantly.


> The compiler and linker technologies have advanced significantly.

AFAICT the linker de-duplicates identical pieces of machine code, but you can still get multi-megabyte object files for every source file. I used to work on V8. Almost every .o is 3+MB. Times hundreds, plus bigger ones, it's more than a gigabyte of object files for a single build. That's absurd. Not V8's fault--the stupid C++ compilation and linking model.


> Early C++ was famous for code bloat and (apparently) hasn't shaken that outdated impression.

It's not an outdated impression. C++ generics can and do interact very poorly with inlining and other language features to cause extremely large binary sizes, especially if you do anything complex inside them. They also harm compilation performance since each copy of the generic code needs to be optimized.

Generics in C++ are reasonably efficient when there is relatively little code generated per generic, but when this is not true, they can be a problem.


Haskell does monomorphization as well. See https://reasonablypolymorphic.com/blog/specialization/


Yes, they seem to have shipped a MVP first, which is a sensible approach. Controlling the extent of monomorphization requires changes in how the code is written, so if they had offered that exclusively it would've been a pitfall to existing users. By boxing everything, they keep their MVP closer to the previously idiomatic interface{} pattern.


A hybrid approach would be monomorphization for native types like int, and pointers for records. C# does that, if I remember correctly.


IMO that's a bad trade-off for many performance-sensitive applications, since it means that you can't rely on newtypes and structs for correctness.


Why is speed part of the Go language contract but footprint of the executable is not? I, for one, would be quite miffed if an update of the Go compiler meant an application would no longer fit on my MCU. That is worse than the application running slower.


> Why is speed part of the Go language contract but footprint of the executable is not?

Because footprint of the executable pretty literally never has been; Go has always had deficient DCE (dead-code elimination) and generated huge executables.


It also generates pretty slow executables. That doesn't invalidate the point.


Ideally it could be a compiler flag. Even more ideally, you could tell your compiler what the max size you want is and then it would optimize for the best speed at a given executable footprint.


My first use of Go generics has been for a concurrent "ECS" game engine. In this case, the gains are pretty obvious. I think.

I get to write one set of generic methods and data structures that operate over arbitrary "Component" structs, and I can allocate all my components of a particular type contiguously on the heap, then iterate over them with arbitrary, type-safe functions.

I can't fathom that doing this via a Component interface would be anywhere close to as fast, because it would destroy cache performance by introducing a bunch of interface tuples and pointer dereferencing for every single instance. Not to mention the type-unsafe code being yucky. Am I wrong?

FWIW I was able to update 2,000,000 components per (1/60s) frame per thread in a simple Game of Life prototype, which I am quite happy with. But I never bothered to evaluate if interfaces would be as fast.


Assuming your generic functions take _pointers_ to Components as input, full monomorphization does not occur and you're suffering a performance hit similar in magnitude to interface "dereferences", if not empirically greater.

On this basis, I don't believe your generic implementation is as much faster than an interface implementation as you claim.


You're right, here's what one of my hot loops looks like:

    func (cc *ComponentContainer[T]) ForEach(f func(*Component[T])) {
        for _, page := range cc.pool.pages {
            for i := range page {
                if page[i].IsActive() {
                    f(&page[i])
                }
            }
        }
    }
Still, the interface approach is a total nightmare from a readability + runtime error perspective so I won't be going back & will just hope for some performance freebies in 1.19 or later :^)


Just to clarify things here: a pointer receiver on a generic type isn't what was covered in the article. That was covering if your `T` is a pointer type.

If T here is a value-type, you're probably getting monomorphization (at the cost of copying the value all over, and possible benefit of inlining to reduce that copying).
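
A minimal sketch of the distinction (the Container and Position types are made up):

    package main

    type Position struct{ X, Y float64 }

    type Container[T any] struct{ items []T }

    func main() {
        // Value-type instantiation: Container[Position] gets its own
        // shape, so its method calls can be compiled directly.
        _ = Container[Position]{}

        // Pointer-type instantiation: all pointer types share a single
        // shape, so calls on T go through a runtime dictionary, as the
        // article describes.
        _ = Container[*Position]{}
    }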


Thanks, that makes more sense!

My components are all small structs (coordinates, states, health values, etc.), so I am happy to copy them around.

Component[T] is something like

    type Component[T any] struct {
        // metadata fields...
        Data T
    }


Sweet! I've been using it for the same. Example game project (did it for a game jam): https://github.com/nikki93/raylib-5k -- in this case the Go gets transpiled to C++ and runs as WebAssembly too. Readme includes a link to play the game in the browser. game.gx.go and behaviors.gx.go kind of show the ECS style.

It's worked with 60fps performance on a naive N^2 collision algo over about 4200 entities -- but also I tend to use a broadphase for collisions in actual games (there's an "externs" system to call to other C/C++ and I use Chipmunk's broadphase).


Sounds interesting, is it available somewhere?


Still want to hit some milestones before releasing anything, so not quite


This is super interesting and well-written. Also, wow, that generated-assembly-viewer widget is slick.


Yeah, the formatting on this article was insanely good for a technical blog post. Good job, PlanetScale marketing!


The font they use for their code examples is quite nice. Does anyone happen to know what it is?


IBM Plex Mono


Great article; just skimmed it, but will definitely dive deeper into it. I thought Go was doing full monomorphization.

As another datapoint, I can add that I tried to replace the interface{}-based btree that I use as the main workhorse for grouping in OctoSQL[0] with a generic one, and got around a 5% speedup out of it in terms of records per second.

That said, compiling with Go 1.18 vs Go 1.17 got me a 10-15% speedup by itself.

[0]:https://github.com/cube2222/octosql


> That said, compiling with Go 1.18 vs Go 1.17 got me a 10-15% speedup by itself.

Where did you see this speedup? Other than `GOAMD64` there wasn't much in the release notes about compiler or stdlib performance improvements so I didn't rush to get 1.18-compiled binaries deployed, but maybe I should...

(I do expect some nice speedups from using Cut and AvailableBuffer in a few places, but not without some rewrites.)


GOAMD64 could be significant, so I'm not sure why your comment seems to dismiss it?

Also, as the article mentions, Go 1.18 can now inline functions that contain a "range" for loop, which previously was not allowed, and this would contribute performance improvements for some programs by itself. The new register-based calling convention was extended to ARM64, so if you're running Go on something like Graviton2 or an Apple Silicon laptop, you could expect to see a measurable improvement from that too. (edit: the person you replied to confirmed they're using Apple Silicon, so definitely a major factor.)

The Go team is always working on performance improvements, so I'm sure there are others that made it into the release without being mentioned in the release notes.


> GOAMD64 could be significant, so I'm not sure why your comment seems to dismiss it?

Because it's not just upgrading from 1.17 to 1.18, you actually need to set it. Also because I ran a bunch of our code's benchmark suites, including a rather large set of custom serdes, and saw no improvements. Even some of the synthetic benchmarks used to introduce the new optimizations are below 10%.

Hopefully it'll grow in scope over the next few versions.


I've experienced that speedup on an ARM MacBook Pro. I've just checked on Linux AMD64 and there's no performance difference there.


It's probably because of the new register passing calling convention. From https://tip.golang.org/doc/go1.18

> Go 1.17 implemented a new way of passing function arguments and results using registers instead of the stack on 64-bit x86 architecture on selected operating systems. Go 1.18 expands the supported platforms to include 64-bit ARM (GOARCH=arm64), big- and little-endian 64-bit PowerPC (GOARCH=ppc64, ppc64le), as well as 64-bit x86 architecture (GOARCH=amd64) on all operating systems. On 64-bit ARM and 64-bit PowerPC systems, benchmarking shows typical performance improvements of 10% or more.


Yeah, 1.17 got register (instead of stack) calling convention on amd64; 1.18 expanded that to arm64, which should be responsible for most of that performance improvement.


Some of the issues pointed out by this (very good) article may already be fixed in tip Go, with https://go-review.googlesource.com/c/go/+/385274


The first code-to-assembly highlighting example here is beautiful. Question to the authors: is that custom, just for this article?

Is there an open source CSS library or something that does this?


Hey, author here. Thanks for the kind words! This is a custom pipeline that I designed for the article. It's implemented as a Node.js library using SVG.js and it statically generates the interactive SVGs directly in the static site generator I was using (Eleventy) by calling out to the Go compiler and extracting assembly for any lines you mark as interesting. It turned out very handy for iterating, but it's not particularly reusable I'm afraid!


I came here to ask about the same thing. Very cool! I would be very interested even in a blog post just on how you did the SVG generation.


I agree with the commenter you're replying to; I'd only add that Intel syntax is much more readable than AT&T.


FWIW, I'm fairly sure this is the assembly syntax used by Go - the author may not have made a decision to use this vs another


Possibly, but objdump can disassemble into Intel syntax.


What I expect to happen, now that golang has generics and reports like these are showing up, is that golang will explore monomorphizing generics and get hard numbers. They may also choose to take some of the compilation speed they've gained from linker optimizations and spend it on generics.

I can't imagine monomorphizing being that big of a deal during compilation if the generation is deferred and results are cached.


I am unfamiliar with Go. This article discusses that they have decided to go for runtime lookup. Is there any reason why that implementation might make monomorphizing more difficult?


Nope. It was an intentional trade-off with respect to compilation speed. Once generics have baked for a bit with real-world usage, that decision will almost certainly be revisited.

Edit: for example, one could envision the compiler generating the top n specializations per generic function based on usage, and then using the current non-specialized version for the rest.


This is a great article, yet with an unnecessarily sensationalist headline. Generics can be improved in performance over time, but a superstition like "generics are slow" (not the exact headline, but what it implies to the reader) can remain stuck in our heads forever. I can see developers sticking to the dogma of "never use generics if you want fast code" and resorting to terrible duplication, and more bugs.


Key tldr from me:

> Ah well. Overall, this may have been a bit of a disappointment to those who expected to use Generics as a powerful option to optimize Go code, as it is done in other systems languages. We have learned (I hope!) a lot of interesting details about the way the Go compiler deals with Generics. Unfortunately, we have also learned that the implementation shipped in 1.18, more often than not, makes Generic code slower than whatever it was replacing. But as we’ve seen in several examples, it needn’t be this way. Regardless of whether we consider Go as a “systems-oriented” language, it feels like runtime dictionaries was not the right technical implementation choice for a compiled language at all. Despite the low complexity of the Go compiler, it’s clear and measurable that its generated code has been steadily getting better on every release since 1.0, with very few regressions, up until now.

And remember:

> DO NOT despair and/or weep profusely, as there is no technical limitation in the language design for Go Generics that prevents an (eventual) implementation that uses monomorphization more aggressively to inline or de-virtualize method calls.


I agree. I find this snippet interestingly incorrect.

> with very few regressions, up until now.

The idea that this is a regression is silly: you can't have a regression unless old code is slower as a result, which is clearly not the case. It's just a less-than-ideal outcome for generics, which will likely get resolved.


> Inlining code is great. Monomorphization is a total win for systems programming languages: it is, essentially, the only form of polymorphism that has zero runtime overhead

Blowing your icache can result in slowdowns. In many cases it's worth having smaller code even if it's a bit slower when microbenchmarked cache-hot, to avoid evicting other frequently used code from the cache in the real system.


The essay is missing a "usually", but it's true that monomorphisation is a gain in the vast majority of situations because of the data locality and optimisation opportunities offered by all the calls being static. Though obviously that assumes a pretty heavy optimisation pipeline (so languages like C++ or Rust benefit a lot more than a language with a lighter AOT optimisation pipeline like Java).

Much as with JITs (though probably with higher thresholds), issues occur for megamorphic callsites (when a generic function has a ton of instances), but it should be possible to dump those for visibility. And there are common and pretty easy solutions for at least some cases, e.g. trampolining through a small generic function (which will almost certainly be inlined) to one that's already monomorphic; this is pretty common when the generic bits are mostly a few conversions at the head of the function. This sort of trampolining is common in Rust, where "conversion" generics are often used for convenience purposes, e.g. a function will take a `T: AsRef<str>` so the caller doesn't have to extract an `&str` themselves.


Monomorphisation is a double-edged sword. Sometimes keeping the code smaller and hot is better than inlining everything, especially when your application does not exclusively own all the system resources (an assumption that many "systems programming languages" sadly make). There is too much focus on "performance", aka microbenchmarks, but they don't tell you the whole story. If you have a heavily async environment, with multiple tasks running in parallel and waiting on each other in complex patterns, more compact, reusable code can not only speed up the work but also allow you to do more work per watt of energy.

I think it's great that golang's designers decided to follow Swift's approach instead of specializing everything. The performance issues can be fixed in time with more tools (like monomorphisation directives) and profile-guided optimization.


This is a very interesting article. I was, however, a bit confused by the lingo calling everything generics. As I understood it, the main point of the article quite precisely matched the distinction between generics and templates as I learned it. Therefore, what surprised me most was the fact that Go monomorphizes generic code sometimes. That, however, makes sense given the way Go's module system works (i.e., imported modules are included in the compilation), but it doesn't fit my general understanding of generics.


Rust also enthusiastically monomorphizes generic code. Templates vs generics seems to be more about the duck typing C++ templates use vs generics doing type parameters with statically checked constraints on the types.


Similar to how the GC has become faster and faster with each version, we can expect the generics implementation to get faster too. I wouldn't pay much attention to conclusions about performance from the initial release of the feature. The Go team is quite open with their approach.


> there’s no incentive to convert a pure function that takes an interface to use Generics in 1.18.

Good. I saw a lot of people suggesting in late 2021 that you could use generics as some kind of `#pragma force-devirtualization`, and that would be awful if it became common.


Why would that be awful?


First, because `[R io.Reader]` is an awful way to spell "force devirtualization". It's not explicit about what it means, and it's not derivable from first principles like Go's other odd spellings, e.g. `interface{}` or `var _ Reader = T{}`, are.

Second, it doesn't really promise to devirtualize it. If what I have is a variable of type `io.ReadCloser` - not just implementing it, but already boxed - it's not going to be able to unbox it for me.

Third, if that was all or even primarily what we wanted out of generics it would've been much better to spend the past two years hacking on the inliner and other parts of the compiler to improve devirtualization.

I don't think it would be awful to fix the unnecessary pointer indirection (which looks like it's already happened), but I don't want 10x or even 2x longer compile times just because someone is trying to get the compiler to avoid boxing. It's a tradeoff, vs. e.g. spending that time simplifying methods to get the inliner to approve of them, which is win/win.
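
For reference, a sketch of the two spellings being contrasted here:

    package sketch

    import "io"

    // Ordinary interface parameter: r arrives boxed, and calls through
    // it are virtual unless the compiler happens to devirtualize.
    func consume(r io.Reader) { /* ... */ }

    // Generic type parameter: the same call spelled as a constraint.
    // Nothing about this syntax says "force devirtualization", and
    // under 1.18's shape stenciling it often won't.
    func consumeG[R io.Reader](r R) { /* ... */ }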


People who demanded generics don't care about performance.

They care about making excuses for not using Go.


Bravo on the informative content and presentation. That component that shows the assembly next to syntax-highlighted code? Chef's kiss.


Related. The introduction of Generics in Go revived an issue about the ergonomics of typesafe Context in a Go HTTP framework called Gin: https://github.com/gin-gonic/gin/issues/1123

If anyone can contribute, please do.


Meh. The people who screamed loudest about generics missing in Go aren't going to be using the language now that it has them, and are going to find something new to complain about.

The language will suffer now with additional developmental and other overhead.

The world will continue turning.


Reading the title I'm worried: should I keep using reflection instead?


If the information in this article is make-or-break for your program, you probably shouldn't have chosen Go.

In the grand space of all programming languages, Go is fast. In the space of compiled programming languages, it's on the slower end. If you're in a "counting CPU ops" situation it's not a good choice.

There is an intermediate space in which one is optimizing a particular tight loop, certainly, I've been there, and this can be nice to know. But if it's beyond "nice to know", you have a problem.

I don't know what you're doing with reflection but the odds are that it's wildly slower than anything in that article though, because of how it works. Reflection is basically like a dynamically-typed programming language runtime you can use as a library in Go, and does the same thing dynamically-typed languages (modulo JIT) do on their insides, which is essentially deal with everything through an extra layer of indirection. Not just a function call here or there... everything. Reading a field. Writing a field. Calling a function, etc. Everywhere you have runtime dynamic behavior, the need to check for a lot of things to be true, and everything operating through extra layers of pointers and table structs. Where the article is complaining about an extra CPU instruction here and an extra pointer indirection there, you've signed up for extra function calls and pointer indirections by the dozens. If you can convert reflection to generics it will almost certainly be a big win.

(But if you cared about performance you were probably also better off with an interface that didn't fully express what you meant and some extra type switches.)
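
As a rough illustration of the gap being described (the point type and both accessors are invented for the example):

    package main

    import (
        "fmt"
        "reflect"
    )

    type point struct{ X int }

    // Direct access: after inlining, essentially a single memory load.
    func directX(p point) int { return p.X }

    // Reflection: type inspection, a name lookup, and pointer chasing
    // on every single access.
    func reflectX(v any) int {
        return int(reflect.ValueOf(v).FieldByName("X").Int())
    }

    func main() {
        p := point{X: 42}
        fmt.Println(directX(p), reflectX(p)) // 42 42
    }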


This is good high-level advice as well as low-level advice.

Go is positioned to be most useful as an alternative to Java, and to C++ where performance isn't the key factor (i.e. projects where C++ would be chosen because "Enh, it's a big desktop application and C++ is familiar to a lot of developers," not because the project actually calls for being able to break out into assembly language easily or where fine-tuning performance is more important than tool-provided platform portability).


In practice, it's used as an alternative to python and ruby and nodejs. It can't fully do what Java or C# do.


Well, that is simply not true at all.

Go is a perfectly capable replacement for Java and C#. Many huge projects that would likely never be written in Python have been written in Go when they would have otherwise been written in Java or C# in years past: Kubernetes, Prometheus, HashiCorp Vault and Terraform, etcd, CoreDNS, TiDB, Loki, InfluxDB, NATS, Docker, Caddy, Gitea, Drone CI, Faktory, etc. The list goes on and on.

What, exactly, are you saying that Go can't do that Java can?

Go is not a perfectly capable replacement for Rust, for example, because Rust offers extremely low level control over all resource usage, making it much easier to use for situations where you need every last ounce of performance, but neither C# nor Java offer the capabilities Rust offers either.

I like C# just fine (Java... not so much), but your comment makes no sense. Certainly, I would rather use Go than most scripting languages; having static types and great performance makes a lot of tasks easier. But that doesn't mean Go is somehow less capable than Java or C#... it is a great alternative to both. If someone needs more than Go can provide, they're going to rewrite in Rust, C++, or C, not Java or C#.


> What, exactly, are you saying that Go can't do that Java can?

Runtime library addition (plugins) and dependency injection are two big ones. (We can argue the merit separately, but they're not possible in Go)

I think if Java had easily distributable static binaries k8s would have stayed Java (it started out as Java).


Plugins are barely possible and utterly impractical, so no objection there.

DI is totally possible; just about every system I build is nothing but dependency injection. What confuses people coming from the Java world is that you don't need a framework for it, you just do it. You could say the language simply supports a simple version of it natively.

If you want something much more complicated like the Java way, there are some libraries that do it, but few people find them worthwhile. They are a lot of drama for what is in Go not that much additional functionality.

This is one of the many places the interfaces not requiring declaration of conformance fundamentally changes Go vs. Java and leaves me still preferring Go even if Java picks up every other thing from Go. You don't need a big dependency injection framework; you just declare yourself an interface that matches what you use out of some 3rd-party library, then pass the value from the 3rd-party library in to your code naturally. Dependency injected. All other things you may want to do with that dependency, like swap in a testing implementation, you just do with Go code.

(And I personally think that if Java's interfaces were satisfied like Go's, there would never have been a Go.)


> Runtime library addition (plugins)

https://pkg.go.dev/plugin

Linux only, but it exists and it works… I just wouldn’t recommend that particular pattern for almost anything.

Either some kind of RPC or compile-time plugins would be better for almost all cases.

- With RPC plugins (using whatever kind of RPC that you prefer), you get the benefit of process isolation in case that plugin crashes, the plugin can be written in any language, and other programs can easily reuse those plugins if they desire. The Language Server Protocol is a great example of an RPC plugin system, and it has had a huge impact throughout the developer world.

- With compile-time plugins, you get even better performance due to the ahead-of-time optimizations that are possible. Go programs compile so quickly that it’s not a big deal to compile variants with different plugins included... this is what Caddy did early on, and this plugin architecture still works out well for them last I checked.
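To sketch the compile-time flavor (hypothetical names, not Caddy's actual API): each plugin package registers itself from an init function, and a given build simply imports the plugins it wants.

    package plugins

    import "fmt"

    // Handler is whatever contract plugins must satisfy.
    type Handler interface {
        Handle(input string) (string, error)
    }

    var registry = map[string]Handler{}

    // Register is called from each plugin package's init function.
    func Register(name string, h Handler) {
        if _, dup := registry[name]; dup {
            panic(fmt.Sprintf("plugin %q registered twice", name))
        }
        registry[name] = h
    }

    // Get looks up a compiled-in plugin by name.
    func Get(name string) (Handler, bool) {
        h, ok := registry[name]
        return h, ok
    }

A plugin package then calls plugins.Register("gzip", gzipHandler{}) in its init, and the main binary chooses its plugin set with blank imports like import _ "example.com/app/plugins/gzip".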

> dependency injection

https://go.dev/blog/wire

Java-style DI isn’t very idiomatic for Go, and it’s just a pattern (the absence of which would not prevent applications from being developed, the purpose of this discussion)… but there are several options for doing DI in Go, including this one from Google.


> Runtime library addition (plugins)

I don't see anything inherent to Go that would prevent it. gc even added rudimentary support some time ago, fostering the addition of the plugin package[1], but those doing the work ultimately determined that it wasn't useful enough to dedicate further effort towards improving it.

There was a proposal to remove it, but it turns out that some people are using runtime library addition, and so it remains.

[1] https://pkg.go.dev/plugin


I believe Go supports DI via Wire (https://github.com/google/wire).


Go supports DI by allowing functions with arguments :) Every language with functions supports DI. Wire is just a code-generator version of automatic DI.


golang doesn't have annotations, doesn't have enums, error handling is very error prone. No ability to choose from different GC implementations, nothing remotely close to JFR.


Go does have its own versions of some of these: struct tags in place of annotations, iota constants in place of enums, runtime tracing/profiling in place of JFR. They're not the same, but they perform similar roles for the Go language. Error handling is subjective, but I'm not going to disagree that it could use some help. Same with Go's GC in certain situations, but nothing I've coded has needed more.

All that said, what makes your list any different than a similar list comparing Java to Blub? I don't see your list as preventing one from writing a Java app in Go using Go's idioms instead. (It can't fully do what Java or C# do.)


> They're just built for features of Go

They're inferior, and do not cover the same ground (e.g. "enums" in golang are just integer constants, you can't code gen using golang tags, JFR is way more comprehensive than anything that golang has, etc.)

The GC selection, JIT, and hot swapping/reloading are features that do not exist in golang, and we've seen what hoops people have to jump through when they face issues that can only be resolved by them. Basically, they're features most people don't know they need until they do, and by then they're already in a big mess.

You can write a Java app in golang or assembly, but it won't be anywhere near as maintainable, clear, concise, debuggable, or correct.


Enums in Go are inferior but that doesn't stop you from using them in the same way.
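For context, the usual stand-in is a defined type over iota constants; a minimal sketch:

    package main

    import "fmt"

    // Color is a defined type, so a plain int won't pass for it in a
    // signature, but nothing rejects Color(42): there is no compile-time
    // exhaustiveness or validity checking like Java enums have.
    type Color int

    const (
        Red Color = iota
        Green
        Blue
    )

    func (c Color) String() string {
        switch c {
        case Red:
            return "Red"
        case Green:
            return "Green"
        case Blue:
            return "Blue"
        default:
            return fmt.Sprintf("Color(%d)", int(c))
        }
    }

    func main() {
        fmt.Println(Green) // prints "Green" via the Stringer method
    }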

Of course you can code gen using Go's tags. It's done all the time.

As already pointed out, Go has the JFR it needs, not what Java has.

Go doesn't need a JITter.

Go isn't perfect, but neither is Java or C#. Go code is known for being maintainable and easy for newcomers to read so your last line applies to Java more than Go. But it seems like you want to hate Go. Why?


> Enums in Go are inferior but that doesn't stop you from using them in the same way.

It does; I have to think twice because I know I'm not getting the compile-time features that Java offers, and may end up just using strings in an awkward manner.

> Go has the JFR it needs, not what Java has

That doesn't make sense. All programs need the JFR that Java has; otherwise you wouldn't see companies like Datadog doing well.

> But it seems like you want to hate Go. Why?

I've used it in several large projects at my employer, and in every single project, Java would have been a superior alternative. We've had issues that would not have happened in Java due to many things, such as better error handling in Java, enum support, better frameworks and libraries to interact with the DB, etc.

And no, golang is not known for being maintainable; that's just something some people parrot without proof. It's basically marketing. There's nothing in the language that inherently makes it more maintainable than Java/C#/etc., and in fact it has many things that make it less maintainable.


I've coded Java, C#, and now Go and have spent years doing each. My experience differs from yours. Clearly our experiences differ and that's okay. I occasionally support Go and occasionally bash Go but really I don't have the passion to dislike any language as much as you seem to. Be well.


Why does golang need JIT when it already statically compiles to a non-virtual machine target?

It could probably use some performance optimizations, but JIT would be redundant when it's already compiled.


JIT makes things like the JFR possible.


Honestly, this thread is the first I've heard of Java Flight Recorder, so while I'm sure it has uses, it doesn't seem vital.

Nice-to-have (I've certainly enjoyed something similar writing JavaScript in a browser for years), but most developers aren't writing code under tight-enough constraints to justify a profiler running under a virtual machine.


I mean, of course. I have not seen IBM WebSphere Server 6.0.1 written in Go. Neither is there a full-fledged SOAP/WSDL engine in Go. So clearly Go is less capable.


I've seen similar abominations written in golang, but they're not public.


> It can't fully do what Java or C# do.

I am curious: what exactly can it not do?


Mostly stuff related to tooling and environment because unlike Java, it hasn't been around for two decades for people to patch its weaknesses with workarounds.


Very few people are actually answering your question, so I'll answer it: Generics are slower than concrete types, and are slower than simple interfaces. However, the article does not bother to compare generics with reflection, and my intuition says that generics will be faster than reflection.


If you're worried you should benchmark the differences on your requirements.


Use generics if it makes your dev experience better. Profile if it's slow. Optimize the slow bits.


If you're using reflection or storing a bare interface{}, you should probably instead try using generics.

If you're using real interfaces, you should keep using interfaces.

If you care about performance, you should not try to write Java-Streams / FP-like code in a language with no JIT and a non-generational non-compacting GC.
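As a sketch of the first point above (replacing bare interface{} with generics): a container holding interface{} boxes every element and needs assertions on the way out, while the generic version keeps elements unboxed and statically typed.

    package main

    import "fmt"

    // Old style: elements are boxed into interface{} going in and must
    // be type-asserted coming out.
    type AnyStack struct{ items []interface{} }

    func (s *AnyStack) Push(v interface{}) { s.items = append(s.items, v) }
    func (s *AnyStack) Pop() interface{} {
        v := s.items[len(s.items)-1]
        s.items = s.items[:len(s.items)-1]
        return v
    }

    // Generic style: no boxing for value types, no assertions needed.
    type Stack[T any] struct{ items []T }

    func (s *Stack[T]) Push(v T) { s.items = append(s.items, v) }
    func (s *Stack[T]) Pop() T {
        v := s.items[len(s.items)-1]
        s.items = s.items[:len(s.items)-1]
        return v
    }

    func main() {
        var a AnyStack
        a.Push(1)
        n := a.Pop().(int) // runtime assertion required

        var g Stack[int]
        g.Push(2)
        m := g.Pop() // statically an int

        fmt.Println(n, m)
    }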


Definitely not. In the general case, you will make things simpler and faster by turning reflection-based code into generic code.

What this article says is that a function that is generic on an interface introduces a tiny bit of reflection (as little as is necessary to figure out if a type conforms to an interface and get an itab out of it), and that tiny bit of reflection is quite expensive. This means two things.

One, if you're not in a position where you're worried about what does or does not get devirtualized and inlined, this isn't a problem for you. If you're using reflection at all, this definitely doesn't apply to you.

Two, reflection is crazy expensive, and the whole point of the article is that the introduction of that tiny bit of reflection can make function calls literally twice as slow. If you are in a position where you care about the performance of function calls, you're never really going to improve upon the situation by piling on even more reflection.
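Concretely, the comparison here is roughly between these two shapes (a sketch; the difference lives in how the method call is dispatched, not in the source):

    package calls

    import "fmt"

    // Plain interface parameter: the itab is computed at the call site
    // and s.String() is a single indirect call through it.
    func ViaInterface(s fmt.Stringer) string {
        return s.String()
    }

    // Generic over an interface constraint: with Go 1.18's GC-shape
    // stenciling, all pointer-shaped type arguments share one instance,
    // and the method call goes through a runtime dictionary first. That
    // extra hop is the overhead the article measures.
    func ViaGeneric[T fmt.Stringer](s T) string {
        return s.String()
    }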


Premature optimization is a bad thing.

Just implement naively, then if you have performance issues identify the bottleneck.


Ignorance of how your language works is a bad thing.

Knowing where performance issues with certain techniques might arise is not premature optimization. Implement with an appropriate level of care, including performance concerns. Not every kind of poor performance appears as a clear spike in a call graph, and even fewer can be fixed without changing any external API.


> Ignorance of how your language works is a bad thing.

And I never said anything remotely close to contradict this statement.

> Knowing where performance issues with certain techniques might arise is not premature optimization.

It is:

  - Python: should I use a for loop, a list comprehension or the map function?
  - C++: should I use a std::list, std::vector, ...?
  - Go: should I use interface{} or generics?
The difference between those options is subtle and completely unrelated to the problem you want to solve.

> Implement with an appropriate level of care, including performance concerns.

  Step 1: solve your problem naively, aka: make it work
  Step 2: add tests, separate business logic from implementation details, aka: make it right
  Step 3: profile / benchmark to see where the chokepoints are and optimize them, aka: make it fast
Chances are that if you have deeply nested loops, generics vs interface{} will be the last of your problems.

To take the C++ example again, until you have implemented your algorithm, you don't know what kind of operations (and how often) you will do with your container. So you can't know whether std::list or std::vector fits best.

In Go, until you have implemented your algorithm, you don't know how often you will have to use generics / reflection, so you can't know what will be the true impact on your code.

The "I know X is almost always faster so i'll use it instead of Y" will bite you more often than you can count.

> Not every kind of poor performance appears as a clear spike in a call graph

CPU usage, memory consumption, idling/waiting times, etc. Those are the kinds of metrics you care about when benchmarking your code. No one said you only look at spikes in a call graph.

But still, to look for such information, you need at least a first implementation of your problem's solution. Doing this earlier is a waste of time and energy because, 80% of the time, your assumptions are wrong.

> and even fewer can be fixed without changing any external API.

This is why you "make it work" and "make it right" before you "make it fast".

This way you have a clear separation between your API and your implementation details.


You're giving fine advice for well-scoped tasks with minimal design space (well, sort of - using std::list ever is laughable - but if you had said unordered_map vs. map, sure, so I take the broad point). But some of us have been around the block a few times and now need to make sure those spaces are delineated for others in a way that won't force them into a performance corner.

> until you have implemented your algorithm, you don't know what kind of operations (and how often) you will do with your container.. until you have implemented your algorithm, you don't know how often you will have to use generics / reflection, so you can't know what will be the true impact on your code.

I don't mean to brag, but I guess I'm a lot better at planning ahead than you. I don't usually have the whole program written in my head before I start, but I also can't remember any time I had to reach for a hammer as big as reflect and didn't expect to very early on, and most of the time I know what I intend to do to my data!

> This is why you "make it work" and "make it right" before you "make it fast"... This way you have a clear separation between your API and your implementation details.

This is not possible. APIs force performance constraints. Maybe wait until your API works before micro-optimizing it, but also maybe think about how many pointers you're going to have to chase and methods your users will need to implement in the first place because you probably don't get to "optimize" those later without breaking the API. You write about "the bottleneck", but there's not always a single bottleneck distinct from "the API". Sometimes there's a program that's slow because there's a singular part that takes 10 seconds and could take 1 second. But sometimes it's slow because every different bit of it is taking 2ns where it could take 1ns.

Consider the basic read-some-bytes API in Go vs. Python (translated into Go, so the difference is obvious):

    type GoReader interface { Read([]byte) (int, error) }

    type PyReader interface { Read(int) ([]byte, error) }
You're never going to make an API like PyReader anywhere near as fast as GoReader, no matter how much optimization you do!
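To spell out why the signatures alone decide this, a sketch: a GoReader caller can reuse one buffer for the whole stream, while every PyReader.Read must hand back a freshly allocated slice.

    package readers

    type GoReader interface{ Read(p []byte) (int, error) }
    type PyReader interface{ Read(n int) ([]byte, error) }

    // One allocation for the entire stream: the caller owns the buffer.
    func ConsumeGo(r GoReader) (int, error) {
        buf := make([]byte, 64*1024)
        total := 0
        for {
            n, err := r.Read(buf)
            total += n
            if err != nil {
                // err includes io.EOF at the end of the stream
                return total, err
            }
        }
    }

    // One allocation per call, unavoidably: the returned slice is the
    // API, so the implementation can't safely reuse it between calls.
    func ConsumePy(r PyReader) (int, error) {
        total := 0
        for {
            chunk, err := r.Read(64 * 1024)
            total += len(chunk)
            if err != nil {
                return total, err
            }
        }
    }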


> using std::list ever is laughable

https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-vecto...

> some of us have been around the block a few times though, and now need to make sure those spaces are delineated for others in a way that won't force them into a performance corner.

This, like the rest of your comment, is just patronizing and condescending.

> I don't mean to brag, but I guess I'm a lot better at planning ahead than you

See previous point...

> I also can't remember any time I had to reach for a hammer as big as reflect and didn't expect to very early on

This is not what I said at all. Let's say you know early on, before any code is written, that you will need reflection. Can you tell me beforehand how many calls to the reflection API will happen? Is it `n`? `n log(n)`? `n²`? Will you use reflection at every corner, or just on the boundaries of your task? Once implemented, could it be refactored in a simpler way? You don't know until you've written the code.

> most of the time I know what I intend to do to my data

"what" is the spec, "how" is the code, and there is multiple answers to the "how", until you write them and benchmark them, you can't know for sure which one is the best, you can only have assumptions/hypothesis. Unless you're doing constantly exactly the same thing.

> but also maybe think about how many pointers you're going to have to chase and methods your users will need to implement in the first place because you probably don't get to "optimize" those later without breaking the API.

Basically, "write the spec before jumping into code". Which is the basis of "make it work, make it right, make it fast" because if you don't even know what problem you're solving, there is no way you can do anything relevant.

> You write about "the bottleneck", but there's not always a single bottleneck distinct from "the API".

I never implied there is a single bottleneck. But if you separate the implementation details from the high-level API, they sure are distinct. For example, you can solve the N+1 problem in a GraphQL API without changing its schema.

If your implementation details leak into your API, it just means they're poorly separated.

> You're never going to make an API like PyReader anywhere near as fast as GoReader, no matter how much optimization you do!

Because Python is interpreted and Go is compiled. Under the hood, the OS uses `ssize_t read(int fd, void *buf, size_t count)`, and there is an upper limit on the `count` parameter (specific to the OS/kernel).

Python's IO API knows this and allocates a buffer only once under the hood; it would be equivalent to a PyReader implementation wrapping a GoReader interface plus a preallocated []byte slice.

I can't tell you which one is faster without a benchmark because the difference is so subtle, so I won't.


> This, just like the rest of your comment, is just patronizing and condescendant.

I tried to be a bit self-effacing also, but platitudes like "make it right, then make it fast" are condescending to begin with. Sometimes "make it right" requires knowing the performance implications of your architecture decisions.

> Because Python is interpreted and Go is compiled.

This is not why the Go API beats the Python API, which you can tell from the basic language semantics plus the method signatures.


The language is a tool for a job.

If I'm using low torque, I don't need to know the yield strength of my wrench.


(off-topic) Anyone else using Firefox know why the text starts out light gray and then flashes to unreadably dark gray after the page loads? (The header logo and text change from gray to blue too)


I'm using Firefox and don't see that issue. Maybe some kind of plugin you have installed?


This is essentially how C# generics have worked since forever. If you want performance, don't use pointer type arguments.


> you create an exciting universe of optimizations that are essentially impossible when using boxed types

Couldn't JIT do this?


Seems obvious; like, did someone expect all the extra abstraction would make Go faster?


The article articulates why it's reasonable to expect that generics would make Go faster. From TFA:

> Monomorphization is a total win for systems programming languages: it is, essentially, the only form of polymorphism that has zero runtime overhead, and often it has negative performance overhead. It makes generic code faster.


What extra abstraction?

I'd expect that without monomorphization the code would perform the same as interface{} code, perhaps minus the type-cast error-handling overhead. That's the model where generics pass interface{} underneath and exist only as a compile-time type check (à la Java type erasure).


Yes? We used code generators to monomorphize our code in like 2015 and it was faster than using interfaces. Generics could reasonably produce the same code we did in 2015, but they don't.


It seems like the code size vs. speed trade-off would be well managed by FDO (feedback-directed optimization).


Is there any large project that has done an in-place replacement to use generics and benchmarked it? I doubt the change is even measurable in general.


[flagged]


The article is like 9k words and it only mentions Rust twice in passing.


Generics are a generic solution, but they are absolutely necessary in my opinion.


Well sure. Not writing hand-tuned assembly can make your code slower, too. Go's value as a language is how it fills the niche between Rust and Python, giving you low-level things like manual memory control, while still making trade-offs for performance and developer experience.


I might have worded it differently, but yeah, of course generics can make your code slower. What did people expect?


> I might have worded it differently, but yeah, of course generics can make your code slower. What did people expect?

? In most languages, it is compile-time overhead, not runtime.


I wouldn't say "most"; it varies a lot. Also, not unlike Go, I think C# uses a hybrid model where all reference types share the same instantiation of a generic function.


It depends on what they're replacing. Typically, generics used to replace runtime polymorphism (using a []T with [T any] instead of []any) would be a speed boost in C#, C++, or Rust, and would have no impact on speed in Java.


And it is also a speed boost in Go, assuming you don't call any methods. (Which, if you were really using [T any], you either weren't or you were dissembling about your acceptable types.)


From the article:

> Monomorphization is a total win for systems programming languages: it is, essentially, the only form of polymorphism that has zero runtime overhead, and often it has negative performance overhead. It makes generic code faster.

The point is that the way Go implements generics is in such a way that it can make your code slower, even though there is a well-known way that will not make your code slower (at the cost of compile times).


>even though there is a well-known way that will not make your code slower (at the cost of compile times).

That's the point though. The Golang team was surely aware of both approaches, and chose what they did as a conscious design decision to prefer faster compile times. People love Go because of the iteration speed compared to C++. And these little things start to add up if you don't have a clear product vision about what your language is meant for.


Of course, I understand that they made that decision. I was just replying to this:

> of course generics can make your code slower. What did people expect?

It's not unreasonable that some people might have expected monomorphization.


In fact I sort of thought they had announced monomorphization, though I wasn’t following really carefully.


I would have expected generics to make the compiler take longer, not the compiled program.


I don't know about you, but when I imagine what compilers do with generic code, I typically imagine monomorphization, which (aside from increasing cache pressure a little), should generally not make things slower, but rather introduce possibilities for inlining that could make it faster.
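A sketch of that mental model: the compiler conceptually expands the one generic source function into a specialized copy per type argument, and each copy optimizes exactly like hand-written code.

    package mono

    // One generic source function...
    func Max[T int | float64](a, b T) T {
        if a > b {
            return a
        }
        return b
    }

    // ...which a fully monomorphizing compiler would expand into per-type
    // copies equivalent to these, each a trivial inlining candidate:
    func maxInt(a, b int) int {
        if a > b {
            return a
        }
        return b
    }

    func maxFloat64(a, b float64) float64 {
        if a > b {
            return a
        }
        return b
    }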


Apparently I scrolled right past that bit of the article. I'm a little unsure how it's supposed to make the code faster, but maybe I'm comparing it wrong. The alternative to generics, in my mind at least, is writing all the different functions by hand. I don't fully understand how generics are supposed to be faster than a custom function for that datatype.


I think the reasoning is that for something that's commonly used for many different types, you won't go through the effort of re-implementing that function for each type (it may not even be feasible to do so). Which means you'll end up with some sort of indirection to make it generic.


The point is they shouldn't be slower than a manually-copied implementation for that concrete type. They also should be faster than vtable dynamic dispatch in the vast majority of cases. (I also fail to see a compelling reason that they couldn't have been implemented by passing the fat pointer directly, making the codegen the same as passing an interface, instead of having that business with the extra layer of indirection.)

If there are specialization opportunities when hand-implementing the function for a given concrete type, I would indeed expect that to be faster than a monomorphized generic function.


> I don’t fully understand how generics are suppose to be made faster than a custom function for that datatype.

The point is that a monomorphized generic function should not be slower than the custom function per datatype, but because Go's generics are not fully monomorphized they can be, and in fact can be slower than a function-for-interface.



