Sub-millisecond GC pauses in Go 1.8 (groups.google.com)
365 points by geodel on Oct 28, 2016 | 235 comments

The proposal document for this change is a good read: https://github.com/golang/proposal/blob/master/design/17503-...

"typical worst-case"

So... not worst-case. :)

Regardless, very impressive achievement, especially given that they've been able to do it without a JIT or exotic tricks like Azul.

Of course not. You can't put absolute worst case times on code in practice. Worst case might be that someone managed to load 80PB of RAM on a CPU underclocked to 50 MHz. Or that the OS suspended the process in the middle of GC and was promptly frozen for a week while VMs were migrated across the country in the cup holder of someone's Toyota.

Worst case in absolute time always requires ignoring pathological cases.

This feels very wrong to me. Qualifying it with some sort of analysis is fine. Just saying "typical" is borderline negligent and is why nobody just uses naive quicksort in production systems where runtime can actually matter.

What's the alternative?

We don't care about worst-case latency in practice, and average case is often ignored as well. We look at 99.9% latency numbers or things like that. Worst-case is for people designing pacemakers or rockets, that's not what we are doing here.
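For readers who want to see what "99.9% latency" means next to the true maximum, here is a minimal sketch. The pause data is invented for illustration, and `percentile` and `samplePauses` are hypothetical helper names, not part of any Go library.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the nearest-rank percentile num/den (e.g. 999/1000 for
// p99.9) of samples. Integer arithmetic avoids floating-point rounding of the
// rank. The input is copied so the caller's slice is not reordered.
func percentile(samples []time.Duration, num, den int) time.Duration {
	if len(samples) == 0 || den <= 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (len(sorted)*num + den - 1) / den // ceil(len * num / den)
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// samplePauses fabricates 999 pauses of 100µs plus one 50ms outlier.
// These numbers are made up for illustration, not measurements.
func samplePauses() []time.Duration {
	pauses := make([]time.Duration, 0, 1000)
	for i := 0; i < 999; i++ {
		pauses = append(pauses, 100*time.Microsecond)
	}
	return append(pauses, 50*time.Millisecond)
}

func main() {
	p := samplePauses()
	fmt.Println("p50:  ", percentile(p, 50, 100))
	fmt.Println("p99.9:", percentile(p, 999, 1000))
	fmt.Println("max:  ", percentile(p, 1, 1))
}
```

With this invented data set the p99.9 value is the ordinary 100µs pause while the maximum is the 50ms outlier, which is exactly the gap between "typical worst case" and "worst case" being argued about here.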

> that's not what we are doing here.

Considering this code will likely be used in IoT, yes, this is what we’re doing here.

Pretty sure web enabled cameras are NOT pacemakers NOR rockets... heck they can't even put a different password on each device by printing on a sticker.

Nor is my sous vide machine or anything else I've seen as an IoT device. I wouldn't call, say, my oscilloscope an IoT device even if it can connect to wifi. It's a scientific device with cloud upload.

You do realize pacemakers nowadays have WiFi and Bluetooth and run on the same platform as other IoT devices?

There was an interesting talk at the Chaos Communications Congress a few years back.

I'm well aware of this. I'm also aware that people are hacking these devices left and right. But a cow is a mammal and a mammal isn't a cow.

The GP post was not talking about security; we're talking about real-time computing. If it's running on IoT hardware, e.g. an rpi running Linux, it's likely not doing real-time OS stuff because Linux isn't really a real-time OS.

And I still don't classify pacemakers as IoT. Just because you want to slap a vague acronym on any and every "connected" embedded device doesn't mean I agree.

But that's the thing, there's now a trial for implanted, smart, insulin pumps, running on node.js, connecting to its app via WiFi.

That stuff is IoT in every definition of the word.

I know a few software developers who have diabetes, and they seem reluctant to use automatic pumps.

Monitoring might be fine, but installing something in the body that can kill you and then running it on node.js doesn't seem like a good idea.

>Just saying "typical" is borderline negligent

Pretty harsh for an informal post on a mailing list; it's not like this was a formal proposal or an announcement.

Talking about why "typical" lacks meaning is fair, but calling the OP negligent is not.

Completely fair point and one I should have acknowledged. I was not intending to call the OP negligent. From what I saw, they did a full analysis and are preparing the full numbers. I was just irked by this idea that "typical" is all one needs.

It depends on how it's been determined that it's the "typical worst case"

...maybe an anonymous usage program?

...maybe their typical use case? (which is definitely not everyone else's)

...maybe the typical use case of Google employees using Go for their single-computer tasks? (which would be closer to a "typical user" as in people who read HN)

...an average of those three, or perhaps even more, use cases? weighted in some way?

But if done correctly, optimizing the typical 90% of the problem by 20% gives you a general 18% improvement, far more than even eliminating the least important 10% of the runtime entirely (which some bad optimization efforts tend to focus on...).
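The arithmetic behind that 18% figure, as a tiny sketch (`overallSavingPct` is a made-up helper name; the result is truncated to a whole percent):

```go
package main

import "fmt"

// overallSavingPct returns the percentage of total runtime saved when a part
// that accounts for sharePct percent of the runtime is made localPct percent
// faster. Integer division truncates toward zero.
func overallSavingPct(sharePct, localPct int) int {
	return sharePct * localPct / 100
}

func main() {
	fmt.Println(overallSavingPct(90, 20))  // typical path made 20% faster
	fmt.Println(overallSavingPct(10, 100)) // rare path removed entirely
}
```

The first call gives 18 and the second gives 10: speeding up the common path modestly beats deleting the rare path outright.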

Making software faster is always about knowing where to work, and then focusing exactly on that. Nothing wrong in attacking the "typical [anything] case".

Sure you can. It's standard in embedded software. It's called WCET analysis. There are also real-time GCs. The Go team has simply not done either of these in their design. So they can't give a worst-case time estimate.

Indeed. If there's no worst case time, that eliminates use in hard-real-time, certifiable embedded, e.g. aerospace, medical, etc.

You can't very well tolerate an "atypical" GC pause when you're firing a laser or radiation pulse, doing engine control, or even updating a display in a flight instrument setting.

What does JIT have to do with GC?

A JIT can do better escape analysis, eliminating allocations. A JIT can also be more clever about safe point insertion. And when a JIT detects that code isn't being called concurrently, it can emit much cheaper read/write barriers.

This is what amortized analysis is about.

You can't amortize in real-time. You don't get a mulligan for missing a deadline because you can do twice as much work in the next frame.

Wait, what?

In my, admittedly maybe lacking, understanding amortization doesn't really work that way. "Amortized worst case" (for example) means that you can still bound the worst case[1], but it's just not necessarily going to be a very accurate bound. Obviously, amortized complexity doesn't tell you "<X ms" right off the bat since it deals in abstract "operations", but if you have known worst case bounds for all the "operations", then an amortized bound for a given operation will give you something equivalent to "<X ms".

[1] I mean, it's a common proof technique to actually have a non-negative cost function (+ invariant) and postulate/derive an upper bound on that, so... what gives? What's your reasoning here?

No. The entire premise of amortized analysis is to get a more optimistic "eventually O" number. "Eventually" is not good enough for real time. Yes you can get a real hard worst-case number, but that's a different analysis from amortized analysis. Unless all of your amortized operations are happening between deadlines, it's useless--worse, dangerous--for safety. And amortized analysis is almost never used that way. You don't have language run-times that reinitialize between every deadline.

You're right.

I somehow confused myself by thinking of it in terms of one of the proof techniques for amortized worst case where you derive a fixed upper bound for any "n". Of course this is a much stronger property than needed.
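To make the amortized-versus-worst-case distinction concrete, here is a minimal sketch that counts element copies for a doubling array. It is a cost model only (no real buffer is allocated), and `growStats` is a hypothetical name.

```go
package main

import "fmt"

// growStats models n appends to an array that doubles its capacity when full.
// It returns the total number of element writes/copies across all appends and
// the largest cost paid by any single append.
func growStats(n int) (total, worst int) {
	capacity, length := 1, 0
	for i := 0; i < n; i++ {
		cost := 1 // writing the new element
		if length == capacity {
			cost += length // copy every existing element into a buffer twice the size
			capacity *= 2
		}
		length++
		total += cost
		if cost > worst {
			worst = cost
		}
	}
	return total, worst
}

func main() {
	total, worst := growStats(1024)
	fmt.Println("total cost:", total) // 2047, i.e. under 2 per append on average
	fmt.Println("worst single append:", worst) // 513
}
```

The average cost per append stays below 2, yet one particular append costs 513. The amortized bound says nothing about which append that is, so a deadline that happens to land on it is missed.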

Very impressive. I wonder how Go manages to even stop all threads in 100 microseconds, much less do any work in that time.

Other systems I'm aware of that are capable of similar feats (I know .NET's collector can, not sure about Hotspot's) use page poisoning. Basically the compiler/JIT inserts extra memory accesses to a known address at key points where the GC can track what all variables and registers hold (called "safepoints"), in a fairly liberal pattern around the code. When the GC wants to stop all the threads in their tracks, it unmaps the page holding that address, causing a segfault in every thread as soon as it hits one of the safepoints. The GC hooks the segfault handler, and simply waits for all threads to stop.

I'm not sure that's what Go does, but I'm not aware of a faster way on standard hardware.
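A rough sketch of the safepoint idea in Go. Note the substitution: this polls an atomic flag at each safepoint instead of unmapping a page and catching the segfault, because page poisoning cannot be expressed portably in user-level Go. All names here (`world`, `safepoint`, `stopTheWorld`, `runDemo`) are hypothetical, and this is not how any real runtime is implemented.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// world coordinates a cooperative stop-the-world among worker goroutines.
type world struct {
	stop   int32 // 1 while a stop is requested; accessed atomically
	mu     sync.Mutex
	cond   *sync.Cond
	parked int // workers currently waiting at a safepoint
}

func newWorld() *world {
	w := &world{}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// safepoint is the poll that "mutator" code runs at known-safe locations.
func (w *world) safepoint() {
	if atomic.LoadInt32(&w.stop) == 0 {
		return // fast path: a single atomic load
	}
	w.mu.Lock()
	w.parked++
	w.cond.Broadcast() // tell the collector one more worker has stopped
	for atomic.LoadInt32(&w.stop) == 1 {
		w.cond.Wait()
	}
	w.parked--
	w.mu.Unlock()
}

// stopTheWorld requests a stop and blocks until n workers are parked.
func (w *world) stopTheWorld(n int) {
	atomic.StoreInt32(&w.stop, 1)
	w.mu.Lock()
	for w.parked < n {
		w.cond.Wait()
	}
	w.mu.Unlock()
}

// startTheWorld releases every parked worker.
func (w *world) startTheWorld() {
	w.mu.Lock()
	atomic.StoreInt32(&w.stop, 0)
	w.cond.Broadcast()
	w.mu.Unlock()
}

// runDemo starts n workers, stops them all, and reports whether their
// counters stayed unchanged while the world was stopped.
func runDemo(n int) bool {
	w := newWorld()
	counters := make([]int, n)
	var quit int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for atomic.LoadInt32(&quit) == 0 {
				counters[id]++ // stand-in for real work
				w.safepoint()
			}
		}(i)
	}
	w.stopTheWorld(n)
	snapshot := append([]int(nil), counters...)
	time.Sleep(time.Millisecond) // give a misbehaving worker time to show up
	stable := true
	for i := range counters {
		if counters[i] != snapshot[i] {
			stable = false
		}
	}
	atomic.StoreInt32(&quit, 1)
	w.startTheWorld()
	wg.Wait()
	return stable
}

func main() {
	fmt.Println("counters stable while stopped:", runDemo(4))
}
```

The time to stop the world is bounded by how long the slowest worker takes to reach its next poll, which is why the density of safepoints matters so much.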

That's basically what Go does; it folds the preemption points into the stack checks during function prologs.

Where did you learn that? (curious to learn more)

I haven't seen it written up, but it's discussed some here:

  - https://github.com/golang/go/issues/10958

hopefully in back branches too, right?

Not according to this bug. :( https://github.com/golang/go/issues/10958

Java 7 and above provide the -XX:MaxGCPauseMillis flag for GC configuration.

From Mechanical Sympathy Blog [1]

" G1 is target driven on latency -XX:MaxGCPauseMillis=<n>, default value = 200ms. The target will influence the amount of work done on each cycle on a best-efforts only basis. Setting targets in tens of milliseconds is mostly futile, and as of this writing targeting tens of milliseconds has not been a focus of G1. "

So in the Java world we are talking hundreds of milliseconds in the worst case, which is three orders of magnitude higher than Go.

1. http://mechanical-sympathy.blogspot.com/2013/07/java-garbage...

Well, yeah. You're comparing the G1 (for server workloads) to Go's concurrent collector. There's a concurrent collector you can use in incremental mode for the JVM if you want to trade throughput for pause time, like Go does.

The HotSpot designers have (correctly, IMO) observed that server use cases typically prefer throughput to latency.

On a meta note, I wish people would focus on the actual algorithms instead of taking cited performance numbers out of context and comparing them head to head. The Go GC uses the same concurrent marking and sweeping techniques that CMS does. This change to avoid stop the world stack sweeps is something that no HotSpot collector does. But it's counterbalanced by the fact that Go's GC is not generational. Generational GCs change the balance significantly by dramatically improving throughput.

In my experience you are still getting single-to-double-digit millisecond pauses on average using CMS (even with some attention/tuning). Do you really think Hotspot can offer a GC where pauses longer than 100us are considered interesting enough to look into?

Sure, if they implement barriers for stack references like Go is. But that's a significant throughput hit.

My original intention was to ask if you think that's currently achievable. But also interesting for the future.

Well, probably not.

The question is what use cases can tolerate large throughput hits but not few msec pause times (G1 can also do pauses of a few msec in many cases, I see hundred msec pause times on very large heaps, but not desktop sized heaps).

I suspect there are very few use cases. The G1 team seems to be focusing on scaling to ever larger heaps right now, like hundreds of gigabytes in size. They're relatively uninterested in driving down pause times as Shenandoah is going to provide that for the open source world and Azul already does for the $$$ world.

Unless you have numbers to share for CMS latency in Java, I have no reason to assume that they are materially different from G1. I am using CMS for my server applications, and multi-second STW pauses are quite common with CMS.

In general the Oracle JDK uses an order of magnitude more memory and has orders of magnitude higher GC latency compared to Go. IMO that is quite useful to remember when deciding on a language for new projects. If the Hotspot people think that these numbers are not true, I am sure they would let people know.

* Heap size and stack sizes × number of threads matters. JVMs can manage hundreds of OS threads with deep stacks and 100GB+ heaps.

* Go's GC is completely non-compacting, while G1 compacts all generations and CMS compacts all but the old generation.

* Shenandoah will offer lower pause times by doing concurrent compacting, at the expense of throughput.

* Azul already offers a pauseless compacting collector for hotspot

Here are Shenandoah pause numbers [1]. The max pause is 45ms, which I would agree is an ultra-low pause for Java, because pauses of hundreds of ms are common for Java server applications. Here are GC numbers for Go >= 1.6 with 100GB+ heaps [2]. Go 1.7 has, and 1.8 is going to have, lower numbers than that.

1. http://www.slideshare.net/RedHatDevelopers/shenandoah-gc-jav...

2. https://talks.golang.org/2016/state-of-go.slide#37

A big difference is that Shenandoah still does compacting while gogc does not. That also means Shenandoah can do bump-pointer allocation, i.e. it has a faster allocation path.
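For anyone unfamiliar with the term, bump-pointer allocation just means handing out memory by advancing an offset into a contiguous region. A minimal sketch follows; the `arena` type is hypothetical, it ignores alignment, and it is not how either runtime actually implements allocation.

```go
package main

import "fmt"

// arena hands out byte slices from one preallocated buffer by advancing an
// offset. Allocation is a bounds check plus an addition.
type arena struct {
	buf []byte
	off int
}

func newArena(size int) *arena {
	return &arena{buf: make([]byte, size)}
}

// alloc returns n bytes from the arena, or nil if there is not enough room.
// The three-index slice caps the result so appends cannot spill into
// neighbouring allocations.
func (a *arena) alloc(n int) []byte {
	if n < 0 || a.off+n > len(a.buf) {
		return nil
	}
	p := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return p
}

// reset frees everything at once by rewinding the offset. This only works
// because the free space is contiguous, which is what compaction provides.
func (a *arena) reset() { a.off = 0 }

func main() {
	a := newArena(16)
	x := a.alloc(8)
	y := a.alloc(8)
	z := a.alloc(1) // arena is full
	fmt.Println(len(x), len(y), z == nil)
}
```

A non-compacting collector ends up with holes scattered through the heap, so its allocator has to search free lists or size classes instead of bumping a single pointer.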

There must be some end user (positive) impact of memory compaction in Java. But I do not see it in benchmarks, where Java programs take 2-10 times more memory and run slower than Go.


The end user benefit is stability. A runtime that compacts the heap cannot ever die due to an OOM situation caused by heap fragmentation, whereas if you don't compact then the performance of an app can degrade over time and eventually the program may terminate unexpectedly.

Those benchmarks are of limited use since the JVM has startup costs which get amortized over time. Full performance is only achieved after JIT warmup. AOT would improve that, but on OpenJDK that's experimental and few people use other JVMs that support AOT out of the box.

About the memory footprint: The runtime is larger, so there's a baseline cost you have to pay for the JITs. But baseline != linear cost.

If you have long-running applications that run for more than a few seconds and crunch more than a few MB of data, those numbers would change.

So unless you're using the JVM for one-shot script execution those benchmarks are not representative. And if you do then there would be certain optimizations that one could apply.

> There must be some end user (positive) impact of memory compaction in Java.

Fewer CPU cycles overhead per unit of work, i.e. better throughput, which does seem to be an issue for gogc[0]. No risk of quasi-leaks in the allocator due to fragmentation. Reduced cache misses[1]

[0] https://github.com/golang/go/issues/14161 [1] http://stackoverflow.com/q/31225252/1362755

> About the memory footprint: The runtime is larger, so there's a baseline cost you have to pay for the JITs. But baseline != linear cost.

I do not think so. Here is explanation about Java memory bloat in typically used data structures.


That is true - although typed arrays will solve that in the future - but the benchmarks you cite (e.g. fannkuch-redux) still run into a memory floor created by the larger runtime.

Also, `regex-dna` will probably benefit from compact strings in java 9.

And some of the longer-running benchmarks are in fact faster on java, so I think that supports my point about warmup.

> (e.g. fannkuch-redux) still run into a memory floor

Yes, apart from reverse-complement, regex-dna, k-nucleotide, binary-trees.

> Also, `regex-dna` will probably benefit from compact strings in java 9.

Do you actually know that com.basistech will re-write their library to use compact strings?

> I think that supports my point

I think that's called cherry picking.

> Those benchmarks are of limited use since the JVM has startup costs which get amortized over time.

-- Benchmarks are of limited use.

-- Are JVM startup costs significant?


binary-trees benchmark is much faster on Java than on Go.

> faster on Java than on Go

go version go1.7 linux/amd64

OP "Sub-millisecond GC pauses in Go 1.8"

Interestingly enough, .NET doesn't use the "memory access" trick; instead it uses either a direct poll against a variable, or for user code can sometimes just rudely abort the running thread (if it knows it's safe to do so).

See http://mattwarren.org/2016/08/08/GC-Pauses-and-Safe-Points/ for all the gory details

Hotspot does that as well. Really nifty stuff.

Hotspot worst-case (non) guarantees are hundreds of milliseconds, not hundreds of microseconds.

Again, you're comparing the server G1 GC (tuned for throughput) to Go's GC (tuned toward extreme latency guarantees).

Is this a bad thing? You can just increase the number of front-end servers and have high throughput in a service with reliable tail latency.

This. It's a lot easier to horizontally scale things with a lean towards consistently lower operational latency. You can keep raking in the benefits and cranking up throughput without a whole lot of thought.

It's much more expensive and complex to take an erratic latency operation and bring it down by throwing on more resources. As far as I can tell, the normal design course is making sure all your major actions are either pure or idempotent allowing parallel (and redundant!) requests to be made... which is a large (worthy, but large) engineering effort, and then we're talking about scaling to 2x or more just so you can make that redundant-request thing your default behavior.
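The redundant-request approach mentioned above can be sketched roughly like this. It assumes the operation is idempotent, and `backend`, `firstOf`, `fakeBackend` and `demo` are all hypothetical names with simulated latencies.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// backend is a hypothetical idempotent call to one replica.
type backend func(ctx context.Context) (string, error)

// firstOf sends the same request to every backend and returns the first
// successful reply. The remaining calls are cancelled via the context.
func firstOf(ctx context.Context, backends ...backend) (string, error) {
	if len(backends) == 0 {
		return "", errors.New("no backends")
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // tells the losers to give up

	type result struct {
		val string
		err error
	}
	// Buffered so late finishers never block and leak their goroutine.
	results := make(chan result, len(backends))
	for _, b := range backends {
		go func(b backend) {
			v, err := b(ctx)
			results <- result{v, err}
		}(b)
	}
	var lastErr error
	for range backends {
		r := <-results
		if r.err == nil {
			return r.val, nil
		}
		lastErr = r.err
	}
	return "", lastErr
}

// fakeBackend simulates a replica that answers after the given latency.
func fakeBackend(name string, latency time.Duration) backend {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(latency):
			return name, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// demo races a replica stuck in a long pause against a healthy one.
func demo() (string, error) {
	return firstOf(context.Background(),
		fakeBackend("slow", 200*time.Millisecond),
		fakeBackend("fast", time.Millisecond),
	)
}

func main() {
	fmt.Println(demo())
}
```

This buys a better tail at the price the comment describes: every request now costs roughly double the backend capacity, and it is only safe when repeating the operation has no side effects.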

In case of Java there is also a JVM implementation that does pauseless GC (zing): https://www.azul.com/products/zing/pgc/ .

For OpenJDK/Hotspot there will be a <100ms GC in Java 9: http://openjdk.java.net/jeps/189

'Pauseless' is marketing term like those unlimited data plans from telecom/cable providers.

Here is how Zing uses flags:

-XX:+UseGenPauselessGC - Generational Pauseless GC (GPGC) optimized for minimal pause times.

You're right, Zing pause times are "only" uncorrelated with heap size or usage patterns.

Another approach you can use in some cases with the JVM, which is often the simplest, is to set up the JVM so it doesn't GC (give it a lot of memory), then either just spawn a new JVM to take over, or take the machine about to run a GC out of your load-balanced pool before running a full GC, then put it back in again.

Doing the manually triggered & staggered GC trick on a pool of machines you control can give you very low latency guarantees, since no production request will ever hit a GC-ing JVM.
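The comment is about the JVM, but the same rotation can be sketched in Go, where `debug.SetGCPercent(-1)` turns off automatic collection and `runtime.GC()` forces a blocking one. Everything else here is an assumption for illustration: `instance`, `rotate` and the "pool" are hypothetical, and in a real deployment each instance would be a separate process with its own heap rather than structs sharing one.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// instance stands in for one server behind a hypothetical load balancer.
type instance struct {
	inPool      bool // what a health check would report
	collections int  // how many offline collections this instance has run
}

func newPool(n int) []*instance {
	pool := make([]*instance, n)
	for i := range pool {
		pool[i] = &instance{inPool: true}
	}
	return pool
}

// serving counts the instances currently taking traffic.
func serving(pool []*instance) int {
	n := 0
	for _, s := range pool {
		if s.inPool {
			n++
		}
	}
	return n
}

// rotate collects one instance at a time and returns the smallest number of
// instances that were still serving at any point during the rotation.
func rotate(pool []*instance) int {
	minServing := len(pool)
	for _, s := range pool {
		s.inPool = false // leave the pool; real code would drain in-flight requests here
		if n := serving(pool); n < minServing {
			minServing = n
		}
		runtime.GC() // blocking full collection while no traffic arrives
		s.collections++
		s.inPool = true // rejoin the pool
	}
	return minServing
}

func main() {
	old := debug.SetGCPercent(-1) // disable automatic collections
	defer debug.SetGCPercent(old)

	pool := newPool(3)
	fmt.Println("minimum serving during rotation:", rotate(pool))
}
```

The catch is capacity planning: with automatic collection off, each instance needs enough memory headroom to survive until its next turn out of the pool.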


You can also "just" remove servers that are currently in GC from your pool, and have high throughput in a service with reliable latency.

How would you do that?

Personally found Java's GC to be a little tricky but generally awesome.

However, 100, even 200ms pauses get "lost" during network comms. But users tend to notice if a page takes 30 seconds to load rather than 1 second (differences I've seen optimising Java's GC; I don't know at all how Go and Java compare).

That kind of difference would be very expensive to solve using more servers.

The biggest issue is on single deployments like desktop or embedded, which is why these kinds of improvements are so relevant.

That's the key trade-off these days in software, IMHO: what kind of memory management you need.

Server side: zero memory leaks is an absolute requirement and "real time" responses are rarely a requirement. Triggering GC during init and/or shutdown of a component is often enough.

Building a CNC machine - every tick is valuable, but as long as it can run a job for 2 days before crashing no one will notice if it leaks memory like a sieve when you run a calibration routine.

To my knowledge, G1 is for low latency at the cost of some throughput. I do not know why you keep hammering a point that Oracle's official documents do not claim.


Here are relevant points:

1. If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of 1 second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC.

2. If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately 1 second, then select the concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.

That's relative to other Hotspot collectors. It's still generational with bump-pointer allocation and lighter write barriers than Go, so it is still geared heavily towards throughput relative to Go's GC.

Sorry, I was referring to the implementation of safepoints, not GC latency.

To stop the world, Go preempts at the next stack size check [1], which happens during function call prologues. I guess that happens often enough to be fast in practice.

[1]: https://github.com/golang/go/blob/8f81dfe8b47e975b90bb4a2f8d...
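A toy model of that trick, for anyone curious how a stack check can double as a preemption point. This is a single-threaded simulation under simplifying assumptions, not the runtime's code: the `g` struct, its fields, and the poison constant are illustrative stand-ins, and the real runtime sets the guard from another thread and yields to the scheduler where this sketch just bumps a counter.

```go
package main

import "fmt"

// stackPreempt is a poison guard value chosen to be larger than any real
// stack pointer, so the prologue check always fails while it is set.
// The exact value is illustrative.
const stackPreempt = ^uintptr(0) - 1313

// g is a stripped-down stand-in for a goroutine descriptor.
type g struct {
	stackLo    uintptr // lowest usable stack address
	stackguard uintptr // what the function prologue compares SP against
	preempted  int     // times the slow path found a preemption request
	grown      int     // times the slow path found a genuine overflow
}

// prologue models the check at function entry. Stacks grow down, so a stack
// pointer at or below the guard means "not enough room".
func (gp *g) prologue(sp uintptr) {
	if sp > gp.stackguard {
		return // fast path: one compare, taken on almost every call
	}
	gp.morestack()
}

// morestack is the slow path shared by stack growth and preemption.
func (gp *g) morestack() {
	if gp.stackguard == stackPreempt {
		gp.preempted++
		gp.stackguard = gp.stackLo // restore the real guard
		return                     // a real runtime would yield to the scheduler here
	}
	gp.grown++ // a real runtime would copy to a larger stack here
}

// requestPreempt is what the collector would do to stop this goroutine.
func (gp *g) requestPreempt() { gp.stackguard = stackPreempt }

func main() {
	gp := &g{stackLo: 0x1000, stackguard: 0x1000}
	gp.prologue(0x2000) // plenty of stack: nothing happens
	gp.requestPreempt()
	gp.prologue(0x2000) // same stack pointer, but now the check fails
	fmt.Println("preempted:", gp.preempted, "grown:", gp.grown)
}
```

The appeal is that preemption costs nothing extra on the fast path, since the compare was already there for stack growth. The flip side is that a loop making no function calls never reaches the check.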

I assume this works so quickly in part because goroutines are backed by a pool of OS threads that don't block? So everybody doesn't get stuck waiting for a thread that's blocked to get around to checking the preemption flag?

Right, blocked threads are stopped in the userspace scheduler so there are no synchronization issues with just keeping them stopped.

> (...) much less do any work in that time

100 microseconds is quite a long time in CPU time for a single-core these days, and proportionally longer with multi-core, or say in GPU time. However taking into account the VM runtime environment, this wouldn't make Go's feat any less impressive.

The 100 microseconds is the length of the pause after all threads are stopped.

I like faster latency as much as the next person, but these improvements aren't free. On my own servers I notice approximately 20% of CPU time spent in GC.

I'm a C++ programmer who hates garbage collection on general principle, but I have to say: 20% of CPU time spent in memory management isn't all that unusual (especially in software that hasn't been optimized), whether or not that memory management is automatic garbage collection or manual.

It's not unusual in completely terrible code.

The difference is if the memory management is manual, you have the ability to clean it up and reduce that overhead toward 0%.

If it's a system-enforced GC, you are limited in what you can do.

Quite true, but the reality is that since Modula-3, Oberon, Eiffel and others did not get adopted at large by the industry, many got the idea that it isn't possible to have a GC for productivity and still manually manage memory, if really required.

So now we are stuck with modern languages having to bring those ideas back to life.

In the code I write where it matters the issue is allocation & deallocation time. Thus you don't do those things on the hot path in either gc'd or manual memory management environments.

Given that, the overhead becomes 0 in either.

Is it sometimes harder to write zero alloc code in GC'd languages? Sure, but it's not impossible.
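As an illustration of keeping allocation off the hot path in a GC'd language, here is a small Go sketch that compares a function allocating on every call with one that reuses a caller-owned buffer. `formatAlloc` and `formatInto` are made-up examples. `testing.AllocsPerRun` is the standard library's allocation counter, used here from an ordinary program.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink is package-level so results escape and the compiler cannot optimize
// the allocations away.
var sink []byte

// formatAlloc builds a fresh buffer on every call.
func formatAlloc(id int) []byte {
	return []byte("id=" + strconv.Itoa(id))
}

// formatInto writes into dst, reusing its backing array when it is large
// enough, and returns the (possibly regrown) slice.
func formatInto(dst []byte, id int) []byte {
	dst = append(dst[:0], "id="...)
	return strconv.AppendInt(dst, int64(id), 10)
}

func main() {
	buf := make([]byte, 0, 32) // allocated once, outside the hot path

	fresh := testing.AllocsPerRun(1000, func() {
		sink = formatAlloc(12345)
	})
	reused := testing.AllocsPerRun(1000, func() {
		buf = formatInto(buf, 12345)
		sink = buf
	})
	fmt.Println("allocations per call, fresh buffer: ", fresh)
	fmt.Println("allocations per call, reused buffer:", reused)
}
```

The exact counts depend on the compiler version, so none are quoted here. The point is that the reusing variant gives the collector nothing new to track once the buffer is sized.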

In Java for instance the difference in performance compared to C++ comes from memory layout ability not memory management.

> In Java for instance the difference in performance compared to C++ comes from memory layout ability not memory management.

We need more modern languages with AOT compilation, ability to allocate on the stack and use value types.

I like C++ and have spent quite a few years using it, before settling on memory safe languages for my work.

Many C++ presentations seem to forget on purpose that there are memory safe programming languages, which offer similar capabilities for controlling memory layouts, thus presenting the language as the only capable of doing it.

Modula-3 is an example, or Swift and D for something more current.

The reality is that there aren't any straight replacements though since C++ has such mature tools, move semantics take away significant amounts of memory management pain, D without a gc is still exotic, and rust is still in its early stages.

It is a matter of which market is relevant for a developer.

There are many cases where C++ is still used even though the use case at hand didn't have any real need for it.

For example, in my area of work, C++ has been out of the picture since around 2006. We only use it when Java or .NET need a helping hand, which happens very seldom.

On the mobile OSes for example, C++ is only relevant as possible language to write portable code across platforms, but not so much for those that decide to focus on just one.

There Swift, Java and C# are much better options.

For those in HPC, languages like Chapel and X10 are gaining adherents.

Also, as an early C++ adopter (1993), I remember being told by C developers something similar to what you are saying in regards to tool maturity.

Now, more than 20 years later, their compilers are written in C++.

I'm not trying to claim C++ is the only language anyone will ever need. I've tried hard to find alternatives but until the D community really goes full force on getting the garbage collection out and starts to care about the tools for the language instead of just the language, it seems like rust will be the only contender (and future C++ and maybe even jai). I wish a language called clay had taken off, it was pretty well put together as a better C.

Actually, I would rather they upgraded the GC to more modern algorithms, instead of the basic one it uses.

SaferCPlusPlus[1]: C++. Memory safe. No GC. Quite fast.

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus

I'll mention a significant case that doesn't have to do with allocation. Large graph-like data structures (lots of small nodes linked by reference) are effectively prohibited entirely by the GC. They make every marking phase take much too long, until whenever the whole thing gets promoted into the long-lived generation. A mainstream runtime having such an impactful opinion about my data structures (not alloc patterns) is something I just find deeply offensive. Imagine if C++ programs stuttered because you had too many pointers!

They could have avoided that whole problem by providing an API like DontCollectThisRoot, but instead they (and a lot of programmers) chose to pretend the GC was good enough to treat like a black box.

Huh? Are you talking about a particular GC? Because every object-oriented program I've ever seen could be described as "Large graph-like data structures (lots of small nodes linked by reference)".

Any GC that walks the graph, and isn't mostly concurrent. You will know when the graph becomes complex enough, because thereafter the GC will not let you ignore it. In my experience, as few as several hundred thousand objects can become a problem. Imagine trying to write a responsive 3D modeling app with vertices, edges, and faces all bidirectionally pointing to each other. You the programmer would think very carefully before fully traversing such a data structure (much of the point of having thorough internal references is avoiding doing much traversal!), and yet the runtime helpfully does it for you, automatically, and there's nothing you can do to stop it.



FWIW, Go has value types, so there's less referencing than in Java, etc. Also worth noting that these are actually used, unlike in C#, which has a reference-type-first culture.
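To show what "less referencing" can look like for the mesh example upthread, here is a sketch of two layouts. The types are hypothetical. The second stores vertices by value in one slice and links them by index, so the element type contains no pointers for the collector to follow.

```go
package main

import "fmt"

type vec3 struct{ x, y, z float32 }

// ptrVertex is the reference-heavy layout: every vertex is its own heap
// object and every link is a pointer the collector has to trace.
type ptrVertex struct {
	pos  vec3
	next *ptrVertex
}

// vertex is the value layout: links are indices into one backing slice.
type vertex struct {
	pos  vec3
	next int32 // index of the next vertex, or -1 for none
}

// mesh owns all vertices in a single contiguous allocation.
type mesh struct{ verts []vertex }

// add appends a vertex and returns its index.
func (m *mesh) add(pos vec3, next int32) int32 {
	m.verts = append(m.verts, vertex{pos: pos, next: next})
	return int32(len(m.verts) - 1)
}

// chainLen follows next links from start and counts the vertices visited.
func (m *mesh) chainLen(start int32) int {
	n := 0
	for i := start; i >= 0; i = m.verts[i].next {
		n++
	}
	return n
}

func main() {
	// Pointer layout: three separate objects linked by reference.
	a := &ptrVertex{pos: vec3{0, 0, 0}}
	b := &ptrVertex{pos: vec3{1, 0, 0}, next: a}
	c := &ptrVertex{pos: vec3{2, 0, 0}, next: b}
	steps := 0
	for v := c; v != nil; v = v.next {
		steps++
	}

	// Value layout: the same chain inside one slice.
	m := &mesh{}
	i0 := m.add(vec3{0, 0, 0}, -1)
	i1 := m.add(vec3{1, 0, 0}, i0)
	i2 := m.add(vec3{2, 0, 0}, i1)

	fmt.Println(steps, m.chainLen(i2))
}
```

The trade-off is that indices are not checked the way pointers are: a stale index silently refers to whatever now lives in that slot, so this layout pushes some bookkeeping back onto the programmer.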

> If it's a system-enforced GC, you are limited in what you can do.

Perhaps I'm misunderstanding, but do many C programmers understand not only the current state of malloc at any given moment in their code but exactly how it works?

I think not.

A lot of the things you do in C++ to reduce memory management overhead are the same things you can do in Java, C#, and Go to reduce memory management overhead. That effort is neither special nor universal.

HLLs often have to be careful about using language features that unexpectedly create garbage, but in terms of actual management and collection it's not like ANY competent modern language is slow at it.

People often seem to neglect the fact that Java is still surprisingly fast despite spending lots of time on memory management just because many developers are so insensitive to how things alloc. Modern GC systems can manage allocs as well as deallocs, so with care from the programmer and the runtime authors you can reach the kind of performance people talk about as critical for "embedded systems" (even though in practice SO many things do not deliver on this promise in shipped products!).

> Perhaps I'm misunderstanding, but do many C programmers understand not only the current state of malloc at any given moment in their code but exactly how it works?

Good programmers understand how malloc works. What, are you kidding, or am I misunderstanding?

Performance-oriented programmers do not use malloc very much. As you say, you can also try to avoid allocations in GC'd languages. The difference is that in a language like C you are actually in control of what happens. In a language that magically makes memory things happen, you can reduce allocations, but not in a particularly precise way -- you're following heuristics, but how do you know you got everything? Okay, you reduced your GC pause time and frequency, but how do you know GC pauses aren't still going to happen? Doesn't that depend on implementation details that are out of your control?

> even though in practice SO many things do not deliver on this promise in shipped products!

But, "in practice" is the thing that actually matters. Lots and lots of stuff is great according to someone's theory.

> The difference is that in a language like C you are actually in control of what happens. In a language that magically makes memory things happen, you can reduce allocations, but not in a particularly precise way -- you're following heuristics, but how do you know you got everything?

First of all, it's not like most mallocs DON'T have heuristics they're following. Without insight into what it wants to do, malloc is just as opaque as how Java or the CLR manages memory.

And your behavior absolutely can and does influence how much things are allocated, deallocated, and reused. If you think that the JVM cannot be tuned to that level, you're dead wrong and I can point to numerous projects written for virtual machines that reach levels of performance that are genuinely difficult to reach no matter your approach.

> Good programmers understand how malloc works.

"Good programmers know how their GC works. What, are you kidding, or am I misunderstanding?"

> But, "in practice" is the thing that actually matters.

"In practice" Kafka is the gold standard of pushing bits through distributed systems as fast as possible. "In practice" distributed concurrency systems (that are often the speed limit of anything you want to build on more than one computer, e.g., etcd, Zookeeper, Consul) are I/O limited long before their collected nature impacts their performance.

And if we can eventually liberate ourselves from operating systems that give privileged status to C and C++, that ecosystem will diminish further because its performance benefits come at too high a cost, and are generally oversold anyways.

>> Good programmers understand how malloc works.

> "Good programmers know how their GC works. What, are you kidding, or am I misunderstanding?"

I think you are not understanding what I am saying.

You link your allocators into your code so you know what they are. You see the source code. You know exactly what they do. If you don't like exactly what they do, you change them to something different.

A garbage-collector, in almost all language systems, is a property of the runtime system. Its behavior depends on what particular platform you are running on. Even 'minor' point updates can substantially change the performance-related behavior of your program. Thus you are not really in control.

As for your other examples, apparently you're a web programmer (?) and in my experience it's just not very easy for me to communicate with web people about issues of software quality, responsiveness, etc, because they have completely different standards of what is "acceptable" or "good" (standards that I think are absurdly low, but it is what it is).

> You link your allocators into your code so you know what they are. You see the source code. You know exactly what they do. If you don't like exactly what they do, you change them to something different.

In my experience, most C/C++ devs know what malloc/free or new/delete does, but how? They don't care as long as it works and doesn't get in their way. Sure in larger applications, the allocator/deallocator can consume quite some time - but even then it rarely is the bottleneck.

I happen to have a more hands-on experience with allocators, I had to port one a long time ago, but in C or C++, I rarely knew how the one I was using was implemented (except for the one I ported). Seeing the source code? Sorry, that's not always available and even if it is, not too accessible - not that many devs actually ever look into the glibc code...

And linking your allocator? Most of the time you just use the default one provided by your standard library, so that happens 'automagically' without most developers realizing it. I have yet to see a modern C or C++ app that specifically has to link its own allocator before it can actually allocate memory. Most compilers take care of this.

For most stuff I do - I like GCs. In most real-world situations, they are rarely the bottleneck; most applications are I/O bound. For most stuff, a GC's benefits outweigh its disadvantages by a huge margin. And if the GC could become a bottleneck, you should have been aware of that up front, and maybe avoided using a GC, although I'm not a fan of premature optimization.

Embedded systems in most cases have memory constraints and GC languages are memory hogs. The cost of having GC is that you pay with higher memory usage for doing GC in batches thanks to which you do not have to pay for single deallocations. So this performance advantage cannot be used in embedded space because there is no free memory for it, you would need to GC all the time which would kill the performance.

> GC languages are memory hogs

This is unrelated to GC.

> The cost of having GC is that you pay with higher memory usage for doing GC in batches thanks to which you do not have to pay for single deallocations. So this performance advantage cannot be used in embedded space because there is no free memory for it, you would need to GC all the time which would kill the performance.

The same is true for C. You don't get to make frequent allocations for free in any language. You have to trade space for performance; in GC-land, the answer is tuning the collector. In Go, this is one knob: the GOGC env var.
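A minimal sketch of that knob, set from inside a program instead of the environment. It assumes the standard runtime/debug package, where debug.SetGCPercent controls the same value as GOGC; the helper name setGCTarget is made up for illustration.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setGCTarget sets the collector's heap-growth target, the same value the
// GOGC environment variable controls, and returns the previous setting.
// A lower percentage trades CPU time for a smaller heap; a higher one
// trades memory for fewer collections.
func setGCTarget(percent int) (previous int) {
	return debug.SetGCPercent(percent)
}

func main() {
	old := setGCTarget(50) // start a cycle once the heap grows 50% past the live data
	fmt.Println("previous GC percent:", old)
	setGCTarget(old) // restore the original setting
}
```

Running the same binary with GOGC=50 in the environment has the same effect without touching the code.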

I've read a lot of really cool papers on GCs that avoid this. The bigger problems arise from languages that take for granted that any abstraction they offer with a memory cost is okay because it offers the user no additional complexity.

For example, languages that use closures need very smart compilers, or even innocuous functions can have implications for the working set size, which puts the allocator and deallocator under a lot more pressure.

And that's not even the most subtle problem you might run into! A common cause of memory constraints in Java 1.7 and earlier stemmed from subarrays of large arrays. Java made a fairly innocuous decision regarding methods like String.substring that ends up biting a lot of people later on, even as it is the right decision for a slightly different set of performance considerations.
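The same trade-off shows up with Go slices, so here is a sketch of the analogous situation in Go (this is not Java's String code, and the function names are made up): a small sub-slice shares the big backing array and keeps all of it reachable, while a copy lets the big array be collected.

```go
package main

import "fmt"

// header returns the first n bytes of buf by reslicing. The result shares
// buf's backing array, so the whole array stays reachable for as long as
// the small result does.
func header(buf []byte, n int) []byte {
	return buf[:n]
}

// headerCopy returns the same bytes in a freshly allocated slice, so the
// large buffer can be collected once the caller drops it.
func headerCopy(buf []byte, n int) []byte {
	out := make([]byte, n)
	copy(out, buf[:n])
	return out
}

func main() {
	big := make([]byte, 1<<20)
	shared := header(big, 16)
	private := headerCopy(big, 16)
	fmt.Println(cap(shared), cap(private)) // 1048576 16
}
```

Sharing is cheaper per call, copying is cheaper for the heap; which one is "right" depends on how long the small piece lives.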

Except there are quite a few vendors doing exactly that, selling Java, Oberon and .NET compilers for embedded systems.

A very well-known one, Aonix, now part of PTC, used to sell real-time Java implementations for military scenarios like missile guidance systems.

> completely terrible code.

Not everyone is writing games...

It drastically changed my perspective on gc when I realized that it's really not much slower, it just batches the slowness.

That's true for GC vs. manual heap-based memory management, but most GC languages don't do stack allocation at all or only for primitive types and stack allocation is much, much faster than any sort of heap-based memory management.

All these GC languages (RC is GC, just in case) do allow stack allocations for any user defined type.

Mesa/Cedar, Modula-2+, Modula-3, Oberon, Oberon-2, Active Oberon, Component Pascal, Eiffel, D, Swift

Listing largely-obscure languages and repeating them multiple times doesn't do much against a claim of "most".

Java, Go, and C# all do stack allocation either implicitly via escape analysis (Java), explicitly via value types (C#), or both (Go). I don't know that this is "most" (perhaps by marketshare), but these are certainly 3 of the most popular languages in this space.
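For the Go case, a small sketch of what "both" means. The type and function names are hypothetical, and the actual stack/heap decision is up to the compiler, so treat the comments as the usual outcome rather than a guarantee.

```go
package main

import "fmt"

type point struct{ x, y int }

// sum takes and returns plain values. Nothing here needs to outlive the
// call, so the compiler is free to keep everything on the stack.
func sum(a, b point) point {
	return point{a.x + b.x, a.y + b.y}
}

// newPoint returns a pointer to a local variable. The value has to outlive
// the call, so escape analysis will normally move it to the heap (unless
// inlining lets the compiler prove otherwise at the call site).
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	s := sum(point{1, 2}, point{3, 4})
	p := newPoint(5, 6)
	fmt.Println(s, *p) // {4 6} {5 6}
}
```

Building with `go build -gcflags=-m` prints the compiler's escape decisions, which is the quickest way to see which of these actually ends up on the heap.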

That answer is fair, and it sounds like btmorex is completely wrong. Thank you.

Well, the Algol derived ones, might be a bit obscure for those that haven't learned history of computing, but they are quite well known.

I can list other ones that are actually quite obscure.

In any case, I also had D and Swift on the list, which are quite current.

As for the rest, I don't have anything to add to weberc2's answer.

The unfortunate part is finding good IDEs with solid debuggers and auto-completion.

Go has excellent auto-completion support, even for vim. Its debugger (delve) is also decent, though not graphical. There are not many languages with better tooling than Go, in my experience.

FYI, delve is integrated into a lot of editors, and basically works exactly like visual studio when used in VS Code (I used visual studio for C++ & C# for 13 years before moving to Go).

Yes, but that is not relevant for those that live on VI and Emacs.

"stack allocation is much, much faster than any sort of heap-based memory management"

No, it's not. For short-lived objects, at least on the JVM, allocation is a pointer bump, and collection is free, because unreachable objects are not even walked. Stack allocation doesn't beat that by much.
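To make "allocation is a pointer bump" concrete, here is a toy model in Go. It is only a sketch of the idea, not the JVM's allocator, and the type and method names are invented.

```go
package main

import "fmt"

// bumpAllocator hands out memory from one contiguous buffer by advancing
// an offset. It is a toy model of nursery allocation.
type bumpAllocator struct {
	buf  []byte
	next int
}

// alloc returns n bytes, or nil when the buffer is exhausted (the point at
// which a real collector would run a minor collection).
func (b *bumpAllocator) alloc(n int) []byte {
	if n < 0 || n > len(b.buf)-b.next {
		return nil
	}
	p := b.buf[b.next : b.next+n : b.next+n]
	b.next += n
	return p
}

// reset discards everything at once. Nothing is visited per object, which
// is why dead objects cost nothing in this model.
func (b *bumpAllocator) reset() { b.next = 0 }

func main() {
	a := &bumpAllocator{buf: make([]byte, 64)}
	fmt.Println(len(a.alloc(48)), a.alloc(32) == nil) // 48 true
	a.reset()
	fmt.Println(len(a.alloc(64))) // 64
}
```

What the model leaves out is exactly the cost the other replies argue about: finding and copying the survivors before the reset.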

> stack allocation is much, much faster than any sort of heap-based memory management.

Is that true? I thought nursery allocation was usually as fast if not faster than stack allocations.

The allocation speed will be close; a bit worse since the pointer to the end of the current allocation buffer usually isn't in a register and a function call and range check is required. However the overall cost of handling that memory from start to finish is significantly higher than the stack even if it gets clobbered in the first nursery collection.

Not because it is very high, but because stack allocation/deallocation is so very simple.

Range checks can be done by the MMU, they're basically free.

That's fair. Does Go do this? Or any other somewhat mainstream language? Any thoughts on how arenas (rust) compare to gc and manual allocation for speed?

Arenas trade some granularity and flexibility for speed and fragmentation-free allocation; they're a great choice for bulk data that you can precompute the size of and want to iterate over very quickly, and they're also easy to reason about. You can do many tasks using only arenas and stack allocations, and it'll zip along very quickly. If needed, you can flag liveness to get very efficient reuse of short-lived objects. They're less ideal if you are gradually adding more data over time, and you want to retain a consistently low usage, since you end up with "stairsteps" of latency when it copies into a bigger buffer, and once you have the big buffer, it's quite complicated to shrink it down again.
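A rough sketch of such an arena in Go, as a preallocated slice of structs addressed by index. The names are hypothetical and this is one possible shape, not a standard API.

```go
package main

import "fmt"

type particle struct{ x, y float64 }

// arena stores particles contiguously in one slice.
type arena struct{ items []particle }

// newArena sizes the backing array up front, for data whose count is
// known in advance.
func newArena(capacity int) *arena {
	return &arena{items: make([]particle, 0, capacity)}
}

// add stores p and returns its index. If the capacity is exceeded, append
// copies everything into a bigger array: the latency "stairstep".
func (a *arena) add(p particle) int {
	a.items = append(a.items, p)
	return len(a.items) - 1
}

// get returns a pointer into the arena. It is only valid until the next
// add that grows the backing array.
func (a *arena) get(i int) *particle { return &a.items[i] }

// reset frees every element at once but keeps the big buffer around.
func (a *arena) reset() { a.items = a.items[:0] }

func main() {
	a := newArena(1024)
	i := a.add(particle{1, 2})
	a.get(i).x += 10
	fmt.Println(a.get(i).x, len(a.items)) // 11 1
	a.reset()
	fmt.Println(len(a.items), cap(a.items)) // 0 1024
}
```

Note that reset never gives memory back, which is the "complicated to shrink" part in practice.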

malloc()-style allocation gives you precision over how much you want, and this is most interesting to a memory-constrained system trying to allocate amongst many different processes (the original Unix use-case). But willy-nilly use of malloc() and free() leaves you with lots of fragmentation, as well as a larger attack surface for memory errors. What the actual allocation algorithm does is out of your hands, too, at least if you're depending on the OS allocator (you can always go write your own and supply it with a giant heap to munch on, and this may occur when you need tuning for individual allocation scenarios).

In the right scenario, a GC won't do too much differently from a manual allocator (there are lots of optimizations that could bring the allocation to same-or-negligible runtime), but as we all know, right scenarios are something you can't always count on. A GC does, however, greatly simplify the memory management of a long-lived process since it can do nifty things like automatically compact the heap.

IME, a mix of stack, GC, and some arenas in the form of growable arrays, is absolutely fine for the needs of most applications. Quite often this last requirement creates a sticking point, though, where the language disallows value semantics for arrays of objects, and then you can no longer assume they're allocated linearly, plus the GC is taxed with additional references to trace. In those cases, if I have them I use arrays of primitive values as crude containers for an equivalent optimization. Either way, it's very annoying and creates some awful code, because those huge batches that I'd like to put in arenas also tend to be the bottleneck for the GC.

C# & Go both have stack allocation options. The JVM is supposedly getting them sometime soon.

I'm of the impression that Java has done escape analysis for a while now. They just haven't had value types, which as I understand, just introduce a semantic for stack allocation.

Actually I have seen presentations that mentioned Graal is much better at it than Hotspot.

stack allocation: 20 years in the making.

Malloc is O(n²) so it totally depends on how able you are to not do gradual allocation.

In most C libraries malloc() is O(log n), e.g. when implemented as balanced trees.

Argh. True, sorry for the brainfart.

In Java, heap allocation is a single instruction, most of the time.

Do you mean a single jvm instruction? Or?

No, he means a single CPU instruction. That's not quite fair, I don't think it actually is a single instruction, more like a few instructions in the best case and a very large number of instructions on the slow path.

The tradeoff here seems to be a more complicated write barrier, so the loss in performance here will for the most part not show up as time spent in the GC. I'm curious to see how big of an issue this will be; the only GC I've heard of with such a heavy write barrier is OCaml's, which is ameliorated by the language's disposition towards immutability.

And OCaml has a generational GC, unlike Go. So Go's throughput is going to be hit even harder.

Not going with a generational GC is a choice that runs against what has been accepted practice for decades. It may of course work out, but it's what I'd consider an experiment.

Typical Go code does not generate a lot of short-lived objects, compared with, say, Java or with typical usage of persistent data structures in functional languages. That removes the practical need for generational GC.

If this were true, then escape analysis wouldn't be so important in Go code. But it is: the lack of it is the reason why gccgo is slow in practice.

I see the importance of escape analysis as another indication that typical Go code does not generate a lot of short-lived objects on the heap. It is just that the language does not allow you to express particular stack-allocation idioms, requiring the compiler to infer them.

Compare that with Java, where rather sophisticated escape analysis does not help much besides allowing a function to return several things cheaply. Typical code there just does not follow stack-like allocation patterns.

I'm also of the impression that Go's "transactional GC" is similar to a generational GC?

Sort of, but it sacrifices the main benefit of generational GC by not allowing for bump allocation in the nursery.

I thought it was compacting/moving GC, not generational one, that allowed bump-allocation.

That might imply that most allocations are happening on the heap - if your code is structured to make allocations only on the stack as much as possible, there wouldn't be that much work to do.

If someone could do this for Unity, it would change my life.

I've never been more careful to avoid any allocation as I have been in Unity. I had fewer memory concerns in C++ for crying out loud.

I would 10,000x rather have to match new with delete (big deal) than to maintain the revolting unidiomatic contortions I'm obligated to do to outwit the GC.

In modern C++ you don't even have to do that.

Well, in modern C++ you _shouldn't_ do that.

Great point. I may actually have to re-evaluate Unreal 4.

Agreed x10000. For VR we must maintain strictly zero garbage generation. Which is really damn hard in a language built on the assumption of GC.

This. Times 100,000

Seriously! Once they've fully upgraded .NET and the GC, we'll all sigh in relief.

.NET already has concurrent garbage collection: https://msdn.microsoft.com/library/ee787088(v=vs.110).aspx#b...

See the "SustainedLowLatency" mode for something very similar to what Go does (although .NET's GC, unlike Go's GC, is generational, which is a significant difference).

The problem is that Unity doesn't use a modern .NET runtime at all, rather the frozen Mono version from back when the Mono developers were still working for SuSE.

Unfortunately, Unity still uses .NET 2, and their next planned upgrade (to 4.6) is still listed with an undetermined ETA. That's good to know for the future though.

They're beta-testing .NET 4.6 in Unity 5.5: https://forum.unity3d.com/threads/upgraded-mono-net-in-edito...

.NET also has the TryStartNoGCRegion API, which allows you to prevent all GC collections (for a certain while).

See http://mattwarren.org/2016/08/16/Preventing-dotNET-Garbage-C... for the full details

Does unity actually use the .NET runtime, or mono? I was under the impression it used mono. Does that apply for mono?

Depends on the platform you are targeting. Mono's GC (last I heard, they may have integrated .NET Core's by now) is relatively primitive. It was mostly developed just to have something better than Boehm.

Their experience was actually pretty instructive; with their first release of the new GC (a proficient but basic generational collector) they still weren't beating Boehm all the time, and usually weren't beating it by much. Given its constraints, Boehm's GC is impressively optimized when you run it in semi-precise mode.

Very interesting, I'm interested in how you came to this understanding. How did you acquire information about this? Did they blog about it, or do you happen to follow their mailing lists, or ... ?

I read a blog post from someone at Xamarin with some test cases graphed, a bit before Xamarin started suggesting SGEN as the default. I'll see if I can find the one I'm thinking of.

Were they bump allocating in the nursery? That's where you tend to see the biggest gains from generational GC.

Yes, they even used TLABs. I believe the issue was that Xamarin was more interested in mobile performance at the time when SGEN was seeing heavy development and preparing for release, so they optimized for memory usage instead of throughput. The generational part was probably more of a hindrance than a benefit at that point in the process.

For anyone else who's in this hell, this is the best reference I know of about how you have to write C# when the GC hates your data:


tl;dr - "arrays of structs are really the only bulk storage method available in C#"

Does anyone have any write ups for GC algorithm comparisons?

E.g. Java vs Go vs C# vs Actionscript vs Javascript?

Does this mean Go could be more suitable for robotics and drones now?

For IoT there's a package for that

Wow, makes me want to develop a AAA game in Go.

What makes a game AAA is the resources (human and financial) that were used making and marketing it, not the language used to write it.

And the culture.

AAA game developers only moved from Assembly to C, Pascal when forced to do it.

A few years after they moved from C to C++ when the SDKs started to be C++ only.

Similarly they will only move from C++ when forced to do so, and their only major complaint is related to build times.

I bet most are willing to wait until C++20 for modules rather than switch to another language, even if some studios manage to have some hits developed in safer languages.

Very good point here. The outlook of gamedevs on new tooling tends to be pessimistic because so little is actually geared towards what they work on, or want to work on.

On the other hand, Web and mobile games have lived with the consequences of a managed runtime for many years now. There are limits in how much processing power is available there, but it ultimately just diverts developer attention towards other things like a more robust backend, faster turnarounds, and other general workflow improvements independent of scene fidelity.

I get why you're saying this, but go-to-c bindings are not fun and that's how you'd have to talk to the hardware.

Only for the one implementing it.

Also on some platforms it could even be Go-to-hardware.

You really want a higher-level API in most cases, and these are almost exclusively written in C/C++. Also in Go, there's a certain overhead when calling C functions.

Can one write an HFT application in Go now?

You might be able to, but it won't be because of GC times.

In HFT systems in GC'd languages or not, you don't allocate on the hot path so GC times are immaterial.
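A small sketch of what "don't allocate on the hot path" tends to look like in Go: allocate a scratch buffer once at startup and reuse it on every call. The formatter type is made up for illustration; strconv.AppendInt is the standard library call that appends into an existing buffer.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatter owns a scratch buffer that is allocated once, outside the
// hot path.
type formatter struct{ scratch []byte }

func newFormatter() *formatter {
	return &formatter{scratch: make([]byte, 0, 64)}
}

// price renders v into the reused buffer. The returned slice is only
// valid until the next call, which is the price of not allocating.
func (f *formatter) price(v int64) []byte {
	f.scratch = strconv.AppendInt(f.scratch[:0], v, 10)
	return f.scratch
}

func main() {
	f := newFormatter()
	fmt.Println(string(f.price(12345))) // 12345
	fmt.Println(string(f.price(-7)))    // -7
}
```

With no garbage produced per call, the collector has nothing to do on that path, whatever its pause times are.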

If you are measuring response in nanoseconds, 100 microseconds is still a lot.

However, it may be good enough for games at well below 1% of your time budget for a 60 fps game assuming cache locality is good enough so you don't waste too much time fetching from main memory.

Besides games it would be interesting to see how well Go would now work for things like low-latency audio processing (single-digit-millisecond latency). That's a classic domain where performance is not a problem, but once you miss the target timeframe you are pretty fucked up (producing and hearing glitches).

I write a fair amount of low latency code in go, but none of it is hard real time. Average throughput of a few microseconds (for my simple workloads) with spikes of milliseconds here and there is what I tend to find. Fine for a lot of things but I'd be hesitant to use it for high fi audio apps that are very sensitive to latency. Humans may only be able to notice 50ms, but if you're chaining DSP you can end up with a fair bit of variance in your processing pipeline.

Here is a demo from IBM doing audio processing with their real time GC in Java.


Already in 2012, so even if not yet there, I think Go can get there, especially since you can take care to just stack allocate and minimise GC use on the hot path.

.1 of a millisecond is very impressive. Does this mean Go can finally be considered a systems programming language?

Go can become a systems programming language, even if the GC haters don't think so.

It has the same features as Oberon for systems programming, which was used to build quite a few workstation OSes at the Swiss Federal Institute of Technology in Zurich (ETHZ), by Niklaus Wirth.


You can read the source code here in the 2013 revised edition of the 1992 book.


The only thing missing from Go versus what Oberon offered is register access on the unsafe package, but even then can be sorted out with an extension or a few calls into Assembly.

Oberon-07, which is even more minimalist than either Go or the original Oberon, is sold by Astrobe for bare metal programming on Cortex-M boards.


Go is already bootstrapped in itself, so writing compilers is taken care of.

It is used by Docker and Kubernetes for managing containers.

It just needs someone to write an OS in Go for their Master's or PhD thesis, or even just port the Oberon source code, which is freely available, to Go.


Personally, I would say latency isn't the problem, and that not being able to avoid the GC automatically makes Go not a systems programming language. Not all situations can afford a GC, and really for a lot of systems programming usage (particularly embedded systems) using a GC just obscures memory usage when in a lot of cases the memory can be declared statically to begin with (ensuring a maximum amount of memory usage).

That's not to say you couldn't potentially write an OS or an embedded system in Go (I mean, you can write OS's in Lisp if you really want) but I doubt it would be fun and I doubt anybody would recommend it. You definitely won't be writing idiomatic Go without a lot of extra pieces that you can't really afford in those situations.

Using Oberon, Java and .NET is surely more fun than plain C and there are a few embedded companies selling such stacks for their boards.

Most other sophisticated GCs (e.g. .NET and Java's) can obtain pause times in a similar range for generation 0 collections. So if GC is the reason you don't want to use one of those you'd probably be more interested in improvements in worst case pause times. Go is however very good at avoiding unnecessary object allocation and doesn't need a JIT so it may still be closer to what you need than those languages.

Right, and tenured generation collections are already concurrent in HotSpot and .NET. Generally only heap compaction needs to stop the world; you can disable compaction if this is a problem for you.

What the Go developers consider a "systems programming language" was exactly explained in the very first announcement/presentation of Go. They clearly outlined that it's for building systems like Google's, not operating systems.

> .1 of a millisecond is very impressive. Does this mean Go can finally be considered a systems programming language?

Can you write an OS kernel in Go? No, Go's runtime still depends on an OS. And whoever talked about Go as a systems language didn't have kernels in mind, but "network infrastructure".

Of course you can, time to learn about Mesa/Cedar, Modula-2+, Modula-3, Oberon, Sing#, System C# and possibly a few others.

Here you can learn how those guys at Swiss Federal Institute of Technology in Zurich did it.


Basically the runtime is done bare metal thus becomes the OS kernel.

Oh, and Oberon wasn't just a research OS, it was actually used across the computing department by many of its employees.

Go needs a runtime, but it need not be a full OS; it could be a not too big library that gets linked to a kernel written in go.

Similarly, can you write an OS kernel in ISO C? No, you still need some assembly or runtime support. For example, ISO C doesn't have any notion of making syscalls or returning from the kernel to a caller or for setting up page tables.

Any argument why go isn't suitable for systems programming along these lines should be about how much and what kind of runtime is acceptable for a systems programming language.

A (fairly popular, I think, but certainly not universally agreed upon) argument could be that systems programming languages cannot have garbage collection because it should be possible to implement a garbage collector in them, and doing that in a language that already has one is possible but silly.

Go's garbage collector is implemented in Go.

I can't find it in the documentation, but I would think that must be implemented in ProtoGo, where ProtoGo is a Go-like language that doesn't use Go's garbage collector or a library that does (Enforcing not using anything that uses the garbage collector may be left to the programmer)

That is necessary even with a concurrent garbage collector because a garbage collector that allocates in its own heap may hang itself (program allocates; gc triggered; gc allocates; gc triggered to satisfy the allocation; new gc triggered; etc.). Or do Go's developers accept this risk and live with it?

> Enforcing not using anything that uses the garbage collector may be left to the programmer

It's left to the compiler actually. Programmers can't be trusted.

The runtime does not implicitly generate garbage (like arbitrary Go code). It is compiled with a mode that fails compilation if something can't be stack-allocated. When heap allocation is necessary, it is requested manually. However, the memory returned is garbage collected, as usual, there is no free.

Besides 4ad reply, here you can see how Oberon has its GC implemented in Oberon, as another example of a bootstrapped GC enabled systems programming language.


You can absolutely write a kernel in Go; many years ago Go used to ship with a bare metal runtime, as a demonstration...

Since the term is not defined anything can be considered a systems programming language.

The Go developers provided their definition, which is something people conveniently ignore whenever they try to be clever about Go not being a systems programming language.

To be fair, if I repurpose a term that is widely used in one way to mean something else, the confusion that ensues is kind of my fault.

So many places I worked had systems engineering department and none of them had anything to do with operating systems/ device drivers. I wonder whether it has to do with OS hackers hanging on internet together and deciding what systems would mean.

There was no repurposing, as the context was exactly explained. Only later, the detractors decontextualized the Go team's original statements.

The lack of control over threading makes it problematic as systems programming language.

Brief synopsis:

Go 1.8 is moving to a hybrid write barrier in order to eliminate stack rescanning. The hybrid write barrier is equivalent to the "double write barrier" used in the adaptation of Metronome used in the IBM real-time Java implementation.
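For anyone who wants the barrier spelled out, here is a toy model in Go of the pseudocode as I read it in the proposal linked at the top of the thread. It is a sketch with made-up types, not the runtime's code: the real barrier is emitted by the compiler and operates on raw memory.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet reached by the collector
	grey               // reached, fields not yet scanned
	black              // reached and scanned
)

type object struct {
	color color
	field *object
}

// shade marks a white object grey so the collector will scan it later.
func shade(o *object) {
	if o != nil && o.color == white {
		o.color = grey
	}
}

// writePointer models the hybrid barrier: always shade the pointer being
// overwritten (the deletion half) and, while the writing goroutine's stack
// has not been scanned yet, also shade the pointer being installed (the
// insertion half).
func writePointer(slot **object, ptr *object, stackIsGrey bool) {
	shade(*slot)
	if stackIsGrey {
		shade(ptr)
	}
	*slot = ptr
}

func main() {
	oldTarget, newTarget := &object{}, &object{}
	holder := &object{color: black, field: oldTarget}
	writePointer(&holder.field, newTarget, true)
	fmt.Println(oldTarget.color == grey, newTarget.color == grey) // true true
}
```

Because neither the old nor the new target can be lost behind a black object, a stack that has been scanned once does not need to be scanned again, which is where the pause time goes away.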

Some benchmarks:




Like we've asked before, please don't do this here.

Sorry got carried away

If only there were generics, and no nulls, oh and strongly typed errors... regardless, this is Very impressive.

What about the way Go handles errors today makes them not "strongly typed"?

> What about the way Go handles errors today makes them not "strongly typed"?

Error as type or error as value? The std lib promotes errors as values (i.e. check equality) instead of errors as types (i.e. check the type). Go's error system WAS written with errors as values in mind. There is no point having errors in place of exceptions if errors were intended to be used as types (which they are not, as said previously). Basically developers are implementing their own mediocre exception system on top of Go errors.

The error as value thing made sense in C 30 years ago, it doesn't in a language created less than 10 years ago.

There are a lot of C inspired patterns in Go that make the language half modern/ half dated in strange ways. That's fine when one comes from C though, that isn't when one comes from anything remotely modern. But I guess it's why Go is successful, it's basically C with garbage collection.

error is an interface type in Go, which means an error contains both a type and a value, or "error as type" and "error as value" are both true in Go.

I said that already. And that's not the problem at hand. When you test an error, do you test against a value or a type in order to know what kind of error it is?

You can test the interface. A type is just an interface around memory, albeit more constrained.

> You can test the interface. A type is just an interface around memory, albeit more constrained.

Wow, again, that's not the problem here. Errors in the standard library are defined as values. There is no point testing them as interfaces; it will not give you the nature of the error, since they are all defined with fmt.Errorf. Do you understand the problem now? The problem is being consistent across codebases between errors as values and errors as types.

That's because single error instances are much cheaper than always creating a new instance of a given type. No need to create garbage for lots of common errors. Of course you could have a dedicated type plus a singleton implementation of it in Go. But what would be the advantage? Checking if err.(type) == io.EofType does not give you more information than only checking if err == io.EOF, as long as you don't store any error instance related information in it. Storing such information makes sense for custom errors, and Go absolutely allows you to do that.

> they are all defined with fmt.Errorf

No, an error implements the error interface. It means that it can be a value of any type that implements the constraint of having an Error method.

> No, an error implements the error interface. It means that it can be a value of any type that implements the constraint of having an Error method.

It doesn't matter what a value implements if you don't test its content by type. Std lib errors are meant to be tested by value, not by type. It has nothing to do with interfaces again. When you get a std lib error in order to know what error it is you compare it to an exported value, not its type. I don't know why you keep on insisting on interfaces that's not the core of the issue here.

We do have the same problem in Java and .NET though.

I have already used libraries that catch the exceptions internally and return some kind of error value instead. :(

Neither the Java std lib nor the .NET std lib does that though; they don't declare an exception as a static constant you need to compare with. Because it's (rightfully so) considered bad coding. Exceptions give you some valuable information like stack traces. Go errors give you nothing. They are inferior to exceptions and a relic of C programming patterns.

The compiler provides you no help at all with them, and no syntax that makes error conditions and handling separate. It also mixes application logic and recovery logic.

Basically everything that's problematic with returning a status int in C, but all new, hip, and backed up by Rob Pike's pseudointellectual bullshit and a bunch of Silicon Valley 20somethings.

They could at least, you know, have an Either type or something. Anything?

>It also mixes application logic and recovery logic.

When did this separation become law ? What if the "application logic" requires recovery ?

>They could at least, you know, have an Either type or something

(int64, error) in func ParseInt() (int64, error) is your Either type. And checking if you got the "left or the right side of the Either" is IMHO much shorter and clearer than in Scala.



>backed up by Rob Pike's pseudointellectual bullshit and a bunch of Silicon Valley 20somethings

Why the ad hominems ?


How exactly is having to check twice the amount of cases an improvement (note btw, that checking for Left/Right is doing it wrong)?

You only have to check 1 case in Go (`if err != nil { ... }`). What language lets you check half a case?

  (int64, error)
gives you exactly four possibilities.

Either gives you exactly the two you want.

Please feel free to enumerate them.

  (value, no error)
  (value, error)
  (no value, error)
  (no value, no error)

The convention is that if the err == nil then the value is not nil. The exceptions to this rule are very few and usually specified in the documentation. Normally you only have to check for error.

The fact that it's a convention, and not a compiler error is the entire issue at debate here.

There is no "no value" representation for int64. The only two cases are "<int64>, nil" or "<int64>, <error>".
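A sketch of what actually comes back from the standard strconv.ParseInt, since it is the function named above. The describe helper is made up; the behaviour on out-of-range input (an error together with the clamped maximum value) is what the strconv documentation describes.

```go
package main

import (
	"fmt"
	"strconv"
)

// describe parses s and reports which (value, error) combination the
// call produced.
func describe(s string) string {
	v, err := strconv.ParseInt(s, 10, 64)
	switch {
	case err == nil:
		return "value only"
	case v != 0:
		return "value and error"
	default:
		return "error only"
	}
}

func main() {
	fmt.Println(describe("42"))                  // value only
	fmt.Println(describe("abc"))                 // error only (value is the zero int64)
	fmt.Println(describe("9223372036854775808")) // value and error (clamped to the max int64)
}
```

So there is always an int64 in the first slot, and the compiler will not stop you from using it when err is non-nil; the convention is what tells you to ignore it.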

Yes, but at least they are making the tools we may eventually have to rely on due to market pressure (Docker, K8s, ...) in Go instead of C.

Don't forget Brian Kernighan and Ken Thompson!

Typed vs untyped nil means that, in practice, all functions must return an error of type 'error', and force clients to downcast at runtime:


Of course, there are ways around this such as returning a struct, but then that's no longer compatible with the error interface.

> Of course, there are ways around this such as returning a struct, but then that's no longer compatible with the error interface.

Which your struct can easily fulfill. That's the beauty of Go interfaces.

The linked FAQ specifically talks about returning pointers to structs that fulfil the 'error' interface and why it's a bad idea.

It's not that it's a bad idea, just that because of the "nil interface" absurdity it can happen if you accidentally mix concretely typed variables and interfaces, as in the example.

This is perfectly valid and doesn't cause the nil issue:

    return someErrorStruct{}
...where someErrorStruct is a struct that implements the "error" interface. Using structs for errors is fine, and is in fact generally preferable to singletons like io.EOF, which can (by their very nature) never be extended with more data about the error.
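For anyone who hasn't been bitten by it yet, this is the accidental mix of concrete types and interfaces in question, as a minimal sketch (myError, buggy and fixed are made-up names):

```go
package main

import "fmt"

type myError struct{ msg string }

func (e *myError) Error() string { return e.msg }

// buggy declares a concretely typed pointer and returns it as an error.
// Even when p is nil, the returned interface holds (type *myError, value
// nil), and that compares unequal to nil.
func buggy(fail bool) error {
	var p *myError
	if fail {
		p = &myError{"failed"}
	}
	return p
}

// fixed returns a literal nil on success, so the interface itself is nil.
func fixed(fail bool) error {
	if fail {
		return &myError{"failed"}
	}
	return nil
}

func main() {
	fmt.Println(buggy(false) == nil) // false
	fmt.Println(fixed(false) == nil) // true
}
```

The caller's `if err != nil` fires for buggy(false) even though nothing went wrong, which is why the usual advice is to keep the error return typed as error all the way through.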

> absurdity

There's nothing absurd about it; interfaces are a reference type. If you have a reference to a reference, then checking the "outer" reference for nil doesn't tell you anything about the nullity of the "inner" reference. The advice is just a special case of "don't needlessly use pointers-to-pointers".

Go chose to rely heavily on nil pointers, which is a design mistake (see Tony Hoare's apology for inventing it). The resultant tension between interfaces and nils is, in my opinion, an absurd side effect that cannot be explained away as anything except an ugly wart. We should have something better than this in 2016.

I say this as someone who uses Go daily for my work and mostly likes it (despite the many warts).

I don't especially love nil either, but people make too big a deal of it. The only arguments against it are hypothetical scenarios, anecdotal examples, and appeals to Hoare's authority. While there's probably a more ergonomic design, there's no substantial evidence that nil is the catastrophe it's made out to be. Using terms like "absurdity" and "catastrophe" seems overly dramatic without some decent evidence.

I don't think I'm being overly dramatic, actually. I deal with nil oversights on a daily basis during development, and I would say that it is the main source of unexpected crashes in anything. It equally applies to other languages such as Ruby and Python.

It's exacerbated by the fact that Go chose to make things like maps, slices and channels pointers, too. It has an excellent philosophy about zero values (so you can't get nil strings, even though they are pointers internally), yet goofed when it came to something as intuitive as "var foo []string", and claimed this behaviour to be a feature. The (nil, nil) interface is just icing on a crappy cake.

The fact that such a new language as Go doesn't have a way to express missing values safely should be disappointing to developers.

By a wide margin, the biggest production issues I see are index out of bounds or key errors (regardless of language). When I'm treating `nil` as a valid state for some object, I take extra care to test its permutations, and uninitialized maps/interfaces/etc are quickly discovered during testing (every test has to initialize, so this logic is well-covered).

> The (nil, nil) interface is just icing on a crappy cake.

The same problem exists with languages without nil. For example, if you choose to do the stupid thing and return Option<Option<error>> when you only need Option<error>, then checking the outer Option<> for None is not going to magically guarantee that the inner Option<> is not None.

> It has an excellent philosophy about zero values (so you can't get nil strings, even though they are pointers internally), yet goofed when it came to something as intuitive as "var foo []string", and claimed this behaviour to be a feature.

What are you talking about? nil slices are every bit as safe as an empty slice or an empty string (which is just an immutable empty slice).
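A small sketch of the claim above, assuming nothing beyond the standard library. `countItems` is a hypothetical helper added for illustration: it ranges over whatever slice it is given, including a nil one.

```go
package main

import "fmt"

// countItems ranges over s and counts the elements. Ranging over a nil
// slice simply performs zero iterations.
func countItems(s []string) int {
	n := 0
	for range s {
		n++
	}
	return n
}

func main() {
	var foo []string // declared but never initialized, so foo is nil

	fmt.Println(foo == nil)         // true
	fmt.Println(len(foo), cap(foo)) // 0 0
	fmt.Println(countItems(foo))    // 0

	// append allocates as needed, so it also works on a nil slice.
	foo = append(foo, "a")
	fmt.Println(countItems(foo)) // 1
}
```

This only covers reading, ranging and appending. It says nothing about how libraries choose to treat nil versus empty slices, which is the separate point raised further down the thread.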

> The fact that such a new language as Go doesn't have a way to express missing values safely should be disappointing to developers.

I agree, and I'm mildly annoyed by it, but as it's the least of all of my problems, I find words like "absurdity" to be too heavy-handed.

Nil slices do cause problems. One when it's an aliased type that satisfies an interface (again with the nil interfaces!). Another is that it leads to inconsistencies: json.Marshal([]string(nil)) returns "null", for example, not "[]". Yet another annoyance caused by nils (including nil slice) is that reflect.Value becomes a lot more byzantine than it ought to have been, requiring careful use of IsValid(), Kind checks etc. when you want to deal with something that is either an interface or something else.

As for Option: Not sure how that's an argument. And anyway, a language with real sum types will never allow going down a branch that doesn't receive a correctly matched value.

> One when it's an aliased type that satisfies an interface (again with the nil interfaces!)

It sounds like you're again confusing nil interfaces with an interface holding a nil value (in particular, there is no way to get a nil interface from a nil slice). Here's an example that demonstrates nil slices do not cause problems: https://play.golang.org/p/tSA_otqg3-

> Another is that it leads to inconsistencies: json.Marshal([]string(nil)) returns "null", for example, not "[]".

1. This is unrelated to the language; it's the behavior implemented by the JSON library

2. This behavior is correct; a nil slice is not the same as an empty slice:

        fmt.Println([]int(nil) == nil) // true
        fmt.Println([]int{} == nil)    // false
3. This behavior is consistent: https://play.golang.org/p/NjdO0boHln
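The nil-versus-empty behavior both sides are describing can be checked with a short sketch using the standard `encoding/json` package. `encode` is a hypothetical wrapper added here only to make the output easy to compare.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode marshals a string slice and returns the resulting JSON text.
// Marshaling a []string is not expected to fail, but the error is
// reported rather than ignored.
func encode(s []string) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var nilSlice []string      // nil slice
	emptySlice := []string{}   // non-nil slice with zero elements

	fmt.Println(encode(nilSlice))   // null
	fmt.Println(encode(emptySlice)) // []
}
```

Code that needs `[]` on the wire can initialize the slice with an empty literal before marshaling.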

> As for Option: Not sure how that's an argument.

You posited that Go's nils are bad because you can satisfy an error interface with a pointer type, and then when you return (*ConcreteType)(nil), your error handling code executes. The problem here is unrelated to nil or to interfaces; the problem is that you're nesting one binary type inside another (binary in the literal sense of having two states, either nil or not nil in this case). You would have the same problem in Rust had you done Result<Foo, Option<ConcreteError>> or (in the case of a single return value) Option<Option<ConcreteError>>. You would fix the problem in either language by recognizing that you only have 2 states (error or not error) and removing the nesting (in Rust: Result<Foo, Error> or Option<Error>; in Go `return ConcreteType{}`).

> And anyway, a language with real sum types will never allow going down a branch that doesn't receive a correctly matched value.

I agree, and it would be nice if Go had this, but this is also not a very significant problem--this problem is blown way out of proportion.

The return type would still be the error interface. If you want more information than the error interface you can just use a type switch/assertion.
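A rough sketch of that pattern follows. The names `notFoundError`, `lookup` and `missingKey` are hypothetical and exist only for this example. The function returns the plain `error` interface, the concrete error is a struct value rather than a pointer, and the caller uses a type switch to get at the extra field.

```go
package main

import "fmt"

// notFoundError is a hypothetical struct error that carries extra data.
type notFoundError struct{ Key string }

// Error uses a value receiver, so notFoundError values satisfy error.
func (e notFoundError) Error() string { return "not found: " + e.Key }

// lookup declares its second result as the error interface and returns
// either a struct value or a literal nil, never a typed nil pointer.
func lookup(m map[string]int, key string) (int, error) {
	v, ok := m[key]
	if !ok {
		return 0, notFoundError{Key: key}
	}
	return v, nil
}

// missingKey uses a type switch to recover the extra information when
// err happens to be a notFoundError.
func missingKey(err error) (string, bool) {
	switch e := err.(type) {
	case notFoundError:
		return e.Key, true
	default:
		return "", false
	}
}

func main() {
	_, err := lookup(map[string]int{"a": 1}, "b")
	if key, ok := missingKey(err); ok {
		fmt.Println("missing key:", key) // missing key: b
	}
}
```

Callers that only care whether something failed can keep writing `if err != nil` and never look at the concrete type.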
