Hacker News
Go’s march to low-latency GC (twitch.tv)
400 points by etrevino on July 6, 2016 | 289 comments

Another very nice feature of Go is that, since 1.5, the whole runtime, including the GC, is written in Go itself. So every Go developer can look at the implementation. The GC itself is a surprisingly small amount of code.

I never understand this argument. Are there people who know enough about GC to understand potential issues and have an informed opinion on how to improve it, but can't read C?

"Read C" is a very loose term. Yes, I can read C. But as I am not using it in daily practice, I am rusty, while as a Go programmer I am quite trained in reading Go code. You also only need to know the behavior of one compiler (the Go one) rather than also needing a good understanding of what the C compiler does.

Also, Go is a much clearer and more strongly typed language, so Go certainly is a much nicer implementation language than C. (If I thought C was better suited, I would be using C in the first place...)

Right, but are you also skilled in writing garbage collectors? (You may be, I'm not assuming you aren't. I'm not.) Otherwise the fact that you can read Go more easily than C doesn't make a difference: what would you do after you've read the code if you don't understand any of the algorithms or why they are designed that way?

I'm sure it's very good that as much of the runtime is written in Go as possible, but I think people are being too optimistic when they hope it will empower people who aren't already skilled in compilers or garbage collection to contribute.

Of course, being a Go expert does not make you a GC expert automatically. You need to be both. But why should you also be a C expert? Adding another whole field of expertise to the requirements does not sound like an improvement.

And everything written in Go also means you are dealing with just one compiler and not two, if you mix Go and C code.

I don't think you should also have to be a C expert, but I am suggesting that, in practice, there is nobody in the world who is proficient in garbage collection who does not also know C. I think if you learned everything needed to understand GC, but were never exposed to C, you would already know enough to pick the language up in a couple of hours.

Nobody is going to turn up in the Go IRC room saying that they have a great idea how to reduce pause times by improving the work-stealing between concurrent markers by using a better lock-free queue algorithm, except, d'oh, they don't know C.

I get your point about just one compiler though - fewer moving parts is good.

One benefit you miss: by using Go for the GC, all the Go tools (fmt, the profiler, vet, godoc, lints, etc.) are available to the writers of the GC as well, which was not possible with C.

I think it is similar to how Oracle was trying a JVM in Java, known as Maxine. JVM contributors or potential contributors would know C++, but according to the OpenJDK website one motivation was to leverage the amazing Java tooling to write the VM itself.

I just noticed Oracle seems to have removed references to the Maxine VM from the Oracle website and the OpenJDK website. It seems that project is no longer active.

Try searching for Graal instead.

From the link:

> Graal is a dynamic compiler written in Java that integrates with the HotSpot JVM

I am not sure if this is the Maxine VM, which I thought was something analogous to the HotSpot JVM. Or maybe Maxine was similar in scope to the link you have given, and not an experimental or other replacement for the HotSpot JVM.

It's the JIT compiler written in Java that was in Maxine, updated and integrated into HotSpot, so you don't need to use an experimental VM to use it.

Maxine is still alive, though - but I think the only people maintaining it are academics.

Ok, makes sense. Maxine might now be independent of Oracle, as the links from the Oracle website go nowhere.

It isn't Maxine. I believe Maxine lives on in the Oracle-only "Substrate" project. But there is also Jikes, which is a similar JVM-in-Java project.

equivalents to all those tools (and more) are available in the C ecosystem.... just saying....

BTW: C is a much more mature ecosystem than that of Go.

C has an eco-system? Really? Yeah, sure, I can download source code and headers and somehow use 12 different antiquated tools strung together with duct tape and bubble gum - tools that are also not standard on, let's say, Windows - to hopefully compile that code, and then worry about 12 other different tools, each with 6 different options, to be able to do something simple like "link a library", and hope that works on FreeBSD and OS X and Windows. But it won't, it never does, without spending an insane amount of time tweaking headers and m4 macros before giving up and learning CMake. If that's what you consider a "mature eco-system", then you have very low standards.

No one doubts that there is a rich ecosystem available for C. The thing just is, when you are working in a Go environment, what benefit would adding C with its own toolchain bring?

No need for a bootstrapping step, 90% of portability taken care of for free, the ability to use lower-level and faster primitives than Go offers, and access to tons of great C libraries.

> 90% of portability taken care of for free

You, sir, have clearly never attempted to write portable C or Go code. Writing portable C code takes serious effort. It's not hard if you know what you have to pay attention to - but 90% of portability taken care of for free? That's simply not true, unless you think being portable means "it runs on a POSIX system".

Writing portable Go code - in most cases - you don't need to do anything or make only slight changes to your code, and cross-compiling is the easiest I've ever encountered.

> ability to use lower level and faster primitives than Go offers

Like what? Go's unsafe code is free to do almost anything (except things like array bounds checks, which are always enforced).

Not the build system; it's a pain to get the libraries on Windows for a multiplatform project that I want to compile.

Whereas in Go this is very easy.

There have been similar Smalltalk-in-Smalltalk, Lisp-in-Lisp, and Java-in-Java, etc. projects.

I have yet to see one that led to better garbage collection.

> Nobody is going to [...] have a great idea how to reduce pause times by improving the work-stealing between concurrent markers by using a better lock-free queue algorithm, except, d'oh, they don't know C.

I think you do yourself a disservice to discount that.

I'll happily tell you about places in the go runtime where we could use some smarter memory fencing instructions to build faster lock-free queues on x86_64.

I also don't write C. (Well, I'm trying to write a patch to libgit2 right now, but really, the operative word there is "trying": it's just highlighting it all the more clearly: I don't know C.)

I learned about the memory fence instructions while doing concurrent programming in java. The notion that C is the only bridge we can cross -- or the right bridge to cross -- to get to assembly (or any other abstraction layers necessary for high performance engineering) is absurd. We can put it to rest now.
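
For what it's worth, that kind of structure can be sketched in pure Go with sync/atomic - no C required. Below is a hypothetical single-producer/single-consumer ring buffer for illustration only (assumes Go 1.19+ for atomic.Uint64); it is not the runtime's actual work-stealing queue, and all the names are invented:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spscQueue is a toy lock-free queue: one producer goroutine may push,
// one consumer goroutine may pop, with no mutex involved.
type spscQueue struct {
	buf        [8]int64 // capacity must be a power of two
	head, tail atomic.Uint64
}

func (q *spscQueue) push(v int64) bool {
	t := q.tail.Load()
	if t-q.head.Load() == uint64(len(q.buf)) {
		return false // full
	}
	q.buf[t%uint64(len(q.buf))] = v
	q.tail.Store(t + 1) // the atomic store publishes the slot write above
	return true
}

func (q *spscQueue) pop() (int64, bool) {
	h := q.head.Load()
	if h == q.tail.Load() {
		return 0, false // empty
	}
	v := q.buf[h%uint64(len(q.buf))]
	q.head.Store(h + 1)
	return v, true
}

func main() {
	var q spscQueue
	q.push(1)
	q.push(2)
	a, _ := q.pop()
	b, _ := q.pop()
	fmt.Println(a, b) // FIFO order: 1 2
}
```

Go's sync/atomic operations are sequentially consistent, which is stronger (and on x86_64 sometimes costlier) than the acquire/release fences a hand-tuned queue would use - which is exactly the kind of gap the parent comment is pointing at.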

>Of course, being a Go expert does not make you a GC expert automatically. You need to be both. But why should you also be a C expert?

You shouldn't, but historically and statistically GC experts and compiler experts are also C experts. And it's not like "Go is written in Go" is going to change that (we've had languages written in themselves for half a century and still most compilers are written in C/C++).

I'd argue that Garbage Collection and the Go language are mutually exclusive.

I'd argue that the term 'mutually exclusive' means that one thing cannot coexist in the presence of the other. So... no.

What do you mean by that? Obviously Go has a GC.

I can read both Go and C, but not in the style that is preferred by the core Go runtime developers. I struggle a lot to read the code when all the important data structures have one-letter names - G, P, M, etc. I understand not wanting Obj-C style identifiers, but single-letter ones?

Go runtime code reads like `x := g.b(a.C)` and you have to do quite a bit of manual cross-referencing of variables and identifiers to even get a vague idea of what is going on. It obviously somehow works for them.

You don't need to be an expert in garbage collectors to get value from reading how your particular garbage collector works.

You don't need to be, but if you can't read C, I'm not sure how much extra value you are going to get from reading how a particular garbage collector works.

You may use Go all day every day, but that doesn't mean you have the skill to implement or extend a GC. If you have experience in that area you almost certainly know C like the back of your hand.

Let's pretend this is true today. What about the future? Is there any benefit to restricting the pool of people working on the GC?

Are you really restricting the pool though? Most devs can't write or efficiently develop a GC. Learning a language is easy compared to writing a reasonably performant and correct GC. Anyone in this area worth their salt can pick up C if they don't know it already and, again, they probably already do if they have experience at this level.

I've been hearing 'C is dead' for fifteen years now, but it hasn't gone anywhere.

We've had compilers being written in their own language for half a century, but it hasn't changed the fact that most (and all the successful ones, with millions of users) are written in C/C++.

I've frequently wanted to understand the actual workings of the JVM garbage collector, but I get bogged down reading the C/C++. If you want to know exactly how a program works then reading the source is a great way to do it. Someone who is not a GC expert could become at least GC-proficient by reading the code that does, say, the object-tree walking. I know that having the language implementation source available has been useful to me in Java. I go to the implementation of BigDecimal or ArrayList to really see how things work. The same can be true of compilers and runtimes. It's much easier if you are not context switching or required to know different languages.

Just check Maxine or JikesRVM, two examples of meta-circular JVMs.

It doesn't seem like an argument to me. It just seems like an interesting fact about Go. I don't think the post you are responding to is trying to say anything about GC, so I've no idea what your comment is getting at. I've yet to use Go, but intuitively it seems like a good thing that the runtime is written in Go. It presumably reduces context switches if debugging an issue that seems to come from the runtime, regardless of whether that is a GC-related issue.

One thing that is very helpful is that you can navigate the code in your editor using the exceptionally effective code navigation tool for Go. With the tool one can very easily and precisely jump to the definitions and references without building an index beforehand.

Excuse my obvious ignorance, but how is the GC written in Go? The GC in Go is not optional, right? Does the GC use GC? Turtles all the way down?

In general, memory allocated by the Go runtime is garbage collected, yes. When the garbage collector runs, there isn't really any difference between user and runtime goroutines anymore. But the garbage collector itself doesn't create garbage implicitly.

Go is a language that does not expose stack vs. heap allocation as a primitive to the programmer. Memory is allocated in the best place possible, preferably on the stack, but if that is not possible it is allocated on the heap. But in the runtime we need to control the generation of garbage, so runtime code is compiled with a switch that forbids implicit heap allocations (code won't compile if it requires transparent heap allocations). Heap memory is allocated by calling a function like runtime.mallocgc. However, this memory is garbage collected just like everything else (e.g. there is no free).
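
A rough way to observe this stack-vs-heap decision from ordinary user code (the runtime's own rules are stricter) is testing.AllocsPerRun; the function names below are invented for illustration, and `go build -gcflags=-m` will print the compiler's escape-analysis decisions:

```go
package main

import (
	"fmt"
	"testing"
)

var sink *int

// onStack: the value never escapes, so the compiler keeps it on the stack.
func onStack() int {
	v := 42
	return v
}

// onHeap: returning &v forces v to escape, so it is heap-allocated
// (via runtime.mallocgc) and later reclaimed by the GC.
func onHeap() *int {
	v := 42
	return &v
}

func main() {
	stackAllocs := testing.AllocsPerRun(1000, func() { _ = onStack() })
	heapAllocs := testing.AllocsPerRun(1000, func() { sink = onHeap() })
	// Typically prints "0 1": no allocation vs. one heap allocation per call.
	fmt.Println(stackAllocs, heapAllocs)
}
```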

How does one implement malloc in C? (Edit: ignore this. As pointed out in the responses, this is actually different, since GC calls are inserted by the compiler)

Unsafe code, direct syscalls, using only a subset of language features, and coupling to the compiler (both for generating data used by the GC as well as inserting calls into the runtime in appropriate places).

None of this would really be any different if implemented in C. C is clearly an unsafe language, the syscalls would still be there, as would the coupling to the compiler. The difference is that you have to have a fast way to call from Go into C. With Go this is unnecessary.

Of course if you're really curious, you can always check out the source :)

There is a pretty big difference between these two cases.

In C, calls to malloc() are explicit. You implement malloc() in C by not calling malloc().

In GC'd languages, the language runtime calls the garbage collector implicitly. So you need some more clever way of ensuring that these implicit calls do not occur. You also need to ensure that no garbage is created that will leak without a GC.
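
As a toy illustration of the explicit side of that contrast, here is a hedged sketch, in Go itself, of a bump-pointer arena: one big upfront allocation, then explicit "allocations" that generate no per-object garbage. The arena type and its names are invented for this example and have nothing to do with how runtime.mallocgc actually works:

```go
package main

import "fmt"

// arena is a toy bump allocator: grab one big buffer up front, then hand
// out slices of it by advancing an offset. Allocation is explicit, there
// is no free(), and no per-object garbage is created.
type arena struct {
	buf []byte
	off int
}

func newArena(size int) *arena {
	return &arena{buf: make([]byte, size)}
}

func (a *arena) alloc(n int) []byte {
	if a.off+n > len(a.buf) {
		return nil // out of memory: the caller must handle it
	}
	p := a.buf[a.off : a.off+n]
	a.off += n
	return p
}

func main() {
	a := newArena(1 << 10)
	b := a.alloc(16)
	copy(b, "hello")
	fmt.Println(string(b[:5]), a.off) // hello 16
}
```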

> How does one implement malloc in C?

This one's unrelated. C the language does not depend on malloc being present, whereas the GC is part of the language in Go.

The important thing for the grandparent comment is that Go is not interpreted, but compiled. Thus when the Go compiler (i) is being compiled by another Go compiler (ii), (i)'s own GC code is not utilised; the already-compiled (ii)'s is used. After that it's all machine code.

One of the signs of maturity in a language is when it reaches the point that it can implement itself. C wasn't written in C originally either.

I can't speak for Go specifically, but in some languages you can use a manually allocated subset of the language to write the GC.

> Does the GC use GC?

Think of the Go runtime as like the kernel of an operating system. It doesn't have to follow the rules of a user-land process.

It's not yet idiomatic or good Go code now, though, is it?

IIRC the initial state (1.5) was mostly machine translated code from C to Go.

The runtime is written by people, not machines.

The compilers and low-level toolchain libraries that encode machine instructions have been machine-translated.

The runtime is an example of very good low-level Go code that breaks safety rules. However, it is not a good example of user-level Go code. Note that even though the Go code in the runtime heavily uses unsafe, the runtime is much safer and more stable (we find fewer bugs) than when it was written in C.

The main thing is that they machine-translated the Go compiler itself from C to Go, so the compiler had not-so-great Go code. The rest of the runtime probably was already human-written in Go before that. And of course, anything which has been under development since then is human-written, like the new SSA-based compiler.

    The rest of the runtime probably was already human-written in Go

find -name '*.X' | xargs wc -l in src/runtime says:

Go 1.4:

    21701 .c  33348 .go  19160 .s

Go 1.7b2:

    3340 .c  77597 .go  33827 .s

The assembly growth seems to be attributable to the increased number of supported platforms.

You should exclude runtime/cgo, since it's not relevant.

There's also an increased number of optimized code paths, like for instance some crypto code in the TLS stack. The natural way to optimize Go code, for instance to benefit from advanced CPU instructions in modern architectures, is to fall back to assembly.

> since 1.5, the whole runtime is written in Go itself

The parser was written in Yacc (with C code generated) until version 1.6. I'm wondering if there are any other parts of Go yet to be converted to Go.

I have to wonder - when you're digging down into this level of complexity in order to discover issues with the language you're using, wouldn't something like C be better? IRC isn't a very hard protocol and you know the language won't be getting in your way if you're using C.

> wouldn't something like C be better?

Jim is an intermediate-level coder, by far the average guy that you are going to get. He writes his IRC server in C. It performs acceptably and can be scaled horizontally. There are a few threading bugs and exploits (buffer overflows etc.).

Sally is an advanced coder, it took a year of recruitment to find her. She also writes her server in C. It's blazingly fast. Virtually nobody else understands how it works. She's a human, so it's still littered with the same types of bugs that Jim's server has.

Jack is at the same level as Jim. He starts off in Go 1.4. While his server is nowhere near as fast as Sally's, it's much faster than Jim's. Race conditions and exploits tend toward zero. Everyone on the team can approach the code and maintain it.

Go 1.6 is installed on prod and suddenly Jack's server is only negligibly slower than Sally's. Jim notices this and has to spend a few weeks on his server to catch up. Sally is stuck debugging a race condition that occurs once a month. Jack is adding more emoticons, more features, and decommissioning servers in the cluster.

Edit: IRC is a simple problem and that begets a simple solution. While C may be significantly simpler than C++, Go requires far less cognitive effort: it is actually simpler than C.

"Simpler" obviously didn't happen here. They had to debug the garbage collector, which included looking at traces that went into the OS. They effectively debugged a Go and C mix.

I'm afraid the truth of these things is that if you try to squeeze maximum performance out of some of these more sophisticated languages, you have to be able to understand and debug the runtimes that come with them. Not that many people are up to that, which means the hurdle can be higher.

I'm not implying that C is objectively inferior. Quite the contrary: there are tools that suit a specific problem. Maybe Twitch did spend money tuning Go, maybe that was a waste. Point is: they got a free performance boost thanks to a language that prioritizes what you want to do instead of how you do it.

I'm a big believer in early performance optimization, premature if you will. Though, ultimately, what I can crack out in a day with C# is worth months of C. Iterate, even with languages. If, after profiling, you find that your expressive (as opposed to explicit) language is wanting, it's time to iterate into a different language. Get something competent out of the door; spend more money on it only if you have to. Right now our socket library consumes less than 1% of the CPU (GC and all); until everything else catches up there is no benefit to getting any closer to the metal.

I find it hard to believe Sally is so advanced if she writes code that others have difficulty reading.

It shouldn't take a year to find someone who writes unmaintainable soup.

If you had me write an IRC server that had to service the numbers that Twitch is talking about, I would use every single server coding practice I am aware of. Someone else comes along. Maybe his first refactor would be to remove the buffer pooling. Maybe he'd change a data structure specifically designed to prevent false sharing, cutting throughput.

Good code doesn't always mean approachable code, writing a decent socket server in C assumes a ton of advanced knowledge.

If Sally had to write, say, a C logging library it would be a masterpiece of simplicity. These days your code has an audience, and those audiences can vary quite greatly.

I consider myself an "advanced coder", whatever that means, and so I know that an IRC server is likely to be I/O bound. I also know to avoid C, especially multithreaded C. Not because I would be unable to write correct multithreaded C, but because I probably don't have to in order to solve the problem adequately.

My experience means I know to pick my battles.

Disclaimer: despite all my praise for Go, I don't actually use it for real problems. Would I like to use Go? Definitely! Is there a problem that I'm trying to solve that needs Go? Certainly not. In many cases I'd objectively pick Rust over Go, but when it comes to threading primitives, Go is somewhat unmatched.

> pick my battles

Exactly my point. A good coder will choose the tool that expresses the solution correctly. C is a very good choice, it always will be, there are sometimes better choices. Ultimately it seems as though we are in agreement; cheers!

Go's threading primitives are not unmatched. You can do the same in Rust [1], but the gold standard in my eyes is STM in Haskell which allows you to block on arbitrary conditions.

[1] https://doc.rust-lang.org/std/macro.select!.html#examples

OT: why is an IRC server I/O bound? That really piqued my interest.

I've never made an IRC server but I share a similar feel for I/O (Network) being the limiting factor. Have one user write a message and then you need to send out potentially hundreds of thousands of messages. (At twitch scale)

The actual CPU computation going on per event is minimal (process maybe a few kb of text), and if we're only dealing with text, probably not memory (capacity or throughput) bound and certainly not disk bound.
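
The fan-out shape described above might be sketched like this toy in-process model (invented names; a real server would be writing to client sockets rather than channels):

```go
package main

import "fmt"

// broadcast copies one incoming message to every subscriber, so the
// per-message cost is dominated by the number of subscribers (network
// writes, in a real server), not by CPU work on the message itself.
func broadcast(msg string, subs []chan string) {
	for _, ch := range subs {
		ch <- msg
	}
}

func main() {
	// Buffered channels stand in for per-connection write queues.
	subs := make([]chan string, 3)
	for i := range subs {
		subs[i] = make(chan string, 1)
	}
	broadcast("PRIVMSG #chan :hello", subs)
	for i, ch := range subs {
		fmt.Println("subscriber", i, "got:", <-ch)
	}
}
```

At Twitch scale the subscriber loop is the whole workload: one chat line in, potentially hundreds of thousands of socket writes out.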

Correct, fast, and maintainable C code is a very tall order. Good luck finding ANYONE who can do that consistently for a reasonable paycheck.

Huh, it's almost as if the foundations of all the devices we use on a daily basis are written in some language that apparently isn't possible to write correctly.

They are mostly reliable (though probably still full of security holes) despite C—made possible by the millions of person-hours thrown at the problem of maintaining large C codebases.

Back then, we had no choice. We can do better.

Essentially all but the most-used code paths will be bug-riddled. This paper [1] on finding bugs in C compilers by fuzzing is an interesting case study.

[1] http://lambda-the-ultimate.org/node/4241

Server programming isn't even that hard. You could do it in C all day. (Just have good sandboxing for when someone finds an exploit...)

Millions of people have been writing C for games, firmware, computer systems without Internet since the 80s on, and they didn't even get software updates. If your server crashes you get a coredump and you can update anytime you want.

s/isn't possible to write correctly/is difficult to write correctly cheaply in non-trivial applications/

and you pretty much have it. Building higher-level abstractions has historically been a good thing, unless your day job is punching in opcodes.

I was more poking fun at the hyperbole that it's not possible to hire anyone who writes C at a reasonable rate.

That's basically true.

Apparently it didn't have to be C or blazing fast.

I think this is why 'advanced' is used instead of 'better'. :)

I think that's true, but I don't like that it's true. What does that say about programming as a profession? Yuck.

Reading this causes me to experience déjà vu; years and years of reading stories and watching presentations about someone struggling with GC in the JVM. It's happening all over again with Go. The same 'discoveries', the same trade-offs, the same discussions about hardware resources, the same 'concurrent mark and sweep', the same 'more to do' conclusions. You could replace every occurrence of 'Go' with 'Java' and it would probably go undiscovered.

Maybe it's all worth it and this is how developers are supposed to spend their time, but it's no longer interesting to me.

It's because GC is a bad idea that's had 30 years of good research thrown after it. Advancing GC is building a faster horse instead of stepping back and building a car. It's time to move on, and I'm thrilled to see modern languages (Swift, Rust) abandoning it and focusing on building more intelligent compilers.

The goal shouldn't ever be to make the world's best GC, it should be to create the world's best way to elide lifetimes so that developers don't have to think about memory management. GC shouldn't be a goal, it's a technique for solving a problem, one of many that we should explore.

Seems like the opposite, surely? We should be developing languages that more succinctly address the problems we humans are trying to solve, not bookkeeping for the computer's hardware (that should be the computer's job!).

I think you missed the bit where he said "more intelligent compiler". The compiler is the bit that does the bookkeeping, only in Rust (and evidently Swift--I haven't played with it much) it's done statically, ahead of time rather than at runtime.

That said, I think Go is a much more practical language than Rust for most problems. Still, I'm very excited about Rust.

>I think you missed the bit where he said "more intelligent compiler"

Also known as "sufficiently smart compiler": http://c2.com/cgi/wiki?SufficientlySmartCompiler

These are different things. A sufficiently smart compiler is a hypothetical compiler that could theoretically optimize a high level language so that it could be faster than some low level language. This isn't what we're talking about here--we're talking about the concrete ability of the Rust compiler to preclude certain classes of errors.

Yeah, that's exactly what I was trying to say. Rust does all the bookkeeping at compile time, Swift keeps a lot of it at run-time although the compiler can easily optimize away a lot of lifecycle stuff too when it's in scope, so I assume it either does or will.

I agree that Rust likely does not have the be-all answer to automatic memory management, though what I love about it is that they're pushing the boundaries and getting people thinking differently about memory management.

> I agree that Rust likely does not have the be-all answer to automatic memory management, though what I love about it is that they're pushing the boundaries and getting people thinking differently about memory management.

Me too. I intend to use it for more of my hobby things, but Go is currently the best fit. Eventually I imagine Rust will pick up some decent GUI libraries or at least get decent editor support (vim-go is lightyears ahead of YCM+racer) and I'll be able to afford to justify using it more.

What he said is that the goal should be that developers need not handle memory manually. GC is one technique to achieve that goal, and the one that has been the most successful so far, however we should not equate automatic memory management and garbage collection: other techniques could offer an as good or even better experience if we took the time to explore, develop, and refine them.

GC is also required for persistent data structures which makes it a must have for languages where immutable data is a fundamental strategy for handling concurrency.

Thank you! Finally someone who talks sense.

> Reading this causes me to experience déjà vu; years and years of reading stories and watching presentations about someone struggling with GC in the JVM. It's happening all over again with Go.

It's because GC is an area full of tradeoffs, and despite popular belief, the HotSpot GC is really good. In fact, I honestly don't know of any way to improve on the HotSpot GC for general-purpose use (i.e. throughput/latency balancing). HotSpot has a generational, concurrent, compacting GC; allocation takes 4 or 5 instructions (really!); the compiler has SROA to aggressively optimize out allocations where unneeded.

Look up the Azul Systems pgc. That is more or less the holy grail, but it is patented out the wazoo, and is a commercial implementation only.

Also, the JRockit JVM (which was from BEA and was purchased by Oracle) is actually quite a bit faster than HotSpot and easier to introspect (look up JRockit Mission Control). I suspect eventually they'll merge, however.

> Look up the Azul Systems pgc. That is more or less the holy grail, but it is patented out the wazoo, and is a commercial implementation only.

That's what I was alluding to in the parenthetical. According to the paper, C4 trades off a significant amount of throughput for reduced latency. That's what you want for many applications, and C4 is a great advance for those apps, but throughput is very important for most workloads, so HotSpot's GC ends up yielding a good balance.

Touche, I clearly didn't fully understand your original statement. You're entirely correct in this case.

So we're finally reaching the point where batch and interactive jobs are clearly separated, because of the very different tradeoffs they need. http://www.winestockwebdesign.com/Essays/Eternal_Mainframe.h... indeed.

It's not perfect. Azul's C4 does a lot of work in read barriers, so code that looks intuitively like it should be fast can end up causing "read storms" that bog the code down.

C4 never pauses, and that's impressive. But there's no free lunch. The work the GC would do when the app is paused is sometimes being simply done by the app threads instead.

I heard the read storm problem has been solved by Shenandoah, a GC developed by Christine Flood (who was in the original GC G1 team at Sun in 2001). It is under the RedHat umbrella and should be merged in OpenJDK [1].

Shenandoah uses a forwarding pointer in each object, adding overhead but limiting the problem only to write barriers. Here is Christine commenting on Azul vs Shenandoah [2]

From the talk: average pause is 6-7ms, max is 15ms, and the talk is one year old.

She hints at further developments in a version 2 which would make it entirely pauseless.

She has made another talk at RedHat's DevNation conference a few days ago, but they just won't put the video online arg!

[1] http://openjdk.java.net/jeps/189

[2] https://youtu.be/4d9-FQZZoVA?t=13m11s

Does Java still have a word in every object allocation for locking it? Adding a 64-bit pointer sounds terribly inefficient.

Did you know Objective-C does locks and retain counting without allocating any extra fields in objects?

On HotSpot: there are two bits in the header of every object. This is enough for an object that's never been used as a contended lock; CAS operations on the header can be used to handle the locking and that's that. As soon as you actually block on it, a 'real' lock is created (you can't get around the need for a list of threads to wake up as the lock is released) and the header grows to accommodate it. The process is called 'monitor inflation'. At a later date this might be cleaned up by a 'monitor deflation'.
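
That uncontended fast path can be modeled very loosely in Go with a single CAS on a header word. This is an invented toy, not how HotSpot actually lays out object headers or inflates monitors:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Toy model of a "thin lock": two states packed into a header word,
// acquired with a single compare-and-swap when uncontended, with no
// heavyweight monitor allocated.
const (
	unlocked uint32 = 0
	locked   uint32 = 1
)

type header struct{ word uint32 }

func (h *header) tryLock() bool {
	return atomic.CompareAndSwapUint32(&h.word, unlocked, locked)
}

func (h *header) unlock() {
	atomic.StoreUint32(&h.word, unlocked)
}

func main() {
	var h header
	fmt.Println(h.tryLock()) // true: fast path, one CAS
	fmt.Println(h.tryLock()) // false: contended; a real JVM would "inflate" here
	h.unlock()
	fmt.Println(h.tryLock()) // true again
}
```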

There's a certain amount of work that has to be done for GC, and that work is going to be done somewhere. The question is just what trade-offs you want.

Don't want compacting? You'll pay for it in allocation.

Don't want pausing? You'll pay for it in application threads.

Precisely, and this is what is so often missed in these discussions. Most of the time, when you see claims of GC silver bullets, there's some hidden downside that's being papered over. Latency wins (i.e. "max pause time" or whatnot) tend to be throughput losses. Less copying results in more fragmentation. Value types can result in more copying, reducing performance over pointer indirections through nursery allocations. And so forth.

I don't know that anyone disputes this. The discussions I participate in don't deny this; they mostly talk about whether or not the tradeoffs result in a net gain (if you sacrifice a little from the minority of cases to gain the same amount in the majority of cases, you do indeed have a net gain).

> Value types can result in more copying, reducing performance over pointer indirections through nursery allocations

Having value types means you can pass by copy, but it also means you can allocate on the stack and pass by reference--in other words, you get performant passing without involving the GC.

Nearly all the work you allude to is done in other threads - which indeed consume machine resources (CPU cycles, memory bandwidth). If your application does not burn all cores/bandwidth then the GC work is all done on the idle/spare machine resources. At the limit though, indeed you'll have to slow down the Application so that GC can play catchup - and bulk/batch stop-the-world style GC's require less overhead than the background incremental GCs Cliff

> C4 never pauses

To my knowledge this is false. AFAIK while the C4 algorithm is pauseless the C4 implementation is not. It's just that the pauses are really short.

My understanding: C4 does not pause but the JVM still does for various other reasons and a part of the work on Zing has been forcing down those pause times too.

Sorta kinda all of the above. C4 the algo has no pauses, but individual threads stop to crawl their own stacks. i.e., threads stop doing mutator work, but only 1-by-1, and only for a very short partial self-stack crawl. C4 the impl I believe now has no pauses also. HotSpot the JVM has pauses, and yes much Zing was on forcing these pause times down. Cliff

Isn't it also because Java and/or having GC makes certain patterns easy although they should be hard? At least my limited experience with Java, from writing a system which dealt with millions of integers, is that Java really wants you to use ArrayList<Integer> instead of the GC-friendly int[].

The problem with Java is that most things are a pointer, which means the GC has to deal with it. Go on the other hand allows the user to specify which things should be a value and what should be a pointer, which significantly decreases stress on the GC.

C# has something called value types, and while this helps (and Java is working on implementing something similar for Java 10) it's not as flexible as Go, where users can decide this at whim instead of specifying it in the type.

Java doesn't "want" you to use ArrayList<Integer>, that's merely more convenient if you need a dynamically sized array and don't want to do the resize yourself.

But the JVM folks are adding support for ArrayList<int> to the language, with the efficiency you'd expect from it.

Value types would hopefully help the JVM's GC significantly once the SDK and popular libraries fully utilize them, both by reducing memory pressure and by making the heap more GC-friendly.

Fun stuff I've been doing with the H2O project is basically using nearly-pure Java (some Unsafe) to hold onto numbers with better efficiency than e.g. int[], and giving out an easy-enough-to-use API for writing parallel & distributed code over an Array-like API. i.e., feels "almost like an array", and "for-loops" run at memory bandwidth speeds and also parallel and distributed. Cliff

And the actual data is stored in giant byte[] (hidden behind the API), so the GC costs are near zero. Cliff

Are the number of garbage objects generated by idiomatic Go and idiomatic Java comparable?

My guess is Go implementation will produce an order of magnitude less garbage.

In Go, an array of structs (= objects) is just one object.

In Java, an array of objects is array object itself plus one object for each value in the array. Except for elementary types, like bool, int, long, etc.

I completely agree and with so much tuning required in past frameworks for their GC it makes me wonder why more don't simply adopt the C++ / Rust models of resource management.

I remember way back when people said you couldn't use the JVM for real-time applications because of the GC pauses, but it's been improved significantly since then, and now all the same topics are coming up with Go.

Because the C++/Rust way of memory management is better for some things, but worse for others. I've worked with several different projects during my career, and not once did we require manual memory management. A GC based language was simpler for us to use, and the few times we had problems with GC, they were possible to overcome by writing better code, as is the case with any language.

This is not to say that no project benefits heavily from C++/Rust. But I would argue that for many, GC is the best trade off.

I completely agree that explicit memory management (I wouldn't call it manual) in the C++/Rust way is a cognitive overhead you don't want for a great deal of the software work - perhaps most of it.

But there are definitely projects that require explicit memory management, and it's not just games and realtime software. Often high-performance backend code in Java and Go just ends up using object pools instead of reallocating objects, just as the OP described.

With Go specifically we've seen the rise of fasthttp, which just adds completely manual memory management in the 90's C++ fashion. Want to create a new request object?

    req := AcquireRequest()
    req.DoSomething()
    ReleaseRequest(req)

Compare to C++98:

    Request* req = new Request();
    req->DoSomething();
    delete req;

And now you're back at the same manual memory management problem modern C++ and Rust are striving to solve.

I'd argue that one of the biggest differences between golang and java is not technical but cultural. That is, the golang idioms and thus the std libs are quicker to reach for things like object pools and other performance "hacks". Even the std http library uses an arena in golang.

Similarly, high performance Java libraries like the Disruptor, SBE or Chronicle look very much like C code.

Personally, that doesn't bother me, as it allows you to write your hot path and your non-optimized path in the same language with the same tooling.

says the guy who has split JVMs across processes for performance and contemplated doing it per core

For what, I would assume, is a minor portion of your total lines of code.

I'm not sure where you got manual memory management from. I was strictly referring to RAII. Manual memory management is such a pain but the C++ and Rust ways are very similar with RAII.

I usually refer to everything that isn't GC as 'manual'. RAII is an abstraction on top of manual management, you still need to decide what type of pointer/lifetime the allocated object should have, making it manually managed, IMHO.

RAII of course deals with more than just memory, but in a thread about GC I assumed it was memory management you referred to.

>Maybe it's all worth it and this is how developers are supposed to spend their time, but it's no longer interesting to me.

Unless you design and implement GCs yourself, it's not supposed to be interesting to you anyway. It's just something that will benefit users of the language, not something to excite them.

If it's not supposed to be interesting, why do so many who find themselves running up against the limits of GC in their chosen language/platform/implementation end up writing epic blog posts that represent months or years of work, and/or presenting their hard-won solutions at conferences? There certainly seem to be a lot of people who end up having to be very interested in solving their GC problems once their systems grow to non-trivial size, and they all seem to be relearning and resolving the same set of problems.

>If it's not supposed to be interesting, why do so many that have found themselves running up against the limits of GC in their chosen language/platform/implementation end up writing epic blog posts that represent months or years of work and/or presenting their hard-won solutions at conferences?

Because they care about improving actual, existing, languages, with actual, existing, ecosystems, not doing cutting edge academic memory management research.

>and they all seem to be relearning and resolving the same set of problems.

So like architects relearn and resolve the same problems, about building bridges, skyscrapers, condos etc -- instead of designing some new structures to replace them?

Go is Java 2.0, but from Google instead of Sun. At least the syntax is a little nicer and less boilerplated.

I sincerely hope Go does not go in that direction. The 'less is more' approach is so far very strong among the Go steering committee.

The cruel irony here is that simplicity was a foundational goal and major rallying point for Java. Here's a website from 1997 describing it: http://www.cafeaulait.org/course/week1/16.html

Ignoring the last part that obsesses over the glory of OOP, replacing Java with Go in that page is... pretty spookily familiar!

Maybe it's a bit too early to judge but IMO Go has not introduced any new language complexity since its public launch.

> I sincerely hope Go does not go in that direction. The 'less is more' approach is so far very strong among the Go steering committee.

There is no "less is more" approach in Go. It's more like you can't write something really complex in Go, so people use it for trivial things like servers that do almost nothing aside from de-serializing JSON. Try writing a large LOB app in pure Go, or a fully featured CRM. And see if you can get away with "less is more" when you need to reason about complex business rules, data validation, complex routing, mapping RDBMS data to values, and what not. "Less is more" is a mirage. Go's shortcomings will show up pretty fast.

All you're really saying is that Go is not great for web apps. I admit, it is not. Neither is C. I would not write a CRM in either of these languages.

At codebeat (codebeat.co) we use Go for our backend - very CPU-heavy, complex static analysis workflows. Our frontend is in Rails which is not ideal but probably the best bang for the buck for an early stage startup. This is the beauty of having many tools to choose from.

> All you're really saying is that Go is not great for web apps. I admit, it is not. So is C.

C is over 40 years old, so it has an excuse. Go has none. The fact that it's extremely difficult to write a classic, complex webapp in Go is a proof that this language has serious flaws.

It is just as easy as doing it in Rust or Swift - both being thoroughly modern languages. It's more about the lack of comprehensive frameworks than the language being somehow flawed. I used Go as a backend for mobile apps and it was OK but where it really shines is the kind of workload we do when analysing source code: where you need excellent performance and low memory footprint, all that while keeping the code readable.

> It's more like you can't write something really complex in Go

Have you recently checked out the bigger projects that are currently being written in Go? You'd be surprised...

> Try write a large LOB app in pure Go or a fully featured CRM.

Woah. Have you tried doing that in C, C++ or Rust? Has anybody? Every language has its strengths and weaknesses. Sure it's possible to do so in them - but is it a good idea? Not necessarily. I'm not going to write a database engine in Python - but we have timeseries databases being written in Go.

> Go short comings will show up pretty fast.

Every language has shortcomings. Go's major perceived shortcoming is the classic "lack of generics", which arguably is true to some extent - but not its GC. The thing is, Go's strong points became clear long before the shortcomings you're talking about. The entire ops space jumped on it because it solved a few problems plaguing their tools: memory overhead, slowness, dependencies, being hard to make portable. Pretty much every major new project related to infrastructure is created in Go.

One of the biggest attractions of Go is its ability to create programs that perform a lot better than the same thing written in Ruby or Python, which then again allows developers to undertake more ambitious projects.

So does that mean Go will finally have generics somewhere around version 5?

Fingers crossed for the Hindley–Milner type system.

Even maybe algebraic data types? One can keep dreaming.

Then we might be in some alternate reality talking about how Twitch could never deliver a viable service because the developers kept creating segfaults. C is the last language I would choose in a race to a viable service.

>C is the last language I would choose in a race to a viable service.

And yet, the majority of services the world relies on everyday, from OSes and drivers, to Google search, NASA code, medical devices, etc are written in C/C++.

Too bad no one can deploy linux servers because of all the segfaults.

I'm flabbergasted that people are able to post to this thread at all, what with all the segfaults their C++ based browsers must be experiencing all the time.

I experienced quite a few kernel panics in the earlier days... Generally from poor drivers, but just the same.

At scale it may be more efficient to write the system in a higher level language (saving time) and then spending some time tuning only the slowest parts, instead of building everything from the start to be highly optimized, even the parts which may not need it (investing time where it may not produce results).

And the improvements to Go that they drove will help everyone.

I happen to prefer C, but I understand why they did it the way they did it.

Better in what way? Performance wise, perhaps but that's only one aspect of why someone might pick one language over another.

Even distilling better down to just the max throughput you can get for a solution in a language vs another is hard to do as a lot also depends on how the code ends up being written and how easy you want to be able to debug that solution. You can solve this stuff in C many ways with different performance characteristics.

In some ways it's accurate to think of Go as a more convenient version of C with modern facilities like automatic memory management, concurrency primitives and data structures (i.e. maps), with the minimal level of runtime scaffolding included to support them. Interop with C is very easy, and Go is miles away (stylistically) from some of the more esoteric and abstract languages that are used these days.

Interop with C is easy to code but deceptively expensive from a machine standpoint, due to correctness guarantees when you're switching from green threads to no-green-threads. It involves interacting with the go scheduler and possibly/probably blowing your cache and TLBs.

And the automatic memory management is great, but the above commenter was saying that if you're going to extreme lengths to work around the automatic memory management, maybe you needed a non-GC language in the first place.

Which ones though? How about Rust? To me, as a C programmer, Go looked like the closest one to my taste, but not compelling enough to switch/consider.

C is not a very nice language for concurrent programming.

If you want to hire 300 people to write reliable software in a language they don't know yet, Go is a good choice. You might also have like half a dozen people who are so deep into Go that they do the stuff in this post.

>Go is a good choice

Until you ask them to do stuff with channels -- where Go offers 100s of subtle ways to shoot yourself in the foot.

This may be anecdotal but Twitch is an example of a service that just bloody works. I've been a user for a while and I've yet to notice any service disruptions or issues. They were also one of the largest early adopters of EmberJS, pretty sure it was well before the 1.0 release when many bugs were still being worked out and the API suffered frequent changes, so hats off to the engineering team for continued awesome work.

Twitch has a fairly high number of outages, although not all affect video playback (eg API outages). Most recently the whole site was down for about an hour from EU due to a botched CDN setup. I have a status tracker that monitors from four locations, https://twitter.com/TwitchStatus

I'll always be a little bitter about their VODpocalypse and retroactively muting streams with copyrighted music.

Software should serve people and they eradicated countless memories/achievements, eliminated a priceless historical record.

I don't mean to diminish how untenable the previous situation was, and I'm sure I'm underestimating the difficulty/cost of what they ended up doing. I appreciate their work, engineering, and use the service regularly. But it's an "Our Incredible Journey" part of their story and I don't want to let them off the narrative hook for it. They made ~$1b on this content, after all.

Maybe if you have a fast internet connection. I'm on an HSDPA+ connection and Twitch is unusable, not even the VODs. Then again, YouTube and Vimeo are pretty much the only sites from which I can watch video streams smoothly.

Mind sending me details? I can be reached at tarrant@twitch.tv. Information that can help are things like who your ISP is, what your specific IP is, where you are located. Possibly a traceroute to live.twitch.tv.

We really do care about the QOS of users and are constantly working to improve service to users everywhere in the world. Sadly there are many constraints outside of our control that can cause bad service. The information our users are kind enough to provide to us can often help us identify problems and reduce issues.

I'm curious, have you tried viewing the streams or VODs through an alternate player?

I've found that I can almost always improve the smoothness of their content by using Livestreamer[0] to play it in VLC (or Kodi, more often)

[0]: http://docs.livestreamer.io

I use this on my netbook. Pretty much unwatchable otherwise

What's your location? I've found the same - I can stream HD on Youtube gaming without an issue but not anything better than Twitch Medium. Sometimes I need to downgrade to low or mobile quality to keep up. I figured it was because of my location (not EU or NA).

It doesn't have anything to do with the topic. It's quality of their network or quality of yours that is the main reason here. Not software.

The video service almost always works. The website has issues pretty regularly.

I've never successfully watched a video on Twitch. I get a black rectangle with playback controls, and when I press play, nothing happens. Disabling AdBlock doesn't help.

You have flash allowed on all domains without click to play?

I don't have Flash installed at all. Does the site require Flash?

If so, then that's news to me; other video sites that require Flash usually a) show a "you need Flash" message in the place where the video would show, and b) don't show playback controls, because they're part of the Flash component itself. Also, I never saw any mention of Flash in any of the site's help/troubleshooting/FAQ documents.

It requires Flash, but it uses Apple HLS.

I think Safari (+iOS) works without Flash, but everyone else is relegated to the Flash player.

Controls are in HTML5; just the actual video handler appears to be Flash.

In the past (probably 2? years ago) the entire player including controls was part of flash.

Good to know, thanks.

Since posting my last message, I looked at the documentation on the website again, and saw that it claimed that you could use the site on iOS and Android by just using an ordinary browser. So I tried visiting it in Safari on my iPhone, and the videos worked. Then I tried using Firefox on my Kindle Fire, and the videos didn't work. But they do have a dedicated Twitch app for the Kindle, so I downloaded that. So now I have a way of watching Twitch videos. :-)

The fact that Flash is not mentioned anywhere on the site as being a requirement seems like a glaring omission.

"ordinary browser" = Chrome for Android, latest version, Safari for Mac, Safari for iOS

Firefox does not support HLS in either desktop or mobile version. It requires Flash as a last-ditch fallback on any platform that doesn't support HLS, afaik.

So how does the GC performance of Go compare to something like Java/the Hotspot JVM?

The approaches to GC are difficult to compare, and Java offers a selection of garbage collectors. Overall, the Java collectors are very sophisticated and tuned over years, so in principle are excellent. The downside is that the Java language itself puts a lot of stress on the GC. The biggest problem is that Java offers no "value" types beyond the builtin int, double,... So everything else has to be allocated as a separate object and pointed to via references. The GC then has to trace all these references, which takes time. While a collection of the youngest generation in Java is extremely fast, a global GC can take quite some time.

Go on the other side has structs as values, so the memory layout is much easier for the GC. Go always performs full GCs, but since they mostly run in parallel with the application, a GC cycle only requires a stop-the-world phase of a few milliseconds (for multi-gigabyte heaps).

All these numbers of course depend a lot on what your application is doing, but overall Go seems to be doing very well with its newest iterations of the GC.

Another problem with Java is the inability to return multiple values. For that one often creates a wrapper object holding the results. The JVM can recognize this pattern and stack-allocate those wrapper objects, but it does not always happen, increasing GC pressure.

The lack of custom value types has ramifications not only for GC, but for cache behavior. Which is why there's serious work on custom value types for Java; it's the major feature planned for Java 10.

Of course, most of the old-gen GC work in G1 is also done in parallel with the application, too.

> Of course, most of the old-gen GC work in G1 is also done in parallel with the application, too.

Did you want to write concurrently? If so that would be wrong because evacuation can't be done concurrently with the application in G1, only initial marking.

I didn't say that all work is done concurrently with the app. How much work needs to be done in the STW phase is application-dependent. It is likely that if the application exhibits a transactional behavior, namely that objects are created in the beginning of a transaction and are all reclaimed at the end, there's very little compaction required, as entire regions are likely to be completely free.

Another strong point for the JVM is the availability of alternate implementations from several vendors. Is this possible for Go say in the future?

It is certainly possible. There are already two Go implementations: the official one, and a gcc-based one. And the fact that the whole Go implementation is available under a BSD license allows anyone to fork a custom Go implementation without any license worries.

You forgot GopherJS.

There's also llgo.

Many view this as an overall negative point, particularly for those who are tasked with running complex JVM applications without deep operational knowledge...

This observation has fed into the Go team's design philosophy; they're doing their best to minimize the "knobs" the GC has, because tuning them is inevitably a black art. As far as I know, there's still just one right now, GOGC, documented in the third paragraph of https://golang.org/pkg/runtime/ .

Yes, but HotSpot G1 is meant to be usable with only a single knob too (target pause time). Other knobs do exist, but only for unusual cases where you want to precisely control the GC's operation to work around some bad app/gc interaction, for instance. And Go lacking such knobs is probably not really a feature: it's not like the Java guys set out from the start saying "we will build a complicated and hard to configure GC". It's just that as you work through more and more customer cases, these knobs become valuable for the hard ones.

The point is not that lacking knobs is a feature. The point is that the designers are well aware of the issues and they are explicitly making it a goal that knobs should be unnecessary. (Especially since it has had some knobs off and on in the various versions, as mentioned in the article.)

This is in contrast to something we've probably all done at one point or another, which is just to add a checkbox to avoid having an argument about what the behavior should be. They're committing to having the argument out instead of "just adding knobs".

They also have a track record of, for better or worse, just refusing to add knobs and telling you to either do without or use a different language. If you've got an intensely GC-based workload, I'd consider using something other than Go. (However, bear in mind what may be an intensely GC-based workload in Java may not be in Go, since Go has value types already.)

HotSpot cares a lot about proper defaults too. I don't think that there's a significant philosophy difference between HotSpot and Go there. The philosophy difference is, as you say, that Go is opposed to adding configuration options, while HotSpot does have those options (per customers' requests).

I have seen how G1 is supposed to have this one flag, but I often get a bad feeling about G1 (without really using it much). It seems it reduces average GC pauses but performs really badly at the lower end of the (CMS) range. One thing that looked bad to me is that originally they believed it could completely ignore the generational hypothesis, and then had to bring it back after finding the performance bad. There are also other issues, like cross-region links that it doesn't handle well. It seems to me that they thought their regional idea was a silver bullet and are now tweaking it all over the place. It is a nice GC, probably, but I don't think it's really the GC to end all other GCs like Oracle wants it to be.

I get what they're doing but it's a false dichotomy and therefore wrong. The dichotomy being that it has to be either one ultra-simple GC or what Java was doing.


It's well-known after over a decade of research and deployments of GCs that certain styles match certain workloads better. So multiple ones should be available. This can be a small number that are largely pre-built with sane settings. What's left to tune can likewise be small: pause time, max memory, or whatever. There can also be a default, as in current Go, that covers 95% of apps well. The result is that specific apps or libraries, if they went that far, can have a GC well-suited to their requirements, with about one page of HTML describing what those GCs do and how to choose between them.

That's what they should do. It will be easy for them and developers. Nothing like the JVM mess. Still avoids one-size-fits-all: the longest-running, failed concept in IT. Meanwhile, I can't wait to see someone make a HW version of their GC like I've seen in LISP and RT-Java research. It would be badass given the current metrics. It would allow a whole OS to be memory-managed like A2 Bluebottle Oberon without a performance penalty.

Can hardware accelerated GC be generalized enough to make it useful? Isn't that what killed previous efforts?

Previous efforts got killed because the off-brand hardware, especially the CPUs, was never as fast and/or cheap as Intel/AMD. They also required new tooling and such most of the time. This happened to LISP machines and apparently Azul's Vegas, as they're pushing a SW solution these days. So, that's my guess.

Most general I saw was in a Scheme CPU where the designer put the GC in the memory subsystem. The Scheme CPU would just allocate and deallocate memory. The GC tracked what was still in use on its own in concurrent fashion. Like reference counting I think. Eventually, it would delete what wasn't needed. Pretty cool stuff.

I don't see how it can be a negative. The availability of multiple vendors has given us commercial solutions tuned for particular needs.

For example Azul's C4 garbage collector which they claim is pauseless: https://www.azul.com/resources/azul-technology/azul-c4-garba... ; a pauseless GC is great if you want to tackle real-time systems. For real-time systems actually most garbage collected platforms are unsuitable.

But even more problematic is that stop-the-world latency is directly proportional to the size of the heap memory and today's mainstream garbage collectors cannot cope with more than 4 GB of heap memory without introducing serious latency that's measured in seconds. Think about that for a second - with most GC implementations you cannot have a process that can use 20 GB of RAM, which is pretty cheap these days btw. So keeping a lot of data in memory, like databases are doing, is not feasible with a garbage collector.

> For example Azul's C4 garbage collector which they claim is pauseless: https://www.azul.com/resources/azul-technology/azul-c4-garba.... ; a pauseless GC is great if you want to tackle real-time systems. For real-time systems actually most garbage collected platforms are unsuitable.

As far as I can tell Azul's collector claims to be pauseless because they use the new x86 nested page tables (https://en.wikipedia.org/wiki/Second_Level_Address_Translati...) to implement a read barrier (interesting aside: this means it should be possible to implement a read barrier on CPUs without nested page tables by moving the GC into the kernel). Here is an interesting discussion: http://stackoverflow.com/questions/4491260/explanation-of-az...

That still does not mean that C4 is necessarily real-time. You have to take a fundamentally different approach to GC to guarantee real-time bounds (see these papers on the Metronome collector: http://researcher.watson.ibm.com/researcher/files/us-bacon/B... https://www.cs.purdue.edu/homes/hosking/690M/ft_gateway.cfm....) and that comes with a restriction that ties your program's allocation rate to the scheduling of the GC. I am still skeptical about this - it is easy to imagine coming up with an adversarial allocation pattern that breaks time bound guarantees because of some detail of the GC implementation, so both the algorithm and every implementation will need proofs.

> So keeping a lot of data in memory, like databases are doing, is not feasible with a garbage collector.

It is very feasible if you do not make garbage. Either mmap some memory that the GC won't touch or pre-allocate large arrays of primitive types.

You also have Excelsior[0], which provides full Java AOT (ahead-of-time) compilation to native machine code.

[0]: https://www.excelsiorjet.com/

Is the output code faster than HotSpot after it's been warmed up?

Eh, HotSpot can handle heaps of hundreds of gigabytes with pause times in the 100msec range. It takes a bit of tuning but can be done with the basic open source code.

What HotSpot are you talking about? I assume you aren't talking about the Serial, or the Parallel GC or about CMS, which are the older generation, but about G1, right?

Well, I have extensive experience with tuning G1. G1 is a good GC, capable of low latency incremental pauses.

The problem is that with a stressed process, at some point G1 still falls back to a full freeze-the-world mark-and-sweep. For 50 GB I've seen the pause last for over 2 minutes !!!

2 minutes is cute. If you stress a CMS setup hard enough that the young generation is completely full, it will allocate directly in the old generation. This of course screws the full gc heuristic totally, up to the point where the GC is started too late and you fully run out of memory. At which point the JVM drops down to a single threaded oldschool serial GC as last line of defense. On a 96GiB heap, that thing can take hours; all stuck 100% on a single cpu with even signal handling suspended. Fun times.

That said, for heaps above 32ish GiB, we still go with our tuned CMS settings and overcommit one or two additional memory modules. It's a lot cheaper than the time it takes trying to tune in G1 on a large heap with a lot of gc pressure.

Got any sources for that? I would be very interested in these tuning parameters and some explanation of what they do! :)

Cassandra works around this by pushing some of that responsibility onto the OS's disk-caching mechanism.

No it won't, I asked and got an answer from Brad Fitzpatrick:


The link you posted was about switchable GCs in the official Go runtime, which won't be there, but the question was whether there are multiple Go implementations.

Is there any data from production systems available that confirms this (the lack of value types) is an issue in most/many real-world applications? From the allocation profiles of the applications I have seen, most allocations in Java programs seem to be from strings, often in logging, or byte-array buffers. Value types would not help here, but compressed strings would.

A significant drawback of the HotSpot JVM is the amount of memory required for even simple apps. At least 64Mi for the most simple, and typically much higher. A typical web app with 1000 request threads will use something like 1.5Gi of memory (512Mi heap, 1Gi for thread stacks, classes, etc).

Golang apps tend to happily run with less than 100Mi, so are well suited as daemon processes that don't get in the way.

However if you need to support a large amount of dynamic state (> 1Gi), the hotspot GC is very difficult to beat.

Memory usage in Java can be misleading. Some versions of the JVM will happily take ALL your free RAM if it thinks it's sitting there unused because there's a RAM/CPU tradeoff in garbage collected systems: the less frequently you GC the less CPU time you burn and the faster the app runs.

If your machine actually does have gobs of free RAM, it therefore makes sense for Java to use all of it.

If your machine has gobs of free RAM you were planning on using for something else after your Java app started, well, that's something the JVM couldn't know. Some versions (on Windows?) monitor free memory and adjust down its own usage if you seem to be consuming the headroom, but on other platforms, you just have to tell Java it's got a limit and can't go beyond it.

Unless you specifically tell the JVM otherwise using the -Xmx flag.

See: https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrd...

Technically the HotSpot GC might do more work in the same amount of time, but Go's GC makes performance guarantees, like a <10ms STW phase, which HotSpot does not claim or offer for large heaps.

HotSpot does offer that. It's basic functionality that all incremental or concurrent garbage collectors offer. You can adjust the max pause time with -XX:MaxGCPauseMillis.

That's a target. (GC ergonomics)

It won't guarantee it; it just tries to size things (eden space, survivor spaces) and time things to meet its target.

But it's a fickle beast. And usually it requires a lot of tinkering with the code for it to be able to meet it. And then it's easier to disable ergonomics, set fixed sizes, and just enjoy how blazingly fast CMS is, restart the app every few weeks (CMS heap fragmentation), and try G1 with every new point release, maybe finally it beats CMS.

Yes, but that's because the Go GC doesn't compact, and nobody quotes throughput numbers. Building a slow GC that leaves the heap fragmented but has short pauses is not that difficult indeed, the tricky part starts when you wish to combine all these things together.

Of course Java has all technical bullet points checked and may be superior GC from strictly that point of view. But Go has 2 things from users' perspective upfront which Java lacks.

1. It uses about an order of magnitude less memory than Java.

2. It openly proclaims <10ms STW pauses for GC.

Go definitely does not use "an order of magnitude less memory than Java". That would mean that a Go program that uses 1GB of memory would need 10GB in Java.

Google wrote a paper comparing C++, Java, Scala, and Go and definitely did not find that (https://days2011.scala-lang.org/sites/days2011/files/ws3-1-H...).

I like Go and it has many wonderful qualities. Still, it's important to be realistic.

I think for "small-data" programs it does work out to about an order of magnitude of overhead in Java. I have ported several small Java programs to Go and I see it (like 100MB Java vs 8MB Go). One encryption program I coded multiple versions of ran 350k C vs. 1.3MB Go vs. 16MB Java.

Programs holding GBs of data in arrays would look much closer, though, I imagine, as the overhead would be dwarfed by the data itself.

Possible. But typically, idiomatic Java usage patterns with collection types have huge overheads. So unless the Java code is written in a specially memory-efficient way, that memory usage gap should remain.


Well depending on program. Here is what I see.


"When there's a big disparity between those Java and [Go] programs, it's mostly the default JVM memory allocation versus whatever [Go] needs."


I looked at this:


For me using Java 1.8 the number is around 1.5MB. Something similar in Go is 60KB

In my experience it does use less memory than Java. I had an application written in Java which I migrated to Go, and I see less memory usage.

The lack of compaction is not just a bullet point. Especially on large memory data-sets Go's GC will start to suffer and not just on the collection side of things but on the allocation side as well.

On the other hand, compaction is itself an extremely expensive operation as it means moving the blocks of memory (allocations) around in the heap.

Do you know of any real-world examples where the lack of compaction is impacting usage of Go?

> On the other hand, compaction is itself an extremely expensive operation as it means moving the blocks of memory (allocations) around in the heap.

That's why you don't do it often.

> Do you know of any real-world examples where the lack of compaction is impacting usage of Go ?

The biggest problem with all nongenerational GCs, including Go's, is lack of bump allocation in the nursery. You really want a two-space copying collector (or a single-space one) so that allocation in the nursery can be reduced to 3 or 4 machine instructions. By allowing fragmentation in the nursery, Go pays a heavy cost in allocation performance relative to HotSpot.

>That's why you don't do it often.

You need to do it when you become too fragmented (or suffer the same potentially poor allocation performance as Go); how often that happens largely depends on what the application is doing.

>including Go's, is lack of bump allocation in the nursery.

Yes, but as I recall this is in the future roadmap for consideration/attempt.

And again, as with everything, it's not a silver bullet: you accept the high cost of promotion (again, expensive moving of memory) in exchange for very fast allocations while the nursery isn't fragmented or full.

>Go pays a heavy cost in allocation performance relative to HotSpot.

But not the cost of compaction/promoting, which are also heavy when they need to be performed.

That said, I personally believe a bump allocator with generational copying will be a 'net win' if implemented in Go's GC, but all things considered I'd rather see some cold hard numbers confirming it.

> Yes, but as I recall this is in the future roadmap for consideration/attempt.

Not according to the transactional collector proposal. By not unconditionally tenuring young objects, it sacrifices one of the main benefits of generational GC: bump allocation in the nursery.

> And again, as with everything it's not a silver bullet, as you sacrifice the high cost of promotion (again expensive moving of memory) in order to have very fast allocations while the nursery isn't fragmented or full.

You're questioning the generational hypothesis. Generational GC was invented in 1984. Whether generational GC works is something we've had over 30 years to figure out, and the answer has consistently been a resounding "yes, it works, and works well".

> all things considered I'd rather see some cold hard numbers confirming it.

Again, we have over 30 years of experience. Generational GC is not some new research idea that we have to try to see if it works. The odds that things will be different in Go than in the myriad of other languages that preceded it are incredibly slim.

>Not according to the transactional collector proposal.

That's hardly the end-all of Go GC development; also, as I understand it, it's not even certain it will be used in a Go release, as that depends on it actually showing the benefits aren't just theoretical.

>Whether generational GC works is something we've had over 30 years to figure out, and the answer has consistently been a resounding "yes, it works, and works well".

This was not about whether generational GCs 'work' or not; it was whether they are the best solution for the typical workloads of Go applications.

Speaking as someone who works in Go full-time and really likes it: kind of the whole point of Go is that its workloads are very much like every other language's.

I figured the wide use of goroutines would be the cause for different choices in Go's GC compared to GCs for languages not sharing that characteristic.

From what I understand, the upcoming transactional collector is written directly with Go's goroutines in mind.

> From what I understand, the upcoming transactional collector is written directly with Go's goroutines in mind.

I'm pretty skeptical that it will produce wins over a traditional generational GC. At best the "transactional hypothesis" will roughly approximate the generational hypothesis, without the primary benefit that truly generational GC gives you, namely bump allocation in the nursery. Time will tell.

How do you mean? The 1.5 benchmarks showed Go cleaning up a 250GB heap in 5ms.

Your allocation times will go up as the heap gets fragmented and it becomes harder to find places to put new items. This is especially true of large interconnected data sets.

Further, depending on your data access patterns you can see data access start to degrade over time as well because the memory locality is worse.

GC benchmarks are great at showing how well one part of memory management is behaving (i.e., the deallocation step), but they don't say much about the other two parts: allocation and access.

That said, I use Go every day, and the GC improvements to date have been great, especially given the kinds of memory patterns lots of the services I write have (small, short-lived items that aren't really connected to each other). But there are definitely memory patterns where HotSpot will smoke the Go memory system, and that doesn't begin to describe something like Zing.

Zing is not magic. To start with, it needs heavily over-provisioned servers, with 64GB+ RAM recommended. And to have pauseless GC it needs additional contingency and pause-prevention memory pools on top of the -Xmx memory settings.

And still, I have heard one of the best ways to control GC in many trading systems where Zing might be popular is to just provision hundreds of GBs of heap memory and simply restart the server once the trading day is over.

When I was writing trading systems on JVMs, we were much more worried about allocation costs and memory access patterns than we were about GC. The former issues impacted the normal latency while the latter impacted the worst case. You needed to think about and deal with the worst case, but as you say, making a system that doesn't GC often is pretty straightforward.

Now that I'm writing high throughput systems in go I use many of the same techniques that I did writing low latency systems on the JVM (arena allocation, memory locality, etc). This is because the other 2 parts of memory management, allocation and access, continue to be major drivers of performance even though the deallocation step is fundamentally different.

That is to say, GC times are not the only thing that matters when it comes to memory management and it is a relatively straightforward tradeoff between deallocation and allocation that the current golang GC is making.

It would be nice to quantify what the impact of this is. Go is no worse in this regard than C++ (it even uses a fork of tcmalloc for allocation) and has support for value types so there is a lot less pointer chasing than in Java.

Not sure how it could be done but having some numbers on this would be great.

Cool, thanks for the information. :)

I do not disagree with that, but unless some real-life example is shown of Go vs. Java GC, I am inclined to believe the advantage is mostly theoretical.

binary-trees is Hans Boehm's benchmark of garbage collection performance (mostly throughput). Bump allocation in the nursery ends up being king here, and it's reflected in the numbers: https://benchmarksgame.alioth.debian.org/u64q/performance.ph...

From Hans Boehm's java program --

"The results are only really meaningful together with a specification of how much memory was used. It is possible to trade memory for better time performance. This benchmark should be run in a 32 MB heap, though we don't currently know how to enforce that uniformly."

For some years, the benchmarks game did show an alternative task where memory was limited -- but only for half-a-dozen language implementations.

Figuring out an appropriate max heap size for each program was too hands-on trial and error.

I am not sure how often people implement binary trees as their core/dominating business logic in the apps that Go and Java mostly target.

Header comment from Hans Boehm's original test program --

"This is no substitute for real applications. No actual application is likely to behave in exactly this way. However, this benchmark was designed to be more representative of real applications than other Java GC benchmarks of which we are aware."

Most allocations in real-world apps are nursery allocations (that's the generational hypothesis after all), so the speed of nursery allocations, which is what results in the throughput differential here, very much matters in practice.

> 2. It openly proclaims <10ms STW pauses for GC.

Java's GCs make no concrete claims because they scale from tiny to very large heaps with vastly different object populations and root set sizes.

Some java applications run with 100GB heaps or on 128-core NUMA machines with lots of threads.

10ms pause times are achievable with "modest" heap sizes (~single-digit GBs) if you have some cores to spare for a concurrent collector to do its work, well, concurrently.

If you don't have enough spare CPU time or have a larger heap or a workload without enough breathing room then it would be silly to make such guarantees.

Of course they could easily write "<10ms STW pauses. sometimes. read the fine print"

Posted earlier without the random hash in the URL https://news.ycombinator.com/item?id=12040349

That's interesting, but it would be even more interesting if the article contained some info about heap sizes, memory utilization and the number of CPU cores.

I think that not using Erlang in this particular case was a mistake. Erlang is running some of the largest chats out there, including League of Legends and WhatsApp. They would have avoided all the hassle of GC pauses, since Erlang has per-process GC. And scaling a single OS process to their number of connections was done on Erlang machines years ago.

Hi there, I'm one of the original engineers who worked on our re-implementation of chat which ended up in Go.

We have a culture of being willing to try new things at Twitch. When our twisted-python chat system no longer met our needs of being easy to iterate on, we decided to rebuild it; it was a monolith, and we decided to chunk it up to reflect the needs of our users and the pace at which we could develop new features. Notably, we wanted to not recycle TCP connections whenever a new feature was added (which was a shortcoming of the twisted-python solution, along with a bunch of global state that was becoming hard to reason about). As part of this rework we had a pub-sub portion which was super simple, and we decided to try out this exciting new language with a lot of promise on it - it worked amazingly well. Over the course of another year or so we ended up rebuilding all of the components in Go.

When we first evaluated rebuilding chat we assessed a few options:

- python

- nodejs (we started with this, but random crashes and poor tooling at the time didn't work for us)

- erlang (notably could we use ejabberd as the hub of the system)

Ultimately we chose python because we knew python and we needed this to work right now. The move to go happened incrementally thereafter and was driven by:

- increase in trust

- great tooling

None of this can be pitched as "Go vs. X"; it was purely a tools-and-expediency-oriented set of decisions.

> Notably we wanted to not recycle TCP connections whenever a new feature was added

So with the Go server, you're able to redeploy without closing open connections? Do you just run multiple versions in parallel and load balance over to the new version once connections close, or something else?

There are actually two (or more) different services. One that sits and talks to the users via TCP and maintains the IRC connection state and then makes back end calls to the bit that makes decisions and publishes information.

This allows us to almost never deploy changes to the first service, while frequently making changes to the second system. Of course when you do want to make changes to the first you have to reestablish all the TCP connections again, but if you engineer it correctly you can do it infrequently enough to be worthwhile.

Disclaimer: I don't actually work on the chat team, this is based upon various conversations with people on the chat team and may be incorrect in some specifics or out of date.

Yes, Dobbs captures this here. To be clear, the first re-write of the chat service was from twisted python into tornado python. In that re-write we produced a number of services which implemented the biz logic. One of those services was a TCP terminating edge server which has very little logic in it beside how to call the biz logic and send messages to connected clients. Once this was all written we converted to Go incrementally.

> We've a culture of being willing to try new things at Twitch

How is that different from NIH syndrome?

How about if you have a culture of being willing to try new things not invented here? That would be quite different from NIH syndrome.

being willing to try new things != trying things because they're new

Finding an Erlang programmer available on site within 1-2 months is the hardest part of deciding to go with Erlang. With Go, you can take a C++/Python programmer and have them writing production code pretty soon. I think this is what inhibits functional programming in general: the learning curve, bundled with the amount of work around it, prevents people from jumping on board. Also, the unwillingness of some employers to hire someone without a ton of experience with Erlang makes it difficult for a senior programmer to switch.

Truthfully, I think the supply of programmers with functional programming skills far outstrips the demand. I use FP languages on personal projects (most recently Clojure) all the time, but the day job is still programming in Java at a shop that is all Java, all the time.

Every time I suggest bringing a language like Scala or Clojure into the mix (where they would provide real benefits over Java), I always get the "And where will we find programmers to maintain the code you write?" line from management. The answer, of course, is that there are likely legions of programmers like me, who hack around with FP languages in their spare time but whose only 'professional' experience is in mainstream languages.

I suspect the real reason is that most management is just too risk-averse to consider using technology that isn't mainstream.

That's changing now, thankfully. Most people at the company I work for use Clojure every day.

Why don't you tell them that?

Two Erlang shops I know of never had problems finding Erlang devs. Competent devs will pick it up if they are interested in it even if they haven't done it full time before.

I did that to some extent.

> i think this is what inhibits functional programming in general

Yes, functional aspect was the harder part to learn. It wasn't the syntax, which is what most people mention.

The other part that is hard is learning to use the concurrency construct -- actors. But Go has the same problem: solving problems with goroutines and channels is just as much of an impedance mismatch as using actors.

Actually, the hardest part of deciding to go with Erlang is not finding a programmer within 1-2 months, but deciding that one will need to train the programmer themselves (which obviously takes time; it took me half a year dabbling with Erlang and OTP (totally on my own) to actually start writing idiomatic code).

The trickiest bit about Erlang is not really FP - you can figure that out decently well in short order - it's OTP and processes and how all of that fits together, and how to do a good job with architecting everything.

Finding people should not be too hard. We were able to find a guy in Padova, Italy, who dove in and got started without much trouble as I was leaving.

>> With go you can take a C++/python programmer and have them writing production code pretty soon

In my experience, learning Go (and by that I mean fully grasping the ways of Go: goroutines, channels, selects, interfaces, type switches, etc.) takes at least a year for someone whose background is C/Python/Ruby. Then again, maybe it's just me.

I think for everyone this number is different. Depends on your background, years of experience, knowledge about computer science (not how to write in specific language) etc.

I am a polyglot (I write in different languages), and it took me a couple of weeks to master Go.

Read "The Go Programming Language" book; it's really well written and it touches on everything you need to know about Go (or most of it).

It totally took a year to really internalize Go-isms. But a week or so to pick up the basics, and some really deep code reviews and a bit of pair programming from people I respect, including the author of this article, helped me quickly learn most of the low-hanging Go-isms and a large portion of the standard library.

The time to go from a Python dev who has never touched Go to working on a Go code base is measured in weeks, in my experience.

It took me a good 3 months of fighting with the language before it started to click for me, and then another 3 months to really start using it properly. It sucks to always read things like "I was able to write production code on the first day!" around here.

I think that depends on the mindset of how you see computer languages. I like learning new languages, every time I encounter some new language, I have to try it out and get at least something basic working.

I'm pretty decent at C, C++, Python and Bash scripting, have participated in larger projects in Java, Perl, Pascal/Delphi and Ruby, and have toyed around with Rust, Haskell, Clojure, Angelscript, Crystal, Lua and probably a bunch more that I forget.

Go for me was a breeze; everything just clicked. It helps that it got a lot of its inspiration from other languages I already knew pretty well. When I started toying around with Haskell, for example, this wasn't the case: it took me quite a while to get up and running with the basics, and I still don't think I know basic Haskell. Go, on the other hand, was easy, and within a week I was diving into the stdlib source code.

I've been programming Go for almost 2 years and I routinely get stuck trying to figure out the "right" way to do something in the language.

For reference, I felt comfortable in Java, Scala, C# and Perl all faster than Go.

Possibly in the past, yes. But if they're now just paying 1ms in GC every so often, the advantage is now gone. Go is generally faster than Erlang (in Go-native vs. Erlang-native code) so the system is quite possibly net outperforming what Erlang can do now. 1ms is just noise when packet latency jitter is higher than that.

That is not the point. The advantage of Erlang is not raw speed, but the sheer amount of language constructs helping you write distributed system without thinking too much about low-level stuff.

If I had to do some parallel data crunching, I would probably use Go or something similar. To write an actual system, it's much easier just to stand on the shoulders of the Erlang guys instead of developing everything by hand (i.e., the whole supervision tree).

Well, there's one less thing for Go programmers to worry about now.

The problem with being a language advocate is that when "competing" languages improve, you start thinking it's bad news for your side. I try to avoid that. If Go improves, it's good news for everyone, if only because we all benefit from stronger competition.

I do not feel like a language advocate. Go is a great language for a number of applications, and its growing popularity shows it. I haven't yet had time to tinker with it myself, but I certainly intend to.

My point was that the strength of Erlang/OTP was never in processing power/speed, but in a runtime designed so that it actually solves most problems regarding distribution. Go, as far as I understand it, was created with a different goal in mind: to enable fast and parallel processing. That does not make it better or worse, just different. What I'm saying is that solving the garbage collection issue (and only partially, at that) is not what makes it competitive in comparison to Erlang, because Erlang was designed with a totally different goal in mind.

The advantage of language constructs helping you, but the disadvantage of finding devs to actually build with them.

I learnt Elixir myself, on the job. It's certainly doable. Actually, if I ever had to hire developers, I wouldn't care at all about their previous language experiences. The only thing you achieve asking for n years experience in y is discouraging talented people who worked on something else.

The thing is, you choose the right tech for the job and then reserve some time to bring everybody up to speed. From my own experience, that is much cheaper than trying to use already-known tech in ways it was not designed to work.

> The advantage of language constructs helping you, but the disadvantage of finding devs to actually build with them.

The flip side of that coin is you're liable to get someone reinventing the wheel - poorly - in whatever language doesn't have all those goodies.

Yep, Greenspun's tenth rule comes to mind.

In the Erlang world, we call that Virding's rule:

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."

Edit: I'll add, though, that the Go people are pretty smart and seem like they're doing good things, so I wouldn't be too complacent in thinking Erlang is the only game in town. It still does get some things right that are hard to replicate in Go, though.

I would emphasize that, in many of those examples, Go wasn't a very viable choice when the apps were originally written. Twitch chose Go back at ~1.2 (2013), when Erlang might have made more sense.

Today, for companies making a similar decision now, that argument is a bit different. Go 1.6/1.7 obviously has massive improvements in the areas the article outlines. But, in Erlang camp, we have Elixir making that more enticing.

I would argue Twitch made the right choice. They will have a far easier time finding devs to support a Go system than an Erlang system. And their product never suffered for it. And they are clearly a force behind making Go better, which has helped more people than just them.

This propagates a myth that the choice of language is the bottleneck in a complex program. Twitch hires engineers who are good enough not to be constrained by the difficulties of learning a particular language.

Odds are, Twitch wasn't trying to optimize over a long time horizon when they chose Go. They were once a scrappy startup, surely accumulating technical debt left and right to get product features out. Go was likely a locally optimal choice.

Twitch chat also requires heavy string processing, and that's an arena where, if I had to guess, Go has an edge over Erlang.

It's not really heavy string processing, it's all just replacing inner strings (not even Regex).

Sure. I think we're talking about different things when we say heavy. Perhaps less ambiguous phrasing would have been "frequent string manipulation."

There are likely other tradeoffs. This (GC pause times) is probably not the only criterion, nor even the most important one. It's really hard to draw a conclusion based on such limited information.

Go 1.6's GC is probably faster than Erlang's GC now.

OK, having posted something defending Go in this thread, now let me exasperate everyone by going the other way. Because Erlang's GC is per Erlang-process whereas Go's GC is still OS-process-global, there really isn't a "faster than/slower than" comparison available, because their workloads are so dissimilar. When an Erlang GC runs, it may be running across a mere few hundred or thousand bytes, freezing only that one Erlang-process that was quite likely not running at the moment anyhow. Erlang also has the GC-time advantage that it doesn't have pointers, so there's no pointer-fixup penalty. (It may be a disadvantage at other times, but it's certainly an advantage at GC time.)

Golang's more recent async GC changes begin to resemble Erlang's per-process GC in how they would affect overall system performance.

When people talk about Go's GC freezes, they're talking about the spinup/spindown time before the async GC kicks in. That part is incomparable to Erlang, but it's a part which has gotten much faster recently, specifically by virtue of becoming smaller.

> Golang's more recent async GC changes begin to resemble Erlang's per-process GC in how they would affect overall system performance.

They resemble generational GC more than anything. Generational GC has some of the advantages of Erlang (though I think the traditional HotSpot generational GC will end up working better than the one Go is going with) in the minor collections, but not in the major collections.

Go is adding to per-process heaps (there will still be a global heap).

Further complicating things are large binaries, which are handled in yet another way in Erlang. That might be an issue for some situations, and not at all a concern in others.

Do you have any benchmarks that running small GC collection ( per process ) vs one big Heap is faster?

Define "faster".

I actually have experience that looping over all Erlang processes and running a GC on each of them is definitely human-clock-time orders of magnitude slower than a Go garbage collection across a similar set of data. But who cares? First of all, that was a bit of a desperation play on my part anyhow, run for diagnostic purposes in the REPL, not an operation you do all the time, and secondly, only one process at time was frozen then anyhow, so I didn't care that it took about 10 seconds. It didn't take my service down.

Which was my point in the first place, that "faster" and "slower" don't really apply here, because what they're doing is so different from each other. There's too many different possible definitions of faster. And you have to be careful to use one that matters to your code, not just an artificial benchmark that shows your preferred choice in the better light.

(For those who may be curious, the problem that led me to that play was some now long-fixed issues with large binaries.)

Also important is the fact that short-lived processes usually don't ever need a GC on their heap. When they are finished, they just free the full heap. For a web service this is very useful.

Stop the world vs stop one country while the rest of the world carries on.

According to the article they chose it because of "Its simplicity, safety, performance, and readability" perhaps it has more in some of those than Erlang does/did... ?

Isn't Facebook's chat also powered by Erlang?

The Erlang components of Facebook Chat were replaced with C++ several years ago.

The two main reasons I remember are A) it was hard to maintain a group of engineers with acceptable competency in Erlang over time and B) the C++ code offered faster and more consistent performance albeit with somewhat less scalability in terms of sessions per host. We just added X% more servers to the channel pools and were happy to have the chat services in a language where more FB engineers could contribute. There's been a lot more changes in the architecture than just moving to C++ though, so it's hard to do a direct comparison between the products.

This doesn't take anything away from WhatsApp though, who has built a strong product and infrastructure on top of Erlang.

It definitely was when it originally launched. After that I have no idea.

I'm glad they did. Sounds like they have helped push the development of Go along which is good for everyone.
