Also, Go is a much clearer and more strongly typed language, so Go certainly is a much nicer implementation language than C. (If I thought C was better suited, I would be using C in the first place...)
I'm sure it's very good that as much of the runtime as possible is written in Go, but I think people are being too optimistic when they hope it will empower people who aren't already skilled in compilers or garbage collection to contribute.
Having everything written in Go also means you are dealing with just one compiler instead of two, as you would if you mixed Go and C code.
Nobody is going to turn up in the Go IRC room saying that they have a great idea for reducing pause times by improving the work-stealing between concurrent markers with a better lock-free queue algorithm, except, d'oh, they don't know C.
I get your point about just one compiler though - fewer moving parts is good.
I think it is similar to how Oracle was trying to build a JVM in Java, known as Maxine. Now, JVM contributors or potential contributors would know C++, but per the OpenJDK website one motivation was to leverage the amazing Java tooling to write its own VM.
I just noticed Oracle seems to have removed references to the Maxine VM from the Oracle website and the OpenJDK website. It seems that project is no longer active.
From the link:
> Graal is a dynamic compiler written in Java that integrates with the HotSpot JVM
I am not sure if this is the Maxine VM, which I thought was something analogous to the HotSpot JVM. Or maybe Maxine was similar in scope to the link you have given, and not an experimental (or otherwise) replacement for the HotSpot JVM.
Maxine is still alive, though I think the only people maintaining it are academics.
BTW: C is a much more mature ecosystem than that of Go.
You, sir, have clearly never attempted to write portable C or Go code. Writing portable C code takes serious effort. It's not hard if you know what you have to pay attention to - but 90% of portability taken care of for free? That's simply not true, unless you think being portable means "it runs on a POSIX system".
Writing portable Go code, in most cases, requires you to do nothing or make only slight changes to your code, and cross-compiling is the easiest I've ever encountered.
Like what? Go's unsafe code is free to do almost anything (except things like array bounds checks, which are always enforced).
Whereas in Go this is very easy.
I have yet to see one that led to better garbage collection.
I think you do yourself a disservice to discount that.
I'll happily tell you about places in the go runtime where we could use some smarter memory fencing instructions to build faster lock-free queues on x86_64.
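For a flavor of what that kind of code involves, here is a toy lock-free (Treiber) stack built on sync/atomic's compare-and-swap. This is purely my own sketch, not the runtime's actual work-stealing queues; note that Go's GC conveniently sidesteps the ABA problem here, since a node's address can't be reused while someone still holds it:

    package main

    import (
        "fmt"
        "sync/atomic"
        "unsafe"
    )

    type node struct {
        value int
        next  unsafe.Pointer // *node
    }

    type stack struct {
        head unsafe.Pointer // *node
    }

    func (s *stack) push(v int) {
        n := &node{value: v}
        for {
            old := atomic.LoadPointer(&s.head)
            n.next = old
            // CAS provides the required ordering; no explicit fence needed.
            if atomic.CompareAndSwapPointer(&s.head, old, unsafe.Pointer(n)) {
                return
            }
        }
    }

    func (s *stack) pop() (int, bool) {
        for {
            old := atomic.LoadPointer(&s.head)
            if old == nil {
                return 0, false // stack empty
            }
            n := (*node)(old)
            if atomic.CompareAndSwapPointer(&s.head, old, n.next) {
                return n.value, true
            }
        }
    }

    func main() {
        var s stack
        s.push(1)
        s.push(2)
        v, _ := s.pop()
        fmt.Println(v) // 2
    }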
I also don't write C. (Well, I'm trying to write a patch to libgit2 right now, but really, the operative word there is "trying": it's just highlighting it all the more clearly: I don't know C.)
I learned about the memory fence instructions while doing concurrent programming in java. The notion that C is the only bridge we can cross -- or the right bridge to cross -- to get to assembly (or any other abstraction layers necessary for high performance engineering) is absurd. We can put it to rest now.
You shouldn't, but historically and statistically GC experts and compiler experts are also C experts. And it's not like "Go is written in Go" is going to change that (we've had languages written in themselves for half a century, and still most compilers are written in C/C++).
Go runtime code reads like `x := g.b(a.C)` and you have to do quite a bit of manual cross-referencing of variables and identifiers to even get a vague idea of what is going on. It obviously somehow works for them.
I've been hearing 'C is dead' for fifteen years now, but it hasn't gone anywhere.
Go is a language that does not expose stack vs. heap allocation as a primitive to the programmer. Memory is allocated in the best place possible, preferably on the stack, but if that is not possible it is allocated on the heap. But in the runtime we need to control the generation of garbage, so runtime code is compiled with a switch that forbids implicit heap allocations (code won't compile if it requires transparent heap allocations). Heap memory is allocated by calling a function like runtime.mallocgc. However, this memory is garbage collected just like everything else (e.g. there is no free).
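You can actually watch the compiler make these stack-vs-heap decisions with the escape-analysis flag; a tiny illustrative example (the file and function names are mine, not from the runtime):

    // Build with: go build -gcflags='-m' escape.go
    // The compiler reports which values escape to the heap.
    package main

    type point struct{ x, y int }

    // Stays on the stack: the value never outlives the call.
    func sum() int {
        p := point{1, 2}
        return p.x + p.y
    }

    // Escapes to the heap: the pointer outlives the call,
    // so the -m output reports "moved to heap: p".
    func leak() *point {
        p := point{1, 2}
        return &p
    }

    func main() {
        _ = sum()
        _ = leak()
    }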
Unsafe code, direct syscalls, using only a subset of language features, and coupling to the compiler (both for generating data used by the GC as well as inserting calls into the runtime in appropriate places).
None of this would really be any different if implemented in C. C is clearly an unsafe language, the syscalls would still be there, as would the coupling to the compiler. The difference is that you have to have a fast way to call from Go into C. With Go this is unnecessary.
Of course if you're really curious, you can always check out the source :)
In C, calls to malloc() are explicit. You implement malloc() in C by not calling malloc().
In GC'd languages, the language runtime calls the garbage collector implicitly. So you need some more clever way of ensuring that these implicit calls do not occur. You also need to ensure that no garbage is created that will leak without a GC.
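To make the "implicit calls" point concrete, here is a small Go snippet (my own example) in which every marked line can heap-allocate without any visible malloc:

    package main

    import "fmt"

    func main() {
        b := []byte("hello")

        s := string(b)                 // copies b into a newly allocated string
        var i interface{} = 42         // boxing a value into an interface may allocate
        xs := append([]int{}, 1, 2, 3) // growing a slice allocates a backing array

        fmt.Println(s, i, xs)
    }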
This one is unrelated. C the language does not depend on malloc being present, whereas in Go the GC is part of the language.
The important thing for the grandparent comment is that Go is not interpreted, but compiled. Thus when a Go compiler (i) is being compiled by another Go compiler (ii), (i)'s own GC code is not utilised; the already-compiled one from (ii) is. After that it's all machine code.
Think of the Go runtime as like the kernel of an operating system. It doesn't have to follow the rules of a user-land process.
IIRC the initial state (1.5) was mostly machine translated code from C to Go.
The compilers and low-level toolchain libraries that encode machine instructions have been machine-translated.
The runtime is an example of very good low-level Go code that breaks safety rules. However, it is not a good example of user-level Go code. Note that even though the Go code in the runtime heavily uses unsafe, the runtime is much safer and more stable (we find fewer bugs) than when it was written in C.
The rest of the runtime probably was already human-written in Go
21701 .c 33348 .go 19160 .s
3340 .c 77597 .go 33827 .s
The parser was written in Yacc (with C code generated) until version 1.6. I'm wondering if there are any other parts of Go yet to be converted to Go.
Jim is an intermediate-level coder, by and large the average guy that you are going to get. He writes his IRC server in C. It performs acceptably and can be scaled horizontally. There are a few threading bugs and exploits (buffer overflows, etc.).
Sally is an advanced coder, it took a year of recruitment to find her. She also writes her server in C. It's blazingly fast. Virtually nobody else understands how it works. She's a human, so it's still littered with the same types of bugs that Jim's server has.
Jack is at the same level as Jim. He starts off in Go 1.4. While his server is nowhere near as fast as Sally's, it's much faster than Jim's. Race conditions and exploits tend toward zero. Everyone on the team can approach the code and maintain it.
Go 1.6 is installed on prod and suddenly Jack's server is now negligibly slower than Sally's. Jim notices this and has to spend a few weeks on his to catch it up. Sally is stuck debugging a race condition that occurs once a month. Jack is adding more emoticons, more features and decommissioning servers in the cluster.
Edit: IRC is a simple problem and that begets a simple solution. While C may be significantly simpler than C++, Go requires far less cognitive effort: it is actually simpler than C.
I'm afraid the truth of these things is that if you try to squeeze maximum performance out of some of these more sophisticated languages, you have to be able to understand and debug the runtimes that come with them. Not that many people are up to that, which means the hurdle can be higher.
I'm a big believer in early performance optimization, premature if you will. Though, ultimately, what I can crack out in a day with C# is worth months of C. Iterate, even with languages. If, after profiling, you find that your expressive (as opposed to explicit) language is wanting it's time to iterate into a different language. Get something competent out of the door, spend more money on it only if you have to. Right now our socket library consumes less than 1% of the CPU (GC and all), until everything else catches up there is no benefit to getting any closer to the metal.
It shouldn't take a year to find someone who writes unmaintainable soup.
Good code doesn't always mean approachable code, writing a decent socket server in C assumes a ton of advanced knowledge.
If Sally had to write, say, a C logging library it would be a masterpiece of simplicity. These days your code has an audience, and those audiences can vary quite greatly.
My experience means I know to pick my battles.
> pick my battles
Exactly my point. A good coder will choose the tool that expresses the solution correctly. C is a very good choice, it always will be, there are sometimes better choices. Ultimately it seems as though we are in agreement; cheers!
The actual CPU computation going on per event is minimal (process maybe a few kb of text), and if we're only dealing with text, probably not memory (capacity or throughput) bound and certainly not disk bound.
Back then, we had no choice. We can do better.
Millions of people have been writing C for games, firmware, computer systems without Internet since the 80s on, and they didn't even get software updates. If your server crashes you get a coredump and you can update anytime you want.
and you pretty much have it. Building higher-level abstractions has historically been a good thing, unless your day job is punching in opcodes.
Maybe it's all worth it and this is how developers are supposed to spend their time, but it's no longer interesting to me.
The goal shouldn't ever be to make the world's best GC, it should be to create the world's best way to elide lifetimes so that developers don't have to think about memory management. GC shouldn't be a goal, it's a technique for solving a problem, one of many that we should explore.
I think Go is a much more practical language than Rust for most problems. That said, I'm still very excited about Rust.
Also known as "sufficiently smart compiler":
I agree that Rust likely does not have the be-all answer to automatic memory management, though what I love about it is that they're pushing the boundaries and getting people thinking differently about memory management.
Me too. I intend to use it for more of my hobby things, but Go is currently the best fit. Eventually I imagine Rust will pick up some decent GUI libraries or at least get decent editor support (vim-go is lightyears ahead of YCM+racer) and I'll be able to afford to justify using it more.
It's because GC is an area full of tradeoffs, and despite popular belief, the HotSpot GC is really good. In fact, I honestly don't know of any way to improve on the HotSpot GC for general-purpose use (i.e. throughput/latency balancing). HotSpot has a generational, concurrent, compacting GC; allocation takes 4 or 5 instructions (really!); the compiler has SROA to aggressively optimize out allocations where unneeded.
Also, the JRockit JVM (which was from BEA and was purchased by Oracle) is actually quite a bit faster than HotSpot and easier to introspect (look up JRockit Mission Control). I suspect they'll eventually merge, however.
That's what I was alluding to in the parenthetical. According to the paper, C4 trades off a significant amount of throughput for reduced latency. That's what you want for many applications, and C4 is a great advance for those apps, but throughput is very important for most workloads, so HotSpot's GC ends up yielding a good balance.
C4 never pauses, and that's impressive. But there's no free lunch. The work the GC would do when the app is paused is sometimes being simply done by the app threads instead.
Shenandoah uses a forwarding pointer in each object, adding overhead but limiting the problem only to write barriers. Here is Christine commenting on Azul vs Shenandoah 
From the talk: average pause is 6-7ms, max is 15ms, and the talk is one year old.
She hints at further developments in a version 2 which would make it entirely pauseless.
She gave another talk at Red Hat's DevNation conference a few days ago, but they just won't put the video online, argh!
Did you know Objective-C does locks and retain counting without allocating any extra fields in objects?
Don't want compacting? You'll pay for it in allocation.
Don't want pausing? You'll pay for it in application threads.
> Value types can result in more copying, reducing performance over pointer indirections through nursery allocations
Having value types means you can pass by copy, but it also means you can allocate on the stack and pass by reference--in other words, you get performant passing without involving the GC.
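A minimal sketch of that in Go (the Matrix type is made up for illustration):

    package main

    import "fmt"

    type Matrix struct {
        cells [16]float64 // a plain value type: no header, no indirection
    }

    // Pass by pointer: no copy is made, and if the compiler proves the
    // value doesn't escape, m can live entirely on the caller's stack.
    func scale(m *Matrix, f float64) {
        for i := range m.cells {
            m.cells[i] *= f
        }
    }

    func main() {
        var m Matrix // a stack-allocated value; the GC never sees it
        m.cells[0] = 1
        scale(&m, 2)
        fmt.Println(m.cells[0]) // 2
    }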
To my knowledge this is false. AFAIK while the C4 algorithm is pauseless the C4 implementation is not. It's just that the pauses are really short.
C# has something called value types, and while this helps (and Java is working on implementing something similar for Java 10) it's not as flexible as Go, where users can decide this at whim instead of specifying it in the type.
But the JVM folks are adding support for ArrayList<int> to the language, with the efficiency you'd expect from it.
My guess is Go implementation will produce an order of magnitude less garbage.
In Go, an array of structs (= objects) is just one object.
In Java, an array of objects is the array object itself plus one object for each value in the array. Except for primitive types, like boolean, int, long, etc.
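Sketched in Go, the difference between the two layouts looks like this (Point is a made-up example type):

    package main

    import "fmt"

    type Point struct{ X, Y int }

    func main() {
        // One contiguous allocation holding a million Points:
        // a single object from the GC's point of view.
        flat := make([]Point, 1000000)

        // Java-style layout: one slice plus a million separately
        // allocated Points, each of which the GC must trace.
        boxed := make([]*Point, 1000000)
        for i := range boxed {
            boxed[i] = &Point{}
        }

        fmt.Println(len(flat), len(boxed))
    }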
I remember way back when people said you couldn't use the JVM for real-time applications because of the GC pauses, but it's been improved significantly since then, and now all the same topics are coming up with Go.
This is not to say that no project benefits heavily from C++/Rust. But I would argue that for many, GC is the best tradeoff.
But there are definitely projects that require explicit memory management, and it's not just games and realtime software. Often high-performance backend code in Java and Go just end up using object pools instead of reallocating objects, just as the OP described.
With Go specifically we've seen the rise of fasthttp, which just adds completely manual memory management in the '90s C++ fashion. Want to create a new request object?
req := AcquireRequest()
Compare to C++98:
Request* req = new Request();
And now you're back at the same manual memory management problem modern C++ and Rust are striving to solve.
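Under the hood this Acquire/Release pattern is typically built on sync.Pool; here is a minimal sketch of the idea, not fasthttp's actual implementation:

    package main

    import (
        "fmt"
        "sync"
    )

    type Request struct {
        Body []byte
    }

    var requestPool = sync.Pool{
        New: func() interface{} { return new(Request) },
    }

    // AcquireRequest hands out a recycled object instead of allocating.
    func AcquireRequest() *Request {
        return requestPool.Get().(*Request)
    }

    // ReleaseRequest must be called when done: forget it and you allocate
    // anyway; call it twice and two users can end up sharing one object.
    func ReleaseRequest(req *Request) {
        req.Body = req.Body[:0] // reset state before reuse
        requestPool.Put(req)
    }

    func main() {
        req := AcquireRequest()
        req.Body = append(req.Body, "hello"...)
        fmt.Println(string(req.Body))
        ReleaseRequest(req)
    }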
Similarly, high performance Java libraries like the Disruptor, SBE or Chronicle look very much like C code.
Personally, that doesn't bother me, as it allows you to write your hot path and your non-optimized path in the same language with the same tooling.
says the guy who has split JVMs across processes for performance and contemplated doing it per core
RAII of course deals with more than just memory, but in a thread about GC I assumed it was memory management you referred to.
Unless you design and implement GCs yourself, it's not supposed to be interesting to you anyway. It's just something that will benefit users of the language, not something to excite them.
Because they care about improving actual, existing, languages, with actual, existing, ecosystems, not doing cutting edge academic memory management research.
>and they all seem to be relearning and resolving the same set of problems.
So like architects relearn and resolve the same problems, about building bridges, skyscrapers, condos etc -- instead of designing some new structures to replace them?
Ignoring the last part that obsesses over the glory of OOP, replacing Java with Go in that page is... pretty spookily familiar!
There is no "less is more" approach in Go. It's more like you can't write something really complex in Go so people use it for trivial things like servers that do almost nothing aside from de-serializing JSON. Try write a large LOB app in pure Go or a fully featured CRM. And see if you can get away with "less is more" when you need to reason about complex business rules, data validation, complex routing, mapping RDBMS data to values, and what not. "less is more" is a mirage. Go short comings will show up pretty fast.
At codebeat (codebeat.co) we use Go for our backend - very CPU-heavy, complex static analysis workflows. Our frontend is in Rails which is not ideal but probably the best bang for the buck for an early stage startup. This is the beauty of having many tools to choose from.
C is over 40 years old, so it has an excuse. Go has none. The fact that it's extremely difficult to write a classic, complex webapp in Go is proof that this language has serious flaws.
Have you recently checked out the bigger projects that are currently being written in Go? You'd be surprised...
> Try writing a large LOB app in pure Go, or a fully featured CRM.
Woah. Have you tried doing that in C, C++ or Rust? Has anybody? Every language has its strengths and weaknesses. Sure, it's possible to do so in them - but is it a good idea? Not necessarily. I'm not going to write a database engine in Python - but we have timeseries databases being written in Go.
> Go's shortcomings will show up pretty fast.
Every language has shortcomings. Go's major perceived shortcoming is the classic "lack of generics", which arguably is true to some extent - but not its GC. The thing is, Go's strong points became clear long before the shortcomings you're talking about. The entire ops space jumped on it because it solved a few problems plaguing their tools: memory overhead, slowness, dependencies, and being hard to make portable. Pretty much every major new project related to infrastructure is created in Go.
One of the biggest attractions of Go is its ability to create programs that perform a lot better than the same thing written in Ruby or Python, which in turn allows developers to undertake more ambitious projects.
And yet, the majority of services the world relies on everyday, from OSes and drivers, to Google search, NASA code, medical devices, etc are written in C/C++.
And the improvements to Go that they drove will help everyone.
I happen to prefer C, but I understand why they did it the way they did it.
Even distilling "better" down to just the maximum throughput you can get for a solution in one language versus another is hard to do, as a lot also depends on how the code ends up being written and how easy you want it to be to debug that solution. You can solve this stuff in C in many ways, with different performance characteristics.
And the automatic memory management is great, but the above commenter was saying that if you're going to extreme lengths to work around the automatic memory management, maybe you needed a non-GC language in the first place.
Until you ask them to do stuff with channels -- where Go offers 100s of subtle ways to shoot yourself in the foot.
Software should serve people and they eradicated countless memories/achievements, eliminated a priceless historical record.
I don't mean to diminish how untenable the previous situation was, and I'm sure I'm underestimating the difficulty/cost of what they ended up doing. I appreciate their work, engineering, and use the service regularly. But it's an "Our Incredible Journey" part of their story and I don't want to let them off the narrative hook for it. They made ~$1b on this content, after all.
We really do care about the QoS our users experience and are constantly working to improve service to users everywhere in the world. Sadly, there are many constraints outside of our control that can cause bad service. The information our users are kind enough to provide to us can often help us identify problems and reduce issues.
I've found that I can almost always improve the smoothness of their content by using Livestreamer to play it in VLC (or Kodi, more often)
If so, then that's news to me; other video sites that require Flash usually a) show a "you need Flash" message in the place where the video would show, and b) don't show playback controls, because they're part of the Flash component itself. Also, I never saw any mention of Flash in any of the site's help/troubleshooting/FAQ documents.
I think Safari (+iOS) works without Flash, but everyone else is relegated to the Flash player.
Controls are in HTML5; just the actual video handling appears to be Flash.
In the past (probably ~2 years ago) the entire player, including controls, was part of Flash.
Since posting my last message, I looked at the documentation on the website again, and saw that it claimed that you could use the site on iOS and Android by just using an ordinary browser. So I tried visiting it in Safari on my iPhone, and the videos worked. Then I tried using Firefox on my Kindle Fire, and the videos didn't work. But they do have a dedicated Twitch app for the Kindle, so I downloaded that. So now I have a way of watching Twitch videos. :-)
The fact that Flash is not mentioned anywhere on the site as being a requirement seems like a glaring omission.
Firefox does not support HLS in either desktop or mobile version. It requires Flash as a last-ditch fallback on any platform that doesn't support HLS, afaik.
Go, on the other hand, has structs as values, so the memory layout is much easier for the GC. Go always performs full GCs, but mostly running in parallel with the application; a GC cycle only requires a stop-the-world phase of a few milliseconds (for multi-gigabyte heaps).
All these numbers of course depend a lot on what your application is doing, but overall Go seems to be doing very well with its newest iterations of the GC.
Of course, most of the old-gen GC work in G1 is also done in parallel with the application, too.
Did you mean to write "concurrently"? If so, that would be wrong, because evacuation can't be done concurrently with the application in G1; only the initial marking can.
This is in contrast to something we've probably all done at one point or another, which is just to add a checkbox to avoid having an argument about what the behavior should be. They're committing to having the argument out instead of "just adding knobs".
They also have a track record of, for better or worse, just refusing to add knobs and telling you to either do without or use a different language. If you've got an intensely GC-based workload, I'd consider using something other than Go. (However, bear in mind what may be an intensely GC-based workload in Java may not be in Go, since Go has value types already.)
It's well-known after over a decade of research and deployments in GC's that certain styles match certain workloads better. So, multiple ones should be available. This can be a small number that are largely pre-built with sane settings. What's left to tune can likewise be small: pause time, max memory, or whatever. There can also be a default as in current Go that covers 95% of apps well. The result is that specific apps or libraries if they went that far can have GC well-suited to their requirements with about one page of HTML describing what those GC's do and how to choose them.
That's what they should do. It will be easy for them and for developers. Nothing like the JVM mess. It still avoids one-size-fits-all: the longest-running failed concept in IT. Meanwhile, I can't wait to see someone make a HW version of their GC like I've seen in LISP and RT-Java research. It would be badass given the current metrics. It would allow a whole OS to be memory-managed, like A2 Bluebottle Oberon, without the performance penalty.
The most general one I saw was in a Scheme CPU, where the designer put the GC in the memory subsystem. The Scheme CPU would just allocate and deallocate memory. The GC tracked what was still in use on its own, in concurrent fashion - like reference counting, I think. Eventually, it would delete what wasn't needed. Pretty cool stuff.
For example, Azul's C4 garbage collector, which they claim is pauseless: https://www.azul.com/resources/azul-technology/azul-c4-garba... ; a pauseless GC is great if you want to tackle real-time systems. Actually, most garbage-collected platforms are unsuitable for real-time systems.
But even more problematic is that stop-the-world latency is directly proportional to the size of the heap memory and today's mainstream garbage collectors cannot cope with more than 4 GB of heap memory without introducing serious latency that's measured in seconds. Think about that for a second - with most GC implementations you cannot have a process that can use 20 GB of RAM, which is pretty cheap these days btw. So keeping a lot of data in memory, like databases are doing, is not feasible with a garbage collector.
As far as I can tell Azul's collector claims to be pauseless because they use the new x86 nested page tables (https://en.wikipedia.org/wiki/Second_Level_Address_Translati...) to implement a read barrier (interesting aside: this means it should be possible to implement a read barrier on CPUs without nested page tables by moving the GC into the kernel). Here is an interesting discussion: http://stackoverflow.com/questions/4491260/explanation-of-az...
That still does not mean that C4 is necessarily real-time. You have to take a fundamentally different approach to GC to guarantee real-time bounds (see these papers on the Metronome collector: http://researcher.watson.ibm.com/researcher/files/us-bacon/B... https://www.cs.purdue.edu/homes/hosking/690M/ft_gateway.cfm....) and that comes with a restriction that ties your program's allocation rate to the scheduling of the GC. I am still skeptical about this - it is easy to imagine coming up with an adversarial allocation pattern that breaks time bound guarantees because of some detail of the GC implementation, so both the algorithm and every implementation will need proofs.
> So keeping a lot of data in memory, like databases are doing, is not feasible with a garbage collector.
It is very feasible if you do not make garbage. Either mmap some memory that the GC won't touch or pre-allocate large arrays of primitive types.
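A sketch of the pre-allocation approach: keep the data inside one big pointer-free array so the GC has essentially nothing to trace (all names here are made up for illustration):

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // A crude arena: fixed-size records packed into one []byte.
    // To the GC this is a single pointer-free object, so marking it
    // is essentially free no matter how many records it holds.
    const slotSize = 16

    type arena struct {
        buf []byte
    }

    func newArena(slots int) *arena {
        return &arena{buf: make([]byte, slots*slotSize)}
    }

    func (a *arena) put(i int, key, val uint64) {
        off := i * slotSize
        binary.LittleEndian.PutUint64(a.buf[off:], key)
        binary.LittleEndian.PutUint64(a.buf[off+8:], val)
    }

    func (a *arena) get(i int) (key, val uint64) {
        off := i * slotSize
        return binary.LittleEndian.Uint64(a.buf[off:]),
            binary.LittleEndian.Uint64(a.buf[off+8:])
    }

    func main() {
        a := newArena(1 << 20) // ~16 MB held in a single allocation
        a.put(42, 7, 99)
        k, v := a.get(42)
        fmt.Println(k, v)
    }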
Well, I have extensive experience with tuning G1. G1 is a good GC, capable of low latency incremental pauses.
The problem is that with a stressed process, at some point G1 still falls back to a full stop-the-world mark-and-sweep. For a 50 GB heap I've seen the pause last for over 2 minutes!
That said, for heaps above 32ish GiB, we still go with our tuned CMS settings and overcommit one or two additional memory modules. It's a lot cheaper than the time it takes trying to tune in G1 on a large heap with a lot of gc pressure.
Golang apps tend to happily run with less than 100Mi, so are well suited as daemon processes that don't get in the way.
However if you need to support a large amount of dynamic state (> 1Gi), the hotspot GC is very difficult to beat.
If your machine actually does have gobs of free RAM, it therefore makes sense for Java to use all of it.
If your machine has gobs of free RAM you were planning on using for something else after your Java app started, well, that's something the JVM couldn't know. Some versions (on Windows?) monitor free memory and adjust down its own usage if you seem to be consuming the headroom, but on other platforms, you just have to tell Java it's got a limit and can't go beyond it.
It won't guarantee it; it just tries to size things (eden space, survivor spaces) and time things to meet its target.
But it's a fickle beast. And usually it requires a lot of tinkering with the code for it to be able to meet it. And then it's easier to disable ergonomics, set fixed sizes, and just enjoy how blazingly fast CMS is, restart the app every few weeks (CMS heap fragmentation), and try G1 with every new point release, maybe finally it beats CMS.
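For reference, the knobs being discussed look roughly like this (the heap sizes and pause target are made-up example values):

    # Ergonomics: give the collector a pause-time target and let it
    # size the spaces itself.
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -jar app.jar

    # The "disable ergonomics" route: fixed heap, CMS, periodic restarts.
    java -Xms32g -Xmx32g -XX:+UseConcMarkSweepGC -jar app.jar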
1. It uses about an order of magnitude less memory than Java.
2. It openly proclaims <10ms STW pauses for GC.
Google wrote a paper comparing C++, Java, Scala, and Go and definitely did not find that (https://days2011.scala-lang.org/sites/days2011/files/ws3-1-H...).
I like Go and it has many wonderful qualities. Still, it's important to be realistic.
Programs holding GBs of data in arrays would look much closer, though, I imagine, as the overhead would be dwarfed by the data itself.
For me, using Java 1.8, the number is around 1.5MB. Something similar in Go is 60KB.
Do you know of any real-world examples where the lack of compaction is impacting usage of Go?
That's why you don't do it often.
> Do you know of any real-world examples where the lack of compaction is impacting usage of Go?
The biggest problem with all nongenerational GCs, including Go's, is lack of bump allocation in the nursery. You really want a two-space copying collector (or a single-space one) so that allocation in the nursery can be reduced to 3 or 4 machine instructions. By allowing fragmentation in the nursery, Go pays a heavy cost in allocation performance relative to HotSpot.
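For anyone unfamiliar, bump allocation is about as simple as an allocator can get, which is where those 3 or 4 instructions come from. A toy Go sketch of the concept (not HotSpot's actual code):

    package main

    import "fmt"

    // Toy bump allocator: an allocation is just a bounds check
    // plus incrementing an offset.
    type bumpArena struct {
        buf []byte
        off int
    }

    func (a *bumpArena) alloc(n int) []byte {
        if a.off+n > len(a.buf) {
            return nil // nursery full: time for a minor collection
        }
        p := a.buf[a.off : a.off+n]
        a.off += n // the "bump"
        return p
    }

    // After a copying collection evacuates the survivors, the whole
    // nursery is reclaimed with a single store.
    func (a *bumpArena) reset() { a.off = 0 }

    func main() {
        a := &bumpArena{buf: make([]byte, 1<<20)}
        b := a.alloc(64)
        fmt.Println(len(b), a.off) // 64 64
        a.reset()
    }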
You need to do it when you become too fragmented (or suffer the same potentially poor allocation performance as Go), how often that happens largely depends on what the application is doing.
>including Go's, is lack of bump allocation in the nursery.
Yes, but as I recall this is in the future roadmap for consideration/attempt.
And again, as with everything, it's not a silver bullet: you pay the high cost of promotion (again, expensive moving of memory) in order to have very fast allocations while the nursery isn't fragmented or full.
>Go pays a heavy cost in allocation performance relative to HotSpot.
But not the cost of compaction/promoting, which are also heavy when they need to be performed.
That said, I personally believe a bump allocator with generational copying will be a 'net win' if implemented in Go's GC, but all things considered I'd rather see some cold hard numbers confirming it.
Not according to the transactional collector proposal. By not unconditionally tenuring young objects, it sacrifices one of the main benefits of generational GC: bump allocation in the nursery.
> And again, as with everything, it's not a silver bullet: you pay the high cost of promotion (again, expensive moving of memory) in order to have very fast allocations while the nursery isn't fragmented or full.
You're questioning the generational hypothesis. Generational GC was invented in 1984. Whether generational GC works is something we've had over 30 years to figure out, and the answer has consistently been a resounding "yes, it works, and works well".
> all things considered I'd rather see some cold hard numbers confirming it.
Again, we have over 30 years of experience. Generational GC is not some new research idea that we have to try to see if it works. The odds that things will be different in Go than in the myriad of other languages that preceded it are incredibly slim.
That's hardly the be-all and end-all of Go GC development; also, as I understand it, it's not even certain it will be used in a Go release, as that depends on it actually showing that the benefits aren't just theoretical.
>Whether generational GC works is something we've had over 30 years to figure out, and the answer has consistently been a resounding "yes, it works, and works well".
This was not about generational GCs "working" or not; it was about whether they are the best solution for the typical workloads of Go applications.
From what I understand, the upcoming transactional collector is written directly with Go's goroutines in mind.
I'm pretty skeptical that it will produce wins over a traditional generational GC. At best the "transactional hypothesis" will roughly approximate the generational hypothesis, without the primary benefit that truly generational GC gives you, namely bump allocation in the nursery. Time will tell.
Further, depending on your data access patterns you can see data access start to degrade over time as well because the memory locality is worse.
GC benchmarks are great at showing how well one part of memory management is behaving (i.e. the deallocation step), but they don't do much for talking about the other two parts: allocation and access.
That said, I use Go every day, and the GC improvements to date have been great, especially given the kinds of memory patterns lots of the services I write have (small, short-lived items that aren't really connected to each other). But there are definitely memory patterns where HotSpot will smoke the Go memory system, and that doesn't begin to describe something like Zing.
And still, I have heard that one of the best ways to control GC in many trading systems (where Zing might be popular) is to just provision hundreds of GBs of heap memory and simply restart the server once the trading day is over.
Now that I'm writing high throughput systems in go I use many of the same techniques that I did writing low latency systems on the JVM (arena allocation, memory locality, etc). This is because the other 2 parts of memory management, allocation and access, continue to be major drivers of performance even though the deallocation step is fundamentally different.
That is to say, GC times are not the only thing that matters when it comes to memory management and it is a relatively straightforward tradeoff between deallocation and allocation that the current golang GC is making.
Not sure how it could be done but having some numbers on this would be great.
"The results are only really meaningful together with a specification of how much memory was used. It is possible to trade memory for better time performance. This benchmark should be run in a 32 MB heap, though we don't currently know how to enforce that uniformly."
For some years, the benchmarks game did show an alternative task where memory was limited -- but only for half-a-dozen language implementations.
Figuring out an appropriate max heap size for each program was too hands-on trial and error.
"This is no substitute for real applications. No actual application is likely to behave in exactly this way. However, this benchmark was designed to be more representative of real applications than other Java GC benchmarks of which we are aware."
Java's GCs make no concrete claims because they scale from tiny to very large heaps with vastly different object populations and root set sizes.
Some java applications run with 100GB heaps or on 128-core NUMA machines with lots of threads.
10ms pause times are achievable with "modest" heap sizes (~single-digit GBs) if you have some cores to spare for a concurrent collector to do its work, well, concurrently.
If you don't have enough spare CPU time or have a larger heap or a workload without enough breathing room then it would be silly to make such guarantees.
Of course they could easily write "<10ms STW pauses. sometimes. read the fine print"
We've a culture of being willing to try new things at Twitch. When our twisted-python chat system no longer met our need to be easy to iterate on, we decided to rebuild it; it was a monolith, and we decided to chunk it up to reflect the needs of our users and the pace at which we could develop new features. Notably, we wanted to stop recycling TCP connections whenever a new feature was added (a shortcoming of the twisted-python solution, along with a bunch of global state that was becoming hard to reason about). As part of this rework we had a pub-sub portion which was super simple, and we decided to try this exciting new language with a lot of promise on it - it worked amazingly well. Over the course of another year or so we ended up rebuilding all of the components in Go.
When we first evaluated rebuilding chat we assessed a few options:
- nodejs (we started with this, but random crashes and poor tooling at the time didn't work for us)
- erlang (notably could we use ejabberd as the hub of the system)
Ultimately we chose python because we knew python and we needed this to work right now. The move to Go happened incrementally thereafter and was driven by:
- increase in trust
- great tooling
None of this can be pitched as "Go vs X", it is purely a tools and expediency orientated set of decisions.
So with the Go server, you're able to redeploy without closing open connections? Do you just run multiple versions in parallel and load balance over to the new version once connections close, or something else?
This allows us to almost never deploy changes to the first service, while frequently making changes to the second system. Of course when you do want to make changes to the first you have to reestablish all the TCP connections again, but if you engineer it correctly you can do it infrequently enough to be worthwhile.
Disclaimer: I don't actually work on the chat team, this is based upon various conversations with people on the chat team and may be incorrect in some specifics or out of date.
How is that different from NIH syndrome?
Every time I suggest bringing a language like Scala or Clojure into the mix (where they would provide real benefits over Java), I always get the "And where will we find programmers to maintain the code you write?" line from management. The answer, of course, is that there are likely legions of programmers like me, who hack around with FP languages in their spare time but whose only 'professional' experience is in mainstream languages.
I suspect the real reason is that most management is just too risk-averse to consider using technology that isn't mainstream.
I did that to some extent.
> i think this is what inhibits functional programming in general
Yes, the functional aspect was the harder part to learn. It wasn't the syntax, which is what most people mention.
The other part that is hard is learning to use the concurrency construct - actors. But Go has the same problem: solving problems with goroutines and channels is just as much of an impedance mismatch as using actors.
Finding people should not be too hard. We were able to find a guy in Padova, Italy, who dove in and got started without much trouble as I was leaving.
In my experience, learning Go (and by that I mean fully grasping the ways of Go: goroutines, channels, selects, interfaces, type switches, etc.) takes at least a year for someone whose background is C/Python/Ruby. Then again, maybe it's just me.
I am a polyglot (I write in different languages) and it took me a couple of weeks to master Go.
Read "The Go programming language" book, it's really well written and it touches everything you need to know about Go (or most).
The time to go from Python dev who has never touched Go to working on a Go code base is measured in weeks, in my experience.
I'm pretty decent at C, C++, Python and Bash scripting, have participated in larger projects in Java, Perl, Pascal/Delphi and Ruby, and have toyed around with Rust, Haskell, Clojure, Angelscript, Crystal, Lua and probably a bunch more that I forget.
Go for me was a breeze; everything just clicked. It helps that it got a lot of its inspiration from other languages I already knew pretty well. When I started toying around with Haskell, for example, this wasn't the case: it took me quite a while to get up & running with the basics, and I still don't think I know basic Haskell. Go, on the other hand, was easy, and within a week I was diving into the stdlib source code.
For reference, I felt comfortable in Java, Scala, C# and Perl all faster than Go.
If I had to do some parallel data crunching, I would probably use Go or something similar. To write an actual system, it's much easier just to stand on the shoulders of the Erlang guys instead of developing everything by hand (i.e. the whole supervision tree).
The problem with being a language advocate is that when "competing" languages improve, you start thinking it's bad news for your side. I try to avoid that. If Go improves, it's good news for everyone, if only because we all benefit from stronger competition.
My point was that the strength of Erlang/OTP was never in processing power/speed, but in a runtime designed so that it actually solves most problems regarding distribution. Go, as far as I understand it, was created with a different goal in mind - to enable fast and parallel processing. That does not make it better or worse, just different. What I'm saying is that solving the garbage collection issue (and only partially, when we're at it) is not what makes it competitive with Erlang, because Erlang was designed with a totally different goal in mind.
The thing is, you choose the right tech for the job and then reserve some time to bring everybody up to speed. From my own experience, that is much cheaper than trying to use already-known tech in ways it was not designed to work.
The flip side of that coin is you're liable to get someone reinventing the wheel - poorly - in whatever language doesn't have all those goodies.
"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang."
Edit: I'll add, though, that the Go people are pretty smart and seem like they're doing good things, so I wouldn't be too complacent in thinking Erlang is the only game in town. It still does get some things right that are hard to replicate in Go, though.
Today, for companies making a similar decision now, that argument is a bit different. Go 1.6/1.7 obviously has massive improvements in the areas the article outlines. But, in Erlang camp, we have Elixir making that more enticing.
I would argue Twitch made the right choice. They will have an order of magnitude easier time finding devs to support a Go system than an Erlang system. And their product never suffered for it. And they are clearly a force behind making Go better, which has helped more people than just them.
Odds are, Twitch wasn't trying to optimize over a long time horizon when they chose Go. They were once a scrappy startup, surely accumulating technical debt left and right to get product features out. Go was likely a locally optimal choice.
Twitch chat also requires heavy string processing, and that's an arena where, if I had to guess, Go has an edge over Erlang.
When people talk about Go's GC freezes, they're talking about the spinup/spindown time before the async GC kicks in. That part is incomparable to Erlang, but it's a part which has gotten much faster recently, specifically by virtue of becoming smaller.
They resemble generational GC more than anything. Generational GC has some of the advantages of Erlang (though I think the traditional HotSpot generational GC will end up working better than the one Go is going with) in the minor collections, but not in the major collections.
I actually have experience here: looping over all Erlang processes and running a GC on each of them is, in human-clock time, orders of magnitude slower than a Go garbage collection across a similar set of data. But who cares? First of all, that was a bit of a desperation play on my part anyhow, run for diagnostic purposes in the REPL, not an operation you do all the time; and secondly, only one process at a time was frozen, so I didn't care that it took about 10 seconds. It didn't take my service down.
Which was my point in the first place, that "faster" and "slower" don't really apply here, because what they're doing is so different from each other. There's too many different possible definitions of faster. And you have to be careful to use one that matters to your code, not just an artificial benchmark that shows your preferred choice in the better light.
(For those who may be curious, the problem that led me to that play was some now long-fixed issues with large binaries.)
The two main reasons I remember are A) it was hard to maintain a group of engineers with acceptable competency in Erlang over time and B) the C++ code offered faster and more consistent performance albeit with somewhat less scalability in terms of sessions per host. We just added X% more servers to the channel pools and were happy to have the chat services in a language where more FB engineers could contribute. There's been a lot more changes in the architecture than just moving to C++ though, so it's hard to do a direct comparison between the products.
This doesn't take anything away from WhatsApp though, who has built a strong product and infrastructure on top of Erlang.