Hacker News new | comments | show | ask | jobs | submit login
Recycling memory buffers in Go (cloudflare.com)
150 points by jgrahamc 1305 days ago | hide | past | web | 42 comments | favorite

Buffer caching, the technique that never goes away.

GC does not mean your memory management is solved. It means that your memory management works well enough for development and sometimes for undemanding tasks. That said, generational GC can be like mediocre buffer caching "for free." But this still doesn't make all explicit buffer caching go away.

Given that buffer caching is "the techique that never goes away," why isn't some facility for handling this built into the runtimes of languages with GC? You could tell the GC that this particular type of object needs to be cached, with a cache of this size and growth policy, and then programmers never have to write and debug code that does this.

EDIT: Another way to think abut this: because GC is designed to be universal and foolproof, it's easy to get GC into a corner case where its behavior is very suboptimal. In the case of "memory pressure" you can think of this as the GC having to scan the graph of references way more often than it really needs to. However, the optimization techniques for many of these are fairly generic. What if optimization techniques could be built into languages and runtimes, then instrumentation could be used to empirically determine what optimizations are actually needed, then these could be simply "switched on" for particular classes or types.

That shouldn't usually require any explicit support from the runtime; it can exist as library code (and does - for instance Apache Commons: http://commons.apache.org/proper/commons-pool/ ).

> That shouldn't usually require any explicit support from the runtime; it can exist as library code

Well, of course. People writing their own logically implies that such can exist as library code. But if you build in support from the runtime, then it might be possible to squeeze out a bit more performance. For example, perhaps it's possible to tell the GC that certain objects don't have to be considered as roots. (And to the extent that this is possible with current libraries, it's because there is support built into the runtime.)

EDIT: Also, having these mechanisms as library code probably opens up more opportunities for introducing implementation errors.

Generic buffer caching support looks an awful lot like generational GC. It's faster when you hand-roll it because you know how you're going to use the buffers and which buffers are going to be reused.

I'm proposing something that's not quite as foolproof and automatic as generational GC, but more optimal and a lot quicker and safer than hand-rolling. Also something for which profiling is enough to "know how you're going to use the buffers and which buffers are going to be reused."

So basically, take something that's a programming problem and reduce it to "profile, then throw some switches."

I guess I'm just not seeing how that's going to result in anything different than a userspace freelist library.

If you do everything correctly, not too much. But you can't assume the former. Also, having the thing baked into the language/runtime will encourage use. It's along the lines of what happens with refactoring facilities that aren't part of the browser/editor/ide. They stay "that esoteric thing over there" and don't get used that much.

Is the Go GC Collector not good enough so you have to resort to this? Would be good if you had the option (like in Java) to specify heap min and max sizes - and if set to the same value your heap would never resize.

Maybe I didn't read the code right but where does it check whether a retrieved buffer is the right size? The makebuffer() routine appears to generate random size buffers but that length then doesn't get checked.

My experience from embedded systems is that buffer rings are great if you can predict two or three common sizes that cover your typical allocation needs.

I think even for people that work mostly in GC environments it's a good idea to read through the basic mechanism of heap management and allocation strategies. This article is a good intro to the basic idea and I like the charting and analysis being done.

The sample code does not check the buffer size. This is a slightly artificial example to show the technique and have some interesting graphs to back it up.

There's a proposal to add something like this in Go itself: https://code.google.com/p/go/issues/detail?id=4720

unfortunately, it looks like it isn't going anywhere right now:


This is nice, but I can't help but feel it's a little wrong that we're still making free-lists like in the olden days.

You absolutely don't need to do any of this to write Go programs that outperform Ruby or Python.

Generally, sure. But it's still pretty easy to write garbage-generating Go applications that exhaust memory because the GC isn't compacting. In some applications this is a nobrainer: use a free list. In some others, like when you don't actually know what kind of buffer size you'll need, it's not so easy (although you could argue you should just fixed-sized chunks for whatever you're doing, and recycle those, but that's a lot of work.)

What I'm saying boils down to: Free lists are definitely useful, but the GC could be better.

"Could be better" is of course true for every single situation, no matter how good.

But even Java, which has what is generally regarded as the most advanced GC ever written, still has lots of applications doing the free-list of ByteBuffers or even better, the free-list of off-heap-allocated DirectByteBuffers which don't have to be moved during tenured generation compactions.

It seems pretty reasonable to me to have one or two freelists at the spots you're frequently allocating 64kb buffers, along with using the GC for the other 5,000 allocations in your project.

Sure. No disagreement.

Let me put it another way: Go's GC is still in its infancy compared to those in most other languages. It's perfectly fine to use, but there are some big things (e.g. no compaction) that stick out when using the language, whether you're working with free-lists or something else.

I think you're going to find that even with no attention to allocation at all, your Golang programs are going to outperform their Ruby and Python equivalents.

That's a dicier assertion to make versus other languages, notably Java, but if you're coming to Golang from a high level language like Python, you're probably going to be happy with the performance you get out of naive code.

You're right (except for some cases like crypto and regex where other languages "cheat" by using C bindings or unstable versions), but that isn't where it has bit me. In general all my Go programs have been more than fast enough, but on several occasions I've had memory starvation problems that I would not be having in other languages (even if there is a way to do it without doing a lot of allocation and fragmentation in Go.)

I love Go, but "it's fast" isn't a good reason not to improve a subpar GC. (Which fortunately isn't a position the Go team is taking. I know everyone there would like to see a lot of interesting things happen with the GC. It just hasn't been one of the more pressing issues so far, and I've agreed. It's also a very difficult and "boring" thing to work on for most people, so it's not surprising that somebody hasn't just added an Azul C4-style concurrent GC in their spare time.)

Azul C4 requires a kernel module or a specialized kernel to support all the MMU tricks that it relies on to achieve maximum performance. I don't think it's really an appropriate choice for general purpose GC in a language that's designed to be easy to deploy.

Java HotSpot's collector makes the most sense IMHO: stop the world compacting in the young generation, mostly concurrent in the tenured generation. It achieves better throughput than Azul C4 too, according to the C4 paper.

I vaguely remember reading somewhere that they figured out a way to run their collector on stock Linux; I think the price they paid in exchange was (even) lower throughput (trying to track down reference so I know I wasn't daydreaming) (yes, I daydream about GCs).

A while ago, I read through the kernel patch they provided, and it was basically adding batch APIs for mremap so you don't have to have a TLB flush for every call. Also because all mutator threads needed to be paused at a safepoint while the remaps happen, so the batch API has much shorter pause time.

Neither of HotSpot's collectors guarantee maximum latency, though, which is Azul C4's interesting feature.

I would pay throughput to achieve (soft) real-time GC.

> In general all my Go programs have been more than fast enough, but on several occasions I've had memory starvation problems that I would not be having in other languages

The problem with GC's languages, is that GC is "good enough" for prototyping and for undemanding stuff, but there are still these corner cases. Really, it's no different than other forms of memory management in this regard. It's just particularly beginner friendly.

Yes, Golang is very young and its gc is one of the more 'immature' portions. However, in comparison to the tools that it is regularly displacing the overall speed is still a huge win. Eventually, Golang will get a better gc and in the meantime we can already enjoy faster programs. Nobody is arguing that the gc of today should be Golang's gc forever.

Have you noticed the starvation in 64 bit programs or 32 bit programs?

Both. They weren't issues with precision/no gc at all, but that the arena contents became fragmented, as far as I could tell.

Just to clarify, by outperform, are you speaking wrt. memory usage?

If I recall correctly, there are still some very strong warnings against running Go on memory-constrained systems (< 1 GB).

People pick Ruby and Python without having unrealistic expectations for performance and throughput. We already know their implementations suck and being the high-level languages that they are, we know how difficult they are to optimize.

However, if performance is a concern, then picking Go seems to me like quite a step back from say Ruby.

There are other alternatives as well, that are higher level and that are much better at performance than Ruby/Python. E.g. Java, C#, Scala, Clojure, Ocaml, Haskell.

Personally I love Scala and the JVM, but the one thing I don't like is the GC. The JVM has the best GCs ever, but sometimes you just want to do without one at all.

This is why I don't like Go. While it's a tasteful language, it's too low level, but it still depends on a GC (that's not precise, compacting or generational).

Mozilla's Rust is much more promising for me.

I agree, given the type of issues they are addressing (very high throughput application servers), I don't think a general purpose allocator (garbage collected or not) will ever get the performance they want. Recycling buffers is not a bad strategy for this type of application, no matter the language.

When malloc() has to deal with defragmenting allocations of varying length, I believe it does it by allocating from several pools each holding blocks of a fixed size, picking the smallest category that will fit. It wouldn't be hard to extend the buffered-channel technique to do this.

The olden days are not that old. I find myself using static arrays, free lists, malloc and its OO brethren as well as GC when available. All of those things have been around for as long as I can remember.

It's nice not to have to worry about freeing allocated memory but if you write a lot of low-level code there's a certain level of unease about letting the language or OS take on something that can have such a large performance impact.

This technique puts a lock in your serving path which will necessarily limit parallelism. Go channel operations are not "non-blocking" in every sense. Even if you have them in a select block your thread has to take the mutex to find out if the buffer is empty (on receive) or full (on send).

FWIW: Node.js uses a buffer cache like this by default: https://github.com/joyent/node/blob/master/lib/buffer.js

It's a good thing Node.js doesn't have to worry about synchronization between threads... :)

Would the down-voters care to explain? I've pointed out how another platform deals with allocating raw memory buffers as it's relevant to the discussion.

I've made no claims of it being superior, and in fact, am largely using Go myself these days.

Using a shared buffer list isn't very efficient in a language with concurrency since you have to synchronize to take from/add to the list.

All heap access must be synchronized. Using pooled buffers may actually be faster because you can ensure that only one thread is accessing a specific pool. Just create a new pool, when a new thread is created.

Or, you know, each thread could allocate only when it needs something and handle its own recycling. Node.js has a particular advantage in that it isn't concurrent in any way (which, of course, has many other downsides.)

You can certainly build concurrent systems on top of Node, what it doesn't have is preemptive multi-threading, which sucks IMHO, but speaking of Go I also don't like that it does M:N multi-threading instead of 1:1 like the JVM, as the scheduling done is suboptimal.

Oh man, I'm pretty keen to write many types of subsystems, but thread scheduling?

I'll take stuff I don't want to mess with for 300, Alex.

> You can certainly build concurrent systems on top of Node

You sound so confident. Do you have any example of a system in JavaScript that has concurrent semantics?

coincidentally …. this is how we pooled the buffers on Vine for Android to make Camera initializations efficiently.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact