GC does not mean your memory management is solved. It means that your memory management works well enough for development and sometimes for undemanding tasks. That said, generational GC can be like mediocre buffer caching "for free." But this still doesn't make all explicit buffer caching go away.
Given that buffer caching is "the technique that never goes away," why isn't some facility for handling this built into the runtimes of languages with GC? You could tell the GC that this particular type of object needs to be cached, with a cache of this size and growth policy, and then programmers would never have to write and debug code that does this.
EDIT: Another way to think about this: because GC is designed to be universal and foolproof, it's easy to get GC into a corner case where its behavior is very suboptimal. In the case of "memory pressure," you can think of this as the GC having to scan the graph of references far more often than it really needs to. However, the optimization techniques for many of these cases are fairly generic. What if optimization techniques were built into languages and runtimes, instrumentation were used to determine empirically which optimizations are actually needed, and those optimizations could then simply be "switched on" for particular classes or types?
Well, of course. The fact that people write their own implies that such mechanisms can exist as library code. But if you build in support from the runtime, then it might be possible to squeeze out a bit more performance. For example, perhaps it's possible to tell the GC that certain objects don't have to be considered as roots. (And to the extent that this is possible with current libraries, it's because there is support built into the runtime.)
EDIT: Also, having these mechanisms as library code probably opens up more opportunities for introducing implementation errors.
So basically, take something that's a programming problem and reduce it to "profile, then throw some switches."
My experience from embedded systems is that buffer rings are great if you can predict two or three common sizes that cover your typical allocation needs.
I think even for people that work mostly in GC environments it's a good idea to read through the basic mechanism of heap management and allocation strategies. This article is a good intro to the basic idea and I like the charting and analysis being done.
What I'm saying boils down to: Free lists are definitely useful, but the GC could be better.
But even Java, which has what is generally regarded as the most advanced GC ever written, still has lots of applications maintaining a free list of ByteBuffers or, even better, a free list of off-heap-allocated DirectByteBuffers, which don't have to be moved during tenured-generation compactions.
It seems pretty reasonable to me to have one or two free lists at the spots where you're frequently allocating 64 KB buffers, along with using the GC for the other 5,000 allocations in your project.
Let me put it another way: Go's GC is still in its infancy compared to those in most other languages. It's perfectly fine to use, but there are some big things (e.g. no compaction) that stick out when using the language, whether you're working with free-lists or something else.
That's a dicier assertion to make versus other languages, notably Java, but if you're coming to Golang from a high level language like Python, you're probably going to be happy with the performance you get out of naive code.
I love Go, but "it's fast" isn't a good reason not to improve a subpar GC. (Which fortunately isn't a position the Go team is taking. I know everyone there would like to see a lot of interesting things happen with the GC. It just hasn't been one of the more pressing issues so far, and I've agreed. It's also a very difficult and "boring" thing to work on for most people, so it's not surprising that somebody hasn't just added an Azul C4-style concurrent GC in their spare time.)
Java HotSpot's collector makes the most sense IMHO: stop the world compacting in the young generation, mostly concurrent in the tenured generation. It achieves better throughput than Azul C4 too, according to the C4 paper.
A while ago, I read through the kernel patch they provided, and it was basically adding batch APIs for mremap so you don't have to have a TLB flush for every call. Also, all mutator threads need to be paused at a safepoint while the remaps happen, so the batch API yields much shorter pause times.
I would pay throughput to achieve (soft) real-time GC.
The problem with GC'd languages is that GC is "good enough" for prototyping and for undemanding stuff, but there are still these corner cases. Really, it's no different from other forms of memory management in this regard. It's just particularly beginner friendly.
If I recall correctly, there are still some very strong warnings against running Go on memory-constrained systems (< 1 GB).
However, if performance is a concern, then picking Go seems to me like quite a step up from, say, Ruby.
There are other alternatives as well that are higher level and perform much better than Ruby/Python: e.g. Java, C#, Scala, Clojure, OCaml, Haskell.
Personally I love Scala and the JVM, but the one thing I don't like is the GC. The JVM has the best GCs ever, but sometimes you just want to do without one at all.
This is why I don't like Go. While it's a tasteful language, it's too low level, yet it still depends on a GC (one that's not precise, compacting, or generational).
Mozilla's Rust is much more promising for me.
It's nice not to have to worry about freeing allocated memory but if you write a lot of low-level code there's a certain level of unease about letting the language or OS take on something that can have such a large performance impact.
I've made no claims of it being superior, and in fact, am largely using Go myself these days.
I'll take stuff I don't want to mess with for 300, Alex.